4. Clause combining 2

Informática Aplicada a la Traducción (Computing Applied to Translation)
Building and Using Translation Memories
4. Building a Parallel Corpus
4.1 What is a Parallel Corpus?
• A “Parallel Corpus” consists of a set of sentences (or other
segments of text) in one language, each linked to the
translation of that sentence into another language.
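In program terms, a parallel corpus can be held as a list of (source, target) pairs. The minimal Python sketch below is an illustration, not any particular tool's storage format: it stores one pair from the iPhone example used later in this section and performs the simplest lookup a Translation Memory can offer, an exact match on the whole source sentence. The name `exact_match` is hypothetical.

```python
# A parallel corpus as a list of aligned (source, target) sentence pairs.
parallel_corpus = [
    ("When iPhone is locked, nothing happens if you touch the screen.",
     "Cuando el iPhone está bloqueado, no ocurre nada si toca la pantalla."),
]

def exact_match(corpus, sentence):
    """Return the stored translation of `sentence`, or None if it is not in
    the corpus (exact, whole-sentence match only)."""
    memory = dict(corpus)  # source sentence -> target sentence
    return memory.get(sentence)

print(exact_match(parallel_corpus,
                  "When iPhone is locked, nothing happens if you touch the screen."))
```

Building the dictionary on every call keeps the sketch short; a real tool would build its index once and reuse it.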
• In the context of a Translation Memory system, such
parallel corpora are called “Translation Memories”.
• In this class and the next, we will explore:
– Various tools for converting two texts into a
parallel corpus.
– Various uses for parallel corpora apart from TMs.
4.2 Software for Building a Parallel Corpus
• There are various tools available for converting two texts into
a parallel corpus.
1. Manual editing in a spreadsheet (e.g., Microsoft Excel)
2. Commercial sentence alignment systems
   a) WinAlign (part of Trados, thus expensive, but good)
      http://blog.quillslanguage.com/2008/11/trados-winalign/
   b) DejaVu contains a sentence aligner
3. Free/open-source systems
   a) Microsoft Bilingual Sentence Aligner
   b) LF Aligner http://sourceforge.net/projects/aligner/
   c) Bitext2tmx http://bitext2tmx.sourceforge.net
   d) More at: http://www.cse.unt.edu/~rada/wa/#softwareSA
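Option 1 needs no special software: if column A of the sheet holds the source sentences and column B their translations, each row is one aligned pair. Assuming the sheet is saved as a CSV file (a format Excel can save to), the minimal sketch below reads it back as pairs and skips rows where either cell is empty. The function name `load_pairs` and the sample rows are hypothetical.

```python
import csv
import io

def load_pairs(csv_file):
    """Read a two-column sheet (A = source, B = target) exported as CSV
    into a list of (source, target) pairs, skipping incomplete rows."""
    pairs = []
    for row in csv.reader(csv_file):
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            pairs.append((row[0].strip(), row[1].strip()))
    return pairs

# Stand-in for an exported file (hypothetical content); the second row
# has no translation yet, so it is skipped.
sheet = io.StringIO(
    '"When iPhone is locked, nothing happens if you touch the screen.",'
    '"Cuando el iPhone está bloqueado, no ocurre nada si toca la pantalla."\n'
    'Press the On/Off button.,\n'
)
pairs = load_pairs(sheet)
print(pairs)
```

With a real export, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` stand-in. The encoding of a CSV file may depend on the Excel version and the export option chosen, so it is worth checking that accented characters survive loading.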
[Screenshot: WinAlign]
[Screenshot: DejaVu]
Automatic Sentence Alignment
• Given two texts, the system works out which sentence of text 1 corresponds to which sentence of text 2:
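English sentence (text 1):
  "When iPhone is locked, nothing happens if you touch the screen."
Candidate Spanish sentences (text 2), each a possible match (?):
  ? "Bloquear el iPhone: Pulse el botón de encendido/apagado."
    (Lock the iPhone: Press the On/Off button.)
  ? "Cuando el iPhone está bloqueado, no ocurre nada si toca la pantalla."
    (When the iPhone is locked, nothing happens if you touch the screen.)
  ? "El iPhone puede seguir recibiendo llamadas, mensajes de texto y otras actualizaciones."
    (The iPhone can keep receiving calls, text messages and other updates.)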
Basically, among the target sentences close to the position of the source sentence, find the one that contains the most words translating the words of the source sentence (using a translation dictionary).
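The idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not how any of the tools listed earlier actually work: words are lower-cased tokens, `toy_dict` is a tiny hypothetical English→Spanish dictionary, candidates are limited to target sentences within `window` positions of the source sentence, and the score pools both directions as (matched English words + matched Spanish words) / (all words). The function names are hypothetical, and the sketch only produces one-to-one links.

```python
import re

def tokens(sentence):
    """Lower-cased word tokens; punctuation such as ':' or '/' is dropped."""
    return re.findall(r"\w+", sentence.lower())

def overlap_score(src, tgt, dictionary):
    """(matched source words + matched target words) / (all words).
    A source word is matched if one of its dictionary translations occurs in
    the target; a target word is matched if it translates some source word."""
    s, t = tokens(src), tokens(tgt)
    if not s or not t:
        return 0.0
    t_set = set(t)
    translations = set()
    for w in s:
        translations |= dictionary.get(w, set())
    matched_src = sum(1 for w in s if dictionary.get(w, set()) & t_set)
    matched_tgt = sum(1 for w in t if w in translations)
    return (matched_src + matched_tgt) / (len(s) + len(t))

def align(source_sents, target_sents, dictionary, window=2):
    """Link each source sentence to the best-scoring target sentence within
    `window` positions of it; returns (source_index, target_index, score)."""
    links = []
    for i, src in enumerate(source_sents):
        lo, hi = max(0, i - window), min(len(target_sents), i + window + 1)
        if lo >= hi:
            continue  # no target sentence close enough to compare with
        scores = {j: overlap_score(src, target_sents[j], dictionary)
                  for j in range(lo, hi)}
        best = max(scores, key=scores.get)
        links.append((i, best, scores[best]))
    return links

# Tiny, hypothetical English -> Spanish dictionary (lower-case entries).
toy_dict = {
    "when": {"cuando"}, "iphone": {"iphone"}, "is": {"está", "es"},
    "locked": {"bloqueado", "bloquear"}, "nothing": {"nada"},
    "happens": {"ocurre"}, "if": {"si"}, "touch": {"toca", "tocar"},
    "the": {"el", "la"}, "screen": {"pantalla"},
}

english = ["When iPhone is locked, nothing happens if you touch the screen."]
spanish = [
    "Bloquear el iPhone: Pulse el botón de encendido/apagado.",
    "Cuando el iPhone está bloqueado, no ocurre nada si toca la pantalla.",
    "El iPhone puede seguir recibiendo llamadas, mensajes de texto y otras actualizaciones.",
]
for i, j, score in align(english, spanish, toy_dict):
    print(f"{english[i]} -> {spanish[j]} ({score:.0%})")
```

With this invented dictionary the English sentence is linked to "Cuando el iPhone está bloqueado…", which scores 21/23, while the "Bloquear el iPhone…" candidate scores 7/20. These counts differ slightly from the ones below because the dictionary is a toy.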
English words (11): When, iPhone, is, locked, nothing, happens, if, you, touch, the, screen
Spanish words (9): Bloquear, el, iPhone, Pulse, el, botón, de, encendido, apagado
• 3 out of 11 English words have a translation in the Spanish sentence.
• 3 out of 9 Spanish words have a translation in the English sentence.
• Thus, a weak match (30%).
English words (11): When, iPhone, is, locked, nothing, happens, if, you, touch, the, screen
Spanish words (12): Cuando, el, iPhone, está, bloqueado, no, ocurre, nada, si, toca, la, pantalla
• 10 out of 11 English words have a translation in the Spanish sentence.
• 10 out of 12 Spanish words have a translation in the English sentence.
• Thus, a strong match (87%).
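Both percentages on these slides can be reproduced by pooling the counts from the two directions. This formula is inferred from the numbers shown, not stated on the slides: m_E is the number of English words with a translation in the Spanish sentence, n_E the number of English words, and m_S, n_S the same counts for the Spanish sentence.

```latex
\text{score} = \frac{m_E + m_S}{n_E + n_S}
\qquad
\frac{3 + 3}{11 + 9} = \frac{6}{20} = 30\%
\qquad
\frac{10 + 10}{11 + 12} = \frac{20}{23} \approx 87\%
```

Averaging the two one-way ratios, (m_E/n_E + m_S/n_S)/2, rounds to the same 30% and 87%, so the slides' numbers alone do not show which combination was intended.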
4.3 Various Uses for Parallel Corpora apart from TMs
• Aligned translation corpora are the input to Translation Memory systems.
• However, translators can also use them in other ways:
  – In place of a terminological dictionary: look up a term in one language to see how it is translated into the other, together with its context of use (i.e., how it is used in the sentence as a whole).
  – As a source of word-frequency data: corpus management software can tell you which words are "key" to the corpus (the most characteristic of this kind of text). These are terms you should probably add to your translation lexicon.
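Both uses can be sketched over a list of (source, target) pairs. In the minimal Python illustration below, `concordance` returns every pair whose chosen side contains the term as a whole word, so the translation is seen in its sentence, and `word_frequencies` counts words. For "key" words, corpus tools compare frequencies against a general reference corpus; `keywords` approximates that with a smoothed ratio of per-million frequencies, which is one simple choice assumed here, not the statistic any particular tool uses. All function names and the English of the second sample pair are hypothetical.

```python
import re
from collections import Counter

def concordance(pairs, term, side=0):
    """Return aligned pairs whose side `side` (0 = source, 1 = target)
    contains `term` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [pair for pair in pairs if pattern.search(pair[side])]

def word_frequencies(sentences):
    """Count lower-cased word tokens over a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"\w+", sentence.lower()))
    return counts

def keywords(focus, reference, n=10, smoothing=1.0):
    """Rank the words of the `focus` Counter by
    (per-million freq. in focus + smoothing) / (per-million freq. in reference + smoothing).
    Assumed scoring; `reference` must be a Counter so unseen words count as 0."""
    focus_total = sum(focus.values()) or 1
    reference_total = sum(reference.values()) or 1

    def ratio(word):
        f = focus[word] / focus_total * 1_000_000
        r = reference[word] / reference_total * 1_000_000
        return (f + smoothing) / (r + smoothing)

    return sorted(focus, key=ratio, reverse=True)[:n]

pairs = [
    ("When iPhone is locked, nothing happens if you touch the screen.",
     "Cuando el iPhone está bloqueado, no ocurre nada si toca la pantalla."),
    ("Lock the iPhone: Press the On/Off button.",
     "Bloquear el iPhone: Pulse el botón de encendido/apagado."),
]
for src, tgt in concordance(pairs, "locked"):
    print(src, "=>", tgt)
```

Passing `side=1` searches the Spanish side instead, which is how the lookup works in the other translation direction.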