Example Based Machine Translation
Published on Nov 14, 2015
Example-Based Machine Translation (EBMT) is most notably attributed to Nagao and his well-known 1984 paper on translation by analogy. EBMT retrieves similar examples (pairs of source phrases, sentences or texts and their translations) from a database of examples and adapts them to translate new input.
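The retrieval step can be illustrated with a minimal sketch. It assumes that similarity is measured as word-level edit distance (one common choice, not necessarily Nagao's) and that the example database is a small hypothetical list of English-French pairs. `word_edit_distance` and `retrieve` are illustrative names, not part of any particular system.

```python
def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance computed over words instead of characters."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def retrieve(source: str, examples: list[tuple[str, str]]) -> tuple[str, str, int]:
    """Return the stored (source, target) pair closest to `source`, plus its distance."""
    words = source.lower().split()
    scored = [(word_edit_distance(words, src.lower().split()), src, tgt)
              for src, tgt in examples]
    dist, src, tgt = min(scored, key=lambda item: item[0])  # first example wins ties
    return src, tgt, dist


# Hypothetical example database of (source, translation) pairs.
EXAMPLES = [
    ("I would like a coffee", "Je voudrais un café"),
    ("Where is the station", "Où est la gare"),
    ("I would like a ticket", "Je voudrais un billet"),
]

print(retrieve("I would like a tea", EXAMPLES))
```

Here the query "I would like a tea" retrieves "I would like a coffee" at distance 1 (one substituted word). The adaptation step, which would replace "un café" with a translation of "tea", is not shown.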
EBMT is based on the intuition that humans use previously seen translation examples to translate unseen input. In one common design, an EBMT system consists of two databases, an example database and a thesaurus, and three translation modules: analysis, example-based transfer and generation.
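The thesaurus is what lets such a system judge how close an input is to a stored example. The sketch below is a toy version of the "eat" example often used to illustrate translation by analogy: English "eat" is rendered as Japanese "taberu" when something eats food and as "okasu" when acid eats (corrodes) metal. The thesaurus hierarchy, the two examples and the path-length distance (edges to the lowest common ancestor) are assumptions made for illustration; real systems use much larger thesauri and other, often normalised, distance measures.

```python
# Hypothetical toy thesaurus: each word or concept points to its parent concept.
PARENT = {
    "man": "human", "he": "human", "human": "animate", "animate": "entity",
    "vegetables": "food", "potatoes": "food", "food": "concrete",
    "acid": "chemical", "chemical": "substance",
    "metal": "material", "iron": "material", "material": "substance",
    "substance": "concrete", "concrete": "entity",
}

# Hypothetical example database: (subject, object) of English "eat" -> Japanese verb.
EXAMPLES = [
    (("man", "vegetables"), "taberu"),  # "A man eats vegetables": eating food
    (("acid", "metal"), "okasu"),       # "Acid eats metal": corroding
]


def ancestors(word: str) -> list[str]:
    """The word followed by every concept above it, up to the root."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def semantic_distance(a: str, b: str) -> int:
    """Number of thesaurus edges between a and b via their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    return len(up_a) + len(up_b)  # no shared ancestor: treat as very distant


def translate_eat(subject: str, obj: str) -> str:
    """Pick the verb of the example whose arguments are semantically closest."""
    def cost(example):
        (ex_subject, ex_object), _ = example
        return semantic_distance(subject, ex_subject) + semantic_distance(obj, ex_object)
    return min(EXAMPLES, key=cost)[1]


print(translate_eat("he", "potatoes"))  # taberu
print(translate_eat("acid", "iron"))    # okasu
```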
Translation is performed using a database of examples extracted from corpora. There are four stages of work in EBMT: example acquisition, example base management, example application and target sentence synthesis. The underlying principle of EBMT is simple: remember everything translated in the past and use everything available to facilitate the translation of the next utterance. Computers are extremely good at memorizing data such as text pairs and their frequencies, which is why proponents argue that EBMT is the MT approach with the greatest potential. It opens a wide area of research opportunities.
Basic Steps in Translation and MT Approaches
Basically there are three steps in machine translation: analysis, transfer and generation. Transfer is the core process; analysis precedes it and generation follows it. Analysis is of three types:
1.Morphological analysis (words of the input text are classified by part of speech, e.g. noun, verb)
2.Syntactic analysis (identification of phrase structures, etc.)
3.Semantic analysis (resolution of lexical and structural ambiguities)
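Morphological analysis can be sketched with a hypothetical toy lexicon and a few English suffix-stripping rules: each surface word is mapped to a canonical form, a part of speech and, where a suffix was removed, a grammatical feature. This is a minimal illustration only; practical analysers rely on full morphological dictionaries or finite-state tools and handle spelling changes that these rules ignore.

```python
from typing import Optional

# Hypothetical toy lexicon: canonical (base) form -> part of speech.
LEXICON = {
    "the": "DET", "black": "ADJ",
    "cat": "NOUN", "dog": "NOUN",
    "chase": "VERB", "see": "VERB",
}

# Hypothetical suffix rules: (suffix to strip, grammatical feature it marks).
SUFFIX_RULES = [("es", "PLURAL/3SG"), ("s", "PLURAL/3SG"), ("ed", "PAST"), ("d", "PAST")]


def analyse_word(word: str) -> tuple[str, str, Optional[str]]:
    """Return (canonical form, part of speech, feature) for one surface word."""
    w = word.lower()
    if w in LEXICON:
        return w, LEXICON[w], None
    for suffix, feature in SUFFIX_RULES:
        if w.endswith(suffix):
            stem = w[: -len(suffix)]
            if stem in LEXICON:
                return stem, LEXICON[stem], feature
    return w, "UNKNOWN", None


def analyse(sentence: str) -> list[tuple[str, str, Optional[str]]]:
    """Analyse every whitespace-separated word of a sentence."""
    return [analyse_word(word) for word in sentence.split()]


for token in analyse("The black cat chased the dogs"):
    print(token)
```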
Transfer is of two types:
1.Lexical transfer (selection of vocabulary equivalents)
2.Structural transfer (transformation of source-text structures into target-text structures)
Generation is of three types: semantic generation, syntactic generation and morphological generation.
There are two basic types of approach: rule-based MT (RBMT) and corpus-based MT. In RBMT, the core process is mediated by bilingual dictionaries and rules for converting source-language (SL) structures into target-language (TL) structures, and/or by dictionaries and rules for deriving ‘intermediary representations’ from which output can be generated. The preceding stage of analysis interprets (surface) input SL strings into appropriate ‘translation units’ (e.g. canonical noun and verb forms) and relations (e.g. dependencies and syntactic units).
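The RBMT pipeline can be sketched for English to French, assuming the input has already been analysed into (word, part of speech) pairs. `BILINGUAL` is a hypothetical bilingual dictionary used for lexical transfer, and the single structural rule (English adjective + noun becomes French noun + adjective) stands in for a real rule set.

```python
# Hypothetical bilingual dictionary used for lexical transfer (English -> French).
BILINGUAL = {"the": "le", "cat": "chat", "dog": "chien", "black": "noir", "sees": "voit"}


def lexical_transfer(tagged):
    """Replace each English word with its French equivalent, keeping the POS tag."""
    return [(BILINGUAL.get(word, word), pos) for word, pos in tagged]


def structural_transfer(tagged):
    """One structural rule: English ADJ + NOUN becomes French NOUN + ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def generate(tagged):
    """Trivial generation: join the target words with spaces."""
    return " ".join(word for word, _ in tagged)


def translate(tagged):
    return generate(structural_transfer(lexical_transfer(tagged)))


# Input as produced by a (hypothetical) analysis stage: (word, part of speech).
SENTENCE = [("the", "DET"), ("cat", "NOUN"), ("sees", "VERB"),
            ("the", "DET"), ("black", "ADJ"), ("dog", "NOUN")]
print(translate(SENTENCE))  # le chat voit le chien noir
```

The sketch ignores gender agreement and elision, and many French adjectives (for example "grand" or "petit") normally precede the noun, so a real system needs lexically conditioned rules.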
The succeeding stage of synthesis (or generation) derives TL texts from the TL structures or representations produced by the core ‘transfer’ (or ‘interlingual’) process. Statistical MT (SMT) is one corpus-based approach (EBMT is another). In SMT, the core process involves a ‘translation model’, which takes SL words or word sequences (‘phrases’) as input and produces TL words or word sequences as output. The following stage involves a ‘language model’, which synthesises the sets of TL words into ‘meaningful’ strings intended to be equivalent to the input sentences.
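The division of labour between the two SMT models can be shown with a toy example. The probability tables below are invented, the score is simply the sum of translation-model and language-model log-probabilities (equal weights are an assumption), and the decoder enumerates every choice of word translation and every word order, which is only feasible for very short inputs; real decoders use search strategies such as beam search.

```python
import math
from itertools import permutations, product

# Invented translation model: SL word -> {TL word: probability}.
TM = {
    "la": {"the": 0.9, "it": 0.1},
    "maison": {"house": 0.6, "home": 0.4},
    "bleue": {"blue": 1.0},
}

# Invented bigram language model over TL words; <s> and </s> mark sentence edges.
LM = {
    ("<s>", "the"): 0.5, ("the", "blue"): 0.2, ("blue", "house"): 0.3,
    ("house", "</s>"): 0.4, ("the", "house"): 0.3, ("house", "blue"): 0.01,
    ("blue", "</s>"): 0.1, ("blue", "home"): 0.05, ("home", "</s>"): 0.3,
}
UNSEEN = 1e-4  # floor probability for bigrams missing from the table


def lm_logprob(words):
    """Log-probability of a TL word sequence under the bigram language model."""
    padded = ["<s>", *words, "</s>"]
    return sum(math.log(LM.get(bigram, UNSEEN)) for bigram in zip(padded, padded[1:]))


def decode(source):
    """Exhaustively score every word choice and word order; return the best string."""
    best, best_score = None, -math.inf
    options = [TM[w].items() for w in source.split()]
    for choice in product(*options):
        tm_score = sum(math.log(p) for _, p in choice)
        words = [w for w, _ in choice]
        for order in permutations(words):
            score = tm_score + lm_logprob(order)
            if score > best_score:
                best, best_score = " ".join(order), score
    return best


print(decode("la maison bleue"))  # the blue house
```

The translation model alone cannot decide between "the blue house" and "the house blue", since both use the same word choices; the language model supplies the English word order.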
In SMT, the preceding ‘analysis’ stage is represented by the (trivial) process of matching individual words or word sequences of the input SL text against entries in the translation model. More important is the essential preparatory stage of aligning SL and TL texts from a corpus and deriving the statistical frequency data for the ‘translation model’ (or adding statistical data from a corpus to a pre-existing ‘translation model’). The monolingual ‘language model’ may or may not be derived from the same corpus as the ‘translation model’.
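One standard way to derive word-translation statistics from a sentence-aligned corpus is the expectation-maximisation procedure of IBM Model 1. The sketch below uses that model purely as an illustration; it is not claimed to be the method of any system described here, and it omits the NULL source word that the full model includes. The three-sentence German-English corpus is a toy example.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t(e|f) with IBM Model 1 EM.

    `pairs` is a list of (source_words, target_words) sentence pairs.
    Simplification: no NULL source word is added.
    """
    target_vocab = {e for _, es in pairs for e in es}
    # Uniform initialisation: every t(e|f) starts at 1 / |target vocabulary|.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts summed over e, per f
        # E-step: share each target word among the source words of its sentence pair.
        for fs, es in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, f):
    """Target word with the highest t(e|f) for source word f."""
    return max((e for (e, src) in t if src == f), key=lambda e: t[(e, f)])


# Toy sentence-aligned German-English corpus.
CORPUS = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(CORPUS)
for f in ["das", "Haus", "Buch", "ein"]:
    print(f, best_translation(t, f))
```

No word pair is ever aligned by hand: because "das" co-occurs with "the" in two sentence pairs, EM gradually shifts probability mass so that das/the, Haus/house, Buch/book and ein/a become the most probable pairings.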
Essential for any translation, as a consequence of the aim to maintain ‘meaning equivalence’, is access to information about correspondences of vocabulary in the SL and the TL. The information contained in a database may be derived from a variety of resources (bilingual and monolingual texts, bilingual and monolingual dictionaries, grammars, thesauri, etc.). In EBMT, the quality of translation improves as more examples are added to the database. During translation, the input sentence is matched against the example database, and the corresponding target-language examples are recombined to produce a final translation. There are three tasks in EBMT:
- Matching fragments against existing examples
- Identifying the corresponding translation fragments
- Recombining them to give the target text
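The three tasks can be sketched with a hypothetical example base whose sentence pairs have already been split into aligned source and target fragments (in practice, producing those alignments is itself a major problem). Matching is a greedy longest-fragment search, and recombination simply concatenates the target fragments in source order; both are simplifying assumptions.

```python
# Hypothetical example base: each example is a list of aligned (source, target) fragments.
EXAMPLES = [
    [("I would like", "Je voudrais"), ("a coffee", "un café")],
    [("a ticket", "un billet"), ("to Paris", "pour Paris")],
]


def build_fragment_table(examples):
    """Index every aligned source fragment (as a word tuple) to its target fragment."""
    table = {}
    for example in examples:
        for source, target in example:
            table[tuple(source.lower().split())] = target
    return table


def translate(sentence, table):
    words = sentence.lower().split()
    longest = max(len(fragment) for fragment in table)
    output, i = [], 0
    while i < len(words):
        # Task 1, matching: try the longest stored fragment starting at word i.
        for n in range(min(longest, len(words) - i), 0, -1):
            fragment = tuple(words[i:i + n])
            if fragment in table:
                # Task 2, identifying: look up the aligned target fragment.
                output.append(table[fragment])
                i += n
                break
        else:
            # No example covers this word, so pass it through untranslated.
            output.append(words[i])
            i += 1
    # Task 3, recombining: here the fragments are simply joined in source order.
    return " ".join(output)


TABLE = build_fragment_table(EXAMPLES)
print(translate("I would like a ticket to Paris", TABLE))  # Je voudrais un billet pour Paris
```

The input is covered by fragments taken from two different examples. Naive concatenation works here only because the fragments keep the same order in French; in general, recombination must also handle reordering and agreement at fragment boundaries.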
MT systems are EBMT systems if the core ‘transfer’ (or SL-TL conversion) process involves matching SL fragments (sentences, phrases, strings) of an input text against a database of bilingual example texts (in the form of strings, templates or tree representations) and extracting the equivalent TL fragments (as partial potential translations). The databases of EBMT systems are derived primarily from bilingual corpora of (mainly) human translations, and are pre-processed into forms appropriate for the matching and extraction processes performed during translation (i.e. ‘run-time’ processes). The processes of analysis (decomposition) and synthesis (recombination) are designed, respectively, to prepare input text for matching against the database and to produce text from the database output.
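Examples stored as templates generalise over individual sentences. A minimal sketch, assuming hypothetical templates in which one slot `x` is matched with a regular expression and its filler is translated from a separate table of fragment examples:

```python
import re

# Hypothetical generalised examples: a source pattern with slot x and its target pattern.
TEMPLATES = [
    (r"i would like (?P<x>.+)", "je voudrais {x}"),
    (r"where is (?P<x>.+)", "où est {x}"),
]

# Hypothetical fragment examples used to translate whatever fills the slot.
FRAGMENTS = {"a coffee": "un café", "a ticket": "un billet", "the station": "la gare"}


def translate(sentence):
    """Match the input against each template, then translate the slot filler."""
    s = sentence.lower().strip()
    for pattern, target in TEMPLATES:
        match = re.fullmatch(pattern, s)
        if match and match.group("x") in FRAGMENTS:
            return target.format(x=FRAGMENTS[match.group("x")])
    return None  # no template/fragment combination covers the input


print(translate("Where is the station"))  # où est la gare
```

A single template plus a table of fillers covers many sentences that were never stored verbatim, which is the main reason for generalising examples during pre-processing.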
Stages in EBMT:
In general, there are four stages of work in EBMT: example acquisition, example base management, example application and target sentence synthesis. Example acquisition is about how to acquire examples from a parallel bilingual corpus (i.e. existing translations), and example base management is about how examples are stored and maintained. Example application concerns how examples are used to facilitate translation, which involves decomposing an input sentence into examples and converting source texts into target texts in terms of existing translations. Target sentence synthesis composes a target sentence by putting the converted examples into a smoothly readable order, with the aim of improving the readability of the target sentence after conversion.
EBMT includes four stages:
1.Example acquisition
2.Example base management
3.Example application
4.Target sentence synthesis
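Example base management and example application can be illustrated with an inverted index that maps each source word to the examples containing it, so that candidates for matching can be found without scanning the whole database. The `ExampleBase` class and its ranking by the number of shared distinct words are hypothetical; a real system would follow this filtering step with a finer similarity measure.

```python
from collections import Counter, defaultdict


class ExampleBase:
    """Stores example pairs and an inverted index from source words to example ids."""

    def __init__(self):
        self.examples = []              # list of (source, target) pairs
        self.index = defaultdict(set)   # source word -> ids of examples containing it

    def add(self, source, target):
        """Store one example and index its distinct source words."""
        example_id = len(self.examples)
        self.examples.append((source, target))
        for word in set(source.lower().split()):
            self.index[word].add(example_id)
        return example_id

    def candidates(self, sentence, k=3):
        """Return up to k examples sharing the most distinct words with the sentence."""
        overlap = Counter()
        for word in set(sentence.lower().split()):
            for example_id in self.index.get(word, ()):
                overlap[example_id] += 1
        return [self.examples[i] for i, _ in overlap.most_common(k)]


eb = ExampleBase()
eb.add("I would like a coffee", "Je voudrais un café")
eb.add("Where is the station", "Où est la gare")
eb.add("Is the museum open", "Le musée est-il ouvert")
print(eb.candidates("is the museum open today", k=2))
```

Examples that share no word with the input never enter the ranking, so the cost of a lookup depends on the input's words rather than on the total size of the example base.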
Translation systems such as OpenMaTrEx, Cunei and OmegaT use EBMT.
Verbalis uses EBMT
The Verbalis system is example-based and makes extensive use of analogical reasoning. According to its developers, because the technology captures the meaning of source documents and preserves it idiomatically in the target language, the quality of its results is far better than the ‘gisting’ level of many other approaches to automated translation. It offers a worldwide commercial service for translating technical, financial and software documentation from German to English or from English to German.
The system has three core elements:
1.A database that stores complete sentences of translated material comprising a whole range of information that can be accessed by the translation system - this comprises sentence-level source and target-language examples (a representation that is richly annotated with information regarding clause, chunking configuration, morpho-syntax and translation relationships);
2.An analogical dictionary that attempts to model all the various semantic relationships into which lexical items can enter with one another - it contains the meaning relations between words in the domain being translated;
3.A recombination mechanism which, guided by the analogical dictionary, combines sub-components of two or more examples across the database.
Key points of differentiation from other approaches include:
1.High-quality, high-speed and high-volume translation
2.Knowledge base of examples
4.Ability to exploit partial matches in the selection process
EBMT has been used for speech translation (DIPLOMAT).
Translating DVD subtitles with EBMT is a research area.