
End-to-end statistical machine translation with zero or small parallel texts

Published online by Cambridge University Press:  15 June 2016

ANN IRVINE
Affiliation: Johns Hopkins University
e-mail: annirvine@gmail.com

CHRIS CALLISON-BURCH
Affiliation: University of Pennsylvania
e-mail: ccb@cis.upenn.edu

Abstract

We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without any bilingual sentence-aligned parallel corpora. We present a detailed analysis of the accuracy of bilingual lexicon induction and show how a discriminative model can combine various signals of translation equivalence (such as contextual, temporal, orthographic and topic similarity). Our discriminative model produces more accurate translations than previous bilingual lexicon induction techniques. We reuse these signals of translation equivalence as features in a phrase-based SMT system. These monolingually estimated features improve low-resource SMT systems and also make end-to-end machine translation possible without parallel corpora.
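
As a rough illustration of combining several signals of translation equivalence in one discriminative score, the following minimal Python sketch ranks candidate translations with a logistic model. It is not the authors' implementation: the abstract does not state the model form, so the logistic scorer, the use of normalized edit distance as the orthographic signal, and all weights, the bias, and the feature values are assumptions chosen only for illustration.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def orthographic_similarity(a: str, b: str) -> float:
    """Assumed orthographic signal: 1 minus length-normalized edit distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def score(features: dict, weights: dict, bias: float) -> float:
    """Logistic score in (0, 1) from a weighted sum of similarity signals."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(candidates: dict, weights: dict, bias: float) -> list:
    """Return candidate translations sorted from highest to lowest score."""
    return sorted(
        candidates,
        key=lambda cand: score(candidates[cand], weights, bias),
        reverse=True,
    )


# Hypothetical weights; in practice they would be learned from a seed dictionary.
WEIGHTS = {"contextual": 1.0, "temporal": 1.0, "orthographic": 1.0, "topic": 1.0}
BIAS = -2.0

# Hypothetical feature values for two candidate translations of one source word.
CANDIDATES = {
    "cand_a": {"contextual": 0.9, "temporal": 0.8, "orthographic": 0.7, "topic": 0.6},
    "cand_b": {"contextual": 0.1, "temporal": 0.2, "orthographic": 0.1, "topic": 0.3},
}

ranking = rank_candidates(CANDIDATES, WEIGHTS, BIAS)
```

In a sketch like this, each signal would be computed from monolingual data alone, and the same per-pair scores could then serve as additional features for phrase pairs in a phrase-based system.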

Information

Type: Articles
Copyright © Cambridge University Press 2016
