
Recent advances in machine translation using comparable corpora

Published online by Cambridge University Press:  15 June 2016

REINHARD RAPP
Affiliation:
University of Mainz e-mail: reinhardrapp@gmx.de
SERGE SHAROFF
Affiliation:
University of Leeds e-mail: s.sharoff@leeds.ac.uk
PIERRE ZWEIGENBAUM
Affiliation:
LIMSI, CNRS, Université Paris-Saclay e-mail: pz@limsi.fr

Abstract

This paper highlights some of the recent developments in the field of machine translation using comparable corpora. We start by updating previous definitions of comparable corpora and then look at bilingual versions of continuous vector space models. Recently, neural networks have been used to obtain low-dimensional latent context representations, which are often called word embeddings. These promising new techniques can be applied not only to parallel but also to comparable corpora. Subsequent sections of the paper discuss work specifically targeting machine translation using comparable corpora, as well as work dealing with the extraction of parallel segments from comparable corpora. Finally, we give an overview of the design and the results of a recent shared task on measuring document comparability across languages.

Information

Type
Articles
Copyright
Copyright © Cambridge University Press 2016