Evaluation

Philipp Koehn

doi:10.1017/CBO9780511815829.009

8 - Evaluation

from II - Core Methods

Published online by Cambridge University Press: 05 June 2012

Affiliation: Philipp Koehn, University of Edinburgh

Summary

How good are statistical machine translation systems today? This simple question is very hard to answer. In contrast to other natural language tasks, such as speech recognition, there is no single right answer that we can expect a machine translation system to match. If you ask several different translators to translate one sentence, you will receive several different answers.

Figure 8.1 illustrates this quite clearly for a short Chinese sentence. All ten translators came up with different translations for the sentence. This example from a 2001 NIST evaluation set is typical: translators almost never agree on a translation, even for a short sentence.

So how should we evaluate machine translation quality? We may ask human annotators to judge the quality of translations. Or we may measure how similar the output of a machine translation system is to translations produced by human translators. But ultimately, machine translation is not an end in itself. So we may also want to consider how much machine-translated output helps people accomplish a task, for example extracting the salient information from a foreign-language text, or post-editing machine translation output for publication.

This chapter presents a variety of evaluation methods that have been used in the machine translation community. Machine translation evaluation is currently a very active field of research, and a hotly debated issue.

Information

Type: Chapter
Information: Statistical Machine Translation, pp. 217-246

DOI: https://doi.org/10.1017/CBO9780511815829.009

Publisher: Cambridge University Press

Print publication year: 2009
