Emerging trends: translationese

Published online by Cambridge University Press: 14 May 2025

Kenneth Church* (Northeastern University, Boston, MA, USA)
Boyang Li (Nanyang Technological University, Singapore, Singapore)
Peter Vickers (Northeastern University, Boston, MA, USA)
Shiran Dudy (Northeastern University, Boston, MA, USA)
Richard Yue (Northeastern University, Boston, MA, USA)
* Corresponding author: Kenneth Church; Email: k.church@northeastern.edu

Abstract

Audits of multilingual resources are reporting shockingly poor quality: “less than 50% … acceptable quality.” There is too much translationese in too many of our multilingual resources, e.g., Wikipedia, XNLI, FLORES, WordNet. We view translationese as a form of noise that makes it hard to generalize from a benchmark based on translation to a real task of interest that does not involve translation. Worse, too much of this translationese is in the “wrong” direction. Directionality matters. Professional translators translate from their weaker language into their stronger language. Unfortunately, many of our resources translate in the other direction, from a stronger (higher-resource) language into a weaker (lower-resource) language. In Wikipedia, for example, there is more translation out of English than into English. We recommend more investments in high-quality data, and less in translation, especially in the “wrong” direction.

Information

Type
Emerging Trends
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Table 1. Poor quality (based on Tables 11–14 of Caswell et al. (2021))

Table 2. Source text is more predictable (fluent) than target text (for these models)

Figure 1. For the first 10 values in D, source (black S) is usually above target (red T).

Figure 2. The pattern in Figure 1 holds over D. The ratio of source to target (S/T) is usually above 1 (red).

Figure 3. The trace depends on at least three factors: (a) model, (b) language, and (c) translationese. An analysis of variance (ANOVA) shows that (a) accounts for more of the variance than (b), and (b) accounts for more than (c).
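
The comparison in this caption can be illustrated with a minimal sketch of a main-effects sum-of-squares decomposition. The sketch assumes a balanced design with one summary value per cell. The factor levels, the effect sizes, and the helper name `main_effect_ss` are all hypothetical and chosen only so that the ordering (model > language > translationese) is visible; none of the numbers come from the article.

```python
from itertools import product
from statistics import mean


def main_effect_ss(rows, factor):
    """Sum of squares attributed to one factor's main effect (balanced design)."""
    grand = mean(r["y"] for r in rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[factor], []).append(r["y"])
    # Each level contributes (group size) * (group mean - grand mean)^2.
    return sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())


# Hypothetical additive effects; these values are invented for illustration.
effects = {
    "model": {"m1": 0.0, "m2": 4.0},
    "language": {"fr": 0.0, "zh": 2.0},
    "translationese": {"source": 0.0, "target": 1.0},
}

# One synthetic observation per (model, language, translationese) cell.
rows = [
    {
        "model": m,
        "language": lang,
        "translationese": t,
        "y": effects["model"][m]
        + effects["language"][lang]
        + effects["translationese"][t],
    }
    for m, lang, t in product(
        effects["model"], effects["language"], effects["translationese"]
    )
]

ss = {factor: main_effect_ss(rows, factor) for factor in effects}
grand_mean = mean(r["y"] for r in rows)
total_ss = sum((r["y"] - grand_mean) ** 2 for r in rows)

for factor, value in ss.items():
    print(f"{factor}: {value / total_ss:.1%} of total variance")
```

Because the synthetic data are purely additive, the three main effects sum to the total sum of squares. Real measurements would also leave interaction and residual terms, which a full ANOVA would report separately.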

Figure 4. There are small but significant differences in traces between texts in the source language and translationese. These differences are shown for many batches over nine languages using the NLLB model.

Table 3. Three test sets that were translated from English to other languages

Table 4. Examples of poorly translated Chinese in XNLI, followed by back-translations to English to demonstrate the poor quality of the Chinese

Table 5. Words with extreme VAD (Valence, Arousal, Dominance) values. Very large and very small values are highlighted in bold

Table 6. French version of Table 5 (assumes VAD values are universal)

Table 7. Some resources for languages are in Table 8. Semantic Scholar is a promising opportunity with less translationese than Wikipedia

Figure 5. High-resource languages have more pages in Wikipedia (left) and are more likely to be translated to other languages (right). Data is borrowed from Tables 7–8.

Table 8. Much of Wikipedia translates in the “wrong” direction from high-resource languages near the top of the table into low-resource languages near the bottom

Table 9. LID (cld3; Botha et al. 2017) performs (too) well on the FLORES (Goyal et al. 2022) dev set (997 rows per language)