
Meemi: A simple method for post-processing and integrating cross-lingual word embeddings

Published online by Cambridge University Press:  13 October 2021

Yerai Doval*
Affiliation:
Grupo COLE, Escola Superior de Enxeñaría Informática, Universidade de Vigo, Ourense, Spain
Jose Camacho-Collados
Affiliation:
School of Computer Science and Informatics, Cardiff University, Cardiff CF24 3AA, UK
Luis Espinosa-Anke
Affiliation:
School of Computer Science and Informatics, Cardiff University, Cardiff CF24 3AA, UK
Steven Schockaert
Affiliation:
School of Computer Science and Informatics, Cardiff University, Cardiff CF24 3AA, UK
*
*Corresponding author. E-mail: yerai.doval@uvigo.es

Abstract

Word embeddings have become a standard resource in the toolset of any Natural Language Processing practitioner. While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space where word embeddings from two or more languages are integrated together. Current state-of-the-art approaches learn these embeddings by aligning two disjoint monolingual vector spaces through an orthogonal transformation which preserves the structure of the monolingual counterparts. In this work, we propose to apply an additional transformation after this initial alignment step, which aims to bring the vector representations of a given word and its translations closer to their average. Since this additional transformation is non-orthogonal, it also affects the structure of the monolingual spaces. We show that our approach improves both the integration of the monolingual spaces and the quality of the monolingual spaces themselves. Furthermore, because our transformation can be applied to an arbitrary number of languages, we are able to effectively obtain a truly multilingual space. The resulting (monolingual and multilingual) spaces show consistent gains over the current state of the art in standard intrinsic tasks, namely dictionary induction and word similarity, as well as in extrinsic tasks such as cross-lingual hypernym discovery and cross-lingual natural language inference.

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

Figure 1. Step-by-step integration of two monolingual embedding spaces: (1) obtaining isolated monolingual spaces, (2) aligning these spaces through an orthogonal linear transformation and (3) mapping both spaces using an unconstrained linear transformation learned on the averages of translation pairs.
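The three steps in this caption can be sketched in a few lines of NumPy. The matrices below are random toy data standing in for real monolingual embeddings, with row i of X assumed to translate to row i of Y (a seed dictionary); this is an illustrative sketch of the pipeline, not the authors' released implementation.

```python
import numpy as np

# Toy stand-ins for monolingual embeddings: row i of X (source language)
# is assumed to translate to row i of Y (target language).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))  # source-language embeddings
Y = rng.normal(size=(100, 50))  # target-language embeddings

# Step 2: orthogonal alignment of X onto Y (Procrustes solution via SVD).
U, _, Vt = np.linalg.svd(X.T @ Y)
W_ortho = U @ Vt                # orthogonal map: X @ W_ortho ~ Y
X_aligned = X @ W_ortho

# Step 3 (Meemi): learn an unconstrained linear map from each space to the
# averages of the aligned translation pairs, then apply it to the whole space.
avg = (X_aligned + Y) / 2.0
M_src, *_ = np.linalg.lstsq(X_aligned, avg, rcond=None)
M_tgt, *_ = np.linalg.lstsq(Y, avg, rcond=None)
X_final = X_aligned @ M_src
Y_final = Y @ M_tgt
```

Because both spaces are pulled towards the averages, translation pairs end up closer to each other than after the orthogonal step alone, while (unlike the orthogonal map) the internal structure of each monolingual space is also altered.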


Table 1. Precision at k ($P@k$) performance of different cross-lingual embedding models in the bilingual dictionary induction task
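Precision at k counts a test word as correct when its gold translation appears among the k nearest cross-lingual neighbours of its embedding. A minimal sketch, assuming cosine similarity for the nearest-neighbour retrieval (the function name and toy inputs are illustrative, not from the paper):

```python
import numpy as np

def precision_at_k(src_vecs, tgt_vecs, gold_idx, k=5):
    """Fraction of source words whose gold translation index is among
    the k most cosine-similar target vectors."""
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sims = src @ tgt.T                       # cosine similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]  # k nearest targets per source word
    hits = [gold in row for gold, row in zip(gold_idx, topk)]
    return float(np.mean(hits))

# Toy check: identical spaces with identity gold mapping give perfect P@1.
V = np.eye(4)
print(precision_at_k(V, V, [0, 1, 2, 3], k=1))  # 1.0
```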


Table 2. Dictionary induction results, in terms of precision at k ($P@k$), for distant language pairs using FastText pre-trained monolingual embeddings as input


Table 3. Cross-lingual word similarity results in terms of Pearson (r) and Spearman ($\rho$) correlation. Language codes: English-EN, Spanish-ES, Italian-IT, German-DE and Farsi-FA


Table 4. Cross-lingual word similarity results for distant language pairs using FastText pre-trained monolingual embeddings as input in terms of Pearson (r) and Spearman ($\rho$) correlation. Language codes: English-EN, Arabic-AR, Hebrew-HE, Estonian-ET, Polish-PL and Chinese-ZH


Table 5. Monolingual word similarity results in terms of Pearson (r) and Spearman ($\rho$) correlation


Table 6. Cross-lingual hypernym discovery results in terms of Mean Reciprocal Rank (MRR), Mean Average Precision (MAP) and precision at 5 ($P@5$). In this case, VecMap = VecMap$_{\text{ortho}}$


Table 7. Accuracy, that is, the number of correct classifications (entailment, contradiction or neutral) over the total number of test instances, on the XNLI task using different cross-lingual embeddings as features


Table 8. Word translation examples from English and Spanish, comparing VecMap with the bilingual and multilingual variants of Meemi. For each source word, we show its five nearest cross-lingual synonyms. Bold translations are correct, according to the source test dictionary (cf. Section 5.1.1)


Figure 2. Absolute improvement (in Pearson correlation percentage points) obtained by applying Meemi over the two base orthogonal models, VecMap and MUSE, on the cross-lingual word similarity task, for different training dictionary sizes. The data points on the x-axis correspond to dictionaries of 100, 1000, 3000, 5000 and 8000 word pairs.


Table 9. Dictionary induction results obtained with the multilingual extension of Meemi over VecMap$_\textrm{ortho}$, in terms of precision at k ($P@k$). The sequence in which source languages are added to the multilingual models is: Spanish, Italian, German, Finnish, Farsi and Russian (English is the target). An x indicates the use of the test language in each case (if the test language is already included, the next language in the sequence is added instead). We also include the scores of the original VecMap$_\textrm{ortho}$ as a baseline