
Efficient bilingual lexicon extraction from comparable corpora based on formal concepts analysis

Published online by Cambridge University Press: 04 October 2021

Mohamed Chebel
Affiliation:
LIPAH Research Laboratory, Faculty of Sciences of Tunis, Tunis EL Manar University, Tunisia
Chiraz Latiri*
Affiliation:
LIPAH Research Laboratory, Faculty of Sciences of Tunis, Tunis EL Manar University, Tunisia
Eric Gaussier
Affiliation:
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
*Corresponding author. E-mail: chiraz.latiri@gnet.tn

Abstract

Bilingual corpora are an essential resource used to cross the language barrier in multilingual natural language processing tasks. Among bilingual corpora, comparable corpora have been the subject of many studies as they are both frequent and easily available. In this paper, we propose to make use of formal concept analysis to first construct concept vectors which can be used to enhance comparable corpora through clustering techniques. We then show how one can extract bilingual lexicons of improved quality from these enhanced corpora. We finally show that the bilingual lexicons obtained can complement existing bilingual dictionaries and improve cross-language information retrieval systems.

Information

Type: Article
Copyright: © The Author(s), 2021. Published by Cambridge University Press
