
CO-graph: A new graph-based technique for cross-lingual word sense disambiguation

ANDRES DUQUE, LOURDES ARAUJO and JUAN MARTINEZ-ROMO
Abstract

In this paper, we present a new method based on co-occurrence graphs for performing Cross-Lingual Word Sense Disambiguation (CLWSD). The proposed approach comprises the automatic generation of bilingual dictionaries and a new technique for constructing a co-occurrence graph, which is used to select the most suitable translations from the dictionary. Different algorithms that combine the dictionary and the co-occurrence graph are then used to select the final translations: techniques based on sub-graphs (communities) containing clusters of words with related meanings, techniques based on distances between nodes representing words, and techniques based on the relative importance of each node in the whole graph. The initial output of the system is enhanced with translation probabilities provided by a statistical bilingual dictionary. The system is evaluated using datasets from two competitions: task 3 of SemEval 2010 and task 10 of SemEval 2013. The results obtained by the different disambiguation techniques are analysed and compared with those obtained by the systems participating in the competitions. Our system offers the best results among unsupervised systems in most of the experiments, and even outperforms supervised systems in some cases.
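
To make the distance-based selection idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that two words are linked whenever they appear in the same target-language document, that the edge weight is the number of shared documents, and that the context words have already been translated into the target language. The function names, the 1/weight edge cost and the toy Spanish vocabulary are all hypothetical choices made for illustration.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Link two words that share a document; weight = number of shared documents."""
    graph = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def shortest_distances(graph, source):
    """Dijkstra's algorithm with edge cost 1/weight (an assumption of this sketch)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph[node].items():
            candidate_dist = d + 1.0 / weight
            if candidate_dist < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate_dist
                heapq.heappush(heap, (candidate_dist, neighbour))
    return dist


def pick_translation(graph, candidates, context_words):
    """Return the candidate with the smallest mean distance to the context words."""
    def score(candidate):
        if candidate not in graph:
            return float("inf")
        dist = shortest_distances(graph, candidate)
        reachable = [dist[w] for w in context_words if w in dist]
        if not reachable:
            return float("inf")
        return sum(reachable) / len(reachable)
    return min(candidates, key=score)


# Toy target-language corpus (hypothetical): two senses of English "bank".
documents = [
    ["banco", "dinero", "cuenta"],
    ["banco", "dinero", "cajero"],
    ["orilla", "agua", "barca"],
    ["orilla", "agua", "pesca"],
]
graph = build_cooccurrence_graph(documents)
river_choice = pick_translation(graph, ["banco", "orilla"], ["agua", "pesca"])
money_choice = pick_translation(graph, ["banco", "orilla"], ["dinero"])
print(river_choice, money_choice)
```

In this toy corpus the two senses fall into disconnected parts of the graph, so the candidate that cannot reach the context words is scored as infinitely far away. A community-based or node-importance-based selector, as mentioned in the abstract, would replace only the scoring function.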

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering