Mapping Arabic WordNet synsets to Wikipedia articles using monolingual and bilingual features

ABDULGABBAR SAIF; MOHD JUZAIDDIN AB AZIZ; NAZLIA OMAR

doi:10.1017/S1351324915000376

Published online by Cambridge University Press: 21 October 2015

Affiliation (all authors): Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
e-mail: agmss79@siswa.ukm.edu.my, juzaiddin@ukm.edu.my, nazlia@ukm.edu.my

Abstract

The alignment of WordNet and Wikipedia has received wide attention from researchers in computational linguistics who aim to build new lexical knowledge sources or to enrich the semantic information of WordNet entities. The main challenge of this alignment is handling synonymy and ambiguity in the contents of two units drawn from different sources. This paper therefore introduces a mapping method that links an Arabic WordNet synset to its corresponding Wikipedia article. The method uses monolingual and bilingual features to overcome the lack of semantic information in Arabic WordNet. To evaluate the method, an Arabic mapping data set containing 1,291 synset–article pairs was compiled. The experimental analysis shows that the proposed method achieves promising results and outperforms state-of-the-art methods that depend only on monolingual features. The mapping method has also been used to increase the coverage of Arabic WordNet by inserting new synsets from Wikipedia.

Information

Type: Articles
Information: Natural Language Engineering, Volume 23, Issue 1, January 2017, pp. 53-91

DOI: https://doi.org/10.1017/S1351324915000376
Copyright: Copyright © Cambridge University Press 2015
