Recent advances in methods of lexical semantic relatedness – a survey

  • ZIQI ZHANG, ANNA LISA GENTILE and FABIO CIRAVEGNA
Abstract

Measuring lexical semantic relatedness is an important task in Natural Language Processing (NLP) and is often a prerequisite for more complex NLP tasks. Despite the extensive work dedicated to this area, the field lacks an up-to-date survey. This paper addresses that gap with a study focused on four perspectives: (i) a comparative analysis of the background information resources that are essential for measuring lexical semantic relatedness; (ii) a review of the literature with a focus on recent methods that are not covered in previous surveys; (iii) a discussion of studies in the biomedical domain, where novel methods have been introduced but inadequately communicated across domain boundaries; and (iv) an evaluation of lexical semantic relatedness methods and a discussion of useful lessons for the development and application of such methods. In addition, we discuss a number of issues in this field and suggest future research directions. We believe this work will be a valuable reference for researchers of lexical semantic relatedness and will substantially support research activities in this field.

References
Hide All
Agirre E., Alfonseca E., Hall K., Kravalova J., Paşca M., and Soroa A. 2009. A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL'09), pp. 1927. Stroudsburg, PA, USA: Association for Computational Linguistics.
Al-Mubaid H. and Nguyen H. 2006. A cluster-based approach for semantic similarity in the biomedical domain. In Proceedings of the 28th International Conference of IEEE Engineering in Medicine and Biology Society, New York, USA, August 30–September 3, pp. 2713–7.
Altschul S., Madden T., Schäffer A., Zhang J., Zhang Z., Miller W., and Lipman D. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25 (17): 3389–402.
Alvarez M. and Liam S. 2007. A graph modeling of semantic similarity between words. In Proceedings of the International Conference on Semantic Computing (ICSC'07), pp. 355–62. Washington, DC, USA: IEEE Computer Society.
Banerjee S. and Pedersen T. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, pp. 805–10. San Francisco, CA, USA: Morgan Kaufmann.
Bär D., Zesch T., and Gurevych I. 2011. A reflective view on text similarity. In Proceedings of the International Conference on Recent Advances in Natural Language Processing 2011 (RANLP 2011), Hissar, Bulgaria, pp. 515–20.
Batet M., Sánchez D. and Valls A. 2011. An ontology-based measure to compute semantic similarity in biomedicine. Journal of Biomedical Informatics 44 (1), 118–25.
Bhattacharya A., Bhowmick A. and Singh A. 2010. Finding top-k similar pairs of objects annotated with terms from an ontology. In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM'10), pp. 214–32. Berlin, Germany: Springer-Verlag.
Bizer C., Lehmann J., Kobilarov G., Auer S., Becker C., Cyganiak R., and Hellmann S. 2009. DBpedia – a crystallization point for the web of data. Journal of Web Semantics 7 (3), 154–65.
Bollegala D., Matsuo Y. and Ishizuka M. 2007. An integrated approach to measuring semantic similarity between words using information available on the web. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 340–7. Stroudsburg, PA, USA: Association for Computational Linguistics.
Boutet E., Lieberherr D., Tognolli M., Schneider M., and Bairoch A. 2007. UniProtKB/Swiss-Prot. Methods in Molecular Biology 406, 89112.
Budanitsky A. and Hirst G. 2006. Evaluating WordNet-based measures of lexical semantic relatedness. Journal of Computational Linguistics 32 (1), 1347.
Camon E., Magrane M., Barrell D., Lee V., Dimmer E., Maslen J., Binns D., Harte N., Lopez R., and Apweiler R. 2004. The gene ontology annotation (GOA) database: sharing knowledge in Uniprot with gene ontology. Nucleic Acids Research 32(Database), D262–6.
Chen H., Lin M. and Wei Y. 2006. Novel association measures using web search with double checking. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp. 1009–16. Stroudsburg, PA, USA: Association for Computational Linguistics.
Cherry J., Adler C., Ball C., Chervitz S., Dwight S., Hester E., Jia Y., Juvik G., Roe T., Schroeder M., Weng S., and Botstein D. 1998. SGD: saccharomyces genome database. Nucleic Acids Research 26 (1), 73–9.
Chinchor N. 2001. Message Understanding Conference (MUC) 7. LDC2001T02, Philadelphia, Penn: Linguistic Data Consortium.
Chinchor N. and Sundheim B. 2003. Message Understanding Conference (MUC) 6. LDC Catalog No.: LDC2003T13. Philadelphia, PA: Linguistic Data Consortium.
Cilibrasi R. and Vitanyi P. 2007. The google similarity distance. IEEE Transactions on Knowledge and Data Engineering 19 (3), 370–83.
Collins A. and Loftus E. 1975. A spreading-activation theory of semantic processing. Psychological Review 82 (6), 407–28.
Couto F., Silva M. and Coutinho P. 2005. Semantic similarity over the Gene Ontology: family correlation and selecting disjunctive ancestors. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM'05), pp. 343–4. New York, NY, USA: ACM.
Cramer I. and Finthammer M. 2008. An evaluation procedure for WordNet-based lexical chaining: methods and issues. In Proceedings of the 4th Global WordNet Meeting, pp. 120–46. Szeged, Hungary: University of Szeged.
Cucerzan S. 2007. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 708–16. Stroudsburg, PA, USA: Association for Computational Linguistics.
Curran J. and Moens M. 2002. Improvements in automatic thesaurus extraction. In Proceedings of the ACL 2002 Workshop on Unsupervised Lexical Acquisition (ULA'02), pp. 5966. Stroudsburg, PA, USA: Association for Computational Linguistics.
Degtyarenko K., Matos P., Ennis M., Hastings J., Zbinden M., McNaught A., Alcntara R., Darsow M., Guedj M., and Ashburner M. 2007. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Research 36(Database), D344–50.
Dolan B., Quirk C. and Brockett C. 2004. Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources. In Proceedings of the 20th International Conference on Computational Linguistics (COLING'04), pp. 350–6. Stroudsburg, PA, USA: Association for Computational Linguistics.
Egozi O., Markovitch S. and Gabrilovich E. 2011. Concept-based information retrieval using explicit semantic analysis. ACM Transactions of Information Systems 29 (2), 8:1–8: 34.
Fellbaum C. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA, USA: MIT Press.
Finkelstein F., Gabrilovich E., Matias Y., Rivlin E., Solan Z., Wolfman G., and Ruppin E. 2002. Placing search in context: the concept revisited. ACM Transactions of Information Systems 20 (1), 116–31.
Firth J. R. 1957. A synopsis of linguistic theory, 1930–1955. In Studies in Linguistic Analysis (special volume of the Philological Society), pp. 132. Harlow, UK: Longman.
Gabrilovich E. 2007. Wikipedia preprocessor (WikiPrep). http://www.cs.technion.ac.il/~gabr/resources/code/wikiprep/#references. Accessed March 16, 2012).
Gabrilovich E. and Markovitch S. 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of the 20th International Joint Conference on Artifical Intelligence (IJCAI'07), pp. 1606–11. San Francisco, CA, USA: Morgan Kaufmann.
Gangemi A. and Presutti V. 2010. Towards a pattern science for the semantic web. Emantic Web Journal 1 (1–2), 61–8.
Gentleman R. (2005). Visualizing and distances using GO. http://bioconductor.org/packages/2.0/bioc/vignettes/GOstats/inst/doc/GOvis.pdf. Accessed March 16, 2012.
Gouws S., van Rooyen G-J, and Engelbrecht H. A. 2010. Measuring conceptual similarity by spreading activation over Wikipedia's hyperlink structure. In Proceedings of the COLING 2010, 2nd Workshop on the People's Web Meets NLP: Collaboratively Constructed Semantic Resources, Beijing, China, pp. 4654.
Gracia J. and Mena E. 2008. Web-based measure of semantic relatedness. In Proceedings of the 9th International Conference on Web Information Systems Engineering (WISE'08), pp. 136150. Berlin, Germany: Springer-Verlag.
Gurevych I. 2005. Using the structure of a conceptual network in computing semantic relatedness. In Proceedings of the 2nd International Joint Conference on Natural Language Processing, pp. 767–78. Berlin, Germany: Springer-Verlag.
Gurevych I. and Niederlich H. 2005. Computing semantic relatedness in German with revised information content metrics. In Proceedings of ÖntoLex 2005 – Ontologies and Lexical Resources (IJCNLP'05) Workshop, pp. 2833. Berlin, Germany: Springer-Verlag.
Halavais A. and Lackaff D. 2008. An analysis of topical coverage of Wikipedia. Journal of Computer-Mediated Communication 13 (2), 429–40.
Han X. and Zhao J. 2010. Structural semantic relatedness: a knowledge-based method to named entity disambiguation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 50–9. Stroudsburg, PA, USA: Association for Computational Linguistics.
Harman D. and Liberman M. 1993. TIPSTER vol. 1. Philadelphia, PA, USA: Linguistic Data Consortium.
Harrington B. 2010. A semantic network approach to measuring relatedness. In Proceedings of the 23rd International Conference on Computational Linguistics, pp. 356–64. Stroudsburg, PA, USA: Association for Computational Linguistics.
Hassan S. and Mihalcea R. 2009. Cross-lingual semantic relatedness using encyclopedic knowledge. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 1192–201. Stroudsburg, PA, USA: Association for Computational Linguistics.
Haveliwala T. 2002. Topic-sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web (WWW'02), pp. 517–26. New York, NY, USA: ACM.
Hirst G. and St-Onge D. 1998. Lexical chains as representation of context for the detection and correction malapropisms. In FellBaum C. (ed.), WordNet: An Electronic Lexical Database (Language, Speech, and Communication), pp. 305–32. Cambridge, MA, USA: MIT Press.
Holloway T., Bozicevic M. and Börner K. 2007. Analyzing and visualizing the semantic coverage of Wikipedia and its authors. Journal of Complexity, Special issue on Understanding Complex Systems 12 (3), 3040.
Hope D. 2008. Java WordNet::Similarity (beta). http://www.sussex.ac.uk/Users/drh21/. Accessed March 16, 2012.
Hughes T. and Ramage D. 2007. Lexical semantic relatedness with random graph walks. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 581–9. Stroudsburg, PA, USA: Association for Computational Linguistics.
Hunter S., Apweiler R., Attwood K., Bairoch A., Bateman A., Binns D., Bork P., and Das U. 2009. InterPro: the integrative protein signature database. Nucleic Acids Research 37(Database), D211–5.
Jarmasz M. and Szpakowicz S. 2003. Roget's thesaurus and semantic similarity. In Proceedings of Conference on Recent Advances in Natural Language Processing (RANLP 2003), Borovets, Bulgaria, September 10–12, pp. 212–9.
Jiang J. and Conrath D. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research on Computational Linguistics, Taiwan, pp. 1933.
Jones K. 1973. Index term weighting. Information Storage and Retrieval 9 (11), 619–33.
Kanehisa M. and Goto S. 2006. KEGG: Kyoto encyclopedia of genes and genomes. Artificial Intelligence 28 (1), 2730.
Kilgarriff A. 2007. Googleology is bad science. Journal of Computational Linguistics 33 (1), 147–51.
Kliegr T., Chandramouli K., Nemrava J., Svatek V., and Izquierdo E. 2008. Combining image captions and visual analysis for image concept classification. In Proceedings of the 9th International Workshop on Multimedia Data Mining Held in Conjunction with the ACM SIGKDD 2008 (MDM'08), pp. 817. New York, NY, USA: ACM.
Kohler S., Schulz M., Krawitz P., Bauer S., Dolken S., Ott C., Mundlos C., Horn C., Horn D., Mundlos S., and Robinson P. 2009. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. American Journal of Human Genetics 85 (4), 457–64.
Kozima H. and Furugori T. 1993. Similarity between words computed by spreading activation on an English dictionary. In Proceedings of the 6th Conference on European Chapter of the Association for Computational Linguistics (EACL '93), pp. 232–9. Stroudsburg, PA, USA: Association for Computational Linguistics.
Kucera H. and Francis W. 1967. Computational Analysis of Present-Day American English. Providence, RI, USA: Brown University Press.
Kunze C. and Lemnitzer L. 2002. GermaNet – representation, visualization, application. In Proceedings of the International Conference on Language Resources and Evaluation (LREC'02), Las Palmas, Spain, pp. 1485–91. Paris, France: ELRA.
Leacock C. and Chodorow M. 1998. Combining local context and WordNet similarity for word sense identification. In FellBaum C. (ed.), WordNet: An Electronic Lexical Database, pp. 305–32. Cambridge, MA, USA: MIT Press.
Lee L. 1999. Measures of distributional similarity. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics (ACL'99), pp. 2532. Stroudsburg, PA, USA: Association for Computational Linguistics.
Lee J., Kim M. and Lee Y. 1993. Information retrieval based on conceptual distance in IS-A hierarchies. Journal of Documentation 49 (2), 188207.
Lee H., Peirsman Y., Chang A., Chambers N., Surdeanu M., and Jurafsky D. 2011. Stanford's multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task (CONLL Shared Task '11), pp. 2834. Stroudsburg, PA, USA: Association for Computational Linguistics.
Lee M., Pincombe B. and Welsh M. 2005. An empirical evaluation of models of text document similarity. In Proceedings of the 27th Annual Conference of the Cognitive Science Society, pp. 1254–9. Chicago, USA: Lawrence Erlbaum.
Lei Z. and Dai Y. 2006. Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction. BMC Bioinformatics 7, 491.
Lesk M. 1986. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th Annual International Conference on Systems Documentation (SIGDOC '86), pp. 24–6. New York, NY, USA: ACM.
Li Y., Bandar Z. and McLean D. 2003. An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering 15 (4), 871–82.
Li J., Gong B., Chen X., Liu T., Wu C., Zhang F., Li C., Li X., Rao S., and Li X. 2011. DOSim: an R package for similarity between diseases based on disease ontology. BMC Bioinformatics 12, 266.
Li L., Hu X., Hu B., Wang J., and Zhou Y. 2009. Measuring sentence similarity from different aspects. In Proceedings of the 8th International Conference on Machine Learning and Cybernetics (ICMLC 2009), Baoding, China, pp. 2244–9.
Li Y., McLean D., Bandar Z., O'Shea J., and Crockett K. 2006. Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering 18 (8), 1138–50.
Li B., Wang J., Feltus F., Zhou J., and Luo F. 2010. Effectively integrating information content and structural relationship to improve the GO-based similarity measure between proteins. In Proceedings of the 11th International Conference on Bioinformatics and Computational Biology, pp. 166–72. Las Vegas, NV, USA: CSREA Press.
Lin D. 1998a. Automatic retrieval and clustering of similar words. In Proceedings of the 17th International Conference on Computational Linguistics (COLING '98), pp. 768–74. Stroudsburg, PA, USA: Association for Computational Linguistics.
Lin D. 1998b. An information-theoretic definition of similarity. In Proceedings of the 5th International Conference on Machine Learning, (ICML '98), pp. 296304. San Francisco, CA, USA: Morgan Kaufmann.
Liu H. and Chen Y. 2010. Computing semantic relatedness between named entities using Wikipedia. In Proceedings of the 2010 International Conference on Artificial Intelligence and Computational Intelligence (AICI '10), pp. 388–92. Washington, DC, USA: IEEE Computer Society.
Liu X., Zhou Y. and Zheng R. 2007. Measuring semantic similarity in Wordnet. In Proceedings of the 6th International Conference on Machine Learning and Cybernetics, pp. 3431–5. New York, NY, USA: IEEE.
Lord P., Stevens R., Brass A. and Goble C. 2003a. Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics 19 (10), 1275–83.
Lord P., Stevens R., Brass A. and Goble C. 2003b. Semantic similarity measures as tools for exploring the Gene Ontology. In Proceedings of Pacific Symposium on Biocomputing, Lihue, HI, USA, January 3–7, pp. 601–12.
Maguitman A., Menczer F., Roinestad H. and Vespignani A. 2005. Algorithmic detection of semantic similarity. In Proceedings of the 14th International Conference on World Wide Web (WWW '05), pp. 107116. New York, NY, USA: ACM.
Marcus M., Marcinkiewicz M. and Santorini B. 1993. Building a large annotated corpus of English: the Penn treebank. Journal of Computational Linguistics 19 (2), 313–30.
Matsuo Y., Sakaki T., Uchiyama K. and Ishizuka M. 2006. Graph-based word clustering using a web search engine. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 542–50. Stroudsburg, PA, USA: Association for Computational Linguistics.
McInnes B., Pedersen T. and Pakhomov S. 2009. UMLS-interface and UMLS-similarity: open source software for measuring paths and semantic similarity. In Proceedings of AMIA Annual Symposium, San Francisco, CA, USA, November 4–18, pp. 431–5.
McKusick V. 1998. Mendelian Inheritance in Man: A Catalog of Human Genes and Genetic Disorders, 12th ed.Baltimore, MD: The Johns Hopkins University Press.
McQuilton P., St.Pierre S., Thurmond J., and the FlyBase Consortium. 2011. FlyBase 101 – the basics of navigating flyBase. Nucleic Acids Research 39, 19.
Meyer C. and Gurevych I. 2010. How web communities analyze human language: word senses in Wiktionary. In Proceedings of the 2nd Web Science Conference, Raleigh, NC, April 26–27.
Mihalcea R., Corley C. and Strapparava C. 2006. Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06), pp. 775–80. Palo Alto, CA,USA: AAAI Press.
Mihalcea R. and Moldovan D. 1999. A method for word sense disambiguation of unrestricted text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, (ACL '99), pp. 152–8. Stroudsburg, PA, USA: Association for Computational Linguistics.
Miller G. and Charles W. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes 6 (1), 128.
Milne D., Medelyan O. and Witten I. 2006. Mining domain-specific thesauri from Wikipedia: a case study. In Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, (WI'06), pp. 442–8. Washington, DC, USA: IEEE Computer Society.
Milne D. and Witten I. 2008. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of the AAAI 2008 Workshop on Wikipedia and Artificial Intelligence, pp. 2530. Palo Alto, CA, USA: AAAI Press.
Mitchell A., Strassel S., Przybocki M., Davis J., Doddington D., Grishman R., Meyers A., Brunstain A., Ferro L., and Sundheim B. 2003. TIDES Extraction (ACE) 2003 Multilingual Training Data. LDC Catalog Number: LDC2004T09, pp. 2530. Philadelphia, PA: Linguistic Data Consortium.
Mohler M. and Mihalcea R. 2009. Text-to-text semantic similarity for automatic short answer grading. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL '09), pp. 567–75. Stroudsburg, PA, USA: Association for Computational Linguistics.
Morris J. and Hirst G. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Journal of Computational Linguistics, 17 (1), 2148.
Morris J. and Hirst G. 2004. Non-classical lexical semantic relations. In Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics (CLS '04), pp. 4651. Stroudsburg, PA, USA: Association for Computational Linguistics.
Navarro E., Sajous F., Gaume B., Prévot L., ShuKai H., Tzu-Yi K., Magistry P., and Chu-Ren H. 2009. Wiktionary and NLP: improving synonymy networks. In Proceedings of the 2009 Workshop on the People's Web Meets NLP: Collaboratively Constructed Semantic Resources (People's Web '09), pp. 1927. Stroudsburg, PA, USA: Association for Computational Linguistics.
Navigli R. 2006. Meaningful clustering of senses helps boost word sense disambiguation performance. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (ACL-44), pp. 105–12. Stroudsburg, PA, USA: Association for Computational Linguistics.
Navigli R. 2009. Word sense disambiguation: a survey. ACM Computing Survey 41 (2), 10:1–10:69.
Othman R., Deris S. and Illias R. 2007. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences. Journal of Biomedical Informatics 41 (1), 529–38.
Pakhomov S., Coden A. and Chute C. 2004. Creating a test corpus of clinical notes manually tagged for part-of-speech information. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA '04), pp. 62–5. Geneva, Switzerland: Association for Computational Linguistics.
Pakhomov S., Mcinnes B., Adam T., Liu Y., Pedersen T., and Melton G. 2010. Semantic similarity and relatedness between clinical terms: an experimental study. Proceedings of AMIA 2010 Symposium, 572–6. Washington, DC, USA: American Medical Informatics.
Pantel P., Crestan E., Borkovsky A., Popescu A., and Vyas V. 2009. Web-scale distributional similarity and entity set expansion. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP '09), pp. 938–47. Berlin, Germany: Association for Computational Linguistics.
Patwardhan S. and Pedersen T. 2006. Using WordNet-based context vectors to estimate the semantic relatedness of concepts. Proceedings of the EACL 2006 Workshop Making Sense of Sense – Bringing Computational Linguistics and Psycholinguistics Together, pp. 18. Stroudsburg, PA, USA: Association for Computational Linguistics.
Pedersen T., Pakhomov S., Patwardhan S. and Chute C. 2007. Measures of semantic similarity and relatedness in the biomedical domain. Journal of Biomedical Informatics 40 (3), 288–99.
Pedersen T., Patwardhan S. and Michelizzi J. 2004. WordNet::Similarity: measuring the relatedness of concepts. In Demonstration Papers at HLT-NAACL 2004 (HLT-NAACL–Demonstrations '04), pp. 3841. Stroudsburg, PA, USA: Association for Computational Linguistics.
Pekar V. and Staab S. 2002. Taxonomy learning: factoring the structure of a taxonomy into a semantic classification decision. In Proceedings of the 19th International Conference on Computational Linguistics – vol. 1, (COLING'02), pp. 17. Stroudsburg, PA, USA: Association for Computational Linguistics.
Pesquita C., Faria D., Falcão A., Lord P., and Couto F. 2009. Semantic similarity in biomedical ontologies. PLoS Computational Biology 5 (7): e1000443. 112.
Petrakis E., Varelas G., Hliaoutakis A. and Raftopoulou P. 2006. Design and evaluation of semantic similarity measures for concepts stemming from the same or different ontologies. In Proceedings of the 4th Workshop on Multimedia Semantics (WMS'06), Chania, Crete, June 19–21, pp. 4452.
Pirrò G. 2009. A semantic similarity metric combining features and intrinsic information content. Data Knowledge Engineering 68 (11), 1289–308.
Pirrò G., and Seco N. 2008. Design, implementation and evaluation of a new semantic similarity metric combining features and intrinsic information content. In Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part II: On the Move to Meaningful Internet Systems (OTM '08), pp. 1271–88. Berlin, Germany: Springer-Verlag.
Ponzetto S. and Strube M. 2006. Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL '06), pp. 192–9. Stroudsburg, PA, USA: Association for Computational Linguistics.
Ponzetto S. and Strube M. 2007. An API for measuring the relatedness of words in Wikipedia. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions (ACL '07), pp. 4952. Stroudsburg, PA, USA: Association for Computational Linguistics.
Ponzetto S. and Strube M. 2011. Taxonomy induction based on a collaboratively built knowledge repository. Journal of Artificial Intelligence 175 (9–10), 17371756.
Pozo A., Pazos F. and Valencia A. 2008. Defining functional distances over Gene Ontology. BMC Bioinformatics 9, 50.
Rada R., Mili H., Bicknell E. and Blettner M. 1989. Development and application of a metric on semantic nets. IEEE Transactions on Systems Management and Cybernetics 19 (1), 1730.
Radinsky K., Agichtein E., Gabrilovich E. and Markovitch S. 2011. A word at a time: computing word relatedness using temporal semantic analysis. In Proceedings of the 20th International Conference on World Wide Web (WWW '11), pp. 337–46. New York, NY, USA: ACM.
Resnik P. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI'95), pp. 448–53. San Francisco, CA, USA: Morgan Kaufmann.
Richardson R. and Smeaton A. 1995. Using WordNet in a knowledge-based approach to information retrieval. Technical Report CA-0196, School of Computer Applications, Dublin City University.
Riddle T. 2006. Parse::MediaWikiDump. http://search.cpan.org/~triddle/Parse-MediaWikiDump-1.0.6/lib/Parse/MediaWikiDump.pm. Accessed March 16, 2012.
Riensche R., Baddeley B., Sanfilippo A., Posse C., and Gopalan B. 2007. XOA: web-enabled cross-ontological analytics. In Proceedings of the 1st International Workshop on Service-Oriented Technologies for Biological Databases and Toolsat in the ICWS/SCC Conference, pp. 99105. Washington, DC, USA: IEEE Computer Society.
Rodrìguez M., and Egenhofer M. 2003. Determining semantic similarity among entity classes from different ontologies. IEEE Transactions on Knowledge and Data Engineering 15 (2), 442–56.
Rose T., Stevenson M. and Whitehead M. 2002. The Reuters corpus volume 1-from yesterdays news to tomorrows language resources. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, pp. 2931. Paris, France: ELRA.
Rubenstein H. and Goodenough J. 1965. Contextual correlates of synonymy. Communications of the ACM 8 (10), 627–33.
Ruiz-Casado M., Alfonseca E. and Castells P. 2005. Using context-window overlapping in synonym discovery and ontology extension. Proceedings of the International Conference on Recent Advances in Natural Language Processing.
Sahami M. and Heilman T. 2006. A web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the 15th International Conference on World Wide Web (WWW '06), pp. 377–86. New York, NY, USA: ACM.
Schickel-Zuber V. and Faltings B. 2007. OSS: a semantic similarity function based on hierarchical ontologies. In Proceedings of the 20th International Joint Conference on Artifical Intelligence (IJCAI'07), pp. 551–6. San Francisco, CA, USA: Morgan Kaufmann.
Schlicker A., Domingues F., Rahnenführer J. and Lengauer T. 2006. A new measure for functional similarity of gene products based on Gene Ontology. BMC Bioinformatics 7, 302.
Seco N., Veale T. and Hayes J. 2004. An intrinsic information content metric for semantic similarity in WordNet. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI), Valencia, Spain, August 22–27, pp. 1089–90.
Sevilla J., Segura V., Podhorski A., Guruceaga E., Mato J., Martinez-Cruz L., Corrales F., and Rubio A. 2005. Correlation between gene expression and GO semantic similarity. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2 (4), 330–8.
Sheng H., Chen H., Yu T. and Feng Y. 2010. Linked data-based semantic similarity and data mining. In Proceedings of the IEEE International Conference on Information Reuse and Integration (IRI 2010), pp. 104–8. New York, NY: IEEE Systems, Man, and Cybernetics Society.
Shima H. 2011. WS4J. http://code.google.com/p/ws4j/. Accessed March 16, 2012.
Speer N., Spieth C. and Zell A. 2004. A memetic clustering algorithm for the functional partition of genes based on the Gene Ontology. In Proceedings of the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, October 7–8, pp. 252–9. New York, NY, USA: IEEE.
Staab S., Braun C., Bruder I., Düsterhöft A., Heuer A., Klettke M., Neumann G., Prager B., Pretzel J., Schnurr H., Studer R., Uszkoreit H., and Wrenger B. 1999. GETESS: searching the web exploiting German texts. In Proceedings of the 3rd International Conference on Cooperative Information Agents III (CIA'99), pp. 113–24. Berlin, Germany: Springer-Verlag.
Strube M. and Ponzetto S. 2006. WikiRelate! computing semantic relatedness using Wikipedia. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06), pp. 1419–24. Palo Alto, CA, USA: AAAI Press.
Sussna M. 1993. Word sense disambiguation for free-text indexing using a massive semantic network. In Proceedings of the Second International Conference on Information and Knowledge Management (CIKM '93), pp. 6774. New York, NY, USA: ACM.
Szarvas G., Zesch T. and Gurevych I. 2011. Combining heterogeneous knowledge resources for improved distributional semantic models. In Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing'11), pp. 289303. Tokyo, Japan: Springer-Verlag.
The BNC Consortium. 2007. The British National Corpus, Version 3 (BNC XML edition). http://www.natcorp.ox.ac.uk/. Accessed March 16, 2012. Distributed by Oxford University Computing Services on behalf of the BNC Consortium.
The Gene Ontology Consortium. 2005. Gene Ontology: tool for the unification of biology. Nature Genetics 25 (1), 25–9.
Tsatsaronis G., Varlamis I. and Vazirgiannis M. 2010. Text relatedness based on a word thesaurus. Journal of Artificial Intelligence Research 37 (1), 140.
Turdakov D. and Velikhov P. 2008. Semantic relatedness metric for Wikipedia concepts based on link analysis and its application to word sense disambiguation. Proceedings of the Spring Young Researcher's Colloquium On Database and Information Systems (CEUR workshop proceedings), St. Petersburg, Russia. Available at CEUR-WS.org.
Turney P. and Pantel P. 2010. From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research 37, 141–88.
Tversky A. 1977. Features of similarity. Psychological Review 84 (4), 327–52.
Vapnik V. 1998. Statistical Learning Theory. Chichester, UK: Wiley.
Wang J., Du Z., Payattakool R., Yu P., and Chen C. 2007. A new method to measure the semantic similarity of GO terms. Bioinformatics 23 (10), 1274–81.
Wang T. and Hirst G. 2011. Refining the notions of depth and density in WordNet-based semantic similarity measures. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1003–11. Stroudsburg, PA, USA: Association for Computational Linguistics.
Weeds E. 2003. Measures and Applications of Lexical Distributional Similarity. PhD thesis, University of Sussex.
Wojtinnek P. and Pulman S. 2011. Semantic relatedness from automatically generated semantic networks. In Proceedings of the 9th International Conference on Computational Semantics (IWCS '11), pp. 390–4. Oxford, UK: Association for Computational Linguistics.
Wu Z. and Palmer M. 1994. Verbs semantics and lexical selection. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics (ACL '94), pp. 133–8. Stroudsburg, PA, USA: Association for Computational Linguistics.
Wu H., Su Z., Mao F., Olman V., and Xu Y. 2005. Prediction of functional modules based on comparative genome analysis and Gene Ontology application. Nucleic Acids Research 33 (9), 2822–37.
Wu X., Zhu L., Guo J., Zhang D., and Lin K. 2006. Prediction of yeast protein – protein interaction network: insights from the Gene Ontology and annotations. Nucleic Acids Research 34 (7), 2137–50.
Yang D. and Powers D. 2005. Measuring semantic similarity in the taxonomy of WordNet. In Proceedings of the 28th Australasian Conference on Computer Science (ACSC '05), pp. 315–22. Darlinghurst, Australia: Australian Computer Society.
Yang D. and Powers D. 2006. Verb similarity on the taxonomy of Wordnet. In Proceedings of the 3rd International WordNet Conference (GWC-06). Masaryk, Czech Republic: Masaryk University.
Yang X. and Su J. 2007. Coreference resolution using semantic relatedness information from automatically discovered patterns. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 528–35. Stroudsburg, PA, USA: Association for Computational Linguistics.
Yazdani M. and Popescu-Belis A. 2010. A random walk framework to compute textual semantic similarity: a unified model for three benchmark tasks. In Proceedings of the 2010 IEEE 4th International Conference on Semantic Computing (ICSC '10), pp. 424–9. Washington, DC, USA: IEEE Computer Society.
Ye P., Peyser B., Pan X., Boeke J., Spencer F., and Bader J. 2005. Gene function prediction from congruent synthetic lethal interactions in yeast. Molecular Systems Biology 1:2005.0026, pp. 1–12.
Yeh E., Ramage D., Manning C., Agirre E., and Soroa A. 2009. WikiWalk: random walks on Wikipedia for semantic relatedness. In Proceedings of the ACL 2009 Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-4), pp. 41–9. Stroudsburg, PA, USA: Association for Computational Linguistics.
Yu H., Gao L., Tu K. and Guo Z. 2005. Broadly predicting specific gene functions with expression similarity and taxonomy similarity. Gene 352, 75–81.
Zesch T. and Gurevych I. 2006. Automatically creating datasets for measures of semantic relatedness. In Proceedings of the Workshop on Linguistic Distances (LD '06), pp. 16–24. Stroudsburg, PA, USA: Association for Computational Linguistics.
Zesch T. and Gurevych I. 2007. Analysis of the Wikipedia category graph for NLP applications. In Proceedings of the TextGraphs-2 Workshop (NAACL-HLT), pp. 1–8. Stroudsburg, PA, USA: Association for Computational Linguistics.
Zesch T. and Gurevych I. 2010a. Wisdom of crowds versus wisdom of linguists – measuring the semantic relatedness of words. Natural Language Engineering 16 (1), 25–59.
Zesch T. and Gurevych I. 2010b. The more the better? Assessing the influence of Wikipedia's growth on semantic relatedness measures. In Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC'10). Paris, France: European Language Resources Association (ELRA).
Zesch T., Müller C. and Gurevych I. 2008a. Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In Proceedings of the Conference on Language Resources and Evaluation (LREC), pp. 1646–52. Paris, France: European Language Resources Association (ELRA).
Zesch T., Müller C. and Gurevych I. 2008b. Using Wiktionary for computing semantic relatedness. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI'08), pp. 861–6. Palo Alto, CA, USA: AAAI Press.
Zhang Z., Gentile A. and Ciravegna F. 2011. Harnessing different knowledge sources to measure semantic relatedness under a uniform model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11), pp. 991–1002. Stroudsburg, PA, USA: Association for Computational Linguistics.
Ziegler C., Simon K. and Lausen G. 2006. Automatic computation of semantic proximity using taxonomic knowledge. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM '06), pp. 465–74. New York, NY, USA: ACM.
Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering