Recent advances in methods of lexical semantic relatedness – a survey

ZIQI ZHANG, ANNA LISA GENTILE and FABIO CIRAVEGNA
Abstract

Measuring lexical semantic relatedness is an important task in Natural Language Processing (NLP) and is often a prerequisite to many complex NLP tasks. Despite the extensive work dedicated to this area, the field lacks an up-to-date survey. This paper aims to address this issue with a study focused on four perspectives: (i) a comparative analysis of the background information resources that are essential for measuring lexical semantic relatedness; (ii) a review of the literature with a focus on recent methods that are not covered in previous surveys; (iii) a discussion of studies in the biomedical domain, where novel methods have been introduced but inadequately communicated across domain boundaries; and (iv) an evaluation of lexical semantic relatedness methods and a discussion of useful lessons for the development and application of such methods. In addition, we discuss a number of issues in this field and suggest future research directions. We believe this work will be a valuable reference for researchers of lexical semantic relatedness and will substantially support research activities in this field.


Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering