
Weighting-based semantic similarity measure based on topological parameters in semantic taxonomy


Semantic measures are used to address a range of problems in research areas such as artificial intelligence, natural language processing, knowledge engineering, bioinformatics, and information retrieval. Hierarchical feature-based semantic measures estimate the semantic similarity between two concepts or words from features extracted from the semantic taxonomy (hierarchy) of a given lexical source. The central weakness of these measures is their constant-weighting assumption: every element in the semantic representation of a concept is treated as equally relevant. This paper proposes a new weighting-based semantic similarity measure that addresses this weakness. Four mechanisms are introduced to weight the relevance of features in the semantic representation of a concept, using topological parameters of the semantic taxonomy (edge, depth, descendants, and density). Using the WordNet taxonomy, the proposed measure is evaluated on word semantic similarity with four gold-standard datasets. Experimental results show that it outperforms hierarchical feature-based semantic measures on all four datasets. The comparison also suggests that the proposed measure is more effective than information-content measures at measuring semantic similarity.
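The contrast between constant weighting and topology-based weighting can be illustrated with a minimal sketch. It is not the measure proposed in the paper: the toy taxonomy, the function names, the choice of a concept's ancestors as its features, and the depth-proportional weight are all assumptions made for illustration only. The abstract does not give the paper's formulas, and its edge, descendant, and density mechanisms are not shown here.

```python
from typing import Dict, Optional, Set

# Hypothetical toy is-a taxonomy (child -> parent); not taken from WordNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def features(concept: str) -> Set[str]:
    """Assumed feature set: the concept itself plus all of its ancestors."""
    feats: Set[str] = set()
    node: Optional[str] = concept
    while node is not None:
        feats.add(node)
        node = PARENT[node]
    return feats


def depth(concept: str) -> int:
    """Number of nodes on the path from the root to the concept (root = 1)."""
    return len(features(concept))


def constant_similarity(a: str, b: str) -> float:
    """Constant weighting: every feature counts equally (plain Jaccard ratio)."""
    fa, fb = features(a), features(b)
    return len(fa & fb) / len(fa | fb)


def depth_weighted_similarity(a: str, b: str) -> float:
    """Assumed weighting: each feature contributes in proportion to its depth."""
    fa, fb = features(a), features(b)
    shared = sum(depth(f) for f in fa & fb)
    total = sum(depth(f) for f in fa | fb)
    return shared / total


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
        print(pair, constant_similarity(*pair), depth_weighted_similarity(*pair))
```

Both functions share the same feature sets and differ only in how much each feature contributes, which is the design point the abstract raises. A different topological parameter would be substituted by changing the per-feature weight.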


This work was partially funded by the Ministry of Higher Education in Malaysia under grant no. FRGS/1/2016/ICT02/UKM/02/11. The first author would like to thank the University of Saba Region for its financial support.

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering