Sparsity and normalization in word similarity systems

JEAN MARK GAWRON; KELLEN STEPHENS

doi:10.1017/S1351324915000261

Published online by Cambridge University Press: 19 August 2015

Affiliation (both authors): Department of Linguistics, San Diego State University, San Diego, CA, USA
E-mails: gawron@mail.sdsu.edu, KStephens@eplicaservices.com

Abstract

We investigate the problem of improving performance in distributional word similarity systems trained on sparse data, focusing on a family of similarity functions we call Dice-family functions (Dice 1945), including the similarity functions introduced in Lin (1998) and Curran (2004), as well as a generalized version of the Dice coefficient used in data mining applications (Strehl 2000, 55). We propose a generalization of the Dice-family functions which uses a weight parameter α to make the similarity functions asymmetric. We show that this generalized family of functions (α systems) all belong to the class of asymmetric models first proposed in Tversky (1977), and in a multi-task evaluation of ten word similarity systems, we show that α systems have the best performance across word ranks. In particular, we show that α-parameterization substantially improves the correlations of all Dice-family functions with human judgements on three word sets, including the Miller–Charles/Rubenstein–Goodenough word set (Miller and Charles 1991; Rubenstein and Goodenough 1965).
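To make the idea of an asymmetric, α-weighted Dice-style function concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' definition: the function name `alpha_dice`, the dictionary representation of feature vectors, and the particular formula (shared mass measured by feature-wise minima, divided by an α-weighted mix of the two total masses) are all assumptions made for this example. With non-negative weights, the denominator can be rewritten as shared mass plus α times the mass unique to the first word plus (1 − α) times the mass unique to the second, which is the shape of Tversky's (1977) ratio model; α = 0.5 recovers the ordinary (generalized) Dice coefficient.

```python
from typing import Dict

# Hypothetical representation: a word is a sparse map from context
# features to non-negative association weights.
Vector = Dict[str, float]


def alpha_dice(u: Vector, v: Vector, alpha: float = 0.5) -> float:
    """Illustrative alpha-weighted Dice similarity (an assumed form).

    shared = sum over common features of min(u[f], v[f])
    score  = shared / (alpha * sum(u) + (1 - alpha) * sum(v))

    alpha = 0.5 gives the symmetric generalized Dice coefficient;
    any other alpha makes the score depend on argument order.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Mass the two vectors have in common.
    shared = sum(min(w, v[f]) for f, w in u.items() if f in v)
    # Alpha-weighted mix of the two total masses.
    denom = alpha * sum(u.values()) + (1.0 - alpha) * sum(v.values())
    return shared / denom if denom > 0 else 0.0


# A feature-rich word compared with a sparse word that shares
# all of the sparse word's features.
rich = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
sparse = {"a": 1.0, "b": 1.0}

symmetric = alpha_dice(rich, sparse, alpha=0.5)
rich_weighted = alpha_dice(rich, sparse, alpha=1.0)
sparse_weighted = alpha_dice(rich, sparse, alpha=0.0)
```

In this toy case, putting all the weight on the sparse word's mass (α = 0 with the argument order shown) means the rich word's extra features are not penalized, which gives one intuition for why such a weighting might matter on sparse data.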

Information

Type: Articles
Information: Natural Language Engineering, Volume 22, Issue 3, May 2016, pp. 351–395

DOI: https://doi.org/10.1017/S1351324915000261
Copyright: Copyright © Cambridge University Press 2015

References

Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pasca, M., and Soroa, A. 2009. A study on similarity and relatedness using distributional and wordnet-based approaches. In Proceedings of NAACL-HLT 09, Stroudsburg, PA. Association for Computational Linguistics, pp. 19–27.

Agirre, E. and Soroa, A. 2009. Personalizing pagerank for word sense disambiguation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Stroudsburg, PA. Association for Computational Linguistics, pp. 33–41.

Bordag, S. 2008. A comparison of co-occurrence and similarity measures as simulations of context. In Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing, Berlin: Springer, pp. 52–63.

Bouma, G. 2009. Normalized (pointwise) mutual information in collocation extraction. In Proceedings of the Biennial GSCL Conference, Tübingen. Gunter Narr Verlag, pp. 31–40.

Bullinaria, J. A. and Levy, J. P. 2012. Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD. Behavior Research Methods 44 (3): 890–907.

Burnard, L. 1995. Users Reference Guide British National Corpus: Version 1.0. Oxford: Oxford University Computing Services.

Church, K. W. and Hanks, P. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16 (1): 22–29.

Curran, J. R. 2004. From Distributional to Semantic Similarity. PhD thesis, University of Edinburgh. College of Science and Engineering. School of Informatics.

Dagan, I., Lee, L. and Pereira, F. 1997. Similarity-based methods for word sense disambiguation. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, Stroudsburg, PA. Association for Computational Linguistics, pp. 56–63.

Dagan, I., Lee, L. and Pereira, F. C. N. 1999. Similarity-based models of word cooccurrence probabilities. Machine Learning 34 (1): 43–69.

Dagan, I. 2000. Contextual word similarity. In Dale, R., Moisl, H. L., and Somers, H. L. (eds.), Handbook of Natural Language Processing, pp. 459–475. New York: Marcel Dekker.

Dice, L. R. 1945. Measures of the amount of ecologic association between species. Ecology 26 (3): 297–302.

Eisler, H. and Ekman, G. 1959. A mechanism of subjective similarity. Nordisk Psykologi 11 (1): 1–10.

Evert, S. 2008. Corpora and collocations. In Lüdeling, A. and Kytö, M. (eds.), Corpus Linguistics: An International Handbook. Berlin: Mouton de Gruyter.

Ferreira da Silva, J., and Pereira Lopes, G. 1999. A local maxima method and a fair dispersion normalization for extracting multi-word units from corpora. In Proceedings of the 6th Meeting on Mathematics of Language, University of Pennsylvania, Philadelphia, PA. Association for the Mathematics of Language, pp. 369–381.

Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., and Ruppin, E. 2002. Placing search in context: the concept revisited. ACM Transactions on Information Systems 20 (1): 116–131.

Freitag, D., Blume, M., Byrnes, J., Chow, E., Kapadia, S., Rohwer, R. and Wang, Z. 2005. New experiments in distributional representations of synonymy. In Proceedings of the 9th Conference on Computational Natural Language Learning, Stroudsburg, PA. Association for Computational Linguistics, pp. 25–32.

Gabrilovich, E. and Markovitch, S. 2009. Wikipedia-based semantic interpretation for natural language processing. Journal of Artificial Intelligence Research 34 (2): 443–498.

Gawron, J. M. 2011. Frame semantics. In Maienborn, C., von Heusinger, K., and Portner, P. (eds.), Semantics: An International Handbook of Natural Language Meaning, vol. 23. HSK Handbooks of Linguistics and Communication Science Series. Berlin: Mouton de Gruyter.

Grefenstette, G. 1994. Explorations in Automatic Thesaurus Discovery. New York: Springer Science and Business Media.

Hassan, S. and Mihalcea, R. 2011. Semantic relatedness using salient semantic analysis. In Proceedings of AAAI Conference Artificial Intelligence, Palo Alto, CA. AAAI Press, pp. 884–889.

Haveliwala, T. H. 2003. Topic-sensitive pagerank: a context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering 15 (4): 784–796.

Heylen, K., Peirsman, Y., Geeraerts, D. and Speelman, D. 2008. Modelling word similarity: an evaluation of automatic synonymy extraction algorithms. In Proceedings of the 6th International Language Resources and Evaluation (LREC-2008), Marrakech, Morocco. European Language Resources Association, pp. 3243–3249.

Hughes, T. and Ramage, D. 2007. Lexical semantic relatedness with random graph walks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing/Conference on Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic. Association for Computational Linguistics, pp. 581–589.

Jaccard, P. 1912. The distribution of the flora in the alpine zone. New Phytologist 11 (2): 37–50.

Jiang, J. J. and Conrath, D. W. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics (ROCLING-10), Stroudsburg, PA. Association for Computational Linguistics.

Jimenez, S., Becerra, C. and Gelbukh, A. 2012. Soft cardinality: a parameterized similarity function for text comparison. In Proceedings of the 1st Joint Conference on Lexical and Computational Semantics, Stroudsburg, PA. Association for Computational Linguistics, pp. 449–453.

Landauer, T. K. and Dumais, S. T. 1994. Latent semantic analysis and the measurement of knowledge. In Kaplan, R., and Burstein, J. C. B. (eds.), Educational Testing Service Conference on Natural Language Processing Techniques and Technology in Assessment and Education, Ewing, NJ: Educational Testing Service.

Leacock, C., Miller, G. A. and Chodorow, M. 1998. Using corpus statistics and wordnet relations for sense identification. Computational Linguistics 24 (1): 147–165.

Lee, L. 1997. Similarity-Based Approaches to Natural Language Processing. PhD thesis, Harvard University.

Lee, L. 1999. Measures of distributional similarity. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, Stroudsburg, PA. Association for Computational Linguistics, pp. 25–32.

Lee, L. 2001. On the effectiveness of the skew divergence for statistical language analysis. In Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics, Fort Lauderdale, FL. Society for Artificial Intelligence and Statistics, pp. 65–72.

Lin, D. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, Madison, Wisconsin. International Machine Learning Society, pp. 296–304.

Manning, C. D. and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press.

McHale, M. 1998. A comparison of WordNet and Roget's taxonomy for measuring semantic similarity. In Workshop on Usage of WordNet in Natural Language Processing Systems, Stroudsburg, PA. COLING-ACL. Available from http://xxx.lanl.gov/abs/cmp-lg/9809003.

Miller, G. A. and Charles, W. G. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes 6 (1): 1–28.

Nida, E. A. 1975. Componential Analysis of Meaning: An Introduction to Semantic Structures. The Hague: Mouton.

Nivre, J. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 5th International Conference on Computational Natural Language Learning (CONLL-2003), Stroudsburg, PA. Association for Computational Linguistics, pp. 149–160.

Pilehvar, M. T., Jurgens, D. and Navigli, R. 2013. Align, disambiguate and walk: a unified approach for measuring semantic similarity. In Proceedings of the 51st Annual Meeting of the ACL, Stroudsburg, PA. Association for Computational Linguistics, pp. 1341–1351.

Resnik, P. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada. International Joint Conferences on Artificial Intelligence, pp. 448–453.

Rosch, E. 1975. Cognitive reference points. Cognitive Psychology 7 (4): 532–547.

Rubenstein, H. and Goodenough, J. B. 1965. Contextual correlates of synonymy. Communications of the ACM 8 (10): 627–633.

Schütze, H. 1993. Part-of-speech induction from scratch. In Proceedings of the 31st annual meeting on Association for Computational Linguistics, Stroudsburg, PA. Association for Computational Linguistics, pp. 251–258.

Sjoberg, L. 1972. A cognitive theory of similarity. Goteborg Psychological Reports 10. Department of Psychology. University of Goteborg.

Strehl, A. 2000. Relation-Based Clustering and Cluster Ensembles for High-Dimensional Data-Mining. PhD thesis, University of Texas, Austin, TX.

Turney, P., Littman, M. L., Bigham, J. and Shnayder, V. 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria. INCOMA, Ltd, pp. 482–489.

Turney, P. D. 2008. A uniform approach to analogies, synonyms, antonyms, and associations. In Proceedings of the 22nd International Conference on Computational Linguistics, Proceedings of the Conference (COLING-2008), Manchester, UK. ACL-COLING, pp. 905–912.

Tversky, A. 1977. Features of similarity. Psychological Review 84 (4): 327–352.

van Rijsbergen, C. J. 1979. Information retrieval. Oxford: Butterworth-Heinemann.

Weeds, J. and Weir, D. 2005. Co-occurrence retrieval: a flexible framework for lexical distributional similarity. Computational Linguistics 31 (4): 439–475.

Yang, D. and Powers, D. M. 2005. Measuring semantic similarity in the taxonomy of wordnet. In Proceedings of 28th Australasian Computer Science Conference, Newcastle, NSW, Australia. Australian Computer Society, pp. 315–322.

Yih, W.-T. and Qazvinian, V. 2012. Measuring word relatedness using heterogeneous vector space models. In Proceedings of the 2012 Conference of NAACL, Stroudsburg, PA. Association for Computational Linguistics, pp. 616–620.
