A non-negative tensor factorization model for selectional preference induction



Distributional similarity methods have proven to be a valuable tool for the induction of semantic similarity. Until now, most algorithms have used two-way co-occurrence data to compute the meaning of words. Co-occurrence frequencies, however, need not be pairwise. One can easily imagine situations where it is desirable to investigate co-occurrence frequencies of three modes and beyond. This paper investigates tensor factorization methods to build a model of three-way co-occurrences. The approach is applied to the problem of selectional preference induction and evaluated automatically in a pseudo-disambiguation task. The results show that tensor factorization, and non-negative tensor factorization in particular, is a promising tool for Natural Language Processing (NLP).
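To make the idea of factorizing three-way co-occurrences concrete, the following is a minimal sketch and not the paper's implementation. It assumes a small dense count tensor whose three modes are, hypothetically, verbs, subjects, and objects, and it fits a non-negative PARAFAC-style model with Lee-and-Seung-style multiplicative updates. The function names (`khatri_rao`, `unfold`, `nonneg_cp`, `score`), the rank, the iteration count, and the toy data are all illustrative assumptions.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def unfold(x, mode):
    """Matricize a 3-way array so that rows index `mode` (C-order columns)."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def nonneg_cp(x, rank, n_iter=200, eps=1e-9, seed=0):
    """Fit non-negative factor matrices (one per mode) by multiplicative updates.

    Hypothetical helper for illustration: x must be a non-negative 3-way array.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(x, mode) @ kr
            denom = factors[mode] @ (kr.T @ kr) + eps
            # Multiplying by a non-negative ratio keeps the factor non-negative.
            factors[mode] *= numer / denom
    return factors


def score(factors, i, j, k):
    """Model value for one triple: sum over latent dimensions of the three loadings."""
    a, b, c = factors
    return float(np.sum(a[i] * b[j] * c[k]))


# Toy data (assumed): 4 verbs x 5 subjects x 6 objects of made-up counts.
toy_counts = np.random.default_rng(1).integers(0, 5, size=(4, 5, 6)).astype(float)
toy_factors = nonneg_cp(toy_counts, rank=2)
toy_score = score(toy_factors, 0, 1, 2)
```

Under this sketch, a pseudo-disambiguation-style comparison would amount to checking whether `score` ranks an attested triple above a triple with a randomly substituted argument; how the paper actually builds and weights its tensor is not specified in this abstract.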




