A generalised quantifier theory of natural language in categorical compositional distributional semantics with bialgebras

  • Jules Hedges (a1) and Mehrnoosh Sadrzadeh (a2)
Abstract

Categorical compositional distributional semantics is a model of natural language; it combines the statistical vector space models of words with the compositional models of grammar. We formalise in this model the generalised quantifier theory of natural language, due to Barwise and Cooper. The underlying setting is a compact closed category with bialgebras. We start from a generative grammar formalisation and develop an abstract categorical compositional semantics for it, and then instantiate the abstract setting to sets and relations and to finite-dimensional vector spaces and linear maps. We prove the equivalence of the relational instantiation to the truth theoretic semantics of generalised quantifiers. The vector space instantiation formalises the statistical usages of words and enables us to, for the first time, reason about quantified phrases and sentences compositionally in distributional semantics.
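
As a rough illustration of the truth-theoretic reading of generalised quantifiers that the relational instantiation is compared against, the sketch below treats a quantifier as a map from a noun's set of individuals to a family of subsets of the universe. A quantified sentence then counts as true when the verb phrase's set belongs to that family. The function names (`every`, `some`, `holds`) and the toy universe are assumptions made for this example; they are not taken from the paper, and the sketch does not reproduce its categorical construction.

```python
from itertools import combinations


def powerset(universe):
    """Return all subsets of a finite universe as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def every(universe, noun):
    """Hypothetical 'every': the subsets that contain the whole noun set."""
    return {s for s in powerset(universe) if noun <= s}


def some(universe, noun):
    """Hypothetical 'some': the subsets that overlap the noun set."""
    return {s for s in powerset(universe) if noun & s}


def holds(quantifier, universe, noun, verb_phrase):
    """'Q noun verb_phrase' is true iff the verb-phrase set lies in Q(noun)."""
    return frozenset(verb_phrase) in quantifier(universe, frozenset(noun))


# Toy model (illustrative data only).
UNIVERSE = frozenset({"felix", "tom", "rex"})
CATS = frozenset({"felix", "tom"})
SLEEPERS = frozenset({"felix", "rex"})

some_cats_sleep = holds(some, UNIVERSE, CATS, SLEEPERS)
every_cat_sleeps = holds(every, UNIVERSE, CATS, SLEEPERS)
```

In this toy model "some cats sleep" comes out true and "every cat sleeps" comes out false, because one of the two cats is not among the sleepers.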

*Corresponding author. Email: julian.hedges@cs.ox.ac.uk
References
Ajdukiewicz, K. (1935). Die syntaktische Konnexität. Studia Philosophica 1 1–27.
Bar-Hillel, Y. (1953). A quasi-arithmetical notation for syntactic description. Language 29 47–58.
Bar-Hillel, Y., Gaifman, C. and Shamir, E. (1960). On categorial and phrase-structure grammars. Bulletin of the Research Council of Israel 9F, 1–16.
Barwise, J. and Cooper, R. (1981). Generalized quantifiers and natural language. Linguistics and Philosophy 4 159–219.
Bonchi, F., Sobocinski, P. and Zanasi, F. (2014). Interacting bialgebras are Frobenius. In: Muscholl, A. (ed.) Proceedings of FoSSaCS 2014, vol. 8412, Grenoble, France, Springer, 351–365.
Bullinaria, J. A. and Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods 39 510–526.
Buszkowski, W. (1988). Generative power of categorial grammars. In: Oehrle, R., Bach, E., and Wheeler, D. (eds.) Categorial Grammars and Natural Language Structures, Studies in Linguistics and Philosophy, vol. 32, Springer Netherlands, 69–94.
Buszkowski, W. (2001). Lambek grammars based on pregroups. In: Logical Aspects of Computational Linguistics, Lecture Notes in Computer Science, vol. 2099, Springer Berlin Heidelberg, 95–109.
Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory 2 113–124.
Clark, S., Coecke, B. and Sadrzadeh, M. (2008). A compositional distributional model of meaning. In: Bruza, P., Lawless, W., Coecke, B. (eds.) Proceedings of the Second Symposium on Quantum Interaction (QI), Oxford University, College Publications, 133–140.
Clark, S., Coecke, B. and Sadrzadeh, M. (2013). The Frobenius anatomy of relative pronouns. In: Kornai, A., Kuhlmann, M. (eds.), 13th Meeting on Mathematics of Language (MoL), Sofia, Bulgaria, ACL, 41–51.
Clark, S. and Pulman, S. (2007). Combining symbolic and distributional models of meaning. In: Bruza, P., Lawless, W., van Rijsbergen, C. J. (eds.) Proceedings of the AAAI Spring Symposium on Quantum Interaction, Technical Report SS-07-08, Stanford University, AAAI Press, 52–55.
Coecke, B., Grefenstette, E. and Sadrzadeh, M. (2013). Lambek vs. Lambek: Functorial vector space semantics and string diagrams for Lambek calculus. Annals of Pure and Applied Logic 164(11) 1079–1100, special issue on Seventh Workshop on Games for Logic and Programming Languages (GaLoP VII).
Coecke, B., Sadrzadeh, M. and Clark, S. (2010). Mathematical foundations for distributed compositional model of meaning. Lambek Festschrift. Linguistic Analysis 36 345–384.
Firth, J. (1957). A synopsis of linguistic theory 1930–1955. In: Palmer, F. R. (ed.) Studies in Linguistic Analysis, Longmans, 168–205.
Frege, G. (1948). On sense and reference. The Philosophical Review 57 209–230.
Geffet, M. and Dagan, I. (2005). The distributional inclusion hypotheses and lexical entailment. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL '05, Association for Computational Linguistics, 107–114.
Grefenstette, E., Dinu, G., Zhang, Y., Sadrzadeh, M. and Baroni, M. (2013). Multi-step regression learning for compositional distributional semantics. In: 10th International Conference on Computational Semantics (IWCS). Potsdam.
Grefenstette, E. and Sadrzadeh, M. (2011). Experimental support for a categorical compositional distributional model of meaning. In: Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), Computational Linguistics 41. MIT Press 1394–1404.
Grefenstette, E. and Sadrzadeh, M. (2015). Concrete models and empirical evaluations for the categorical compositional distributional model of meaning. Computational Linguistics 41 71–118.
Harris, Z. (1954). Distributional structure. Word 10, 146–162, Routledge.
Kartsaklis, D. (2015). Compositional Distributional Semantics with Compact Closed Categories and Frobenius Algebras. PhD thesis, Department of Computer Science, University of Oxford.
Kartsaklis, D., Kalchbrenner, N. and Sadrzadeh, M. (2014). Resolving lexical ambiguity in tensor regression models of meaning. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Volume 2: Short Papers, June 22–27, 2014, ACL 2014, Baltimore, MD, USA, 212–217.
Kartsaklis, D. and Sadrzadeh, M. (2013). Prior disambiguation of word tensors for constructing sentence vectors. In: Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, 1590–1601.
Kartsaklis, D., Sadrzadeh, M. and Pulman, S. (2012). A unified sentence space for categorical distributional-compositional semantics: Theory and experiments. In: Proceedings of 24th International Conference on Computational Linguistics (COLING 2012): Posters, Mumbai, India, 549–558.
Kartsaklis, D., Sadrzadeh, M., Pulman, S. and Coecke, B. (2013). Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In: Chubb, A., Eskandarian, J. and Harizanov, V. (eds.) Logic and Algebraic Structures in Quantum Computing and Information, Cambridge University Press, 199–222.
Kelly, G. and Laplaza, M. (1980). Coherence for compact closed categories. Journal of Pure and Applied Algebra 19, 193–213. http://www.sciencedirect.com/science/article/pii/0022404980901012
Kock, A. (1972). Strong functors and monoidal monads. Archiv der Mathematik 23 113–120.
Lambek, J. (1958). The mathematics of sentence structure. American Mathematical Monthly 65 154–170.
Lambek, J. (1997). Type grammars revisited. In: Proceedings of LACL 97, Lecture Notes in Artificial Intelligence, vol. 1582, Springer Verlag, 1–27.
Lambek, J. (2008). From Word to Sentence: A Computational Algebraic Approach to Grammar. Polimetrica.
Lambek, J. (2010). Compact monoidal categories from linguistics to physics. In: Coecke, B. (ed.) New Structures for Physics, Lecture Notes in Physics, Springer, 451–469.
Landauer, T. and Dumais, S. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review 104 211–240.
Lapesa, G. and Evert, S. (2014). A large scale evaluation of distributional semantic models: Parameters, interactions and model selection. Transactions of the Association for Computational Linguistics 2 531–545.
Lin, D. (1998). Automatic retrieval and clustering of similar words. In: Proceedings of the 17th international conference on Computational linguistics, vol. 2, Association for Computational Linguistics, 768–774.
Lund, K. and Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods Instruments and Computers 28 (2) 203–208.
McCurdy, M. (2012). Graphical methods for Tannaka duality of weak bialgebras and weak Hopf algebras. Theory and Applications of Categories 26 (9) 233–280.
Milajevs, D., Kartsaklis, D., Sadrzadeh, M. and Purver, M. (2014). Evaluating neural word representations in tensor-based compositional settings. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, 708–719.
Mitchell, J. and Lapata, M. (2010). Composition in distributional models of semantics. Cognitive Science 34 1388–1439.
Montague, R. (1970). English as a formal language. In: Visentini, B. (ed.) Linguaggi nella Società e nella Tecnica, Edizioni di Comunità, 189–224.
Polajnar, T., Fagarasan, L. and Clark, S. (2014). Reducing dimensions of tensors in type-driven distributional semantics. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1036–1046. Association for Computational Linguistics.
Preller, A. (2013). From logical to distributional models. In: Proceedings of the 10th International Workshop on Quantum Physics and Logic, QPL 2013, Castelldefels (Barcelona), Spain, July 17–19, 113–131.
Preller, A. (2014). Natural language semantics in biproduct dagger categories. Journal of Applied Logic 12 (1) 88–108. https://doi.org/10.1016/j.jal.2013.08.001
Preller, A. and Lambek, J. (2007). Free compact 2-categories. Mathematical Structures in Computer Science 17 309–340.
Preller, A. and Sadrzadeh, M. (2010). Bell states and negative sentences in the distributed model of meaning. In: Coecke, B., Panangaden, P., Selinger, P. (eds.) Proceedings of the 6th QPL Workshop on Quantum Physics and Logic, Electronic Notes in Theoretical Computer Science, University of Oxford, 141–153.
Preller, A. and Sadrzadeh, M. (2011). Semantic vector models and functional models for pregroup grammars. Journal of Logic Language and Information 20 419–443.
Rubenstein, H. and Goodenough, J. (1965). Contextual correlates of synonymy. Communications of the ACM 8 (10) 627–633.
Rypacek, O. and Sadrzadeh, M. (2014). A low-level treatment of generalised quantifiers in categorical compositional distributional semantics. In: Joint Proceedings of the Second International Workshop on Natural Language and Computer Science (NLCS14) and First International Workshop on Natural Language Services for Reasoners (NLSR 2014), TR 2014/02, Center for Informatics and Systems of the University of Coimbra, 165–177.
Sadrzadeh, M., Clark, S. and Coecke, B. (2013). Frobenius anatomy of word meanings I: Subject and object relative pronouns. Journal of Logic and Computation 23 1293–1317.
Sadrzadeh, M., Clark, S. and Coecke, B. (2014). Frobenius anatomy of word meanings 2: Possessive relative pronouns. Journal of Logic and Computation 26 785–815.
Salton, G., Wong, A. and Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM 18 613–620.
Schuetze, H. (1998). Automatic word sense discrimination. Computational Linguistics 24 (1) 97–123.
Turney, P. D. (2006). Similarity of semantic relations. Computational Linguistics 32 (3) 379–416.
van Benthem, J. (1987). Categorial grammar and lambda calculus. In: Skordev, Dimiter G. (ed.) Mathematical Logic and Its Applications, Springer, 39–60.
Weeds, J., Weir, D. and McCarthy, D. (2004). Characterising measures of lexical distributional similarity. In: Proceedings of the 20th International Conference on Computational Linguistics, COLING '04, Association for Computational Linguistics. 1015.
Mathematical Structures in Computer Science
  • ISSN: 0960-1295
  • EISSN: 1469-8072
  • URL: /core/journals/mathematical-structures-in-computer-science