Computational learning of construction grammars

  • JONATHAN DUNN
Abstract

This paper presents an algorithm for learning the construction grammar of a language from a large corpus. This grammar induction algorithm has two goals: first, to show that construction grammars are learnable without highly specified innate structure; second, to develop a model of which units do or do not constitute constructions in a given dataset. The basic task of construction grammar induction is to identify the minimum set of constructions that represents the language in question with maximum descriptive adequacy. These constructions must (1) generalize across an unspecified number of units while (2) containing mixed levels of representation internally (e.g., both item-specific and schematized representations), and (3) allowing for unfilled and partially filled slots. Additionally, these constructions may (4) contain recursive structure within a given slot that needs to be reduced in order to produce a sufficiently schematic representation. In other words, these constructions are multi-length, multi-level, possibly discontinuous co-occurrences which generalize across internal recursive structures. These co-occurrences are modeled using frequency and the ΔP measure of association, expanded in novel ways to cover multi-unit sequences. This work provides important new evidence for the learnability of construction grammars as well as a tool for the automated corpus analysis of constructions.
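The abstract names ΔP as the association measure underlying the model. As a rough illustration only, the sketch below computes the standard pairwise form of ΔP in both directions from a 2×2 contingency table of adjacent word pairs: ΔP(Y|X) = a/(a+b) − c/(c+d) and ΔP(X|Y) = a/(a+c) − b/(b+d). The function names `contingency` and `delta_p`, the toy sentence, and the choice to treat an undefined proportion as 0.0 are assumptions made for this example; the paper's expansion of ΔP to multi-unit sequences is not reproduced here.

```python
from collections import Counter


def contingency(tokens, x, y):
    """Build the 2x2 table (a, b, c, d) for the adjacent pair (x, y).

    a: x followed by y          b: x followed by something else
    c: something else, then y   d: neither x first nor y second
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    first_x = sum(n for (w1, _), n in bigrams.items() if w1 == x)
    second_y = sum(n for (_, w2), n in bigrams.items() if w2 == y)
    a = bigrams[(x, y)]
    b = first_x - a
    c = second_y - a
    d = total - a - b - c
    return a, b, c, d


def _ratio(num, den):
    # Assumption for this sketch: an undefined proportion counts as 0.0.
    return num / den if den else 0.0


def delta_p(a, b, c, d):
    """Return (left-to-right, right-to-left) pairwise Delta-P."""
    forward = _ratio(a, a + b) - _ratio(c, c + d)   # P(y | x) - P(y | not x)
    backward = _ratio(a, a + c) - _ratio(b, b + d)  # P(x | y) - P(x | not y)
    return forward, backward


# Toy usage: how strongly does "the" predict a following "dog", and vice versa?
tokens = "the dog saw the cat and the dog ran".split()
table = contingency(tokens, "the", "dog")
forward, backward = delta_p(*table)
print(table, round(forward, 3), round(backward, 3))  # (2, 1, 0, 5) 0.667 0.833
```

Because ΔP is directional, the two values generally differ: here "dog" is always preceded by "the", while "the" is only sometimes followed by "dog", so the right-to-left score is higher.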

Corresponding author
Address for correspondence: 3300 South Federal Street, Chicago, IL 60616; web: www.jdunn.name; e-mail: jonathan.edwin.dunn@gmail.com
Footnotes
*

The author would like to thank Shlomo Argamon and Joshua Trampier for their support and engagement throughout this project. This work was funded in part by the Oak Ridge Institute for Science and Education.

Language and Cognition
  • ISSN: 1866-9808
  • EISSN: 1866-9859
  • URL: /core/journals/language-and-cognition