Generative Linguistics and Neural Networks at 60: Foundation, Friction, and Fusion

Published online by Cambridge University Press:  01 January 2026

Joe Pater
University of Massachusetts Amherst

Abstract

The birthdate of both generative linguistics and neural networks can be taken as 1957, the year in which foundational work by both Noam Chomsky and Frank Rosenblatt was published. This article traces the development of these two approaches to cognitive science: their largely autonomous development in the first thirty years, their collision in the 1980s around the past-tense debate (Rumelhart & McClelland 1986, Pinker & Prince 1988), and their integration in much subsequent work up to the present. Although this integration has produced a considerable body of results, the continued general gulf between these two lines of research is likely impeding progress in both: on learning in generative linguistics, and on the representation of language in neural modeling. The article concludes with a brief argument that generative linguistics is unlikely to fulfill its promise of accounting for language learning if it continues to maintain its distance from neural and statistical approaches to learning.

Article type: Perspectives
Copyright © 2019 Linguistic Society of America

References

Adger, David. 2017. The autonomy of syntax. London: Queen Mary University, ms. Online: http://ling.auf.net/lingbuzz/003442.
Albright, Adam, and Hayes, Bruce. 2003. Rules vs. analogy in English past tenses: A computational/experimental study. Cognition 90. 119-61. DOI: 10.1016/S0010-0277(03)00146-X.
Alderete, John, and Tupper, Paul. 2018. Connectionist approaches to generative phonology. The Routledge handbook of phonological theory, ed. by Hannahs, S. J. and Bosch, Anna R. K., 360-90. London: Routledge.
Alderete, John, Tupper, Paul; and Frisch, Stefan A. 2013. Phonological constraint induction in a connectionist network: Learning OCP-Place constraints from data. Language Sciences 37. 52-69. DOI: 10.1016/j.langsci.2012.10.002.
Alhama, Raquel G., and Zuidema, Willem. 2018. Pre-wiring and pre-training: What does a neural network need to learn truly general identity rules? Journal of Artificial Intelligence Research 61. 927-46. DOI: 10.1613/jair.1.11197.
Alishahi, Afra, Barking, Marie; and Chrupała, Grzegorz. 2017. Encoding of phonology in a recurrent neural model of grounded speech. Proceedings of the 21st Conference on Computational Natural Language Learning, 368-78. DOI: 10.18653/v1/K17-1037.
Amodei, Dario, Ananthanarayanan, Sundaram, Anubhai, Rishita, Bai, Jingliang, Battenberg, Eric, Case, Carl, Casper, Jared; et al. 2016. Deep speech 2: End-to-end speech recognition in English and Mandarin. Proceedings of Machine Learning Research (Proceedings of the 33rd International Conference on Machine Learning) 48. 173-82. Online: http://proceedings.mlr.press/v48/amodei16.pdf.
Anderson, James A., and Rosenfeld, Edward. 2000. Talking nets: An oral history of neural networks. Cambridge, MA: Bradford Books/MIT Press.
Anderson, Stephen R. 1985. Phonology in the twentieth century: Theories of rules and theories of representations. Chicago: University of Chicago Press.
Andreas, Jacob, and Ghahramani, Zoubin. 2013. A generative model of vector space semantics. Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, 91-99. Online: http://aclweb.org/anthology/W13-3211.
Bahdanau, Dzmitry, Cho, Kyunghyun; and Bengio, Yoshua. 2016. Neural machine translation by jointly learning to align and translate. Paper presented at the International Conference on Learning Representations (ICLR) 2015. arXiv:1409.0473 [cs.CL]. Online: https://arxiv.org/abs/1409.0473.
Bai, Shaojie, Kolter, J. Zico; and Koltun, Vladlen. 2018. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv:1803.01271 [cs.LG]. Online: https://arxiv.org/abs/1803.01271.
Bates, Elizabeth, Elman, Jeffrey L., Johnson, Mark H., Karmiloff-Smith, Annette, Parisi, Domenico; and Plunkett, Kim. 1998. Innateness and emergentism. A companion to cognitive science, ed. by Bechtel, William and Graham, George, 590-601. Oxford: Blackwell. DOI: 10.1002/9781405164535.ch46.
Berent, Iris. 2013. The phonological mind. Cambridge: Cambridge University Press.
Berent, Iris, Wilson, Colin, Marcus, Gary F.; and Bemis, Douglas K. 2012. On the role of variables in phonology: Remarks on Hayes and Wilson 2008. Linguistic Inquiry 43(1). 97-119. DOI: 10.1162/LING_a_00075.
Bermúdez-Otero, Ricardo. 2016. Comment on ‘Chomsky 1957 and the past tense’. Phonolist. Online: http://blogs.umass.edu/phonolist/2016/06/28/discussion-chomsky-1957-on-the-english-past-tense/, accessed August 24, 2017.
Bernardy, Jean-Philippe, and Lappin, Shalom. 2017. Using deep neural networks to learn syntactic agreement. Linguistic Issues in Language Technology 15(2). Online: http://csli-lilt.stanford.edu/ojs/index.php/LiLT/article/view/94/79.
Block, H. D. 1962. The perceptron: A model for brain functioning. Reviews of Modern Physics 34(1). 123-35. DOI: 10.1103/RevModPhys.34.123.
Block, H. D. 1970. Review of Minsky & Papert 1988 [1969]. Information and Control 17(5). 501-22. DOI: 10.1016/S0019-9958(70)90409-2.
Block, H. D., Knight, B. W. Jr.; and Rosenblatt, Frank. 1962. Analysis of a four-layer series-coupled perceptron. II. Reviews of Modern Physics 34(1). 135-42. DOI: 10.1103/RevModPhys.34.135.
Boeckx, Cedric. 2014. What principles and parameters got wrong. Linguistic variation in the minimalist framework, ed. by Picallo, M. Carme, 155-78. Oxford: Oxford University Press. DOI: 10.1093/acprof:oso/9780198702894.003.0008.
Boersma, Paul. 1997. How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences 21. 43-58. Online: http://www.fon.hum.uva.nl/paul/papers/learningVariation.pdf.
Boersma, Paul. 2003. Review of Tesar & Smolensky 2000. Phonology 20(3). 436-46. DOI: 10.1017/S0952675704230111.
Boersma, Paul, and Hayes, Bruce. 2001. Empirical tests of the gradual learning algorithm. Linguistic Inquiry 32(1). 45-86. DOI: 10.1162/002438901554586.
Boersma, Paul, and Pater, Joe. 2016. Convergence properties of a gradual learner in harmonic grammar. In McCarthy & Pater, 389-434.
Boersma, Paul, and van Leussen, Jan-Willem. 2017. Efficient evaluation and learning in multilevel parallel constraint grammars. Linguistic Inquiry 48(3). 349-88. DOI: 10.1162/ling_a_00247.
Bond, Oliver, Corbett, Greville G., Chumakina, Marina; and Brown, Dunstan (eds.) 2016. Archi: Complexities of agreement in cross-theoretical perspective. Oxford: Oxford University Press. DOI: 10.1093/acprof:oso/9780198747291.001.0001.
Bowman, Samuel R., Angeli, Gabor, Potts, Christopher; and Manning, Christopher D. 2015. A large annotated corpus for learning natural language inference. Proceedings of the 2015 conference on Empirical Methods in Natural Language Processing, 632-42. Online: http://aclweb.org/anthology/D/D15/D15-1075.pdf.
Bowman, Samuel R., Manning, Christopher D.; and Potts, Christopher. 2015. Tree-structured composition in neural networks without tree-structured architectures. Proceedings of the 2015 International Conference on Cognitive Computation: Integrating Neural and Symbolic Approaches, 37-42. Online: http://ceur-ws.org/Vol-1583/CoCoNIPS_2015_paper_5.pdf.
Bresnan, Joan. 2001. Lexical-functional syntax. Oxford: Blackwell.
Bybee, Joan. 1988. Morphology as lexical organization. Theoretical morphology: Approaches in modern linguistics, ed. by Hammond, Michael and Noonan, Michael, 119-41. San Diego: Academic Press.
Bybee, Joan, and McClelland, James L. 2005. Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review 22. 381-410. DOI: 10.1515/tlir.2005.22.2-4.381.
Cedergren, Henrietta J., and Sankoff, David. 1974. Variable rules: Performance as a statistical reflection of competence. Language 50(2). 333-55. DOI: 10.2307/412441.
Chalmers, David J. 1990. Syntactic transformations on distributed representations. Connection Science 2(1–2). 53-62. DOI: 10.1080/09540099008915662.
Cherry, Colin. 1957. On human communication: A review, a survey, and a criticism. Cambridge, MA: MIT Press.
Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton.
Chomsky, Noam. 1959. Review of Skinner 1957. Language 35(1). 26-58. DOI: 10.2307/411334.
Chomsky, Noam. 1964. Current issues in linguistic theory. The Hague: Mouton.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1975. Reflections on language. New York: Pantheon Books.
Chomsky, Noam. 1980. On cognitive structures and their development. Language and learning: The debate between Jean Piaget and Noam Chomsky, ed. by Piattelli-Palmarini, Massimo, 36-54. London: Routledge and Kegan Paul.
Chomsky, Noam, Gallego, Ángel J.; and Ott, Dennis. 2017. Generative grammar and the faculty of language: Insights, questions, and challenges. Catalan Journal of Linguistics, to appear. Manuscript version online: https://ling.auf.net/lingbuzz/003507.
Chomsky, Noam, and Halle, Morris A. 1968. The sound pattern of English. Cambridge, MA: MIT Press.
Christiansen, Morten H., and Chater, Nick (eds.) 2001. Connectionist psycholinguistics. Westport, CT: Ablex.
Coetzee, Andries W. 2008. Grammaticality and ungrammaticality in phonology. Language 84(2). 218-57. DOI: 10.1353/lan.0.0000.
Coetzee, Andries W., and Pater, Joe. 2011. The place of variation in phonological theory. The handbook of phonological theory, 2nd edn., ed. by Goldsmith, John, Riggle, Jason, and Yu, Alan C. L., 401-31. Oxford: Wiley-Blackwell.
Doucette, Amanda. 2017. Inherent biases of recurrent neural networks for phonological assimilation and dissimilation. Proceedings of the 7th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2017), 35-40. Online: http://www.aclweb.org/anthology/W17-0705.
Doumas, Leonidas A. A., Puebla, Guillermo; and Martin, Andrea E. 2017. How we learn things we don't know already: A theory of learning structured representations from experience. Edinburgh: University of Edinburgh, ms. bioRxiv 198804 (preprint). DOI: 10.1101/198804.
Dresher, B. Elan. 1981. On the learnability of abstract phonology. The logical problem of language acquisition, ed. by Baker, C. L. and McCarthy, John J., 188-210. Cambridge, MA: MIT Press.
Dresher, B. Elan. 1990. Review of Halle & Vergnaud 1987. Phonology 7. 171-88. DOI: 10.1017/S0952675700001160.
Dresher, B. Elan. 1999. Charting the learning path: Cues to parameter setting. Linguistic Inquiry 30(1). 27-67. DOI: 10.1162/002438999553959.
Dyer, Chris, Kuncoro, Adhiguna, Ballesteros, Miguel; and Smith, Noah A. 2016. Recurrent neural network grammars. Proceedings of NAACL-HLT 2016, 199-209. Online: https://www.aclweb.org/anthology/N16-1024.
Edelman, Shimon. 2017. Language and other complex behaviors: Unifying characteristics, computational models, neural mechanisms. Language Sciences 62. 91-123. DOI: 10.1016/j.langsci.2017.04.003.
Elman, Jeffrey L. 1990. Finding structure in time. Cognitive Science 14(2). 179-211. DOI: 10.1207/s15516709cog1402_1.
Elman, Jeffrey L. 1991. Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning 7. 195-225. DOI: 10.1023/A:1022699029236.
Elman, Jeffrey L., Bates, Elizabeth A., Johnson, Mark H., Karmiloff-Smith, Annette, Parisi, Domenico; and Plunkett, Kim. 1996. Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.
Embick, David, and Marantz, Alec. 2005. Cognitive neuroscience and the English past tense: Comments on the paper by Ullman et al. Brain and Language 93(2). 243-47. DOI: 10.1016/j.bandl.2004.10.003.
Ettinger, Allyson, Rao, Sudha, Daumé III, Hal; and Bender, Emily M. 2017. Towards linguistically generalizable NLP systems: A workshop and shared task. Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems, 1-10. Online: http://aclweb.org/anthology/W17-5401.
Feldman, Yishai A. 1992. Finite-state machines. Encyclopedia of computer science and technology, vol. 25, ed. by Kent, Allen and Williams, James G., 73-104. New York: Marcel Dekker.
Fitz, Hartmut, and Chang, Franklin. 2017. Meaningful questions: The acquisition of auxiliary inversion in a connectionist model of sentence production. Cognition 166. 225-50. DOI: 10.1016/j.cognition.2017.05.008.
Frank, Robert, and Mathis, Donald. 2007. Transformational networks. Proceedings of the 3rd Workshop on Psychocomputational Models of Human Language Acquisition. Online: http://blogs.umass.edu/brain-wars/files/2017/06/cogsci-2007.pdf.
Frank, Robert, Mathis, Donald; and Badecker, William. 2013. The acquisition of anaphora by simple recurrent networks. Language Acquisition 20(3). 181-227. DOI: 10.1080/10489223.2013.796950.
Gasser, Michael, and Lee, Chan-Do. 1990. Networks and morphophonemic rules revisited. Technical report 307. Bloomington: Indiana University, Computer Science Department. Online: https://www.cs.indiana.edu/pub/techreports/TR307.pdf.
Gazdar, Gerald, Klein, Ewen, Pullum, Geoffrey K.; and Sag, Ivan A. 1985. Generalized phrase structure grammar. Cambridge, MA: Harvard University Press.
Gibson, Edward, and Wexler, Kenneth. 1994. Triggers. Linguistic Inquiry 25(3). 407-54. Online: https://www.jstor.org/stable/4178869.
Goldberg, Yoav. 2012. Do baboons really care about letter-pairs? Monkey-reading, predictive patterns and machine learning. New York, ms. Online: https://www.cs.bgu.ac.il/~yoavg/uni/bloglike/baboons.html.
Goldberg, Yoav. 2016. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research 57. 345-420. DOI: 10.1613/jair.4992.
Goldberg, Yoav. 2017. Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies 10(1). 1-309. DOI: 10.2200/S00762ED1V01Y201703HLT037.
Goldsmith, John. 1993. Harmonic phonology. The last phonological rule: Reflections on constraints and derivations, ed. by Goldsmith, John, 21-60. Chicago: University of Chicago Press.
Goldwater, Sharon J., and Johnson, Mark. 2003. Learning OT constraint rankings using a maximum entropy model. Proceedings of the Stockholm Workshop on Variation within Optimality Theory, ed. by Spenader, Jennifer, Eriksson, Anders, and Dahl, Östen, 111-20.
Gould, Isaac. 2015. Syntactic learning from ambiguous evidence: Errors and end-states. Cambridge, MA: MIT Press.
Grainger, Jonathan, Dufau, Stéphane, Montant, Marie, Ziegler, Johannes C.; and Fagot, Joël. 2012. Orthographic processing in baboons (Papio papio). Science 336(6078). 245-48. DOI: 10.1126/science.1218152.
Gupta, Prahlad, and Touretzky, David S. 1994. Connectionist models and linguistic theory: Investigations of stress systems in language. Cognitive Science 18. 1-50. DOI: 10.1016/0364-0213(94)90019-1.
Halle, Morris, and Mohanan, K. P. 1985. Segmental phonology of Modern English. Linguistic Inquiry 16. 57-116. Online: https://www.jstor.org/stable/4178420.
Halle, Morris, and Vergnaud, Jean-Roger. 1987. An essay on stress. Cambridge, MA: MIT Press.
Hare, Mary, Corina, David; and Cottrell, Garrison. 1989. A connectionist perspective on prosodic structure. Berkeley Linguistics Society 15. 114-25. DOI: 10.3765/bls.v15i0.1732.
Harris, Zellig. 1951. Methods in structural linguistics. Chicago: University of Chicago Press.
Hayes, Bruce. 1980. A metrical theory of stress rules. Cambridge, MA: MIT Press.
Hayes, Bruce, Tesar, Bruce; and Zuraw, Kie. 2013. OTSoft. Los Angeles: University of California, Los Angeles. Online: https://linguistics.ucla.edu/people/hayes/otsoft/.
Hayes, Bruce, and Wilson, Colin. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39(3). 379-440. DOI: 10.1162/ling.2008.39.3.379.
Hebb, D. O. 1949. The organization of behavior: A neuropsychological theory. New York: John Wiley and Sons.
Heinz, Jeffrey, and Idsardi, William. 2011. Sentence and word complexity. Science 333(6040). 295-97. DOI: 10.1126/science.1210358.
Heinz, Jeffrey, and Idsardi, William. 2013. What complexity differences reveal about domains in language. Topics in Cognitive Science 5(1). 111-31. DOI: 10.1111/tops.12000.
Hochreiter, Sepp, and Schmidhuber, Jürgen. 1997. Long short-term memory. Neural Computation 9(8). 1735-80. DOI: 10.1162/neco.1997.9.8.1735.
Hummel, John E. 2010. Symbolic versus associative learning. Cognitive Science 34(6). 958-95. DOI: 10.1111/j.1551-6709.2010.01096.x.
Hunter, Tim, and Dyer, Chris. 2013. Distributions on minimalist grammar derivations. Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), 1-11. Online: http://www.aclweb.org/anthology/W13-3001.
Jackendoff, Ray. 1975. Morphological and semantic regularities in the lexicon. Language 51(3). 639-71. DOI: 10.2307/412891.
Jäger, Gerhard. 2007. Maximum entropy models and stochastic optimality theory. Architectures, rules, and preferences: A festschrift for Joan Bresnan, ed. by Grimshaw, Jane, Maling, Joan, Manning, Chris, Simpson, Jane, and Zaenen, Annie, 467-79. Stanford, CA: CSLI Publications.
Jäger, Gerhard, and Rogers, James. 2012. Formal language theory: Refining the Chomsky hierarchy. Philosophical Transactions of the Royal Society B: Biological Sciences 367(1598). 1956-70. DOI: 10.1098/rstb.2012.0077.
Jarosz, Gaja. 2010. Implicational markedness and frequency in constraint-based computational models of phonological learning. Journal of Child Language 37(3). 565-606. DOI: 10.1017/S0305000910000103.
Jarosz, Gaja. 2013. Learning with hidden structure in optimality theory and harmonic grammar: Beyond robust interpretive parsing. Phonology 30(1). 27-71. DOI: 10.1017/S0952675713000031.
Jarosz, Gaja. 2015. Expectation driven learning of phonology. Amherst: University of Massachusetts Amherst, ms. Online: http://blogs.umass.edu/jarosz/2015/08/24/expectation-driven-learning-of-phonology/.
Jia, Robin, and Liang, Percy. 2017. Adversarial examples for evaluating reading comprehension systems. arXiv:1707.07328 [cs.CL]. Online: http://arxiv.org/abs/1707.07328.
Johnson, Mark. 2013a. A gentle introduction to maximum entropy, log-linear, exponential, logistic, harmonic, Boltzmann, Markov random fields, conditional random fields, etc., models. Sydney: Macquarie University, ms. Online: http://web.science.mq.edu.au/~mjohnson/papers/Johnson12IntroMaxEnt.pdf.
Johnson, Mark. 2013b. Language acquisition as statistical inference. Sydney: Macquarie University, ms. Online: http://web.science.mq.edu.au/~mjohnson/papers/Johnson12ICLtalk.pdf.
Johnson, Mark, Pater, Joe, Staubs, Robert; and Dupoux, Emmanuel. 2015. Sign constraints on feature weights improve a joint model of word segmentation and phonology. Human Language Technologies: The 2015 annual conference of the North American Chapter of the ACL, 303-13. Online: http://aclweb.org/anthology/N/N15/N15-1034.pdf.
Jordan, Michael. 2018. Artificial intelligence—The revolution hasn't happened yet. Medium, April 19, 2018. Online: https://link.medium.com/iO730w06CS.
Kager, René. 2005. Rhythmic licensing theory: An extended typology. Proceedings of the 3rd International Conference on Phonology, Seoul National University, 5-31.
Kaisse, Ellen M., and Shaw, Patricia A. 1985. On the theory of lexical phonology. Phonology Yearbook 2. 1-30. DOI: 10.1017/S0952675700000361.
Katz, Yarden. 2012. Noam Chomsky on where artificial intelligence went wrong: An extended conversation with the legendary linguist. The Atlantic, November 1, 2012. Online: https://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificial-intelligence-went-wrong/261637/.
Kenstowicz, Michael. 1994. Phonology in generative grammar. Malden, MA: Blackwell.
Keyser, Samuel J., Miller, George A.; and Walker, Edward. 1978. Cognitive Science, 1978: Report of the State of the Art Committee to the advisors of the Alfred P. Sloan Foundation. Online: http://www.cbi.umn.edu/hostedpublications/pdf/CognitiveScience1978_OCR.pdf.
Kiparsky, Paul. 1982. Lexical phonology and morphology. Linguistics in the morning calm, ed. by Yang, Seok, 3-91. Seoul: Hanshin.
Kirov, Christo. 2017. Recurrent neural networks as a strong domain-general baseline for morpho-phonological learning. Poster presented at the annual meeting of the Linguistic Society of America, Austin. Online: https://ckirov.github.io/papers/lsa2017.pdf.
Kirov, Christo, and Cotterell, Ryan. 2019. Recurrent neural networks in linguistic theory: Revisiting Pinker and Prince (1988) and the past tense debate. Transactions of the Association for Computational Linguistics, to appear.
Kleene, Stephen C. 1956. Representation of events in nerve nets and finite automata. Automata studies (Annals of mathematics studies 34), ed. by Shannon, Claude E. and McCarthy, John, 3-41. Princeton, NJ: Princeton University Press.
Lachter, Joel, and Bever, Thomas G. 1988. The relation between linguistic structure and associative theories of language learning—A constructive critique of some connectionist learning models. Cognition 28. 195-247. DOI: 10.1016/0010-0277(88)90033-9.
Lakoff, George. 1988. A suggestion for a linguistics with connectionist foundations. Proceedings of the 1988 Connectionist Models Summer School, ed. by Touretzky, David, Hinton, Geoffrey, and Sejnowski, Terrence, 301-14. Online: http://www.escholarship.org/uc/item/5df11196.
Lakoff, George. 1993. Cognitive phonology. The last phonological rule: Reflections on constraints and derivations, ed. by Goldsmith, John, 117-45. Chicago: University of Chicago Press.
Lau, Jey Han, Clark, Alexander; and Lappin, Shalom. 2017. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cognitive Science 41(5). 1202-41. DOI: 10.1111/cogs.12414.
LeCun, Yann. 1988. A theoretical framework for back-propagation. Proceedings of the 1988 Connectionist Models Summer School, ed. by Touretzky, David, Hinton, Geoffrey, and Sejnowski, Terrence, 21-28.
Legendre, Géraldine, Miyata, Yoshiro; and Smolensky, Paul. 1990. Can connectionism contribute to syntax? Harmonic grammar, with an application. Chicago Linguistic Society 26. 237-52.
Levy, Omer, Lee, Kenton, FitzGerald, Nicholas; and Zettlemoyer, Luke. 2018. Long short-term memory as a dynamically computed element-wise weighted sum. arXiv:1805.03716 [cs.CL]. Online: https://arxiv.org/abs/1805.03716v1.
Lewis, John D., and Elman, Jeffrey L. 2001. Learnability and the statistical structure of language: Poverty of stimulus arguments revisited. Proceedings of the Boston University Conference on Language Development (BUCLD) 26. 359-70.
Liberman, Mark. 2004. The curious case of quasiregularity. Language Log, January 15, 2004. Online: http://itre.cis.upenn.edu/~myl/languagelog/archives/000344.html.
Lidz, Jeffrey, Snyder, William; and Pater, Joe (eds.) 2016. The Oxford handbook of developmental linguistics. Oxford: Oxford University Press.
Linzen, Tal, Dupoux, Emmanuel; and Goldberg, Yoav. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics 4. 521-35. Online: http://aclweb.org/anthology/Q/Q16/Q16-1037.pdf.
Malouf, Robert. 2017. Abstractive morphological learning with a recurrent neural network. Morphology 27(4). 431-58. DOI: 10.1007/s11525-017-9307-x.
Marcus, Gary F. 2001. The algebraic mind: Integrating connectionism and cognitive science. Cambridge, MA: MIT Press.CrossRefGoogle Scholar
McCarthy, John J. 2002. A thematic guide to optimality theory. (Research surveys in linguistics.) Cambridge: Cambridge University Press. DOI: 10.1017/CBO9780511613333.Google Scholar
McCarthy, John J., and Pater, Joe (eds.) 2016. Harmonic grammar and harmonic serialism. Bristol, CT: Equinox.Google Scholar
McCarthy, John J., Pater, Joe; and Pruitt, Kathryn. 2016. Cross-level interactions in harmonic serialism. In McCarthy & Pater, 87138.Google Scholar
McClelland, James L., and Patterson, Karalyn. 2002. Rules or connections in past-tense inflections: What does the evidence rule out? Trends in Cognitive Sciences 6(11). 465-72. DOI: 10.1016/S1364-6613(02)01993-9.CrossRefGoogle ScholarPubMed
McCoy, R. Thomas, Frank, Robert; and Linzen, Tal. 2018. Revisiting the poverty of the stimulus: Hierarchical generalization without a hierarchical bias in recurrent neural networks. Proceedings of the 40th annual meeting of the Cognitive Science Society (CogSci 2018), 20962101. Online: http://mindmodeling.org/cogsci2018/papers/0399/index.html.Google Scholar
McCulloch, Warren S., and Pitts, Walter. 1943. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics 5(4). 115-33. DOI: 10.1007/BF02478259.CrossRefGoogle Scholar
Miller, George A. 2003. The cognitive revolution: A historical perspective. Trends in Cognitive Sciences 7(3). 141-44. DOI: 10.1016/S1364-6613(03)00029-9.CrossRefGoogle ScholarPubMed
Minsky, Marvin, and Papert, Seymour. 1988 [1969]. Perceptrons: An introduction to computational geometry. Expanded edn. Cambridge, MA: MIT Press.Google Scholar
Moreton, Elliott, Pater, Joe; and Pertsova, Katya. 2015. Phonological concept learning. Cognitive Science 41(1). 469. DOI: 10.1111/cogs.12319.CrossRefGoogle ScholarPubMed
Nagy, George. 1991. Neural networks—then and now. IEEE Transactions on Neural Networks 2(2). 316-18. Online: https://ieeexplore.ieee.org/iel4/72/2637/00080343.pdf.CrossRefGoogle ScholarPubMed
Nazarov, Aleksei, and Jarosz, Gaja. 2017. Learning parametric stress without domain-specific mechanisms. Proceedings of the 2016 Annual Meeting on Phonology. DOI: 10.3765/amp.v4i0.4010.CrossRefGoogle Scholar
Nazarov, Aleksei, and Pater, Joe. 2017. Learning opacity in stratal maximum entropy grammar. Phonology 34(2). 299324. DOI: 10.1017/S095267571700015X.CrossRefGoogle Scholar
Neelakantan, Arvind, Vilnis, Luke, Le, Quoc V., Sutskever, Ilya, Kaiser, Lukasz, Kurach, Karol; and Martens, James. 2015. Adding gradient noise improves learning for very deep networks. arXiv:1511.06807 [stat.ML]. Online: https://arxiv.org/abs/1511.06807v1.Google Scholar
Newell, Ben R., Dunn, John C.; and Kalish, Michael. 2011. Systems of category learning: Fact or fantasy? Psychology of learning and motivation, vol. 54: Advances in research and theory, ed. by Ross, Brian H., 167215. San Diego: Academic Press. DOI: 10.1016/B978-0-12-385527-5.00006-1.Google Scholar
Nilsson, Nils J. 2010. The quest for artificial intelligence: A history of ideas and achievements. Cambridge: Cambridge University Press.Google Scholar
Norvig, Peter. n.d. On Chomsky and the two cultures of statistical learning. Online: http://norvig.com/chomsky.html, accessed September 14, 2017.CrossRefGoogle Scholar
Novikoff, Albert B. J. 1962. On convergence proofs for perceptrons. Proceedings of the Symposium on the Mathematical Theory of Automata, vol. 12, 615-22. Brooklyn: Polytechnic Institute of Brooklyn.Google Scholar
Olazaran, Mikel. 1993. A sociological history of the neural network controversy. Advances in Computers 37. 335425. DOI: 10.1016/S0065-2458(08)60408-8.CrossRefGoogle Scholar
Olazaran, Mikel. 1996. A sociological study of the official history of the perceptrons controversy. Social Studies of Science 26(3). 611-59. DOI: 10.1177/030631296026003005.CrossRefGoogle Scholar
Palangi, Hamid, Smolensky, Paul, He, Xiaodong; and Deng, Li. 2017. Question-answering with grammatically-interpretable representations. arXiv:1705.08432 [cs.CL]. Online: https://arxiv.org/abs/1705.08432.CrossRefGoogle Scholar
Pater, Joe. 2008. Gradual learning and convergence. Linguistic Inquiry 39(2). 334-45. DOI: 10.1162/ling.2008.39.2.334.CrossRefGoogle Scholar
Pater, Joe. 2016a. Universal grammar with weighted constraints. In McCarthy & Pater, 146.Google Scholar
Pater, Joe. 2016b. Prince vs. Smolensky. Brain Wars, August 24, 2017. Online: http://blogs.umass.edu/brain-wars/the-debates/prince-vs-smolensky/.Google Scholar
Pater, Joe. 2017. Did Frank Rosenblatt invent deep learning in 1962? UMass Amherst Computational Phonology Lab Blog, June 15, 2017. Online: http://blogs.umass.edu/comphon/2017/06/15/did-frank-rosenblatt-invent-deep-learning-in-1962.Google Scholar
Pater, Joe (ed.) 2018. Perceptrons and Syntactic structures at 60: Collected presentations from the 2018 workshop. YouTube video playlist. Online: https://www.youtube.com/playlist?list=PL9UURLQttNX2Lfs0EoOlIa4ns0bhra8_Y.Google Scholar
Pater, Joe, and Moreton, Elliott. 2012. Structurally biased phonology: Complexity in learning and typology. The EFL Journal (The Journal of the English and Foreign Languages University, Hyderabad) 3(2). 144.Google Scholar
Pater, Joe, and Staubs, Robert. 2013. Modeling learning trajectories with batch gradient descent. Cambridge, MA: MIT, ms. Online: http://people.umass.edu/pater/pater-staubs-gradient-descent-2013.pdf.Google Scholar
Pater, Joe, Staubs, Robert, Jesney, Karen; and Smith, Brian. 2012. Learning probabilities over underlying representations. Proceedings of the twelfth meeting of the Special Interest Group on Computational Morphology and Phonology (SIGMORPHON2012), 6271. Online: http://aclweb.org/anthology/W/W12/W12-2308.pdf.Google Scholar
Pearl, Lisa, and Goldwater, Sharon. 2016. Statistical learning, inductive bias, and Bayesian inference in language acquisition. In Lidz et al., 664-95. DOI: 10.1093/oxfordhb/9780199601264.013.28.
Pereira, Fernando. 2000. Formal grammar and information theory: Together again? Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 358. 1239-53. DOI: 10.1098/rsta.2000.0583.
Pinker, Steven. 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, Steven. 1999. Words and rules: The ingredients of language. New York: William Morrow.
Pinker, Steven, and Prince, Alan. 1988. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition 28(1–2). 73-193. DOI: 10.1016/0010-0277(88)90032-7.
Pinker, Steven, and Ullman, Michael T. 2002. The past and future of the past tense. Trends in Cognitive Sciences 6(11). 456-63. DOI: 10.1016/S1364-6613(02)01990-3.
Plunkett, Kim, and Marchman, Virginia. 1993. From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition 48(1). 21-69. DOI: 10.1016/0010-0277(93)90057-3.
Pollard, Carl, and Sag, Ivan A. 1994. Head-driven phrase structure grammar. Chicago: University of Chicago Press.
Potts, Christopher, Pater, Joe, Jesney, Karen, Bhatt, Rajesh; and Becker, Michael. 2010. Harmonic grammar with linear programming: From linear systems to linguistic typology. Phonology 27(1). 77-117. DOI: 10.1017/S0952675710000047.
Prickett, Brandon. 2017. Vanilla sequence-to-sequence neural nets cannot model reduplication. University of Massachusetts Open Working Papers in Linguistics. DOI: 10.7275/R5N877Z9.
Prickett, Brandon, Traylor, Aaron; and Pater, Joe. 2018. Seq2Seq models with dropout can learn generalizable reduplication. Proceedings of the 15th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 93-100. Online: http://aclweb.org/anthology/W18-5810.
Prince, Alan, and Smolensky, Paul. 2004. Optimality theory: Constraint interaction in generative grammar. Oxford: Blackwell. [Originally circulated in 1993.]
Prince, Alan, Tesar, Bruce; and Merchant, Nazarré. 2015. OTWorkplace. New Brunswick, NJ: Rutgers University. Online: https://sites.google.com/site/otworkplace/.
Pullum, Geoffrey K. 2011. On the mathematical foundations of Syntactic structures. Journal of Logic, Language and Information 20(3). 277-96. DOI: 10.1007/s10849-011-9139-8.
Pullum, Geoffrey K., and Scholz, Barbara C. 2002. Empirical assessment of stimulus poverty arguments. The Linguistics Review 19. 9-50. DOI: 10.1515/tlir.19.1-2.9.
Roeper, Thomas, and Williams, Edwin (eds.) 1987. Parameter setting. (Studies in theoretical psycholinguistics 4.) Dordrecht: Springer.
Rosenblatt, Frank. 1957. The perceptron: A perceiving and recognizing automaton (Project PARA). Report 85-460-1. Ithaca, NY: Cornell Aeronautical Laboratory.
Rosenblatt, Frank. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65(6). 386-408. DOI: 10.1037/h0042519.
Rosenblatt, Frank. 1962. Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Washington, DC: Spartan Books.
Rosenblatt, Frank. 1964. Analytic techniques for the study of neural nets. IEEE Transactions on Applications and Industry 83(74). 285-92. DOI: 10.1109/TAI.1964.5407758.
Rosenblatt, Frank. 1967. Recent work on theoretical models of biological memory. Computer and information sciences, vol. 2, ed. by J. T. Tou, 33-56. New York: Academic Press.
Rumelhart, David E., Hinton, Geoffrey E.; and Williams, Ronald J. 1986. Learning internal representations by error propagation. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1, ed. by David E. Rumelhart, James L. McClelland, and the PDP Research Group, 318-62. Cambridge, MA: MIT Press.
Rumelhart, David E., and McClelland, James L. 1986. On learning the past tenses of English verbs. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 2, ed. by David E. Rumelhart, James L. McClelland, and the PDP Research Group, 216-71. Cambridge, MA: MIT Press.
Sanz, Ricardo. 2008. Top 100 most influential works in cognitive science. UPM Autonomous Systems Laboratory. Online: http://tierra.aslab.upm.es/public/index.php?option=com_content&task=view&id=141.
Schmidhuber, Jürgen. 2015. Deep learning in neural networks: An overview. Neural Networks 61. 85-117. DOI: 10.1016/j.neunet.2014.09.003.
See, Abigail. 2017. Four deep learning trends from ACL 2017. Online: http://www.abigailsee.com/2017/08/30/four-deep-learning-trends-from-acl-2017-part-1.html.
Seidenberg, Mark S., and Plaut, David C. 2014. Quasiregularity and its discontents: The legacy of the past tense debate. Cognitive Science 38(6). 1190-1228. DOI: 10.1111/cogs.12147.
Selkirk, Elisabeth O. 1981. On the nature of phonological representation. Advances in psychology, vol. 7: The cognitive representation of speech, ed. by Terry Myers, John Laver, and John Anderson, 379-88. Amsterdam: North-Holland. DOI: 10.1016/S0166-4115(08)60213-7.
Shannon, Claude E., and Weaver, Warren. 1949. The mathematical theory of communication. Champaign-Urbana: University of Illinois Press.
Sharkey, Noel (ed.) 1992. Connectionist natural language processing: Readings in connection science. Dordrecht: Springer. DOI: 10.1007/978-94-011-2624-3.
Skinner, B. F. 1957. Verbal behavior. New York: Appleton-Century-Crofts.
Smolensky, Paul. 1986. Information processing in dynamical systems: Foundations of harmony theory. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1, ed. by David E. Rumelhart, James L. McClelland, and the PDP Research Group, 194-281. Cambridge, MA: MIT Press.
Smolensky, Paul. 1988. On the proper treatment of connectionism. Behavioral and Brain Sciences 11(1). 1-23. DOI: 10.1017/S0140525X00052432.
Smolensky, Paul, Goldrick, Matthew; and Mathis, Donald. 2014. Optimization and quantization in gradient symbol systems: A framework for integrating the continuous and the discrete in cognition. Cognitive Science 38(6). 1102-38. DOI: 10.1111/cogs.12047.
Smolensky, Paul, and Legendre, Géraldine. 2006. The harmonic mind: From neural computation to optimality-theoretic grammar. Cambridge, MA: MIT Press.
Socher, Richard, Perelygin, Alex, Wu, Jean Y., Chuang, Jason, Manning, Christopher D., Ng, Andrew Y.; and Potts, Christopher. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. Proceedings of the 2013 conference on Empirical Methods in Natural Language Processing, 1631-42. Online: http://aclweb.org/anthology/D/D13/D13-1170.pdf.
Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya; and Salakhutdinov, Ruslan. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1). 1929-58. Online: http://jmlr.org/papers/v15/srivastava14a.html.
Stabler, Edward P. 2013. Two models of minimalist, incremental syntactic analysis. Topics in Cognitive Science 5(3). 611-33. DOI: 10.1111/tops.12031.
Staubs, Robert, Becker, Michael, Potts, Christopher, Pratt, Patrick, McCarthy, John J.; and Pater, Joe. 2010. OT-Help. Amherst: University of Massachusetts. Online: http://people.umass.edu/othelp/.
Staubs, Robert, and Pater, Joe. 2016. Learning serial constraint-based grammars. In McCarthy & Pater, 369-88.
Sutskever, Ilya, Vinyals, Oriol; and Le, Quoc V. 2014. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems (NIPS) 27. 3104-12. Online: https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.
Tesar, Bruce. 2004. Using inconsistency detection to overcome structural ambiguity. Linguistic Inquiry 35(2). 219-53. DOI: 10.1162/002438904323019057.
Tesar, Bruce, and Smolensky, Paul. 2000. Learnability in optimality theory. Cambridge, MA: MIT Press.
Touretzky, David, and Wheeler, Deirdre. 1991. Sequence manipulation using parallel mapping networks. Neural Computation 3(1). 98-109.
Tupper, Paul F., and Shahriari, Bobak. 2016. Which learning algorithms can generalize identity-based rules to novel inputs? arXiv:1605.04002 [cs.CL]. Online: http://arxiv.org/abs/1605.04002.
Ullman, Michael T., and Walenski, Matthew. 2005. Moving past the past tense. Brain and Language 93(2). 248-52. DOI: 10.1016/j.bandl.2004.10.004.
Vaux, Bert, and Nevins, Andrew (eds.) 2008. Rules, constraints, and phonological phenomena. Oxford: Oxford University Press.
Werbos, Paul J. 1982. Applications of advances in nonlinear sensitivity analysis. System modeling and optimization: Proceedings of the 10th IFIP Conference New York City, USA, August 31 – September 4, 1981 (Lecture notes in control and information sciences 38), ed. by R. F. Drenick and F. Kozin, 762-70. Berlin: Springer. DOI: 10.1007/BFb0006203.
Wheeler, Deirdre, and Touretzky, David. 1993. A connectionist implementation of cognitive phonology. The last phonological rule: Reflections on constraints and derivations, ed. by John Goldsmith, 146-72. Chicago: University of Chicago Press.
Wickelgren, Wayne A. 1969. Context-sensitive coding, associative memory, and serial order in (speech) behavior. Psychological Review 76(1). 1-15. DOI: 10.1037/h0026823.
Willer Gold, Jana, Arsenijević, Boban, Batinić, Mia, Becker, Michael, Čordalija, Nermina, Kresić, Marijana, Leko, Nedžad; et al. 2017. When linearity prevails over hierarchy in syntax. Proceedings of the National Academy of Sciences of the United States of America 115(3). 495-500. DOI: 10.1073/pnas.1712729115.
Wilson, Colin, and Gallagher, Gillian. 2016. Beyond bigrams for surface-based phonotactic models: A case study of South Bolivian Quechua. Paper presented at SIGMORPHON, Berlin, August 11, 2016. Online: https://colincwilson.github.io/papers/WilsonGallagher_sigmorphon2016.pdf.
Wu, Yonghui, Schuster, Mike, Chen, Zhifeng, Le, Quoc V., Norouzi, Mohammad, Macherey, Wolfgang, Krikun, Maxim; et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144 [cs.CL]. Online: http://arxiv.org/abs/1609.08144.
Xiang, Ming, Dillon, Brian; and Phillips, Colin. 2009. Illusory licensing effects across dependency types: ERP evidence. Brain and Language 108(1). 40-55. DOI: 10.1016/j.bandl.2008.10.002.
Yang, Charles. 2002. Knowledge and learning in natural language. Oxford: Oxford University Press.
Yogatama, Dani, Blunsom, Phil, Dyer, Chris, Grefenstette, Edward; and Ling, Wang. 2016. Learning to compose words into sentences with reinforcement learning. arXiv:1611.09100 [cs.CL]. Online: http://arxiv.org/abs/1611.09100.
Yu, Kristine H. 2017. Advantages of constituency: Computational perspectives on Samoan word prosody. Formal Grammar (FG 2017). DOI: 10.1007/978-3-662-56343-4_7.
Zuraw, Kie. 2010. A model of lexical variation and the grammar with application to Tagalog nasal substitution. Natural Language and Linguistic Theory 28(2). 417-72. DOI: 10.1007/s11049-010-9095-z.