
Learning with hidden structure in Optimality Theory and Harmonic Grammar: beyond Robust Interpretive Parsing*

Published online by Cambridge University Press: 01 May 2013

Gaja Jarosz*
Yale University


This paper explores the relative merits of constraint ranking vs. weighting in the context of a major outstanding learnability problem in phonology: learning in the face of hidden structure. Specifically, the paper examines a well-known approach to the structural ambiguity problem, Robust Interpretive Parsing (RIP; Tesar & Smolensky 1998), focusing on its stochastic extension first described by Boersma (2003). Two related problems with the stochastic formulation of RIP are revealed, both rooted in a failure to take full advantage of the probabilistic information available in the learner's grammar. To address these problems, two novel parsing strategies are introduced and applied to learning algorithms for both probabilistic ranking and weighting. The novel parsing strategies yield significant improvements in performance, with a disproportionately large improvement for OT learners. Once RIP is replaced with the proposed modifications, the apparent advantage of HG over OT learners reported in previous work (Boersma & Pater 2008) disappears.
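The basic learning loop the abstract refers to (sample a grammar, use RIP to assign hidden structure to the observed overt form, compare against the learner's own output, update on error) can be sketched as below. This is a minimal toy under stated assumptions, not the paper's implementation, and it does not implement the paper's two new parsing strategies. The four constraints, the candidate set, the learning rate, the noise level, and the choice to reuse a single noise sample for both the interpretive parse and the learner's own output are all hypothetical. The OT branch uses a symmetric Gradual Learning Algorithm update on ranking values; the HG branch uses a perceptron-style update on weights, clipped at zero (also an assumption).

```python
import random

# Hypothetical toy: trisyllabic words with exactly one binary foot.
# Constraint names and violation counts are illustrative assumptions.
CONSTRAINTS = ["Trochee", "Iamb", "AlignL", "AlignR"]

# (full structure, overt stressed syllable 1-3, violations per constraint)
CANDIDATES = [
    ("(s's)s", 1, (0, 1, 0, 1)),
    ("(ss')s", 2, (1, 0, 0, 1)),
    ("s(s's)", 2, (0, 1, 1, 0)),
    ("s(ss')", 3, (1, 0, 1, 0)),
]


def ot_optimal(cands, points):
    """OT evaluation: strict domination in order of descending ranking points."""
    order = sorted(range(len(points)), key=lambda i: -points[i])
    return min(cands, key=lambda c: tuple(c[2][i] for i in order))


def hg_optimal(cands, weights):
    """HG evaluation: lowest weighted violation sum, i.e. highest harmony."""
    return min(cands, key=lambda c: sum(w * v for w, v in zip(weights, c[2])))


def rip_parse(cands, overt, grammar, optimal):
    """Robust Interpretive Parsing: the optimal full structure among
    those whose overt form matches the observed datum."""
    consistent = [c for c in cands if c[1] == overt]
    return optimal(consistent, grammar)


def learn(data, mode="ot", iterations=5000, rate=1.0, noise=2.0, seed=1):
    """Error-driven learning from overt forms only (hypothetical settings)."""
    rng = random.Random(seed)
    optimal = ot_optimal if mode == "ot" else hg_optimal
    values = [100.0 if mode == "ot" else 10.0] * len(CONSTRAINTS)
    for _ in range(iterations):
        overt = rng.choice(data)
        # Stochastic evaluation: Gaussian noise on every ranking value / weight.
        # Assumption: one sample serves both the parse and the learner's output.
        sampled = [v + rng.gauss(0.0, noise) for v in values]
        winner = rip_parse(CANDIDATES, overt, sampled, optimal)
        loser = optimal(CANDIDATES, sampled)
        if loser[1] == overt:
            continue  # overt forms match: no error, no update
        for i, (w, l) in enumerate(zip(winner[2], loser[2])):
            if mode == "ot":
                # GLA: promote constraints preferring the RIP parse (winner),
                # demote constraints preferring the learner's output (loser)
                if l > w:
                    values[i] += rate
                elif w > l:
                    values[i] -= rate
            else:
                # HG: perceptron-style weight change, weights kept non-negative
                values[i] = max(0.0, values[i] + rate * (l - w))
    return values


if __name__ == "__main__":
    grammar = learn([2])  # every datum has stress on the second syllable
    print({c: round(v, 1) for c, v in zip(CONSTRAINTS, grammar)})
    print("noise-free output:", ot_optimal(CANDIDATES, grammar)[0])
```

In this toy, second-syllable stress is structurally ambiguous between a left-aligned iamb `(ss')s` and a right-aligned trochee `s(s's)`, and RIP resolves the ambiguity using whichever structure the currently sampled grammar prefers, so which of the two analyses the learner settles on depends on the random seed.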

Research Article
Copyright © Cambridge University Press 2013




This work has benefited from discussion with a number of colleagues, including Joe Pater, Paul Boersma, Paul Smolensky, Colin Wilson, Jason Riggle, John McCarthy, Bob Frank and Jeff Heinz. I have also received valuable comments on portions of this work presented to audiences at NECPhon, University of Massachusetts Amherst, Mayfest, the University of Delaware Workshop on Stress and Accent, and the Yale Computational Linguistics research group (CLAY). Finally, I would also like to thank three anonymous reviewers and the associate editor for very thorough and thoughtful comments on an earlier version of this paper.



Akers, Crystal (2011). Commitment-based learning of hidden linguistic structures. PhD dissertation, Rutgers University.
Alderete, John, Brasoveanu, Adrian, Merchant, Nazarré, Prince, Alan & Tesar, Bruce (2005). Contrast analysis aids the learning of phonological underlying forms. WCCFL 24. 34–42.
Apoussidou, Diana (2006). On-line learning of underlying forms. Ms, University of Amsterdam. Available as ROA-835 from the Rutgers Optimality Archive.
Apoussidou, Diana (2007). The learnability of metrical phonology. PhD dissertation, University of Amsterdam.
Apoussidou, Diana & Boersma, Paul (2003). The learnability of Latin stress. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 25. 101–148.
Bane, Max & Riggle, Jason (2009). The typological consequences of weighted constraints. CLS 45:1. 13–27.
Bane, Max, Riggle, Jason & Sonderegger, Morgan (2010). The VC dimension of constraint-based grammars. Lingua 120. 1194–1208.
Biró, Tamás (to appear). Towards a Robuster Interpretive Parsing: learning from overt forms in Optimality Theory. Journal of Logic, Language and Information.
Boersma, Paul (1997). How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 21. 43–58.
Boersma, Paul (2003). Review of Tesar & Smolensky (2000). Phonology 20. 436–446.
Boersma, Paul (2009). Some correct error-driven versions of the Constraint Demotion Algorithm. LI 40. 667–686.
Boersma, Paul & Hayes, Bruce (2001). Empirical tests of the Gradual Learning Algorithm. LI 32. 45–86.
Boersma, Paul & Levelt, Clara C. (2000). Gradual constraint-ranking learning algorithm predicts acquisition order. In Clark, Eve V. (ed.) Proceedings of the 30th Child Language Research Forum. Stanford: CSLI. 229–237.
Boersma, Paul & Pater, Joe (2008). Convergence properties of a Gradual Learning Algorithm for Harmonic Grammar. Ms, University of Amsterdam & University of Massachusetts, Amherst. Available as ROA-970 from the Rutgers Optimality Archive. To appear in McCarthy, John J. (ed.) Harmonic grammar and harmonic serialism. London: Equinox.
Chomsky, Noam (1981). Lectures on government and binding. Dordrecht: Foris.
Coetzee, Andries W. & Pater, Joe (2008). Weighted constraints and gradient restrictions on place co-occurrence in Muna and Arabic. NLLT 26. 289–337.
Coetzee, Andries W. & Pater, Joe (2011). The place of variation in phonological theory. In Goldsmith, John, Riggle, Jason & Yu, Alan (eds.) The handbook of phonological theory. 2nd edn. Malden, Mass. & Oxford: Wiley-Blackwell. 401–431.
Daelemans, Walter, Gillis, Steven & Durieux, Gert (1994). The acquisition of stress: a data-oriented approach. Computational Linguistics 20. 421–451.
Daland, Robert, Hayes, Bruce, White, James, Garellek, Marc, Davis, Andrea & Norrmann, Ingrid (2011). Explaining sonority projection effects. Phonology 28. 197–234.
Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39. 1–38.
Dresher, B. Elan (1999). Charting the learning path: cues to parameter setting. LI 30. 27–67.
Dresher, B. Elan & Kaye, Jonathan D. (1990). A computational learning model for metrical phonology. Cognition 34. 137–195.
Fischer, Marcus (2005). A Robbins-Monro type learning algorithm for an entropy maximizing version of Stochastic Optimality Theory. MA thesis, Humboldt University, Berlin.
Goldrick, Matthew (2011). Linking speech errors and generative phonological theory. Language and Linguistics Compass 5. 397–412.
Goldsmith, John A. (1994). A dynamic computational theory of accent systems. In Cole, Jennifer & Kisseberth, Charles (eds.) Perspectives in phonology. Stanford: CSLI. 1–28.
Goldwater, Sharon & Johnson, Mark (2003). Learning OT constraint rankings using a Maximum Entropy model. In Spenader, Jennifer, Eriksson, Anders & Dahl, Östen (eds.) Proceedings of the Stockholm Workshop on Variation within Optimality Theory. Stockholm: Stockholm University. 111–120.
Gordon, Matthew (2002). A factorial typology of quantity-insensitive stress. NLLT 20. 491–552.
Gupta, Prahlad & Touretzky, David S. (1994). Connectionist models and linguistic theory: investigations of stress systems in language. Cognitive Science 18. 1–50.
Hammond, Michael (2004). Gradience, phonotactics, and the lexicon in English phonology. International Journal of English Studies 4. 1–24.
Hayes, Bruce (1995). Metrical stress theory: principles and case studies. Chicago: University of Chicago Press.
Hayes, Bruce (2004). Phonological acquisition in Optimality Theory: the early stages. In Kager, René, Pater, Joe & Zonneveld, Wim (eds.) Constraints in phonological acquisition. Cambridge: Cambridge University Press. 158–203.
Hayes, Bruce & Londe, Zsuzsa Cziráky (2006). Stochastic phonological knowledge: the case of Hungarian vowel harmony. Phonology 23. 59–104.
Hayes, Bruce & Wilson, Colin (2008). A maximum entropy model of phonotactics and phonotactic learning. LI 39. 379–440.
Hayes, Bruce, Zuraw, Kie, Siptár, Péter & Londe, Zsuzsa (2009). Natural and unnatural constraints in Hungarian vowel harmony. Lg 85. 822–863.
Heinz, Jeffrey (2009). On the role of locality in learning stress patterns. Phonology 26. 303–351.
Hulst, Harry van der, Goedemans, Rob & van Zanten, Ellen (eds.) (2010). A survey of word accentual patterns in the languages of the world. Berlin & New York: De Gruyter Mouton.
Hyde, Brett (2007). Non-finality and weight-sensitivity. Phonology 24. 287–334.
Jäger, Gerhard (2007). Maximum entropy models and Stochastic Optimality Theory. In Zaenen, Annie, Simpson, Jane, King, Tracy Holloway, Grimshaw, Jane, Maling, Joan & Manning, Chris (eds.) Architectures, rules, and preferences: variations on themes by Joan W. Bresnan. Stanford: CSLI. 467–479.
Jäger, Gerhard & Rosenbach, Anette (2006). The winner takes it all – almost: cumulativity in grammatical variation. Linguistics 44. 937–971.
Jarosz, Gaja (2006a). Rich lexicons and restrictive grammars: maximum likelihood learning in Optimality Theory. PhD dissertation, Johns Hopkins University.
Jarosz, Gaja (2006b). Richness of the Base and probabilistic unsupervised learning in Optimality Theory. In Wicentowski, Richard & Kondrak, Grzegorz (eds.) Proceedings of the 8th Meeting of the ACL Special Interest Group in Computational Phonology. New York: Association for Computational Linguistics. 50–59.
Jarosz, Gaja (2010). Implicational markedness and frequency in constraint-based computational models of phonological learning. Journal of Child Language 37. 565–606.
Jarosz, Gaja (to appear). Naive parameter learning for Optimality Theory: the hidden structure problem. NELS 40.
Jesney, Karen & Tessier, Anne-Michelle (2011). Biases in Harmonic Grammar: the road to restrictive learning. NLLT 29. 251–290.
Johnson, Mark (2002). Optimality-theoretic Lexical Functional Grammar. In Merlo, Paola & Stevenson, Suzanne (eds.) The lexical basis of sentence processing: formal, computational and experimental issues. Amsterdam & Philadelphia: Benjamins. 59–73.
Keller, Frank (2000). Gradience in grammar: experimental and computational aspects of degrees of grammaticality. PhD dissertation, University of Edinburgh.
Keller, Frank & Asudeh, Ash (2002). Probabilistic learning algorithms and Optimality Theory. LI 33. 225–244.
Legendre, Géraldine, Miyata, Yoshiro & Smolensky, Paul (1990). Can connectionism contribute to syntax? Harmonic Grammar, with an application. CLS 26:1. 237–252.
Legendre, Géraldine, Sorace, Antonella & Smolensky, Paul (2006). The Optimality Theory–Harmonic Grammar connection. In Smolensky & Legendre (2006: vol. 2). 339–402.
Liberman, Mark & Prince, Alan (1977). On stress and linguistic rhythm. LI 8. 249–336.
McCarthy, John J. (2003). OT constraints are categorical. Phonology 20. 75–138.
McCarthy, John J. & Prince, Alan (1993). Generalized alignment. Yearbook of Morphology 1993. 79–153.
Magri, Giorgio (2012). Convergence of error-driven ranking algorithms. Phonology 29. 213–269.
Martin, Andrew (2011). Grammars leak: modeling how phonotactic generalizations interact within the grammar. Lg 87. 751–770.
Merchant, Nazarré (2008). Discovering underlying forms: contrast pairs and ranking. PhD dissertation, Rutgers University.
Merchant, Nazarré & Tesar, Bruce (2008). Learning underlying forms by searching restricted lexical subspaces. CLS 41:2. 33–47.
Pater, Joe (2008). Gradual learning and convergence. LI 39. 334–345.
Pater, Joe (2009a). Review of Smolensky & Legendre (2006). Phonology 26. 217–226.
Pater, Joe (2009b). Weighted constraints in generative linguistics. Cognitive Science 33. 999–1035.
Pater, Joe (to appear). Canadian raising with language-specific weighted constraints. Lg.
Pearl, Lisa S. (2011). When unbiased probabilistic learning is not enough: acquiring a parametric system of metrical phonology. Language Acquisition 18. 87–120.
Potts, Christopher, Pater, Joe, Jesney, Karen, Bhatt, Rajesh & Becker, Michael (2010). Harmonic Grammar with linear programming: from linear systems to linguistic typology. Phonology 27. 77–117.
Prince, Alan (1990). Quantitative consequences of rhythmic organization. CLS 26:2. 355–398.
Prince, Alan (2002). Entailed ranking arguments. Ms, Rutgers University. Available as ROA-500 from the Rutgers Optimality Archive.
Prince, Alan (2010). Counting parses. Ms, Rutgers University. Available as ROA-1097 from the Rutgers Optimality Archive.
Prince, Alan & Smolensky, Paul (2004). Optimality Theory: constraint interaction in generative grammar. Malden, Mass. & Oxford: Blackwell.
Pruitt, Kathryn (2010). Serialism and locality in constraint-based metrical parsing. Phonology 27. 481–526.
Riggle, Jason (2009). The complexity of ranking hypotheses in Optimality Theory. Computational Linguistics 35. 47–59.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65. 386–408.
Rubach, Jerzy & Booij, Geert E. (1985). A grid theory of stress in Polish. Lingua 66. 281–319.
Smolensky, Paul (1996). The initial state and ‘Richness of the Base’ in Optimality Theory. Ms, Johns Hopkins University. Available as ROA-154 from the Rutgers Optimality Archive.
Smolensky, Paul & Legendre, Géraldine (eds.) (2006). The harmonic mind: from neural computation to optimality-theoretic grammar. 2 vols. Cambridge, Mass.: MIT Press.
Soderstrom, Melanie, Mathis, Don & Smolensky, Paul (2006). Abstract genomic encoding of Universal Grammar in Optimality Theory. In Smolensky & Legendre (2006: vol. 2). 403–471.
Tesar, Bruce (1995). Computational Optimality Theory. PhD dissertation, University of Colorado, Boulder.
Tesar, Bruce (1998). An iterative strategy for language learning. Lingua 104. 131–145.
Tesar, Bruce (2000). Using inconsistency detection to overcome structural ambiguity in language learning. Technical Report TR-58, Department of Computer Science, University of Colorado, Boulder. Available as ROA-426 from the Rutgers Optimality Archive.
Tesar, Bruce (2004). Using inconsistency detection to overcome structural ambiguity. LI 35. 219–253.
Tesar, Bruce (2006). Faithful contrastive features in learning. Cognitive Science 30. 863–903.
Tesar, Bruce (2008). Output-driven maps. Ms, Rutgers University. Available as ROA-956 from the Rutgers Optimality Archive.
Tesar, Bruce (2011). Learning phonological grammars for output-driven maps. NELS 39. 785–798.
Tesar, Bruce, Alderete, John, Horwood, Graham, Merchant, Nazarré, Nishitani, Koichi & Prince, Alan (2003). Surgery in language learning. WCCFL 22. 477–490.
Tesar, Bruce & Smolensky, Paul (1998). Learnability in Optimality Theory. LI 29. 229–268.
Tesar, Bruce & Smolensky, Paul (2000). Learnability in Optimality Theory. Cambridge, Mass.: MIT Press.
Tessier, Anne-Michelle (2009). Frequency of violation and constraint-based phonological learning. Lingua 119. 6–38.
Wexler, Kenneth & Culicover, Peter W. (1980). Formal principles of language acquisition. Cambridge, Mass.: MIT Press.
Wilson, Colin (2006). Learning phonology with substantive bias: an experimental and computational study of velar palatalization. Cognitive Science 30. 945–982.