Error-driven learning in Optimality Theory and Harmonic Grammar: a comparison*

  • Giorgio Magri

OT error-driven learning admits guarantees of efficiency, stochastic tolerance and noise robustness which hold independently of any substantive assumptions on the constraints. This paper shows that the HG learner used in the current literature does not admit such constraint-independent guarantees. The HG theory of error-driven learning thus needs to be substantially restricted to specific constraint sets.

Parts of this paper were presented at the 21st Manchester Phonology Meeting in 2013 and at the 11th Old World Conference in Phonology in 2014. I wish to thank Paul Boersma and Joe Pater for useful discussion. Three anonymous reviewers and the associate editor of the journal also provided me with detailed and valuable suggestions. The research reported in this paper was supported by a grant from the Fyssen Research Foundation, as well as by a Marie Curie Intra European Fellowship within the 7th European Community Framework Programme.

Appendices providing more technical details and simulation results can be found in supplementary online materials at

  • ISSN: 0952-6757
  • EISSN: 1469-8188
  • URL: /core/journals/phonology