
Strengths and weaknesses of finite-state technology: a case study in morphological grammar development

  • SHULY WINTNER (a1)
Abstract

Finite-state technology is considered the preferred model for representing the phonology and morphology of natural languages. The attractiveness of this technology for natural language processing stems from four sources: modularity of the design, due to the closure properties of regular languages and relations; the compact representation that is achieved through minimization; efficiency, which is a result of linear recognition time with finite-state devices; and reversibility, resulting from the declarative nature of such devices. However, when wide-coverage morphological grammars are considered, finite-state technology does not scale up well, and the benefits of this technology can be overshadowed by the limitations it imposes as a programming environment for language processing. This paper investigates the strengths and weaknesses of existing technology, focusing on various aspects of large-scale grammar development. Using a real-world case study, we compare a finite-state implementation with an equivalent Java program with respect to ease of development, modularity, maintainability of the code, and space and time efficiency. We identify two main problems, abstraction and incremental development, which are currently not addressed sufficiently well by finite-state technology, and which we believe should be the focus of future research and development.
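
The reversibility property mentioned above can be illustrated with a minimal sketch. The toy relation, the tag names (`+N+Sg`, `+N+Pl`), and the helper functions `invert` and `transduce` are assumptions made for this illustration; they are not taken from the paper or from any finite-state toolkit. The same arc list is used for analysis and, with input and output labels swapped, for generation.

```python
from collections import namedtuple

# An arc reads `inp` and writes `out`; "" stands for the empty string (epsilon).
Arc = namedtuple("Arc", "src inp out dst")

# Hypothetical toy relation: surface "cat"/"cats" <-> lexical "cat+N+Sg"/"cat+N+Pl".
ARCS = [
    Arc(0, "c", "c", 1),
    Arc(1, "a", "a", 2),
    Arc(2, "t", "t", 3),
    Arc(3, "", "+N+Sg", 4),   # no suffix: singular
    Arc(3, "s", "+N+Pl", 4),  # suffix "s": plural
]
START, FINALS = 0, {4}


def invert(arcs):
    """Swap input and output labels, turning an analyzer into a generator."""
    return [Arc(a.src, a.out, a.inp, a.dst) for a in arcs]


def transduce(arcs, word):
    """Return the set of all outputs the transducer relates to `word`.

    A plain depth-first search; it assumes the machine has no epsilon cycles.
    """
    results = set()
    stack = [(START, 0, "")]  # (state, position in word, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word) and state in FINALS:
            results.add(out)
        for a in arcs:
            if a.src == state and word.startswith(a.inp, pos):
                stack.append((a.dst, pos + len(a.inp), out + a.out))
    return results


analysis = transduce(ARCS, "cats")                # surface -> lexical
generation = transduce(invert(ARCS), "cat+N+Pl")  # lexical -> surface
```

This sketch demonstrates only the declarative, bidirectional use of one description. It uses a simple backtracking search and makes no attempt at determinization or minimization, so it does not reflect the compactness or linear-time recognition that compiled finite-state devices provide.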

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering