
Strengths and weaknesses of finite-state technology: a case study in morphological grammar development

Published online by Cambridge University Press: 01 October 2008

SHULY WINTNER
Affiliation: Department of Computer Science, University of Haifa, 31905 Haifa, Israel; e-mail: shuly@cs.haifa.ac.il

Abstract

Finite-state technology is considered the preferred model for representing the phonology and morphology of natural languages. The attractiveness of this technology for natural language processing stems from four sources: modularity of the design, due to the closure properties of regular languages and relations; the compact representation that is achieved through minimization; efficiency, which is a result of linear recognition time with finite-state devices; and reversibility, resulting from the declarative nature of such devices. However, when wide-coverage morphological grammars are considered, finite-state technology does not scale up well, and the benefits of this technology can be overshadowed by the limitations it imposes as a programming environment for language processing. This paper investigates the strengths and weaknesses of existing technology, focusing on various aspects of large-scale grammar development. Using a real-world case study, we compare a finite-state implementation with an equivalent Java program with respect to ease of development, modularity, maintainability of the code, and space and time efficiency. We identify two main problems, abstraction and incremental development, which are currently not addressed sufficiently well by finite-state technology, and which we believe should be the focus of future research and development.
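To make three of the properties listed above concrete (modularity through closure under composition, reversibility, and linear-time recognition), here is a minimal sketch. It is not taken from the paper and uses no real finite-state toolkit. The `Transducer` class, the functions `nasal_rule` and `p_rule`, the toy alphabet, and both rules (a textbook-style assimilation of a lexical nasal `N`) are hypothetical illustrations. The machines are letter-to-letter, meaning they have no epsilon arcs, which keeps the operations short. Minimization is not shown.

```python
from collections.abc import Hashable
from typing import Dict, List, Set, Tuple

# An arc is (source, input_symbol, output_symbol, target). This toy
# machine is letter-to-letter (no epsilon arcs), which keeps inversion,
# composition and simulation short; real toolkits also allow epsilons.
Arc = Tuple[Hashable, str, str, Hashable]


class Transducer:
    def __init__(self, arcs: List[Arc], start: Hashable, finals: Set[Hashable]):
        self.arcs = list(arcs)
        self.start = start
        self.finals = frozenset(finals)
        # Index arcs by (state, input symbol) for constant-time lookup.
        self.index: Dict[Tuple[Hashable, str], List[Tuple[str, Hashable]]] = {}
        for src, i, o, dst in self.arcs:
            self.index.setdefault((src, i), []).append((o, dst))

    def invert(self) -> "Transducer":
        # Reversibility: swapping the labels turns a generator into an
        # analyzer without writing any new rules.
        return Transducer([(s, o, i, d) for s, i, o, d in self.arcs],
                          self.start, set(self.finals))

    def compose(self, other: "Transducer") -> "Transducer":
        # Modularity: feed this machine's output into `other`. The result
        # is again a transducer, whose states are pairs of states.
        arcs = [((s1, s2), i, o2, (d1, d2))
                for s1, i, mid, d1 in self.arcs
                for s2, mid2, o2, d2 in other.arcs if mid == mid2]
        finals = {(f1, f2) for f1 in self.finals for f2 in other.finals}
        return Transducer(arcs, (self.start, other.start), finals)

    def accepts(self, upper: str, lower: str) -> bool:
        # Recognition of a string pair by tracking the set of active
        # states: one step per symbol, so time is linear in the length.
        if len(upper) != len(lower):
            return False
        states = {self.start}
        for i, o in zip(upper, lower):
            states = {d for q in states
                      for out, d in self.index.get((q, i), []) if out == o}
        return bool(states & self.finals)

    def transduce(self, upper: str) -> Set[str]:
        # Enumerate every output; an ambiguous machine yields one output
        # per successful path, so this can return several strings.
        configs = {(self.start, "")}
        for sym in upper:
            configs = {(d, out + o) for q, out in configs
                       for o, d in self.index.get((q, sym), [])}
        return {out for q, out in configs if q in self.finals}


SIGMA = "akptmn"  # hypothetical toy alphabet


def nasal_rule() -> Transducer:
    # Hypothetical rule: lexical N surfaces as m before p, else as n.
    arcs: List[Arc] = []
    for x in SIGMA:
        arcs.append((0, x, x, 0))            # ordinary symbol, copied
        if x != "p":
            arcs.append((2, x, x, 0))        # after N:n, anything but p
    for q in (0, 2):
        arcs.append((q, "N", "m", 1))        # guess: a p follows
        arcs.append((q, "N", "n", 2))        # guess: no p follows
    arcs.append((1, "p", "p", 0))            # confirm the first guess
    return Transducer(arcs, 0, {0, 2})


def p_rule() -> Transducer:
    # Hypothetical rule: p surfaces as m immediately after an m.
    arcs: List[Arc] = []
    for x in SIGMA:
        for q in (0, 1):                     # state 1: last input was m
            out = "m" if (q == 1 and x == "p") else x
            arcs.append((q, x, out, 1 if x == "m" else 0))
    return Transducer(arcs, 0, {0, 1})


if __name__ == "__main__":
    nasal = nasal_rule()
    print(nasal.transduce("kaNpat"))                    # {'kampat'}
    print(nasal.accepts("kaNpat", "kanpat"))            # False
    print(sorted(nasal.invert().transduce("kampat")))   # ['kaNpat', 'kampat']
    print(nasal.compose(p_rule()).transduce("kaNpat"))  # {'kammat'}
```

The inverted machine returns two analyses for `kampat` because a surface `m` may come either from a lexical `m` or from `N`. This is the kind of ambiguity a reversible analyzer must report. Composing the two rules yields a single machine that maps `kaNpat` directly to `kammat`, with no intermediate string exposed.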

Type: Papers
Copyright © Cambridge University Press 2007

