
Strengths and weaknesses of finite-state technology: a case study in morphological grammar development

  • Shuly Wintner
Abstract

Finite-state technology is considered the preferred model for representing the phonology and morphology of natural languages. The attractiveness of this technology for natural language processing stems from four sources: modularity of the design, due to the closure properties of regular languages and relations; the compact representation that is achieved through minimization; efficiency, which is a result of linear recognition time with finite-state devices; and reversibility, resulting from the declarative nature of such devices. However, when wide-coverage morphological grammars are considered, finite-state technology does not scale up well, and the benefits of this technology can be overshadowed by the limitations it imposes as a programming environment for language processing. This paper investigates the strengths and weaknesses of existing technology, focusing on various aspects of large-scale grammar development. Using a real-world case study, we compare a finite-state implementation with an equivalent Java program with respect to ease of development, modularity, maintainability of the code, and space and time efficiency. We identify two main problems, abstraction and incremental development, which are currently not addressed sufficiently well by finite-state technology, and which we believe should be the focus of future research and development.
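Two of the properties listed above, linear-time processing and reversibility, are easiest to see on a toy device. The following Python sketch is a hypothetical illustration, not the paper's implementation or the API of any finite-state toolkit. It encodes one letter-to-letter transducer for nasal assimilation (an underspecified nasal `N` surfaces as `m` before `p` and as `n` elsewhere). It runs the transducer by reading each input symbol exactly once, and it obtains the analysis direction simply by swapping input and output labels on every arc. The state names, the alphabet and the rule itself are assumptions chosen for illustration. The sketch does not cover minimization or composition.

```python
from collections import defaultdict

# Hypothetical toy transducer (not from the paper). Arcs are
# (source_state, input_symbol, output_symbol, target_state). It is
# letter-to-letter (no epsilon arcs), so every arc consumes one input
# symbol and emits one output symbol.
ALPHABET = "akmnpt"        # assumed toy surface alphabet
FINALS = {"q0", "qNotP"}   # assumed accepting states


def nasal_assimilation_arcs():
    """Underlying N -> m before p, N -> n elsewhere; all other letters map to themselves."""
    arcs = []
    for x in ALPHABET:
        arcs.append(("q0", x, x, "q0"))
        if x != "p":
            arcs.append(("qNotP", x, x, "q0"))  # after N -> n, p is forbidden
    for src in ("q0", "qNotP"):
        arcs.append((src, "N", "m", "qP"))      # commit to m: a p must follow
        arcs.append((src, "N", "n", "qNotP"))   # commit to n: no p may follow
    arcs.append(("qP", "p", "p", "q0"))
    return arcs


def invert(arcs):
    """Swap input and output labels, so the same declarative device maps
    surface forms back to underlying forms (reversibility)."""
    return [(src, o, i, dst) for src, i, o, dst in arcs]


def transduce(arcs, start, finals, word):
    """Return the set of outputs for `word`. The loop runs once per input
    symbol; the work per step depends on the number of live configurations."""
    table = defaultdict(list)
    for src, i, o, dst in arcs:
        table[(src, i)].append((o, dst))
    configs = {(start, "")}
    for sym in word:
        configs = {
            (dst, out + o)
            for state, out in configs
            for o, dst in table.get((state, sym), [])
        }
        if not configs:
            return set()
    return {out for state, out in configs if state in finals}


if __name__ == "__main__":
    arcs = nasal_assimilation_arcs()
    print(transduce(arcs, "q0", FINALS, "kaNpat"))            # generation
    print(transduce(invert(arcs), "q0", FINALS, "kampat"))    # analysis
```

Generating from `kaNpat` yields only `kampat`. Running the inverted device on `kampat` returns both `kaNpat` and `kampat`, because a surface `m` may come either from `N` or from an underlying `m`. The analysis direction of a morphological transducer is often ambiguous in exactly this way, even when generation is not.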

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110