
Russian morphology: An engineering approach

Published online by Cambridge University Press:  12 September 2008

Andrei Mikheev
Affiliation:
HCRC, Language Technology Group, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland, UK; e-mail: Andrei.Mikheev@ed.ac.uk
Liubov Liubushkina
Affiliation:
Institute for Informatics Problems (IPI RAN), Russian Academy of Sciences, 30/6 Vavilova str., Moscow 117311, Russia; e-mail: luba@rbmike.msk.su

Abstract

Morphological analysis, which is at the heart of natural language processing, requires computationally efficient morphological processors. This paper describes an approach to the organization of an inflectional morphological model and its application to the Russian language. The main objective of our morphological processor is not the classification of word constituents, but rather the efficient computational recognition of the morpho-syntactic features of words and the generation of words according to requested morpho-syntactic features. Another major concern that the processor aims to address is the ease of extending the lexicon. The templated word-paradigm model used in the system has an engineering flavour: paradigm formation rules are of a bottom-up (word-specific) nature rather than general observations about the language, and word formation units are segments of words rather than proper morphemes. This approach allows us to handle both general cases and exceptions uniformly, and requires extremely simple data structures and control mechanisms, which can easily be implemented as finite-state automata. The morphological processor described in this paper is fully implemented for a substantial subset of Russian (more than 1,500,000 word-tokens – 95,000 word paradigms) and provides an extensive list of morpho-syntactic features together with stress positions for the words in its lexicon. Special dictionary management tools were built for browsing, debugging and extending the lexicon. The implementation was done in C and C++, and the system is available for the MS-DOS, MS-Windows and UNIX platforms.
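
The idea of a word-paradigm model built from word segments rather than proper morphemes can be illustrated with a minimal sketch. It is not the authors' implementation (which was written in C and C++ and whose data structures are not given here); the names `MASC_HARD_TEMPLATE`, `LEXICON`, `generate` and `analyse`, the feature labels, and the single lexicon entry are all assumptions made for illustration. Each template pairs an ending segment with the features it signals, and a lexicon entry links an invariant segment of a word to one template.

```python
# Hypothetical paradigm template (illustrative only, not the authors' format):
# each item pairs an ending segment with the (case, number) features it marks.
MASC_HARD_TEMPLATE = [
    ("", ("nom", "sg")),
    ("а", ("gen", "sg")),
    ("у", ("dat", "sg")),
    ("", ("acc", "sg")),
    ("ом", ("ins", "sg")),
    ("е", ("loc", "sg")),
    ("ы", ("nom", "pl")),
    ("ов", ("gen", "pl")),
    ("ам", ("dat", "pl")),
    ("ы", ("acc", "pl")),
    ("ами", ("ins", "pl")),
    ("ах", ("loc", "pl")),
]

# Hypothetical lexicon: invariant word segment -> paradigm template.
# A word-specific exception would simply get its own template.
LEXICON = {"стол": MASC_HARD_TEMPLATE}


def generate(segment, features):
    """Return every word form of `segment` carrying the requested features."""
    return [segment + ending
            for ending, feats in LEXICON[segment]
            if feats == features]


def analyse(word):
    """Return (segment, features) pairs for every reading of `word`."""
    readings = []
    for segment, template in LEXICON.items():
        if not word.startswith(segment):
            continue
        rest = word[len(segment):]
        for ending, feats in template:
            if ending == rest:
                readings.append((segment, feats))
    return readings


if __name__ == "__main__":
    print(generate("стол", ("ins", "sg")))
    print(analyse("стола"))
```

Because both recognition and generation are plain table lookups over segments, regular words and exceptions are treated the same way in this sketch: an irregular word differs only in the template it points to. A production system would compile such tables into a finite-state automaton instead of scanning the lexicon linearly.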

Information

Type
Articles
Copyright
Copyright © Cambridge University Press 1995
