Augmentative and Alternative Communication (AAC) for people with speech and language disorders is an interesting and challenging application field for research in Natural Language Processing. Further advances in the development of AAC systems require robust language processing techniques and versatile linguistic knowledge bases. NLP research can also benefit from studying the techniques used in this field and the user-centred methodologies used to develop and evaluate AAC systems. Until recently, however, with some exceptions, there was little scientific exchange between the two research areas. This paper aims to help close this gap. We argue that current interest in language use, evidenced by the large amount of research on comprehensive dictionaries and on corpus processing, makes the results of NLP research more relevant to AAC. We also show that the increasing interest of AAC researchers in NLP is yielding positive results. To situate research on communication aids, the first half of this paper gives an overview of the AAC research field. The second half is dedicated to an overview of research prototype systems and commercially available communication aids that employ more advanced language processing techniques.
Non-speaking people often rely on Augmentative and Alternative Communication (AAC) devices to help them communicate. These devices are slow to operate, however, and as a result conversations can be very difficult and frequently break down. This is especially the case when the conversation partner is unfamiliar with this method of communication, and it presents a major obstacle for many people wishing to conduct simple everyday transactions. A way of improving the performance of AAC devices by using scripts is discussed. A prototype system to test this idea was constructed, and a preliminary experiment performed with promising results. A practical AAC device incorporating scripts was then developed, and is described.
Clustering of a translation memory is proposed to make the retrieval of similar translation examples more efficient; a second contribution is a metric of text similarity based on both surface structure and content. The two proposed techniques are tested on part of the CELEX database. The results indicate that clustering the translation memory yields a significant gain in retrieval response time, while the deterioration in retrieval accuracy is negligible. The proposed text similarity metric was evaluated by a human expert and found to be compatible with human perception of text similarity.
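The abstract does not give the metric's exact form; as a rough sketch of the general approach, the hypothetical code below combines a character-level (surface) score with a token-overlap (content) score, groups the memory with a simple leader-clustering heuristic, and restricts retrieval to the best-matching cluster. All weights, thresholds and function names are illustrative assumptions, not the paper's actual method:

```python
from difflib import SequenceMatcher

def surface_sim(a, b):
    # character-level edit similarity (surface structure)
    return SequenceMatcher(None, a, b).ratio()

def content_sim(a, b):
    # token-overlap (Jaccard) similarity (content)
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def similarity(a, b, w=0.5):
    # assumed equal-weight combination of the two scores
    return w * surface_sim(a, b) + (1 - w) * content_sim(a, b)

def cluster_memory(memory, threshold=0.5):
    # greedy leader clustering: each cluster is keyed by its first member
    clusters = []
    for src, tgt in memory:
        for c in clusters:
            if similarity(src, c["leader"]) >= threshold:
                c["items"].append((src, tgt))
                break
        else:
            clusters.append({"leader": src, "items": [(src, tgt)]})
    return clusters

def retrieve(query, clusters):
    # search only the best-matching cluster, not the whole memory
    best = max(clusters, key=lambda c: similarity(query, c["leader"]))
    return max(best["items"], key=lambda it: similarity(query, it[0]))
```

Because only one cluster is scanned per query, response time scales with cluster size rather than memory size, at the cost of occasionally missing a good match that fell into another cluster.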
Case systems abound in natural language processing. Almost any attempt to recognize and uniformly represent relationships within a clause – a unit at the centre of any linguistic system that goes beyond word-level statistics – must be based on semantic roles drawn from a small, closed set. The set of roles describing relationships between a verb and its arguments within a clause is a case system. What is required of such a case system? How does a natural language practitioner build a system that is complete and detailed yet practical and natural? This paper chronicles the construction of a case system from its origin in English marker words to its successful application in the analysis of English text.
We present a lexical platform developed for the Spanish language. It achieves portability across computer systems and efficiency in terms of both speed and lexical coverage. A model for the full treatment of Spanish inflectional morphology for verbs, nouns and adjectives is presented. This model permits word formation based solely on morpheme concatenation, driven by a feature-based unification grammar. The run-time lexicon is a collection of allomorphs for both stems and endings. Although not yet tested, the model should also be suitable for other Romance and highly inflected languages. A formalism is also described for encoding a lemma-based lexical source, well suited to expressing linguistic generalizations: inheritance classes, lemma encoding, morpho-graphemic allomorphy rules and limited type-checking. From this source base, we can automatically generate an allomorph-indexed dictionary adequate for efficient retrieval and processing. A set of software tools has been implemented around this formalism: lexical base augmenting aids, lexical compilers to build run-time dictionaries and access libraries for them, feature manipulation libraries, unification and pseudo-unification modules, morphological processors, a parsing system, etc. Software interfaces among the different modules and tools are cleanly defined to ease software integration and flexible tool combination. Directions for accessing our e-mail and web demonstration prototypes are also provided. Figures are given showing the lexical coverage of our platform compared with some popular spelling checkers.
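As a toy illustration of morpheme concatenation driven by feature unification (not the platform's actual formalism), the sketch below stores stem and ending allomorphs with feature structures and combines only those pairs whose features unify; the Spanish verb contar, with its o/ue stem alternation, serves as the example. The feature names and the stress-based selection of allomorphs are assumptions made for the illustration:

```python
def unify(f1, f2):
    # succeed only if the two feature structures agree on shared attributes
    merged = dict(f1)
    for k, v in f2.items():
        if k in merged and merged[k] != v:
            return None
        merged[k] = v
    return merged

# toy allomorph lexicon for the verb 'contar' (diphthongizing stem)
STEMS = [
    ("cont",  {"lemma": "contar", "stress": "unstressed"}),
    ("cuent", {"lemma": "contar", "stress": "stressed"}),
]
ENDINGS = [
    ("o",    {"person": 1, "number": "sg", "stress": "stressed"}),
    ("amos", {"person": 1, "number": "pl", "stress": "unstressed"}),
]

def generate(lemma, person, number):
    # word formation by pure concatenation, licensed by unification
    want = {"lemma": lemma, "person": person, "number": number}
    for stem, sf in STEMS:
        for ending, ef in ENDINGS:
            feats = unify(sf, ef)
            if feats is None:
                continue
            if unify(feats, want) is not None:
                return stem + ending
    return None
```

The point of the design is that allomorph selection (cont- vs. cuent-) needs no special-case rules: the stem whose features clash with the ending is simply filtered out by unification.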
In this paper, we introduce a method of representing phrase structure grammars for building a large annotated corpus of Korean syntactic trees. Korean differs from English in word order and word composition. Our study found these differences significant enough to induce meaningful changes in the tree annotation scheme for Korean with respect to the schemes for English. A tree annotation scheme defines the grammar formalism to be assumed, the categories to be used, and rules to determine correct parses for unsettled issues in parse construction. Korean is partially free in word order, and essential components such as the subject and object of a sentence can be omitted with greater freedom than in English. We propose a restricted representation of phrase structure grammar to handle the characteristics of Korean more efficiently. An extensive experiment shows that the proposed representation yields improvements in parsing time as well as grammar size. We also describe Teb, a software environment set up with the goal of building a tree-annotated corpus of Korean containing more than one million units.
This paper presents a new type of nonlinear discourse structure found to be very common in free English texts. This structure reflects nonlinear presentation of the information and knowledge conveyed by the texts. It is argued that such nonlinearity is representationally and informationally advantageous because it allows one to create smaller, more compact texts. The paper presents a heuristics-based, relatively domain-independent algorithm for computing this new text structure, reports the algorithm's good quantitative and qualitative performance, and presents the results of extensive tests on a large volume of free English texts.
Natural language interfaces require dialogue models that allow for robust, habitable and efficient interaction. This paper presents such a model of dialogue management for natural language interfaces. The model is based on empirical studies of human–computer interaction in various simple service applications. It is shown that for applications belonging to this class, the dialogue can be handled by fairly simple means. The interaction is modeled in a dialogue grammar with information on the functional role of an utterance as conveyed in its linguistic structure. Focusing is handled using dialogue objects recorded in a dialogue tree representing the constituents of the dialogue. The dialogue objects in the dialogue tree can be accessed by the various modules for interpretation, generation and background system access. Focused entities are modeled as objects or sets of objects together with related domain-concept information: properties of the domain objects. A simple copying principle, whereby a new dialogue object's focal parameters are instantiated with information from the preceding dialogue object, accounts for most context-dependent utterances. The action to be carried out by the interface is determined by how the objects and related properties are specified, which in turn depends on information in the user utterance, context information from the dialogue tree, and information in the domain model. The use of dialogue objects facilitates customization to the sublanguage of a specific application. The framework has been applied successfully to various background systems and interaction modalities. Results are presented from customizing the dialogue manager to three typed-interaction applications, together with results from applying the model to two applications using spoken interaction.
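The copying principle can be sketched in a few lines: any focal parameter the new utterance leaves unspecified is inherited from the preceding dialogue object. The parameter names and data layout below are illustrative assumptions, not the model's actual implementation:

```python
class DialogueObject:
    """One node of the dialogue tree: the entities and properties in focus."""
    def __init__(self, objects=None, properties=None):
        self.objects = objects        # focused domain objects
        self.properties = properties  # focused properties of those objects

def interpret(utterance_params, previous):
    # copying principle: parameters absent from the utterance are
    # instantiated from the preceding dialogue object
    return DialogueObject(
        objects=utterance_params.get("objects")
                or (previous.objects if previous else None),
        properties=utterance_params.get("properties")
                or (previous.properties if previous else None),
    )
```

For example, after "What is the price of flight SK123?" a follow-up like "And the departure time?" supplies only a new property; the flight stays in focus because the objects slot is copied forward (the flight name here is invented for the example).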
This special issue presents the state-of-the-art in implemented, general-purpose Natural Language Processing (NLP) systems that use nontrivial Knowledge Representation and Reasoning (KRR). These systems use full-scale implementations of traditional KRR techniques as well as some newer knowledge-related processing mechanisms that have been developed specifically to meet the needs of natural language processing. The papers cover a wide range of natural language inputs, knowledge and formalisms, application domains and processing tasks, illustrating the key role that knowledge representation plays in all types of NLP systems.
We describe the natural language processing and knowledge representation components of B2, a collaborative system that allows medical students to practice their decision-making skills by considering a number of medical cases that differ from each other in a controlled manner. The underlying decision-support model of B2 uses a Bayesian network that captures the results of prior clinical studies of abdominal pain. B2 generates story-problems based on this model and supports natural language queries about the conclusions of the model and the reasoning behind them. B2 benefits from having a single knowledge representation and reasoning component that acts as a blackboard for intertask communication and cooperation. All knowledge is represented using a propositional semantic network formalism, thereby providing a uniform representation to all components. The natural language component is composed of a generalized augmented transition network parser/grammar and a discourse analyzer for managing the natural language interactions. The knowledge representation component supports the natural language component by providing a uniform representation of the content and structure of the interaction at the parser, discourse, and domain levels. This uniform representation allows distinct tasks, such as dialog management, domain-specific reasoning, and meta-reasoning about the Bayesian network, to use the same information source without requiring mediation. This is important because there are queries, such as Why?, whose interpretation and response require information from each of these tasks. By contrast, traditional approaches treat each subtask as a “black-box” with respect to other task components, and maintain a separate knowledge representation language for each. As a result, they have had much more difficulty providing useful responses.
This paper describes the approach to knowledge representation taken in the LaSIE Information Extraction (IE) system. Unlike many IE systems that skim texts and use large collections of shallow, domain-specific patterns and heuristics to fill in templates, LaSIE attempts a fuller text analysis, first translating individual sentences to a quasi-logical form, and then constructing a weak discourse model of the entire text from which template fills are finally derived. Underpinning the system is a general ‘world model’, represented as a semantic net, which is extended during the processing of a text by adding the classes and instances described in that text. In the paper we describe the system's knowledge representation formalisms, their use in the IE task, and how the knowledge represented in them is acquired, including experiments to extend the system's coverage using the WordNet general purpose semantic network. Preliminary evaluations of our approach, through the Sixth DARPA Message Understanding Conference, indicate comparable performance to shallower approaches. However, we believe its generality and extensibility offer a route towards the higher precision that is required of IE systems if they are to become genuinely usable technologies.
A large collection of texts can be reached through the Internet, providing a powerful platform from which common-sense knowledge may be gathered. This paper presents a system, built around a core knowledge base structured on the WordNet lexical database, that extracts contextual information from a given input text. This context information is then used to retrieve other texts from the Internet that relate to that context. When processed by the system, these new texts bring more information, representing an enhanced domain context for the initial text. This incremental method of text processing acquires domain knowledge from other texts. The paper describes the system architecture, its core knowledge base and inference engine, and the acquisition of new knowledge from corpora.
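A minimal sketch of this incremental acquisition loop, using a tiny hand-coded hypernym table as a stand-in for WordNet and a local list of texts as a stand-in for the Internet (all names and data below are illustrative assumptions):

```python
# toy stand-in for WordNet hypernym links
HYPERNYMS = {"dog": "animal", "cat": "animal", "oak": "tree", "tree": "plant"}

def concepts(text):
    # lift content words to their chain of hypernym concepts
    found = set()
    for w in text.lower().split():
        while w in HYPERNYMS:
            w = HYPERNYMS[w]
            found.add(w)
    return found

def retrieve_related(context, corpus):
    # fetch texts that share at least one concept with the current context
    return [t for t in corpus if concepts(t) & context]

def enrich(seed, corpus, rounds=2):
    # incremental acquisition: each round's retrieved texts extend the
    # context, which can in turn retrieve further texts
    context = concepts(seed)
    for _ in range(rounds):
        for t in retrieve_related(context, corpus):
            context |= concepts(t)
    return context
```

The key effect is transitive: a retrieved text that mentions both a known and an unknown concept bridges the two, so the domain context grows beyond anything mentioned in the seed text.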
In this paper, we describe NKRL (Narrative Knowledge Representation Language), a language designed for representing, in a standardized way, the semantic content (the ‘meaning’) of complex narrative texts. After informally introducing the four ‘components’ (specialized sub-languages) of NKRL, we describe some of the data structures proper to each of them, showing that the NKRL coding retains the main informational elements of the original narrative expressions. We then focus on an important subset of NKRL, the so-called AECS sub-language, showing in particular that its operators can be used to represent some sorts of ‘plural’ expressions.
In this article, we give an overview of Natural Language Generation (NLG) from an applied system-building perspective. The article includes a discussion of when NLG techniques should be used; suggestions for carrying out requirements analyses; and a description of the basic NLG tasks of content determination, discourse planning, sentence aggregation, lexicalization, referring expression generation, and linguistic realisation. Throughout, the emphasis is on established techniques that can be used to build simple but practical working systems now. We also provide pointers to techniques in the literature that are appropriate for more complicated scenarios.
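The core tasks named above can be illustrated with a deliberately simplified pipeline covering content determination, sentence aggregation and template-based realisation; the data format, the relevance flag and the function names below are assumptions for illustration, not a description of any system in the article:

```python
def content_determination(data):
    # select which input facts are worth communicating
    return [f for f in data if f["relevant"]]

def aggregate(facts):
    # sentence aggregation: merge facts that share a subject
    merged = {}
    for f in facts:
        merged.setdefault(f["subject"], []).append(f["value"])
    return merged

def realise(merged):
    # lexicalization and linguistic realisation via a simple template
    sentences = [f"{subject} is {' and '.join(values)}."
                 for subject, values in merged.items()]
    return " ".join(sentences)
```

Run end to end on toy weather facts, the pipeline drops the irrelevant fact, merges the two temperature facts into one sentence, and realises the result as text; real systems would of course interleave discourse planning and referring expression generation as well.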
Natural language generation is now moving away from research prototypes into more practical applications. Generation functionality is also being asked to play a more significant role in established applications such as machine translation. In both cases, multilingual generation techniques have much to offer. However, the take-up of multilingual generation is being restricted by a critical lack both of large-scale linguistic resources suited to the generation task and of appropriate development environments. This paper describes KPML, a multilingual development environment that offers one possible solution to these problems. KPML aims to provide generation projects with standardized, broad-coverage, reusable resources and a basic engine for using such resources for generation. A variety of focused debugging aids ensure efficient maintenance, while supporting multilingual work such as contrastive language development and automatic merging of independently developed resources. KPML is based on a new, generic approach to multilinguality in resource description that extends significantly beyond previous approaches. The system has already been used in a number of large generation projects and is freely available to the generation community.
This paper describes an accurate and robust text alignment system for structurally different languages. Between structurally different languages such as Japanese and English, the number of word correspondences that can be statistically acquired is limited, chiefly because the systems of functional (closed-class) words differ greatly between the two languages. The proposed method makes use of two kinds of word correspondences in aligning bilingual texts: a bilingual dictionary of general use, and word correspondences statistically acquired during the alignment process. Our method gradually determines sentence pairs (anchors) that correspond to each other by relaxing parameters. By combining the two kinds of word correspondences, the method achieves adequate word correspondences for complete alignment. As a result, texts of various lengths and genres in structurally different languages can be aligned with high precision. Experimental results show that our system outperforms conventional methods on various kinds of Japanese–English texts.
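A highly simplified sketch of anchor-based alignment with parameter relaxation (the dictionary-only scoring, the threshold schedule and the search strategy below are illustrative assumptions, not the paper's actual algorithm): confident sentence pairs are fixed first at a strict threshold, and later passes with relaxed thresholds only search the regions between already-fixed anchors.

```python
def pair_score(src_words, tgt_words, dictionary):
    # fraction of source words with a dictionary correspondence in the target
    hits = sum(1 for w in src_words if dictionary.get(w) in tgt_words)
    return hits / max(len(src_words), 1)

def align(src_sents, tgt_sents, dictionary, thresholds=(0.8, 0.5, 0.3)):
    anchors = {}  # src index -> tgt index
    for th in thresholds:  # gradually relax the acceptance threshold
        for i, src in enumerate(src_sents):
            if i in anchors:
                continue
            # candidates restricted to the span between surrounding anchors
            lo = max((t for s, t in anchors.items() if s < i), default=-1) + 1
            hi = min((t for s, t in anchors.items() if s > i),
                     default=len(tgt_sents))
            best, best_j = 0.0, None
            for j in range(lo, hi):
                if j in anchors.values():
                    continue
                sc = pair_score(src.split(), tgt_sents[j].split(), dictionary)
                if sc > best:
                    best, best_j = sc, j
            if best_j is not None and best >= th:
                anchors[i] = best_j
    return sorted(anchors.items())
```

In the paper's method the statistically acquired correspondences would be folded into the score as anchors accumulate; here only the fixed dictionary is used, which is enough to show how early anchors constrain and speed up later, less confident decisions.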