Non-speaking people often rely on AAC (Augmentative and Alternative Communication) devices to help them communicate. These devices are slow to operate, however, and as a result conversations can be very difficult and frequently break down. This is especially so when the conversation partner is unfamiliar with this method of communication, and it is a major obstacle for many people who wish to conduct simple everyday transactions. A way of improving the performance of AAC devices by using scripts is discussed. A prototype system was constructed to test this idea, and a preliminary experiment with it produced promising results. A practical AAC device that incorporates scripts was then developed, and is described.
Clustering of a translation memory is proposed as a way of making the retrieval of similar translation examples from the memory more efficient; a second contribution is a metric of text similarity based on both surface structure and content. The two proposed techniques are tested on part of the CELEX database. The results indicate that clustering the translation memory yields a significant gain in retrieval response time, while the loss in retrieval accuracy is negligible. The proposed text similarity metric is evaluated by a human expert and found to be consistent with human perception of text similarity.
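The abstract does not give the clustering or similarity details; as a rough illustration of the retrieval idea only, the following sketch (data, vectorization and cluster count are all our assumptions, not the paper's method) clusters a toy translation memory with k-means over TF-IDF vectors and restricts retrieval to the query's nearest cluster:

```python
# Sketch: cluster a translation memory so retrieval scans only one cluster.
# Toy stand-in for the paper's technique; TF-IDF/k-means are our assumptions.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

memory = [
    ("the file could not be opened", "het bestand kon niet worden geopend"),
    ("the file was saved", "het bestand is opgeslagen"),
    ("press any key to continue", "druk op een toets om door te gaan"),
    ("press the enter key", "druk op de entertoets"),
]
sources = [s for s, _ in memory]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sources)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

def retrieve(query: str):
    q = vectorizer.transform([query])
    cluster = km.predict(q)[0]               # search only the nearest cluster
    members = [i for i, c in enumerate(km.labels_) if c == cluster]
    best = max(members, key=lambda i: cosine_similarity(q, X[i])[0, 0])
    return memory[best]

print(retrieve("the file cannot be opened"))
```

The efficiency gain comes from comparing the query against one cluster's members rather than the whole memory, at the cost of occasionally missing a near match assigned to another cluster.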
Case systems abound in natural language processing. Almost any attempt to recognize and uniformly represent relationships within a clause – a unit at the centre of any linguistic system that goes beyond word-level statistics – must be based on semantic roles drawn from a small, closed set. The set of roles describing relationships between a verb and its arguments within a clause is a case system. What is required of such a case system? How does a natural language practitioner build a system that is complete and detailed, yet practical and natural? This paper chronicles the construction of a case system from its origin in English marker words to its successful application in the analysis of English text.
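The paper develops its own role inventory; purely to illustrate what a case frame over a small closed role set looks like, here is a hypothetical encoding of one clause (the role names are ours, not the paper's):

```python
# Illustrative case frame for "Mary cut the bread with a knife".
# The closed role set below is hypothetical, not the paper's inventory.
CASE_ROLES = {"AGENT", "PATIENT", "INSTRUMENT", "LOCATION", "TIME"}

clause = {
    "verb": "cut",
    "AGENT": "Mary",          # who performs the action
    "PATIENT": "the bread",   # what the action is done to
    "INSTRUMENT": "a knife",  # marked by the English preposition "with"
}

# A case analysis is well-formed only if it draws on the closed set.
assert all(k == "verb" or k in CASE_ROLES for k in clause)
```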
We present a lexical platform developed for the Spanish language. It achieves portability across different computer systems as well as efficiency, in terms of both speed and lexical coverage. A model for the full treatment of Spanish inflectional morphology for verbs, nouns and adjectives is presented. This model permits word formation based solely on morpheme concatenation, driven by a feature-based unification grammar. The run-time lexicon is a collection of allomorphs for both stems and endings. Although not yet tested, the model should also be suitable for other Romance and highly inflected languages. A formalism is also described for encoding a lemma-based lexical source, well suited to expressing linguistic generalizations: inheritance classes, lemma encoding, morpho-graphemic allomorphy rules and limited type-checking. From this source base, we can automatically generate an allomorph-indexed dictionary suitable for efficient retrieval and processing. A set of software tools has been implemented around this formalism: lexical base augmentation aids, lexical compilers to build run-time dictionaries and access libraries for them, feature-manipulation libraries, unification and pseudo-unification modules, morphological processors, a parsing system, etc. Software interfaces among the different modules and tools are cleanly defined to ease software integration and flexible tool combination. Directions for accessing our e-mail and web demonstration prototypes are also provided. Figures are given comparing the lexical coverage of our platform with that of some popular spelling checkers.
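The platform's formalism is not reproduced in the abstract; as a minimal sketch of the general idea, assuming a much-reduced allomorph lexicon and flat feature sets of our own (not the platform's encoding), word formation by morpheme concatenation can be constrained by unification like this:

```python
# Sketch: analyse a Spanish verb form by concatenating an allomorph stem
# with an ending, accepting the split only if their features unify.
# Entries and feature names are illustrative, not the platform's formalism.

# Allomorph lexicon: surface form -> features (stems and endings alike).
stems = {"cant": {"lemma": "cantar", "conj": 1},
         "com":  {"lemma": "comer",  "conj": 2}}
endings = {"o":    {"conj": 1, "tense": "pres", "person": 1, "number": "sg"},
           "amos": {"conj": 1, "tense": "pres", "person": 1, "number": "pl"},
           "emos": {"conj": 2, "tense": "pres", "person": 1, "number": "pl"}}

def unify(a, b):
    """Unification for flat feature sets: fail on conflicting values."""
    merged = dict(a)
    for k, v in b.items():
        if k in merged and merged[k] != v:
            return None
        merged[k] = v
    return merged

def analyse(word):
    for i in range(1, len(word)):
        stem, ending = word[:i], word[i:]
        if stem in stems and ending in endings:
            result = unify(stems[stem], endings[ending])
            if result is not None:   # conjugation class must agree
                yield result

print(list(analyse("cantamos")))  # -> [{'lemma': 'cantar', 'conj': 1, ...}]
```

Here unification does the real work: a first-conjugation stem simply cannot combine with a second-conjugation ending, so overgeneration from raw concatenation is filtered out.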
In this paper, we introduce a method of representing phrase structure grammars for building a large annotated corpus of Korean syntactic trees. Korean differs from English in word order and word composition. Our study found these differences to be significant enough to require meaningful changes in the tree annotation scheme for Korean with respect to the schemes used for English. A tree annotation scheme defines the grammar formalism to be assumed, the categories to be used, and the rules that determine correct parses for unsettled issues in parse construction. Korean is partially free in word order, and essential components of a sentence, such as subjects and objects, can be omitted with greater freedom than in English. We propose a restricted representation of phrase structure grammar to handle these characteristics of Korean more efficiently. An extensive experiment shows that the proposed representation yields improvements in both parsing time and grammar size. We also describe Teb, a software environment designed to build a tree-annotated corpus of Korean containing more than one million units.
This paper presents a new type of nonlinear discourse structure found to be very common in free English texts. This structure reflects the nonlinear presentation of the information and knowledge conveyed by a text. It is argued that such nonlinearity is representationally and informationally advantageous because it allows one to create smaller, more compact texts. The paper presents a heuristics-based, relatively domain-independent algorithm for computing this new text structure, discusses the algorithm's good quantitative and qualitative performance, and presents the results of extensive tests on a large volume of free English texts.
Natural language interfaces require dialogue models that allow for robust, habitable and efficient interaction. This paper presents such a model for dialogue management in natural language interfaces. The model is based on empirical studies of human–computer interaction in various simple service applications. It is shown that for applications belonging to this class, the dialogue can be handled using fairly simple means. The interaction can be modeled in a dialogue grammar with information on the functional role of an utterance as conveyed in its linguistic structure. Focusing is handled using dialogue objects recorded in a dialogue tree representing the constituents of the dialogue. The dialogue objects in the dialogue tree can be accessed by the various modules for interpretation, generation and background system access. Focused entities are modeled as entities referring to objects or sets of objects, together with related domain-concept information, that is, properties of the domain objects. A simple copying principle, where a new dialogue object's focal parameters are instantiated with information from the preceding dialogue object, accounts for most context-dependent utterances. The action to be carried out by the interface is determined on the basis of how the objects and related properties are specified, which in turn depends on information presented in the user utterance, context information from the dialogue tree, and information in the domain model. The use of dialogue objects facilitates customization to the sublanguage used in a specific application. The framework has been successfully applied to various background systems and interaction modalities. The paper presents results from customizing the dialogue manager to three typed-interaction applications, together with results from applying the model to two applications using spoken interaction.
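The copying principle can be pictured with a small sketch (the class layout, field names and example data are our invention, not the paper's implementation): each new dialogue object starts from a copy of the previous object's focal parameters and overrides only what the new utterance supplies.

```python
# Sketch of the copying principle for context-dependent utterances.
# Class and field names are illustrative, not the paper's implementation.
from dataclasses import dataclass, field

@dataclass
class DialogueObject:
    objects: str | None = None                       # focused object(s)
    properties: dict = field(default_factory=dict)   # domain properties

def next_dialogue_object(prev: DialogueObject, utterance_params: dict,
                         utterance_objects: str | None = None) -> DialogueObject:
    """Instantiate a new dialogue object by copying the preceding one's
    focal parameters, then overriding with what the utterance supplies."""
    return DialogueObject(objects=utterance_objects or prev.objects,
                          properties={**prev.properties, **utterance_params})

# "Flights from Stockholm to London" ... "And on Sunday?"
d1 = DialogueObject(objects="flights",
                    properties={"from": "Stockholm", "to": "London"})
d2 = next_dialogue_object(d1, {"day": "Sunday"})
print(d2)  # keeps 'flights' and Stockholm->London; adds day=Sunday
```

The elliptical follow-up question is interpretable precisely because everything it leaves unsaid is inherited from the preceding dialogue object.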
This special issue presents the state of the art in implemented, general-purpose
Natural Language Processing (NLP) systems that use nontrivial Knowledge
Representation
and Reasoning (KRR). These systems use full-scale implementations of
traditional KRR techniques as well as some newer knowledge-related processing
mechanisms that have been developed specifically to meet the needs of natural
language processing. The papers cover a wide range of natural language
inputs,
knowledge and formalisms, application domains and processing tasks, illustrating
the key role that knowledge representation plays in all types of NLP systems.
We describe the natural language processing and knowledge representation components of B2, a collaborative system that allows medical students to practice their decision-making skills by considering a number of medical cases that differ from each other in a controlled manner. The underlying decision-support model of B2 uses a Bayesian network that captures the results of prior clinical studies of abdominal pain. B2 generates story problems based on this model and supports natural language queries about the model's conclusions and the reasoning behind them. B2 benefits from having a single knowledge representation and reasoning component that acts as a blackboard for intertask communication and cooperation. All knowledge is represented using a propositional semantic network formalism, thereby providing a uniform representation to all components. The natural language component consists of a generalized augmented transition network parser/grammar and a discourse analyzer for managing the natural language interactions. The knowledge representation component supports the natural language component by providing a uniform representation of the content and structure of the interaction at the parser, discourse, and domain levels. This uniform representation allows distinct tasks, such as dialog management, domain-specific reasoning, and meta-reasoning about the Bayesian network, to use the same information source without requiring mediation. This is important because there are queries, such as ‘Why?’, whose interpretation and response require information from each of these tasks. By contrast, traditional approaches treat each subtask as a ‘black box’ with respect to the other task components, with a separate knowledge representation language for each, and as a result have had much more difficulty providing useful responses.
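As a loose illustration of the single-blackboard idea (the node structure and task functions below are ours, not B2's internals), several task modules can read and write one shared store of propositions instead of keeping private representations:

```python
# Sketch: one shared propositional store used as a blackboard by all tasks.
# Node structure and task functions are illustrative, not B2's design.

# Each proposition is a node: (id, relation, argument).
blackboard = set()

def parser_task(utterance: str):
    """Parser records the structure of the user's query on the blackboard."""
    blackboard.add(("q1", "query-type", "why"))
    blackboard.add(("q1", "surface-form", utterance))

def domain_task():
    """Domain reasoner records a conclusion on the same store."""
    blackboard.add(("c1", "diagnosis", "appendicitis"))
    blackboard.add(("c1", "evidence", "rebound-tenderness"))

def answer_why():
    """Answering 'Why?' needs parser *and* domain facts; no mediation layer."""
    if ("q1", "query-type", "why") in blackboard:
        return [p for p in blackboard if p[1] == "evidence"]

parser_task("Why?")
domain_task()
print(answer_why())  # -> [('c1', 'evidence', 'rebound-tenderness')]
```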
This paper describes the approach to knowledge representation taken in the LaSIE Information Extraction (IE) system. Unlike many IE systems, which skim texts and use large collections of shallow, domain-specific patterns and heuristics to fill in templates, LaSIE attempts a fuller text analysis, first translating individual sentences to a quasi-logical form, and then constructing a weak discourse model of the entire text from which template fills are finally derived. Underpinning the system is a general ‘world model’, represented as a semantic net, which is extended during the processing of a text by adding the classes and instances described in that text. In the paper we describe the system's knowledge representation formalisms, their use in the IE task, and how the knowledge represented in them is acquired, including experiments to extend the system's coverage using the WordNet general-purpose semantic network. Preliminary evaluations of our approach, through the Sixth DARPA Message Understanding Conference, indicate performance comparable to that of shallower approaches. However, we believe its generality and extensibility offer a route towards the higher precision that is required of IE systems if they are to become genuinely usable technologies.
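The idea of a world model that grows per text can be sketched as follows (a toy semantic net of our own devising, not LaSIE's actual formalism):

```python
# Sketch: a semantic-net 'world model' that grows as a text is processed.
# Toy structure of our own; not LaSIE's actual knowledge formalism.

class WorldModel:
    def __init__(self):
        self.isa = {}        # class -> parent class
        self.instances = {}  # instance -> class
        self.props = {}      # instance -> {attribute: value}

    def add_class(self, cls, parent):
        self.isa[cls] = parent

    def add_instance(self, inst, cls, **attrs):
        self.instances[inst] = cls
        self.props.setdefault(inst, {}).update(attrs)

    def is_a(self, inst, cls):
        """Walk the isa chain so template fills can use general classes."""
        c = self.instances.get(inst)
        while c is not None:
            if c == cls:
                return True
            c = self.isa.get(c)
        return False

wm = WorldModel()
wm.add_class("company", "organization")          # general world model
wm.add_instance("Acme", "company", hq="Boston")  # added while reading a text
print(wm.is_a("Acme", "organization"))           # -> True; aids template fill
```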
The Internet gives access to a large collection of texts and thus provides a powerful platform from which common-sense knowledge may be gathered. This paper presents a system, built around a core knowledge base structured on the WordNet lexical database, that is capable of extracting contextual information from a given input text. This context information is then used to retrieve other texts from the Internet that relate to that context. When processed by the system, these new texts bring in further information representing an enhanced domain context for the initial text. The result is an incremental method for text processing that acquires domain knowledge from other texts. The paper describes the system architecture, its core knowledge base and inference engine, and the acquisition of new knowledge from corpora.
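The abstract does not detail how context is extracted. One very rough approximation of ‘context from WordNet’, our own reduction using NLTK's WordNet interface rather than the paper's knowledge base and inference engine, is to collect the hypernym classes shared by the content words of the input:

```python
# Rough sketch: characterize a text's 'domain context' as the hypernym
# classes its content words share in WordNet. Our simplification, not the
# paper's knowledge base. Requires: pip install nltk; nltk.download('wordnet')
from collections import Counter
from nltk.corpus import wordnet as wn

def context_classes(words, depth=2):
    counts = Counter()
    for w in words:
        seen = set()
        frontier = wn.synsets(w, pos=wn.NOUN)[:1]   # most frequent sense
        for _ in range(depth):
            frontier = [h for s in frontier for h in s.hypernyms()]
            seen.update(frontier)
        counts.update(s.name() for s in seen)
    # Classes reached from several different words suggest the context.
    return [name for name, n in counts.most_common() if n > 1]

print(context_classes(["scalpel", "syringe", "stethoscope"]))
```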
In this paper, we describe NKRL (Narrative Knowledge Representation Language), a language designed for representing, in a standardized way, the semantic content (the ‘meaning’) of complex narrative texts. After informally introducing the four ‘components’ (specialized sub-languages) of NKRL, we describe some of the data structures proper to each of them, aiming to show that the NKRL coding retains the main informational elements of the original narrative expressions. We then focus on an important subset of NKRL, the so-called AECS sub-language, showing in particular that its operators can be used to represent certain sorts of ‘plural’ expressions.
In this article, we give an overview of Natural Language Generation (NLG) from an applied system-building perspective. The article includes a discussion of when NLG techniques should be used; suggestions for carrying out requirements analyses; and a description of the basic NLG tasks of content determination, discourse planning, sentence aggregation, lexicalization, referring expression generation, and linguistic realisation. Throughout, the emphasis is on established techniques that can be used to build simple but practical working systems now. We also provide pointers to techniques in the literature that are appropriate for more complicated scenarios.
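The six tasks the article names can be pictured as stages in a simple pipeline. The sketch below is our own minimal illustration; every function body is a deliberately trivial stand-in, not one of the article's techniques:

```python
# Toy end-to-end NLG pipeline over the six tasks named in the article.
# Each stage is a deliberately trivial stand-in for the real task.

data = {"city": "Boston", "temp_c": 21, "sky": "clear"}

def content_determination(d):            # choose what to say
    return [("temperature", d["temp_c"]), ("sky", d["sky"])]

def discourse_planning(messages):        # order the messages
    return sorted(messages)              # 'sky' before 'temperature'

def sentence_aggregation(messages):      # merge into one sentence plan
    return [messages]                    # a single combined sentence

def lexicalize(key, value):              # choose words for a message
    return {"sky": f"the sky is {value}",
            "temperature": f"it is {value} degrees"}[key]

def referring_expression(d):             # how to refer to the entity
    return f"In {d['city']}"

def linguistic_realisation(sentences, d):  # grammar and punctuation
    out = []
    for sent in sentences:
        clauses = " and ".join(lexicalize(k, v) for k, v in sent)
        out.append(f"{referring_expression(d)}, {clauses}.")
    return " ".join(out)

plan = sentence_aggregation(discourse_planning(content_determination(data)))
print(linguistic_realisation(plan, data))
# -> "In Boston, the sky is clear and it is 21 degrees."
```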
Natural language generation is now moving away from research prototypes into more practical applications. Generation functionality is also being asked to play a more significant role in established applications such as machine translation. In both cases, multilingual generation techniques have much to offer. However, the take-up of multilingual generation is being restricted by a critical lack both of large-scale linguistic resources suited to the generation task and of appropriate development environments. This paper describes KPML, a multilingual development environment that offers one possible solution to these problems. KPML aims to provide generation projects with standardized, broad-coverage, reusable resources and a basic engine for using such resources for generation. A variety of focused debugging aids ensure efficient maintenance, while supporting multilingual work such as contrastive language development and automatic merging of independently developed resources. KPML is based on a new, generic approach to multilinguality in resource description that extends significantly beyond previous approaches. The system has already been used in a number of large generation projects and is freely available to the generation community.
This paper describes an accurate and robust text alignment system for structurally different languages. Between structurally different languages such as Japanese and English, there is a limit on the number of word correspondences that can be statistically acquired, mainly because the systems of functional (closed) words in the two languages are quite different. The proposed method makes use of two kinds of word correspondences in aligning bilingual texts: a bilingual dictionary of general use, and word correspondences that are statistically acquired during the alignment process. Our method gradually determines sentence pairs (anchors) that correspond to each other by relaxing parameters. By combining the two kinds of word correspondences, the method achieves word correspondences adequate for complete alignment. As a result, texts of various lengths and genres in structurally different languages can be aligned with high precision. Experimental results show that our system outperforms conventional methods on various kinds of Japanese–English texts.
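The relaxation idea can be sketched roughly as follows (the scoring function, thresholds and data layout are our invention, not the paper's method): fix high-confidence sentence pairs as anchors first, then lower the acceptance threshold and align only between already-fixed anchors.

```python
# Sketch of anchor-based alignment with parameter relaxation.
# Scoring and thresholds are illustrative, not the paper's method.

def score(src_sent, tgt_sent, lexicon):
    """Fraction of source words with a dictionary/acquired correspondence."""
    hits = sum(1 for w in src_sent.split()
               if lexicon.get(w) in tgt_sent.split())
    return hits / max(len(src_sent.split()), 1)

def align(src, tgt, lexicon, thresholds=(0.8, 0.5, 0.3)):
    anchors = {}                         # src index -> tgt index
    for th in thresholds:                # relax the threshold in stages
        for i, s in enumerate(src):
            if i in anchors:
                continue
            # Candidate targets are constrained by surrounding anchors,
            # so later, looser matches cannot cross earlier, surer ones.
            lo = max((j for k, j in anchors.items() if k < i), default=-1)
            hi = min((j for k, j in anchors.items() if k > i), default=len(tgt))
            for j in range(lo + 1, hi):
                if score(s, tgt[j], lexicon) >= th:
                    anchors[i] = j       # fixed pairs become new anchors
                    break
    return sorted(anchors.items())
```

The key property this mimics is monotone refinement: pairs accepted at a strict threshold bound the search space for pairs accepted later under relaxed parameters.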
The paper presents the background and motivation for a processing model that segments discourse into units that are simple, non-nested clauses, prior to the recognition of clause-internal phrasal constituents, together with experimental results in support of this model. One set of results is derived from a statistical reanalysis of the Swedish empirical data in Strangert, Ejerhed and Huber (1993) concerning the linguistic structure of major prosodic units. The other set of results is derived from experiments in segmenting part-of-speech annotated Swedish text corpora into clauses, using a new clause segmentation algorithm. The clause-segmented corpus data are taken from the Stockholm Umeå Corpus (SUC), 1 M words of Swedish texts from different genres, part-of-speech annotated by hand, and from the Umeå corpus DAGENS INDUSTRI 1993 (DI93), 5 M words of Swedish financial newspaper text, processed by fully automatic means consisting of tokenizing, lexical analysis, and probabilistic POS tagging. The results of these two experiments show that the proposed clause segmentation algorithm is 96% correct when applied to manually tagged text, and 91% correct when applied to probabilistically tagged text.
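The actual algorithm is given in the paper, not the abstract; as a crude illustration of clause segmentation over POS-tagged input (the rule and the tagset are our own and far simpler than the paper's):

```python
# Crude sketch: split a POS-tagged sentence into simple, non-nested
# clauses by opening a new clause at each conjunction or subordinator.
# Rule and tagset are ours, much simpler than the paper's algorithm.

def segment_clauses(tagged):
    """tagged: list of (word, pos) pairs; returns a list of clauses."""
    clauses, current = [], []
    for word, pos in tagged:
        if pos in {"CONJ", "SUB"} and current:   # clause boundary marker
            clauses.append(current)
            current = []
        current.append(word)
    if current:
        clauses.append(current)
    return clauses

sent = [("hon", "PRON"), ("sjöng", "VERB"), ("och", "CONJ"),
        ("han", "PRON"), ("dansade", "VERB")]
print(segment_clauses(sent))
# -> [['hon', 'sjöng'], ['och', 'han', 'dansade']]
```

Because segmentation here consults only the POS sequence, its accuracy is bounded by tagging quality, which is consistent with the abstract's drop from 96% on hand-tagged text to 91% on probabilistically tagged text.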