Generating text in a hypermedia environment places different demands on a text generation system than non-interactive environments do. This paper describes some of these demands, then shows how the architecture of one text generation system, ILEX, has been shaped by them. The architecture is described in terms of the levels of linguistic representation used, and the processes which map between them. Particular attention is paid to the processes of content selection and text structuring.
ANVIL is an information retrieval system using natural language processing techniques, intended for retrieval of captioned images. It extracts dependency structures from the image captions and user queries, and then applies a high-accuracy matching algorithm which recursively explores the dependency structures to determine their similarity. A further algorithm allows additional contextual information to be extracted following a successful match, with the intention of helping users understand and organise the retrieval results. ANVIL was developed to high engineering standards, and in addition to the research aspects of the system, we also discuss some of the design and development issues. English and Japanese versions of the system have been developed.
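The abstract does not give the details of ANVIL's matching algorithm, but the idea of recursively comparing dependency structures can be sketched as follows. Everything here is an assumption for illustration: the tree encoding, the greedy child alignment, and the equal head/child weighting are not taken from the paper.

```python
def similarity(a, b):
    """Score two dependency trees, each encoded as (head_word, [child_trees])."""
    head_score = 1.0 if a[0] == b[0] else 0.0
    if not a[1] or not b[1]:
        return head_score
    # Greedily align each child of `a` with its best-matching child of `b`,
    # normalising by the larger child count so unmatched dependents are penalised.
    child_score = sum(max(similarity(ca, cb) for cb in b[1]) for ca in a[1])
    child_score /= max(len(a[1]), len(b[1]))
    return 0.5 * head_score + 0.5 * child_score
```

Identical trees score 1.0, and partial credit flows up from matching subtrees, so a query matching only a caption's modifiers still receives a non-zero score.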
The automated analysis of natural language data has become a central issue in the design of Intelligent Information Systems. The term natural language is intended to cover all the possible modalities of human communication and is not restricted to written or spoken language. Processing unrestricted natural language is still considered an AI-hard task. However, various analysis techniques have been proposed in order to address specific aspects of natural language. In particular, recent interest has been in providing approximate analysis techniques, on the assumption that perfect analysis is not possible, but that partial results are still very useful.
Compound noun segmentation is one of the crucial problems in Korean language processing, because a series of nouns in Korean may appear without spaces in real text, which makes it difficult to identify the morphological constituents. This paper presents an effective method of Korean compound noun segmentation based on lexical data extracted from a corpus. The segmentation consists of two tasks. First, it uses a Hand-Built Segmentation Dictionary (HBSD) to segment compound nouns which occur frequently or need exceptional processing. Second, a segmentation algorithm using data from a corpus is proposed, where simple nouns and their frequencies are stored in a Simple Noun Dictionary (SND) for segmentation. The analysis is executed based on modified tabular parsing using a min-max operation. Our experiments show an accuracy rate of about 97.29%.
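The min-max idea behind dictionary-based segmentation can be illustrated with a small sketch. This is an assumption-laden simplification, not the paper's tabular parsing algorithm: it enumerates segmentations recursively rather than via a chart, and it interprets the criterion as preferring the segmentation whose least frequent constituent is as frequent as possible.

```python
def segment(compound, freq):
    """Split `compound` into nouns from the frequency dictionary `freq`,
    maximising the minimum constituent frequency (a min-max criterion)."""
    best = None  # (min frequency over parts, parts)

    def search(rest, parts):
        nonlocal best
        if not rest:
            if parts:
                score = min(freq[p] for p in parts)
                if best is None or score > best[0]:
                    best = (score, list(parts))
            return
        for i in range(1, len(rest) + 1):
            piece = rest[:i]
            if piece in freq:
                parts.append(piece)
                search(rest[i:], parts)
                parts.pop()

    search(compound, [])
    return best[1] if best else [compound]
```

Favouring the strongest weakest link discourages segmentations that rescue an implausible constituent by pairing it with a very frequent one, which is one motivation for min-max scoring over, say, summing frequencies.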
Transformation-Based Learning (TBL) is a relatively new machine learning method that has achieved notable success on language problems. This paper presents a variant of TBL, called Randomized TBL, that overcomes the training time problems of standard TBL without sacrificing accuracy. It includes a set of experiments on part-of-speech tagging in which the size of the corpus and template set are varied. The results show that Randomized TBL can address problems that are intractable in terms of training time for standard TBL. In addition, for language problems such as dialogue act tagging where the most effective features have not been identified through linguistic studies, Randomized TBL allows the researcher to experiment with a large set of templates capturing many potentially useful features and feature interactions.
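The core TBL loop the paper builds on can be sketched in a few lines. This is a minimal illustration under one assumed rule template ("change tag A to B when the previous word is W"), not Brill's full tagger or the Randomized TBL variant; as noted in the comment, the randomized variant's key change is to sample from the candidate rule space rather than score it exhaustively.

```python
def apply_rule(tags, words, rule):
    """Apply one rule: change tag `frm` to `to` where the previous word is `prev_word`."""
    frm, to, prev_word = rule
    out = list(tags)
    for i, t in enumerate(tags):
        prev = words[i - 1] if i > 0 else "<s>"
        if t == frm and prev == prev_word:
            out[i] = to
    return out

def tbl_train(words, gold, init_tags, max_rules=5):
    """Greedily learn an ordered list of rules that reduce errors against `gold`."""
    tags = list(init_tags)
    rules = []
    for _ in range(max_rules):
        # Candidate rules are instantiated from the current errors; Randomized
        # TBL would sample this candidate space instead of scoring all of it.
        candidates = {(t, g, words[i - 1] if i > 0 else "<s>")
                      for i, (t, g) in enumerate(zip(tags, gold)) if t != g}
        base = sum(t == g for t, g in zip(tags, gold))
        best, best_gain = None, 0
        for rule in candidates:
            # Net gain = corrections made minus correct tags broken.
            gain = sum(n == g for n, g in zip(apply_rule(tags, words, rule), gold)) - base
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break
        tags = apply_rule(tags, words, best)
        rules.append(best)
    return rules, tags
```

The inner loop over all candidate rules is exactly where standard TBL's training time blows up as the corpus and template set grow, which is the cost Randomized TBL attacks.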
Automatic Accent Insertion (AAI) is the problem of re-inserting accents (diacritics) into a text where they are missing. Unaccented French texts are still quite common in electronic media, as a result of a long history of character encoding problems and the lack of well-established conventions for typing accented characters on computer keyboards. An AAI method for French is presented, based on a statistical language model. Next, it is shown how this AAI method can be used to do real-time accent insertions within a word processing environment, making it possible to type in French without having to type accents. Various mechanisms are proposed to improve the performance of real-time AAI, by exploiting online corrections made by the user. Experiments show that, on average, such a system produces less than one accentuation error for every 200 words typed.
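A toy version of accent insertion conveys the shape of the task. This sketch is not the paper's statistical language model: a unigram frequency table stands in for it, and each de-accented word is simply mapped to its most frequent accented variant, with no context and no exploitation of user corrections.

```python
import unicodedata

def strip_accents(word):
    """Remove combining diacritics (e.g. 'été' -> 'ete')."""
    return "".join(c for c in unicodedata.normalize("NFD", word)
                   if unicodedata.category(c) != "Mn")

def reaccent(text, freq):
    """Restore accents by picking, for each word, the highest-frequency
    accented variant; unknown words are left unchanged."""
    variants = {}
    for w, f in freq.items():
        variants.setdefault(strip_accents(w), []).append((f, w))
    out = []
    for w in text.split():
        cands = variants.get(strip_accents(w))
        out.append(max(cands)[1] if cands else w)
    return " ".join(out)
```

A real system must also resolve genuinely ambiguous pairs such as French "a"/"à" or "du"/"dû", which is where a context-sensitive language model earns its keep over this unigram baseline.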
System evaluation has mattered since research on automatic language and information processing began. However, the (D)ARPA conferences have raised the stakes substantially in requiring and delivering systematic evaluations and in sustaining these through long-term programmes; and it has been claimed that this has both significantly raised task performance, as defined by appropriate effectiveness measures, and promoted relevant engineering development. These controlled laboratory evaluations have made very strong assumptions about the task context. The paper examines these assumptions for six task areas and considers their impact on evaluation and performance results. It argues that for current tasks of interest, e.g. summarising, it is now essential to play down the present narrowly-defined performance measures in order to address the task context, and specifically the role of the human participant in the task, so that new measures, of larger value, can be developed and applied.
In this paper, we discuss a natural language interface to a database of structured textual descriptions in the form of annotations of video objects. The interface maps the natural language query input onto the annotation structures. The language processing is done in three phases: generating expectations and implications from the input words; disambiguating noun implications and filling the slots of prepositional expectations; and finally, disambiguating verbal expectations. The system has been tested with different types of user inputs, including ill-formed sentences, and studied for erroneous inputs and for different types of portability issues.
We present a data-to-speech system called D2S, which can be used for the creation of data-to-speech systems in different languages and domains. The most important characteristic of a data-to-speech system is that it combines language and speech generation: language generation is used to produce a natural language text expressing the system's input data, and speech generation is used to make this text audible. In D2S, this combination is exploited by using linguistic information available in the language generation module for the computation of prosody. This allows us to achieve better prosodic output quality than can be achieved in a plain text-to-speech system. For language generation in D2S, the use of syntactically enriched templates is guided by knowledge of the discourse context, while for speech generation pre-recorded phrases are combined in a prosodically sophisticated manner. This combination of techniques makes it possible to create linguistically sound but efficient systems with high-quality language and speech output.
The design of methods for performance evaluation is a major open research issue in the area of spoken language dialogue systems. This paper presents the PARADISE methodology for developing predictive models of spoken dialogue performance, and shows how to evaluate the predictive power and generalizability of such models. To illustrate the methodology, we develop a number of models for predicting system usability (as measured by user satisfaction), based on the application of PARADISE to experimental data from three different spoken dialogue systems. We then measure the extent to which the models generalize across different systems, different experimental conditions, and different user populations, by testing models trained on a subset of the corpus against a test set of dialogues. The results show that the models generalize well across the three systems, and are thus a first approximation towards a general performance model of system usability.
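The shape of a PARADISE-style performance function can be sketched briefly. In PARADISE, performance is a weighted combination of normalised task success and normalised dialogue costs, with the weights obtained by regressing user satisfaction scores; in this illustration the weights are invented, not fitted, and the per-dialogue inputs are toy values.

```python
from statistics import mean, stdev

def zscore(xs):
    """Normalise a list of values to zero mean and unit (sample) deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def performance(kappa, costs, alpha=0.5, weights=(0.3, 0.1)):
    """Per-dialogue score: weighted normalised task success (kappa) minus a
    weighted sum of normalised cost measures. Weights here are illustrative
    placeholders for the regression-fitted coefficients."""
    nk = zscore(kappa)
    ncosts = [zscore(c) for c in costs]
    return [alpha * nk[i] - sum(w * nc[i] for w, nc in zip(weights, ncosts))
            for i in range(len(nk))]
```

Because success and costs are normalised before weighting, the fitted coefficients remain comparable across systems with different cost scales, which is part of what makes cross-system generalisation testable.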
This paper is a case study of user involvement in the requirements specification for project ISLE: Interactive Spoken Language Education. Developers of Spoken Language Dialogue Systems should involve users from the outset, particularly if the aim is to develop novel solutions for a generic target application area or market. As well as target end-users, SLDS developers should identify and consult ‘meta-level’ domain experts with expertise in human-to-human dialogue in the target domain. In our case, English language teachers and publishers provided generic knowledge of learners' dialogue preferences; other applications have analogous domain language experts. These domain language experts can help to pin down a domain-specific sublanguage which fits the constraints of current speech recognition technology: linguistically-naive end-users may expect unconstrained conversational English, but in practice, dialogue interactions have to be constrained in vocabulary and syntax. User consultation also highlighted a need to consider how to integrate speech input and output with other modes of interaction and processing; in our case the input speech signal is processed by a speech recogniser and by stress and mispronunciation detectors, and output responses are text and graphics as well as speech. This suggests a need to revisit the definition of ‘dialogue’: other SLDS developers should also consider the merits of multimodality as an adjunct to pure spoken language dialogue, particularly given that current systems are not capable of accurately handling unconstrained English.
In this paper, the August spoken dialogue system is described. This experimental Swedish dialogue system, which featured an animated talking agent, was exposed to the general public during a trial period of six months. The construction of the system was partly motivated by the need to collect genuine speech data from people with little or no previous experience of spoken dialogue systems. A corpus of more than 10,000 utterances of spontaneous computer-directed speech was collected and empirical linguistic analyses were carried out. Acoustical, lexical and syntactical aspects of this data were examined. In particular, user behavior and user adaptation during error resolution were emphasized. Repetitive sequences in the database were analyzed in detail. Results suggest that computer-directed speech during error resolution is increased in duration, hyperarticulated and contains inserted pauses. Design decisions which may have influenced how the users behaved when they interacted with August are discussed and implications for the development of future systems are outlined.
This paper describes our work on dialogue systems that can mimic human conversation, with the goal of providing intuitive access to a wide range of applications by expanding the user's options in the interaction. We concentrate on practical dialogue: dialogues in which the participants need to accomplish some objective or perform some task. Two hypotheses regarding practical dialogue motivate our research. First, that the conversational competence required for practical dialogues, while still complex, is significantly simpler to achieve than general human conversational competence. And second, that within the genre of practical dialogue, the bulk of the complexity in the language interpretation and dialogue management is independent of the task being performed. If these hypotheses are true, then it should be possible to build a generic dialogue shell for practical dialogue, by which we mean the full range of components required in a dialogue system, including speech recognition, language processing, dialogue management and response planning, built in such a way as to be readily adapted to new applications by specifying the domain and task models. This paper documents our progress and what we have learned so far based on building and adapting systems in a series of different problem solving domains.
This article provides a global overview of the main aspects of current practice in the design, implementation and evaluation of speech recognition components for Spoken Language Dialogue Systems (SLDSs), and presents the results of the DISC European project related to speech recognition. DISC and its successor DISC-2 are efforts towards the definition of best practice guidelines for SLDS development and evaluation. SLDSs aim at using natural spoken input for performing an information processing task such as automated call routing or travel planning and reservations. The main functionalities of an SLDS are speech recognition, natural language understanding, dialogue management, database access and interpretation, response generation and speech synthesis. Speech recognition, which transforms the acoustic signal into a string of words, is a key technology in any SLDS.
Whilst Spoken Language Dialogue Systems (SLDSs) technology has made good progress in recent years, the issue of SLDS usability is still lagging behind, both theoretically and in actual SLDS development and evaluation. However, as more products reach the market and competition intensifies, there is growing recognition of the importance of systematically understanding the factors which must be taken into account in order to optimise SLDS usability. Ideally, this understanding should be comprehensive (i.e. include all major human factors perspectives on SLDSs) and exhaustive (i.e. describe each perspective as it pertains to the detailed development and evaluation of any possible SLDS). This paper addresses the requirement of comprehensiveness by decomposing the complex space of SLDS usability best practice into eleven issues which should be considered by developers during specification, design, development and evaluation. The discussion of each issue aims to support the developer in building SLDSs which are likely to generate user satisfaction, which are perceived to be easy to understand and control, and which enable smooth user-system interaction. Based on the best practice issues discussed, criteria for evaluating SLDS usability are proposed. Several limits to our current understanding of SLDS usability are highlighted.