Introduction
Best practice in spoken language dialogue systems engineering: Introduction to the special issue
- JAN VAN KUPPEVELT, ULRICH HEID, HANS KAMP
- Published online by Cambridge University Press: 26 March 2001, pp. 205-212
This special issue brings together representative views on what has come to be known as "best practice" in the development and evaluation of spoken language dialogue systems (SLDSs). The issue was initiated in the context of the European Esprit project DISC, which ran from June 1997 until February 2000. DISC's main goal was to identify current practice in both the development and the evaluation of SLDSs, in order to arrive at a useful definition and description of best practice. The project has resulted in a collection of guidelines intended for different target groups, in particular developers, deployers and customers.
DISC partners were: Natural Interactive Systems Laboratory, Odense University, Denmark (coordination); Department of Speech, Music and Hearing (KTH), Stockholm, Sweden; Human-Machine Communication Department, CNRS-LIMSI, Orsay, France; Institute for Natural Language Processing (IMS), University of Stuttgart, Germany; Vocalis Ltd, Cambridge, United Kingdom; DaimlerChrysler Research Center Ulm, Germany; and the ELSNET foundation, Utrecht, The Netherlands.
Over the last few years, interest in SLDSs has increased enormously. At present a large number of systems are available, many of them for commercial use. Their number is growing rapidly, as are the variety of their functionalities and the diversity of their application domains. The tasks that advanced systems are able to perform are often more complex and less stereotypical, and they are frequently carried out across several interconnected application domains. With these advances have come higher expectations of the naturalness and intelligence with which SLDSs fulfill their assignments, and as a consequence interest in such systems has grown even further, in both academic and commercial circles. As far as natural human-system interaction is concerned, one significant change in SLDS design concerns the interaction between natural language understanding and dialogue management. Here we see a clear tendency towards models that incorporate a substantial amount of discourse semantics and make use of some conception of context change. This allows for more natural interaction between the system and its human users, owing on the one hand to the system's improved ability to compute the intended meaning of the user's input, and on the other to the increased sophistication of the strategies it uses to plan its own responses.
Such improved capacities are crucial when the system is to leave more of the initiative to the user, instead of keeping the dialogue on a narrowly circumscribed path of largely predictable exchanges. Further, there is a tendency to combine spoken language human-system interaction with other modalities of information exchange and representation (e.g., images and gestures), calling for both modality-specific and modality-integrating syntactic and semantic processing capabilities. All these developments have led to a situation in which developers, deployers and customers alike share a great need for effective guidelines that enable them to make accurate and successful design and implementation decisions, in accordance with a broad consensus on what constitutes best practice in this particular engineering domain.
Research Article
An architecture for a generic dialogue shell
- JAMES ALLEN, DONNA BYRON, MYROSLAVA DZIKOVSKA, GEORGE FERGUSON, LUCIAN GALESCU, AMANDA STENT
- Published online by Cambridge University Press: 26 March 2001, pp. 213-228
This paper describes our work on dialogue systems that can mimic human conversation, with the goal of providing intuitive access to a wide range of applications by expanding the user's options in the interaction. We concentrate on practical dialogue: dialogues in which the participants need to accomplish some objective or perform some task. Two hypotheses regarding practical dialogue motivate our research. The first is that the conversational competence required for practical dialogues, while still complex, is significantly simpler to achieve than general human conversational competence. The second is that, within the genre of practical dialogue, the bulk of the complexity in language interpretation and dialogue management is independent of the task being performed. If these hypotheses are true, then it should be possible to build a generic dialogue shell for practical dialogue, by which we mean the full range of components required in a dialogue system, including speech recognition, language processing, dialogue management and response planning, built in such a way that it can readily be adapted to new applications by specifying the domain and task models. This paper documents our progress and what we have learned so far from building and adapting systems in a series of different problem-solving domains.
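The separation this abstract describes, a task-independent shell adapted to a new application only by supplying domain and task models, can be illustrated with a toy Python sketch. Everything in it is an assumption for illustration: the class names `DomainModel` and `DialogueShell`, the keyword-based interpretation and the two example domains are hypothetical, and the authors' system has speech recognition, parsing and response planning components far beyond this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DomainModel:
    """Hypothetical domain/task model: the only part rewritten per application."""
    name: str
    # Maps a task verb the user may utter to a handler that performs the task.
    tasks: Dict[str, Callable[[str], str]] = field(default_factory=dict)


class DialogueShell:
    """Hypothetical generic layer: interpretation and dialogue management
    that stay the same whichever domain model is plugged in."""

    def __init__(self, domain: DomainModel) -> None:
        self.domain = domain

    def respond(self, utterance: str) -> str:
        words = utterance.lower().rstrip("?.!").split()
        if not words:
            return "Sorry, I didn't catch that."
        # Generic conversational acts: identical behaviour in every domain.
        if words[0] in ("hello", "hi"):
            return f"Hello, this is the {self.domain.name} assistant."
        if words[0] in ("bye", "goodbye"):
            return "Goodbye."
        # Task requests: delegated to the plugged-in domain/task model.
        verb, argument = words[0], " ".join(words[1:])
        handler = self.domain.tasks.get(verb)
        if handler is None:
            return f"I don't know how to {verb} in the {self.domain.name} domain."
        return handler(argument)


# Two hypothetical applications built from the same shell by swapping the domain model.
transport = DomainModel("transport", {"route": lambda a: f"Planning a route {a}."})
kitchen = DomainModel("kitchen", {"cook": lambda a: f"Starting to cook {a}."})
```

Only the `DomainModel` instances differ between the two applications; greetings, closings and the fallback for unknown requests live in the shell and are reused unchanged, which mirrors the second hypothesis in miniature.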
User-guided system development in Interactive Spoken Language Education
- ERIC ATWELL, PETER HOWARTH, CLIVE SOUTER, PATRIZIO BALDO, ROBERTO BISIANI, DARIO PEZZOTTA, PATRIZIA BONAVENTURA, WOLFGANG MENZEL, DANIEL HERRON, RACHEL MORTON, JUERGEN SCHMIDT
- Published online by Cambridge University Press: 26 March 2001, pp. 229-241
This paper is a case study of user involvement in the requirements specification for project ISLE: Interactive Spoken Language Education. Developers of Spoken Language Dialogue Systems should involve users from the outset, particularly if the aim is to develop novel solutions for a generic target application area or market. As well as target end-users, SLDS developers should identify and consult ‘meta-level’ domain experts with expertise in human-to-human dialogue in the target domain. In our case, English language teachers and publishers provided generic knowledge of learners' dialogue preferences; other applications have analogous domain language experts. These domain language experts can help to pin down a domain-specific sublanguage which fits the constraints of current speech recognition technology: linguistically naive end-users may expect unconstrained conversational English, but in practice, dialogue interactions have to be constrained in vocabulary and syntax. User consultation also highlighted a need to consider how to integrate speech input and output with other modes of interaction and processing; in our case the input speech signal is processed by a speech recogniser and by stress and mispronunciation detectors, and the output responses are text and graphics as well as speech. This suggests a need to revisit the definition of ‘dialogue’: other SLDS developers should also consider the merits of multimodality as an adjunct to pure spoken language dialogue, particularly given that current systems are not capable of accurately handling unconstrained English.
Usability issues in spoken dialogue systems
- LAILA DYBKJAER, NIELS OLE BERNSEN
- Published online by Cambridge University Press: 26 March 2001, pp. 243-271
Whilst Spoken Language Dialogue System (SLDS) technology has made good progress in recent years, the issue of SLDS usability is still lagging behind, both theoretically and in actual SLDS development and evaluation. However, as more products reach the market and competition intensifies, there is growing recognition of the importance of systematically understanding the factors which must be taken into account in order to optimise SLDS usability. Ideally, this understanding should be comprehensive (i.e. include all major human factors perspectives on SLDSs) and exhaustive (i.e. describe each perspective as it pertains to the detailed development and evaluation of any possible SLDS). This paper addresses the requirement of comprehensiveness by decomposing the complex space of SLDS usability best practice into eleven issues which should be considered by developers during specification, design, development and evaluation. The discussion of each issue is intended to support the developer in building SLDSs which are likely to generate user satisfaction, which are perceived to be easy to understand and control, and which enable smooth user-system interaction. Based on the best practice issues discussed, criteria for evaluating SLDS usability are proposed. Several limits to our current understanding of SLDS usability are highlighted.
Speech technology on trial: Experiences from the August system
- JOAKIM GUSTAFSON, LINDA BELL
- Published online by Cambridge University Press: 26 March 2001, pp. 273-286
In this paper, the August spoken dialogue system is described. This experimental Swedish dialogue system, which featured an animated talking agent, was exposed to the general public during a trial period of six months. The construction of the system was partly motivated by the need to collect genuine speech data from people with little or no previous experience of spoken dialogue systems. A corpus of more than 10,000 utterances of spontaneous computer-directed speech was collected and empirical linguistic analyses were carried out. Acoustical, lexical and syntactical aspects of this data were examined. In particular, user behavior and user adaptation during error resolution were emphasized. Repetitive sequences in the database were analyzed in detail. Results suggest that computer-directed speech during error resolution is increased in duration, hyperarticulated and contains inserted pauses. Design decisions which may have influenced how the users behaved when they interacted with August are discussed and implications for the development of future systems are outlined.
Towards a tool for the Subjective Assessment of Speech System Interfaces (SASSI)
- KATE S. HONE, ROBERT GRAHAM
- Published online by Cambridge University Press: 26 March 2001, pp. 287-303
Applications of speech recognition are now widespread, but user-centred evaluation methods are necessary to ensure their success. Objective evaluation techniques are fairly well established, but previous subjective techniques have been unstructured and unproven. This paper reports on the first stage in the development of a questionnaire measure for the Subjective Assessment of Speech System Interfaces (SASSI). The aim of the research programme is to produce a valid, reliable and sensitive measure of users' subjective experiences with speech recognition systems. Such a technique could make an important contribution to theory and practice in the design and evaluation of speech recognition systems according to best human factors practice. A prototype questionnaire was designed, based on established measures for evaluating the usability of other kinds of user interface, and on a review of the research literature into speech system design. This consisted of 50 statements with which respondents rated their level of agreement. The questionnaire was given to users of four different speech applications, and Exploratory Factor Analysis of 214 completed questionnaires was conducted. This suggested the presence of six main factors in users' perceptions of speech systems: System Response Accuracy, Likeability, Cognitive Demand, Annoyance, Habitability and Speed. The six factors have face validity, and a reasonable level of statistical reliability. The findings form a useful theoretical and practical basis for the subjective evaluation of any speech recognition interface. However, further work is recommended, to establish the validity and sensitivity of the approach, before a final tool can be produced which warrants general use.
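The abstract reports that the six factors show a reasonable level of statistical reliability but does not say which statistic was used. Cronbach's alpha is one widely used internal-consistency measure for questionnaire scales, so the sketch below computes it for the items of a single factor. Treat it as an assumption about how such a check might be done, not as a description of the authors' analysis. For a scale of k items, alpha = k/(k−1) · (1 − sum of item variances / variance of the summed score).

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one scale (e.g. the items loading on one factor).

    responses[r][i] is the agreement rating respondent r gave to item i
    (for example on a hypothetical 1-7 agreement scale).
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))  # one tuple of ratings per item
    item_variance_sum = sum(variance(item) for item in items)
    # Variance of each respondent's total score across the k items.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

Values near 1 indicate that the items of a factor move together across respondents; a value near 0 indicates that they do not.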
Towards best practice in the development and evaluation of speech recognition components of a spoken language dialog system
- LORI LAMEL, WOLFGANG MINKER, PATRICK PAROUBEK
- Published online by Cambridge University Press: 26 March 2001, pp. 305-322
This article provides a global overview of the main aspects of current practice in the design, implementation and evaluation of speech recognition components for Spoken Language Dialog Systems (SLDSs), and presents the results of the DISC European project related to speech recognition. DISC and its successor DISC-2 are efforts towards the definition of best practice guidelines for SLDS development and evaluation. SLDSs aim to use natural spoken input to perform an information processing task such as automated switchboards, call routing or travel planning and reservations. The main functionalities of an SLDS are speech recognition, natural language understanding, dialog management, database access and interpretation, response generation and speech synthesis. Speech recognition, which transforms the acoustic signal into a string of words, is a key technology in any SLDS.
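The abstract does not state which metric DISC recommends for evaluating the recogniser, but word error rate (WER) is the conventional measure for a component that maps speech to a word string: the minimum number of substitutions, deletions and insertions needed to turn the recognised string into the reference transcription, divided by the number of reference words. A minimal sketch using word-level edit distance, assuming whitespace-tokenised, already-normalised transcripts:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, recognising "i want to fly paris" for the reference "i want to fly to paris" is one deletion over six reference words, a WER of about 0.17. Because insertions count, WER can exceed 1.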
Information state and dialogue management in the TRINDI dialogue move engine toolkit
- STAFFAN LARSSON, DAVID R. TRAUM
- Published online by Cambridge University Press: 26 March 2001, pp. 323-340
We introduce an architecture and toolkit for building dialogue managers currently being developed in the TRINDI project, based on the notions of information state and dialogue move engine. The aim is to provide a framework for experimenting with implementations of different theories of information state, information state update and dialogue control. A number of dialogue managers are currently being built using the toolkit, and we present overviews of two of them. We believe that this framework will make implementation of dialogue processing theories easier, also facilitating comparison of different types of dialogue systems, thus helping to achieve a prerequisite for arriving at a best practice for the development of the dialogue management component of a spoken dialogue system.
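A minimal Python sketch of the information-state idea, under stated assumptions: the information state is a record (here a stack of questions under discussion plus shared commitments), each update rule pairs a precondition with an effect, and a simple dialogue move engine applies the first rule whose precondition holds after each move. This is not the toolkit's own interface; the field names, move types and rule names (`integrate_ask`, `integrate_answer`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class InfoState:
    """Hypothetical information state for a simple question-answer dialogue."""
    qud: List[str] = field(default_factory=list)               # questions under discussion, top = last
    commitments: Dict[str, str] = field(default_factory=dict)  # resolved question -> answer
    latest_move: Optional[Tuple[str, str]] = None              # (move type, content)


# An update rule: (name, precondition on the state, effect that mutates the state).
Rule = Tuple[str, Callable[[InfoState], bool], Callable[[InfoState], None]]


def _is_ask(s: InfoState) -> bool:
    return s.latest_move is not None and s.latest_move[0] == "ask"


def _push_question(s: InfoState) -> None:
    s.qud.append(s.latest_move[1])


def _is_answer_to_open_question(s: InfoState) -> bool:
    return s.latest_move is not None and s.latest_move[0] == "answer" and bool(s.qud)


def _resolve_question(s: InfoState) -> None:
    question = s.qud.pop()
    s.commitments[question] = s.latest_move[1]


RULES: List[Rule] = [
    ("integrate_ask", _is_ask, _push_question),
    ("integrate_answer", _is_answer_to_open_question, _resolve_question),
]


def update(state: InfoState, move: Tuple[str, str], rules: List[Rule] = RULES) -> Optional[str]:
    """One engine step: record the move, then apply the first rule whose
    precondition holds. Returns the rule name, or None if no rule applied."""
    state.latest_move = move
    for name, precondition, effect in rules:
        if precondition(state):
            effect(state)
            return name
    return None
```

Passing a different rule list to `update` changes the update strategy without touching the engine, which is the kind of experimentation with alternative theories of information state update that the abstract describes.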
Object-oriented modelling of spoken language dialogue systems
- IAN M. O'NEILL, MICHAEL F. McTEAR
- Published online by Cambridge University Press: 26 March 2001, pp. 341-362
In this paper we show how established object modelling techniques can be used in the creation of spoken dialogue management systems. One of the motivations behind the particular approach adopted here is the observation that, in spoken human-to-human dialogues, certain skillsets and patterns of dialogue evolution are common to many different contexts; other dialogue skills and accompanying real-world knowledge are required only for more specialised transactions within particular business domains. As a starting point for modelling an automated spoken dialogue management system we recommend a use case analysis of the required functionality. The use case analysis encourages the developer to identify generic-specific relationships and interactions between different dialogue management skills. We consider some of the broad philosophies underlying current dialogue management systems and outline practical high-level dialogue behaviour based on mixed-initiative, frame-based processing, combined with a rigorously applied confirmation strategy. On the basis of the use case requirements analysis, we explore a possible design for an object-oriented dialogue management system, indicating the roles and relationships of the various classes that embody the required dialogue functionality, and showing how implemented objects within the system will interact. The manner of this interaction is such as to allow one overall system to process transactions in several business domains. We also indicate some of the advantages of a rule-based implementation: the proposed design is tailored towards such an implementation in Prolog++. An object-oriented development process places high-level, generic dialogue management functionality at the disposal of more specialised ‘expert’ components. 
Maintainability and extensibility are therefore enhanced: if the developer chooses to refine generic behaviour, it is immediately available to the more specialised components; if new domain-specific expertise is required, it can be added with minimal impact on generic behaviour.
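The generic-to-specialised relationship described in this abstract can be sketched with class inheritance; the sketch uses Python, whereas the proposed design targets Prolog++. The classes are hypothetical: a generic `DialogueExpert` supplies frame-based slot filling in any order (a simple form of mixed initiative) and a read-back confirmation, `CinemaExpert` reuses both unchanged, and `PaymentExpert` overrides only the confirmation step with domain-specific behaviour.

```python
from typing import Dict, List, Tuple


class DialogueExpert:
    """Hypothetical generic expert: skills shared by every business domain."""

    required_slots: Tuple[str, ...] = ()

    def __init__(self) -> None:
        self.frame: Dict[str, str] = {}

    def fill(self, slot: str, value: str) -> None:
        """Frame-based, mixed initiative: known slots may be supplied in any order."""
        if slot in self.required_slots:
            self.frame[slot] = value

    def missing(self) -> List[str]:
        return [s for s in self.required_slots if s not in self.frame]

    def next_prompt(self) -> str:
        """Ask for the first missing slot; once the frame is full, confirm it."""
        gaps = self.missing()
        if gaps:
            return f"What is the {gaps[0]}?"
        return self.confirm()

    def confirm(self) -> str:
        """Generic confirmation strategy: read back every slot value."""
        details = ", ".join(f"{s}={self.frame[s]}" for s in self.required_slots)
        return f"Please confirm: {details}."


class CinemaExpert(DialogueExpert):
    """Specialised expert that reuses the generic behaviour unchanged."""
    required_slots = ("film", "time", "seats")


class PaymentExpert(DialogueExpert):
    """Specialised expert that refines only the confirmation step."""
    required_slots = ("card_number", "expiry")

    def confirm(self) -> str:
        # Domain-specific rule: never read back the full card number.
        card = self.frame["card_number"]
        return f"Please confirm the card ending {card[-4:]}, expiry {self.frame['expiry']}."
```

Refining `next_prompt` or `confirm` in `DialogueExpert` changes every domain that does not override it, while adding a new domain means adding one subclass, which is the maintainability argument made above.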
Towards developing general models of usability with PARADISE
- MARILYN WALKER, CANDACE KAMM, DIANE LITMAN
- Published online by Cambridge University Press: 26 March 2001, pp. 363-377
The design of methods for performance evaluation is a major open research issue in the area of spoken language dialogue systems. This paper presents the PARADISE methodology for developing predictive models of spoken dialogue performance, and shows how to evaluate the predictive power and generalizability of such models. To illustrate the methodology, we develop a number of models for predicting system usability (as measured by user satisfaction), based on the application of PARADISE to experimental data from three different spoken dialogue systems. We then measure the extent to which the models generalize across different systems, different experimental conditions, and different user populations, by testing models trained on a subset of the corpus against a test set of dialogues. The results show that the models generalize well across the three systems, and are thus a first approximation towards a general performance model of system usability.
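In the original PARADISE formulation, performance is modelled as Performance = α·N(κ) − Σᵢ wᵢ·N(cᵢ), where κ measures task success, the cᵢ are dialogue cost measures (such as the number of turns), N is z-score normalisation, and the weights are estimated by multiple linear regression with user satisfaction as the dependent variable. The sketch below fits such a model in plain Python via the normal equations. It assumes the predictors have already been extracted per dialogue, it is not the authors' tooling, and the function names are illustrative.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_normalise(values: Sequence[float]) -> List[float]:
    """N(x) = (x - mean) / standard deviation, so predictors share one scale."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_paradise(predictors: List[Sequence[float]], satisfaction: Sequence[float]) -> List[float]:
    """Least-squares fit of user satisfaction on z-normalised predictors.

    predictors[j][d] is the value of predictor j (e.g. kappa, number of turns)
    for dialogue d. Returns [intercept, weight_1, ..., weight_n].
    """
    columns = [z_normalise(p) for p in predictors]
    rows = [[1.0] + [c[d] for c in columns] for d in range(len(satisfaction))]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, satisfaction)) for i in range(k)]
    return solve(xtx, xty)
```

Read against the formula, the fitted weight on N(κ) plays the role of α and each cost weight is −wᵢ. Generalisation, as tested in the paper, would then be checked by fitting on one subset of dialogues and measuring predictive fit on held-out dialogues.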