Towards developing general models of usability with PARADISE

MARILYN WALKER; CANDACE KAMM; DIANE LITMAN

doi:10.1017/S1351324900002503

Abstract

The design of methods for performance evaluation is a major open research issue in the area of spoken language dialogue systems. This paper presents the PARADISE methodology for developing predictive models of spoken dialogue performance, and shows how to evaluate the predictive power and generalizability of such models. To illustrate the methodology, we develop a number of models for predicting system usability (as measured by user satisfaction), based on the application of PARADISE to experimental data from three different spoken dialogue systems. We then measure the extent to which the models generalize across different systems, different experimental conditions, and different user populations, by testing models trained on a subset of the corpus against a test set of dialogues. The results show that the models generalize well across the three systems, and are thus a first approximation towards a general performance model of system usability.
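The train-and-test procedure described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the predictive model is a linear regression of user satisfaction on normalized dialogue measures, which is how PARADISE is usually described but which the abstract itself does not spell out. The feature names (`task_success`, `num_turns`), all data values, and the helper functions are hypothetical; the satisfaction scores are synthetic and generated from an exact linear rule, so the fit here is perfect by construction.

```python
from statistics import mean, pstdev

def column_stats(rows):
    """Per-column (mean, population std) computed from the training rows."""
    return [(mean(col), pstdev(col)) for col in zip(*rows)]

def standardize(rows, stats):
    """Z-score each column using statistics taken from the training set."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (intercept added)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]

def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions."""
    m = mean(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - m) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def synthetic_satisfaction(row):
    # Hypothetical rule used only to generate toy data for this sketch.
    task_success, num_turns = row
    return 3.0 + 2.0 * task_success - 0.05 * num_turns

# Hypothetical dialogues: (task_success, num_turns). Train on one subset,
# test on held-out dialogues (e.g. from a different system or user group).
train_X = [(1.0, 10), (0.0, 25), (1.0, 12), (0.5, 18), (0.0, 30), (1.0, 8)]
test_X = [(0.5, 20), (1.0, 15), (0.0, 22)]
train_y = [synthetic_satisfaction(r) for r in train_X]
test_y = [synthetic_satisfaction(r) for r in test_X]

stats = column_stats(train_X)
weights = fit_linear(standardize(train_X, stats), train_y)
test_pred = predict(weights, standardize(test_X, stats))
test_r2 = r_squared(test_y, test_pred)
print(f"held-out R^2: {test_r2:.3f}")
```

The held-out R² plays the role of the generalization measure: a model that transfers well to dialogues it was not trained on keeps a test-set R² close to its training-set R². With real data the split would follow the dimension being tested (system, experimental condition, or user population) rather than an arbitrary partition.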

