
Assessing user simulation for dialog systems using human judges and automatic evaluation measures

Published online by Cambridge University Press:  01 February 2011

HUA AI
DIANE LITMAN
Affiliation: Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, USA
e-mail: hua@cs.pitt.edu, litman@cs.pitt.edu, iamhuaai@gmail.com

Abstract

While different user simulations are built to assist dialog system development, there is an increasing need to assess the quality of these simulations quickly and reliably. Previous studies have proposed several automatic evaluation measures for this purpose, but the validity of these measures has not been fully established. We present an assessment study in which human judgments of user simulation quality are collected as the gold standard for validating automatic evaluation measures. We show that a ranking model can be built from the automatic measures to predict the same ranking of the simulations as the human judgments. We further show that the ranking model can be improved by adding a simple feature derived from time-series analysis.
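A minimal sketch of the kind of validation the abstract describes, assuming a pairwise ranking formulation: automatic measures score each simulation, a model learns to rank simulations from pairwise human preferences, and agreement with the human ranking is checked with Kendall's tau. The specific measure names, data values, and the use of logistic regression below are illustrative assumptions, not the paper's actual features or method.

    # Sketch: validate automatic user-simulation measures against human rankings.
    # Hypothetical data and features; not the paper's actual setup.
    import numpy as np
    from itertools import combinations
    from sklearn.linear_model import LogisticRegression
    from scipy.stats import kendalltau

    # Each row: automatic measures for one user simulation, e.g.
    # [dialog-act precision, recall, perplexity, time-series feature].
    measures = np.array([
        [0.62, 0.58, 4.1, 0.30],
        [0.71, 0.66, 3.5, 0.45],
        [0.55, 0.49, 5.2, 0.22],
        [0.80, 0.74, 2.9, 0.61],
    ])
    human_rank = np.array([2, 1, 3, 0])  # judges' ranking (0 = best), hypothetical

    # Build pairwise training data: feature difference -> whether the
    # human judges ranked the first simulation above the second.
    X, y = [], []
    for i, j in combinations(range(len(measures)), 2):
        X.append(measures[i] - measures[j])
        y.append(1 if human_rank[i] < human_rank[j] else 0)

    model = LogisticRegression().fit(X, y)

    # Score each simulation with the learned weights and rank by score.
    scores = measures @ model.coef_.ravel()
    predicted_rank = np.argsort(np.argsort(-scores))

    # Kendall's tau measures agreement between predicted and human rankings.
    tau, _ = kendalltau(predicted_rank, human_rank)
    print(f"Kendall's tau vs. human ranking: {tau:.2f}")

A tau near 1 would indicate that the automatic measures, combined in a ranking model, reproduce the human judges' ordering; the abstract's time-series feature would enter as one additional column of the measure matrix.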

Information

Type: Articles
Copyright: © Cambridge University Press 2011
