
Bootstrapping spoken dialogue systems by exploiting reusable libraries

Giuseppe Di Fabbrizio, Gokhan Tur, Dilek Hakkani-Tür, Mazin Gilbert, Bernard Renger, David Gibbon, Zhu Liu and Behzad Shahraray
Abstract

Building natural language spoken dialogue systems requires large amounts of human-transcribed and labeled speech utterances to reach useful operational performance. Furthermore, the design of such complex systems involves several manual steps. The User Experience (UE) expert analyzes and defines by hand the system's core functionalities: its semantic scope (call-types) and the dialogue manager strategy that drives the human–machine interaction. This approach is labor-intensive and error-prone, since it involves several nontrivial design decisions that can be evaluated only after the system is actually deployed. Moreover, scalability is limited by the time, cost, and high level of UE know-how needed to reach a consistent design. We propose a novel approach for bootstrapping spoken dialogue systems based on the reuse of existing transcribed and labeled data, common reusable dialogue templates, generic language and understanding models, and a consistent design process. We demonstrate that our approach reduces design and development time while providing an effective system without any application-specific data.

Natural Language Engineering (ISSN: 1351-3249; EISSN: 1469-8110)