Multi-modal sensing and analysis of poster conversations with smart posterboard

  • Tatsuya Kawahara, Takuma Iwatate, Koji Inoue, Soichiro Hayashi, Hiromasa Yoshimoto and Katsuya Takanashi
Abstract

Conversations in poster sessions at academic events, referred to as poster conversations, pose interesting and challenging problems in multi-modal signal and information processing. We have developed a smart posterboard for multi-modal recording and analysis of poster conversations. The smart posterboard is equipped with multiple sensing devices to record poster conversations, so that we can review who came to the poster and what kinds of questions or comments they made. The conversation analysis incorporates face and eye-gaze tracking for effective speaker diarization. It is demonstrated that eye-gaze information is useful for predicting turn-taking and also for improving speaker diarization. Moreover, high-level indexing of the audience's interest and comprehension level is explored based on their multi-modal behaviors during the conversation. This is realized by predicting the audience's speech acts, such as questions and reactive tokens.
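The audio-visual speaker diarization mentioned above can be illustrated with a toy sketch: per-participant audio activity scores are fused with a gaze-derived prior, under the simplifying assumption that a participant gazing at the presenter is more likely to be taking the floor. The function name, weights, and prior values here are hypothetical illustrations for exposition, not the actual model described in the article.

```python
import numpy as np

def diarize_frame(audio_scores, gaze_at_presenter, alpha=0.7):
    """Toy audio-visual fusion for one frame (hypothetical sketch).

    audio_scores      : per-participant speech-activity scores in [0, 1]
    gaze_at_presenter : per-participant booleans; True if the participant
                        is currently gazing at the presenter
    alpha             : weight on the audio evidence (illustrative value)

    Returns the index of the participant labeled as the active speaker.
    """
    # Gaze-derived prior: gazing at the presenter raises the chance that
    # this participant holds (or is taking) the turn. Values are made up.
    gaze_prior = np.where(gaze_at_presenter, 0.8, 0.2)
    # Linear fusion of audio evidence and gaze prior.
    fused = alpha * audio_scores + (1.0 - alpha) * gaze_prior
    return int(np.argmax(fused))
```

In practice a system like the one described would aggregate such evidence over time (e.g., with direction-of-arrival estimates and tracked head pose) rather than deciding frame by frame; this sketch only shows the shape of the fusion.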

Copyright
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Corresponding author: T. Kawahara (kawahara@i.kyoto-u.ac.jp)
APSIPA Transactions on Signal and Information Processing
  • ISSN: 2048-7703
  • EISSN: 2048-7703