
Engagement recognition by a latent character model based on multimodal listener behaviors in spoken dialogue

Published online by Cambridge University Press:  12 September 2018

Koji Inoue*
Affiliation:
Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, Japan
Divesh Lala
Affiliation:
Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, Japan
Katsuya Takanashi
Affiliation:
Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, Japan
Tatsuya Kawahara
Affiliation:
Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, Japan
*
Corresponding author: Koji Inoue Email: inoue@sap.ist.i.kyoto-u.ac.jp

Abstract

Engagement represents how much a user is interested in and willing to continue the current dialogue. Engagement recognition provides an important clue for dialogue systems to generate adaptive behaviors for the user. This paper addresses engagement recognition based on multimodal listener behaviors: backchannels, laughing, head nodding, and eye gaze. In engagement annotation, the ground-truth labels often differ from one annotator to another because the perception of engagement is subjective. To address this, we assume that each annotator has a latent character that affects his or her perception of engagement, and we propose a hierarchical Bayesian model that estimates both engagement and the character of each annotator as latent variables. Furthermore, we integrate the engagement recognition model with automatic detection of the listener behaviors to realize online engagement recognition. Experimental results show that the proposed model improves recognition accuracy compared with methods that do not consider annotator character, such as majority voting. We also achieve online engagement recognition without degrading accuracy.
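The latent character model described in the abstract can be read as a generative process: each annotator draws a latent character, and each character has its own probability of perceiving a given listener-behavior pattern as "engaged". The sketch below illustrates that process only; the annotator names, the number of characters K, and all probability values are invented for illustration and are not the paper's estimated parameters.

```python
import random

random.seed(0)

# Hypothetical toy setup: K latent characters, four listener behaviors.
K = 2
patterns = ["backchannel", "laugh", "nod", "gaze"]

# pi[a][k]: probability that annotator a has character k (cf. Fig. 5).
# These values are made up for illustration.
pi = {"A1": [0.9, 0.1], "A2": [0.2, 0.8]}

# theta[k][p]: probability that character k perceives behavior pattern p
# as "engaged" (cf. Fig. 6). Also made-up values.
theta = [
    {"backchannel": 0.6, "laugh": 0.9, "nod": 0.7, "gaze": 0.5},  # character 0
    {"backchannel": 0.3, "laugh": 0.7, "nod": 0.4, "gaze": 0.2},  # character 1
]

def annotate(annotator, pattern):
    """Sample one engagement label (1 = engaged) from the generative process."""
    k = random.choices(range(K), weights=pi[annotator])[0]  # draw latent character
    return 1 if random.random() < theta[k][pattern] else 0  # character-biased perception

labels = [annotate("A1", "laugh") for _ in range(1000)]
print(sum(labels) / len(labels))  # expected near 0.9*0.9 + 0.1*0.7 = 0.88
```

In the paper the direction is inverted: the labels are observed, and the character distributions and engagement are inferred as latent variables with a hierarchical Bayesian model, rather than sampled forward as above.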

Information

Type
Original Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2018
Figures and tables

Fig. 1. Scheme of engagement recognition.

Fig. 2. Setup for dialogue collection.

Fig. 3. Inter-annotator agreement scores (Cohen's kappa) for each pair of annotators.
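The agreement in Fig. 3 is measured with Cohen's kappa, which discounts observed agreement by the agreement expected from each annotator's label rates alone. A minimal, self-contained computation for two annotators' binary labels (the example label sequences are invented):

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' binary (0/1) label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    p1a, p1b = sum(a) / n, sum(b) / n           # each annotator's rate of label 1
    pe = p1a * p1b + (1 - p1a) * (1 - p1b)      # agreement expected by chance
    return (po - pe) / (1 - pe)

ann1 = [1, 1, 0, 1, 0, 1, 0, 0]
ann2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(cohens_kappa(ann1, ann2))  # 6/8 observed, 0.5 by chance -> kappa 0.5
```

Kappa is 1 for perfect agreement, 0 for chance-level agreement, and can be negative when annotators agree less often than chance.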

Table 1. Number of times each behavior was selected by the annotators as meaningful.

Table 2. Relationship between the occurrence of each behavior and the annotated engagement (1: occurred / engaged, 0: not occurred / not engaged).

Table 3. Accuracy of each annotator's engagement labels when the reference labels of each behavior are used.

Fig. 4. Graphical model of the latent character model.

Table 4. Engagement recognition accuracy (K is the number of characters).

Table 5. Engagement recognition accuracy with online processing.

Table 6. Recognition accuracy of the proposed method when each behavior is excluded.

Fig. 5. Estimated parameter values of the character distribution (each value is the probability that an annotator has a given character).

Table 7. Annotators clustered by character distribution, with averaged in-cluster agreement scores.

Table 8. Averaged agreement scores between clusters.

Fig. 6. Estimated parameter values of the engagement distribution (each value is the probability that a behavior pattern is perceived as engaged by a given character; the number in parentheses next to each behavior pattern is its frequency in the corpus).

Table 9. Regression weights for mapping Big Five scores to the character distribution.

Fig. 7. Real-time engagement visualization tool.