Reliable uncertainty estimation in emotion recognition in conversation using conformal prediction framework

Published online by Cambridge University Press:  30 October 2024

Samad Roohi*
Affiliation:
Computer Science and Information Technology, La Trobe University, Melbourne, Victoria, Australia
Richard Skarbez
Affiliation:
Computer Science and Information Technology, La Trobe University, Melbourne, Victoria, Australia
Hien Duy Nguyen
Affiliation:
Department of Mathematical and Physical Science, La Trobe University, Melbourne, Victoria, Australia; Institute of Mathematics for Industry, Kyushu University, Fukuoka, Fukuoka Prefecture, Japan
*Corresponding author: Samad Roohi; Email: s.roohi@latrobe.edu.au

Abstract

Emotion recognition in conversation (ERC) faces two major challenges: biased predictions and poor calibration. Classifiers often disproportionately favor certain emotion categories, such as neutral, due to the structural complexity of classifiers, the subjective nature of emotions, and imbalances in training datasets. This bias results in poorly calibrated predictions, where the model’s predicted probabilities do not align with the true likelihood of outcomes. To tackle these problems, we apply conformal prediction (CP) to ERC tasks. CP is a distribution-free method that generates set-valued predictions to ensure marginal coverage in classification, thus improving the calibration of models. However, inherent biases in emotion recognition models prevent baseline CP from achieving uniform conditional coverage across all classes. We propose a novel CP variant, class spectrum conformation, which significantly reduces coverage bias in CP methods. The methodologies introduced in this study enhance the reliability of prediction calibration and mitigate bias in complex natural language processing tasks.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Illustration of emotion prediction using traditional and conformal prediction methods. The input into the predictive model $\hat{\pi }$ includes a context and a query text with the true label ‘sadness’. The traditional prediction method outputs a single prediction ‘neutral’, which is incorrect. On the other hand, the conformal prediction method provides a set of plausible emotions ‘sadness’, ‘neutral’, and ‘joy’ considering the specified confidence level, as indicated by the dashed line at $1-\hat{q}$.

Figure 2. Architecture of a RoBERTa model, fine-tuned for emotion recognition in conversation (ERC). This model can be replaced by any arbitrary classifier.

Figure 3. Schematic of the conformal prediction process. The softmax scores produced by the classifier are used to calculate nonconformity measures, which in turn establish a prediction threshold based on a calculated quantile. The final step produces the prediction set {2,4} as potential class labels with the predefined confidence level $\alpha$.
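
As a rough illustration of the pipeline in Figure 3, the sketch below implements standard split conformal prediction with the nonconformity score "one minus the softmax probability of the true class". The function names, the choice of score, and the treatment of `alpha` as the target miscoverage rate are assumptions made for illustration; the paper's own implementation may differ.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank in the sorted scores
    if rank > n:
        # Too few calibration points for this alpha: no finite threshold exists,
        # so every class will be included in the prediction set.
        return float("inf")
    return sorted(scores)[rank - 1]


def calibrate(cal_probs, cal_labels, alpha):
    """Compute the threshold q_hat from held-out calibration data.

    cal_probs: list of softmax vectors; cal_labels: list of true class indices.
    """
    # Nonconformity score (assumed): 1 - softmax probability of the true class.
    scores = [1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels)]
    return conformal_quantile(scores, alpha)


def predict_set(probs, q_hat):
    """Return every class whose nonconformity score does not exceed q_hat."""
    return {k for k, p in enumerate(probs) if 1.0 - p <= q_hat}
```

Under this score, a class enters the prediction set exactly when its softmax probability is at least $1-\hat{q}$, which corresponds to the dashed line drawn in Figure 1.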

Table 1. Accuracy of emotion classification on three selected emotional conversation datasets. Labels 0–6 for MELD are sadness, surprise, neutral, joy, anger, disgust, and fear; labels 0–6 for EmoWOZ are neutral, fearful (sad), dissatisfied, apologetic, abusive, excited, and satisfied; and labels 0–3 for EmoContext (EmoCX) are others, happy, sad, and angry.

Figure 4. Class distribution of the (a) MELD, (b) EmoWOZ, and (c) EmoContext datasets used in this paper, demonstrating imbalance across emotion categories.

Figure 5. Calibration plots (reliability diagrams) for the MELD, EmoWOZ, and EmoCX datasets. ECE and MCE values indicate poor calibration. The bars show accuracy per bin, and the blue line connects the average confidence per bin.
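
For readers unfamiliar with the two metrics named in Figure 5, the following minimal sketch computes expected calibration error (ECE) and maximum calibration error (MCE) from their usual binned definitions. The function name, the number of bins, and the equal-width half-open binning are assumptions for illustration and are not taken from the paper.

```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, MCE) using equal-width confidence bins.

    confidences: top-class predicted probabilities in [0, 1].
    correct: booleans, True where the top-class prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bin i covers [i / n_bins, (i + 1) / n_bins); confidence 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        gap = abs(accuracy - avg_conf)
        ece += (len(members) / n) * gap  # gap weighted by the bin's share of samples
        mce = max(mce, gap)              # worst gap over non-empty bins
    return ece, mce
```

In a reliability diagram, `accuracy` corresponds to the bar height in each bin and `avg_conf` to the point on the confidence line, so the per-bin gap is the vertical distance between them.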

Algorithm 1 Class Spectrum Conformal Prediction (CSCP)

Algorithm 2 Class Spectrum Adaptive Prediction Set (CS-APS)

Figure 6. Comparative analysis of prediction set coverage between (a) naive top-k prediction sets and (b) size-stratified prediction sets for CP. CP demonstrates consistently higher accuracy and superior performance across various set sizes compared to the naive method.
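
The comparison in Figure 6 can be approximated with two small helpers: one that builds a naive top-k set from a softmax vector, and one that reports coverage separately for each prediction set size. Both function names are hypothetical, and this is only a sketch of the general idea of size-stratified coverage, not the paper's evaluation code.

```python
from collections import defaultdict


def top_k_set(probs, k):
    """Naive prediction set: the k classes with the highest softmax probability."""
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return set(ranked[:k])


def size_stratified_coverage(pred_sets, labels):
    """Map each prediction set size to the fraction of sets of that size
    that contain the true label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred_set, label in zip(pred_sets, labels):
        size = len(pred_set)
        totals[size] += 1
        hits[size] += int(label in pred_set)
    return {size: hits[size] / totals[size] for size in sorted(totals)}
```

Coverage that stays close to the target level for every set size suggests that larger sets are indeed being produced for harder inputs.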

Figure 7. Calibration of conformal prediction at various confidence levels.

Figure 8. Comparative analysis of marginal coverage and average prediction set sizes for four approaches (LAC, CSCP, APS, CS-APS) across three datasets (MELD, EmoWOZ, and EmoCX). CS-APS outperformed the other methods in marginal coverage while keeping prediction sets efficiently small.
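
The two quantities plotted in Figure 8 follow directly from their definitions. The sketch below assumes that prediction sets are represented as Python sets of class indices; the function names are illustrative only.

```python
def marginal_coverage(pred_sets, labels):
    """Fraction of test points whose prediction set contains the true label."""
    covered = sum(1 for pred_set, label in zip(pred_sets, labels) if label in pred_set)
    return covered / len(labels)


def average_set_size(pred_sets):
    """Mean number of labels per prediction set (smaller means more efficient)."""
    return sum(len(pred_set) for pred_set in pred_sets) / len(pred_sets)
```

The two metrics trade off against each other: including every label in every set yields full coverage but uninformative predictions, so methods are compared on coverage and set size together.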

Figure 9. Class spectrum coverage analysis across MELD, EmoWOZ, and EmoCX datasets: comparing the performance of LAC, CSCP, APS, and CS-APS at confidence levels of 90% for MELD and 95% for EmoWOZ and EmoCX, highlighting the inferior performance of LAC and the superiority of CS-APS.
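
Class-wise coverage of the kind shown in Figure 9 can be computed by grouping test points by their true label, as in the sketch below. The function name and the set-based representation are assumptions for illustration, not the paper's code.

```python
from collections import defaultdict


def classwise_coverage(pred_sets, labels):
    """Map each true class to the fraction of its test points whose
    prediction set contains that class."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred_set, label in zip(pred_sets, labels):
        totals[label] += 1
        hits[label] += int(label in pred_set)
    return {label: hits[label] / totals[label] for label in sorted(totals)}
```

A method can meet its marginal coverage target while under-covering rare classes; that gap only becomes visible in this per-class breakdown.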

Figure 10. Class spectrum prediction set sizes: comparing LAC, CSCP, APS, and CS-APS with emphasis on LAC’s underestimation of uncertainty, CSCP’s overestimation, and the balanced adaptivity of APS and CS-APS.

Table 2. Results of four approaches on three selected emotional conversation datasets. Labels 0–6 for MELD are sadness, surprise, neutral, joy, anger, disgust, and fear; for EmoWOZ are neutral, fearful (sad), dissatisfied, apologetic, abusive, excited, and satisfied; and labels 0–3 for EmoContext are others, happy, sad, and angry

Table 3. Results of four approaches on three selected emotional conversation datasets. Labels 0–6 for MELD are sadness, surprise, neutral, joy, anger, disgust, and fear; for EmoWOZ are neutral, fearful (sad), dissatisfied, apologetic, abusive, excited, and satisfied; and labels 0–3 for EmoContext are others, happy, sad, and angry