
Sentence encoding for Dialogue Act classification

Published online by Cambridge University Press:  02 November 2021

Nathan Duran*
Affiliation:
Department of Computer Science and Creative Technologies, University of the West of England, Coldharbour Ln, Bristol BS16 1QY, UK
Steve Battle
Affiliation:
Department of Computer Science and Creative Technologies, University of the West of England, Coldharbour Ln, Bristol BS16 1QY, UK
Jim Smith
Affiliation:
Department of Computer Science and Creative Technologies, University of the West of England, Coldharbour Ln, Bristol BS16 1QY, UK
*Corresponding author. E-mail: nathan.duran@uwe.ac.uk

Abstract

In this study, we investigate the process of generating single-sentence representations for the purpose of Dialogue Act (DA) classification, including several aspects of text pre-processing and input representation that are often overlooked or under-reported within the literature, such as the number of words to keep in the vocabulary or input sequences. We assess each of these with respect to two DA-labelled corpora, using a range of supervised models that represent those most frequently applied to the task. Additionally, we compare context-free word embedding models with transfer learning via pre-trained language models, including several based on the transformer architecture, such as Bidirectional Encoder Representations from Transformers (BERT) and XLNet, which have thus far not been widely explored for the DA classification task. Our findings indicate that these text pre-processing considerations have a statistically significant effect on classification accuracy. Notably, we found that viable input sequence lengths and vocabulary sizes can be much smaller than those typically used in DA classification experiments, yielding no significant improvements beyond certain thresholds. We also show that in some cases the contextual sentence representations generated by language models do not reliably outperform supervised methods, though BERT and its derivative models do represent a significant improvement over supervised approaches and over much of the previous work on DA classification.
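The abstract's central pre-processing choices, capping the vocabulary size and the input sequence length, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation; the function names, the OOV/padding index convention, and the parameter values (`max_vocab`, `max_len`) are all hypothetical.

```python
# Illustrative sketch of two pre-processing choices studied in the paper:
# keeping only the most frequent words, and truncating/padding utterances
# to a fixed length. All names and values here are hypothetical.
from collections import Counter

def build_vocab(tokenised_utterances, max_vocab=500):
    """Keep only the max_vocab most frequent words; reserve index 0 for
    padding and index 1 for out-of-vocabulary tokens."""
    counts = Counter(w for utt in tokenised_utterances for w in utt)
    most_common = [w for w, _ in counts.most_common(max_vocab)]
    return {w: i + 2 for i, w in enumerate(most_common)}

def encode(utterance, vocab, max_len=20):
    """Map words to indices (1 = OOV), then truncate/pad to max_len."""
    ids = [vocab.get(w, 1) for w in utterance][:max_len]
    return ids + [0] * (max_len - len(ids))

corpus = [["okay", "so", "you", "go", "left"],
          ["right", "okay"],
          ["do", "you", "see", "the", "lake"]]
vocab = build_vocab(corpus, max_vocab=10)
print(encode(["okay", "you", "swim"], vocab, max_len=6))  # -> [2, 3, 1, 0, 0, 0]
```

Fixing both caps lets every utterance be represented as an equal-length integer sequence, which is what the study varies when measuring the effect of vocabulary size and sequence length on accuracy.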

Information

Type
Article
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

Figure 1. A generic DA classification architecture, including the sentence encoding module (components 1-3), and example parameters (Sequence Length, Vocabulary, etc), additional context information (4), dimensionality reduction (5) and classification (6).
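The caption's generic architecture (tokenise, embed, encode into a fixed-length sentence vector, classify) can be sketched in miniature. This is a toy stand-in, not any model from the paper: the random embeddings and classifier weights are placeholders for the trained CNN/RNN encoders and classifiers the study actually evaluates, and the label set is hypothetical.

```python
# Toy sketch of the pipeline in Figure 1 (components 1-3 and 6):
# tokenise, look up word vectors, mean-pool into a sentence vector,
# then score against each DA label. Random vectors stand in for
# trained embeddings and classifier weights.
import random

random.seed(0)
DIM = 8
LABELS = ["statement", "question", "backchannel"]  # hypothetical label set
embeddings = {}  # word -> vector; placeholder for trained embeddings

def embed(word):
    if word not in embeddings:
        embeddings[word] = [random.uniform(-1, 1) for _ in range(DIM)]
    return embeddings[word]

def encode_sentence(utterance):
    """Mean-pool word vectors into one fixed-length sentence vector."""
    vectors = [embed(w) for w in utterance.lower().split()]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(utterance, weights):
    """Dot each label's weight vector with the sentence vector; argmax."""
    sent = encode_sentence(utterance)
    scores = {lab: sum(s * w for s, w in zip(sent, weights[lab]))
              for lab in LABELS}
    return max(scores, key=scores.get)

weights = {lab: [random.uniform(-1, 1) for _ in range(DIM)] for lab in LABELS}
print(classify("do you see the lake", weights))
```

The point of the sketch is the modular shape: the sentence-encoding step (components 1-3) produces a fixed-length vector regardless of utterance length, so the downstream classifier (component 6) can be swapped independently, which is what allows the study to compare encoders while holding the rest of the pipeline fixed.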


Table 1. Overview of the SwDA and Maptask corpora used throughout this study


Table 2. Validation accuracy for the letter case and punctuation experiments


Table 3. TextCNN averaged F1 scores for the three most frequent labels (sd, b and sv), and all question-type labels (Tag-Question does not appear), in the SwDA test set.


Table 4. Vocabulary size which produced the best validation accuracy for each model on the SwDA and Maptask data


Figure 2. Maptask validation accuracy for all supervised models with different vocabulary sizes. Vertical lines show the mean word occurrence per vocabulary range (for the first 100 words the mean frequency = 1268; for words 100 to 200 the mean frequency = 162).


Table 5. Input sequence length which produced the best validation accuracy for each model on the SwDA and Maptask data


Figure 3. SwDA validation accuracy for all supervised models with different sequence lengths. Vertical lines are the cumulative sum of utterances up to a given length.


Table 6. Vocabulary size and sequence length group which produced the best validation accuracy for each model on the SwDA and Maptask data


Table 7. Embedding type and dimension which produced the best validation accuracy for each model on the SwDA and Maptask data


Figure 4. The DCNN model’s SwDA validation accuracies for all embedding type and dimension combinations.


Table 8. Validation accuracy for 1-, 2- and 3-layer recurrent models on the SwDA and Maptask data


Table 9. Test set accuracy for each of the supervised models on the SwDA and Maptask data


Table 10. Validation set accuracy and test set accuracy for each of the pre-trained language models on the SwDA and Maptask data


Table A.1. Parameters for the (base) supervised and language models