
Topic-aware probing: From sentence length prediction to idiom identification, how reliant are neural language models on topic?

Published online by Cambridge University Press:  25 October 2024

Vasudevan Nedumpozhimana*
Affiliation:
ADAPT Research Centre, Technological University Dublin, Dublin, Ireland
John D. Kelleher
Affiliation:
ADAPT Research Centre, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
Corresponding author: Vasudevan Nedumpozhimana; Email: vasudevan.nedumpozhimana@tudublin.ie

Abstract

Transformer-based neural language models achieve state-of-the-art performance on a variety of natural language processing tasks. However, it remains an open question to what extent these models rely on word-order/syntactic information versus word co-occurrence/topic-based information when processing natural language. This work contributes to that debate by asking whether these models primarily use topic as a signal. We explore the relationship between the performance of Transformer-based models (BERT and RoBERTa) on a range of probing tasks in English, from simple lexical tasks such as sentence length prediction to complex semantic tasks such as idiom token identification, and the sensitivity of these tasks to topic information. To this end, we propose a novel probing method that we call topic-aware probing. Our initial results indicate that Transformer-based models encode both topic and non-topic information in their intermediate layers, but also that their facility for distinguishing idiomatic usage is primarily based on their ability to identify and encode topic. Furthermore, our analysis of these models' performance on other standard probing tasks suggests that tasks that are relatively insensitive to topic information are also relatively difficult for these models.
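The core idea of topic-aware probing — train a probe only on sentences from "seen" topics, then compare its AUC ROC on seen-topic versus unseen-topic test data — can be sketched as follows. This is a hypothetical, minimal illustration, not the authors' pipeline: the topic model (`LatentDirichletAllocation`), probe classifier (`LogisticRegression`), and the `topic_aware_probe` helper are all assumptions made for the sketch, and the paper's handling of tail topics and multiple topic models is omitted.

```python
# Minimal sketch of the topic-aware probing idea (hypothetical implementation;
# the paper's exact topic models, probes, and splits differ).
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def topic_aware_probe(sentences, embeddings, labels, n_topics=10, seed=0):
    """Return (seen_auc, unseen_auc) for a probe trained on seen-topic data only."""
    # 1. Fit a topic model and assign each sentence its dominant topic.
    counts = CountVectorizer().fit_transform(sentences)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    dominant = lda.fit_transform(counts).argmax(axis=1)

    # 2. Hold out a random subset of topics as "unseen".
    rng = np.random.default_rng(seed)
    unseen_topics = set(rng.permutation(n_topics)[: n_topics // 2])
    unseen_mask = np.array([t in unseen_topics for t in dominant])

    # 3. Train the probe only on sentences whose dominant topic is "seen".
    X, y = np.asarray(embeddings), np.asarray(labels)
    seen_idx = np.where(~unseen_mask)[0]
    rng.shuffle(seen_idx)
    half = len(seen_idx) // 2
    train_idx, seen_test_idx = seen_idx[:half], seen_idx[half:]
    probe = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])

    # 4. Score on seen-topic and unseen-topic test data; a large gap
    #    suggests the probe is relying on topic signal.
    seen_auc = roc_auc_score(y[seen_test_idx],
                             probe.predict_proba(X[seen_test_idx])[:, 1])
    unseen_auc = roc_auc_score(y[unseen_mask],
                               probe.predict_proba(X[unseen_mask])[:, 1])
    return seen_auc, unseen_auc
```

In this framing, a task whose probe scores drop sharply from seen to unseen topics is topic-sensitive; a task whose scores are stable is one the probe can solve from non-topic information in the embeddings.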

Information

Type: Article

Creative Commons licence: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.

Copyright: © The Author(s), 2024. Published by Cambridge University Press

Table 1. Sample input and corresponding target output for the general idiom token identification task

Figure 1. Topic-aware probing method.

Table 2. Number of tail topics from 10 topic models on the Bigram Shift and VNIC datasets

Figure 2. Experimental design.

Table 3. Average seen and unseen AUC ROC scores and their differences along with standard deviations for different embeddings on the Bigram Shift Probing task and the General Idiom Token Identification task

Figure 3. Seen and Unseen AUC ROC scores from GloVe and different layers of BERT and RoBERTa on the Bigram Shift Task.

Figure 4. Seen and Unseen AUC ROC scores from different layers of BERT and RoBERTa with GloVe baseline on General Idiom Token Identification Task.

Figure 5. Difference between seen scores and unseen scores from different layers of BERT and RoBERTa on General Idiom Token Identification Task.

Table 4. Descriptions and summary statistics of the datasets for the VNIC, Bigram shift, and 8 other probing tasks

Table 5. Number of tail topics from 10 topic models on the datasets of the 8 other probing tasks

Figure 6. Seen and Unseen AUC ROC scores from different layers of BERT with GloVe baseline on Probing Tasks.

Figure 7. Seen and Unseen AUC ROC scores from different layers of RoBERTa with GloVe baseline on Probing Tasks.

Table 6. Average seen and unseen AUC ROC scores and their differences for GloVe and best BERT and RoBERTa layer embeddings on different probing tasks—tasks are ranked in the descending order of the difference between GloVe Seen score and GloVe Unseen score

Figure 8. GloVe Seen Score versus GloVe Score Difference (Task Topic Sensitivity) for each probing task (note that the scores of SOMO and CI are very similar and therefore overlap in the plot).

Figure 9. BERT and RoBERTa Seen Score versus GloVe Score Difference (Task Topic Sensitivity) for each probing task.