
Less Annotating, More Classifying: Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI

Published online by Cambridge University Press:  09 June 2023

Moritz Laurer*, Wouter van Atteveldt, Andreu Casas, Kasper Welbers
Affiliation: Department of Communication Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands. Email: m.laurer@vu.nl, wouter.van.atteveldt@vu.nl, a.casassalleras@vu.nl, k.welbers@vu.nl
* Corresponding author: Moritz Laurer

Abstract

Supervised machine learning is an increasingly popular tool for analyzing large political text corpora. The main disadvantage of supervised machine learning is the need for thousands of manually annotated training data points. This issue is particularly important in the social sciences where most new research questions require new training data for a new task tailored to the specific research question. This paper analyses how deep transfer learning can help address this challenge by accumulating “prior knowledge” in language models. Models like BERT can learn statistical language patterns through pre-training (“language knowledge”), and reliance on task-specific data can be reduced by training on universal tasks like natural language inference (NLI; “task knowledge”). We demonstrate the benefits of transfer learning on a wide range of eight tasks. Across these eight tasks, our BERT-NLI model fine-tuned on 100 to 2,500 texts performs on average 10.7 to 18.3 percentage points better than classical models without transfer learning. Our study indicates that BERT-NLI fine-tuned on 500 texts achieves similar performance as classical models trained on around 5,000 texts. Moreover, we show that transfer learning works particularly well on imbalanced data. We conclude by discussing limitations of transfer learning and by outlining new opportunities for political science research.
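The abstract's core idea — reusing "task knowledge" by reformulating any classification problem as natural language inference — can be sketched as follows. Each candidate class becomes a textual hypothesis, and an NLI model scores whether the input text (the premise) entails it; the hypothesis template and scores below are illustrative, not the paper's exact wording or results.

```python
# Reformulate a standard classification task as NLI: each candidate
# label becomes a "hypothesis", and an NLI model scores whether the
# text (the "premise") entails it. The template is a common convention
# for this approach, not necessarily the one used in the paper.

def to_nli_pairs(text, labels, template="This text is about {}."):
    """Build one (premise, hypothesis) pair per candidate label."""
    return [(text, template.format(label)) for label in labels]

def classify(entailment_scores, labels):
    """Pick the label whose hypothesis received the highest entailment score."""
    best = max(range(len(labels)), key=lambda i: entailment_scores[i])
    return labels[best]

labels = ["environment", "economy", "health"]
pairs = to_nli_pairs("Parliament passed the new climate bill.", labels)
# pairs[0] == ("Parliament passed the new climate bill.",
#              "This text is about environment.")

# Entailment scores would come from a fine-tuned NLI model such as
# BERT-NLI; the numbers here are made up for illustration.
label = classify([0.91, 0.05, 0.04], labels)
# label == "environment"
```

Because the label set lives in the hypotheses rather than in a fixed output layer, the same NLI model can be applied to new tasks and new label sets without changing its architecture — which is what lets task knowledge transfer across the eight tasks studied.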

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of the Society for Political Methodology
Table 1 Examples of the NLI task.

Figure 1 Illustration of standard classification versus universal NLI classification.

Table 2 Key political datasets used in the analysis.

Figure 2 Average performance across eight tasks versus training data size. The “classical-best” lines display the results from either the SVM or logistic regression, whichever is better. Note that four datasets contain more than 2,500 data points (see Figure 3).

Figure 3 Performance per task versus training data size (F1 Macro).

Supplementary material: File
Laurer et al. supplementary material (File, 1.4 MB)

Supplementary material: Link
Laurer et al. Dataset