
When Correlation Is Not Enough: Validating Populism Scores from Supervised Machine-Learning Models

Published online by Cambridge University Press: 09 January 2023

Michael Jankowski*
Affiliation: Institute for Social Sciences, Carl von Ossietzky University Oldenburg, Ammerländer Heerstraße 114-118, 26111 Oldenburg, Germany. Email: michael.jankowski@uol.de
Robert A. Huber
Affiliation: Department of Political Science, University of Salzburg, Rudolfskai 42, 5020 Salzburg, Austria. Email: robertalexander.huber@plus.ac.at
*Corresponding author: Michael Jankowski

Abstract

Despite the ongoing success of populist parties in many parts of the world, we lack comprehensive information about parties’ level of populism over time. A recent contribution to Political Analysis by Di Cocco and Monechi (DCM) suggests that this research gap can be closed by predicting parties’ populism scores from their election manifestos using supervised machine learning. In this paper, we provide a detailed discussion of the suggested approach. Building on recent debates about the validation of machine-learning models, we argue that the validity checks provided in DCM’s paper are insufficient. We conduct a series of additional validity checks and empirically demonstrate that the approach is not suitable for deriving populism scores from texts. We conclude that measuring populism over time and between countries remains an immense challenge for empirical research. More generally, our paper illustrates the importance of more comprehensive validations of supervised machine-learning models.
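The abstract describes the approach under scrutiny: a classifier is trained on manifesto texts labelled by party type, and its predicted probabilities are turned into a populism score. The sketch below illustrates only that general logic and is not DCM's pipeline. It swaps in a two-class multinomial Naive Bayes model (Table 1 refers to a Random Forest model), uses a crude whitespace tokenizer, and the function names, the convention that every text inherits its party's label, and the averaging of sentence probabilities into a party score are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately crude assumption: lower-case and split on whitespace.
    return text.lower().split()


def train_nb(docs: list[str], labels: list[int], alpha: float = 1.0) -> dict:
    """Fit a two-class multinomial Naive Bayes model with Laplace smoothing.

    labels: 1 = text from a party labelled populist, 0 = any other party
    (a hypothetical labelling convention for this sketch).
    """
    if set(labels) != {0, 1}:
        raise ValueError("both classes (0 and 1) must appear in the training labels")
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    vocab = set(counts[0]) | set(counts[1])
    log_prior, log_lik = {}, {}
    for y in (0, 1):
        log_prior[y] = math.log(labels.count(y) / len(labels))
        total = sum(counts[y].values()) + alpha * len(vocab)
        log_lik[y] = {w: math.log((counts[y][w] + alpha) / total) for w in vocab}
    return {"log_prior": log_prior, "log_lik": log_lik}


def populist_probability(model: dict, text: str) -> float:
    """Posterior probability that `text` belongs to the 'populist' class."""
    score = dict(model["log_prior"])
    for word in tokenize(text):
        if word in model["log_lik"][1]:  # words unseen in training are skipped
            for y in (0, 1):
                score[y] += model["log_lik"][y][word]
    # Numerically stable two-class softmax over the log scores.
    m = max(score.values())
    p0, p1 = math.exp(score[0] - m), math.exp(score[1] - m)
    return p1 / (p0 + p1)


def manifesto_score(model: dict, sentences: list[str]) -> float:
    """Hypothetical party-level score: mean predicted probability over sentences."""
    return sum(populist_probability(model, s) for s in sentences) / len(sentences)


if __name__ == "__main__":
    # Invented toy snippets; real inputs would be full manifesto texts.
    train_docs = [
        "the corrupt elite betrays the people",
        "the people must take back control",
        "we propose a balanced budget",
        "we will invest in rail infrastructure",
    ]
    model = train_nb(train_docs, [1, 1, 0, 0])
    print(manifesto_score(model, ["the elite ignores the people", "we invest in rail"]))
```

In a setup like this, the score is only as meaningful as the labels the classifier learns from: if texts are attached to the wrong party labels, the model still returns well-defined probabilities, so the output alone does not reveal the error.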

Information

Type: Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of the Society for Political Methodology

Table 1 Top-five most important features of the Random Forest model for each country.


Table 2 Example of coding errors for Austrian election manifestos.


Figure 1 Populism scores of Di Cocco and Monechi (2022a) and V-Party for the four main parties in Austria. Note: The party names and years in the figure identify the labels used by DCM for training the machine-learning models in Austria. The labels are incorrectly assigned to the manifesto content (see Section A5 of the Supplementary Material).


Figure 2 Scores for party manifestos (a) and manifesto labels (b) in Germany based on 500 reshuffling analyses.


Figure 3 Correlation of “populism” scores from reshuffled text corpora with expert positions (V-Party and POPPA). Note: The dashed lines display the correlation between DCM’s populism scores and the respective expert surveys.
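The comparison behind Figure 3 can be sketched as follows, assuming the reshuffled score vectors have already been produced; the paper's reshuffling procedure itself is documented in the article and its supplementary material and is not reproduced here. The function names are hypothetical, and Pearson's r is assumed as the correlation measure. The sketch computes the correlation of the original scores with an expert measure, the correlation of each reshuffled run, and the share of reshuffled runs that correlate at least as strongly as the original scores.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def reshuffle_benchmark(
    expert: list[float],
    observed: list[float],
    reshuffled_runs: list[list[float]],
) -> tuple[float, list[float], float]:
    """Return (observed r, reshuffled rs, share of reshuffled rs >= observed r).

    `expert` holds expert ratings per party (e.g. from V-Party or POPPA),
    `observed` the model's scores for the same parties in the same order,
    and each entry of `reshuffled_runs` the scores from one reshuffled corpus.
    """
    observed_r = pearson(observed, expert)
    null_rs = [pearson(run, expert) for run in reshuffled_runs]
    share = sum(r >= observed_r for r in null_rs) / len(null_rs)
    return observed_r, null_rs, share
```

If `share` is large, scores from reshuffled corpora track the expert ratings about as well as the original scores do, so a high correlation by itself says little about whether the model measures populism.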

Supplementary material

Jankowski and Huber Dataset (external link)

Jankowski and Huber supplementary material: Appendix (PDF, 10.1 MB)