Chasing the authoritarian spectre: Detecting authoritarian discourse with large language models

Published online by Cambridge University Press:  02 January 2026

Michal Mochtak*
Affiliation:
Department of Political Science, Radboud University, The Netherlands
*Address for correspondence: Michal Mochtak, Department of Political Science, Radboud University, Heyendaalseweg 141, 6525 AJ Nijmegen, the Netherlands. Email: michal.mochtak@ru.nl

Abstract

The paper introduces a deep‐learning model fine‐tuned for detecting authoritarian discourse in political speeches. Set up as a regression problem with weak supervision logic, the model is trained to classify segments of text as associated or not associated with authoritarian discourse. Rather than trying to define what authoritarian discourse is, the model builds on the assumption that authoritarian leaders inherently define it. In other words, authoritarian leaders talk like authoritarians. When this discourse is combined with the discourse defined by democratic leaders, the model learns which instances are more often associated with authoritarians and which with democrats. The paper discusses several evaluation tests of the model and argues for its usefulness in a broad range of research problems. It presents a new methodology for studying latent political concepts and positions as an alternative to more traditional research strategies.
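
The weak supervision logic described above can be illustrated with a minimal sketch. It is not the author's pipeline: it assumes that each speech carries a single regime score for the speaker's country and year (here called `edi`, on a 0 to 1 scale), that segments are sliding windows of three consecutive sentences, and that a predicted score is turned into a class with a cut-off of 0.5. The names `Speech`, `weakly_label` and `to_binary`, the naive sentence splitter, the window shape and the threshold are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Speech:
    country: str
    year: int
    text: str
    edi: float  # assumed: regime score in [0, 1] for the speaker's country-year


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_trigrams(sentences: list[str]) -> list[str]:
    # Sliding window of three consecutive sentences (window shape is an assumption).
    return [" ".join(sentences[i:i + 3]) for i in range(len(sentences) - 2)]


def weakly_label(speech: Speech) -> list[tuple[str, float]]:
    # Weak supervision: every segment inherits the speech-level score,
    # so no segment is hand-coded as authoritarian or democratic.
    segments = sentence_trigrams(split_sentences(speech.text))
    return [(segment, speech.edi) for segment in segments]


def to_binary(score: float, threshold: float = 0.5) -> int:
    # Hypothetical cut-off: 1 = closer to authoritarian, 0 = closer to democratic.
    return 1 if score < threshold else 0


if __name__ == "__main__":
    speech = Speech("Exampleland", 2001, "A one. B two! C three? D four.", edi=0.2)
    for segment, score in weakly_label(speech):
        print(score, to_binary(score), segment)
```

The resulting (segment, score) pairs would serve as regression targets for fine-tuning, and the thresholding step shows one way in which a continuous prediction could be read as a binary classification.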

Information

Type
Research Article
Creative Commons
Creative Commons License: CC BY-NC-ND
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © 2024 The Author(s). European Journal of Political Research published by John Wiley & Sons Ltd on behalf of European Consortium for Political Research.

Figure 1. Distribution of speeches per geopolitical region in the UNGD corpus. Note: AP (Asia‐Pacific); EE&CA (Eastern Europe and Central Asia); LA&C (Latin America and the Caribbean); MENA (Middle East and North Africa); SSA (Sub‐Saharan Africa); WE&NA (Western Europe and North America); see the full description of these geopolitical regions in the V‐Dem codebook v13 (Coppedge et al., 2023).

Figure 2. Distribution of EDI scores across the corpus of sentence trigrams (UNGD corpus).

Figure 3. Confusion matrix with binary classification results on evaluation dataset #1 (M&S Corpus). Note: Numbers in the confusion matrix depict counts of classified documents per category.

Figure 4. Visualization of predicted and real EDI scores (M&S corpus).

Figure 5. Confusion matrix with binary classification results on evaluation dataset #2. Note: Numbers in the confusion matrix depict counts of documents per category.

Figure 6. Visualization of predicted and real EDI scores (IA corpus).

Supplementary material: File

Mochtak supplementary material

File 7.3 MB