
PAPEA: A modular pipeline for the automation of protest event analysis

Published online by Cambridge University Press:  23 June 2025

Sebastian Haunss*
Affiliation:
Research Institute Social Cohesion, University of Bremen, Bremen, Germany
Priska Daphi
Affiliation:
Institute of Sociology, Justus-Liebig University, Giessen, Germany
Jan Matti Dollbaum
Affiliation:
Department of European Studies and Slavic Studies, Université de Fribourg, Switzerland; Geschwister-Scholl-Institute for Political Science, Ludwig Maximilian University of Munich, Munich, Germany
Lidiya Hristova
Affiliation:
Research Institute Social Cohesion, University of Bremen, Bremen, Germany
Pál Susánszky
Affiliation:
Research Institute Social Cohesion, University of Bremen, Bremen, Germany
Elias Steinhilper
Affiliation:
Consensus and Conflict Department, German Centre for Integration and Migration Research (DeZIM), Berlin, Germany
*Corresponding author: Sebastian Haunss; Email: sebastian.haunss@uni-bremen.de

Abstract

Protest event analysis (PEA) is the core method for understanding the spatial patterns and temporal dynamics of protest. We show how large language models (LLMs) can be used to automate the classification of protest events, and of political event data more broadly, with levels of accuracy comparable to those of human annotators, while reducing the necessary annotation time by several orders of magnitude. We propose a modular pipeline for the automation of PEA (PAPEA) based on fine-tuned LLMs and provide publicly available models and tools that can be easily adapted and extended. PAPEA makes it possible to get from newspaper articles to PEA datasets with high levels of precision and without human intervention. A use case based on a large German news corpus illustrates the potential of PAPEA.

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of EPS Academic Ltd.

Figure 1. The complete pipeline.

Figure 2. Relevance classification, distribution of predicted probabilities.

Table 1. Frequency of protest forms in the ProLoc data set

Table 2. Evaluation of form prediction

Table 3. Evaluation of topic prediction

Figure 3. Distribution of automatically annotated articles containing protest events, Germany 2019.

Note: Kernel bandwidth chosen to maximize information while smoothing over seasonality (drop in articles every Sunday).

Figure 4. Topics addressed in automatically annotated protest events, Germany 2019.

Supplementary material: File

Haunss et al. supplementary material