
Extractive versus Generative Language Models for Political Conflict Text Classification

Published online by Cambridge University Press:  31 December 2025

Patrick T. Brandt*
Affiliation:
School of Economic, Political and Policy Sciences, University of Texas at Dallas, USA
Sultan Alsarra*
Affiliation:
Computer Science, King Saud University, Saudi Arabia
Vito D’Orazio
Affiliation:
Political Science, West Virginia University, USA
Dagmar Heintze
Affiliation:
School of Economic, Political and Policy Sciences, University of Texas at Dallas, USA
Latifur Khan
Affiliation:
Engineering and Computer Science, University of Texas at Dallas, USA
Shreyas Meher
Affiliation:
Erasmus School of Social and Behavioural Sciences, Erasmus University Rotterdam, Rotterdam, The Netherlands
Javier Osorio
Affiliation:
Department of Political Science, University of Arizona, USA
Marcus Sianan
Affiliation:
School of Economic, Political and Policy Sciences, University of Texas at Dallas, USA
Corresponding authors: Patrick T. Brandt; Email: pbrandt@utdallas.edu; Sultan Alsarra; Email: salsarra@ksu.edu.sa

Abstract

We review our ConfliBERT language model (Hu et al. 2022 [ConfliBERT: A Pre-Trained Language Model for Political Conflict and Violence]) for processing texts about political conflict and violence. When fine-tuned, ConfliBERT achieves superior accuracy, precision, and recall within its relevant domains compared with other large language models (LLMs) such as Google’s Gemma 2 (9B), Meta’s Llama 3.1 (8B), and Alibaba’s Qwen 2.5 (14B). It is also hundreds of times faster than these more generalist LLMs. These results are illustrated using texts from the BBC, re3d, and the Global Terrorism Database. We demonstrate that open, fine-tuned models can outperform more general models in accuracy, precision, and recall, and at a fraction of the cost.
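The abstract's comparisons rest on standard binary-classification metrics. As a minimal illustrative sketch (the labels and predictions below are toy values, not the paper's data), accuracy, precision, recall, and the F1 score can be computed from the confusion-matrix counts as follows:

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy example: 1 = conflict-relevant text, 0 = not relevant
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Precision and recall trade off against each other as the classification cutoff moves, which is why the paper also reports ROC, precision–recall, and F1-versus-cutoff curves rather than a single operating point.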

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Political Methodology

Table 1 A comparison of extractive vs. generative LLMs across settings.

Table 2 Performance metrics for binary classifications of BBC texts.

Table 3 Performance metrics for named entity recognition of re3d texts.

Table 4 Performance metrics for ConfliBERT, Llama 3.1, and Gemma 2 models.

Figure 1 ROC and AUC for each LLM and event type. Note: Curves along the northwestern edge are better.

Figure 2 Precision–recall curves for each LLM and event type. Note: Curves along the northeastern edge are better.

Figure 3 $F_1$ scores across cutoffs for each event type model. Note: Higher curves are better.

Table 5 Model performance comparison (macro averages).

Table 6 Multi-label classification metrics.

Figure 4 Cumulative number of predicted events, 2017–2021, by type and model.

Table A1 Full per-class performance metrics for named entity recognition models for re3d.

Table D1 Overall performance metrics.

Table D2 Performance on rare vs. common attack types ($F_1$ score).

Supplementary material: Brandt et al. Dataset (link)