
Using Multilingual Language Technology to Classify Open-Ended Survey Responses: Conceptions of Democracy in a Cross-Cultural Survey Setting

Published online by Cambridge University Press:  04 May 2026

Stefan Dahlberg*
Affiliation:
Department of Humanities and Social Sciences, Mid Sweden University, Sweden; Department of Government, University of Bergen, Norway
Luise Dürlich
Affiliation:
Department of Linguistics and Philology, Uppsala University, Sweden; RISE Research Institutes of Sweden AB, Sweden
Sofia Axelsson
Affiliation:
Department of Political Science, University of Gothenburg, Sweden
Yahui Zhao
Affiliation:
Centrum voor Wiskunde en Informatica, Netherlands
Joakim Nivre
Affiliation:
Department of Linguistics and Philology, Uppsala University, Sweden; RISE Research Institutes of Sweden AB, Sweden
Corresponding author: Stefan Dahlberg; Email: stefan.dahlberg@miun.se

Abstract

Recent advancements in language technology have opened new avenues in political science for automating and improving survey data analysis across diverse cultural contexts. This article examines the effectiveness of language models (LMs) in analyzing open-ended survey responses about democracy from ten countries, contrasting these modern tools with traditional survey methodologies. Utilizing a predefined coding scheme and a subset of pre-annotated survey data, it assesses the performance of fine-tuning pre-trained LMs in a multilingual setting to classify text spans. The findings suggest that LMs can capture democratic perceptions and handle data abstractions at levels comparable to human annotators. This study not only highlights the potential of LMs to transform political science research by augmenting traditional methods but also discusses the practical applications of pre-trained LMs in classifying complex survey responses, in collaboration with human annotators.

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of The Society for Political Methodology
Figure 1 Number of responses per question (Q1 and Q2) and country/language.

Figure 2 Inter-annotator agreement at Democracy Tree level 1. Blue bars correspond to examples with high enough agreement to be included for further use, while red bars show examples selected for consolidation by both annotators in each language.

Figure 3 Example of a response annotated as spans (top row) and in BIO token classification format (bottom row).
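The BIO token classification format mentioned in the caption of Figure 3 can be illustrated with a minimal sketch. The tokens, span boundaries, and category labels below are hypothetical examples, not taken from the article's data or coding scheme:

```python
def spans_to_bio(tokens, spans):
    """Convert labeled spans over a tokenized response into BIO tags.

    Each span is (start_token, end_token_exclusive, label); tokens
    outside any span receive the "O" (outside) tag.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # continuation tokens
    return tags

tokens = ["free", "and", "fair", "elections", "and", "free", "speech"]
spans = [(0, 4, "ELECTIONS"), (5, 7, "SPEECH")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
```

Framing span annotation as per-token BIO labels is what allows a pre-trained LM with a token classification head to be fine-tuned directly on the annotated responses.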

Table 1 Annotation results as category-wise F1-score with monolingual models

Table 2 Annotation results as category-wise F1-score with multilingual models

Table 3 Annotation results as category-wise F1-score with cross-lingual models

Figure 4 Category proportions in the test set according to human annotation and best model predictions.

Table 4 Pearson correlation between predicted proportions by the best models for each language and human annotation on the test set
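The comparison in Table 4 amounts to computing a Pearson correlation between two vectors of category proportions, one from model predictions and one from human annotation. A minimal sketch with made-up proportions (the numbers below are illustrative, not the article's results):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical category proportions over five categories.
model = [0.30, 0.25, 0.20, 0.15, 0.10]   # best model predictions
human = [0.28, 0.27, 0.18, 0.17, 0.10]   # human annotation
print(round(pearson(model, human), 3))
```

A correlation near 1 indicates that the model recovers the same distribution of democracy conceptions as the human annotators, even if individual responses are occasionally misclassified.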

Table 5 Absolute difference between proportions based on automatic and human annotation

Table 6 Pearson correlation between predicted proportions by cross-lingual (leave-one-out) models and human annotation on the test set

Figure 5 Category proportions by country as estimated by the multilingual model.

Table 7 Logistic regression results: Nativism and conceptions of democracy

Supplementary material

Dahlberg et al. supplementary material (File, 329.3 KB)
Dahlberg et al. Dataset (Link)