
Fine-tuned large language models can replicate expert coding better than trained coders: a study on informative signals sent by interest groups

Published online by Cambridge University Press:  13 February 2026

Dahyun Choi*
Affiliation:
Department of Politics, Princeton University, Princeton, NJ, USA
Denis Peskoff
Affiliation:
Department of Sociology and Office of Population Research, Princeton University, Princeton, NJ, USA
Brandon M. Stewart
Affiliation:
Department of Sociology and Office of Population Research, Princeton University, Princeton, NJ, USA
*
Corresponding author: Dahyun Choi; Email: dahyunc@princeton.edu

Abstract

Understanding how political information is transmitted requires tools that can reliably and scalably capture complex signals in text. While existing studies highlight interest groups as strategic information providers, empirical analysis has been constrained by reliance on expert annotation. Using policy documents released by interest groups, this study shows that fine-tuned large language models (LLMs) outperform lightly trained workers, crowdworkers, and zero-shot LLMs in distinguishing two difficult-to-separate categories: informative signals that help improve political decision-making and associative signals that shape preferences but lack substantive relevance. We further demonstrate that the classifier generalizes out of distribution across two applications. Although the empirical setting is domain-specific, the approach offers a scalable method for expert-driven text coding applicable to other areas of political inquiry.

Information

Type
Original Article
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of EPS Academic Ltd.

Figure 1. Estimated presidential discretion from Lowande and Shipan (2022). Adapted from the original figure using data kindly shared by the original author and regenerated with updated software. Red indicates topics where the estimates differ by a margin greater than 1 between expert and nonexpert coding. Note also that the implied ordering differs substantially between the two sets of estimates.


Table 1. Schematic description of codebook


Figure 2. Average accuracy over the eight categories in our coding scheme by method. Green indicates machine performance, while red and orange indicate human coder performance. See Table 2 for a description of the methods compared.


Table 2. Summary of classification methods compared in Figure 2


Figure 3. Measures of fine-tuned GPT-3 precision, recall, F1, and accuracy on whether any signal of the informative/associative type is present.


Figure 4. Signal Composition (U.S. Chamber of Commerce). Error bars indicate the confidence intervals estimated from the doubly robust estimation, which integrates expert coding and surrogate labels, calculated using the DSL package (Egami et al., 2024) on February 15, 2025.


Figure 5. Signal composition (USTR). Error bars indicate the confidence intervals estimated from the doubly robust estimation, which integrates expert coding and surrogate labels, calculated using the DSL package (Egami et al., 2024).

Supplementary material: File

Choi et al. supplementary material (File, 1.5 MB)