Human Rights Violations in Space: Assessing the External Validity of Machine-Geocoded versus Human-Geocoded Data

Published online by Cambridge University Press: 15 December 2021

Logan Stundal*
Affiliation:
Department of Political Science, University of Minnesota, Minneapolis, MN, USA. E-mail: stund005@umn.edu
Benjamin E. Bagozzi
Affiliation:
Department of Political Science & IR, University of Delaware, Newark, DE, USA. E-mail: bagozzib@udel.edu
John R. Freeman
Affiliation:
Department of Political Science, University of Minnesota, Minneapolis, MN, USA. E-mail: freeman@umn.edu
Jennifer S. Holmes
Affiliation:
School of Economic, Political & Policy Sciences, UT-Dallas, Richardson, TX, USA. E-mail: jholmes@utdallas.edu
* Corresponding author: Logan Stundal

Abstract

Political event data are widely used in studies of political violence. Recent years have seen notable advances in the automated coding of political event data from international news sources. Yet, the validity of machine-coded event data remains disputed, especially in the context of event geolocation. We analyze the frequencies of human- and machine-geocoded event data agreement in relation to an independent (ground truth) source. The events are human rights violations in Colombia. We perform our evaluation for a key, 8-year period of the Colombian conflict and in three 2-year subperiods as well as for a selected set of (non)journalistically remote municipalities. As a complement to this analysis, we estimate spatial probit models based on the three datasets. These models assume Gaussian Markov Random Field (GMRF) error processes; they are constructed using a stochastic partial differential equation and estimated with integrated nested Laplace approximation. The estimated models tell us whether the three datasets produce comparable predictions, underreport events in relation to the same covariates, and have similar patterns of prediction error. Together the two analyses show that, for this subnational conflict, the machine- and human-geocoded datasets are comparable in terms of external validity but, according to the geostatistical models, produce prediction errors that differ in important respects.
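The agreement analysis described in the abstract can be illustrated with a minimal sketch. It assumes each municipality is reduced to a binary indicator (event recorded or not) in both a ground-truth source and a geocoded dataset; the toy vectors, the function names, and the choice of a Wilson score interval are all illustrative assumptions, not the authors' data or their stated method for computing confidence intervals.

```python
import math

def confusion_counts(truth, coded):
    """Count (TP, FP, FN, TN) across municipalities.

    truth, coded: equal-length sequences of 0/1 indicators, one per
    municipality (hypothetical input format, assumed for illustration).
    """
    tp = fp = fn = tn = 0
    for t, c in zip(truth, coded):
        if t and c:
            tp += 1          # event in both sources
        elif c and not t:
            fp += 1          # coded dataset reports an event the truth lacks
        elif t and not c:
            fn += 1          # coded dataset misses a ground-truth event
        else:
            tn += 1          # no event in either source
    return tp, fp, fn, tn

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if n == 0:
        return (0.0, 1.0)    # no information: return the widest interval
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)

# Toy indicators for eight hypothetical municipalities (not real data).
truth = [1, 1, 1, 0, 0, 0, 0, 1]
coded = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(truth, coded)
sensitivity = tp / (tp + fn)           # share of true events recovered
specificity = tn / (tn + fp)           # share of non-events correctly left empty
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)
```

Running the same two functions on each dataset and subperiod would give comparable rows of rates with intervals; any interval method other than Wilson could be swapped in without changing the counting step.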

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s) 2021. Published by Cambridge University Press on behalf of the Society for Political Methodology
Table 1 Municipality-event confusion matrix, 2002–2009 (confidence intervals in brackets).

Figure 1 Observed FARC events.

Figure 2 Selected journalistically remote and journalistically proximate municipalities.

Table 2 Municipality-event confusion matrices for selected municipalities, 2002–2009, 2002–2004 and 2005–2007 (confidence intervals in brackets).

Figure 3 Coefficient and GMRF parameter values for geostatistical models based on ICEWS, GED, and CINEP datasets, 2002–2009, 2002–2004, and 2005–2007.

Figure 4 Coefficient and GMRF parameter values for geostatistical models of underreporting of ICEWS and GED compared to CINEP, 2002–2009, 2002–2004, and 2005–2007.

Figure 5 ROC curves and AUC statistics for ICEWS, GED, and CINEP models of FARC–HRVs.

Supplementary material: PDF

Stundal et al. supplementary material

Appendix