
The Prevalence and Severity of Underreporting Bias in Machine- and Human-Coded Data

Published online by Cambridge University Press: 05 March 2018

Abstract

Textual data are plagued by underreporting bias. For example, news sources often fail to report human rights violations. Cook et al. propose a multi-source estimator to gauge, and to account for, the underreporting of state repression events within human codings of news texts produced by the Agence France-Presse and Associated Press. We evaluate this estimator with Monte Carlo experiments and then use it to compare the prevalence and seriousness of underreporting when comparable texts are machine coded and recorded in the World-Integrated Crisis Early Warning System dataset. We replicate Cook et al.'s investigation of human-coded state repression events with our machine-coded events, and validate both models against an external measure of human rights protections in Africa. We then use the Cook et al. estimator to gauge the prevalence and seriousness of underreporting in machine- and human-coded event data on human rights violations in Colombia. We find in both applications that machine-coded data are as valid as human-coded data.
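
The abstract does not spell out the estimator, so what follows is only an illustrative sketch of the underlying logic, not Cook et al.'s covariate-dependent model and not the authors' Monte Carlo design. It assumes the simplest two-source case: a true event occurs with a constant probability pi, each source independently reports a true event with its own detection probability (p1, p2), and no source reports an event that did not happen. Under those assumptions the three parameters are identified from the four joint reporting patterns (the classic two-source capture-recapture estimator), and a small Monte Carlo experiment shows that even pooling the sources naively understates the event rate. All function names and parameter values below are hypothetical.

```python
import random


def simulate(n, pi, p1, p2, rng):
    """Draw n units under an ASSUMED two-source model: a true event occurs
    with probability pi; sources 1 and 2 independently report a true event
    with detection probabilities p1 and p2; there are no false positives."""
    data = []
    for _ in range(n):
        event = rng.random() < pi
        s1 = event and rng.random() < p1
        s2 = event and rng.random() < p2
        data.append((s1, s2))
    return data


def two_source_estimate(data):
    """Closed-form estimates (pi_hat, p1_hat, p2_hat) from the joint reporting
    pattern, ASSUMING sources report independently given an event."""
    n = len(data)
    n11 = sum(1 for a, b in data if a and b)      # reported by both sources
    n10 = sum(1 for a, b in data if a and not b)  # reported by source 1 only
    n01 = sum(1 for a, b in data if b and not a)  # reported by source 2 only
    if n11 == 0:
        raise ValueError("no event reported by both sources; estimate undefined")
    p1_hat = n11 / (n11 + n01)  # share of source-2 events that source 1 also caught
    p2_hat = n11 / (n11 + n10)  # share of source-1 events that source 2 also caught
    pi_hat = (n11 + n10) * (n11 + n01) / (n11 * n)
    return pi_hat, p1_hat, p2_hat


def monte_carlo(reps=200, n=2000, pi=0.3, p1=0.6, p2=0.5, seed=1):
    """Average the naive 'reported by any source' rate and the corrected
    estimate of pi across simulated datasets (hypothetical settings)."""
    rng = random.Random(seed)
    naive, corrected = [], []
    for _ in range(reps):
        data = simulate(n, pi, p1, p2, rng)
        naive.append(sum(1 for a, b in data if a or b) / n)
        corrected.append(two_source_estimate(data)[0])
    return sum(naive) / reps, sum(corrected) / reps


if __name__ == "__main__":
    naive_rate, corrected_rate = monte_carlo()
    print(f"naive pooled rate: {naive_rate:.3f}, corrected rate: {corrected_rate:.3f}")
```

With these settings the naive pooled rate should average about 0.3 × (1 − 0.4 × 0.5) = 0.24, while the corrected estimate should sit near the true value of 0.3. The independence assumption is what makes the overlap between sources informative; if the sources tend to cover the same events, this overlap-based correction is itself biased.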

Type
Research Notes
Copyright
© The European Political Science Association 2018 


Footnotes

* Benjamin E. Bagozzi, Department of Political Science & International Relations, University of Delaware, 405 Smith Hall, 18 Amstel Ave, Newark, DE 19716 (bagozzib@udel.edu). Patrick T. Brandt (pbrandt@utdallas.edu), Jennifer S. Holmes (jholmes@utdallas.edu), Alisha Kim (Alisha.Kim@utdallas.edu) and Agustin Palao Mendizabal (Agustin.PalaoMendizabal@utdallas.edu), School of Economic, Political and Policy Sciences, University of Texas at Dallas, 800 W. Campbell Rd, GR31, Richardson, TX 75080. John R. Freeman (freeman@umn.edu) and Carly Potz-Nielsen (potzn001@umn.edu), Department of Political Science, University of Minnesota, 1414 Social Sciences, 267 19th Ave S., Minneapolis, MN 55455. An earlier version of this paper was presented as a poster at the 34th Annual Meeting of the Society for Political Methodology. This research is supported by NSF Grant Number SBE-SMA-1539302. The authors thank Associate Editor Daniel Stegmueller, two anonymous reviewers, Scott Cook, Mark Nieman, and Vito D'Orazio for their helpful comments and suggestions. To view supplementary material for this article, please visit https://doi.org/10.1017/psrm.2018.11

References

Bagozzi, Benjamin E., and Berliner, Daniel. 2017. ‘The Politics of Scrutiny in Human Rights Monitoring: Evidence from Structural Topic Models of U.S. State Department Human Rights Reports’. Political Science Research and Methods, 1–17.
Beieler, John, Brandt, Patrick T., Halterman, Andrew, Schrodt, Philip A., and Simpson, Erin M. 2016. ‘Generating Political Event Data in Near Real Time: Opportunities and Challenges’. In R. Michael Alvarez (ed.), Computational Social Science: Discovery and Prediction, 98–120. New York: Cambridge University Press.
Boschee, Elizabeth, Lautenschlager, Jennifer, O’Brien, Sean, Shellman, Steve, Starz, James, and Ward, Michael. 2016. ‘ICEWS Coded Event Data’, Harvard Dataverse. Available at http://dx.doi.org/10.7910/DVN/28075, accessed 8 November 2016.
Centro de Investigación y Educación Popular (CINEP). 2008. ‘Marco Conceptual: Banco de Datos de Derechos Humanos y Violencia Política’. CINEP, Bogotá, Colombia.
Cook, Scott J., Blas, Betsabe, Carroll, Raymond J., and Sinha, Samiran. 2017. ‘Two Wrongs Don’t Make a Right: Addressing Underreporting in Binary Data from Multiple Sources’. Political Analysis 25(2):223–240.
Fariss, Christopher J. 2014. ‘Respect for Human Rights has Improved Over Time: Modeling the Changing Standard of Accountability’. American Political Science Review 108(2):297–316.
Grimmer, Justin, and Stewart, Brandon M. 2013. ‘Text as Data: The Promise and Pitfalls of Automated Content Analysis Methods for Political Texts’. Political Analysis 21(3):267–297.
Hendrix, Cullen S., Salehyan, Idean, Hamner, Jesse, Case, Christina, Linebarger, Christopher, Stull, Emily, and Williams, Jennifer. 2012. ‘Social Conflict in Africa: A New Database’. International Interactions 38(4):503–511.
King, Gary, and Lowe, Will. 2003. ‘An Automated Information Extraction Tool for International Conflict Data with Performance as Good As Human Coders’. International Organization 57(3):617–642.
Laver, Michael, Benoit, Kenneth, and Garry, John. 2003. ‘Extracting Policy Positions from Political Texts Using Words as Data’. American Political Science Review 97(2):311–331.
Schrodt, Philip A., and Van Brackle, David. 2013. ‘Automated Coding of Political Event Data’. In V. S. Subrahmanian (ed.), Handbook of Computational Approaches to Counterterrorism, 23–49. New York: Springer Press.
Slapin, Jonathan B., and Proksch, Sven-Oliver. 2008. ‘A Scaling Model for Estimating Time-Series Party Positions from Texts’. American Journal of Political Science 52(3):705–722.
Sundberg, Ralph, and Melander, Erik. 2013. ‘Introducing the UCDP Georeferenced Event Dataset’. Journal of Peace Research 50(4):523–532.
Ward, Michael D., and Beger, Andreas. 2017. ‘Lessons from Near Real-Time Forecasting of Irregular Leadership Changes’. Journal of Peace Research 54(2):141–156.
Weidmann, Nils B. 2015. ‘On the Accuracy of Media-Based Conflict Event Data’. Journal of Conflict Resolution 59(6):1129–1149.
Supplementary material

Bagozzi et al. Dataset
Bagozzi et al. supplementary material 1 (PDF, 501.8 KB)