The Prevalence and Severity of Underreporting Bias in Machine- and Human-Coded Data

Benjamin E. Bagozzi; Patrick T. Brandt; John R. Freeman; Jennifer S. Holmes; Alisha Kim; Agustin Palao Mendizabal; Carly Potz-Nielsen

doi:10.1017/psrm.2018.11

Published online by Cambridge University Press: 05 March 2018

Abstract

Textual data are plagued by underreporting bias. For example, news sources often fail to report human rights violations. Cook et al. propose a multi-source estimator to gauge, and to account for, the underreporting of state repression events within human codings of news texts produced by the Agence France-Presse and Associated Press. We evaluate this estimator with Monte Carlo experiments, and then use it to compare the prevalence and seriousness of underreporting when comparable texts are machine coded and recorded in the World-Integrated Crisis Early Warning System dataset. We replicate Cook et al.’s investigation of human-coded state repression events with our machine-coded events, and validate both models against an external measure of human rights protections in Africa. We then use the Cook et al. estimator to gauge the seriousness and prevalence of underreporting in machine- and human-coded event data on human rights violations in Colombia. We find in both applications that machine-coded data are as valid as human-coded data.

Information

Type: Research Notes
Information: Political Science Research and Methods, Volume 7, Issue 3, July 2019, pp. 641–649

DOI: https://doi.org/10.1017/psrm.2018.11
Copyright: © The European Political Science Association 2018

Footnotes

Benjamin E. Bagozzi, Department of Political Science & International Relations, University of Delaware, 405 Smith Hall, 18 Amstel Ave, Newark, DE 19716 (bagozzib@udel.edu). Patrick T. Brandt (pbrandt@utdallas.edu), Jennifer S. Holmes (jholmes@utdallas.edu), Alisha Kim (Alisha.Kim@utdallas.edu) and Agustin Palao Mendizabal (Agustin.PalaoMendizabal@utdallas.edu), School of Economic, Political and Policy Sciences, University of Texas, Dallas, 800 W. Campbell Rd, GR31, Richardson, TX 75080. John R. Freeman (freeman@umn.edu) and Carly Potz-Nielsen (potzn001@umn.edu), Department of Political Science, University of Minnesota, 1414 Social Sciences, 267 19th Ave S., Minneapolis, MN 55455. An earlier version of this paper was presented as a poster at the 34th Annual Meeting of the Political Methodology Society. This research is supported by NSF Grant Number SBE-SMA-1539302. The authors thank Associate Editor Daniel Stegmueller, two anonymous reviewers, as well as Scott Cook, Mark Nieman, and Vito D’Orazio for their helpful comments and suggestions. To view supplementary material for this article, please visit https://doi.org/10.1017/psrm.2018.11

References

Bagozzi, Benjamin E., and Berliner, Daniel. 2017. ‘The Politics of Scrutiny in Human Rights Monitoring: Evidence from Structural Topic Models of U.S. State Department Human Rights Reports’. Political Science Research and Methods, 1–17.

Beieler, John, Brandt, Patrick T., Halterman, Andrew, Schrodt, Philip A., and Simpson, Erin M. 2016. ‘Generating Political Event Data in Near Real Time: Opportunities and Challenges’. In R. Michael Alvarez (ed.), Computational Social Science: Discovery and Prediction, 98–120. New York: Cambridge University Press.

Boschee, Elizabeth, Lautenschlager, Jennifer, O’Brien, Sean, Shellman, Steve, Starz, James, and Ward, Michael. 2016. ‘ICEWS Coded Event Data’, Harvard Dataverse. Available at http://dx.doi.org/10.7910/DVN/28075, accessed 8 November 2016.

Centro de Investigación y Educación Popular (CINEP). 2008. ‘Marco Conceptual: Banco de Datos de Derechos Humanos y Violencia Política’. CINEP, Bogotá, Colombia.

Cook, Scott J., Blas, Betsabe, Carroll, Raymond J., and Sinha, Samiran. 2017. ‘Two Wrongs Don’t Make a Right: Addressing Underreporting in Binary Data from Multiple Sources’. Political Analysis 25(2):223–240.

Fariss, Christopher J. 2014. ‘Respect for Human Rights has Improved Over Time: Modeling the Changing Standard of Accountability’. American Political Science Review 108(2):297–316.

Grimmer, Justin, and Stewart, Brandon M. 2013. ‘Text as Data: The Promise and Pitfalls of Automated Content Analysis Methods for Political Texts’. Political Analysis 21(3):267–297.

Hendrix, Cullen S., Salehyan, Idean, Hamner, Jesse, Case, Christina, Linebarger, Christopher, Stull, Emily, and Williams, Jennifer. 2012. ‘Social Conflict in Africa: A New Database’. International Interactions 38(4):503–511.

King, Gary, and Lowe, Will. 2003. ‘An Automated Information Extraction Tool for International Conflict Data with Performance as Good As Human Coders’. International Organization 57(3):617–642.

Laver, Michael, Benoit, Kenneth, and Garry, John. 2003. ‘Extracting Policy Positions from Political Texts Using Words as Data’. American Political Science Review 97(2):311–331.

Schrodt, Philip A., and Van Brackle, David. 2013. ‘Automated Coding of Political Event Data’. In V. S. Subrahmanian (ed.), Handbook of Computational Approaches to Counterterrorism, 23–49. New York: Springer Press.

Slapin, Jonathan B., and Proksch, Sven-Oliver. 2008. ‘A Scaling Model for Estimating Time-Series Party Positions from Texts’. American Journal of Political Science 52(3):705–722.

Sundberg, Ralph, and Melander, Erik. 2013. ‘Introducing the UCDP Georeferenced Event Dataset’. Journal of Peace Research 50(4):523–532.

Ward, Michael D., and Beger, Andreas. 2017. ‘Lessons from Near Real-Time Forecasting of Irregular Leadership Changes’. Journal of Peace Research 54(2):141–156.

Weidmann, Nils B. 2015. ‘On the Accuracy of Media-Based Conflict Event Data’. Journal of Conflict Resolution 59(6):1129–1149.

Bagozzi et al. Dataset

http://dx.doi.org/10.7910/DVN/CFCZ2L

Bagozzi et al. supplementary material

Bagozzi et al. supplementary material 1 (PDF, 501.8 KB)
