Two Wrongs Make a Right: Addressing Underreporting in Binary Data from Multiple Sources

Scott J. Cook; Betsabe Blas; Raymond J. Carroll; Samiran Sinha

doi:10.1017/pan.2016.13

Published online by Cambridge University Press: 11 April 2017

Scott J. Cook*: Department of Political Science, Texas A&M University, College Station, TX 77843, USA
Betsabe Blas: Department of Statistics, Federal University of Pernambuco, University City, Brazil 50670-901
Raymond J. Carroll: Department of Statistics, Texas A&M University, College Station, TX 77843, USA; School of Mathematical Sciences, University of Technology Sydney, Sydney 2007, Australia
Samiran Sinha: Department of Statistics, Texas A&M University, College Station, TX 77843, USA
*Email: sjcook@tamu.edu

Abstract

Media-based event data—i.e., data comprised from reporting by media outlets—are widely used in political science research. However, events of interest (e.g., strikes, protests, conflict) are often underreported by these primary and secondary sources, producing incomplete data that risks inconsistency and bias in subsequent analysis. While general strategies exist to help ameliorate this bias, these methods do not make full use of the information often available to researchers. Specifically, much of the event data used in the social sciences is drawn from multiple, overlapping news sources (e.g., Agence France-Presse, Reuters). Therefore, we propose a novel maximum likelihood estimator that corrects for misclassification in data arising from multiple sources. In the most general formulation of our estimator, researchers can specify separate sets of predictors for the true-event model and each of the misclassification models characterizing whether a source fails to report on an event. As such, researchers are able to accurately test theories on both the causes of and reporting on an event of interest. Simulations evidence that our technique regularly outperforms current strategies that either neglect misclassification, the unique features of the data-generating process, or both. We also illustrate the utility of this method with a model of repression using the Social Conflict in Africa Database.
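The kind of likelihood the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' estimator or replication code: sources are assumed to never report non-events (underreporting only), to miss events independently of one another given the covariates, and both the true-event model and each source's missed-report model are assumed to be logistic. All function and variable names (`log_likelihood`, `beta`, `gammas`, `X`, `Z`, `R`) are hypothetical.

```python
import math


def logistic(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def log_likelihood(beta, gammas, X, Z, R):
    """Log-likelihood of multi-source reports under underreporting only.

    Assumptions (illustrative, not taken from the article):
      beta    coefficients of the true-event model, P(Y=1) = logistic(x.beta)
      gammas  one coefficient vector per source; source j misses a true
              event with probability logistic(z_j.gamma_j)
      X[i]    covariate row for the event model, observation i
      Z[i][j] covariate row for source j's missed-report model
      R[i][j] 1 if source j reported an event for observation i, else 0
    """
    total = 0.0
    for x_i, z_i, r_i in zip(X, Z, R):
        p = logistic(dot(x_i, beta))
        miss = [logistic(dot(z_ij, g)) for z_ij, g in zip(z_i, gammas)]
        if any(r_i):
            # At least one report: the event must have occurred, and each
            # source either reported it (1 - miss) or missed it (miss).
            lik = p
            for r, a in zip(r_i, miss):
                lik *= (1.0 - a) if r else a
        else:
            # No reports: either no event occurred, or every source missed it.
            all_missed = 1.0
            for a in miss:
                all_missed *= a
            lik = (1.0 - p) + p * all_missed
        total += math.log(lik)
    return total
```

Under these assumptions, the parameters would be estimated by maximizing this function (equivalently, minimizing its negative) with a general-purpose numerical optimizer. Because each source has its own coefficient vector, separate predictors can enter the event model and each reporting model.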

Information

Type: Articles
Information: Political Analysis, Volume 25, Issue 2, April 2017, pp. 223–240

DOI: https://doi.org/10.1017/pan.2016.13
Copyright: Copyright © The Author(s) 2017. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Footnotes

Authors’ note: For their helpful comments and suggestions, we thank Kenneth Benoit, Graeme Blair, Chad Hazlett, Florian Hollenbach, Idean Salehyan, Nils Weidmann, the reviewers, and the editor(s). Replication materials are available online at Cook et al. (2016). All inquiries should be sent to the corresponding author at sjcook@tamu.edu.

Contributing Editor: R. Michael Alvarez

References

Abrevaya, Jason, and Hausman, Jerry A. 1999. Semiparametric estimation with mismeasured dependent variables: An application to duration models for unemployment spells. Annales d’Economie et de Statistique 55/56:243–275.

Achen, Christopher H. 2002. Toward a new political methodology: Microfoundations and ART. Annual Review of Political Science 5(1):423–450.

Carroll, Raymond J., Ruppert, David, Stefanski, Leonard A., and Crainiceanu, Ciprian M. 2006. Measurement error in nonlinear models: A modern perspective. Boca Raton, FL: CRC Press.

Carroll, Raymond J., and Pederson, Shane. 1993. On robustness in the logistic regression model. Journal of the Royal Statistical Society, Series B 55:693–706.

Cook, Scott, Blas, Betsabe, Carroll, Raymond, and Sinha, Samiran. 2016. Replication data for: Two wrongs make a right. doi:10.7910/DVN/92GMLB, Harvard Dataverse.

Copas, J. B. 1988. Binary regression models for contaminated data. Journal of the Royal Statistical Society, Series B 50:225–265.

Davenport, Christian. 2007. State repression and political order. Annual Review of Political Science 10:1–23.

Davenport, Christian, and Ball, Patrick. 2002. Views to a kill: Exploring the implications of source selection in the case of Guatemalan state terror, 1977–1995. Journal of Conflict Resolution 46(3):427–450.

Earl, Jennifer, Martin, Andrew, McCarthy, John D., and Soule, Sarah A. 2004. The use of newspaper data in the study of collective action. Annual Review of Sociology 30:65–80.

Hausman, Jerry A., Abrevaya, Jason, and Scott-Morton, Fiona M. 1998. Misclassification of the dependent variable in a discrete-response setting. Journal of Econometrics 87(2):239–269.

Heckman, James J. 1977. Sample selection bias as a specification error (with an application to the estimation of labor supply functions).

Hendrix, Cullen S., and Salehyan, Idean. 2015. No news is good news: Mark and recapture for event data when reporting probabilities are less than one. International Interactions 41(2):392–406.

Hendrix, Cullen S., and Salehyan, Idean. 2016. A house divided: Threat perception, military factionalism, and repression in Africa. Journal of Conflict Resolution (forthcoming).

Hug, Simon. 2003. Selection bias in comparative research: The case of incomplete data sets. Political Analysis 11(3):255–274.

Hug, Simon. 2009. The effect of misclassifications in probit models: Monte Carlo simulations and applications. Political Analysis 18(1):78–102.

Hug, Simon, and Wisler, Dominique. 1998. Correcting for selection bias in social movement research. Mobilization: An International Quarterly 3(2):141–161.

Imai, Kosuke, and Yamamoto, Teppei. 2010. Causal inference with differential measurement error: Nonparametric identification and sensitivity analysis. American Journal of Political Science 54(2):543–560.

Mackenzie, Darryl I., Nichols, James D., Royle, J. Andrew, Pollock, Kenneth H., Bailey, Larissa L., and Hines, James E. 2006. Occupancy estimation and modeling: Inferring patterns and dynamics of species occurrence. San Diego, CA: Elsevier Academic Press.

Maddala, Gangadharrao S. 1983. Limited-dependent and qualitative variables in econometrics. Number 1. New York: Cambridge University Press.

Poe, Steven C., and Tate, C. Neal. 1994. Repression of human rights to personal integrity in the 1980s: A global analysis. American Political Science Review 88(4):853–872.

Poe, Steven C., Tate, C. Neal, and Keith, Linda Camp. 1999. Repression of the human right to personal integrity revisited: A global cross-national study covering the years 1976–1993. International Studies Quarterly 43(2):291–313.

Poe, Steven C., Rost, Nicolas, and Carey, Sabine C. 2006. Assessing risk and opportunity in conflict studies: A human rights analysis. Journal of Conflict Resolution 50(4):484–507.

Salehyan, Idean, Hendrix, Cullen S., Hamner, Jesse, Case, Christina, Linebarger, Christopher, Stull, Emily, and Williams, Jennifer. 2012. Social conflict in Africa: A new database. International Interactions 38(4):503–511.

Schrodt, Philip A. 2012. Precedents, progress, and prospects in political event data. International Interactions 38(4):546–569.

Schrodt, Philip A., and Gerner, Deborah J. 1994. Validity assessment of a machine-coded event data set for the Middle East, 1982–92. American Journal of Political Science 38(3):825–854.

Strange, Austin M., Park, Bradley, Tierney, Michael J., Fuchs, Andreas, Dreher, Axel, and Ramachandran, Vijaya. 2013. China’s development finance to Africa: A media-based approach to data collection. Center for Global Development Working Paper No. 323.

Trumbore, Peter F., and Woo, Byungwon. 2014. Smugglers blues: Examining why countries become narcotics transit states using the new international narcotics production and transit (INAPT) data set. International Interactions 40(5):763–787.

Weidmann, Nils B. 2014. On the accuracy of media-based conflict event data. Journal of Conflict Resolution 59(6):1129–1149.

Weidmann, Nils B. 2016. A closer look at reporting bias in conflict event data. American Journal of Political Science 60(1):206–218.

Woolley, John T. 2000. Using media-based data in studies of politics. American Journal of Political Science 44(1):156–173.
