
Two Wrongs Make a Right: Addressing Underreporting in Binary Data from Multiple Sources

  • Scott J. Cook, Betsabe Blas, Raymond J. Carroll and Samiran Sinha

Media-based event data (i.e., data compiled from reporting by media outlets) are widely used in political science research. However, events of interest (e.g., strikes, protests, conflict) are often underreported by these primary and secondary sources, producing incomplete data that risk inconsistency and bias in subsequent analysis. While general strategies exist to help ameliorate this bias, these methods do not make full use of the information often available to researchers. Specifically, much of the event data used in the social sciences is drawn from multiple, overlapping news sources (e.g., Agence France-Presse, Reuters). Therefore, we propose a novel maximum likelihood estimator that corrects for misclassification in data arising from multiple sources. In the most general formulation of our estimator, researchers can specify separate sets of predictors for the true-event model and for each of the misclassification models characterizing whether a source fails to report on an event. As such, researchers are able to accurately test theories on both the causes of and reporting on an event of interest. Simulations show that our technique regularly outperforms current strategies that neglect misclassification, the unique features of the data-generating process, or both. We also illustrate the utility of this method with a model of repression using the Social Conflict in Africa Database.
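
To make the structure of such a likelihood concrete, the following is a minimal sketch and not the authors' estimator. It assumes two sources, underreporting only (a source never reports an event that did not occur), sources that miss events independently of one another given the true event, a single covariate x in the true-event model, and constant miss probabilities rather than source-specific predictors. All function and parameter names are hypothetical.

```python
import math
import random


def logistic(u):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))


def cell_prob(r1, r2, p, a1, a2):
    """P(R1=r1, R2=r2) given event probability p and miss probabilities a1, a2.

    Assumption: a source can miss a true event but never reports a non-event.
    """
    # Event occurred: each source independently reports it with prob. 1 - a_j.
    with_event = p
    for r, a in ((r1, a1), (r2, a2)):
        with_event *= (1.0 - a) if r == 1 else a
    # No event occurred: the only possible pattern is silence from both sources.
    without_event = (1.0 - p) if (r1 == 0 and r2 == 0) else 0.0
    return with_event + without_event


def log_likelihood(params, data):
    """Observed-data log-likelihood.

    params = (b0, b1, g1, g2): intercept and slope of the true-event logit,
    and the logits of the two sources' miss probabilities.
    data = list of (x, r1, r2) tuples of a covariate and two source reports.
    """
    b0, b1, g1, g2 = params
    a1, a2 = logistic(g1), logistic(g2)
    total = 0.0
    for x, r1, r2 in data:
        p = logistic(b0 + b1 * x)
        total += math.log(cell_prob(r1, r2, p, a1, a2))
    return total


def simulate(n, b0, b1, a1, a2, seed=0):
    """Draw n observations from the assumed data-generating process."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        event = rng.random() < logistic(b0 + b1 * x)
        r1 = int(event and rng.random() > a1)  # source 1 misses with prob. a1
        r2 = int(event and rng.random() > a2)  # source 2 misses with prob. a2
        data.append((x, r1, r2))
    return data


if __name__ == "__main__":
    data = simulate(2000, b0=-0.5, b1=1.0, a1=0.3, a2=0.4)
    # Evaluate the log-likelihood at the parameter values used to simulate.
    # logit(0.3) and logit(0.4) are the miss-probability parameters.
    true_params = (-0.5, 1.0, math.log(0.3 / 0.7), math.log(0.4 / 0.6))
    print(log_likelihood(true_params, data))
```

In practice the log-likelihood would be handed to a numerical optimizer to estimate the parameters. The point of the sketch is only that observations where the two sources disagree are informative about the miss probabilities, which is what a single-source analysis cannot exploit.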


Authors’ note: For their helpful comments and suggestions, thanks to Kenneth Benoit, Graeme Blair, Chad Hazlett, Florian Hollenbach, Idean Salehyan, Nils Weidmann, the reviewers, and the editor(s). Replication materials are available online at Cook et al. (2016). All inquiries should be sent to the corresponding author at

Contributing Editor: R. Michael Alvarez

Abrevaya, Jason, and Hausman, Jerry A. 1999. Semiparametric estimation with mismeasured dependent variables: An application to duration models for unemployment spells. Annales d’Economie et de Statistique 55/56:243–275.
Achen, Christopher H. 2002. Toward a new political methodology: Microfoundations and ART. Annual Review of Political Science 5(1):423–450.
Carroll, Raymond J., Ruppert, David, Stefanski, Leonard A., and Crainiceanu, Ciprian M. 2006. Measurement error in nonlinear models: A modern perspective. Boca Raton, FL: CRC Press.
Carroll, Raymond J., and Pederson, Shane. 1993. On robustness in the logistic regression model. Journal of the Royal Statistical Society, Series B 55:693–706.
Cook, Scott, Blas, Betsabe, Carroll, Raymond, and Sinha, Samiran. 2016. Replication data for: Two wrongs make a right. doi:10.7910/DVN/92GMLB, Harvard Dataverse.
Copas, J. B. 1988. Binary regression models for contaminated data. Journal of the Royal Statistical Society, Series B 50:225–265.
Davenport, Christian. 2007. State repression and political order. Annual Review of Political Science 10:1–23.
Davenport, Christian, and Ball, Patrick. 2002. Views to a kill: Exploring the implications of source selection in the case of Guatemalan state terror, 1977–1995. Journal of Conflict Resolution 46(3):427–450.
Earl, Jennifer, Martin, Andrew, McCarthy, John D., and Soule, Sarah A. 2004. The use of newspaper data in the study of collective action. Annual Review of Sociology 30:65–80.
Hausman, Jerry A., Abrevaya, Jason, and Scott-Morton, Fiona M. 1998. Misclassification of the dependent variable in a discrete-response setting. Journal of Econometrics 87(2):239–269.
Heckman, James J. 1977. Sample selection bias as a specification error (with an application to the estimation of labor supply functions).
Hendrix, Cullen S., and Salehyan, Idean. 2015. No news is good news: Mark and recapture for event data when reporting probabilities are less than one. International Interactions 41(2):392–406.
Hendrix, Cullen S., and Salehyan, Idean. 2016. A house divided: Threat perception, military factionalism, and repression in Africa. Journal of Conflict Resolution (forthcoming).
Hug, Simon. 2003. Selection bias in comparative research: The case of incomplete data sets. Political Analysis 11(3):255–274.
Hug, Simon. 2009. The effect of misclassifications in probit models: Monte Carlo simulations and applications. Political Analysis 18(1):78–102.
Hug, Simon, and Wisler, Dominique. 1998. Correcting for selection bias in social movement research. Mobilization: An International Quarterly 3(2):141–161.
Imai, Kosuke, and Yamamoto, Teppei. 2010. Causal inference with differential measurement error: Nonparametric identification and sensitivity analysis. American Journal of Political Science 54(2):543–560.
Mackenzie, Darryl I., Nichols, James D., Royle, J. Andrew, Pollock, Kenneth H., Bailey, Larissa L., and Hines, James E. 2006. Occupancy estimation and modeling: Inferring patterns and dynamics of species occurrence. San Diego, CA: Elsevier Academic Press.
Maddala, Gangadharrao S. 1983. Limited-dependent and qualitative variables in econometrics. Number 1. New York: Cambridge University Press.
Poe, Steven C., and Tate, C. Neal. 1994. Repression of human rights to personal integrity in the 1980s: A global analysis. American Political Science Review 88(4):853–872.
Poe, Steven C., Tate, C. Neal, and Keith, Linda Camp. 1999. Repression of the human right to personal integrity revisited: A global cross-national study covering the years 1976–1993. International Studies Quarterly 43(2):291–313.
Poe, Steven C., Rost, Nicolas, and Carey, Sabine C. 2006. Assessing risk and opportunity in conflict studies: A human rights analysis. Journal of Conflict Resolution 50(4):484–507.
Salehyan, Idean, Hendrix, Cullen S., Hamner, Jesse, Case, Christina, Linebarger, Christopher, Stull, Emily, and Williams, Jennifer. 2012. Social conflict in Africa: A new database. International Interactions 38(4):503–511.
Schrodt, Philip A. 2012. Precedents, progress, and prospects in political event data. International Interactions 38(4):546–569.
Schrodt, Philip A., and Gerner, Deborah J. 1994. Validity assessment of a machine-coded event data set for the Middle East, 1982–92. American Journal of Political Science 38(3):825–854.
Strange, Austin M., Park, Bradley, Tierney, Michael J., Fuchs, Andreas, Dreher, Axel, and Ramachandran, Vijaya. 2013. China’s development finance to Africa: A media-based approach to data collection. Center for Global Development Working Paper No. 323.
Trumbore, Peter F., and Woo, Byungwon. 2014. Smugglers blues: Examining why countries become narcotics transit states using the new international narcotics production and transit (INAPT) data set. International Interactions 40(5):763–787.
Weidmann, Nils B. 2014. On the accuracy of media-based conflict event data. Journal of Conflict Resolution 59(6):1129–1149.
Weidmann, Nils B. 2016. A closer look at reporting bias in conflict event data. American Journal of Political Science 60(1):206–218.
Woolley, John T. 2000. Using media-based data in studies of politics. American Journal of Political Science 44(1):156–173.

Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis
Supplementary materials

Cook supplementary material 1 (217 KB)
