
Election Fraud: A Latent Class Framework for Digit-Based Tests

  • Juraj Medzihorsky


Digit-based election forensics (DBEF) typically relies on null hypothesis significance testing, with undesirable effects on substantive conclusions. This article proposes an alternative free of this problem. It rests on decomposing the observed numeral distribution into the “no fraud” and “fraud” latent classes, by finding the smallest fraction of numerals that needs to be either removed or reallocated to achieve a perfect fit of the “no fraud” model. The size of this fraction can be interpreted as a measure of fraudulence. Both alternatives are special cases of measures of model fit—the π∗ mixture index of fit and the Δ dissimilarity index, respectively. Furthermore, independently of the latent class framework, the distributional assumptions of DBEF can be relaxed in some contexts. Independently or jointly, the latent class framework and the relaxed distributional assumptions allow us to dissect the observed distributions using models more flexible than those of existing DBEF. Reanalysis of Beber and Scacco's (2012) data shows that the approach can lead to new substantive conclusions.
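The two quantities named in the abstract can be illustrated for the simplest case, in which the “no fraud” model is a single, fully specified distribution with strictly positive probabilities (for example, uniform last digits). The sketch below is a minimal illustration under that assumption. It is not the interface of the pistar package, the function names and the digit counts are hypothetical, and the article's more flexible models would require an actual optimization step. For a fully specified model q and observed proportions p, the smallest fraction to reallocate is Δ = ½ Σ|p_i − q_i|, and the smallest fraction to remove is π* = 1 − min_i(p_i / q_i).

```python
def dissimilarity_index(observed_counts, model_probs):
    """Delta: smallest fraction of observations that must be reallocated
    across categories for the observed proportions to match the model."""
    n = sum(observed_counts)
    return 0.5 * sum(abs(c / n - q) for c, q in zip(observed_counts, model_probs))


def mixture_index_of_fit(observed_counts, model_probs):
    """pi*: smallest fraction of observations that must be removed so that
    the remainder fits the model exactly.

    Assumes a fully specified model with strictly positive probabilities;
    in that special case pi* = 1 - min_i(p_i / q_i).
    """
    n = sum(observed_counts)
    return 1.0 - min((c / n) / q for c, q in zip(observed_counts, model_probs))


if __name__ == "__main__":
    # Hypothetical last-digit counts for digits 0..9 (illustrative only,
    # not taken from the article or its replication data).
    counts = [130, 100, 95, 105, 90, 100, 100, 95, 85, 100]
    # Assumed "no fraud" model: last digits are uniformly distributed.
    uniform = [0.1] * 10

    print("Delta:", round(dissimilarity_index(counts, uniform), 4))
    print("pi*:  ", round(mixture_index_of_fit(counts, uniform), 4))
```

With these made-up counts, Δ is 0.035 and π* is 0.15: reallocating 3.5% of the numerals, or removing 15% of them, would yield a perfectly uniform distribution. Neither number is a test statistic or a p-value; each is read directly as a fraction of the data.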





Author's note: I am grateful to Tamás Rudas, Gábor Tóka, Levente Littvay, Zoltán Fazekas, Daniela Širinić, Pavol Hardos, two anonymous reviewers, and the editors for helpful comments and suggestions, and to the members of the Political Behavior Research Group at CEU for helpful discussion. Replication materials are available online as Medzihorsky, Juraj, 2015, “Replication Data for: Election Fraud: A Latent Class Framework for Digit-Based Tests”, Harvard Dataverse, V2 (Medzihorsky 2015b), and include the version of the R package pistar (Medzihorsky 2015a) used in the analysis. The article uses data from Beber and Scacco (2012), which are also available online as Beber and Scacco (2011). Supplementary materials for this article are available on the Political Analysis Web site.



Agresti, A. 2002. Categorical data analysis, 2nd ed. Hoboken, N.J.: John Wiley & Sons.
Alvarez, R. M., Hall, T. E., and Hyde, S. D. 2009. Election Fraud: Detecting and Deterring Electoral Manipulation. Washington, D.C.: Brookings Institution Press.
Alvarez, R. M., Atkeson, L. R., and Hall, T. E. 2012. Evaluating elections: A handbook of methods and standards. Cambridge [England]; New York: Cambridge University Press.
Beber, B., and Scacco, A. 2011. Replication Data for: What the Numbers Say: A Digit-Based Test for Election Fraud. Harvard Dataverse, V2. (accessed April 26, 2014).
Beber, B., and Scacco, A. 2012. What the numbers say: A digit-based test for election fraud. Political Analysis 20(2): 211–34.
Benford, F. 1938. The law of anomalous numbers. Proceedings of the American Philosophical Society 78:551–72.
Breunig, C., and Goerres, A. 2011. Searching for electoral irregularities in an established democracy: Applying Benford's law tests to Bundestag elections in unified Germany. Electoral Studies 30(3): 534–45.
Buttorf, G. 2008. Detecting fraud in America's Gilded Age. Unpublished manuscript, University of Iowa.
Cantú, F., and Saiegh, S. M. 2011. Fraudulent democracy? An analysis of Argentina's infamous decade using supervised machine learning. Political Analysis 19(4): 409–33.
Clogg, C., Rudas, T., and Xi, L. 1995. A new index of structure for the analysis of models for mobility tables and other cross-classifications. Sociological Methodology 25:197–222.
Clogg, C. C., Rudas, T., and Matthews, S. 1997. Analysis of contingency tables using graphical displays based on the mixture index of fit. In Visualization of categorical data, eds. Blasius, J. and Greenacre, M., 425–39. San Diego: Academic Press.
Dayton, C. M. 2003. Applications and computational strategies for the two-point mixture index of fit. British Journal of Mathematical and Statistical Psychology 56(1): 1–13.
Deckert, J., Myagkov, M., and Ordeshook, P. C. 2011. Benford's law and the detection of election fraud. Political Analysis 19(3): 245–68.
Formann, A. K. 2000. Rater agreement and the generalized Rudas-Clogg-Lindsay index of fit. Statistics in Medicine 19(14): 1881–8.
Formann, A. K. 2003a. Latent class model diagnosis from a frequentist point of view. Biometrics 59(1): 189–96.
Formann, A. K. 2003b. Latent class model diagnostics—A review and some proposals. Computational Statistics & Data Analysis 41(3): 549–59.
Formann, A. K. 2006. Testing the Rasch model by means of the mixture fit index. British Journal of Mathematical and Statistical Psychology 59(1): 89–95.
Giles, D. E. 2007. Benford's law and naturally occurring prices in certain eBay auctions. Applied Economics Letters 14(3): 157–61.
Gini, C. 1914. Di una misura della dissomiglianza tra due gruppi di quantità e delle sue applicazioni allo studio delle relazione statistiche. Atti del Reale Instituto Veneto di Scienze, Lettere ed Arti (Series 8) 74:185–213.
Hernández, J. M., Rubio, V. J., Revuelta, J., and Santacreu, J. 2006. A procedure for estimating intrasubject behavior consistency. Educational and Psychological Measurement 66(3): 417–34.
Hill, T. P. 1995. A statistical derivation of the significant-digit law. Statistical Science 10(4): 354–63.
Ispány, M., and Verdes, E. 2014. On the robustness of mixture index of fit. Journal of Mathematical Sciences 200(4): 432–40.
Jiménez, R., and Hidalgo, M. 2014. Forensic analysis of Venezuelan elections during the Chávez presidency. PLoS One 9(6):e100884.
Judge, G., and Schechter, L. 2009. Detecting problems in survey data using Benford's law. Journal of Human Resources 44(1): 1–24.
Leemann, L., and Bochsler, D. 2014. A systematic approach to study electoral fraud. Electoral Studies 35:33–47.
Leemis, L. M., Schmeiser, B. W., and Evans, D. L. 2000. Survival distributions satisfying Benford's law. American Statistician 54(4): 236–41.
Mebane, W. R. 2006a. Election forensics: The second-digit Benford's law test and recent American presidential elections. Prepared for delivery at the Election Fraud Conference, September 29–30, Salt Lake City, Utah.
Mebane, W. R. 2006b. Election forensics: Vote counts and Benford's law. Summer Meeting of the Political Methodology Society, UC-Davis, July.
Mebane, W. R. 2007. Election forensics: Statistical interventions in election controversies. Prepared for presentation at the 2007 Annual Meeting of the American Political Science Association, Chicago, Aug 30–Sep 2.
Mebane, W. R. 2008. Election forensics: Outlier and digit tests in America and Russia. American Electoral Process Conference, Center for the Study of Democratic Politics, Princeton University.
Mebane, W. R. 2010a. Election fraud or strategic voting? Can second-digit tests tell the difference? Summer Meeting of the Political Methodology Society, University of Iowa.
Mebane, W. R. 2010b. Fraud in the 2009 presidential election in Iran? Chance 23(1): 6–15.
Mebane, W. R. 2011. Comment on “Benford's law and the detection of election fraud.” Political Analysis 19(3): 269–72.
Mebane, W. R., and Kalinin, K. 2009. Comparative election fraud detection. Prepared for presentation at the 2009 Annual Meeting of the American Political Science Association, Toronto, Canada, Sept 3–6.
Medzihorsky, J. 2015a. pistar: Rudas, Clogg and Lindsay mixture index of fit. R package version
Medzihorsky, J. 2015b. Replication Data for: Election Fraud: A Latent Class Framework for Digit-Based Tests. Harvard Dataverse, V2 [UNF:6:FIWHvsHNzZgPStT0+kgbsQ==].
Newcomb, S. 1881. Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics 4(1): 39–40.
Nickerson, R. S. 2002. The production and perception of randomness. Psychological Review 109(2): 330.
Norris, P., Frank, R. W., and Coma, F. M. I. 2014. Advancing electoral integrity. Oxford: Oxford University Press.
Pericchi, L., and Torres, D. 2011. Quick anomaly detection by the Newcomb-Benford law, with applications to electoral processes data from the USA, Puerto Rico, and Venezuela. Statistical Science 26(4): 502–16.
Revuelta, J. 2008. Estimating the π* goodness of fit index for finite mixtures of item response models. British Journal of Mathematical and Statistical Psychology 61(1): 93–113.
Rudas, T. 1998. The mixture index of fit. In Advances in methodology, data analysis, and statistics, ed. Ferligoj, A., 15–22. Ljubljana: FDV.
Rudas, T. 1999. The mixture index of fit and minimax regression. Metrika 50(2): 163–72.
Rudas, T. 2002. A latent class approach to measuring the fit of a statistical model. In Applied latent class analysis, eds. Hagenaars, J. A. and McCutcheon, A. L., 345–65. Cambridge: Cambridge University Press.
Rudas, T. 2005. Mixture models of missing data. Quality & Quantity 39(1): 19–36.
Rudas, T., Clogg, C., and Lindsay, B. 1994. A new index of fit based on mixture methods for the analysis of contingency tables. Journal of the Royal Statistical Society. Series B (Methodological) 56(4): 623–39.
Rudas, T., and Verdes, E. 2015. Model-based analysis of incomplete data using the mixture index of fit. In Advances in latent class analysis: A Festschrift in Honor of C. Mitchell Dayton, eds. Hancock, G. R. and Macready, G. B. Charlotte, NC: Information Age Publishing.
Rudas, T., and Zwick, R. 1997. Estimating the importance of differential item functioning. Journal of Educational and Behavioral Statistics 22(1): 31–45.
Tam Cho, W. K., and Gaines, B. J. 2007. Breaking the (Benford) law: Statistical fraud detection in campaign finance. American Statistician 61(3): 218–23.
Verdes, E., and Rudas, T. 2003. The π* index as a new alternative for assessing goodness of fit of logistic regression. In Foundations of Statistical Inference: Proceedings of the Shoresh Conference 2000, eds. Haitovsky, Y. and Ritov, Y., 167–77. Berlin and Heidelberg: Springer.
Ziliak, S. T., and McCloskey, D. N. 2008. The cult of statistical significance: How the standard error costs us jobs, justice, and lives. Ann Arbor: University of Michigan Press.