Fraudulent Democracy? An Analysis of Argentina's Infamous Decade Using Supervised Machine Learning

  • Francisco Cantú and Sebastián M. Saiegh

In this paper, we introduce an innovative method to diagnose electoral fraud using vote counts. Specifically, we use synthetic data to develop and train a fraud detection prototype. We employ a naive Bayes classifier as our learning algorithm and rely on digital analysis to identify the features that are most informative about class distinctions. To evaluate the detection capability of the classifier, we use authentic data drawn from a novel data set of district-level vote counts in the province of Buenos Aires (Argentina) between 1931 and 1941, a period with a checkered history of fraud. Our results corroborate the validity of our approach: The elections considered to be irregular (legitimate) by most historical accounts are unambiguously classified as fraudulent (clean) by the learner. More generally, our findings demonstrate the feasibility of generating and using synthetic data for training and testing an electoral fraud detection system.
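The pipeline described in the abstract (simulate labelled elections, extract digit-based features, train a naive Bayes learner, classify held-out cases) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the log-normal generator for clean vote counts, the tampering rule that overwrites counts with uniformly drawn three-digit numbers, the two first-digit features, and all function names and parameter values. It also tests on further synthetic elections rather than on authentic data.

```python
import math
import random
import statistics

def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def features(counts):
    """Two hypothetical digit features: share of leading 1s and mean leading digit."""
    digits = [first_digit(c) for c in counts]
    return [digits.count(1) / len(digits), statistics.fmean(digits)]

def simulate_election(rng, fraudulent, n_districts=300, tamper_share=0.7):
    """Assumed generator: log-normal district counts; fraud overwrites a share of them."""
    counts = [max(1, int(rng.lognormvariate(7.0, 1.5))) for _ in range(n_districts)]
    if fraudulent:
        for i in range(n_districts):
            if rng.random() < tamper_share:
                counts[i] = rng.randint(100, 999)  # invented count, uniform leading digit
    return counts

def make_dataset(rng, n_per_class):
    """Build a balanced, labelled set of synthetic elections."""
    X, y = [], []
    for label in ("clean", "fraudulent"):
        for _ in range(n_per_class):
            X.append(features(simulate_election(rng, label == "fraudulent")))
            y.append(label)
    return X, y

def fit(X, y):
    """Gaussian naive Bayes: per-class prior plus per-feature mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [statistics.fmean(c) for c in cols],
            "var": [statistics.pvariance(c) + 1e-9 for c in cols],  # avoid zero variance
        }
    return model

def log_gauss(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, x):
    """Return the class with the highest log posterior under feature independence."""
    def score(label):
        p = model[label]
        return math.log(p["prior"]) + sum(
            log_gauss(v, m, s) for v, m, s in zip(x, p["mean"], p["var"])
        )
    return max(model, key=score)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
X_train, y_train = make_dataset(rng, 200)
X_test, y_test = make_dataset(rng, 100)
model = fit(X_train, y_train)
accuracy = sum(predict(model, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic elections: {accuracy:.2f}")
```

The tampering rule is deliberately crude, so the two classes separate easily; a realistic evaluation would need a generator calibrated to plausible manipulation strategies and a test set of authentic returns, as the abstract describes.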

Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis