Fraudulent Democracy? An Analysis of Argentina's Infamous Decade Using Supervised Machine Learning

Francisco Cantú; Sebastián M. Saiegh

doi:10.1093/pan/mpr033

Published online by Cambridge University Press: 04 January 2017

Francisco Cantú: Department of Political Science, University of California, San Diego, CA 92093
Sebastián M. Saiegh*: Department of Political Science, University of California, San Diego, CA 92093
*E-mail: ssaiegh@ucsd.edu

Abstract

In this paper, we introduce an innovative method to diagnose electoral fraud using vote counts. Specifically, we use synthetic data to develop and train a fraud detection prototype. We employ a naive Bayes classifier as our learning algorithm and rely on digital analysis to identify the features that are most informative about class distinctions. To evaluate the detection capability of the classifier, we use authentic data drawn from a novel data set of district-level vote counts in the province of Buenos Aires (Argentina) between 1931 and 1941, a period with a checkered history of fraud. Our results corroborate the validity of our approach: The elections considered to be irregular (legitimate) by most historical accounts are unambiguously classified as fraudulent (clean) by the learner. More generally, our findings demonstrate the feasibility of generating and using synthetic data for training and testing an electoral fraud detection system.
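The general idea described in the abstract (train a naive Bayes classifier on synthetic vote counts, using digit-based features) can be illustrated with a minimal sketch. This is not the authors' implementation. The lognormal generator for clean counts, the `tamper_share` parameter, the uniform draw used for tampered counts, and the two features (share of leading digit 1, mean second digit) are all assumptions chosen only to make the example self-contained.

```python
import math
import random
import statistics


def digit_features(counts):
    """Return two digit-based features for one election (a list of vote counts)."""
    first = [int(str(c)[0]) for c in counts]
    second = [int(str(c)[1]) for c in counts if c >= 10]
    share_first_is_1 = sum(d == 1 for d in first) / len(first)
    mean_second = statistics.fmean(second) if second else 0.0
    return (share_first_is_1, mean_second)


def simulate_election(rng, fraudulent, n_districts=300, tamper_share=0.7):
    """Hypothetical generator: lognormal counts, some replaced by uniform draws if fraudulent."""
    counts = []
    for _ in range(n_districts):
        if fraudulent and rng.random() < tamper_share:
            counts.append(rng.randint(100, 999))  # assumed "made-up" count
        else:
            counts.append(max(1, int(rng.lognormvariate(6.0, 1.5))))
    return counts


def make_dataset(rng, n_per_class):
    """Build labeled feature vectors from simulated elections."""
    samples, labels = [], []
    for label in ("clean", "fraudulent"):
        for _ in range(n_per_class):
            election = simulate_election(rng, label == "fraudulent")
            samples.append(digit_features(election))
            labels.append(label)
    return samples, labels


def fit_gaussian_nb(samples, labels):
    """Estimate per-class priors and per-feature Gaussian parameters."""
    model = {}
    for label in sorted(set(labels)):
        rows = [x for x, y in zip(samples, labels) if y == label]
        cols = list(zip(*rows))
        model[label] = {
            "log_prior": math.log(len(rows) / len(samples)),
            "mean": [statistics.fmean(c) for c in cols],
            # Small constant keeps the variance strictly positive.
            "var": [statistics.pvariance(c) + 1e-9 for c in cols],
        }
    return model


def predict(model, x):
    """Pick the class with the highest log posterior under feature independence."""
    def log_posterior(params):
        total = params["log_prior"]
        for value, mu, var in zip(x, params["mean"], params["var"]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


rng = random.Random(0)
train_x, train_y = make_dataset(rng, 200)
test_x, test_y = make_dataset(rng, 100)
model = fit_gaussian_nb(train_x, train_y)
accuracy = sum(predict(model, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"held-out accuracy on synthetic elections: {accuracy:.2f}")
```

In this sketch the held-out data are also synthetic; the paper instead evaluates its classifier on authentic district-level vote counts, which this example does not attempt to reproduce.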

Information

Type: Articles
Information: Political Analysis, Volume 19, Issue 4, Autumn 2011, pp. 409–433

DOI: https://doi.org/10.1093/pan/mpr033
Copyright: Copyright © The Author 2011. Published by Oxford University Press on behalf of the Society for Political Methodology
