
Fraudulent Democracy? An Analysis of Argentina's Infamous Decade Using Supervised Machine Learning

Published online by Cambridge University Press: 04 January 2017

Francisco Cantú, Department of Political Science, University of California, San Diego, CA 92093
Sebastián M. Saiegh, Department of Political Science, University of California, San Diego, CA 92093

Abstract


In this paper, we introduce an innovative method to diagnose electoral fraud using vote counts. Specifically, we use synthetic data to develop and train a fraud detection prototype. We employ a naive Bayes classifier as our learning algorithm and rely on digital analysis to identify the features that are most informative about class distinctions. To evaluate the detection capability of the classifier, we use authentic data drawn from a novel data set of district-level vote counts in the province of Buenos Aires (Argentina) between 1931 and 1941, a period with a checkered history of fraud. Our results corroborate the validity of our approach: The elections considered to be irregular (legitimate) by most historical accounts are unambiguously classified as fraudulent (clean) by the learner. More generally, our findings demonstrate the feasibility of generating and using synthetic data for training and testing an electoral fraud detection system.
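The pipeline described in the abstract (label synthetic elections as clean or fraudulent, summarize each one with digit-based features, train a naive Bayes classifier, then score held-out data) can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the lognormal "clean" vote-count generator, the hypothetical tampering rule (rounding a share of district tallies to the nearest hundred), and the two features (the mean second digit relative to its Benford's Law expectation, and a chi-square statistic of last digits against a uniform distribution) are all choices made here for illustration. The Gaussian naive Bayes model is written from scratch with the standard library so the example has no dependencies.

```python
import math
import random
import statistics

# Expected second-digit frequencies under Benford's Law:
# P(d2 = d) = sum_{k=1}^{9} log10(1 + 1 / (10k + d)), for d = 0..9
BENFORD_2ND = [sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
               for d in range(10)]
BENFORD_2ND_MEAN = sum(d * p for d, p in enumerate(BENFORD_2ND))  # about 4.187


def digit_features(counts):
    """Map one election's district-level vote counts to a feature vector."""
    second = [int(str(c)[1]) for c in counts if c >= 10]
    last = [c % 10 for c in counts]
    n = len(last)
    # Feature 1: deviation of the mean second digit from its Benford expectation.
    f1 = statistics.mean(second) - BENFORD_2ND_MEAN
    # Feature 2: Pearson chi-square of last digits against a uniform distribution.
    f2 = sum((last.count(d) - n / 10) ** 2 / (n / 10) for d in range(10))
    return [f1, f2]


def synthetic_election(rng, fraudulent, n_districts=100, p_tamper=0.5):
    """Hypothetical generator: lognormal counts; 'fraud' rounds some tallies."""
    counts = []
    for _ in range(n_districts):
        c = 100 + int(rng.lognormvariate(7.0, 1.0))  # every count is >= 100
        if fraudulent and rng.random() < p_tamper:
            c = round(c, -2)  # a fabricated, suspiciously round tally
        counts.append(c)
    return counts


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: features assumed independent given the class."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.params = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            self.params[c] = (
                math.log(len(rows) / len(X)),                        # log prior
                [statistics.mean(col) for col in cols],              # feature means
                [statistics.pvariance(col) + 1e-9 for col in cols],  # floored variances
            )
        return self

    def _log_posterior(self, x, c):
        # Unnormalized log posterior: log prior plus per-feature Gaussian log densities.
        log_prior, means, variances = self.params[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )

    def predict(self, X):
        return [max(self.classes, key=lambda c: self._log_posterior(x, c)) for x in X]


rng = random.Random(2011)


def make_set(n_per_class):
    """Build a labeled set of synthetic elections (0 = clean, 1 = fraudulent)."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append(digit_features(synthetic_election(rng, fraudulent=bool(label))))
            y.append(label)
    return X, y


X_train, y_train = make_set(200)
X_test, y_test = make_set(100)
model = GaussianNaiveBayes().fit(X_train, y_train)
pred = model.predict(X_test)
accuracy = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic elections: {accuracy:.2f}")
```

Because the tampering rule here is deliberately blatant, the last-digit feature alone separates the classes almost perfectly; subtler forms of manipulation would need richer features and a more realistic generator of clean counts, which is where the choice of synthetic data matters most.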

Type: Article
Copyright © The Author 2011. Published by Oxford University Press on behalf of the Society for Political Methodology.
