Predicting politicians’ misconduct: Evidence from Colombia

Published online by Cambridge University Press:  14 November 2022

Jorge Gallego*
Affiliation:
School of Economics, Universidad del Rosario, Bogota, Colombia
Mounu Prem
Affiliation:
Einaudi Institute for Economics and Finance, Rome, Italy
Juan F. Vargas
Affiliation:
School of Economics, Universidad del Rosario, Bogota, Colombia
*Corresponding author. E-mail: jorge.gallego@urosario.edu.co

Abstract

Corruption has pervasive effects on economic development and the well-being of the population. Although fighting corruption is crucial, it is not an easy task, because corruption is difficult to measure and detect. However, recent advances in the field of artificial intelligence may help in this quest. In this article, we propose the use of machine-learning models to predict municipality-level corruption in a developing country. Using data from disciplinary prosecutions conducted by an anti-corruption agency in Colombia, we train four canonical models (Random Forests, Gradient Boosting Machine, Lasso, and Neural Networks) and ensemble their predictions to predict whether a mayor will commit acts of corruption. Our models achieve acceptable levels of performance, based on metrics such as precision and the area under the receiver-operating characteristic curve, demonstrating that these tools are useful in predicting where misbehavior is most likely to occur. Moreover, our feature-importance analysis shows which groups of variables are most important in predicting corruption.
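To make the evaluation idea in the abstract concrete, the following is a minimal sketch of combining several models' predicted probabilities and scoring the result by precision and the area under the ROC curve. It is an illustration under stated assumptions, not the authors' pipeline: the labels and scores are invented toy values, the function names are hypothetical, and the ensemble is a plain unweighted average rather than a Super Learner, which would instead learn the combination weights by cross-validation.

```python
from statistics import mean


def roc_auc(y_true, scores):
    """AUC via the rank formulation: the share of (positive, negative)
    pairs in which the positive case has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision(y_true, scores, threshold=0.5):
    """Share of cases flagged at or above the threshold that are true positives."""
    flagged = [y for y, s in zip(y_true, scores) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0


def average_ensemble(model_scores):
    """Unweighted mean of the per-model scores for each case (a simplification:
    a Super Learner would weight the models instead)."""
    return [mean(case) for case in zip(*model_scores.values())]


# Hypothetical toy data: 1 = mayor sanctioned, 0 = not sanctioned.
y_true = [1, 0, 1, 0, 0, 1]

# Invented predicted probabilities, one list per model.
model_scores = {
    "random_forest": [0.9, 0.2, 0.6, 0.4, 0.1, 0.7],
    "gbm": [0.8, 0.3, 0.4, 0.5, 0.2, 0.6],
    "lasso": [0.7, 0.4, 0.5, 0.6, 0.3, 0.5],
    "neural_net": [0.6, 0.1, 0.7, 0.3, 0.2, 0.8],
}

ensemble = average_ensemble(model_scores)

for name, scores in {**model_scores, "ensemble": ensemble}.items():
    print(f"{name}: AUC={roc_auc(y_true, scores):.3f}, "
          f"precision={precision(y_true, scores):.3f}")
```

The 0.5 threshold used for precision is an arbitrary choice for the sketch; in practice it would be set according to how costly false alarms are relative to missed cases.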

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Table 1. Model’s parameters

Figure 1. ROC curve. This figure presents the ROC curves for all our models. In blue, we present the ROC curve for the Random Forest model, in black, for the Gradient Boosting Machine, in red, for Lasso, in orange, for Neural Networks, and in green, for the Super Learner.

Table 2. Model performance

Figure 2. Group importance. This figure presents the relative importance of groups of covariates as described in Section 3.4.

Figure 3. Covariate importance. This figure presents the relative importance of individual covariates as described in Section 3.4.

Supplementary material: File

Gallego et al. supplementary material
