Retrospective Causal Inference with Machine Learning Ensembles: An Application to Anti-recidivism Policies in Colombia

Published online by Cambridge University Press: 04 January 2017

Cyrus Samii*
Department of Politics, New York University, 19 West 14th Street, New York, NY 10012
Laura Paler
Department of Political Science, University of Pittsburgh, 4600 Wesley W. Posvar Hall, Pittsburgh, PA 15260
Sarah Zukerman Daly
Department of Political Science, University of Notre Dame, 217 O’Shaughnessy Hall, Notre Dame, IN 46556


We present new methods to estimate causal effects retrospectively from micro data with the assistance of a machine learning ensemble. This approach overcomes two important limitations in conventional methods like regression modeling or matching: (i) ambiguity about the pertinent retrospective counterfactuals and (ii) potential misspecification, overfitting, and otherwise bias-prone or inefficient use of a large identifying covariate set in the estimation of causal effects. Our method targets the analysis toward a well-defined “retrospective intervention effect” based on hypothetical population interventions and applies a machine learning ensemble that allows data to guide us, in a controlled fashion, on how to use a large identifying covariate set. We illustrate with an analysis of policy options for reducing ex-combatant recidivism in Colombia.
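As a rough illustration of the kind of contrast described above, the following sketch compares the predicted mean outcome had everyone in a sample received an intervention with the mean outcome actually observed, using a small ensemble to predict outcomes. It is a toy under stated assumptions, not the authors' estimator: the simulated data, the two candidate learners, the outcome-model plug-in approach, and every function name (`simulate`, `fit_pooled`, `fit_stratified`, `choose_weight`, `intervention_contrast`) are hypothetical and chosen only to keep the example self-contained.

```python
import random

def simulate(n, seed=0):
    """Toy data: x = confounder, a = program participation, y = recidivism (all 0/1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.8 if x else 0.2) else 0
        y = 1 if rng.random() < 0.5 - 0.2 * a + 0.3 * x else 0
        rows.append((x, a, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def fit_pooled(train):
    """Candidate 1: ignore x and predict the overall mean outcome in the training rows."""
    m = mean([y for _, _, y in train])
    return lambda x: m

def fit_stratified(train):
    """Candidate 2: predict the mean outcome within each level of x."""
    overall = mean([y for _, _, y in train])
    by_x = {}
    for level in (0, 1):
        ys = [y for x, _, y in train if x == level]
        by_x[level] = mean(ys) if ys else overall
    return lambda x: by_x[x]

def choose_weight(treated, folds=5):
    """Pick the convex weight on candidate 2 that minimises held-out squared error."""
    grid = [i / 10 for i in range(11)]
    loss = [0.0] * len(grid)
    for k in range(folds):
        train = [r for i, r in enumerate(treated) if i % folds != k]
        test = [r for i, r in enumerate(treated) if i % folds == k]
        f1, f2 = fit_pooled(train), fit_stratified(train)
        for x, _, y in test:
            for j, w in enumerate(grid):
                loss[j] += (y - (w * f2(x) + (1 - w) * f1(x))) ** 2
    return grid[loss.index(min(loss))]

def intervention_contrast(rows):
    """Predicted mean outcome if all units had participated, minus the observed mean."""
    treated = [r for r in rows if r[1] == 1]
    w = choose_weight(treated)
    f1, f2 = fit_pooled(treated), fit_stratified(treated)
    # Predict each unit's outcome under participation, then average over the whole sample.
    counterfactual = mean([w * f2(x) + (1 - w) * f1(x) for x, _, _ in rows])
    observed = mean([y for _, _, y in rows])
    return counterfactual - observed, w

# In this simulation the true contrast is 0.45 - 0.55 = -0.10.
effect, weight = intervention_contrast(simulate(20000))
print(round(effect, 3), weight)
```

In this toy setup the pooled learner ignores the confounder, so cross-validation should shift most of the ensemble weight toward the stratified learner. That is the sense in which the data guide, in a controlled way, how the covariates are used. A realistic application would involve many covariates, richer candidate learners, and an identification argument for the covariate set, none of which this sketch attempts.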

Copyright © The Author 2016. Published by Oxford University Press on behalf of the Society for Political Methodology 

Authors’ note: Authors are listed in reverse alphabetical order and are equal contributors to the project. All replication materials are available at the Political Analysis Dataverse. We thank Carolina Serrano for excellent research assistance in Colombia and the team at Fundación Ideas para la Paz for their collaboration in the data collection. We also thank the Organization of American States, Misión de Apoyo al Proceso de Paz and the Agencia Colombiana para la Reintegración for their collaboration. Daly acknowledges funding from the Swedish Foreign Ministry, the Smith Richardson Foundation, and the Carroll L. Wilson Award. For helpful discussions, the authors thank Michael Alvarez, two anonymous Political Analysis reviewers, Deniz Aksoy, Peter Aronow, Neal Beck, Matthew Blackwell, Drew Dimmery, Ryan Jablonski, Michael Peress, Fredrik Savje, Maya Sen, Teppei Yamamoto, Rodrigo Zarazaga, and seminar participants at the American Political Science Association annual meetings, European Political Science Association annual meetings, Empirical Studies of Conflict working group, Massachusetts Institute of Technology, Midwest Political Science Association annual meetings, New York University, and the University of Rochester. Supplementary materials for this article are available on the Political Analysis website.

