Book description

Causal inference and machine learning are typically introduced in the social sciences separately as theoretically distinct methodological traditions. However, applications of machine learning in causal inference are increasingly prevalent. This Element provides theoretical and practical introductions to machine learning for social scientists interested in applying such methods to experimental data. We show how machine learning can be useful for conducting robust causal inference and provide a theoretical foundation researchers can use to understand and apply new methods in this rapidly developing field. We then demonstrate two specific methods – the prediction rule ensemble and the causal random forest – for characterizing treatment effect heterogeneity in survey experiments and testing the extent to which such heterogeneity is robust to out-of-sample prediction. We conclude by discussing limitations and tradeoffs of such methods, while directing readers to additional related methods available on the Comprehensive R Archive Network (CRAN).
