
Using Split Samples to Improve Inference on Causal Effects

  • Marcel Fafchamps (a1) and Julien Labonne (a2)


We discuss a statistical procedure for carrying out empirical research that combines recent insights about pre-analysis plans (PAPs) and replication. Researchers send their datasets to an independent third party, which randomly generates training and testing samples. Researchers perform their analysis on the training sample and can incorporate feedback from colleagues, editors, and referees. Once the paper is accepted for publication, the method is applied to the testing sample, and it is those results that are published. Simulations indicate that, in empirically relevant settings, the proposed method delivers more power than a PAP. The effect operates mostly through a lower likelihood that relevant hypotheses are left untested. The method is best suited to exploratory analyses where there is significant uncertainty about the outcomes of interest. We do not recommend the method in situations where the treatment is very costly and the available sample size is therefore limited. One interpretation of the method is that it allows researchers to perform a direct replication of their own work. We also discuss a number of practical issues concerning the method's feasibility and implementation.
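The split-sample procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the fixed seeds, and the use of a simple difference-in-means estimator are all assumptions standing in for whatever analysis the researchers pre-commit to on the training sample.

```python
import numpy as np

def third_party_split(n_obs, seed, train_frac=0.5):
    """Independent third party: randomly partition row indices into
    training and testing samples (the seed is held by the third party)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_obs)
    cut = int(n_obs * train_frac)
    return idx[:cut], idx[cut:]

def difference_in_means(y, d):
    """Illustrative treatment-effect estimate and t-statistic:
    difference in mean outcomes between treated and control units."""
    y1, y0 = y[d == 1], y[d == 0]
    est = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return est, est / se

# Simulated experiment: 2000 units, random assignment, true effect 0.3.
rng = np.random.default_rng(0)
n = 2000
d = rng.integers(0, 2, n)           # treatment indicator
y = 0.3 * d + rng.normal(size=n)    # outcome

train, test = third_party_split(n, seed=42)

# Researchers explore freely on the training sample ...
est_train, t_train = difference_in_means(y[train], d[train])
# ... and only the pre-committed analysis, run once on the held-out
# testing sample after acceptance, produces the published results.
est_test, t_test = difference_in_means(y[test], d[test])
```

The key design point is that the testing indices are never touched during the exploratory phase, so whatever specification search happens on the training half cannot inflate the size of the final published test.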



Author’s note: We thank Michael Alvarez (Co-Editor), two anonymous referees, Rob Garlick and Kate Vyborny for discussions and comments. All remaining errors are ours. Replication data are available on the Harvard Dataverse (Fafchamps and Labonne 2017). Supplementary materials for this article are available on the Political Analysis Web site.

Contributing Editor: R. Michael Alvarez



Anderson, Michael L. 2008. Multiple inference and gender differences in the effects of early intervention: A reevaluation of the Abecedarian, Perry Preschool, and Early Training projects. Journal of the American Statistical Association 103(484):1481–1495.
Athey, Susan, and Imbens, Guido. 2015. Machine learning methods for estimating heterogeneous causal effects. Stanford University. Mimeo.
Bell, Mark, and Miller, Nicholas. 2015. Questioning the effect of nuclear weapons on conflict. Journal of Conflict Resolution 59(1):74–92.
Belloni, Alexandre, Chernozhukov, Victor, and Hansen, Christian. 2014. High-dimensional methods and inference on structural and treatment effects. Journal of Economic Perspectives 28(2):29–50.
Benjamini, Yoav, Krieger, Abba M., and Yekutieli, Daniel. 2006. Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93(3):491–507.
Benjamini, Yoav, and Yekutieli, Daniel. 2001. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 29(4):1165–1188.
Benjamini, Yoav, and Hochberg, Yosef. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 57(1):289–300.
Blair, Graeme, Cooper, Jasper, Coppock, Alexander, and Humphreys, Macartan. 2016. Declaring and diagnosing research designs. Columbia University. Mimeo.
Brodeur, Abel, Le, Mathias, Sangnier, Marc, and Zylberberg, Yanos. 2016. Star wars: The empirics strike back. American Economic Journal: Applied Economics 8(1):1–32.
Coffman, Lucas C., and Niederle, Muriel. 2015. Pre-analysis plans have limited upside, especially where replications are feasible. Journal of Economic Perspectives 29(3):81–98.
Dunning, Thad. 2016. Transparency, replication, and cumulative learning: What experiments alone cannot achieve. Annual Review of Political Science 19(1):S1–S23.
Einav, Liran, and Levin, Jonathan. 2014. Economics in the age of big data. Science 346(6210):715.
Fafchamps, Marcel, and Labonne, Julien. 2017. Replication data for “Using split samples to improve inference on causal effects”. doi:10.7910/DVN/Q0IXQY, Harvard Dataverse, V1.
Findley, Michael G., Jensen, Nathan M., Malesky, Edmund J., and Pepinsky, Thomas B.. Forthcoming. Can results-free review reduce publication bias? The results and implications of a pilot study. Comparative Political Studies.
Franco, Annie, Malhotra, Neil, and Simonovits, Gabor. 2014. Publication bias in the social sciences: Unlocking the file drawer. Science 345(6203):1502–1505.
Gelman, Andrew. 2014. Preregistration: What’s in it for you? preregistration-whats/.
Gelman, Andrew. 2015. The connection between varying treatment effects and the crisis of unreplicable research. Journal of Management 41(2):632–643.
Gelman, Andrew, Carlin, John, Stern, Hal, Dunson, David, Vehtari, Aki, and Rubin, Donald. 2013. Bayesian data analysis. 3rd edn. London: Chapman and Hall/CRC.
Gerber, Alan, and Malhotra, Neil. 2008. Do statistical reporting standards affect what is published? Publication bias in two leading political science journals. Quarterly Journal of Political Science 3(3):313–326.
Gerber, Alan S., Green, Donald P., and Nickerson, David. 2001. Testing for publication bias in political science. Political Analysis 9(4):385–392.
Green, Don, Humphreys, Macartan, and Smith, Jenny. 2013. Read it, understand it, believe it, use it: Principles and proposals for a more credible research publication. Columbia University. Mimeo.
Grimmer, Justin. 2015. We are all social scientists now: How big data, machine learning, and causal inference work together. PS: Political Science & Politics 48(1):80–83.
Hainmueller, Jens, and Hazlett, Chad. 2013. Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2):143–168.
Hartman, Erin, and Hidalgo, F. Daniel. 2015. What’s the alternative?: An equivalence approach to balance and placebo tests. UCLA. Mimeo.
Humphreys, Macartan, Sanchez de la Sierra, Raul, and van der Windt, Peter. 2013. Fishing, commitment, and communication: A proposal for comprehensive nonbinding research registration. Political Analysis 21(1):1–20.
Ioannidis, John. 2005. Why most published research findings are false. PLOS Medicine 2(8):e124.
Laitin, David D. 2013. Fisheries management. Political Analysis 21:42–47.
Leamer, Edward. 1974. False models and post-data model construction. Journal of the American Statistical Association 69(345):122–131.
Leamer, Edward. 1978. Specification searches: Ad hoc inference with nonexperimental data. New York, NY: Wiley.
Leamer, Edward. 1983. Let’s take the Con out of econometrics. American Economic Review 73(1):31–43.
Lin, Winston, and Green, Donald P. 2016. Standard operating procedures: A safety net for pre-analysis plans. PS: Political Science & Politics 49(3):495–500.
Lovell, M. 1983. Data mining. Review of Economics and Statistics 65(1):1–12.
Miguel, E., Camerer, C., Casey, K., Cohen, J., Esterling, K. M., Gerber, A., Glennerster, R., Green, D. P., Humphreys, M., Imbens, G., Laitin, D., Madon, T., Nelson, L., Nosek, B. A., Petersen, M., Sedlmayr, R., Simmons, J. P., Simonsohn, U., and Van der Laan, M. 2014. Promoting transparency in social science research. Science 343(6166):30–31.
Monogan, James E. 2015. Research preregistration in political science: The case, counterarguments, and a response to critiques. PS: Political Science & Politics 48(3):425–429.
Nyhan, Brendan. 2015. Increasing the credibility of political science research: A proposal for journal reforms. PS: Political Science & Politics 48(S1):78–83.
Olken, Benjamin. 2015. Pre-analysis plans in economics. Journal of Economic Perspectives 29(3):61–80.
Pepinsky, Tom. 2013. The perilous peer review process. peer-review-process/.
Rauchhaus, Robert. 2009. Evaluating the nuclear peace hypothesis: A quantitative approach. Journal of Conflict Resolution 53(2):258–277.
Sankoh, A. J., Huque, M. F., and Dubey, S. D. 1997. Some comments on frequently used multiple endpoint adjustment methods in clinical trials. Statistics in Medicine 16(22):2529–2542.