
Enhancing Validity in Observational Settings When Replication is Not Possible*

  • Christopher J. Fariss and Zachary M. Jones


We argue that political scientists can provide additional evidence for the predictive validity of observational and quasi-experimental research designs by minimizing the expected prediction error, or generalization error, of their empirical models. For observational and quasi-experimental data not generated by a stochastic mechanism under the researcher’s control, the reproduction of statistical analyses is possible but replication of the data-generating procedures is not. Estimating the generalization error of a model for this type of data and then adjusting the model to minimize this estimate (regularization) provides evidence for the predictive validity of the study by decreasing the risk of overfitting. Estimating generalization error also allows for model comparisons that highlight underfitting, which occurs when a model generalizes poorly because it misses systematic features of the data-generating process. Thus, minimizing generalization error provides a principled method for modeling relationships between variables that are measured but whose relationships with the outcome(s) are left unspecified by a deductively valid theory. Overall, the minimization of generalization error is important because it quantifies the expected reliability of predictions in a way that is similar to external validity, consequently increasing the validity of the study’s conclusions.
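The idea of estimating generalization error and then regularizing to minimize that estimate can be illustrated with a minimal sketch. It is not the authors' implementation: the simulated data, the one-predictor ridge model, the candidate penalty values, and the helper names (`ridge_slope`, `mse`, `cv_error`) are all assumptions chosen for illustration, and k-fold cross-validation stands in for whichever resampling scheme a given study would use.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a one-predictor model with no intercept.

    A larger penalty `lam` shrinks the slope toward zero.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, slope):
    """Mean squared prediction error of the fitted slope on the given data."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_error(xs, ys, lam, k=5):
    """Estimate generalization error by k-fold cross-validation.

    Each fold is held out once; the model is fit on the remaining data
    and scored only on the held-out observations.
    """
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        test_x = [xs[i] for i in sorted(test_idx)]
        test_y = [ys[i] for i in sorted(test_idx)]
        slope = ridge_slope(train_x, train_y, lam)
        fold_errors.append(mse(test_x, test_y, slope))
    return sum(fold_errors) / k

# Hypothetical simulated data: a weak linear signal plus noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(40)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

# Candidate penalties (assumed values); pick the one with the lowest
# estimated generalization error.
penalties = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_errors = {lam: cv_error(xs, ys, lam) for lam in penalties}
best_penalty = min(cv_errors, key=cv_errors.get)
```

The same estimates support model comparison: a penalty that is too large yields a high cross-validated error because the model underfits, while the in-sample error of the unpenalized model would understate how poorly an overfit model predicts new data.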




Christopher J. Fariss, Assistant Professor, Department of Political Science and Faculty Associate, Center for Political Studies, Institute for Social Research, University of Michigan, Center for Political Studies (CPS), Institute for Social Research, 4200 Bay, University of Michigan, Ann Arbor, Michigan 48106-1248 USA. Zachary M. Jones, Ph.D. Candidate, Pennsylvania State University; Pond Laboratory, Pennsylvania State University, State College, PA 16801. The authors would like to thank Michael Alvarez, Neil Beck, Bernd Bischl, Charles Crabtree, Allan Dafoe, Cassy Dorff, Dan Enemark, Matt Golder, Sophia Hatz, Danny Hill, Luke Keele, Lars Kotthoff, Fridolin Linder, Mark Major, Michael Nelson, Keith Schnakenberg, and Tara Slough for many helpful comments and suggestions. This research was supported in part by The McCourtney Institute for Democracy Innovation Grant, and the College of Liberal Arts, both at Pennsylvania State University.




Supplementary materials

Fariss and Jones Dataset

