Enhancing Validity in Observational Settings When Replication is Not Possible*


We argue that political scientists can provide additional evidence for the predictive validity of observational and quasi-experimental research designs by minimizing the expected prediction error, or generalization error, of their empirical models. For observational and quasi-experimental data not generated by a stochastic mechanism under the researcher’s control, the reproduction of statistical analyses is possible but replication of the data-generating procedures is not. Estimating the generalization error of a model for this type of data and then adjusting the model to minimize this estimate (regularization) provides evidence for the predictive validity of the study by decreasing the risk of overfitting. Estimating generalization error also allows for model comparisons that highlight underfitting, which occurs when a model generalizes poorly because it misses systematic features of the data-generating process. Thus, minimizing generalization error provides a principled method for modeling relationships between variables that are measured but whose relationships with the outcome(s) are left unspecified by a deductively valid theory. Overall, the minimization of generalization error is important because it quantifies the expected reliability of predictions in a way that is similar to external validity, consequently increasing the validity of the study’s conclusions.
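As a rough illustration of the idea in the abstract, the sketch below estimates generalization error by k-fold cross-validation and picks the regularization penalty that minimizes that estimate. It is not the authors' procedure: the simulated data, the ridge-penalized polynomial model, the penalty grid, and every function name (`fit_ridge`, `cv_error`, and so on) are assumptions made here for illustration only.

```python
import math
import random

def poly_features(x, degree):
    # Polynomial basis 1, x, x^2, ..., x^degree (hypothetical model choice).
    return [x ** d for d in range(degree + 1)]

def solve(A, b):
    # Solve A z = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / M[i][i]
    return z

def fit_ridge(xs, ys, degree, lam):
    # Ridge estimate from the normal equations; the intercept is not penalized.
    X = [poly_features(x, degree) for x in xs]
    p = degree + 1
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    for i in range(1, p):
        A[i][i] += lam
    b = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(A, b)

def mse(beta, xs, ys):
    # Mean squared prediction error of the fitted polynomial on (xs, ys).
    degree = len(beta) - 1
    preds = [sum(c * f for c, f in zip(beta, poly_features(x, degree))) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

def cv_error(xs, ys, degree, lam, k=5):
    # k-fold cross-validation estimate of generalization error.
    n = len(xs)
    errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train = [i for i in range(n) if i not in held_out]
        test = sorted(held_out)
        beta = fit_ridge([xs[i] for i in train], [ys[i] for i in train], degree, lam)
        errors.append(mse(beta, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(errors) / k

# Simulated data (an assumption): a smooth signal plus noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(60)]
ys = [math.sin(3.0 * x) + rng.gauss(0.0, 0.3) for x in xs]

degree = 9                               # deliberately flexible model
lambdas = [1e-6, 1e-4, 1e-2, 1.0, 100.0]  # hypothetical penalty grid

cv_errors = {lam: cv_error(xs, ys, degree, lam) for lam in lambdas}
train_errors = {lam: mse(fit_ridge(xs, ys, degree, lam), xs, ys) for lam in lambdas}
best_lam = min(cv_errors, key=cv_errors.get)

for lam in lambdas:
    print(f"lambda={lam:g}  train MSE={train_errors[lam]:.3f}  CV MSE={cv_errors[lam]:.3f}")
print("penalty minimizing estimated generalization error:", best_lam)
```

In-sample error can only stay flat or rise as the penalty grows, so it cannot guide the choice of penalty. The cross-validated error can be minimized at an interior value, and comparing it across candidate models is one way to see both overfitting (penalty too small) and underfitting (penalty too large).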

Christopher J. Fariss, Assistant Professor, Department of Political Science and Faculty Associate, Center for Political Studies (CPS), Institute for Social Research, 4200 Bay, University of Michigan, Ann Arbor, Michigan 48106-1248 USA. Zachary M. Jones, Ph.D. Candidate, Pennsylvania State University; Pond Laboratory, Pennsylvania State University, State College, PA 16801. The authors would like to thank Michael Alvarez, Neil Beck, Bernd Bischl, Charles Crabtree, Allan Dafoe, Cassy Dorff, Dan Enemark, Matt Golder, Sophia Hatz, Danny Hill, Luke Keele, Lars Kotthoff, Fridolin Linder, Mark Major, Michael Nelson, Keith Schnakenberg, and Tara Slough for many helpful comments and suggestions. This research was supported in part by The McCourtney Institute for Democracy Innovation Grant, and the College of Liberal Arts, both at Pennsylvania State University.

Political Science Research and Methods
  • ISSN: 2049-8470
  • EISSN: 2049-8489
  • URL: /core/journals/political-science-research-and-methods
Supplementary Materials

Fariss and Jones Dataset


