
Enhancing Validity in Observational Settings When Replication is Not Possible*

Published online by Cambridge University Press: 05 April 2017


We argue that political scientists can provide additional evidence for the predictive validity of observational and quasi-experimental research designs by minimizing the expected prediction error or generalization error of their empirical models. For observational and quasi-experimental data not generated by a stochastic mechanism under the researcher’s control, the reproduction of statistical analyses is possible but replication of the data-generating procedures is not. Estimating the generalization error of a model for this type of data and then adjusting the model to minimize this estimate (regularization) provides evidence for the predictive validity of the study by decreasing the risk of overfitting. Estimating generalization error also allows for model comparisons that highlight underfitting, which occurs when a model generalizes poorly because it misses systematic features of the data-generating process. Thus, minimizing generalization error provides a principled method for modeling relationships between variables that are measured but whose relationships with the outcome(s) are left unspecified by a deductively valid theory. Overall, the minimization of generalization error is important because it quantifies the expected reliability of predictions in a way that is similar to external validity, consequently increasing the validity of the study’s conclusions.
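The idea of estimating generalization error and then adjusting the model to minimize that estimate can be illustrated with a minimal sketch. The example below is not the authors' procedure: it assumes a simulated data-generating process (a sine curve plus noise), uses k-nearest-neighbor regression as a stand-in model, and estimates generalization error by 5-fold cross-validation. All function names, the candidate grid, and the simulated data are hypothetical choices made only for illustration.

```python
import math
import random


def make_data(n, seed=0):
    """Simulate y = sin(x) + noise (an assumed, hypothetical data-generating process)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x0, n_neighbors):
    """Predict at x0 by averaging the outcomes of the nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and split them into disjoint test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_error(xs, ys, n_neighbors, n_folds=5, seed=0):
    """Estimate generalization error (mean squared error) by cross-validation."""
    sq_errors = []
    for test_idx in k_fold_indices(len(xs), n_folds, seed):
        held_out = set(test_idx)
        # Fit only on observations outside the held-out fold.
        train_x = [xs[i] for i in range(len(xs)) if i not in held_out]
        train_y = [ys[i] for i in range(len(ys)) if i not in held_out]
        for i in test_idx:
            pred = knn_predict(train_x, train_y, xs[i], n_neighbors)
            sq_errors.append((ys[i] - pred) ** 2)
    return sum(sq_errors) / len(sq_errors)


xs, ys = make_data(100)
candidates = [1, 3, 5, 10, 20, 40, 80]  # small values are flexible, large values are rigid
errors = {m: cv_error(xs, ys, m) for m in candidates}
best = min(errors, key=errors.get)

for m in candidates:
    print(f"neighbors={m:3d}  estimated generalization MSE={errors[m]:.3f}")
print("selected number of neighbors:", best)
```

In this sketch a single neighbor reproduces the training data exactly, which is the overfitting case, while 80 neighbors uses every training point in each fold and so predicts the training mean, which is the underfitting case. Selecting the setting with the smallest cross-validated error is one simple way to trade off the two.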

Research Notes
© The European Political Science Association 2017 




Christopher J. Fariss, Assistant Professor, Department of Political Science and Faculty Associate, Center for Political Studies, Institute for Social Research, University of Michigan, Center for Political Studies (CPS) Institute for Social Research, 4200 Bay, University of Michigan, Ann Arbor, Michigan 48106-1248 USA. Zachary M. Jones, Ph.D. Candidate, Pennsylvania State University; Pond Laboratory, Pennsylvania State University, State College, PA 16801. The authors would like to thank Michael Alvarez, Neil Beck, Bernd Bischl, Charles Crabtree, Allan Dafoe, Cassy Dorff, Dan Enemark, Matt Golder, Sophia Hatz, Danny Hill, Luke Keele, Lars Kotthoff, Fridolin Linder, Mark Major, Michael Nelson, Keith Schnakenberg, and Tara Slough for many helpful comments and suggestions. This research was supported in part by The McCourtney Institute for Democracy Innovation Grant, and the College of Liberal Arts, both at Pennsylvania State University.


Supplementary material: Fariss and Jones Dataset