
Enhancing Validity in Observational Settings When Replication is Not Possible*


We argue that political scientists can provide additional evidence for the predictive validity of observational and quasi-experimental research designs by minimizing the expected prediction error, or generalization error, of their empirical models. For observational and quasi-experimental data not generated by a stochastic mechanism under the researcher’s control, the reproduction of statistical analyses is possible but replication of the data-generating procedures is not. Estimating the generalization error of a model for this type of data and then adjusting the model to minimize this estimate (regularization) provides evidence for the predictive validity of the study by decreasing the risk of overfitting. Estimating generalization error also allows for model comparisons that highlight underfitting, which occurs when a model generalizes poorly because it misses systematic features of the data-generating process. Thus, minimizing generalization error provides a principled method for modeling relationships between variables that are measured but whose relationships with the outcome(s) are left unspecified by a deductively valid theory. Overall, the minimization of generalization error is important because it quantifies the expected reliability of predictions in a way that is similar to external validity, consequently increasing the validity of the study’s conclusions.
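As a rough illustration of the idea in the abstract, the sketch below estimates generalization error by k-fold cross-validation and then chooses a ridge penalty that minimizes that estimate. It is not the authors' code or procedure: the simulated data, the ridge estimator, the penalty grid, and the function names `ridge_fit`, `cv_error`, and `select_penalty` are all assumptions made for this example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept; illustrative only)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_error(X, y, lam, k=5, seed=0):
    """Estimate generalization error (mean squared error) by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = ridge_fit(X[train], y[train], lam)
        # Error is measured only on observations held out from fitting.
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return float(np.mean(errors))

def select_penalty(X, y, grid, k=5, seed=0):
    """Return the penalty with the smallest cross-validated error, plus all scores."""
    scores = {lam: cv_error(X, y, lam, k, seed) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Simulated data (an assumption for this sketch): many predictors, few of them relevant.
rng = np.random.default_rng(1)
n, p = 60, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(size=n)

grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam, scores = select_penalty(X, y, grid)

# In-sample error of the unpenalized fit, for comparison with its held-out error.
in_sample_mse = float(np.mean((y - X @ ridge_fit(X, y, 0.0)) ** 2))

print("in-sample MSE (lam=0):", round(in_sample_mse, 3))
for lam, err in scores.items():
    print(f"lam={lam:>6}: cross-validated MSE = {err:.3f}")
print("selected penalty:", best_lam)
```

Comparing the in-sample error of the unpenalized model with its cross-validated error gives a sense of how much the model overfits, while an error that keeps rising as the penalty grows suggests the model is being pushed toward underfitting. With dependent data, such as time series or dyadic observations, the random fold assignment used here would likely need to be replaced by a resampling scheme that respects the dependence structure.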


Christopher J. Fariss, Assistant Professor, Department of Political Science and Faculty Associate, Center for Political Studies, Institute for Social Research, University of Michigan, Center for Political Studies (CPS) Institute for Social Research, 4200 Bay, University of Michigan, Ann Arbor, Michigan 48106-1248 USA. Zachary M. Jones, Ph.D. Candidate, Pennsylvania State University; Pond Laboratory, Pennsylvania State University, State College, PA 16801. The authors would like to thank Michael Alvarez, Neil Beck, Bernd Bischl, Charles Crabtree, Allan Dafoe, Cassy Dorff, Dan Enemark, Matt Golder, Sophia Hatz, Danny Hill, Luke Keele, Lars Kotthoff, Fridolin Linder, Mark Major, Michael Nelson, Keith Schnakenberg, and Tara Slough for many helpful comments and suggestions. This research was supported in part by The McCourtney Institute for Democracy Innovation Grant, and the College of Liberal Arts, both at Pennsylvania State University.

Political Science Research and Methods
  • ISSN: 2049-8470
  • EISSN: 2049-8489
  • URL: /core/journals/political-science-research-and-methods
Supplementary Materials

Fariss and Jones Dataset


