Enhancing Validity in Observational Settings When Replication is Not Possible*

Christopher J. Fariss; Zachary M. Jones

doi:10.1017/psrm.2017.5

Published online by Cambridge University Press: 05 April 2017

Abstract

We argue that political scientists can provide additional evidence for the predictive validity of observational and quasi-experimental research designs by minimizing the expected prediction error, or generalization error, of their empirical models. For observational and quasi-experimental data not generated by a stochastic mechanism under the researcher’s control, the reproduction of statistical analyses is possible but replication of the data-generating procedures is not. Estimating the generalization error of a model for this type of data and then adjusting the model to minimize this estimate (regularization) provides evidence for the predictive validity of the study by decreasing the risk of overfitting. Estimating generalization error also allows for model comparisons that highlight underfitting, which occurs when a model generalizes poorly because it misses systematic features of the data-generating process. Thus, minimizing generalization error provides a principled method for modeling relationships between variables that are measured but whose relationships with the outcome(s) are left unspecified by a deductively valid theory. Overall, minimizing generalization error is important because its estimate quantifies the expected reliability of predictions in a way that is similar to external validity, consequently increasing the validity of the study’s conclusions.
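The workflow summarized in the abstract (estimate generalization error, tune a regularization penalty to minimize that estimate, and compare specifications to detect underfitting) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' replication code: it assumes simulated data from a hypothetical sin(3x) process, squared-error loss, 5-fold cross-validation as the estimator of generalization error, and ridge (L2) regression as the regularizer. All function and variable names are hypothetical, and NumPy is the only dependency.

```python
import numpy as np

# Illustrative sketch only: helper names are hypothetical, and ridge
# regression stands in for whatever regularized model an analyst might use.

def poly_design(x, degree):
    """Design matrix with columns 1, x, x**2, ..., x**degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Ridge (L2-penalized) least squares; the intercept is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def cv_error(X, y, lam, k=5, seed=0):
    """K-fold cross-validation estimate of expected squared prediction error.

    Each fold is held out once; the model is fit on the remaining folds and
    scored on the held-out rows. A fixed seed gives every candidate model
    the same folds, so their estimates are directly comparable.
    """
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    sq_errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        beta = fit_ridge(X[train], y[train], lam)
        sq_errors.append((y[test] - X[test] @ beta) ** 2)
    return float(np.mean(np.concatenate(sq_errors)))

# Simulated data from a nonlinear process the analyst does not know
# (assumed for illustration; noise variance is 0.3**2 = 0.09).
rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.3, size=60)

# A linear specification that misses the curvature: a candidate for underfitting.
cv_lin = cv_error(poly_design(x, 1), y, lam=0.0)

# A flexible degree-12 polynomial whose penalty is chosen by minimizing the
# cross-validated error: regularization against overfitting.
X_flex = poly_design(x, 12)
lambdas = [0.0, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
errors = {lam: cv_error(X_flex, y, lam) for lam in lambdas}
best_lam = min(errors, key=errors.get)

print(f"linear model CV error: {cv_lin:.3f}")
for lam, err in errors.items():
    print(f"degree-12, lambda={lam:<7g} CV error: {err:.3f}")
print(f"selected lambda: {best_lam:g}")
```

The linear specification's cross-validated error stays well above the noise variance because it omits a systematic feature of the process, which is the underfitting signal described above, while the penalized flexible model comes closer to that floor. Reusing the same folds for every candidate keeps the comparison fair. With dependent observations such as country-year panels, the random folds assumed here would need to be replaced by a blocked or grouped resampling scheme.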

Information

Type: Research Notes
Information: Political Science Research and Methods, Volume 6, Issue 2, April 2018, pp. 365–380

DOI: https://doi.org/10.1017/psrm.2017.5
Copyright: © The European Political Science Association 2017

Footnotes

Christopher J. Fariss, Assistant Professor, Department of Political Science and Faculty Associate, Center for Political Studies, Institute for Social Research, University of Michigan, Center for Political Studies (CPS) Institute for Social Research, 4200 Bay, University of Michigan, Ann Arbor, Michigan 48106-1248 USA (cjf0006@gmail.com). Zachary M. Jones, Ph.D. Candidate, Pennsylvania State University; Pond Laboratory, Pennsylvania State University, State College, PA 16801 (zmj@zmjones.com). The authors would like to thank Michael Alvarez, Neil Beck, Bernd Bischl, Charles Crabtree, Allan Dafoe, Cassy Dorff, Dan Enemark, Matt Golder, Sophia Hatz, Danny Hill, Luke Keele, Lars Kotthoff, Fridolin Linder, Mark Major, Michael Nelson, Keith Schnakenberg, and Tara Slough for many helpful comments and suggestions. This research was supported in part by The McCourtney Institute for Democracy Innovation Grant, and the College of Liberal Arts, both at Pennsylvania State University.

Fariss and Jones Dataset

http://dx.doi.org/10.7910/DVN/O2BK85
