
What Can We Learn from Predictive Modeling?

  • Skyler J. Cranmer and Bruce A. Desmarais

The large majority of inferences drawn in empirical political research follow from model-based associations (e.g., regression). Here, we articulate the benefits of predictive modeling as a complement to this approach. Predictive modeling aims to specify a probabilistic model that provides a good fit to test data that were not used to estimate the model’s parameters. Our goals are threefold. First, we review the central benefits of this under-utilized approach from a perspective uncommon in the existing literature: we focus on how predictive modeling can be used to complement and augment standard associational analyses. Second, we advance the state of the literature by laying out a simple set of benchmark predictive criteria. Third, we illustrate our approach through a detailed application to the prediction of interstate conflict.
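
As a rough illustration of the out-of-sample logic described in the abstract, the Python sketch below fits a one-covariate logistic regression on training data and compares its mean log-likelihood on held-out test data against an intercept-only baseline. The simulated data, the function names, and the choice of baseline are all assumptions made for illustration; they are not the authors' benchmark criteria or their interstate-conflict application.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, steps=500, lr=0.5):
    """Fit an intercept and slope by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def mean_log_lik(probs, ys):
    """Average Bernoulli log-likelihood of outcomes ys under predicted probs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, ys):
        total += y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
    return total / len(ys)


def holdout_comparison(seed=0, n=600, test_frac=0.25):
    """Return (model, baseline) mean log-likelihood on held-out test data.

    The data are simulated (an assumption for illustration only).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(-0.5 + 2.0 * x) else 0 for x in xs]

    # Hold out the first block of observations; estimate on the rest only.
    n_test = int(n * test_frac)
    x_test, y_test = xs[:n_test], ys[:n_test]
    x_train, y_train = xs[n_test:], ys[n_test:]

    b0, b1 = fit_logistic(x_train, y_train)
    model_probs = [sigmoid(b0 + b1 * x) for x in x_test]

    # Baseline: predict the training base rate for every test case.
    base_rate = sum(y_train) / len(y_train)
    base_probs = [base_rate] * len(y_test)

    return mean_log_lik(model_probs, y_test), mean_log_lik(base_probs, y_test)


if __name__ == "__main__":
    model_ll, base_ll = holdout_comparison()
    print("model test log-likelihood:   ", round(model_ll, 3))
    print("baseline test log-likelihood:", round(base_ll, 3))
```

A model whose held-out log-likelihood does not exceed that of the base-rate benchmark adds little predictive information, whatever its in-sample coefficient estimates suggest.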


Authors’ note: Many thanks to Alison Craig for research assistance. Sincere thanks also to Matt Blackwell and Michael Neblo for helpful comments on an earlier draft. The authors are grateful for the support of the National Science Foundation (SES-1558661, SES-1619644, SES-1637089, CISE-1320219, SES-1357622, SES-1514750, and SES-1461493) and the Alexander von Humboldt Foundation. Replication data are posted to the Political Analysis Dataverse (Cranmer and Desmarais 2016a).

Contributing Editor: Jonathan Katz

Achen Christopher H. 2002. Toward a new political methodology: Microfoundations and ART. Annual Review of Political Science 5(1):423–450.
Adamic Lada A., and Adar Eytan. 2003. Friends and neighbors on the web. Social Networks 25(3):211–230.
Airoldi Edoardo M., Blei David M., Fienberg Stephen E., and Xing Eric P. 2008. Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9:1981–2014.
Attewell Paul, Monaghan David B., and Kwong Darren. 2015. Preparing training and test datasets. 1st edn. University of California Press, pp. 63–71.
Beck Nathaniel, Katz Jonathan N., and Tucker Richard. 1998. Taking time seriously: Time-series-cross-section analysis with a binary dependent variable. American Journal of Political Science 42(4):1260–1288.
Beck Nathaniel, King Gary, and Zeng Langche. 2000. Improving quantitative studies of international conflict: A conjecture. American Political Science Review 94(1):21–35.
Brandt Patrick T., Freeman John R., and Schrodt Philip A. 2011. Real time, time series forecasting of political conflict. Conflict Management and Peace Science 28(1):41–64.
Cawley Gavin C., and Talbot Nicola L. C. 2010. On over-fitting in model selection and subsequent selection bias in performance evaluation. The Journal of Machine Learning Research 11:2079–2107.
Clarke Kevin A., and Primo David M. 2007. Modernizing political science: A model-based approach. Perspectives on Politics 5(4):741–753.
Collopy Fred, Adya Monica, and Armstrong J. Scott. 1994. Principles for examining predictive validity: The case of information systems spending forecasts. Information Systems Research 5(2):170–179.
Cranmer Skyler J., and Desmarais Bruce A. 2011. Inferential network analysis with exponential random graph models. Political Analysis 19(1):66–86.
Cranmer Skyler, and Desmarais Bruce. 2016a. Replication data for: What can we learn from predictive modeling? Harvard Dataverse.
Cranmer Skyler J., and Desmarais Bruce A. 2016b. A critique of dyadic design. International Studies Quarterly 60(2):355–362.
Cranmer Skyler J., Desmarais Bruce A., and Kirkland Justin H. 2012. Towards a network theory of alliance formation. International Interactions 38(3):295–324.
Cranmer Skyler J., Desmarais Bruce A., and Menninga Elizabeth J. 2012. Complex dependencies in the alliance network. Conflict Management and Peace Science 29(3):279–313.
Cranmer Skyler J., Menninga Elizabeth J., and Mucha Peter J. 2015. Kantian fractionalization predicts the conflict propensity of the international system. Proceedings of the National Academy of Sciences 112(38):11812–11816.
Cranmer Skyler J., Rice Douglas, and Siverson Randolph M. 2015. What to do about atheoretic lags. Political Science Research Methods, doi:10.1017/psrm.2015.36.
Cybenko George. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2(4):303–314.
Dafoe Allan. 2011. Statistical critiques of the democratic peace: Caveat emptor. American Journal of Political Science 55(2):247–262.
Davis Jesse, and Goadrich Mark. 2006. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, ICML ’06. New York, NY: ACM, pp. 233–240.
Desmarais Bruce A., and Cranmer Skyler J. 2010. Consistent confidence intervals for maximum pseudolikelihood estimators. In Proceedings of the Neural Information Processing Systems 2010 Workshop on Computational Social Science and the Wisdom of Crowds.
Desmarais Bruce A., and Cranmer Skyler J. 2011. Forecasting the locational dynamics of transnational terrorism: A network analytic approach. In Proceedings of the European Intelligence and Security Informatics Conference (EISIC) 2011, Athens, Greece: IEEE Computer Society.
Desmarais Bruce A., and Cranmer Skyler J. 2012. Statistical mechanics of networks: Estimation and uncertainty. Physica A 391(4):1865–1876.
Dettling Marcel, and Bühlmann Peter. 2003. Boosting for tumor classification with gene expression data. Bioinformatics 19(9):1061–1069.
Droge Bernd. 1999. Asymptotic optimality of full cross-validation for selecting linear regression models. Statistics and Probability Letters 44(4):351–357.
Druckman James N., Green Donald P., Kuklinski James H., and Lupia Arthur. 2006. The growth and development of experimental research in political science. American Political Science Review 100(04):627–635.
Esteban Cristóbal, Schmidt Danilo, Krompaß Denis, and Tresp Volker. 2015. Predicting sequences of clinical events by using a personalized temporal latent embedding model. In Healthcare Informatics (ICHI), 2015 International Conference on IEEE, pp. 130–139.
Faraway Julian James. 2006. Extending the linear model with R: Generalized linear, mixed effects and nonparametric regression models, vol. 66. New York: CRC Press.
Fawcett Tom. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27(8):861–874.
Gartzke Erik. 2007. The capitalist peace. American Journal of Political Science 51(1):166–191.
Gill Jeff. 2014. Bayesian methods: A social and behavioral sciences approach. Boca Raton: Chapman and Hall/CRC.
Gleditsch Kristian Skrede. 2002. Expanded trade and GDP data. Journal of Conflict Resolution 46(5):712–724.
Gleditsch Kristian S., and Ward Michael D. 2001. Measuring space: A minimum-distance database and applications to international studies. Journal of Peace Research 38(6):739–758.
Goeman J. J., Meijer R. J., and Chaturvedi N. 2016. Penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model. R package version 0.9-46.
Goldstone Jack A., Bates Robert H., Epstein David L., Gurr Ted Robert, Lustik Michael B., Marshall Monty G., Ulfelder Jay, and Woodward Mark. 2010. A global model for forecasting political instability. American Journal of Political Science 54(1):190–208.
Gurbaxani Vijay, and Mendelson Haim. 1990. An integrative model of information systems spending growth. Information Systems Research 1(1):23–46.
Gurbaxani Vijay, and Mendelson Haim. 1994. Modeling vs. forecasting: The case of information systems spending. Information Systems Research 5(2):180–190.
Hall Peter. 1983. Large sample optimality of least squares cross-validation in density estimation. The Annals of Statistics 11(4):1156–1174.
Hanneke Steve, Fu Wenjie, and Xing Eric P. 2010. Discrete temporal models of social networks. The Electronic Journal of Statistics 4:585–605.
Hastie Trevor, Tibshirani Robert, and Friedman Jerome. 2009. The elements of statistical learning: Data mining, inference, and prediction. 2nd edn. New York: Springer.
Hoff Peter D., Raftery Adrian E., and Handcock Mark S. 2002. Latent space approaches to social network analysis. Journal of the American Statistical Association 97(460):1090–1098.
Hua Wang, Cuiqin Ma, and Lijuan Zhou. 2009. A brief review of machine learning and its application. In Information engineering and computer science, 2009. ICIECS 2009. International Conference on IEEE, pp. 1–4.
Jensen David D., and Cohen Paul R. 2000. Multiple comparisons in induction algorithms. Machine Learning 38(3):309–338.
Keele Luke. 2015. The statistics of causal inference: A view from political methodology. Political Analysis 23:313–335.
Kuhn Max, and Johnson Kjell. 2013. Applied predictive modeling. New York: Springer.
van der Laan Mark J., Dudoit Sandrine, and Keles Sunduz. 2004. Asymptotic optimality of likelihood-based cross-validation. Statistical Applications in Genetics and Molecular Biology 3(1):1–23.
Leicht Elizabeth A., Holme Petter, and Newman Mark E. J. 2006. Vertex similarity in networks. Physical Review E 73(2):026120.
Lopes Miguel, and Bontempi Gianluca. 2014. On the null distribution of the precision and recall curve. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Berlin: Springer, pp. 322–337.
Marshall Monty G., and Jaggers Keith. 2002. Polity IV project: Political regime characteristics and transitions, 1800–2002.
Meyer Patrick E., Lafitte Frederic, and Bontempi Gianluca. 2008. MINET: An open source R/Bioconductor package for mutual information based network inference. BMC Bioinformatics 9.
Muchlinski David, Siroky David, He Jingrui, and Kocher Matthew. 2016. Comparing random forest with logistic regression for predicting class-imbalanced civil war onset data. Political Analysis 24(1):87–103.
Nadeau Claude, and Bengio Yoshua. 2003. Inference for the generalization error. Machine Learning 52(3):239–281.
Nowak Robert D. 1997. Optimal signal estimation using cross-validation. IEEE Signal Processing Letters 4(1):23–25.
Oneal John, and Russett Bruce M. 1999. The Kantian peace: The Pacific benefits of democracy, interdependence, and international organization. World Politics 52(1):1–37.
Oneal John R., and Russett Bruce. 2005. Rule of three, let it be? When more really is better. Conflict Management and Peace Science 22(4):293–310.
Ozenne Brice, Subtil Fabien, and Maucort-Boulch Delphine. 2015. The precision–recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. Journal of Clinical Epidemiology 68(8):855–859.
Pevehouse Jon C., and Goldstein Joshua S. 1999. Serbian compliance or defiance in Kosovo? Statistical analysis and real-time predictions. Journal of Conflict Resolution 43(4):538–546.
Pevehouse Jon, Nordstrom Timothy, and Warnke Kevin. 2004. The correlates of war 2 international governmental organizations data version 2.0. Conflict Management and Peace Science 21(2):101–119.
Pons Pascal, and Latapy Matthieu. 2005. Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications 10(2):191–218.
Rakotomalala Ricco, Chauchat Jean-Hughes, and Pellegrino Francois. 2006. Accuracy estimation with clustered dataset. In Proceedings of the Fifth Australasian Conference on Data Mining and Analytics 61:17–22. Sydney, Australia: Australian Computer Society, Inc.
Ripley Brian, and Venables William. 2016. Nnet: Feed-forward neural networks and multinomial log-linear models. R package version 7.3-12.
Rost Nicholas, Schneider Gerald, and Kleibl Johannes. 2009. A global risk assessment model for civil wars. Social Science Research 38(4):921–933.
Schneider Gerald. 2012. Banking on the Broker: Forecasting conflict in the levant with financial data. In Illuminating the shadow of the future: scientific prediction and the human condition, ed. Wayman Frank, Williamson Paul, and Bueno de Mesquita Bruce. Ann Arbor: University of Michigan Press.
Schneider Gerald, Gleditsch Nils Petter, and Carey Sabine. 2010. Exploring the past, anticipating the future: A symposium. International Studies Review 12(1):1–7.
Schneider Gerald, Gleditsch Nils Petter, and Carey Sabine. 2011. Forecasting in international relations: One quest, three approaches. Conflict Management and Peace Science 28(5):5–14.
Schrodt Philip A., and Gerner Deborah J. 2000. Using cluster analysis to derive early warning indicators for political change in the middle east, 1979–1996. American Political Science Review 94(4):803–818.
Shmueli Galit. 2010. To explain or to predict? Statistical Science 25(3):289–310.
Sing T., Sander O., Beerenwinkel N., and Lengauer T. 2005. ROCR: Visualizing classifier performance in R. Bioinformatics 21(20):7881.
Singer J. David, Bremer Stuart, and Stuckey John. 1972. Capability distribution, uncertainty, and major power war, 1820–1965. Peace, war, and numbers 19:48.
Stinnett Douglas M., Tir Jaroslav, Diehl Paul F., Schafer Philip, and Gochman Charles. 2002. The Correlates of War (COW) project direct contiguity data, version 3.0. Conflict Management and Peace Science 19(2):59–67.
Stone M. 1974. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B 36(2):111–147.
Stone M. 1977. Asymptotics for and against cross-validation. Biometrika 64(1):29–35.
Tuszynski Jarek. 2014. caTools: Tools: Moving window statistics, GIF, Base64, ROC AUC, etc. R package version 1.17.
Van Maanen John, Sørensen Jesper B., and Mitchell Terence R. 2007. The interplay between theory and method. Academy of Management Review 32(4):1145–1154.
Ward Michael D., Greenhill Brian D., and Bakke Kristin M. 2010. The perils of policy by p-value: Predicting civil conflicts. Journal of Peace Research 47(4):363–375.
Ward Michael D., Siverson Randolph M., and Cao Xun. 2007. Disputes, democracies, and dependencies: A reexamination of the Kantian peace. American Journal of Political Science 51(3):583–601.
Zou Hui, and Hastie Trevor. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(2):301–320.
Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989