Comparing Random Forest with Logistic Regression for Predicting Class-Imbalanced Civil War Onset Data

David Muchlinski, David Siroky, Jingrui He and Matthew Kocher

The most commonly used statistical models of civil war onset fail to correctly predict most occurrences of this rare event in out-of-sample data. Statistical methods for the analysis of binary data, such as logistic regression, perform poorly at prediction, even in their rare-event and regularized forms. We compare the performance of Random Forests with three versions of logistic regression (classic logistic regression, Firth rare-events logistic regression, and L1-regularized logistic regression) and find that the algorithmic approach provides significantly more accurate predictions of civil war onset in out-of-sample data than any of the logistic regression models. The article discusses these results and the ways in which algorithmic statistical methods like Random Forests can help predict rare events in conflict data more accurately.
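
The claim that standard models miss most onsets is easiest to see with a toy example of class imbalance. The sketch below is a minimal illustration using hypothetical labels and scores, not the authors' replication code or the civil war data, and the metrics shown (accuracy, recall, ROC AUC) are chosen here for illustration rather than taken from this abstract. It shows that a "model" that never predicts onset still reaches 98% accuracy when onsets are 2% of cases, while catching none of them.

```python
"""Illustrative sketch: why plain accuracy misleads on rare events.

All labels and scores below are hypothetical toy values, not civil war data.
"""


def accuracy(y_true, y_pred):
    """Share of cases classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of true events (onsets) that the classifier flags as events."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged)


def roc_auc(y_true, scores):
    """ROC AUC: probability that a random event outscores a random non-event.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out sample: 2 onsets among 100 country-years.
y_true = [1, 1] + [0] * 98

# A degenerate "model" that never predicts onset.
always_zero = [0] * 100
print(accuracy(y_true, always_zero))  # 0.98
print(recall(y_true, always_zero))    # 0.0

# Hypothetical predicted onset probabilities from some classifier.
scores = [0.40, 0.15] + [0.05] * 90 + [0.20] * 8
print(round(roc_auc(y_true, scores), 3))  # 0.959
```

Because accuracy rewards the majority class, comparisons on rare-event data typically look at how well a model ranks or flags the few positive cases, which is what recall and ROC AUC capture.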

Author's note: Replication data are available on the Political Analysis Dataverse at

Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis