
Comparing Random Forest with Logistic Regression for Predicting Class-Imbalanced Civil War Onset Data: A Comment

Published online by Cambridge University Press:  31 December 2018

Yu Wang*
Affiliation:
Department of Political Science, University of Rochester, Rochester, NY 14627, USA. Email: ywang176@ur.rochester.edu


Type
Letter
Copyright
Copyright © The Author(s) 2018. Published by Cambridge University Press on behalf of the Society for Political Methodology. 

Figure 1. One way to spot the error is to visually inspect the receiver operating characteristic (ROC) plot. The dashed bounding box has an area of 0.9, yet the AUC of the random forest’s ROC curve is reported as 0.91, suggesting that the curve is not presented correctly.
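
The area argument in the Figure 1 caption can be checked numerically. The sketch below is illustrative only: the ROC points are invented rather than taken from the paper, and the box is assumed to span the full false positive axis with height 0.9, which is one geometry that gives an area of 0.9. Any curve drawn inside such a box has a trapezoidal AUC of at most 0.9, so it cannot match a reported AUC of 0.91.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return area


# Hypothetical plotted points (assumption, not data from the paper):
# the drawn curve never rises above a true positive rate of 0.9.
fpr = [0.0, 0.05, 0.2, 0.5, 1.0]
tpr = [0.0, 0.60, 0.8, 0.9, 0.9]

# Assumed bounding box: full width on the FPR axis, height equal to max TPR.
box_area = (max(fpr) - min(fpr)) * max(tpr)
auc = trapezoid_auc(fpr, tpr)
reported_auc = 0.91  # value quoted in the caption

print(f"box area = {box_area:.3f}, AUC of drawn curve = {auc:.3f}")
print("consistent with reported AUC:", box_area >= reported_auc)
```

The check relies only on the fact that the area under a curve cannot exceed the area of a region that contains it, so it applies whatever the exact shape of the curve inside the box.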


Figure 2. The dark curves are plotted using cross validation. The gray ones are from the original paper and are plotted using models trained with the entirety of the dataset.


Figure 3. When evaluated using cross validation rather than the entire dataset, all classifiers perform worse, but particularly so for the random forest model.
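
The gap between in-sample and cross-validated performance described in the Figure 3 caption can be reproduced on a toy problem. The sketch below is not the paper's analysis: it swaps in a 1-nearest-neighbor classifier as a stand-in for a model flexible enough to memorize its training data, uses synthetic noise features with a rare positive class (all names and sizes are assumptions), and uses leave-one-out evaluation as a simple form of cross validation.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def nearest_label(i, points, labels, exclude_self):
    """Label of the nearest neighbor of point i (optionally excluding i itself)."""
    best_j, best_d = None, float("inf")
    for j, q in enumerate(points):
        if exclude_self and j == i:
            continue
        d = (points[i][0] - q[0]) ** 2 + (points[i][1] - q[1]) ** 2
        if d < best_d:
            best_j, best_d = j, d
    return labels[best_j]


random.seed(0)
n = 200
# Features are pure noise, so no classifier can truly predict the labels.
points = [(random.random(), random.random()) for _ in range(n)]
labels = [1] * 20 + [0] * 180  # imbalanced: 10% positives (illustrative choice)
random.shuffle(labels)

# In-sample: the model is scored on the same rows it was fitted to.
in_sample = [nearest_label(i, points, labels, exclude_self=False) for i in range(n)]
# Leave-one-out: each row is scored by a model that never saw it.
held_out = [nearest_label(i, points, labels, exclude_self=True) for i in range(n)]

in_sample_auc = auc(labels, in_sample)
held_out_auc = auc(labels, held_out)
print(f"in-sample AUC = {in_sample_auc:.3f}, leave-one-out AUC = {held_out_auc:.3f}")
```

Because each point is its own nearest neighbor, the in-sample AUC is perfect even though the features carry no signal, while the held-out AUC is not. Models that fit the training data very closely are the ones most flattered by in-sample evaluation.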


Figure 4. For almost all false positive rates, the AdaBoost model and the gradient boosted trees achieve a higher true positive rate than the random forest model.

Supplementary material: File

Wang supplementary material
