Improving Supreme Court Forecasting Using Boosted Decision Trees

Published online by Cambridge University Press: 19 February 2019

Aaron Russell Kaufman*
Affiliation:
PhD Candidate, Department of Government, Harvard University, 1737 Cambridge Street, Cambridge, MA 02138, USA. Email: aaronkaufman@fas.harvard.edu
Peter Kraft
Affiliation:
PhD Candidate, Department of Computer Science, Stanford University, 353 Serra Mall, Stanford, CA 94305, USA. Email: kraftp@cs.stanford.edu
Maya Sen
Affiliation:
Associate Professor, John F. Kennedy School of Government, Harvard University, 79 John F. Kennedy Street, Cambridge, MA 02138, USA. Email: maya_sen@hks.harvard.edu

Abstract

Though used frequently in machine learning, boosted decision trees are largely unused in political science, despite many useful properties. We explain how to use one variant of boosted decision trees, AdaBoosted decision trees (ADTs), for social science predictions. We illustrate their use by examining a well-known political prediction problem, predicting U.S. Supreme Court rulings. We find that our ADT approach outperforms existing predictive models. We also provide two additional examples of the approach, one predicting the onset of civil wars and the other predicting county-level vote shares in U.S. presidential elections.
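To make the idea of AdaBoosted decision trees concrete, the following is a minimal sketch of the generic AdaBoost procedure, not the authors' implementation. It assumes binary labels coded as +1/-1 and uses depth-one trees ("stumps") as the weak learners; the function names and the toy data are hypothetical and chosen only for illustration.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) for the best depth-one tree.

    The stump predicts `sign` when row[feature] <= threshold and `-sign` otherwise.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] <= t else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] <= t else -sign

def adaboost_fit(X, y, n_rounds=10):
    """Fit a list of (alpha, stump) pairs by reweighting misclassified cases."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform observation weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        # Clamp the weighted error away from 0 and 1 so the log is finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Up-weight misclassified observations, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Classify by the sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, X, y):
    hits = sum(1 for row, yi in zip(X, y) if adaboost_predict(ensemble, row) == yi)
    return hits / len(y)

# Hypothetical toy data: the negative class sits in the middle of one feature,
# so no single threshold rule can separate the two classes.
X_toy = [[0], [1], [2], [3], [4], [5]]
y_toy = [1, 1, -1, -1, 1, 1]

for rounds in (1, 3):
    model = adaboost_fit(X_toy, y_toy, n_rounds=rounds)
    print(rounds, "round(s): training accuracy =", accuracy(model, X_toy, y_toy))
```

The toy data illustrate why boosting helps: each stump alone is a weak classifier, but the weighted vote of several stumps can represent a pattern that no single threshold captures. An applied analysis would more plausibly rely on an established library and deeper trees, and would report out-of-sample rather than training accuracy.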

Type: Letter

Copyright © The Author(s) 2019. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Figure 1. Cross-Validation Accuracy for KKS compared to the “petitioner always wins” baseline, CourtCast, $\{\text{Marshall}\}+$, and a generic random forest. We compare these across three datasets: Supreme Court Database (“SCDB”), oral argument data, and both datasets jointly. For $\{\text{Marshall}\}+$ and CourtCast, black dots indicate the original dataset on which those models were trained.

Table 1. Accuracy (self-reported) for (1) the “petitioner always wins” baseline, (2) Katz et al. (2017)’s Random Forest, (3) $\{\text{Marshall}\}+$, (4) CourtCast, and (5) KKS. “Data” indicates the training dataset: case-level covariates from the Supreme Court Database (“SCDB”), transcript data from the oral arguments (“oral argument”), or both. The KKS model using all covariates almost triples the added accuracy of the next best model.

Table 2. KKS model accuracy by decision margin.

Table 3. ADTs outperform other methods in predicting both county-level vote share in the 2016 U.S. Presidential Election and civil war incidence.

Supplementary material

Kaufman et al. dataset (link)
Kaufman et al. supplementary material (file, 431.7 KB)