
Can nonexperts really emulate statistical learning methods? A comment on “The accuracy, fairness, and limits of predicting recidivism”

Published online by Cambridge University Press: 08 November 2018

Kirk Bansak*
Affiliation:
Department of Political Science, Stanford University, Stanford, CA 94305-6044, USA. Email: kbansak@stanford.edu

Abstract

Recent research has questioned the value of statistical learning methods for producing accurate predictions in the criminal justice context. Using results from respondents on Amazon Mechanical Turk (MTurkers) who were asked to predict recidivism, Dressel and Farid (2018) argue that nonexperts can achieve predictive accuracy and fairness on par with algorithmic approaches that employ statistical learning models. Analyzing the same data from the original study, this comment employs additional techniques and compares the quality of the predicted probabilities produced by statistical learning procedures with that of the MTurkers’ evaluations. The metrics presented indicate that statistical approaches do, in fact, outperform the nonexperts in important ways. Based on these new analyses, it is difficult to accept the conclusion presented in Dressel and Farid (2018) that their results “cast significant doubt on the entire effort of algorithmic recidivism prediction.”

Information

Type
Letter
Copyright
Copyright © The Author(s) 2018. Published by Cambridge University Press on behalf of the Society for Political Methodology. 

Figure 1. Variation in MTurkers’ individual performance. The figure displays a histogram of the proportion of correct predictions for each individual MTurker when not presented with race (results for MTurkers presented with race are similar). The vertical lines denote the mean and median proportion correct.
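As a concrete illustration of the computation behind this figure, the following is a minimal Python sketch that tabulates each respondent's proportion of correct predictions and plots the histogram. The data frame `df` and its columns `mturker_id`, `prediction`, and `outcome` are hypothetical placeholders, not names taken from the replication materials.

import pandas as pd
import matplotlib.pyplot as plt

def plot_individual_accuracy(df: pd.DataFrame) -> None:
    # Proportion of correct predictions for each MTurker
    # (hypothetical columns: prediction and outcome are 0/1).
    correct = (df["prediction"] == df["outcome"]).groupby(df["mturker_id"]).mean()
    plt.hist(correct, bins=20, color="gray", edgecolor="black")
    # Vertical lines at the mean and median, as in Figure 1.
    plt.axvline(correct.mean(), linestyle="-", label="Mean")
    plt.axvline(correct.median(), linestyle="--", label="Median")
    plt.xlabel("Proportion of correct predictions")
    plt.ylabel("Number of MTurkers")
    plt.legend()
    plt.show()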


Figure 2. Probability calibration across methods (MTurkers not told defendant race), using the Sample approach to model uncertainty. The top two panels display probability calibration plots. Each point and interval in the upper two panels corresponds to a bin of predicted probabilities. The black triangles mark the MTurkers’ calibration points for the evaluations in which MTurkers were not provided with the defendants’ race. Each point’s position along the $x$-axis signifies the mean predicted probability within the bin, while its position on the $y$-axis signifies the actual proportion of positives among the units contained within the bin. The gray points represent the mean proportion of positives within each bin across 1000 evaluations of each of the statistical learning methods, while the error bars provide 95% confidence intervals for the proportion of positives within each bin, with uncertainty modeled using the Sample approach. The three bottom panels display histograms of the predicted probabilities for each method.
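The binned calibration points this caption describes can be computed with a short routine like the sketch below. The ten-bin grid is an illustrative assumption; the article's exact binning may differ.

import numpy as np

def calibration_points(probs, y, n_bins=10):
    # probs: predicted probabilities of recidivism; y: 0/1 observed outcomes.
    probs, y = np.asarray(probs), np.asarray(y)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    points = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # x: mean predicted probability in the bin;
            # y: actual proportion of positives in the bin.
            points.append((probs[mask].mean(), y[mask].mean()))
    return points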


Figure 3. Probability calibration across methods (MTurkers not told defendant race), using the Bootstrap approach to model uncertainty. The top two panels display probability calibration plots. Each point and interval in the upper two panels corresponds to a bin of predicted probabilities. The black triangles mark the MTurkers’ calibration points for the evaluations in which MTurkers were not provided with the defendants’ race. Each point’s position along the $x$-axis signifies the mean predicted probability within the bin, while its position on the $y$-axis signifies the actual proportion of positives among the units contained within the bin. The gray points represent the mean proportion of positives within each bin across 1000 evaluations of each of the statistical learning methods, while the error bars provide 95% confidence intervals for the proportion of positives within each bin, with uncertainty modeled using the Bootstrap approach. The three bottom panels display histograms of the predicted probabilities for each method.
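One common way to implement a bootstrap interval for the proportion of positives within a bin is to resample the bin's observations with replacement, as in the sketch below. This is an illustrative assumption; the article's exact resampling scheme may differ.

import numpy as np

def bootstrap_bin_interval(y_bin, n_boot=1000, alpha=0.05, seed=0):
    # y_bin: 0/1 outcomes for the observations falling in one probability bin.
    rng = np.random.default_rng(seed)
    y_bin = np.asarray(y_bin)
    # Resample the bin's observations with replacement n_boot times.
    draws = rng.choice(y_bin, size=(n_boot, len(y_bin)), replace=True)
    props = draws.mean(axis=1)  # proportion of positives per replicate
    # Percentile interval at level 1 - alpha (95% for alpha = 0.05).
    lo, hi = np.quantile(props, [alpha / 2, 1 - alpha / 2])
    return lo, hi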


Table 1. Model performance results. The table displays several performance metrics for the statistical learning methods—gradient boosted trees (GBM) and logistic regression (Logit)—under both approaches to modeling uncertainty (Sample and Bootstrap), along with the results for the MTurkers’ pooled evaluations both without and with race presented. For the statistical learning methods, 95% confidence intervals are displayed. A cut point of $0.5$ is employed for the proportion of correct classifications (PCC), false positive rate (FPR), and false negative rate (FNR).
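For reference, the cut-point metrics in the table can be computed as in the following sketch, which assumes the standard definitions of PCC, FPR, and FNR; the function and variable names are illustrative, not from the replication code.

import numpy as np

def cutpoint_metrics(probs, y, cut=0.5):
    # probs: predicted probabilities; y: 0/1 observed recidivism.
    probs, y = np.asarray(probs), np.asarray(y)
    pred = (probs >= cut).astype(int)  # classify at the cut point
    pcc = (pred == y).mean()  # proportion of correct classifications
    # False positive rate: false alarms among actual negatives.
    fpr = ((pred == 1) & (y == 0)).sum() / max((y == 0).sum(), 1)
    # False negative rate: misses among actual positives.
    fnr = ((pred == 0) & (y == 1)).sum() / max((y == 1).sum(), 1)
    return {"PCC": pcc, "FPR": fpr, "FNR": fnr}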

Supplementary material

Bansak supplementary material 1 (File, 170.6 KB)