Can nonexperts really emulate statistical learning methods? A comment on “The accuracy, fairness, and limits of predicting recidivism”

  • Kirk Bansak

Abstract

Recent research has questioned the value of statistical learning methods for producing accurate predictions in the criminal justice context. Using results from respondents on Amazon Mechanical Turk (MTurkers) who were asked to predict recidivism, Dressel and Farid (2018) argue that nonexperts can achieve predictive accuracy and fairness on par with algorithmic approaches that employ statistical learning models. Analyzing the same data from the original study, this comment employs additional techniques and compares the quality of the predicted probabilities output from statistical learning procedures versus the MTurkers’ evaluations. The metrics presented indicate that statistical approaches do, in fact, outperform the nonexperts in important ways. Based on these new analyses, it is difficult to accept the conclusion presented in Dressel and Farid (2018) that their results “cast significant doubt on the entire effort of algorithmic recidivism prediction.”
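
To make concrete the kind of comparison described above, the sketch below scores two sets of predicted probabilities against binary recidivism outcomes using the Brier score (Brier 1950) and its decomposition into reliability (calibration) and resolution (Murphy 1973). It is a minimal illustration in Python with simulated placeholder data, not the paper's analysis or replication code (the latter is available in Bansak 2018); the variable names, binning choices, and simulated inputs are assumptions for exposition only.

    import numpy as np

    def brier_score(p, y):
        # Mean squared error between predicted probabilities and 0/1 outcomes.
        p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
        return np.mean((p - y) ** 2)

    def murphy_decomposition(p, y, n_bins=10):
        # Approximate Murphy (1973) partition of the Brier score into
        # reliability (calibration error), resolution, and uncertainty,
        # computed over equal-width probability bins.
        p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
        base_rate = y.mean()
        uncertainty = base_rate * (1.0 - base_rate)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        bin_idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)
        reliability, resolution = 0.0, 0.0
        for k in range(n_bins):
            in_bin = bin_idx == k
            if in_bin.any():
                weight = in_bin.mean()
                reliability += weight * (p[in_bin].mean() - y[in_bin].mean()) ** 2
                resolution += weight * (y[in_bin].mean() - base_rate) ** 2
        return reliability, resolution, uncertainty

    # Hypothetical inputs: y is an observed recidivism outcome (0/1), p_model is a
    # statistical model's predicted probability, and p_human is a pooled human
    # assessment rescaled to [0, 1]. These are simulated placeholders, not the
    # study's data.
    rng = np.random.default_rng(0)
    y = rng.binomial(1, 0.45, size=1000)
    p_model = np.clip(0.45 + 0.30 * (y - 0.45) + rng.normal(0, 0.15, 1000), 0.01, 0.99)
    p_human = np.clip(0.45 + 0.10 * (y - 0.45) + rng.normal(0, 0.25, 1000), 0.01, 0.99)

    for label, p in [("model", p_model), ("human", p_human)]:
        rel, res, unc = murphy_decomposition(p, y)
        print(f"{label}: Brier = {brier_score(p, y):.3f}, reliability = {rel:.3f}, "
              f"resolution = {res:.3f}, uncertainty = {unc:.3f}")

Under this decomposition, a lower Brier score and reliability term and a higher resolution term indicate better-calibrated, more informative probability estimates.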

Footnotes

Contributing Editor: Jeff Gill

Author’s note: For helpful advice, the author thanks Jens Hainmueller, Justin Grimmer, Mike Tomz, Sharad Goel, and two anonymous reviewers. Replication materials are available in Bansak (2018). The author declares that he has no competing interests.

References

Angwin, Julia, Larson, Jeff, Mattu, Surya, and Kirchner, Lauren. 2016. Machine bias. ProPublica, May 23, 2016. Available at: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
Bansak, Kirk. 2018. Replication materials for: Can non-experts really emulate statistical learning methods? https://doi.org/10.7910/DVN/KT20FE, Harvard Dataverse, V1.
Bansak, Kirk, Ferwerda, Jeremy, Hainmueller, Jens, Dillon, Andrea, Hangartner, Dominik, Lawrence, Duncan, and Weinstein, Jeremy. 2018. Improving refugee integration through data-driven algorithmic assignment. Science 359(6373):325–329.
Blattenberger, Gail, and Lad, Frank. 1985. Separating the Brier score into calibration and refinement components: a graphical exposition. The American Statistician 39(1):26–32.
Brier, Glenn W. 1950. Verification of forecasts expressed in terms of probability. Monthly Weather Review 78(1):1–3.
Chodosh, Sara. 2018. Courts use algorithms to help determine sentencing, but random people get the same results. Popular Science, January 18, 2018. Available at: https://www.popsci.com/recidivism-algorithm-random-bias.
Chokshi, Niraj. 2018. Can software predict crime? Maybe so, but no better than a human. New York Times, January 19, 2018. Available at: https://www.nytimes.com/2018/01/19/us/computer-software-human-decisions.html.
Cohen, Ira, and Goldszmidt, Moises. 2004. Properties and benefits of calibrated classifiers. In Proceedings of the 8th European Conference on Principles of Data Mining and Knowledge Discovery, ed. Boulicaut, Jean-François, Esposito, Floriana, Giannotti, Fosca, and Pedreschi, Dino. New York: Springer-Verlag, pp. 125–136.
Corbett-Davies, Sam, Pierson, Emma, Feller, Avi, Goel, Sharad, and Huq, Aziz. 2017. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery, pp. 797–806.
Corbett-Davies, Sam, Goel, Sharad, and González-Bailón, Sandra. 2017. Even imperfect algorithms can improve the criminal justice system. New York Times, December 20, 2017. Available at: https://www.nytimes.com/2017/12/20/upshot/algorithms-bail-criminal-justice-system.html.
Devlin, Hannah. 2018. Software ‘no more accurate than untrained humans’ at judging reoffending risk. The Guardian, January 17, 2018. Available at: https://www.theguardian.com/us-news/2018/jan/17/software-no-more-accurate-than-untrained-humans-at-judging-reoffending-risk.
Dressel, Julia, and Farid, Hany. 2018. The accuracy, fairness, and limits of predicting recidivism. Science Advances 4(1):eaao5580.
Gneiting, Tilmann, and Raftery, Adrian E. 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477):359–378.
Hanczar, Blaise, Hua, Jianping, Sima, Chao, Weinstein, John, Bittner, Michael, and Dougherty, Edward R. 2010. Small-sample precision of ROC-related estimates. Bioinformatics 26(6):822–830.
Hand, David J. 2009. Measuring classifier performance: a coherent alternative to the area under the ROC curve. Machine Learning 77(1):103–123.
Hernández-Orallo, José, Flach, Peter, and Ferri, Cèsar. 2012. A unified view of performance metrics: translating threshold choice into expected classification loss. Journal of Machine Learning Research 13:2813–2869.
Hofman, Jake M., Sharma, Amit, and Watts, Duncan J. 2017. Prediction and explanation in social systems. Science 355(6324):486–488.
James, Nathan. 2015. Risk and Needs Assessment in the Criminal Justice System. Congressional Research Service.
Kleinberg, Jon, Lakkaraju, Himabindu, Leskovec, Jure, Ludwig, Jens, and Mullainathan, Sendhil. 2017. Human decisions and machine predictions. The Quarterly Journal of Economics 133(1):237–293.
Kleinberg, Jon, Mullainathan, Sendhil, and Raghavan, Manish. 2017. Inherent trade-offs in the fair determination of risk scores. In Proceedings of the 8th Innovations in Theoretical Computer Science Conference, ed. Papadimitriou, Christos H. Wadern, Germany: Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, pp. 43:1–43:23.
Milgrom, Paul R., and Tadelis, Steven. 2018. How artificial intelligence and machine learning can impact market design. Technical report, National Bureau of Economic Research.
Monahan, John, and Skeem, Jennifer L. 2016. Risk assessment in criminal sentencing. Annual Review of Clinical Psychology 12:489–513.
Murphy, Allan H. 1973. A new vector partition of the probability score. Journal of Applied Meteorology 12(4):595–600.
Niculescu-Mizil, Alexandru, and Caruana, Rich. 2005a. Obtaining calibrated probabilities from boosting. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, ed. Cohn, Anthony. Arlington, VA: AAAI Press, pp. 413–420.
Niculescu-Mizil, Alexandru, and Caruana, Rich. 2005b. Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning. New York: Association for Computing Machinery, pp. 625–632.
Platt, John C. 2000. Probabilities for SV machines. In Advances in Large Margin Classifiers, ed. Smola, Alexander, Bartlett, Peter, Schölkopf, Bernhard, and Schuurmans, Dale. Cambridge, MA: MIT Press, pp. 61–73.
Steyerberg, Ewout W., Vickers, Andrew J., Cook, Nancy R., Gerds, Thomas, Gonen, Mithat, Obuchowski, Nancy, Pencina, Michael J., and Kattan, Michael W. 2010. Assessing the performance of prediction models. Epidemiology 21(1):128–138.
Winkler, Robert L., and Murphy, Allan H. 1968. “Good” probability assessors. Journal of Applied Meteorology 7(5):751–758.
Zadrozny, Bianca, and Elkan, Charles. 2001. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the 18th International Conference on Machine Learning, ed. Brodley, Carla E. and Danyluk, Andrea Pohoreckyj. San Francisco: Morgan Kaufmann Publishers, pp. 609–616.

Supplementary materials

Bansak supplementary material (171 KB)
