
Testing the ability of the surprisingly popular method to predict NFL games

Published online by Cambridge University Press: 01 January 2023

Michael D. Lee*
Affiliation:
Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, 92697-5100.
Irina Danileiko
Affiliation:
Department of Cognitive Sciences, University of California, Irvine
Julie Vi
Affiliation:
Department of Cognitive Sciences, University of California, Irvine
*Email: mdlee@uci.edu.

Abstract

We consider the recently developed “surprisingly popular” method for aggregating decisions across a group of people (Prelec, Seung and McCoy, 2017). The method has shown impressive performance in a range of decision-making situations, but typically in situations where the correct answer is already established. We consider the ability of the surprisingly popular method to make predictions in a situation where the correct answer does not exist at the time people are asked to make decisions. Specifically, we tested its ability to predict the winners of the 256 games in the 2017–2018 US National Football League (NFL) season. Each of these predictions used participants who self-rated as “extremely knowledgeable” about the NFL, drawn from a set of 100 participants recruited through Amazon Mechanical Turk (AMT). We compare the accuracy and calibration of the surprisingly popular method to a variety of alternatives: the mode and confidence-weighted predictions of the expert AMT participants, the individual and aggregated predictions of media experts, and a statistical Elo method based on the performance histories of the NFL teams. Our results are exploratory and need replication, but we find that the surprisingly popular method outperforms all of these alternatives and has reasonable calibration properties relating the confidence of its predictions to their accuracy.
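The abstract names a statistical Elo baseline built from the teams' performance histories, but its settings are not given here. The following is a minimal sketch of a standard Elo system under assumed settings (K = 20, a starting rating of 1500, no home-field or margin-of-victory adjustment, ties not handled); the function names are hypothetical and this is not presented as the authors' implementation.

```python
# Minimal Elo sketch for picking NFL winners. Assumed settings (not from
# the paper): K = 20, every team starts at 1500, no home-field or
# margin-of-victory adjustment, and tied games are not handled.

START = 1500.0


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def record_result(ratings: dict, winner: str, loser: str, k: float = 20.0) -> None:
    """Move both ratings toward the observed result, in place."""
    rw = ratings.setdefault(winner, START)
    rl = ratings.setdefault(loser, START)
    shift = k * (1.0 - expected_score(rw, rl))
    ratings[winner] = rw + shift
    ratings[loser] = rl - shift


def predict(ratings: dict, team_a: str, team_b: str):
    """Return (favored team, or None if ratings are equal, and its win probability)."""
    p_a = expected_score(ratings.get(team_a, START), ratings.get(team_b, START))
    if p_a == 0.5:
        return None, 0.5
    return (team_a, p_a) if p_a > 0.5 else (team_b, 1.0 - p_a)


ratings = {}
record_result(ratings, "NE", "KC")   # NE moves to 1510, KC to 1490
print(predict(ratings, "KC", "NE"))  # NE favored with probability of about 0.529
```

Each prediction simply favors the higher-rated team, so the Elo baseline needs no participant data at all, which is what makes it a useful reference point for the crowd-based methods.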

Type: Research Article

License: The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright © The Authors [2018]. This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1: An example of the surprisingly popular method choosing the correct minority answer for the question “Is Seattle the capital of Washington?” The upper (blue) distribution shows the meta-cognitive estimates of agreement, in 10% bins centered from 5% to 95%, provided by people who answered “yes”. The lower (gray) distribution shows, on a downward-oriented y-axis, the meta-cognitive estimates of agreement provided by people who answered “no”. The observed proportion of “yes” answers and the proportion of “yes” answers expected from these estimates are shown by vertical lines and listed. Also listed is the answer of the surprisingly popular method, which is “no” because the observed proportion is less than expected. The tick mark indicates that this answer is correct.
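The decision rule described in Figure 1 can be sketched in a few lines. This is a minimal sketch, assuming each respondent reports an answer and an estimate (from 0 to 1) of the share of people who agree with them, and assuming the expected “yes” proportion is the simple average of the “yes” share each respondent implicitly predicts, as in the binary formulation of Prelec, Seung and McCoy (2017). The function name, input format, and data are hypothetical.

```python
from statistics import mean


def surprisingly_popular(answers, agreement):
    """Return (SP answer, observed "yes" share, expected "yes" share).

    answers:   list of bools, True for "yes".
    agreement: each respondent's estimate (0 to 1) of the share of people
               who give the same answer they did.
    """
    if not answers or len(answers) != len(agreement):
        raise ValueError("need one agreement estimate per answer")
    observed = mean(1.0 if a else 0.0 for a in answers)
    # A "yes" voter estimating agreement p implies a "yes" share of p;
    # a "no" voter estimating agreement q implies a "yes" share of 1 - q.
    expected = mean(p if a else 1.0 - p for a, p in zip(answers, agreement))
    if observed > expected:
        return "yes", observed, expected
    if observed < expected:
        return "no", observed, expected
    return None, observed, expected  # neither answer is surprisingly popular


# Hypothetical data in the spirit of Figure 1: most say "yes" and expect
# near-unanimous agreement; the "no" minority expects to be outvoted.
answers = [True] * 6 + [False] * 4
agreement = [0.9] * 6 + [0.3] * 4
print(surprisingly_popular(answers, agreement))  # "no": observed 0.6 < expected 0.82
```

The majority answer here is “yes”, yet the method returns “no”, because “yes” is less popular than the respondents collectively expected it to be.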


Figure 2: The left panel shows the distribution of prediction accuracy for six different groups of people: media experts at the top, and below them AMT participants grouped by their self-rated overall knowledge. The middle panels show the relationship between accuracy and confidence ratings for each self-rated knowledge group. The right panels show the relationship between the meta-cognitive estimate and the true level of agreement in each prediction for each self-rated knowledge group.


Figure 3: The number of games correctly predicted by the surprisingly popular (sp), confidence-weighted tally (conf), mode of AMT participants (mode), mode of media experts (media), and Elo (elo) methods, and by each of 94 individual media experts. The stick figures represent the distribution of the number of games correctly predicted by the media experts, and the labeled lines show the number of games correctly predicted by each method.
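Figure 3 compares methods by how many games each predicts correctly. A minimal sketch of the simplest aggregate, the mode of participants' picks, and of that count, assuming each game's picks are team names and that a tied vote leaves neither team favored (the grey-circle case in Figure 8); the function names and data are hypothetical.

```python
from collections import Counter


def mode_pick(picks):
    """Most common team among participants' picks; None if the vote is tied."""
    top = Counter(picks).most_common(2)
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        return None  # neither team favored
    return top[0][0]


def games_correct(predictions, winners):
    """Number of games whose prediction matches the winner (None never counts)."""
    if len(predictions) != len(winners):
        raise ValueError("need one prediction per game")
    return sum(p is not None and p == w for p, w in zip(predictions, winners))


season = [["NE", "NE", "KC"], ["DEN", "LAC"], ["SEA", "SEA"]]
predictions = [mode_pick(picks) for picks in season]  # ["NE", None, "SEA"]
print(games_correct(predictions, ["NE", "LAC", "GB"]))  # 1
```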


Figure 4: Illustrative examples of the surprisingly popular and confidence-weighted tally methods for three NFL games. Each row of panels corresponds to one game, with the left-hand panel showing the surprisingly popular method and the right-hand panel showing the confidence-weighted tally method. For the surprisingly popular method, as in Figure 1, the distributions of meta-cognitive estimates of agreement are shown for the people choosing each team, and the observed and expected percentages of predictions for the first-named home team are given, along with the method's answer and its accuracy. For the confidence-weighted tally method, the distributions of confidence ratings on a 5-point scale are shown for the people choosing each team, and the confidence tallies are given, along with the method's answer and its accuracy.
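The confidence-weighted tally in Figure 4 can be sketched as follows. This assumes the tally for a team is the plain sum of the 1 to 5 confidence ratings given by the people who chose it, and that equal tallies leave neither team favored; the paper may weight ratings differently, and the function name and data are hypothetical.

```python
def confidence_tally(picks, confidence):
    """Return (team with the larger confidence tally or None on a tie, tallies)."""
    if len(picks) != len(confidence):
        raise ValueError("need one confidence rating per pick")
    tallies = {}
    for team, c in zip(picks, confidence):
        if not 1 <= c <= 5:
            raise ValueError("confidence ratings must be on the 1-5 scale")
        tallies[team] = tallies.get(team, 0) + c
    ranked = sorted(tallies.items(), key=lambda item: item[1], reverse=True)
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None, tallies  # neither team favored
    return ranked[0][0], tallies


# Three lukewarm votes for NE lose to two confident votes for KC,
# even though NE is the mode.
print(confidence_tally(["NE", "NE", "NE", "KC", "KC"], [2, 2, 1, 5, 4]))
# ('KC', {'NE': 5, 'KC': 9})
```

The example shows how the tally can overturn the mode: confidence lets a confident minority outweigh a hesitant majority.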


Figure 5: Relationship between pairs of methods in the accuracy of their predictions. Each panel corresponds to one of the 10 unique pairings of the five methods: surprisingly popular (sp), confidence-weighted tally (conf), mode of AMT participants (mode), mode of media experts (media), and Elo (elo). Within each panel, correct predictions are labeled “c” and incorrect predictions “i”. The areas of the squares and the overlaid numbers show the counts of games in which both methods made the same correct prediction (top-left), both made the same incorrect prediction (bottom-right), the row method labeled on the left made a correct prediction but the column method labeled on top did not (top-right), or the column method made a correct prediction but the row method did not (bottom-left).
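The 2 × 2 counts in each panel of Figure 5 can be computed by cross-tabulating per-game correctness for every unique pair of methods. A minimal sketch, assuming each method's results are a list of booleans in the same game order, with a game where a method favored neither team counted as incorrect; the key names (“cc” top-left, “ci” top-right, “ic” bottom-left, “ii” bottom-right) and data are hypothetical.

```python
from itertools import combinations


def pairwise_tables(correct):
    """Cross-tabulate accuracy for every unique pair of methods.

    correct maps a method name to a list of bools, one per game (True = the
    method picked the winner). The first letter of each key refers to the
    row method and the second to the column method: "cc" both correct,
    "ci" row correct only, "ic" column correct only, "ii" both incorrect.
    """
    tables = {}
    for row, col in combinations(correct, 2):
        if len(correct[row]) != len(correct[col]):
            raise ValueError("methods must cover the same games")
        counts = {"cc": 0, "ci": 0, "ic": 0, "ii": 0}
        for r, c in zip(correct[row], correct[col]):
            counts[("c" if r else "i") + ("c" if c else "i")] += 1
        tables[(row, col)] = counts
    return tables


# Hypothetical results for four games.
results = {
    "sp": [True, True, False, True],
    "conf": [True, False, False, True],
    "mode": [True, False, True, False],
    "media": [False, True, True, True],
    "elo": [True, True, True, False],
}
tables = pairwise_tables(results)
print(len(tables))             # 10 unique pairs
print(tables[("sp", "conf")])  # {'cc': 2, 'ci': 1, 'ic': 0, 'ii': 1}
```

The off-diagonal cells (“ci” and “ic”) are the informative ones: they show the games on which two methods disagree in outcome.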


Figure 6: Results of the calibration analysis for the five aggregation methods. In each panel, the inset histogram shows the posterior probabilities of the five possible logistic growth models used by Lee (2017): “c” = chance, “d” = deterministic, “sd” = shifted deterministic, “p” = probabilistic, and “sp” = shifted probabilistic. The most likely model is labeled in bold. The lines show samples from the posterior distribution of the most likely calibration model, and the circular markers show samples from the joint distribution of confidence and accuracy for the predictions of the NFL games.
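Figure 6 summarizes calibration with the Bayesian logistic growth models of Lee (2017), which this sketch does not reproduce. A simpler, model-free view of the same relationship between confidence and accuracy is to bin predictions by confidence and compare each bin's mean confidence with its observed accuracy. This assumes each prediction carries a confidence between 0.5 and 1 (the probability assigned to the chosen team) and a correct or incorrect outcome; the bin edges and function name are arbitrary, hypothetical choices.

```python
def binned_calibration(confidence, correct, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Return (lower, upper, n, mean confidence, accuracy) for each non-empty bin.

    Bins are half-open [lower, upper), except the last, which includes its
    upper edge. Confidence values outside the edges are skipped.
    """
    if len(confidence) != len(correct):
        raise ValueError("need one outcome per confidence value")
    rows = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        last = upper == edges[-1]
        members = [
            i for i, c in enumerate(confidence)
            if lower <= c < upper or (last and c == upper)
        ]
        if not members:
            continue
        mean_conf = sum(confidence[i] for i in members) / len(members)
        accuracy = sum(bool(correct[i]) for i in members) / len(members)
        rows.append((lower, upper, len(members), mean_conf, accuracy))
    return rows


for row in binned_calibration([0.55, 0.58, 0.95, 1.0], [True, False, True, True]):
    print(row)
# roughly (0.5, 0.6, 2, 0.565, 0.5) then (0.9, 1.0, 2, 0.975, 1.0)
```

A well-calibrated method has accuracy close to mean confidence in every bin; with only 256 games, bins will be sparse, which is one reason a parametric model like those in Figure 6 is attractive.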


Figure 7: The number of games correctly predicted by the surprisingly popular (sp), confidence-weighted tally (conf), and mode of AMT participants (mode) methods when based on all of the AMT participants. The number of games correctly predicted by the mode of media experts (media) and Elo (elo) methods, and by each of the 94 individual media experts, are also shown.


Figure 8: The accuracy of the predictions made by the surprisingly popular (sp), confidence-weighted tally (conf), mode of AMT participants (mode), mode of media experts (media), and Elo (elo) methods, for every game of the NFL season. Panels correspond to weeks of the season, rows to methods, and columns to games. Dark blue circles indicate a correct prediction, light orange circles an incorrect prediction, and grey circles that neither team was favored.


Figure 9: Logistic growth calibration curves relating confidence (x-axis) to accuracy (y-axis). The central panel shows the general model, with a growth parameter β, a bound parameter α, and a shift parameter δ. The surrounding panels show nested special-case models with natural interpretations. Based on Lee (2017, Figure 3).

Supplementary material: Lee et al. supplementary material, “Testing the Surprisingly Popular algorithm for sporting predictions” (file, 783.9 KB).
Supplementary material: Lee et al. supplementary material, “Corrigendum for ‘Testing the Ability of the Surprisingly Popular Method to Predict NFL Games’” (file, 1 MB).