The relationship between crowd majority and accuracy for binary decisions

Published online by Cambridge University Press: 01 January 2023

Michael D. Lee*
Affiliation:
Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, 92697-5100
Megan N. Lee
Affiliation:
Department of Cognitive Sciences, University of California, Irvine
* Email: mdlee@uci.edu

Abstract

We consider the wisdom of the crowd situation in which individuals make binary decisions, and the majority answer is used as the group decision. Using data sets from nine different domains, we examine the relationship between the size of the majority and the accuracy of the crowd decisions. We find empirically that these calibration curves take many different forms for different domains, and the distribution of majority sizes over decisions in a domain also varies widely. We develop a growth model for inferring and interpreting the calibration curve in a domain, and apply it to the same nine data sets using Bayesian methods. The modeling approach is able to infer important qualitative properties of a domain, such as whether it involves decisions that have ground truths or are inherently uncertain. It is also able to make inferences about important quantitative properties of a domain, such as how quickly the crowd accuracy increases as the size of the majority increases. We discuss potential applications of the measurement model, and the need to develop a psychological account of the variety of calibration curves that evidently exist.

Information

Type
Research Article
Creative Commons
The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright
Copyright © The Authors [2017] This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1: Summary of observed behavior in nine data sets. In each panel, the y-axis corresponds to the number of individuals in the crowd making a decision, and the x-axis corresponds to the number of individuals choosing the correct alternative for that decision. The area of each circle corresponds to the number of decisions in the data set with that combination of crowd size and number of correct individuals. The total number of decisions in the domain is also listed.

Figure 2: Calibration curves relating the majority size to its average accuracy. Each panel corresponds to a data set. The x-axis corresponds to the proportion of individuals who chose the majority alternative, grouped into bins. The y-axis corresponds to the proportion of decisions in each bin for which the majority was correct. The area of each circle is proportional to how many decisions belong to each majority-size bin.
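
The binning described in this caption can be illustrated with a short sketch. It is not the authors' code: the function name `calibration_curve`, the `(n_correct, n_total)` input format, the equal-width bins over majority proportions from 0.5 to 1, and the half-credit rule for exact ties are all assumptions made for illustration.

```python
def calibration_curve(decisions, n_bins=5):
    """Empirical calibration curve for binary crowd decisions.

    decisions: iterable of (n_correct, n_total) pairs, one per decision
               (hypothetical input format, assumed for this sketch).
    Returns one (bin_low, bin_high, n_decisions, accuracy) tuple per bin,
    where accuracy is None for a bin that contains no decisions.
    """
    counts = [0] * n_bins
    credit = [0.0] * n_bins

    for n_correct, n_total in decisions:
        n_wrong = n_total - n_correct
        n_majority = max(n_correct, n_wrong)

        # Bin index from the majority proportion n_majority / n_total,
        # computed with integers to avoid floating-point edge effects.
        # A unanimous crowd (proportion 1.0) goes in the last bin.
        idx = min(((2 * n_majority - n_total) * n_bins) // n_total, n_bins - 1)

        counts[idx] += 1
        if n_correct > n_wrong:
            credit[idx] += 1.0   # majority chose the correct alternative
        elif n_correct == n_wrong:
            credit[idx] += 0.5   # exact tie: half credit (assumption)

    width = 0.5 / n_bins
    curve = []
    for i in range(n_bins):
        accuracy = credit[i] / counts[i] if counts[i] else None
        curve.append((0.5 + i * width, 0.5 + (i + 1) * width, counts[i], accuracy))
    return curve


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the article.
    example = [(9, 10), (8, 10), (3, 10), (6, 10), (5, 10), (10, 10), (1, 10)]
    for low, high, n, acc in calibration_curve(example):
        print(f"majority {low:.1f}-{high:.1f}: {n} decisions, accuracy {acc}")
```

Plotting `accuracy` against the bin midpoints, with marker area scaled by `n_decisions`, would give a figure of the kind this caption describes.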

Figure 3: Empirical calibration curves relating the proportion of decision makers in the majority, on the x-axis, to the average accuracy of the majority decision, on the y-axis.

Figure 4: Results from applying the latent-mixture logistic growth model to the nine data sets. Each panel corresponds to a data set. The inset histogram shows the posterior probability of the 5 mixture component models (“c” = chance, “d” = deterministic, “sd” = shifted deterministic, “p” = probabilistic, “sp” = shifted probabilistic). The most likely model is labeled in bold. The lines show samples from the posterior distribution of the most likely calibration model, and the circular markers show samples from the joint distribution of majority proportion θ and crowd accuracy φ aggregated over all of the decisions.

Table 1: Model and parameter inferences applying the latent-mixture logistic-growth model to the nine data sets (“det” = deterministic, “shift det” = shifted deterministic, “prob” = probabilistic, “shift prob” = shifted probabilistic).

Supplementary material: File

Lee and Lee supplementary material

File 30.1 KB