
Aggregating multiple probability intervals to improve calibration

Published online by Cambridge University Press:  01 January 2023

Saemi Park*
Affiliation:
Department of Psychology, Fordham University
David V. Budescu*
Affiliation:
Corresponding author: Department of Psychology, Fordham University, Dealy Hall, 411 East Fordham Road, Bronx, NY, 10458

Abstract

We apply the principles of the “Wisdom of Crowds (WoC)” to improve the calibration of interval estimates. Previous research has documented the significant impact of the WoC on the accuracy of point estimates, but only a few studies have examined its effectiveness in aggregating interval estimates. We demonstrate that collective probability intervals obtained by several heuristics can reduce the typical overconfidence of the individual estimates. We re-analyzed data from Glaser, Langer and Weber (2013) and from Soll and Klayman (2004) and applied four heuristics suggested by Gaba, Tsetlin and Winkler (2014) (Averaging, Median, Enveloping, and Probability averaging) as well as new heuristics, Averaging with trimming and Quartiles. We used the hit rate and the Mean Squared Error (MSE) to evaluate the quality of the methods. All methods reduced miscalibration to some degree, and Quartiles was the most beneficial, securing both accuracy and informativeness.
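To make the idea of aggregating intervals concrete, here is a minimal sketch of three of the named heuristics and of the hit rate. The abstract does not define the heuristics, so the definitions below are assumptions based on the usual reading of the names: Averaging and Median combine the judges' lower bounds and upper bounds separately, and Enveloping takes the widest span. All function names are hypothetical and the numbers are made up for illustration; this is not the authors' code.

```python
from statistics import mean, median

# Each judge's estimate is a (lower, upper) pair for the same question.
# ASSUMPTION: the definitions below follow the usual reading of the
# heuristic names; the abstract itself does not spell them out.

def aggregate_averaging(intervals):
    """Mean of the lower bounds and mean of the upper bounds."""
    lowers, uppers = zip(*intervals)
    return mean(lowers), mean(uppers)

def aggregate_median(intervals):
    """Median of the lower bounds and median of the upper bounds."""
    lowers, uppers = zip(*intervals)
    return median(lowers), median(uppers)

def aggregate_enveloping(intervals):
    """Smallest lower bound and largest upper bound."""
    lowers, uppers = zip(*intervals)
    return min(lowers), max(uppers)

def hit_rate(intervals, truths):
    """Fraction of intervals that contain the corresponding true value."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, truths))
    return hits / len(truths)

# Illustrative data: three judges answering one question.
judges = [(10, 20), (15, 30), (12, 18)]
print(aggregate_averaging(judges))
print(aggregate_median(judges))
print(aggregate_enveloping(judges))
```

Under these assumed definitions, Enveloping always yields the widest interval, so it can only raise the hit rate at the cost of informativeness, which is the trade-off the abstract refers to.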

Information

Type
Research Article
Creative Commons
The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright
Copyright © The Authors [2015] This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 1: Psychologically and statistically motivated aggregation methods studied by Lyon et al. (in press)

Table 2: Description of heuristics applied to the current study

Table 3: The number of observations and group size used in the re-analysis of GLW data by the various heuristics

Table 4: The number of observations and group size used in re-analyzing the SK data

Figure 1: Hit rates of 4 variations on averaging with trimming as a function of group size in general knowledge and economics/finance questions (GLW data). Note. The largest group size only for A1_S25 is 47.

Figure 2: Hit rates of 5 heuristics and the best variation on the Averaging (A1) as a function of group size in general knowledge and economics/finance questions (GLW data). Note. The largest group size only for A1_S25 is 47.

Figure 3: Mean Variance (MV), Mean Squared Bias (MSB), and Mean Squared Error (MSE) in general knowledge and economics/finance questions as a function of group size for six aggregation heuristics (GLW data).

Figure 4: Average hit rates across question domains and elicitation conditions as a function of group size for five aggregation heuristics (SK data). Note. The rate of log2 4 (= 2) in Quartiles is replaced by the one in A1.

Figure 5: Mean square error across question domains and elicitation conditions as a function of group size for five aggregation heuristics (SK data).

Figure 6: Log base 2 of median width of grouped intervals for two questions as a function of hit rate and group size for Averaging, Quartiles, and Enveloping heuristics (GLW data). Circle size ordered by group size, 1...64.

Figure 7: Hit rates as a function of group size for five aggregation heuristics (DB data).

Figure 8: Mean variance, Mean squared bias, and Mean squared error as a function of group size for five aggregation heuristics (DB data).

Figure 9: Average Q-scores as a function of group size for the 5 methods for two items.

Figure 10: Distributions of Kendall rank correlation between mean Q-scores and group size for all methods.

Supplementary material

Park and Budescu supplementary material 1 (File, 123 KB)
Park and Budescu supplementary material 2 (File, 35 KB)
Park and Budescu supplementary material 3 (File, 524.8 KB)
Park and Budescu supplementary material 4 (File, 289.9 KB)