A universal method for evaluating the quality of aggregators

Published online by Cambridge University Press:  01 January 2023

Ying Han*
Affiliation:
Department of Psychology, Fordham University
David Budescu*
Affiliation:
Department of Psychology, Fordham University

Abstract

We propose a new method to facilitate comparison of aggregated forecasts based on different aggregation, elicitation and calibration methods. Aggregates are evaluated by their relative position on the cumulative distribution of the corresponding individual scores. This allows one to compare methods using different measures of quality that use different scales. We illustrate the use of the method by re-analyzing various estimates from Budescu and Du (Management Science, 2007).
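The core idea, locating an aggregate's score on the cumulative distribution of the individual scores, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `percentile_on_ecdf`, the toy forecasts, the use of untransformed Brier scores, and the treatment of ties (counted in the aggregate's favor) are all assumptions made for the example.

```python
import numpy as np

def percentile_on_ecdf(individual_scores, aggregate_score, higher_is_better=True):
    """Fraction of individual scores that the aggregate matches or beats.

    Hypothetical helper: ties count in favor of the aggregate (an assumption).
    """
    scores = np.asarray(individual_scores, dtype=float)
    if higher_is_better:
        return float(np.mean(scores <= aggregate_score))
    return float(np.mean(scores >= aggregate_score))

def brier(prob, outcome):
    """Squared error between a probability forecast and a 0/1 outcome."""
    return (prob - outcome) ** 2

# Hypothetical toy data: 5 judges (rows) forecasting 4 binary events (columns).
forecasts = np.array([
    [0.9, 0.2, 0.7, 0.4],
    [0.6, 0.4, 0.5, 0.5],
    [0.8, 0.1, 0.9, 0.3],
    [0.3, 0.7, 0.4, 0.6],
    [0.7, 0.3, 0.6, 0.2],
])
outcomes = np.array([1, 0, 1, 0])

# Each judge's mean Brier score (lower is better).
individual = brier(forecasts, outcomes).mean(axis=1)

# Mean aggregation: average the forecasts first, then score the aggregate.
mean_forecast = forecasts.mean(axis=0)
aggregate = brier(mean_forecast, outcomes).mean()

# Relative position of the aggregate among the individual judges.
pct = percentile_on_ecdf(individual, aggregate, higher_is_better=False)
print(f"aggregate Brier = {aggregate:.4f}, percentile = {pct:.2f}")
```

Because the result is a percentile rather than a raw score, the same function can be applied to any quality measure (Brier scores, Q scores, hit rates) by setting `higher_is_better` appropriately, which is what makes measures on different scales comparable. In this toy example the mean aggregate matches or beats 2 of the 5 judges.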

Information

Type
Research Article
Creative Commons
The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright
Copyright © The Authors [2019]. This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 1: Number of observations and group sizes used in the re-analysis of Budescu and Du (2007)

Table 2: Mean and Median Aggregation Percentiles of Aggregated Brier Scores, Q Scores and Hit Rates for 50% CI, 70% CI and 90% CI for Different Group Sizes

Figure 1: Mean and median aggregation of transformed Brier scores for 5 different group sizes, plotted on the empirical cumulative distribution of individual Brier scores (Figure 1A for mean aggregation and Figure 1B for median aggregation). Error bars, color-matched to the dots, represent the 90% empirical confidence intervals of the averaged aggregated Brier scores for the different group sizes.

Figure 2: Mean and median aggregation of Q scores of 50% CIs for 5 different group sizes, plotted on the empirical cumulative distribution of individual Q scores of 50% CIs (Figure 2A for mean aggregation and Figure 2B for median aggregation). Error bars, color-matched to the dots, represent the 90% empirical confidence intervals of the averaged aggregated Q scores of 50% CIs for the different group sizes.

Figure 3: Mean and median aggregation of Q scores of 70% CIs for 5 different group sizes, plotted on the empirical cumulative distribution of individual Q scores of 70% CIs (Figure 3A for mean aggregation and Figure 3B for median aggregation). Error bars, color-matched to the dots, represent the 90% empirical confidence intervals of the averaged aggregated Q scores of 70% CIs for the different group sizes.

Figure 4: Mean and median aggregation of Q scores of 90% CIs for 5 different group sizes, plotted on the empirical cumulative distribution of individual Q scores of 90% CIs (Figure 4A for mean aggregation and Figure 4B for median aggregation). Error bars, color-matched to the dots, represent the 90% empirical confidence intervals of the averaged aggregated Q scores of 90% CIs for the different group sizes.

Figure 5: Aggregated Brier scores, Q scores of 50% CI, 70% CI and 90% CI of 5 different group sizes using mean (5A) and median aggregation (5B).

Figure 6: Aggregated hit rates and Q scores of 50% CI of 5 different group sizes plotted on empirical cumulative distribution of individual hit rates and Q scores of 50% CIs using mean aggregation (Figure 6A for aggregated hit rates of 50% CI and Figure 6B for aggregated Q scores of 50% CI).

Table 3: Mean Percentiles and Corresponding Empirical 90% CI of Aggregated Q Scores for 50% CI, 70% CI and 90% CI for Different Group Sizes Using Mean Aggregation

Figure 7: Aggregated hit rates and Q scores of 70% CI of 5 different group sizes plotted on empirical cumulative distribution of individual hit rates and Q scores of 70% CIs using mean aggregation (Figure 7A for aggregated hit rates of 70% CI and Figure 7B for aggregated Q scores of 70% CI).

Figure 8: Aggregated hit rates and Q scores of 90% CI of 5 different group sizes plotted on empirical cumulative distribution of individual hit rates and Q scores of 90% CIs using mean aggregation (Figure 8A for aggregated hit rates of 90% CI and Figure 8B for aggregated Q scores of 90% CI).

Table 4: Probability That a Judge Who Is in the Top X% at t1 Would Also Be in the Top X% at t2 for Different Bivariate Normal Distributions

Table 5: Performance of Top Judges and Mean & Median Aggregations

Figure 9: Aggregated point probabilities under two extremization approaches (Figure 9A for extremization based on α1 and Figure 9B for extremization based on α2).

Table 6: Performance of Two Extremization Methods and Median Aggregation of Raw Forecasts

Supplementary material: File

Han and Budescu supplementary material

File 280.7 KB