
Bayesian combination of correlated subjective probability estimates

Published online by Cambridge University Press:  05 May 2026

Susanne Trick*
Affiliation:
Centre for Cognitive Science, Technical University of Darmstadt, Darmstadt, Germany; Institute of Psychology, Technical University of Darmstadt, Darmstadt, Germany
Frank Jäkel
Affiliation:
Centre for Cognitive Science, Technical University of Darmstadt, Darmstadt, Germany; Institute of Psychology, Technical University of Darmstadt, Darmstadt, Germany
Constantin A. Rothkopf
Affiliation:
Centre for Cognitive Science, Technical University of Darmstadt, Darmstadt, Germany; Institute of Psychology, Technical University of Darmstadt, Darmstadt, Germany; Frankfurt Institute for Advanced Studies, Goethe University, Frankfurt, Germany
*
Corresponding author: Susanne Trick; Email: susanne.trick@tu-darmstadt.de

Abstract

The combination of human forecasters’ subjective probability estimates usually improves upon the estimates provided by individual forecasters. To combine probability estimates in a Bayes-optimal way, prior work proposed a normative Bayesian fusion model that models the estimates with a beta distribution conditioned on their truth value. However, this model assumes conditionally independent probability estimates, although estimates provided by different forecasters are usually correlated. Here, we introduce a Bayesian model for combining subjective probability estimates that explicitly accounts for their correlation. We assume that an estimate provided by a forecaster for a given query depends on both the forecaster’s skill and the query’s difficulty. The correlation between probability estimates provided by different forecasters is assumed to be caused by the queries, which make the forecasters provide similar estimates, for example, correct and highly confident estimates for very easy queries. Our model represents the probability estimates with a beta distribution conditioned on their truth value. It explicitly models the forecasters’ skills and the queries’ difficulties with skill parameters specific to each forecaster and difficulty parameters specific to each query. In this way, it can model the correlations between probability estimates and take them into account when combining the estimates. Evaluations on a data set consisting of the subjective probability estimates of 85 human forecasters for 180 queries show improved fusion performance in terms of Brier score compared to related Bayesian fusion models. In particular, our model outperforms independent fusion models, which suffer from overconfidence.
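The independent-fusion baseline the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper’s implementation: it assumes each forecast $x^k$ is beta-distributed given the truth value $t$, with hypothetical beta parameters $(a_0, b_0)$ for $t=0$ and $(a_1, b_1)$ for $t=1$, and combines the forecasts by Bayes’ rule under the conditional-independence assumption. All function and parameter names are ours, chosen for illustration.

```python
import math

def beta_logpdf(x, a, b):
    """Log-density of a Beta(a, b) distribution at x in (0, 1)."""
    return ((a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def fuse_independent(forecasts, a0, b0, a1, b1, prior_t1=0.5):
    """Fuse probability estimates under conditional independence:
    each forecast x^k is assumed Beta(a_t, b_t)-distributed given t.
    Returns the posterior probability P(t = 1 | forecasts)."""
    log_odds = math.log(prior_t1 / (1.0 - prior_t1))
    for x in forecasts:
        # Each forecaster contributes an independent log-likelihood ratio.
        log_odds += beta_logpdf(x, a1, b1) - beta_logpdf(x, a0, b0)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With, e.g., `a0=2, b0=5, a1=5, b1=2`, three moderate forecasts such as 0.7, 0.8, and 0.6 already fuse to a posterior above 0.99, far more extreme than any individual forecast: this is the overconfidence of independent fusion that motivates modeling correlation.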

Information

Type
Theory Article
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Society for Judgment and Decision Making and European Association for Decision Making

Figure 1 The graphical model of the Skill–Difficulty Correlated Fusion Model.


Figure 2 Graphical model of the Skill–Difficulty Correlated Fusion Model with all priors, reparameterized using a beta$^{\prime }$ distribution with parameters mean and precision and including N training queries with known truth values and M fusion queries with unknown truth values.


Figure 3 Three examples of simulated forecasts of two forecasters, $x^1$ and $x^2$, for 20,000 queries, 10,000 of which have truth value $t=0$ (left column) and 10,000 truth value $t=1$ (right column), with different correlations: low (a), medium (b), and high (c). The two forecasters are equally skilled with skill parameters $\mu _1=\mu _2=0.25$ and $\rho _1=\rho _2=12$. The difficulty parameters $\eta _n$ and $\phi _n$ for different n are drawn from their prior distributions. For generating the low correlation close to 0 for $t=0$ and $t=1$ in (a), the prior parameters are chosen as $u_{\theta }=0.5, p_{\theta }=4, f_1=0.1, f_2=1,000$, and $w=0.2$. For generating a medium correlation of 0.46 in (b), they are $u_{\theta }=0.5, p_{\theta }=4, f_1=4, f_2=0.5$, and $w=0.2$. The forecasts in (c) with a correlation close to 1 are generated with $u_{\theta }=0.5, p_{\theta }=4, f_1=1,000, f_2=0.1$, and $w=0.2$.
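The mechanism behind Figure 3, correlation induced by a shared per-query difficulty, can be sketched in a simplified form. The paper’s exact generative process (with prior parameters $u_{\theta }, p_{\theta }, f_1, f_2, w$) is not reproduced here; instead, this hypothetical sketch blends a common skill mean with a per-query difficulty mean via a weight `w` of our own devising. When the difficulty component is shared (`w < 1`), the two forecasters’ forecasts become correlated; setting `w = 1` removes the shared component.

```python
import random

def beta_mp(rng, mean, precision):
    # Beta draw parameterized by mean and precision:
    # alpha = mean * precision, beta = (1 - mean) * precision.
    return rng.betavariate(mean * precision, (1.0 - mean) * precision)

def simulate_pair(rng, n_queries, mu=0.25, rho=12.0,
                  eta_mean=0.5, eta_prec=4.0, w=0.2):
    """Simulate two equally skilled forecasters (skill mean mu, precision rho)
    whose forecasts share a per-query difficulty mean eta_n.
    Forecasts are oriented so that values near 0 are correct (t = 0 convention)."""
    xs1, xs2 = [], []
    for _ in range(n_queries):
        eta_n = beta_mp(rng, eta_mean, eta_prec)  # per-query difficulty from its prior
        m = w * mu + (1.0 - w) * eta_n            # blend skill and shared difficulty
        xs1.append(beta_mp(rng, m, rho))
        xs2.append(beta_mp(rng, m, rho))
    return xs1, xs2

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5
```

Running `simulate_pair` with a strong shared-difficulty component yields clearly positive correlation between the two forecast sequences, while `w = 1` yields correlation near zero, mirroring the low/medium/high panels of the figure in spirit.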


Figure 4 An overview of the skills of different forecasters (a) and the difficulties of different queries (b) from the KTeC data set. We plot the skill means $\mu _k$ against the skill precisions $\rho _k$ of all 85 forecasters (a) and the difficulty means $\eta _n$ against the difficulty precisions $\phi _n$ of training queries 2–180 (b) in training split 1 of LOO cross-validation. Low means indicate unbiased forecasts provided by forecaster k or for query n, and high means indicate biased forecasts. Low precisions indicate high variability in forecasts provided by forecaster k or for query n, and high precisions indicate low variability. We highlight exemplary forecasters 10, 11, and 63 and exemplary queries 2, 6, 72, 92, 123, and 164. Error bars are not included because the standard errors of the mean are too small to be visible.


Figure 5 The forecasts provided by three exemplary forecasters along with their estimated distribution according to our Skill–Difficulty Correlated Fusion Model. We show the relative frequency of the forecasts provided by forecasters 11 (a), 63 (b), and 10 (c) on the 179 training queries in training split 1 as histograms. Forecasts for queries with truth value $t_n=1$ are inverted in order to display a forecaster’s skill regardless of the queries’ truth values. Thus, a forecast is correct if $x_n^k<0.5$. The curves plotted over the data illustrate the estimated distributions over all forecasts provided by forecaster k according to our model in (2.4). Since the difficulty parameters $\eta _n, \phi _n$ are different for every single forecast in the shown data, the shown distributions are equally weighted mixtures of beta$^{\prime }$ distributions, each consisting of 179 components according to (2.4) for $t_n=0$ with skill parameters $\mu _k,\rho _k$ for the respective forecaster k and the difficulty parameters $\eta _n,\phi _n$ of the 179 training queries in the training split.


Figure 6 The forecasts provided for six exemplary queries along with their estimated distribution according to our Skill–Difficulty Correlated Fusion Model. We show the relative frequency of the forecasts provided by all 85 forecasters for training queries 92 (a), 164 (b), 123 (c), 6 (d), 72 (e), and 2 (f) in training split 1 as histograms. Forecasts for queries with truth value $t_n=1$ are inverted in order to display a query’s difficulty regardless of its truth value. Thus, a forecast $x_n^k<0.5$ is a correct forecast. The curves plotted over the data are the distributions over the forecasts provided for the respective query according to our model in (2.4). Since the skill parameters $\mu _k, \rho _k$ are different for every single forecast in the shown data, the shown distributions are equally weighted mixtures of beta$^{\prime }$ distributions, each consisting of 85 components according to (2.4) for $t_n=0$ with difficulty parameters $\eta _n,\phi _n$ for the respective training query n and the skill parameters $\mu _k,\rho _k$ of the 85 forecasters.
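The equally weighted mixture densities plotted in Figures 5 and 6 can be evaluated as sketched below. The paper’s exact component parameterization in its Equation (2.4), which combines skill parameters $\mu _k, \rho _k$ with difficulty parameters $\eta _n, \phi _n$, is not given on this page, so the component list is left abstract here: each component is simply a beta density in mean/precision form, and the function names are ours.

```python
import math

def beta_mp_logpdf(x, mean, precision):
    """Log-density at x of a beta distribution parameterized by mean and precision."""
    a = mean * precision
    b = (1.0 - mean) * precision
    return ((a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def mixture_pdf(x, components):
    """Equally weighted mixture of beta densities.
    components is a list of (mean, precision) pairs, one per query (Figure 5)
    or one per forecaster (Figure 6)."""
    return sum(math.exp(beta_mp_logpdf(x, m, p)) for m, p in components) / len(components)
```

For Figure 5 the components would share one forecaster’s skill parameters and vary the 179 query difficulties; for Figure 6 they would share one query’s difficulty parameters and vary the 85 forecaster skills.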


Figure 7 Fusion performance of the Skill–Difficulty Correlated Fusion Model in comparison to related Bayesian fusion models on the KTeC data set. We compare the means and standard errors of the mean of Brier score, mean absolute error, and entropy of the Skill–Difficulty Correlated Fusion Model (SDCFM), the four independent beta fusion models, including the Hierarchical Beta Fusion Model (HB), the non-hierarchical Beta Fusion Model (B), the Hierarchical Symmetric Beta Fusion Model (HSB), and the non-hierarchical Symmetric Beta Fusion Model (SB), the models by Turner et al. (2014), Average Then Calibrate (ATC), Calibrate Then Average (CTA), Calibrate Then Average using Log-Odds (CTALO), Hierarchical Calibrate Then Average (HCTA), and Hierarchical Calibrate Then Average on Log-Odds (HCTALO), and the two baseline methods Unweighted Linear Opinion Pool (ULINOP) and Probit Average (PAVG).
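The Brier score used as the headline metric in this comparison is the mean squared difference between fused probability estimates and the binary outcomes; lower is better. A minimal implementation:

```python
def brier_score(probs, truths):
    """Mean squared error between forecast probabilities in [0, 1]
    and binary outcomes in {0, 1}; lower is better."""
    assert len(probs) == len(truths)
    return sum((p - t) ** 2 for p, t in zip(probs, truths)) / len(probs)
```

A perfectly confident, correct forecaster scores 0; the uninformative constant forecast 0.5 scores 0.25 regardless of the outcomes.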

Supplementary material: File

Trick et al. supplementary material (262.6 KB)