
How generalizable is good judgment? A multi-task, multi-benchmark study

Published online by Cambridge University Press:  01 January 2023

Barbara A. Mellers*
Affiliation:
Department of Psychology, Solomon Labs, 3720 Walnut St., University of Pennsylvania, Philadelphia, PA 19104
Joshua D. Baker
Affiliation:
University of Pennsylvania
Eva Chen
Affiliation:
University of Pennsylvania
David R. Mandel
Affiliation:
DRDC and York University
Philip E. Tetlock
Affiliation:
University of Pennsylvania

Abstract

Good judgment is often gauged against two gold standards – coherence and correspondence. Judgments are coherent if they demonstrate consistency with the axioms of probability theory or propositional logic. Judgments are correspondent if they agree with ground truth. When gold standards are unavailable, silver standards such as consistency and discrimination can be used to evaluate judgment quality. Individuals are consistent if they assign similar judgments to comparable stimuli, and they discriminate if they assign different judgments to dissimilar stimuli. We ask whether “superforecasters”, individuals with noteworthy correspondence skills (see Mellers et al., 2014), show superior performance on laboratory tasks assessing other standards of good judgment. Results showed that superforecasters either tied or outperformed less correspondent forecasters and undergraduates with no forecasting experience on tests of consistency, discrimination, and coherence. While multifaceted, good judgment may be a more unified concept than previously thought.

Information

Type
Research Article
Creative Commons
The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright
Copyright © The Authors [2017] This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Table 1: Laboratory tasks and associated benchmarks across the two surveys.

Figure 1: Average numerical interpretation of probability phrases in survey 1.

Figure 2: Average numerical interpretation of probability phrases in survey 2.

Table 2: Average plausible interval widths across probability terms.

Table 3: Proportion of correct responses in congruence and information bias tasks.

Table 4: Proportion of correct responses to Bayesian problems (± 1%).

Table 5: Average absolute error (AAE) in Bayesian reasoning problems.

Table 6: Average correlations across benchmarks.

Supplementary material (Mellers et al.):

- Supplementary material 1 (File, 295.3 KB)
- Supplementary material 2 (File, 18.4 KB)
- Supplementary material 3 (File, 18.5 KB)
- Supplementary material 4 (File, 52.2 KB)