Coherence of probability judgments from uncertain evidence: Does ACH help?

Published online by Cambridge University Press: 01 January 2023

Christopher W. Karvetski*
Affiliation:
KaDSci LLC
David R. Mandel
Affiliation:
Defence Research and Development Canada

Abstract

Although the Analysis of Competing Hypotheses method (ACH) is a structured analytic technique promoted in several intelligence communities for improving the quality of probabilistic hypothesis testing, it has received little empirical testing. Whereas previous evaluations have used numerical evidence assumed to be perfectly accurate, in the present experiment we tested the effectiveness of ACH using a judgment task that presented participants with uncertain evidence varying in source reliability and information credibility. Participants (N = 227) assigned probabilities to two alternative hypotheses across six cases that systematically varied case features. Across multiple tests of coherence, the ACH group showed no advantage over a no-technique control group. Both groups showed evidence of subadditivity, unreliability, and overly conservative non-Bayesian judgments. The ACH group also showed pseudo-diagnostic weighting of evidence. The findings do not support the claim that ACH is effective at improving probabilistic judgment.
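The coherence notions named in the abstract can be illustrated with a small sketch. It assumes the convention, common in this literature, that probabilities assigned to two mutually exclusive and exhaustive hypotheses are called subadditive when they sum to more than 1. The function names and all numbers are hypothetical illustrations and are not taken from the study's data or analysis code.

```python
def additivity_check(p_h1, p_h2):
    """Return the total of two complementary judgments and their normalized values.

    A total above 1 is treated here as subadditive (assumed convention).
    """
    total = p_h1 + p_h2
    if total <= 0:
        raise ValueError("at least one judgment must be positive to normalize")
    normalized = (p_h1 / total, p_h2 / total)
    return total, normalized


def mean_absolute_deviation(judged, reference):
    """Mean absolute deviation (MAD) between judged and reference probabilities."""
    if len(judged) != len(reference) or not judged:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(j - r) for j, r in zip(judged, reference)) / len(judged)


# Hypothetical participant: assigns 0.6 to each of two alternative hypotheses.
total, normalized = additivity_check(0.6, 0.6)
is_subadditive = total > 1.0

# Hypothetical Bayesian posteriors for the same two hypotheses.
bayes = (0.8, 0.2)
mad = mean_absolute_deviation(normalized, bayes)
```

In this made-up case the raw judgments sum to more than 1, and the normalized judgments sit closer to 0.5 than the hypothetical posteriors do, which is the pattern the abstract describes as overly conservative.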

Information

Type
Research Article
Creative Commons
The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright
Copyright © The Authors [2020]. This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 1: Prompts for probability equivalents (Pe) for uncertainty terms.

Figure 1: Example case from the judgment task.

Figure 2: Example of Step 1 in the ACH process as implemented in the experiment.

Figure 3: Example of Step 3 in the ACH process as implemented in the experiment.

Table 2: ACH consistency scoring logic used in the experiment for very inconsistent (II), inconsistent (I), neutral/not applicable (N), consistent (C), and very consistent (CC).

Figure 4: Example of Step 4 in the ACH process as implemented in the experiment.

Table 3: Description of the six cases used in the experiment. An A1 informant is one that is deemed “completely reliable” and makes a claim that is “completely credible”. A C3 informant is one that is deemed “fairly reliable” and makes a claim that is judged “possibly true”. An F3 informant is rated as “reliability cannot be judged” and makes a claim denoted as “possibly true”.

Table 4: ANOVA results for uncertainty terms. Term denotes likely, highly likely, A1, C3, or F3.

Figure 5: Violin plots of probability equivalents (Pe) of qualitative uncertainty terms. An A1 informant is one that is deemed “completely reliable” and makes a claim that is “completely credible”. A C3 informant is one that is deemed “fairly reliable” and makes a claim that is judged “possibly true”. An F3 informant is rated as “reliability cannot be judged” and makes a claim denoted as “possibly true”.

Table 5: Mean probability equivalents (Pe) of qualitative uncertainty terms. M is the mean value, whereas LB and UB are 95% confidence interval lower and upper bounds. An A1 informant is one that is deemed “completely reliable” and makes a claim that is “completely credible”. A C3 informant is one that is deemed “fairly reliable” and makes a claim that is judged “possibly true”. An F3 informant is rated as “reliability cannot be judged” and makes a claim denoted as “possibly true”.

Table 6: Mean total evidential support by pair. M is the mean value whereas LB and UB are 95% confidence interval lower and upper bounds.

Table 7: ANOVA results for mean absolute deviation (MAD) measuring reliability between isomorphic case pairs. Total evidential support consists of the three levels of support (high, medium, low).

Table 8: ANOVA results for bias from complementarity. Total evidential support consists of the three levels of support (high, medium, low).

Table 9: Comparisons of the elicited probabilities (both raw and normalized) with Bayesian posterior probabilities in terms of agreement percentage (φ) and mean absolute deviation (MAD). M is the mean value whereas LB and UB are 95% confidence interval lower and upper bounds.

Table 10: Average Bayesian, elicited and normalized elicited probability judgments by case. Hyp. = hypothesis, M is the mean value, and LB and UB are 95% confidence interval lower and upper bounds.