Strategies for exploration in the domain of losses

Published online by Cambridge University Press:  01 January 2023

Paul M. Krueger*
Affiliation:
Department of Psychology, University of California, Berkeley 94720
Robert C. Wilson*
Affiliation:
Department of Psychology and Cognitive Science Program, University of Arizona 85721
Jonathan D. Cohen
Affiliation:
Princeton Neuroscience Institute, Princeton University 08544; Department of Psychology, Princeton University 08544
* Equal contribution

Abstract

Many decisions in everyday life involve a choice between exploring options that are currently unknown and exploiting options that are already known to be rewarding. Previous work has suggested that humans solve such “explore-exploit” dilemmas using a mixture of two strategies: directed exploration, in which information seeking drives exploration by choice, and random exploration, in which behavioral variability drives exploration by chance. One limitation of this previous work was that, like most studies on explore-exploit decision making, it focused exclusively on the domain of gains, where the goal was to maximize reward. In many real-world decisions, however, the goal is to minimize losses and it is well known from Prospect Theory that behavior can be quite different in this domain. In this study, we compared explore-exploit behavior of human subjects under conditions of gain and loss. We found that people use both directed and random exploration regardless of whether they are exploring to maximize gains or minimize losses and that there is quantitative agreement between the exploration parameters across domains. Our results also revealed an overall bias towards the more uncertain option in the domain of losses. While this bias towards uncertainty was qualitatively consistent with the predictions of Prospect Theory, quantitatively we found that the bias was better described by a Bayesian account, in which subjects had a prior that was optimistic for losses and pessimistic for gains. Taken together, our results suggest that explore-exploit decisions are driven by three independent processes: directed and random exploration, and a baseline uncertainty seeking that is driven by a prior.

Information

Type
Research Article
Creative Commons
The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright
Copyright © The Authors [2017] This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1 — Task design. Screenshots of four different games showing the gains condition (A & B) and losses condition (C & D), and the horizon 1 condition (A & C) and horizon 6 condition (B & D). In the gains condition, points are added to subjects’ scores, and in the losses condition, points are subtracted. The height of the bandits represents the game length, either 5 or 10 trials. The first four trials of every game are forced, wherein subjects are instructed which bandit to select. In the [1 3] unequal uncertainty condition illustrated here, subjects are instructed to choose one option once and the other three times. In the [2 2] equal uncertainty condition, not shown, subjects play both options twice. Free-choice trials are cued by a pair of green squares located inside the box of each bandit. Once a subject presses a button to choose a bandit, the lever of that bandit flips down and the number of points for that bandit is displayed, followed by the onset of any remaining free-choice trials.
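
The game structure described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_game`, the condition labels, and the random ordering of the forced trials are hypothetical and are not taken from the authors' task code.

```python
import random

FORCED_TRIALS = 4  # every game starts with four instructed choices


def make_game(horizon, uncertainty, rng):
    """Build the forced-choice schedule for one game.

    horizon: number of free-choice trials (1 or 6).
    uncertainty: "unequal" for the [1 3] condition, "equal" for [2 2].
    rng: a random.Random instance; shuffling the forced trials is an
         assumption made for this sketch.
    """
    if horizon not in (1, 6):
        raise ValueError("horizon must be 1 or 6")
    if uncertainty == "unequal":
        counts = [1, 3]
        rng.shuffle(counts)  # decides which bandit is sampled only once
    elif uncertainty == "equal":
        counts = [2, 2]
    else:
        raise ValueError("uncertainty must be 'unequal' or 'equal'")
    forced = ["left"] * counts[0] + ["right"] * counts[1]
    rng.shuffle(forced)
    return {
        "forced": forced,
        "n_free": horizon,
        "length": FORCED_TRIALS + horizon,  # 5 or 10 trials in total
    }
```

With four forced trials, a horizon of 1 gives a 5-trial game and a horizon of 6 gives a 10-trial game, matching the two game lengths in the caption.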

Figure 2 — Graphical depiction of the hierarchical Bayesian model. In this plot, each node corresponds to a variable in the model. Shaded nodes correspond to observed variables (e.g. choices) and unshaded nodes correspond to hidden variables (e.g. information bonus or decision noise). Discrete variables are represented as squares and continuous variables as circles. The group variables, which are illustrated as different “plates,” have different values for different games, $g$, subjects, $s$, or conditions, $n$ (defined by the valence, uncertainty and horizon). For each game, the observable data (shaded nodes) consisted of a choice, $c_{nsg}$, the difference in mean between the two options, $\Delta R_{nsg}$, and the difference in uncertainty between the two options, $\Delta I_{nsg}$. The model estimates posterior distributions of both the subject-level parameters (the information bonus, $A_{ns}$, decision noise, $\sigma_{ns}$, and spatial bias, $B_{ns}$) and the group-level parameters ($\mu^A_n$, $\sigma^A_n$, $k^\sigma_n$, $\lambda^\sigma_n$, $\mu^B_n$ and $\sigma^B_n$).
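
To make the roles of the subject-level parameters concrete, here is a minimal sketch of how a single choice probability might be computed from them. The caption names the variables but not the functional form, so the logistic rule below is an assumption (a common choice for models of this kind), as are the function name `p_choose_right` and the sign convention that differences are taken as right minus left.

```python
import math


def p_choose_right(delta_r, delta_i, info_bonus, noise, bias):
    """Probability of choosing the right-hand bandit on one free choice.

    delta_r:    difference in observed mean reward (right minus left, assumed)
    delta_i:    difference in uncertainty (+1 if right is more uncertain,
                -1 if left is, 0 if equal; coding assumed)
    info_bonus: information bonus A (directed exploration)
    noise:      decision noise sigma (random exploration), must be > 0
    bias:       spatial bias B toward the right-hand side
    """
    if noise <= 0:
        raise ValueError("noise must be positive")
    drive = (delta_r + info_bonus * delta_i + bias) / noise
    return 1.0 / (1.0 + math.exp(-drive))
```

Under this assumed form, a larger information bonus shifts choices toward the more uncertain option, while larger decision noise flattens the curve so that choices depend less on the reward difference.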

Figure 3 — Performance plots. (A) Learning curves showing the fraction of responses in which subjects (solid lines) and the optimal model (green asterisks) chose the bandit with the greater underlying generative mean, as a function of free-choice trial number. (B) Performance of all 39 subjects in the gains and losses conditions. Five subjects performing at chance in either condition were excluded (red crosses), while the remaining 34 subjects performed equally well in both the gains and losses conditions.

Figure 4 — Model-free measures of directed and random exploration. (A) The fraction of trials in which subjects choose the more uncertain option increases from horizon 1 to horizon 6, indicative of directed exploration. It also increases from gains to losses, but this does not interact with the horizon condition. This is consistent with increased uncertainty seeking in losses, but not a difference in directed exploration between gains and losses. (B) The decision noise (calculated as the fraction of trials in which the low-mean option was chosen) increases from horizon 1 to horizon 6, indicative of random exploration. There is no significant difference in decision noise between gains and losses. Data points are averaged across 34 subjects, with error-bars indicating the standard error of the mean.
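
The two model-free measures in this caption are simple choice fractions, and a rough sketch of how they could be computed is given below. The trial record layout (dictionaries with `condition`, `chose_high_info` and `chose_low_mean` keys) is hypothetical, and restricting the decision-noise measure to equal-uncertainty games is an assumption of this sketch rather than something the caption states.

```python
import math
import statistics


def fraction(flags):
    """Fraction of True values in an iterable of booleans."""
    flags = list(flags)
    if not flags:
        raise ValueError("no trials to average over")
    return sum(flags) / len(flags)


def p_high_info(trials):
    """Directed exploration: how often the more uncertain option is chosen.

    Only defined when the two options differ in uncertainty ([1 3] games).
    """
    return fraction(t["chose_high_info"] for t in trials
                    if t["condition"] == "unequal")


def p_low_mean(trials):
    """Random exploration: how often the low-mean option is chosen.

    Computed here on [2 2] games only (an assumption of this sketch).
    """
    return fraction(t["chose_low_mean"] for t in trials
                    if t["condition"] == "equal")


def sem(values):
    """Standard error of the mean across subjects."""
    return statistics.stdev(values) / math.sqrt(len(values))
```

Each function would be applied per subject and per horizon and valence condition, and `sem` then summarizes the spread of the per-subject values for the error bars.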

Figure 5 — Model-based measures of directed and random exploration. Parameter fits, averaged across 34 subjects, with error-bars indicating 95% credible intervals across subjects. (A) As with the model-free results, the information bonus is greater in the losses condition than in the gains condition, and greater for horizon 6 than for horizon 1. Decision noise is greater in horizon 6 than in horizon 1 in both uncertainty conditions (B & C), and greater overall in the unequal uncertainty condition (B) than the equal uncertainty condition (C). Decision noise is not significantly different across the gains and losses conditions.

Figure 6 — Posterior distributions showing that the estimated information bonus is greater for losses than for gains. The difference between gains and losses in the posterior distributions of the information bonus reflects an overall shift in the domain of losses, which is indicative of uncertainty seeking with losses.
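
A comparison of this kind is often summarized from posterior samples. The sketch below assumes that equal-length lists of posterior draws are available for the two conditions and that pairing them draw by draw is acceptable; the function names are hypothetical, and nothing here reproduces the authors' actual samples or analysis code.

```python
def posterior_difference(samples_a, samples_b):
    """Draw-by-draw difference between two sets of posterior samples."""
    if len(samples_a) != len(samples_b):
        raise ValueError("sample lists must have the same length")
    if not samples_a:
        raise ValueError("sample lists must not be empty")
    return [a - b for a, b in zip(samples_a, samples_b)]


def prob_greater(samples_a, samples_b):
    """Posterior probability that parameter a exceeds parameter b,
    estimated as the fraction of paired draws with a positive difference."""
    diffs = posterior_difference(samples_a, samples_b)
    return sum(d > 0 for d in diffs) / len(diffs)
```

For example, passing draws of the information bonus for losses as `samples_a` and for gains as `samples_b` would estimate the probability that the bonus is larger for losses.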

Figure 7 - The interaction between reward and uncertainty according to Prospect Theory (A & B) and a Bayesian Shrinkage hypothesis (C & D). (A) Under conditions of uncertainty about rewards, the average utility, ⟨ U(R) ⟩ (black dots), will deviate below the utility of the actual average, U(⟨ R ⟩) (gray dots) in the domain of gains, and above the utility of the mean in the domain of losses. (B) The difference | ⟨ U(R) ⟩ − U(⟨ R ⟩) | is larger in magnitude closer to zero. As a result, Prospect Theory predicts that for gains, the more uncertain option is more aversive than the less uncertain option for low-mean rewards, and less aversive for high-mean rewards; for losses, the more uncertain option is more favorable for small negative losses, and less favorable for large negative losses. (C) The Bayesian Shrinkage hypothesis postulates that the posterior estimate of reward is biased by a prior that is optimistic for losses and pessimistic for gains. (D) The difference between the mean of the posterior distribution and the mean of the likelihood distribution increases further from zero, and increases when the likelihood distribution is more uncertain. As a result, the Bayesian Shrinkage hypothesis predicts that for gains, the more uncertain option becomes more aversive than the less uncertain option as the mean reward increases; for losses, the more uncertain option becomes more preferable as losses increase.
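
The two accounts contrasted in this caption can be illustrated numerically. The sketch below is not the authors' model: the power-law utility with exponent 0.5 (without a loss-aversion weight), the two-outcome gamble, the Gaussian prior centered on zero, and all numeric values are assumptions chosen only to reproduce the qualitative patterns described for panels B and D.

```python
def utility(r, alpha=0.5):
    """Power-law value function: concave for gains, convex for losses."""
    return r ** alpha if r >= 0 else -((-r) ** alpha)


def utility_gap(mean, spread, alpha=0.5):
    """<U(R)> - U(<R>) for two equally likely outcomes, mean +/- spread.

    Assumes both outcomes have the same sign as the mean.
    """
    avg_u = 0.5 * (utility(mean - spread, alpha) + utility(mean + spread, alpha))
    return avg_u - utility(mean, alpha)


def posterior_mean(sample_mean, n_obs, prior_mean=0.0,
                   prior_var=400.0, obs_var=64.0):
    """Posterior mean of a Gaussian reward with a Gaussian prior.

    Fewer observations (a more uncertain option) mean the estimate is
    pulled more strongly toward the prior mean.
    """
    prior_precision = 1.0 / prior_var
    data_precision = n_obs / obs_var
    total = prior_precision + data_precision
    return (prior_precision * prior_mean + data_precision * sample_mean) / total


if __name__ == "__main__":
    # Prospect Theory: the gap shrinks in magnitude away from zero.
    for m in (10, 50, -10, -50):
        print("PT gap at mean", m, "=", round(utility_gap(m, 5), 4))
    # Shrinkage: the gap between 1 and 3 observations grows away from zero.
    for m in (40, 80, -40, -80):
        print("posterior means at", m, ":",
              round(posterior_mean(m, 1), 2), round(posterior_mean(m, 3), 2))
```

Under these assumptions the Prospect Theory gap is negative for gains and positive for losses and is largest near zero, whereas the shrinkage toward a zero-centered prior grows with reward magnitude and is stronger for the option seen once than for the option seen three times.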

Figure 8 — Model-free analysis of reward magnitude effect is consistent with Bayesian Shrinkage hypothesis. In both horizon conditions, for both losses (A) and gains (B) there is a negative association between mean reward and choice of the more uncertain option, p(high info). Error-bars indicate the standard error of the mean across subjects.

Figure 9 — Distribution of the group-level mean of the mean reward scale factor, $\mu_\gamma$. In all conditions, $\mu_\gamma$ is less than zero with high probability, providing strong support for the Bayesian Shrinkage hypothesis.

Supplementary material

Krueger et al. supplementary material 1 (File, 2.8 MB)
Krueger et al. supplementary material 2 (File, 199.8 KB)