Strategies using recent feedback lead to matching or maximising behaviours

Published online by Cambridge University Press:  01 January 2023

Zhenbo Cheng
Affiliation:
Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
Jingying Gao
Affiliation:
Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
Leilei Zhang
Affiliation:
Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China

Abstract

One challenge facing humans (and nonhuman animals) is that some options that appear attractive locally may not turn out to be best in the long run. To analyse this human learning problem, we explore human performance in a dynamic decision-making task that places local and global rewards in conflict. We found that experiences that included previous choices and rewards are not easily incorporated into people’s strategies to enhance their performance. Our results suggest that humans are easily driven by concerns about recent feedback, and that the choice of a suboptimal behavioural option may be overcome by providing informative cues that indicate a clear immediate outcome for a better option.

Information

Type
Research Article
Creative Commons
The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright
Copyright © The Authors [2018]. This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1: The task and reward structure. A-C) The three conditions in the task. A) Two sample trials in the first condition. The participant selects button A and obtains a score of 55 in the first trial, then chooses button B in the second trial and obtains a score of 38. After each choice, a scale bar is updated to reflect the reward earned for that choice; the bar height depends on the score obtained. B) In the second condition, the previous choices and scores are shown at the top of the screen. C) In the third condition, the previous choices and scores are shown on separate sides of the screen according to the choice made. D) Reward functions (blue and green curves) for the two choices as a function of choice allocation to A. The dashed red curve shows the utility rate for different proportions of responses to A.

Figure 2: Average proportion of choices of A and average reward rate for each participant (black points), overlaid on the reward structure for button A (blue) and button B (green).

Figure 3: Distributions of allocation to A (top panels) and total rewards (bottom panels) in the three conditions.

Figure 4: Proportion of participants using the melioration and optimal strategies in the three conditions.

Supplementary material

Cheng et al. supplementary material 1 (File, 27.6 KB)
Cheng et al. supplementary material 2 (File, 27.6 KB)
Cheng et al. supplementary material 3 (File, 29.9 KB)