
Strategies in the multi-armed bandit

Published online by Cambridge University Press:  24 November 2025

Stanton Hudja
Affiliation:
Stuart School of Business, Illinois Institute of Technology, Chicago, Illinois, USA
Daniel Woods*
Affiliation:
Department of Economics, MQBS Experimental Economics Laboratory, Macquarie Business School, Sydney, New South Wales, Australia
Corresponding author: Daniel Woods; Email: daniel.woods@mq.edu.au

Abstract

This paper analyzes individual behavior in multi-armed bandit problems. We use a between-subjects experiment to implement four bandit problems that vary based on the horizon (indefinite or finite) and the number of bandit arms (two or three). We analyze commonly suggested strategies and find that an overwhelming majority of subjects are best fit by either a probabilistic “win-stay lose-shift” strategy or reinforcement learning. However, we show that subjects violate the assumptions of the probabilistic win-stay lose-shift strategy as switching depends on more than the previous outcome. We design two new “biased” strategies that adapt either reinforcement learning or myopic quantal response by incorporating a bias toward choosing the previous arm. We find that a majority of subjects are best fit by one of these two strategies but also find heterogeneity in subjects’ best-fitting strategies. We show that the performance of our biased strategies is robust to adapting popular strategies from other literatures (e.g., EWA and I-SAW) and using different selection criteria. Additionally, we find that our biased strategies best fit a majority of subjects when analyzing a new treatment with a new set of subjects.
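To make the strategies named in the abstract concrete, the following is a minimal illustrative sketch (not the paper's estimated specification) of a probabilistic win-stay lose-shift rule and of a softmax choice rule with an additive bias toward the previously chosen arm. All parameter names (`p_stay_win`, `p_stay_lose`, `bias`, `lam`) and functional forms are assumptions for exposition only; the paper's biased reinforcement-learning and biased myopic quantal response strategies may use different parameterizations.

```python
import math
import random

def wsls_choice(n_arms, prev_arm, prev_success,
                p_stay_win=0.9, p_stay_lose=0.3):
    """Probabilistic win-stay lose-shift: repeat the previous arm with a
    probability that depends only on the previous outcome (illustrative
    parameter values, not estimates from the paper)."""
    if prev_arm is None:
        # First period: no history, so choose uniformly at random.
        return random.randrange(n_arms)
    p_stay = p_stay_win if prev_success else p_stay_lose
    if random.random() < p_stay:
        return prev_arm  # stay
    # Shift: choose uniformly among the remaining arms.
    others = [a for a in range(n_arms) if a != prev_arm]
    return random.choice(others)

def biased_softmax_choice(attractions, prev_arm, bias=1.0, lam=1.0):
    """Softmax (logit) choice over arm attractions with a hypothetical
    additive bonus on the previously chosen arm, capturing a bias
    toward repeating the last choice."""
    scores = [lam * a + (bias if i == prev_arm else 0.0)
              for i, a in enumerate(attractions)]
    m = max(scores)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    r = random.random() * sum(weights)
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if r <= cum:
            return i
    return len(weights) - 1
```

For example, setting `bias` very large makes `biased_softmax_choice` almost always repeat the previous arm regardless of attractions, which is the kind of history dependence, beyond the last outcome alone, that the paper argues plain win-stay lose-shift cannot capture.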

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of the Economic Science Association.
Figures and tables

Fig. 1 An example of the experimental interface for the two-armed indefinite horizon treatment

Table 1. Predictions and results for each treatment

Table 2. Switching based on last outcome

Table 3. List of commonly suggested deterministic strategies

Table 4. Comparison of deterministic strategies

Table 5. List of commonly suggested probabilistic strategies

Table 6. Comparison of previously suggested strategies

Table 7. Comparison of previously suggested and new strategies

Table 8. Comparison of strategies in Robustness treatment