Search

Strategies in the multi-armed bandit
Stanton Hudja, Daniel Woods
Journal:

Experimental Economics ,

Published online by Cambridge University Press:

24 November 2025, pp. 1-23
- Article
- - You have access
  - Open access
- PDF
- HTML
- Export citation
This paper analyzes individual behavior in multi-armed bandit problems. We use a between-subjects experiment to implement four bandit problems that vary based on the horizon (indefinite or finite) and the number of bandit arms (two or three). We analyze commonly suggested strategies and find that an overwhelming majority of subjects are best fit by either a probabilistic “win-stay lose-shift” strategy or reinforcement learning. However, we show that subjects violate the assumptions of the probabilistic win-stay lose-shift strategy as switching depends on more than the previous outcome. We design two new “biased” strategies that adapt either reinforcement learning or myopic quantal response by incorporating a bias toward choosing the previous arm. We find that a majority of subjects are best fit by one of these two strategies but also find heterogeneity in subjects’ best-fitting strategies. We show that the performance of our biased strategies is robust to adapting popular strategies from other literatures (e.g., EWA and I-SAW) and using different selection criteria. Additionally, we find that our biased strategies best fit a majority of subjects when analyzing a new treatment with a new set of subjects.

Crowdsourced Adaptive Surveys
Yamil Ricardo Velez
Journal:

Political Analysis / Volume 33 / Issue 4 / October 2025

Published online by Cambridge University Press:

12 February 2025, pp. 284-297
- Article
- - You have access
  - Open access
- PDF
- HTML
- Export citation
Public opinion surveys are vital for informing democratic decision-making, but responding to rapidly changing information environments and measuring beliefs within hard-to-reach communities can be challenging for traditional survey methods. This paper introduces a crowdsourced adaptive survey methodology (CSAS) that unites advances in natural language processing and adaptive algorithms to produce surveys that evolve with participant input. The CSAS method converts open-ended text provided by participants into survey items and applies a multi-armed bandit algorithm to determine which questions should be prioritized in the survey. The method’s adaptive nature allows new survey questions to be explored and imposes minimal costs in survey length. Applications in the domains of misinformation, issue salience, and local politics showcase CSAS’s ability to identify topics that might otherwise escape the notice of survey researchers. I conclude by highlighting CSAS’s potential to bridge conceptual gaps between researchers and participants in survey research.

Conditions for indexability of restless bandits and an $O(K^3)$ algorithm to compute whittle index – CORRIGENDUM
Part of
- Numerical methods in calculus of variations and optimal control
- Mathematical economics
Nima Akbarzadeh, Aditya Mahajan
Journal:

Advances in Applied Probability / Volume 55 / Issue 4 / December 2023

Published online by Cambridge University Press:

09 June 2023, pp. 1473-1474

Print publication:

December 2023
- Article
- - You have access
- PDF
- HTML
- Export citation
This note corrects an error in the formula to obtain the Whittle index using the Sherman–Morrison formula in Akbarzadeh and Mahajan (2022). Also, some other minor typos are highlighted.

Conditions for indexability of restless bandits and an $\mathcal{O}\!\left(K^3\right)$ algorithm to compute Whittle index
Part of
- Numerical methods in calculus of variations and optimal control
- Mathematical economics
Nima Akbarzadeh, Aditya Mahajan
Journal:

Advances in Applied Probability / Volume 54 / Issue 4 / December 2022

Published online by Cambridge University Press:

14 June 2022, pp. 1164-1192

Print publication:

December 2022
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes where the evolution of the processes depends on the resources allocated to them. Such models capture the fundamental trade-offs between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems which has emerged as a popular solution approach because of its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler-to-verify refinements of these conditions. We then revisit a previously proposed algorithm called the adaptive greedy algorithm which is known to compute the Whittle index for a sub-class of restless bandits. We show that a generalization of the adaptive greedy algorithm computes the Whittle index for all indexable restless bandits. We present an efficient implementation of this algorithm which can compute the Whittle index of a restless bandit with K states in $\mathcal{O}\!\left(K^3\right)$ computations. Finally, we present a detailed numerical study which affirms the strong performance of the Whittle index heuristic.

Optimal stochastic scheduling of forest networks with switching penalties
Part of
- Special processes
- Operations research and management science
Mark P. Van Oyen, Demosthenis Teneketzis
Journal:

Advances in Applied Probability / Volume 26 / Issue 2 / June 1994

Published online by Cambridge University Press:

01 July 2016, pp. 474-497

Print publication:

June 1994
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
We present structural properties of optimal policies for the problem of scheduling a single server in a forest network of N queues (without arrivals) subject to switching penalties. In addition to linear holding costs, we impose either lump sum switching costs or batch set-up delays which are incurred at each instant the server processes a job in a queue different from the previous one. We use reward rate notions to unearth conditions on the holding costs and service distributions for which an exhaustive policy is optimal. For the case of two nodes connected probabilistically in tandem, we explicitly define an optimal policy under similar conditions.

Optimality of index policies for stochastic scheduling with switching penalties
Part of
- Operations research and management science
- Stochastic systems and control
Mark P. Van Oyen, Dimitrios G. Pandelis, Demosthenis Teneketzis
Journal:

Journal of Applied Probability / Volume 29 / Issue 4 / December 1992

Published online by Cambridge University Press:

14 July 2016, pp. 957-966

Print publication:

December 1992
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
We investigate the impact of switching penalties on the nature of optimal scheduling policies for systems of parallel queues without arrivals. We study two types of switching penalties incurred when switching between queues: lump sum costs and time delays. Under the assumption that the service periods of jobs in a given queue possess the same distribution, we derive an index rule that defines an optimal policy. For switching penalties that depend on the particular nodes involved in a switch, we show that although an index rule is not optimal in general, there is an exhaustive service policy that is optimal.

Restless bandits: activity allocation in a changing world
P. Whittle
Journal:

Journal of Applied Probability / Volume 25 / Issue A / 1988

Published online by Cambridge University Press:

14 July 2016, pp. 287-298

Print publication:

1988
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
We consider a population of n projects which in general continue to evolve whether in operation or not (although by different rules). It is desired to choose the projects in operation at each instant of time so as to maximise the expected rate of reward, under a constraint upon the expected number of projects in operation. The Lagrange multiplier associated with this constraint defines an index which reduces to the Gittins index when projects not being operated are static. If one is constrained to operate m projects exactly then arguments are advanced to support the conjecture that, for m and n large in constant ratio, the policy of operating the m projects of largest current index is nearly optimal. The index is evaluated for some particular projects.

Search Results

Refine search

Refine search

Actions for selected content:

7 results

Strategies in the multi-armed bandit

Crowdsourced Adaptive Surveys

Conditions for indexability of restless bandits and an $O(K^3)$ algorithm to compute whittle index – CORRIGENDUM

Conditions for indexability of restless bandits and an $\mathcal{O}\!\left(K^3\right)$ algorithm to compute Whittle index

Optimal stochastic scheduling of forest networks with switching penalties

Optimality of index policies for stochastic scheduling with switching penalties

Restless bandits: activity allocation in a changing world

Search Results

Refine search

Refine search

Actions for selected content:

Save Search

7 results

Strategies in the multi-armed bandit

Crowdsourced Adaptive Surveys

Conditions for indexability of restless bandits and an $O(K^3)$ algorithm to compute whittle index – CORRIGENDUM

Conditions for indexability of restless bandits and an $\mathcal{O}\!\left(K^3\right)$ algorithm to compute Whittle index

Optimal stochastic scheduling of forest networks with switching penalties

Optimality of index policies for stochastic scheduling with switching penalties

Restless bandits: activity allocation in a changing world