Monotone Policies and Indexability for Bidirectional Restless Bandits

K. D. Glazebrook; D. J. Hodge; C. Kirkbride

doi:10.1239/aap/1363354103


Part of: Hamilton-Jacobi theories, including dynamic programming; Numerical methods in calculus of variations and optimal control

Published online by Cambridge University Press: 04 January 2016


K. D. Glazebrook*: Lancaster University
D. J. Hodge**: The University of Nottingham
C. Kirkbride*: Lancaster University
* Postal address: Department of Management Science, Lancaster University, Lancaster, LA1 4YX, UK.
** Postal address: School of Mathematical Sciences, The University of Nottingham, Nottingham, NG7 2RD, UK.


Abstract


Motivated by a wide range of applications, we consider a development of Whittle's restless bandit model in which project activation requires a state-dependent amount of a key resource, which is assumed to be available at a constant rate. As many projects may be activated at each decision epoch as resource availability allows. We seek a policy for project activation within resource constraints which minimises an aggregate cost rate for the system. Project indices derived from a Lagrangian relaxation of the original problem exist provided the structural requirement of indexability is met. Verification of this property and derivation of the related indices is greatly simplified when the solution of the Lagrangian relaxation has a state monotone structure for each constituent project. We demonstrate that this is indeed the case for a wide range of bidirectional projects in which the project state tends to move in a different direction when it is activated from that in which it moves when passive. This is natural in many application domains in which activation of a project ameliorates its condition, which otherwise tends to deteriorate or deplete. In some cases the state monotonicity required is related to the structure of state transitions, while in others it is also related to the nature of costs. Two numerical studies demonstrate the value of the ideas for the construction of policies for dynamic resource allocation, most especially in contexts which involve a large number of projects.
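The role of the Lagrangian relaxation and of state monotonicity can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a single toy project on states 0..K that drifts upwards (deteriorates) when passive and downwards when active, a quadratic cost rate, a unit resource requirement in every state (in the model above the requirement may depend on the state), and a discounted-cost criterion in place of the cost rate criterion. The multiplier W plays the part of a charge per unit of resource consumed. For each W the relaxed single-project problem is solved by value iteration, and one can then inspect whether the resulting activation set has threshold form and how it changes as W grows.

```python
import numpy as np

def active_set(W, K=10, p_up=0.4, p_down=0.6, beta=0.95, tol=1e-10):
    """Solve a toy single-project relaxed problem with resource charge W.

    Hypothetical model (not from the paper): birth-death project on 0..K,
    discounted cost, quadratic cost rate, unit resource use when active.
    Returns a boolean array: True where activation is strictly better.
    """
    states = np.arange(K + 1)
    cost = states.astype(float) ** 2          # assumed cost rate c(x) = x^2
    resource = np.ones(K + 1)                 # assumed resource use when active
    up = np.minimum(states + 1, K)            # passive: state tends to move up
    down = np.maximum(states - 1, 0)          # active: state tends to move down
    V = np.zeros(K + 1)
    while True:
        q_passive = cost + beta * (p_up * V[up] + (1 - p_up) * V)
        q_active = (cost + W * resource
                    + beta * (p_down * V[down] + (1 - p_down) * V))
        V_new = np.minimum(q_passive, q_active)
        if np.max(np.abs(V_new - V)) < tol:
            return q_active < q_passive       # ties are resolved as passive
        V = V_new

def is_threshold(active):
    """True if the active set is an upper set: once active, active above."""
    return bool(np.all(np.diff(active.astype(int)) >= 0))

# Sweep the resource charge and print the activation set for each value.
for W in [0.0, 5.0, 20.0, 80.0, 1e4]:
    a = active_set(W)
    print(W, a.astype(int), is_threshold(a))
```

In this toy setting, indexability would correspond to the printed activation sets shrinking in a nested way as W increases, and the index of a state would be the charge at which that state leaves the activation set. The sweep only reports what happens for the chosen parameters; it is a numerical check, not a proof of either property.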

Keywords

asset management; Gittins index; indexability; inventory management; Lagrangian relaxation; machine maintenance; monotone policy; stochastic dynamic programming; restless bandit; Whittle index

MSC classification

Primary: 90C40: Markov and semi-Markov decision processes

Secondary: 49L20: Dynamic programming method; 90C39: Dynamic programming; 49M20: Methods of relaxation type

Information

Type: General Applied Probability
Information: Advances in Applied Probability, Volume 45, Issue 1, March 2013, pp. 51–85

DOI: https://doi.org/10.1239/aap/1363354103
Copyright: © Applied Probability Trust

References

Ansell, P., Glazebrook, K. D., Niño-Mora, J. and O'Keeffe, M. (2003). Whittle's index policy for a multi-class queueing system with convex holding costs. Math. Meth. Operat. Res. 57, 21–39.

Archibald, T. W., Black, D. P. and Glazebrook, K. D. (2009). Indexability and index heuristics for a simple class of inventory routing problems. Operat. Res. 57, 314–326.

Dacre, M., Glazebrook, K. and Niño-Mora, J. (1999). The achievable region approach to the optimal control of stochastic systems (with discussion). J. R. Statist. Soc. B 61, 747–791.

Gittins, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B 41, 148–177.

Gittins, J. C. (1989). Multi-Armed Bandit Allocation Indices. John Wiley, Chichester.

Glazebrook, K. D. and Minty, R. (2009). A generalized Gittins index for a class of multiarmed bandits with general resource requirements. Math. Operat. Res. 34, 26–44.

Glazebrook, K. D., Kirkbride, C. and Ruiz-Hernandez, D. (2006). Spinning plates and squad systems: policies for bidirectional restless bandits. Adv. Appl. Prob. 38, 95–115.

Glazebrook, K. D., Niño-Mora, J. and Ansell, P. S. (2002). Index policies for a class of discounted restless bandits. Adv. Appl. Prob. 34, 754–774.

Jacko, P. (2009). Marginal productivity index policies for dynamic priority allocation in restless bandit models. Universidad Carlos III de Madrid.

Le Ny, J., Dahleh, M. and Feron, E. (2008). Multi-UAV dynamic routing with partial observations using restless bandit allocation indices. In 2008 American Control Conference, pp. 4220–4225.

Liu, K. and Zhao, Q. (2010). Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inf. Theory 56, 5547–5567.

Niño-Mora, J. (2001). Restless bandits, partial conservation laws and indexability. Adv. Appl. Prob. 33, 76–98.

Niño-Mora, J. (2007). Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15, 161–198.

Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York.

Weber, R. R. and Weiss, G. (1990). On an index policy for restless bandits. J. Appl. Prob. 27, 637–648.

Weber, R. R. and Weiss, G. (1991). Addendum to ‘On an index policy for restless bandits’. Adv. Appl. Prob. 23, 429–430.

Whittle, P. (1988). Restless bandits: activity allocation in a changing world. In A Celebration of Applied Probability (J. Appl. Prob. Spec. Vol. 25A), ed. Gani, J., Applied Probability Trust, Sheffield, pp. 287–298.

Whittle, P. (1996). Optimal Control. John Wiley, Chichester.
