INDEXABILITY OF BANDIT PROBLEMS WITH RESPONSE DELAYS

  • Felipe Caro and Onesun Steve Yoo
Abstract

This article considers an important class of discrete-time restless bandits: discounted multiarmed bandit problems with response delays. The delays in each period are independent random variables, and the delayed responses do not cross over. For a bandit arm in this class, we use a coupling argument to show that in each state there is a unique subsidy that equates the pulling and nonpulling actions (i.e., the bandit satisfies the indexability criterion introduced by Whittle (1988)). The result allows for an infinite or finite horizon and holds for arbitrary delay lengths and infinite state spaces. We compute the resulting marginal productivity indexes (MPI) for the Beta-Bernoulli Bayesian learning model, formulate and compute a tractable upper bound, and compare the suboptimality gap of the MPI policy to those of other heuristics derived from different closed-form indexes. The MPI policy performs near optimally and provides a theoretical justification for the use of the other heuristics.
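
As a rough illustration of the setting described above, the following minimal sketch tracks the Beta-Bernoulli posterior of a single arm whose responses arrive after a delay. It does not compute the MPI and is not the authors' method. The class name `DelayedBetaBernoulliArm`, its methods, and the way non-crossover is enforced here (a response is held back until all earlier responses have arrived) are assumptions made purely for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DelayedBetaBernoulliArm:
    """Hypothetical helper: Beta(alpha, beta) belief with delayed responses."""
    alpha: float = 1.0  # prior pseudo-count of successes (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of failures
    pending: deque = field(default_factory=deque)  # (arrival_time, outcome), FIFO

    def pull(self, outcome: int, delay: int, now: int) -> None:
        """Record a pull at time `now` whose 0/1 outcome is revealed after `delay`."""
        arrival = now + delay
        # Assumed no-crossover rule: a response never arrives before an earlier one.
        if self.pending:
            arrival = max(arrival, self.pending[-1][0])
        self.pending.append((arrival, outcome))

    def observe(self, now: int) -> None:
        """Fold every response that has arrived by time `now` into the posterior."""
        while self.pending and self.pending[0][0] <= now:
            _, outcome = self.pending.popleft()
            if outcome:
                self.alpha += 1
            else:
                self.beta += 1

    def posterior_mean(self) -> float:
        """Posterior mean of the arm's success probability."""
        return self.alpha / (self.alpha + self.beta)


arm = DelayedBetaBernoulliArm()
arm.pull(outcome=1, delay=2, now=0)  # success, revealed at time 2
arm.observe(now=1)                   # nothing has arrived yet
mean_before = arm.posterior_mean()
arm.observe(now=2)                   # the success is now observed
mean_after = arm.posterior_mean()
```

The arm's state in this sketch is the pair of posterior counts together with the queue of outstanding responses, which is why the delayed problem is naturally viewed as a restless bandit: the state can change (responses arrive) even in periods when the arm is not pulled.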

Probability in the Engineering and Informational Sciences
  • ISSN: 0269-9648
  • EISSN: 1469-8951
  • URL: /core/journals/probability-in-the-engineering-and-informational-sciences