INDEXABILITY OF BANDIT PROBLEMS WITH RESPONSE DELAYS

Published online by Cambridge University Press:  23 April 2010

Felipe Caro and Onesun Steve Yoo
Affiliation: UCLA Anderson School of Management, Los Angeles, CA 90095. E-mail: fcaro@anderson.ucla.edu; onesun.yoo.2010@anderson.ucla.edu

Abstract

This article considers an important class of discrete-time restless bandits: discounted multiarmed bandit problems with response delays. The delays in each period are independent random variables, and delayed responses do not cross over. For a bandit arm in this class, we use a coupling argument to show that in each state there is a unique subsidy that equates the pulling and nonpulling actions; that is, the bandit satisfies the indexability criterion introduced by Whittle (1988). The result allows for a finite or infinite horizon and holds for arbitrary delay lengths and infinite state spaces. We compute the resulting marginal productivity indexes (MPI) for the Beta-Bernoulli Bayesian learning model, formulate and compute a tractable upper bound, and compare the suboptimality gap of the MPI policy to those of other heuristics derived from different closed-form indexes. The MPI policy performs near optimally and provides a theoretical justification for the use of the other heuristics.
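To make the Beta-Bernoulli setting with delayed responses concrete, the following is a minimal Python sketch of one arm's belief state. It is an illustration only, not the authors' formulation: the class name `DelayedBetaBernoulliArm`, its methods, the uniform Beta(1, 1) prior, and the rule that clamps arrival times to keep responses in order are all assumptions introduced here. It does not compute the MPI.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DelayedBetaBernoulliArm:
    """Hypothetical arm: Beta(a, b) belief over a Bernoulli success rate,
    updated only when a delayed response actually arrives."""
    a: float = 1.0  # pseudo-count of successes (assumed uniform prior)
    b: float = 1.0  # pseudo-count of failures (assumed uniform prior)
    # Outstanding pulls as (arrival_period, outcome), kept in FIFO order.
    pending: deque = field(default_factory=deque)

    def posterior_mean(self) -> float:
        """Mean of the current Beta(a, b) belief."""
        return self.a / (self.a + self.b)

    def pull(self, now: int, delay: int, outcome: int) -> None:
        """Register a pull at period `now` whose 0/1 outcome is seen after `delay`."""
        arrival = now + delay
        # Assumed no-crossover rule: a later pull never reports before an
        # earlier one, so its arrival is pushed back if necessary.
        if self.pending:
            arrival = max(arrival, self.pending[-1][0])
        self.pending.append((arrival, outcome))

    def observe(self, now: int) -> int:
        """Fold in every response that has arrived by `now`; return how many."""
        count = 0
        while self.pending and self.pending[0][0] <= now:
            _, outcome = self.pending.popleft()
            self.a += outcome        # success raises a
            self.b += 1 - outcome    # failure raises b
            count += 1
        return count
```

For example, after `arm.pull(now=0, delay=2, outcome=1)` and `arm.pull(now=1, delay=0, outcome=0)`, calling `arm.observe(now=1)` changes nothing because the second response is held behind the first, while `arm.observe(now=2)` incorporates both.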

Type
Research Article
Copyright
Copyright © Cambridge University Press 2010

References

1. Altman, E. & Stidham, S. (1995). Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information source. Queueing Systems 21 (3-4): 267–291.
2. Aviv, Y. & Pazgal, A. (2002). Pricing of short life-cycle products through active learning. Working paper, Washington University, St. Louis, MO.
3. Bernardo, A. & Chowdhry, B. (2002). Resources, real options, and corporate strategy. Journal of Financial Economics 63: 211–234.
4. Bertsimas, D. & Mersereau, A.J. (2007). A learning approach for interactive marketing to a customer segment. Operations Research 55 (6): 1120–1135.
5. Brezzi, M. & Lai, T.L. (2002). Optimal learning and experimentation in bandit problems. Journal of Economic Dynamics and Control 27: 87–108.
6. Burnetas, A.N. & Katehakis, M.N. (2003). Asymptotic Bayes analysis for the finite-horizon one-armed-bandit problem. Probability in the Engineering and Informational Sciences 17: 53–83.
7. Caro, F. & Gallien, J. (2007). Dynamic assortment with demand learning for seasonal consumer goods. Management Science 53 (2): 276–292.
8. DeGroot, M.H. (1970). Optimal statistical decisions. New York: McGraw-Hill.
9. Ehsan, N. & Liu, M. (2004). On the optimality of an index policy for bandwidth allocation with delayed state observation and differentiated services. In Proceedings INFOCOM 2004, Vol. 3, pp. 1974–1983.
10. Eick, S.G. (1988). Gittins procedures for bandits with delayed responses. Journal of the Royal Statistical Society B 50 (1): 125–132.
11. Farias, V. & Madan, R. (2008). Irrevocable multi-armed bandit policies. Working paper, MIT Sloan School of Management.
12. Ginebra, J. & Clayton, M.K. (1995). Response surface bandits. Journal of the Royal Statistical Society B 57: 771–784.
13. Gittins, J.C. (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society B 14: 148–167.
14. Gittins, J.C. (1989). Multi-armed bandit allocation indices. Chichester, UK: John Wiley.
15. Hardwick, J., Oehmke, R. & Stout, Q.F. (2006). New adaptive designs for delayed response models. Journal of Statistical Planning and Inference 136: 1940–1955.
16. Kaplan, R.S. (1970). A dynamic inventory model with stochastic lead times. Management Science 16 (7): 491–507.
17. Muharremoglu, A. & Yang, N. (2008). Inventory management with an exogenous supply process. Working paper, Columbia University, New York.
18. Niño-Mora, J. (2006). Dynamic priority allocation via restless bandit marginal productivity indices. Top 15: 161–198.
19. Niño-Mora, J. (2007). Marginal productivity index policies for scheduling multiclass delay-/loss-sensitive traffic with delayed state observation. In NGI 2007, Proceedings of the 3rd EuroNGI conference on next generation Internet networks: design and engineering for heterogeneity. Piscataway, NJ: IEEE, pp. 209–217.
20. Robinson, L.W., Bradley, J.R. & Thomas, L.J. (2001). Consequences of order crossover under order-up-to inventory policies. Manufacturing & Service Operations Management 3 (3): 175–188.
21. Wang, X. & Bickis, M. (2003). One-armed bandit models with continuous and delayed responses. Mathematical Methods of Operations Research 58: 209–219.
22. Weber, R.R. & Weiss, G. (1990). On an index policy for restless bandits. Journal of Applied Probability 27: 637–648.
23. Weiss, G. (1992). Turnpike optimality of Smith's rule in parallel machines stochastic scheduling. Mathematics of Operations Research 17 (2): 255–270.
24. Whittle, P. (1988). Restless bandits: Activity allocation in a changing world. Journal of Applied Probability 25A: 287–298.