1. Altman E. & Stidham S. (1995). Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems 21 (3–4): 267–291.
2. Aviv Y. & Pazgal A. (2002). Pricing of short life-cycle products through active learning. Working paper, Washington University, St. Louis, MO.
3. Bernardo A. & Chowdhry B. (2002). Resources, real options, and corporate strategy. Journal of Financial Economics 63: 211–234.
4. Bertsimas D. & Mersereau A.J. (2007). A learning approach for interactive marketing to a customer segment. Operations Research 55 (6): 1120–1135.
5. Brezzi M. & Lai T.L. (2002). Optimal learning and experimentation in bandit problems. Journal of Economic Dynamics and Control 27: 87–108.
6. Burnetas A.N. & Katehakis M.N. (2003). Asymptotic Bayes analysis for the finite-horizon one-armed-bandit problem. Probability in the Engineering and Informational Sciences 17: 53–83.
7. Caro F. & Gallien J. (2007). Dynamic assortment with demand learning for seasonal consumer goods. Management Science 53 (2): 276–292.
8. DeGroot M.H. (1970). Optimal statistical decisions. New York: McGraw-Hill.
9. Ehsan N. & Liu M. (2004). On the optimality of an index policy for bandwidth allocation with delayed state observation and differentiated services. In Proceedings of IEEE INFOCOM 2004, Vol. 3, pp. 1974–1983.
10. Eick S.G. (1988). Gittins procedures for bandits with delayed responses. Journal of the Royal Statistical Society B 50 (1): 125–132.
11. Farias V. & Madan R. (2008). Irrevocable multi-armed bandit policies. Working paper, MIT Sloan School of Management.
12. Ginebra J. & Clayton M.K. (1995). Response surface bandits. Journal of the Royal Statistical Society B 57: 771–784.
13. Gittins J.C. (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society B 41 (2): 148–177.
14. Gittins J.C. (1989). Multi-armed bandit allocation indices. Chichester, UK: John Wiley.
15. Hardwick J., Oehmke R. & Stout Q.F. (2006). New adaptive designs for delayed response models. Journal of Statistical Planning and Inference 136: 1940–1955.
16. Kaplan R.S. (1970). A dynamic inventory model with stochastic lead times. Management Science 16 (7): 491–507.
17. Muharremoglu A. & Yang N. (2008). Inventory management with an exogenous supply process. Working paper, Columbia University, New York.
18. Niño-Mora J. (2006). Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15: 161–198.
19. Niño-Mora J. (2007). Marginal productivity index policies for scheduling multiclass delay-/loss-sensitive traffic with delayed state observation. In NGI 2007, Proceedings of the 3rd EuroNGI Conference on Next Generation Internet Networks: Design and Engineering for Heterogeneity. Piscataway, NJ: IEEE, pp. 209–217.
20. Robinson L.W., Bradley J.R. & Thomas L.J. (2001). Consequences of order crossover under order-up-to inventory policies. Manufacturing & Service Operations Management 3 (3): 175–188.
21. Wang X. & Bickis M. (2003). One-armed bandit models with continuous and delayed responses. Mathematical Methods of Operations Research 58: 209–219.
22. Weber R.R. & Weiss G. (1990). On an index policy for restless bandits. Journal of Applied Probability 27: 637–648.
23. Weiss G. (1992). Turnpike optimality of Smith's rule in parallel machines stochastic scheduling. Mathematics of Operations Research 17 (2): 255–270.
24. Whittle P. (1988). Restless bandits: Activity allocation in a changing world. Journal of Applied Probability 25A: 287–298.