
A class of bandit problems yielding myopic optimal strategies

Published online by Cambridge University Press:  14 July 2016

Jeffrey S. Banks
University of Rochester
Rangarajan K. Sundaram*
University of Rochester
Postal address for both authors: Department of Economics, Harkness Hall, University of Rochester, Rochester, NY 14627, USA.


We consider the class of bandit problems in which each of the n ≥ 2 independent arms generates rewards according to one of the same two reward distributions, and discounting is geometric over an infinite horizon. We show that the dynamic allocation index of Gittins and Jones (1974) in this context is strictly increasing in the probability that an arm is the better of the two distributions. It follows as an immediate consequence that myopic strategies are the uniquely optimal strategies in this class of bandit problems, regardless of the value of the discount parameter or the shape of the reward distributions. Some implications of this result for bandits with Bernoulli reward distributions are given.
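In the Bernoulli special case, the setting above can be illustrated concretely. Suppose each arm pays 1 or 0, and every arm's success rate is either a common "high" value or a common "low" value. The myopic strategy pulls the arm with the highest posterior probability of being the better distribution (equivalently, the highest expected immediate reward), updating that probability by Bayes' rule after each pull. The sketch below is purely illustrative: the rates `P_HI`, `P_LO` and all function names are assumptions, not notation from the paper.

```python
import random

# Assumed values for the two candidate Bernoulli success rates.
P_HI, P_LO = 0.7, 0.3

def update(prob_hi, reward):
    """Bayes update of P(arm draws from the better distribution)
    after observing one Bernoulli reward (1 or 0)."""
    like_hi = P_HI if reward else 1 - P_HI
    like_lo = P_LO if reward else 1 - P_LO
    num = like_hi * prob_hi
    return num / (num + like_lo * (1 - prob_hi))

def myopic_play(true_types, priors, horizon, rng):
    """Play the myopic (greedy) strategy for `horizon` pulls:
    always pull the arm whose posterior probability of being the
    better distribution is currently largest.

    true_types[i] is True if arm i really is the high-rate arm;
    priors[i] is the prior probability that it is."""
    beliefs = list(priors)
    total = 0
    for _ in range(horizon):
        i = max(range(len(beliefs)), key=lambda k: beliefs[k])
        p = P_HI if true_types[i] else P_LO
        reward = 1 if rng.random() < p else 0
        total += reward
        beliefs[i] = update(beliefs[i], reward)
    return total, beliefs
```

Because expected immediate reward is strictly increasing in the belief that an arm is the better type, ranking arms by belief and ranking them by one-step expected reward coincide; the paper's result is that, with geometric discounting, this greedy ranking also agrees with the Gittins-index ranking.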

Research Papers
Copyright © Applied Probability Trust 1992 



Financial support from the National Science Foundation and the Sloan Foundation to the first author is gratefully acknowledged.


Berry, D. and Fristedt, B. (1985) Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London.
Feldman, D. (1962) Contributions to the two-armed bandit problem. Ann. Math. Statist. 33, 847–856.
Feller, W. (1968) An Introduction to Probability Theory and Its Applications, Vol. I. Wiley, New York.
Fristedt, B. and Berry, D. (1988) Optimality of myopic stopping times for geometric discounting. J. Appl. Prob. 25, 437–443.
Gittins, J. and Jones, D. (1974) A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, ed. Gani, J. et al., pp. 241–266. North-Holland, Amsterdam.
McLennan, A. (1988) Incomplete learning in a repeated statistical decision problem. Mimeo, University of Minnesota.
O'Flaherty, B. (1989) Some results on two-armed bandits when both projects vary. J. Appl. Prob. 26, 655–658.
Rodman, L. (1978) On the many-armed bandit problem. Ann. Prob. 6, 491–498.
Ross, S. (1983) Introduction to Dynamic Programming. Academic Press, New York.