We consider the Bernoulli bandit problem where one of the arms has win probability α and the others β, with the identity of the α arm specified by initial probabilities. With u = max(α, β), v = min(α, β), call an arm with win probability u a good arm. Whereas it is known that the strategy of always playing the arm with the largest probability of being a good arm maximizes the expected number of wins in the first n games for all n, we conjecture that it also stochastically maximizes the number of wins. That is, we conjecture that this strategy maximizes the probability of at least k wins in the first n games for all k, n. The conjecture is proven when k = 1, and k = n, and when there are only two arms and k = n - 1.
Email your librarian or administrator to recommend adding this journal to your organisation's collection.