
ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT

Published online by Cambridge University Press:  05 October 2016

Apostolos Burnetas
Affiliation:
Department of Mathematics, University of Athens, Athens, Greece E-mail: aburnetas@math.uoa.gr
Odysseas Kanavetas
Affiliation:
Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey E-mail: okanavetas@sabanciuniv.edu
Michael N. Katehakis
Affiliation:
Department of Management Science and Information Systems, Rutgers University, NJ, USA E-mail: mnk@rutgers.edu

Abstract

We consider the multi-armed bandit problem under a cost constraint. Successive samples from each population are i.i.d. with unknown distribution, and each sample incurs a known population-dependent cost. The objective is to design an adaptive sampling policy that maximizes the expected sum of the outcomes of n samples, subject to the requirement that the average sampling cost not exceed a given bound sample-path wise. We establish an asymptotic lower bound on the regret of feasible uniformly fast convergent policies and construct a class of policies that achieve this bound. We also provide their explicit form in the case of Normal distributions with unknown means and known variances.
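The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' policy; it is a hypothetical cost-constrained index rule (a standard UCB index for Normal arms with known variance, restricted at each step to arms whose known cost keeps the running average cost within the bound). All parameter names and the fallback rule are illustrative assumptions.

```python
import math
import random

def constrained_index_policy(means, sigmas, costs, c0, n, seed=0):
    """Illustrative cost-constrained sampling policy (NOT the paper's policy).

    means, sigmas: true Normal parameters per arm (sigmas treated as known).
    costs: known sampling cost per arm; c0: bound on the average cost.
    At each step, play the highest-index arm among those whose cost keeps
    the running average cost <= c0; if none qualifies, fall back to the
    cheapest arm (an assumption made here for simplicity).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    total_cost = 0.0
    total_reward = 0.0
    cheapest = min(range(k), key=lambda i: costs[i])

    for t in range(1, n + 1):
        if t <= k:
            # Sample each arm once for initialization.
            a = t - 1
        else:
            def index(i):
                # UCB index with known variance, used only for illustration.
                return (sums[i] / counts[i]
                        + sigmas[i] * math.sqrt(2.0 * math.log(t) / counts[i]))
            # Arms whose cost keeps the running average cost within c0.
            feasible = [i for i in range(k)
                        if (total_cost + costs[i]) / t <= c0]
            a = max(feasible, key=index) if feasible else cheapest

        x = rng.gauss(means[a], sigmas[a])
        counts[a] += 1
        sums[a] += x
        total_cost += costs[a]
        total_reward += x

    return total_reward, total_cost / n
```

For example, with two arms where the higher-mean arm is also the more expensive one, the feasibility check forces the policy to mix the two arms so that the average cost stays within the bound after the initialization phase.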

Information

Type
Research Article
Copyright
Copyright © Cambridge University Press 2016 
