
OPTIMAL MIXING OF MARKOV DECISION RULES FOR MDP CONTROL

Published online by Cambridge University Press:  17 May 2011

Dinard van der Laan
Affiliation:
Tinbergen Institute and Department of Econometrics and Operations Research, VU University, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands E-mail: dalaan@feweb.vu.nl

Abstract

In this article we study Markov decision process (MDP) problems with the restriction that, at decision epochs, only a finite number of given Markov decision rules are admissible. For example, the set of admissible Markov decision rules could consist of some easily implementable decision rules. Additionally, many open-loop control problems can be modeled as an MDP with such a restriction on the admissible decision rules. Within the class of available policies, optimal policies are generally nonstationary, and it is difficult to prove that a given policy is optimal. We give an example with a set {d1, d2} of two admissible decision rules for which we conjecture that the nonstationary periodic Markov policy determined by its period cycle (d1, d1, d2, d1, d2, d1, d2, d1, d2) is optimal. This conjecture is supported by results that we obtain on the structure of optimal Markov policies in general. We also present some numerical results that give additional confirmation of the conjecture for the particular example we consider.
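
As a minimal illustration of what it means for a periodic policy to be determined by its period cycle, the Python sketch below repeats the nine-step cycle quoted in the abstract indefinitely. The function names (`rule_at`, `first_rules`, `usage_fraction`) and the 0-based numbering of decision epochs are assumptions made here for illustration; they are not notation from the article, and the sketch says nothing about the underlying MDP or its cost criterion.

```python
# Period cycle quoted in the abstract; the rules are represented by plain labels.
PERIOD_CYCLE = ("d1", "d1", "d2", "d1", "d2", "d1", "d2", "d1", "d2")


def rule_at(epoch, period_cycle=PERIOD_CYCLE):
    """Return the decision rule used at a decision epoch.

    Assumption: epochs are numbered 0, 1, 2, ... and the cycle starts at epoch 0.
    """
    return period_cycle[epoch % len(period_cycle)]


def first_rules(n, period_cycle=PERIOD_CYCLE):
    """List the decision rules used at the first n decision epochs."""
    return [rule_at(t, period_cycle) for t in range(n)]


def usage_fraction(rule, period_cycle=PERIOD_CYCLE):
    """Long-run fraction of epochs at which the given rule is applied."""
    return period_cycle.count(rule) / len(period_cycle)


if __name__ == "__main__":
    print(first_rules(12))
    print(usage_fraction("d1"), usage_fraction("d2"))
```

Under these assumptions the policy applies d1 at five of every nine epochs and d2 at the remaining four, and the rule chosen at an epoch depends only on the epoch index, not on the current state.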

Information

Type
Research Article
Copyright
Copyright © Cambridge University Press 2011
