Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- Part I Stochastic Models and Bayesian Filtering
- Part II Partially Observed Markov Decision Processes: Models and Applications
- Part III Partially Observed Markov Decision Processes: Structural Results
- 9 Structural results for Markov decision processes
- 10 Structural results for optimal filters
- 11 Monotonicity of value function for POMDPs
- 12 Structural results for stopping time POMDPs
- 13 Stopping time POMDPs for quickest change detection
- 14 Myopic policy bounds for POMDPs and sensitivity to model parameters
- Part IV Stochastic Approximation and Reinforcement Learning
- Appendix A Short primer on stochastic simulation
- Appendix B Continuous-time HMM filters
- Appendix C Markov processes
- Appendix D Some limit theorems
- References
- Index
14 - Myopic policy bounds for POMDPs and sensitivity to model parameters
from Part III - Partially Observed Markov Decision Processes: Structural Results
Published online by Cambridge University Press: 05 April 2016
Summary
Chapter 12 discussed stopping time POMDPs and gave sufficient conditions for the optimal policy to have a monotone structure. In this chapter we consider more general POMDPs (not necessarily with a stopping action) and present the following structural results:
Upper and lower myopic policy bounds using copositivity dominance: For general POMDPs it is difficult to provide sufficient conditions for monotone policies. Instead, we provide sufficient conditions so that the optimal policy can be upper and lower bounded by judiciously chosen myopic policies. These sufficient conditions involve the copositive ordering described in Chapter 10. The myopic policy bounds are constructed to maximize the volume of belief states where they coincide with the optimal policy. Numerical examples illustrate these myopic policies for continuous and discrete valued observations.
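To fix ideas, a myopic policy simply chooses the action minimizing the expected instantaneous cost at the current belief, ignoring the cost-to-go. The sketch below illustrates this on an assumed two-state, two-action example; the cost vectors are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical two-state, two-action example. A myopic policy picks the
# action u minimizing the expected instantaneous cost c_u' pi at the
# current belief pi; it ignores the future cost-to-go entirely.
c = {1: np.array([1.0, 3.0]),   # cost vector for action 1 (assumed values)
     2: np.array([2.0, 1.5])}   # cost vector for action 2 (assumed values)

def myopic_policy(pi):
    """Return the action with smallest expected instantaneous cost at belief pi."""
    return min(c, key=lambda u: c[u] @ pi)

print(myopic_policy(np.array([0.9, 0.1])))  # belief concentrated on state 1 -> 1
print(myopic_policy(np.array([0.1, 0.9])))  # belief concentrated on state 2 -> 2
```

The structural results of this chapter give conditions under which the optimal policy is sandwiched between two such myopic policies over regions of the belief simplex.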
Lower myopic policy bounds using Blackwell dominance: Suppose the observation probabilities for actions 1 and 2 can be related via the following factorization: B(1) = B(2) R where R is a stochastic matrix. We then say that B(2) Blackwell dominates B(1). If this Blackwell dominance holds, we will show that a myopic policy coincides with the optimal policy for all belief states where choosing action 2 yields a smaller instantaneous cost than choosing action 1. Thus, the myopic policy forms a lower bound to the optimal policy. We provide two examples: scheduling an optimal filter versus an optimal predictor, and scheduling with ultrametric observation matrices.
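The factorization B(1) = B(2) R says that the observations under action 1 are a stochastically garbled version of those under action 2. A quick numerical check of this property, with assumed matrices chosen for illustration:

```python
import numpy as np

# Hypothetical observation matrices illustrating Blackwell dominance:
# B(2) Blackwell dominates B(1) if B(1) = B(2) R for some stochastic R,
# i.e. action 1's observations are a garbled version of action 2's.
B2 = np.array([[0.8, 0.2],
               [0.1, 0.9]])   # observation probabilities under action 2 (assumed)
R = np.array([[0.7, 0.3],
              [0.4, 0.6]])    # stochastic "garbling" matrix (assumed)

B1 = B2 @ R                   # observation probabilities under action 1

# Since B(2) and R both have rows summing to 1, so does B(1):
# B(1) is a valid (but less informative) observation matrix.
assert np.allclose(B1.sum(axis=1), 1.0)
print(B1)                     # [[0.64 0.36]
                              #  [0.43 0.57]]
```

Given candidate matrices B(1) and B(2), establishing dominance amounts to finding a stochastic R solving the linear system B(2) R = B(1), which can be posed as a linear feasibility problem.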
Sensitivity to POMDP parameters: The final question considered in this chapter is: how does the optimal cumulative cost of a POMDP depend on the transition and observation probabilities? We provide two sets of results: ordinal and cardinal. The ordinal results use the copositive ordering of transition matrices and Blackwell dominance of observation matrices to order the achievable optimal costs of a POMDP. The cardinal results give explicit formulas for the sensitivity of the POMDP optimal cost and policy to small perturbations of the transition and observation probabilities.
The partially observed Markov decision process
Throughout this chapter we will consider the infinite-horizon discounted-cost POMDPs discussed in §7.6. Let us briefly review this model.
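For orientation, and assuming the notation of earlier chapters (belief $\pi$, action $u$, discount factor $\rho$, cost vectors $c_u$, transition matrices $P(u)$, diagonal observation likelihood matrices $B_y(u)$), the value function of the discounted-cost POMDP satisfies Bellman's dynamic programming equation with the HMM filter as belief update:

```latex
V(\pi) = \min_{u \in \,\mathcal{U}} \Big\{ c_u' \pi
    + \rho \sum_{y} V\big(T(\pi, y, u)\big)\, \sigma(\pi, y, u) \Big\},
\qquad
T(\pi, y, u) = \frac{B_y(u)\, P'(u)\, \pi}{\sigma(\pi, y, u)},
\quad
\sigma(\pi, y, u) = \mathbf{1}'\, B_y(u)\, P'(u)\, \pi .
```

The structural results below bound the minimizing argument of this equation by myopic policies, i.e. policies that minimize only the first term $c_u'\pi$.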
Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing, pp. 312–340. Publisher: Cambridge University Press. Print publication year: 2016.