Bounding the difference between the values of robust and non-robust Markov decision problems

Published online by Cambridge University Press:  11 November 2024

Ariel Neufeld*
Affiliation:
Nanyang Technological University Singapore
Julian Sester**
Affiliation:
National University of Singapore
*Postal address: Division of Mathematical Sciences, 21 Nanyang Link, Singapore 637371. Email: ariel.neufeld@ntu.edu.sg
**Postal address: Department of Mathematics, 21 Lower Kent Ridge Road, Singapore 119077. Email: jul_ses@nus.edu.sg

Abstract

In this note we provide an upper bound for the difference between the value function of a distributionally robust Markov decision problem and the value function of a non-robust Markov decision problem. The ambiguity set of probability kernels of the distributionally robust Markov decision process is a Wasserstein ball around some reference kernel, whereas the non-robust Markov decision process behaves according to a fixed probability kernel contained in the ambiguity set. The derived upper bound is dimension-free and depends linearly on the radius of the Wasserstein ball.

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Applied Probability Trust

Figure 1. The difference between the non-robust and the robust value function compared with the upper bound from (3.2) in the setting described in Example 3.1 with $\varepsilon>0$ and for different initial values of the Markov decision process. Initial values larger than 5 are omitted due to the setting-specific symmetry $V(x_0)-V^{\mathrm{true}}(x_0) = V(10-x_0)-V^{\mathrm{true}}(10-x_0)$ for $x_0\in\{0,1,\dots,10\}$.