Computable approximations for average Markov decision processes in continuous time

  • Jonatha Anselmi*, François Dufour** and Tomás Prieto-Rumeau***

Abstract

In this paper we study the numerical approximation of the optimal long-run average cost of a continuous-time Markov decision process, with Borel state and action spaces, and with bounded transition and reward rates. Our approach uses a suitable discretization of the state and action spaces to approximate the original control model. The approximation error for the optimal average reward is then bounded by a linear combination of coefficients related to the discretization of the state and action spaces, namely, the Wasserstein distance between an underlying probability measure μ and a measure with finite support, and the Hausdorff distance between the original and the discretized action sets. When approximating μ with its empirical probability measure, we obtain convergence in probability at an exponential rate. An application to a queueing system is presented.
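For intuition, the error bound described in the abstract can be written schematically as follows. This is an illustrative sketch only: the symbols ρ*, ρ*_{n,m}, μ_n, A_m and the constants C_1, C_2 are generic notation introduced here, not the paper's own.

$$
\bigl|\rho^{*} - \rho^{*}_{n,m}\bigr| \;\le\; C_{1}\, W_{1}(\mu,\mu_{n}) \;+\; C_{2}\, d_{H}(A, A_{m}),
$$

where ρ* denotes the optimal long-run average reward of the original control model, ρ*_{n,m} that of the model with discretized state and action spaces, W_1(μ, μ_n) the 1-Wasserstein distance between μ and a finite-support measure μ_n (for instance, the empirical measure of n samples drawn from μ), and d_H(A, A_m) the Hausdorff distance between the original action set A and its discretization A_m. Under this form, taking μ_n to be the empirical measure yields the exponential-rate convergence in probability mentioned in the abstract.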

Corresponding author

* Postal address: INRIA Bordeaux Sud Ouest, Bureau B430, 200 av. de la Vieille Tour, 33405 Talence cedex, France. Email address: jonatha.anselmi@inria.fr
** Postal address: Institut de Mathématiques de Bordeaux, Université de Bordeaux, 351 cours de la Libération, F33405 Talence, France. Email address: francois.dufour@math.u-bordeaux.fr
*** Postal address: Statistics Department, UNED, calle Senda del Rey 9, 28040 Madrid, Spain. Email address: tprieto@ccia.uned.es
