
Absorbing Continuous-Time Markov Decision Processes with Total Cost Criteria

Published online by Cambridge University Press:  22 February 2016

Xianping Guo*
Affiliation: Sun Yat-Sen University
Mantas Vykertas**
Affiliation: Open University
Yi Zhang***
Affiliation: University of Liverpool

* Postal address: School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, P. R. China. Email address: mcsgxp@mail.sysu.edu.cn
** Postal address: Department of Mathematics and Statistics, Open University, Milton Keynes, MK7 6AA, UK. Email address: mantas.vykertas@gmail.com
*** Postal address: Department of Mathematical Sciences, University of Liverpool, Liverpool, L69 7ZL, UK. Email address: yi.zhang@liv.ac.uk

Abstract


In this paper we study absorbing continuous-time Markov decision processes on Polish state spaces with unbounded transition and cost rates, under history-dependent policies. The performance measure is the expected total undiscounted cost. For the unconstrained problem, we show the existence of a deterministic stationary optimal policy; for the constrained problem with N constraints, we show the existence of a mixed stationary optimal policy, where the mixture is over no more than N+1 deterministic stationary policies. Furthermore, a strong duality result is obtained for the associated linear programs.
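To fix notation, the criterion and the constrained problem described in the abstract can be written out as follows. This is a standard formulation sketched for orientation; the symbols (cost rates c_n, constraint constants d_n, initial distribution γ) are illustrative choices and not necessarily the paper's own notation, with x_t and a_t denoting the state and action at time t:

\[
V_n(\pi,\gamma) \;=\; \mathbb{E}^{\pi}_{\gamma}\!\left[\int_0^{\infty} c_n(x_t,a_t)\,\mathrm{d}t\right], \qquad n = 0,1,\dots,N,
\]
\[
\text{minimize } V_0(\pi,\gamma) \text{ over policies } \pi \quad \text{subject to} \quad V_n(\pi,\gamma) \le d_n, \qquad n = 1,\dots,N.
\]

Under this reading, the stated result says the constrained problem admits an optimal policy mixing no more than N+1 deterministic stationary policies, that is, the number of constraints plus one.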

Type: General Applied Probability
Copyright: © Applied Probability Trust
