
Denumerable-state continuous-time Markov decision processes with unbounded transition and reward rates under the discounted criterion

Published online by Cambridge University Press:  14 July 2016

Xianping Guo*
Affiliation: Zhongshan University
Weiping Zhu**
Affiliation: University of New South Wales

* Postal address: Department of Mathematics, Zhongshan University, Guangzhou 510275, P. R. China.
** Postal address: School of Computer Science, ADFA, University of New South Wales, ACT 2600, Australia. Email address: weiping@cs.adfa.edu.au

Abstract

In this paper, we consider denumerable-state continuous-time Markov decision processes with (possibly unbounded) transition and reward rates and a general action space under the discounted criterion. We provide a set of conditions weaker than those previously known and prove the existence of optimal stationary policies within the class of all (possibly randomized) Markov policies. Moreover, the results are illustrated by a birth-and-death process with controlled immigration, for which the conditions of this paper are satisfied whereas the earlier conditions fail to hold.
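For orientation, the discounted criterion referred to above is the standard expected total discounted reward for continuous-time models. The following is a minimal sketch in generic notation; the symbols i, a, π, r, q and the discount rate α are conventions of this sketch and may differ from the paper's own notation:

\[
V_\alpha(i,\pi) = \mathbb{E}_i^{\pi}\!\left[\int_0^{\infty} e^{-\alpha t}\, r(x_t, a_t)\, dt\right],
\qquad
V_\alpha^{*}(i) = \sup_{\pi} V_\alpha(i,\pi),
\]

where x_t is the controlled chain with transition rates q(j | i, a) and α > 0 is the discount rate; a stationary policy f is discount optimal if V_\alpha(i, f) = V_\alpha^{*}(i) for every state i. The difficulty the paper addresses is that when r and the q are unbounded over the denumerable state space (as in a birth-and-death model, where rates typically grow with the population size), classical uniformization and bounded-reward arguments no longer apply directly, so weaker conditions are needed to guarantee that the supremum is attained by a stationary policy.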

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 2002 
