
SEMI-MARKOV DECISION PROCESSES

NONSTANDARD CRITERIA

Published online by Cambridge University Press: 22 October 2007

M. Baykal-Gürsoy
Affiliation: Department of Industrial and Systems Engineering, Rutgers University, Piscataway, NJ
E-mail: gursoy@rci.rutgers.edu
K. Gürsoy
Affiliation: Department of Management Science, Kean University, Union, NJ

Abstract

We consider semi-Markov decision processes (SMDPs) with finite state and action spaces and study two criteria: the expected average reward per unit time subject to a sample-path constraint on the average cost per unit time, and the expected time-average variability. Under a certain condition, for communicating SMDPs, we construct (randomized) stationary policies that are ε-optimal for each criterion; the constructed policy is optimal for the first criterion under the unichain assumption, and it is optimal and pure for a specific variability function under the second criterion. For general multichain SMDPs, a state space decomposition approach yields similar results.
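To fix ideas, the two criteria can be sketched in standard SMDP notation; the symbols below are illustrative assumptions, not taken from the paper. Let $X_n$ denote the state at the $n$th decision epoch, $A_n$ the action chosen there, $\tau_n$ the sojourn time until the next epoch, and $r$, $c$ the reward and cost incurred at an epoch. The expected average reward per unit time under a policy $\pi$ started at state $x$ then takes the usual ratio form

\[
R(\pi, x) \;=\; \liminf_{N \to \infty} \frac{\mathbb{E}^{\pi}_{x}\!\left[\sum_{n=0}^{N-1} r(X_n, A_n)\right]}{\mathbb{E}^{\pi}_{x}\!\left[\sum_{n=0}^{N-1} \tau_n\right]},
\]

to be maximized subject to the sample-path constraint on the average cost,

\[
\limsup_{N \to \infty} \frac{\sum_{n=0}^{N-1} c(X_n, A_n)}{\sum_{n=0}^{N-1} \tau_n} \;\le\; V \qquad \text{almost surely},
\]

for a prescribed threshold $V$. The second criterion evaluates the same time-average ratio in expectation with $r$ replaced by a variability function of the reward stream, for example $h\bigl(r(X_n, A_n)\bigr)$ weighted by the sojourn times; the specific variability functional treated in the paper may differ from this sketch.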

Type: Research Article
Copyright: © Cambridge University Press 2007

