
SEMI-MARKOV DECISION PROCESSES

NONSTANDARD CRITERIA

Published online by Cambridge University Press: 22 October 2007

M. Baykal-Gürsoy
Affiliation: Department of Industrial and Systems Engineering, Rutgers University, Piscataway, NJ
E-mail: gursoy@rci.rutgers.edu
K. Gürsoy
Affiliation: Department of Management Science, Kean University, Union, NJ

Abstract

We consider semi-Markov decision processes (SMDPs) with finite state and action spaces and study two criteria: the expected average reward per unit time subject to a sample-path constraint on the average cost per unit time, and the expected time-average variability. Under a certain condition, for communicating SMDPs, we construct (randomized) stationary policies that are ε-optimal for each criterion; the constructed policy is optimal for the first criterion under the unichain assumption, and it is optimal and pure for a specific variability function under the second criterion. For general multichain SMDPs, a state space decomposition approach yields similar results.
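To fix ideas, the two criteria can be sketched in standard SMDP notation; the symbols below are illustrative assumptions, not taken from the paper. Let $X_n$ denote the state at the $n$th decision epoch, $A_n$ the action chosen there, $\tau_n$ the sojourn time until the next epoch, and $r$, $c$ the reward and cost incurred at an epoch. The expected average reward per unit time under a policy $\pi$ started at state $x$ then takes the usual ratio form

\[
R(\pi, x) \;=\; \liminf_{N \to \infty} \frac{\mathbb{E}^{\pi}_{x}\!\left[\sum_{n=0}^{N-1} r(X_n, A_n)\right]}{\mathbb{E}^{\pi}_{x}\!\left[\sum_{n=0}^{N-1} \tau_n\right]},
\]

to be maximized subject to the sample-path constraint on the average cost,

\[
\limsup_{N \to \infty} \frac{\sum_{n=0}^{N-1} c(X_n, A_n)}{\sum_{n=0}^{N-1} \tau_n} \;\le\; V \qquad \text{almost surely},
\]

for a prescribed threshold $V$. The second criterion evaluates the same time-average ratio in expectation with $r$ replaced by a variability function of the reward stream, for example $h\bigl(r(X_n, A_n)\bigr)$ weighted by the sojourn times; the specific variability functional treated in the paper may differ from this sketch.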

Type: Research Article
Copyright: © Cambridge University Press 2007

