The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms

A. Federgruen; H. C. Tijms

doi:10.2307/3213407

Published online by Cambridge University Press: 14 July 2016

Affiliations

A. Federgruen: Mathematisch Centrum, Amsterdam
H. C. Tijms: Vrije Universiteit, Amsterdam

Abstract

This paper is concerned with the optimality equation for the average costs in a denumerable state semi-Markov decision model. It will be shown that under each of a number of recurrency conditions on the transition probability matrices associated with the stationary policies, the optimality equation has a bounded solution. This solution indeed yields a stationary policy which is optimal for a strong version of the average cost optimality criterion. Besides the existence of a bounded solution to the optimality equation, we will show that both the value-iteration method and the policy-iteration method can be used to determine such a solution. For the latter method we will prove that the average costs and the relative cost functions of the policies generated converge to a solution of the optimality equation.
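
As a rough illustration of what "determining a solution of the optimality equation by value iteration" can look like, the sketch below works on a small finite-state model rather than the denumerable-state setting of the paper. It assumes the optimality equation is written in the common form v(i) = min_a { c(i,a) - g·τ(i,a) + Σ_j p_ij(a)·v(j) }, where the one-step costs c(i,a), expected sojourn times τ(i,a), transition probabilities p_ij(a), the function name `smdp_value_iteration`, and the numerical data are all hypothetical and chosen for this example. The semi-Markov model is first converted to an equivalent discrete-time model by a standard data transformation; this is one possible scheme and is not claimed to be the one analysed in the paper.

```python
def smdp_value_iteration(cost, tau, prob, eps=1e-10, max_iter=100_000):
    """Relative value iteration for a finite unichain semi-Markov decision model.

    cost[i][a]    : expected one-step cost in state i under action a (assumed data)
    tau[i][a]     : expected sojourn time in state i under action a, > 0
    prob[i][a][j] : transition probability from i to j under action a
    Returns (g, v, policy): approximate minimal average cost per unit time,
    relative values normalised so that v[0] == 0, and a minimising policy.
    """
    n = len(cost)
    # Step length of the transformed model: strictly below every sojourn time,
    # which keeps transformed probabilities non-negative and the chain aperiodic.
    tau0 = 0.5 * min(t for row in tau for t in row)
    w = [0.0] * n
    for _ in range(max_iter):
        new_w, policy = [], []
        for i in range(n):
            best_val, best_a = None, None
            for a in range(len(cost[i])):
                ratio = tau0 / tau[i][a]
                # Transformed cost c/tau plus expectation under the transformed
                # probabilities ratio * (p_ij - delta_ij) + delta_ij.
                val = cost[i][a] / tau[i][a] + w[i] + ratio * sum(
                    prob[i][a][j] * (w[j] - w[i]) for j in range(n)
                )
                if best_val is None or val < best_val:
                    best_val, best_a = val, a
            new_w.append(best_val)
            policy.append(best_a)
        diffs = [new_w[i] - w[i] for i in range(n)]
        lo, hi = min(diffs), max(diffs)  # lower and upper bounds on g
        # Subtract a reference value so the iterates stay bounded.
        w = [x - new_w[0] for x in new_w]
        if hi - lo < eps:
            g = 0.5 * (lo + hi)
            v = [tau0 * x for x in w]  # rescale to the original model
            return g, v, policy
    raise RuntimeError("value iteration did not converge")


# Hypothetical two-state, two-action example.
cost = [[2.0, 3.0], [4.0, 5.0]]
tau = [[1.0, 2.0], [1.0, 1.5]]
prob = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.5, 0.5], [0.8, 0.2]],
]

g, v, policy = smdp_value_iteration(cost, tau, prob)
print(g, v, policy)
```

The gap between the smallest and largest one-step difference of successive iterates brackets the minimal average cost, so it doubles as a stopping criterion. In this finite sketch the bounded solution is guaranteed only because every stationary policy induces a single recurrent class; the recurrency conditions of the paper play the corresponding role for denumerable state spaces.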

Keywords

SEMI-MARKOV DECISION MODEL; DENUMERABLE STATE SPACE; AVERAGE COSTS; OPTIMALITY EQUATION; RECURRENCY CONDITIONS; VALUE-ITERATION METHOD; POLICY-ITERATION METHOD; CONVERGENCE RESULTS

Information

Type: Research Papers
Journal: Journal of Applied Probability, Volume 15, Issue 2, June 1978, pp. 356-373

DOI: https://doi.org/10.2307/3213407
Copyright: © Applied Probability Trust 1978
