
Bounds and good policies in stationary finite–stage Markovian decision problems

Published online by Cambridge University Press: 01 July 2016

Gerhard Hübner*
Affiliation:
University of Hamburg
Postal address: Institut für Mathematische Stochastik, Universität Hamburg, Bundesstrasse 55, 2000 Hamburg 13, West Germany.

Abstract

A stationary Markovian decision model with general state and action spaces is considered, in which the requirement that the transition laws be probabilities is weakened to allow arbitrary bounded transition measures (a generalization useful in many applications). New and improved bounds are given for the optimal value of stationary problems with a large planning horizon when either only a few steps of value iteration have been carried out or, in addition, a solution of the infinite-stage problem is known. Similar estimates are obtained for the quality of policies composed of nearly optimal decisions taken from the first few steps or from the infinite-stage solution.
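
To fix ideas, the flavour of such extrapolation bounds can be illustrated with the classical discounted case treated by MacQueen (1966) and Porteus (1971), which this paper refines: after n steps of value iteration with discount factor β < 1, the difference of the last two iterates yields two-sided bounds on the infinite-horizon optimal value. The sketch below illustrates that classical bound only, not the paper's improved construction, and the model data (P, r, β) are invented for the example.

```python
# Minimal sketch of MacQueen-type extrapolation bounds for a finite
# discounted MDP. This is the classical bound (MacQueen 1966, Porteus 1971),
# not the improved bounds of the paper; all model data here are made up.
import numpy as np

def value_iteration_with_bounds(P, r, beta, n_steps):
    """P: (A, S, S) transition matrices, r: (A, S) one-step rewards,
    beta: discount factor in (0, 1), n_steps: number of iterations (>= 1).

    Returns the n-step value function V_n together with lower/upper bounds
    on the infinite-horizon optimal value V*, based on the last step:
        V_n + beta/(1-beta) * min(V_n - V_{n-1})  <=  V*
        V_n + beta/(1-beta) * max(V_n - V_{n-1})  >=  V*
    """
    assert n_steps >= 1
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(n_steps):
        v_prev = v
        # One application of the optimality operator: maximize over actions.
        q = r + beta * (P @ v_prev)          # shape (A, S)
        v = q.max(axis=0)                    # shape (S,)
    diff = v - v_prev
    factor = beta / (1.0 - beta)
    return v, v + factor * diff.min(), v + factor * diff.max()

# Toy 2-state, 2-action example with arbitrary numbers.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 0.8]])
v, lo, hi = value_iteration_with_bounds(P, r, beta=0.9, n_steps=5)
print("V_5:", v)
print("bounds on V*:", lo, hi)
```

Already after a few steps the interval [lo, hi] sandwiches V* far more tightly than the raw contraction estimate, which is what makes such bounds attractive when the planning horizon is large.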

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1980 

References

Chow, Y. S., Robbins, H. and Siegmund, D. (1971) Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin, Boston.
Hastings, N. A. J. (1968) Some notes on dynamic programming and replacement. Operat. Res. Quart. 19, 453–464.
Hastings, N. A. J. (1971) Bounds on the gain of a Markov decision process. Operat. Res. 19, 240–244.
Hastings, N. A. J. (1976) A test for nonoptimal actions in undiscounted finite Markov decision chains. Management Sci. 23, 87–92.
Hastings, N. A. J. and Van Nunen, J. A. E. E. (1977) The action elimination algorithm for Markov decision processes. In Markov Decision Theory, ed. Tijms, H. C. and Wessels, J., Mathematical Centre Tracts, Amsterdam 93, 161–170.
Himmelberg, C. J., Parthasarathy, T. and Van Vleck, F. S. (1976) Optimal plans for dynamic programming problems. Math. Operat. Res. 1, 390–394.
Hinderer, K. (1970) Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Lecture Notes in Operations Research and Math. Systems 33, Springer-Verlag, Berlin.
Hinderer, K. (1971) Instationäre dynamische Optimierung bei schwachen Voraussetzungen über die Gewinnfunktionen. Abh. Math. Sem. Univ. Hamburg 36, 208–223.
Hinderer, K. (1975) Neuere Resultate in der stochastischen dynamischen Optimierung. Z. Angew. Math. Mech. 55, T16–T26.
Hinderer, K. (1976) Estimates for finite-stage dynamic programs. J. Math. Anal. Appl. 55, 207–238.
Hinderer, K. (1979) On approximate solutions of finite-stage dynamic programs. In Dynamic Programming and its Applications, ed. Puterman, M. L., Academic Press, New York, 289–317.
Hinderer, K. and Hübner, G. (1977) On exact and approximate solutions of unstructured finite-stage dynamic programs. In Markov Decision Theory, ed. Tijms, H. C. and Wessels, J., Mathematical Centre Tracts, Amsterdam 93, 57–76.
Hinderer, K. and Hübner, G. (1978) An improvement of J. F. Shapiro's turnpike theorem for the horizon of finite stage discrete dynamic programs. Trans. 7th Prague Conf. 1974, Reidel, Dordrecht.
Howard, R. A. (1960) Dynamic Programming and Markov Processes. Wiley, New York.
Hübner, G. (1975) Extrapolation und Ausschließung suboptimaler Aktionen in endlichstufigen stationären Markoffschen Entscheidungsmodellen. Habilitationsschrift, Univ. Hamburg.
Hübner, G. (1977) Contraction properties of Markov decision models with application to the elimination of non-optimal actions. In Dynamische Optimierung, Bonner Math. Schriften 98, 57–65.
Hübner, G. (1978) Improved procedures for eliminating suboptimal actions in Markov programming by the use of contraction properties. Trans. 7th Prague Conf. 1974, Reidel, Dordrecht.
Hübner, G. (1979) Sequential similarity transformations for solving finite-stage sub-Markov decision problems. Operat. Res. Verfahren 33, 197–207.
Kushner, H. J. and Kleinman, A. S. (1971) Accelerated procedures for the solution of discrete Markov control problems. IEEE Trans. Automatic Control AC-16, 147–152.
Lippman, S. A. (1973) Semi-Markov decision processes with unbounded rewards. Management Sci. 19, 717–731.
MacQueen, J. (1966) A modified dynamic programming method for Markovian decision problems. J. Math. Anal. Appl. 14, 38–43.
MacQueen, J. (1967) A test for suboptimal actions in Markovian decision problems. Operat. Res. 15, 559–561.
Morton, Th. E. and Wecker, W. E. (1977) Discounting, ergodicity and convergence for Markov decision processes. Management Sci. 23, 890–900.
Van Nunen, J. A. E. E. (1976) Contracting Markov Decision Processes. Doctoral Dissertation, University of Technology, Eindhoven.
Pliska, S. R. (1976) Optimisation of multitype branching processes. Management Sci. 23, 117–124.
Porteus, E. L. (1971) Some bounds for discounted sequential decision processes. Management Sci. 18, 7–11.
Porteus, E. L. (1975) Bounds and transformations for discounted finite Markov decision chains. Operat. Res. 23, 761–784.
Porteus, E. L. and Totten, J. C. (1978) Accelerated computation of the expected discounted return in a Markov chain. Operat. Res. 26, 350–358.
Rieder, U. (1975) Bayesian dynamic programming. Adv. Appl. Prob. 7, 330–348.
Rieder, U. (1978) Measurable selection theorems for optimization problems. Manuscripta Math. 24, 115–131.
Ross, S. M. (1968a) Non-discounted denumerable Markovian decision models. Ann. Math. Statist. 39, 412–423.
Ross, S. M. (1968b) Arbitrary state Markovian decision processes. Ann. Math. Statist. 39, 2118–2122.
Schaefer, H. H. (1974) Banach Lattices and Positive Operators. Springer-Verlag, Berlin.
Schäl, M. (1971) Ein verallgemeinertes stationäres Entscheidungsmodell der dynamischen Optimierung. Operat. Res. Verfahren X, 145–162.
Schäl, M. (1974) A selection theorem for optimization problems. Arch. Math. 25, 219–224.
Schellhaas, H. (1974) Zur Extrapolation in Markoffschen Entscheidungsmodellen mit Diskontierung. Z. Operat. Res. 18, 91–104.
Schweitzer, P. J. (1971) Multiple policy improvements in undiscounted Markov renewal programming. Operat. Res. 19, 784–793.
Schweitzer, P. J. and Federgruen, A. (1977) The asymptotic behavior of undiscounted value iteration in Markov decision problems. Math. Operat. Res. 2, 360–381.
Shapiro, J. F. (1969) Turnpike planning horizons for a Markovian decision model. Management Sci. 14, 292–300.
Veinott, A. F. (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40, 1635–1660.
Wessels, J. (1977) Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58, 326–335.
Wessels, J. (1978) Stopping times and Markov programming. Trans. 7th Prague Conf. 1974, Reidel, Dordrecht.