
Bounds and good policies in stationary finite–stage Markovian decision problems

Published online by Cambridge University Press: 01 July 2016

Gerhard Hübner*
Affiliation:
University of Hamburg
Postal address: Institut für Mathematische Stochastik, Universität Hamburg, Bundesstrasse 55, 2000 Hamburg 13, West Germany.

Abstract

A stationary Markovian decision model with general state and action spaces is considered, in which the requirement that the transition laws be probabilities is weakened to allow arbitrary bounded transition measures (a generalization useful in many applications). New and improved bounds are given for the optimal value of stationary problems with a large planning horizon when either only a few steps of value iteration have been carried out or, in addition, a solution of the infinite-stage problem is known. Similar estimates are obtained for the quality of policies composed of nearly optimal decisions taken from the first few steps or from the infinite-stage solution.
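
To fix ideas, the flavour of such extrapolation bounds can be illustrated with the classical discounted case treated by MacQueen (1966) and Porteus (1971), which this paper refines: after n steps of value iteration with discount factor β < 1, the difference of the last two iterates yields two-sided bounds on the infinite-horizon optimal value. The sketch below illustrates that classical bound only, not the paper's improved construction, and the model data (P, r, β) are invented for the example.

```python
# Minimal sketch of MacQueen-type extrapolation bounds for a finite
# discounted MDP. This is the classical bound (MacQueen 1966, Porteus 1971),
# not the improved bounds of the paper; all model data here are made up.
import numpy as np

def value_iteration_with_bounds(P, r, beta, n_steps):
    """P: (A, S, S) transition matrices, r: (A, S) one-step rewards,
    beta: discount factor in (0, 1), n_steps: number of iterations (>= 1).

    Returns the n-step value function V_n together with lower/upper bounds
    on the infinite-horizon optimal value V*, based on the last step:
        V_n + beta/(1-beta) * min(V_n - V_{n-1})  <=  V*
        V_n + beta/(1-beta) * max(V_n - V_{n-1})  >=  V*
    """
    assert n_steps >= 1
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(n_steps):
        v_prev = v
        # One application of the optimality operator: maximize over actions.
        q = r + beta * (P @ v_prev)          # shape (A, S)
        v = q.max(axis=0)                    # shape (S,)
    diff = v - v_prev
    factor = beta / (1.0 - beta)
    return v, v + factor * diff.min(), v + factor * diff.max()

# Toy 2-state, 2-action example with arbitrary numbers.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 0.8]])
v, lo, hi = value_iteration_with_bounds(P, r, beta=0.9, n_steps=5)
print("V_5:", v)
print("bounds on V*:", lo, hi)
```

Already after a few steps the interval [lo, hi] sandwiches V* far more tightly than the raw contraction estimate, which is what makes such bounds attractive when the planning horizon is large.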

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1980 

References

Chow, Y. S., Robbins, H. and Siegmund, D. (1971) Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin, Boston.
Hastings, N. A. J. (1968) Some notes on dynamic programming and replacement. Operat. Res. Quart. 19, 453–464.
Hastings, N. A. J. (1971) Bounds on the gain of a Markov decision process. Operat. Res. 19, 240–244.
Hastings, N. A. J. (1976) A test for nonoptimal actions in undiscounted finite Markov decision chains. Management Sci. 23, 87–92.
Hastings, N. A. J. and Van Nunen, J. A. E. E. (1977) The action elimination algorithm for Markov decision processes. In Markov Decision Theory, ed. Tijms, H. C. and Wessels, J., Mathematical Centre Tracts, Amsterdam 93, 161–170.
Himmelberg, C. J., Parthasarathy, T. and Van Vleck, F. S. (1976) Optimal plans for dynamic programming problems. Math. Operat. Res. 1, 390–394.
Hinderer, K. (1970) Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Lecture Notes in Operations Research and Math. Systems 33, Springer-Verlag, Berlin.
Hinderer, K. (1971) Instationäre dynamische Optimierung bei schwachen Voraussetzungen über die Gewinnfunktionen. Abh. Math. Sem. Univ. Hamburg 36, 208–223.
Hinderer, K. (1975) Neuere Resultate in der stochastischen dynamischen Optimierung. Z. Angew. Math. Mech. 55, T16–T26.
Hinderer, K. (1976) Estimates for finite-stage dynamic programs. J. Math. Anal. Appl. 55, 207–238.
Hinderer, K. (1979) On approximate solutions of finite-stage dynamic programs. In Dynamic Programming and its Applications, ed. Puterman, M. L., Academic Press, New York, 289–317.
Hinderer, K. and Hübner, G. (1977) On exact and approximate solutions of unstructured finite-stage dynamic programs. In Markov Decision Theory, ed. Tijms, H. C. and Wessels, J., Mathematical Centre Tracts, Amsterdam 93, 57–76.
Hinderer, K. and Hübner, G. (1978) An improvement of J. F. Shapiro's turnpike theorem for the horizon of finite stage discrete dynamic programs. Trans. 7th Prague Conf. 1974, Reidel, Dordrecht.
Howard, R. A. (1960) Dynamic Programming and Markov Processes. Wiley, New York.
Hübner, G. (1975) Extrapolation und Ausschließung suboptimaler Aktionen in endlichstufigen stationären Markoffschen Entscheidungsmodellen. Habilitationsschrift, Univ. Hamburg.
Hübner, G. (1977) Contraction properties of Markov decision models with application to the elimination of non-optimal actions. In Dynamische Optimierung, Bonner Math. Schriften 98, 57–65.
Hübner, G. (1978) Improved procedures for eliminating suboptimal actions in Markov programming by the use of contraction properties. Trans. 7th Prague Conf. 1974, Reidel, Dordrecht.
Hübner, G. (1979) Sequential similarity transformations for solving finite-stage sub-Markov decision problems. Operat. Res. Verfahren 33, 197–207.
Kushner, H. J. and Kleinman, A. S. (1971) Accelerated procedures for the solution of discrete Markov control problems. IEEE Trans. Automatic Control AC-16, 147–152.
Lippman, S. A. (1973) Semi-Markov decision processes with unbounded rewards. Management Sci. 19, 717–731.
MacQueen, J. (1966) A modified dynamic programming method for Markovian decision problems. J. Math. Anal. Appl. 14, 38–43.
MacQueen, J. (1967) A test for suboptimal actions in Markovian decision problems. Operat. Res. 15, 559–561.
Morton, Th. E. and Wecker, W. E. (1977) Discounting, ergodicity and convergence for Markov decision processes. Management Sci. 23, 890–900.
Van Nunen, J. A. E. E. (1976) Contracting Markov Decision Processes. Doctoral Dissertation, University of Technology, Eindhoven.
Pliska, S. R. (1976) Optimisation of multitype branching processes. Management Sci. 23, 117–124.
Porteus, E. L. (1971) Some bounds for discounted sequential decision processes. Management Sci. 18, 7–11.
Porteus, E. L. (1975) Bounds and transformations for discounted finite Markov decision chains. Operat. Res. 23, 761–784.
Porteus, E. L. and Totten, J. C. (1978) Accelerated computation of the expected discounted return in a Markov chain. Operat. Res. 26, 350–358.
Rieder, U. (1975) Bayesian dynamic programming. Adv. Appl. Prob. 7, 330–348.
Rieder, U. (1978) Measurable selection theorems for optimization problems. Manuscripta Math. 24, 115–131.
Ross, S. M. (1968a) Non-discounted denumerable Markovian decision models. Ann. Math. Statist. 39, 412–423.
Ross, S. M. (1968b) Arbitrary state Markovian decision processes. Ann. Math. Statist. 39, 2118–2122.
Schaefer, H. H. (1974) Banach Lattices and Positive Operators. Springer-Verlag, Berlin.
Schäl, M. (1971) Ein verallgemeinertes stationäres Entscheidungsmodell der dynamischen Optimierung. Operat. Res. Verfahren X, 145–162.
Schäl, M. (1974) A selection theorem for optimization problems. Arch. Math. 25, 219–224.
Schellhaas, H. (1974) Zur Extrapolation in Markoffschen Entscheidungsmodellen mit Diskontierung. Z. Operat. Res. 18, 91–104.
Schweitzer, P. J. (1971) Multiple policy improvements in undiscounted Markov renewal programming. Operat. Res. 19, 784–793.
Schweitzer, P. J. and Federgruen, A. (1977) The asymptotic behavior of undiscounted value iteration in Markov decision problems. Math. Operat. Res. 2, 360–381.
Shapiro, J. F. (1969) Turnpike planning horizons for a Markovian decision model. Management Sci. 14, 292–300.
Veinott, A. F. (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40, 1635–1660.
Wessels, J. (1977) Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58, 326–335.
Wessels, J. (1978) Stopping times and Markov programming. Trans. 7th Prague Conf. 1974, Reidel, Dordrecht.