Markov Branching Decision Chains with Interest-Rate-Dependent Rewards*

Published online by Cambridge University Press:  27 July 2009

Ying Huang
Affiliation:
Philips Laboratories, 345 Scarborough Road, Briarcliff Manor, New York 10510
Arthur F. Veinott Jr
Affiliation:
Department of Operations Research, Terman Engineering Center, Stanford University, Stanford, California 94305

Abstract

Finite-state-and-action Markov branching decision chains are studied with bounded endogenous expected population sizes and interest-rate-dependent one-period rewards that are analytic in the interest rate at zero. The existence of a stationary strong-maximum-present-value policy is established. Miller and Veinott's [1969] strong policy-improvement method is generalized to find in finite time a stationary n-present-value optimal policy and, when the one-period rewards are rational in the interest rate, a stationary strong-maximum-present-value policy. This extends previous studies of Blackwell [1962], Miller and Veinott [1969], Veinott [1974], and Rothblum [1974, 1975], in which the one-period rewards are independent of the interest rate, and Denardo [1971] in which semi-Markov decision chains with small interest rates are studied. The problem of finding a stationary n-present-value optimal policy is also formulated as a staircase linear program in which the objective function and right-hand sides, but not the constraint matrix, depend on the interest rate, and solutions for all small enough positive interest rates are sought. The optimal solutions of the primal and dual are polynomials in the reciprocal of the interest rate. A constructive rule is given for finding a stationary n-present-value optimal policy from an optimal solution of the asymptotic linear program. This generalizes the linear programming approaches for finding maximum-reward-rate and maximum-present-value policies for Markov decision chains studied by Manne [1960], d'Epenoux [1960, 1963], Balinski [1961], Derman [1962], Denardo and Fox [1968], Denardo [1970], Derman and Veinott [1972], Veinott [1973], and Hordijk and Kallenberg [1979, 1984].
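
For orientation, the sketch below records the small-interest-rate machinery in the classical special case the paper generalizes, namely Markov decision chains whose one-period rewards do not depend on the interest rate, as treated by Miller and Veinott [1969] and Veinott [1969]. The notation (a stationary policy δ with reward vector r_δ, transition matrix P_δ, stationary matrix P*_δ, and deviation matrix H_δ) and the reward-timing convention are illustrative assumptions, not taken from the paper itself.

% Minimal sketch, assuming rewards are earned at periods t = 0, 1, 2, ... and
% discounted at interest rate \rho > 0 (discount factor (1+\rho)^{-1}).
% Present value of a stationary policy \delta:
\[
  V^{\delta}(\rho)
  \;=\; \sum_{t \ge 0} (1+\rho)^{-t} P_\delta^{\,t} r_\delta
  \;=\; (1+\rho)\,\bigl(\rho I + I - P_\delta\bigr)^{-1} r_\delta .
\]
% Laurent expansion about \rho = 0, with stationary matrix
% P_\delta^{*} = \lim_{T\to\infty} T^{-1}\sum_{t<T} P_\delta^{t} and deviation
% matrix H_\delta = (I - P_\delta + P_\delta^{*})^{-1}(I - P_\delta^{*}):
\[
  \bigl(\rho I + I - P_\delta\bigr)^{-1} r_\delta
  \;=\; \rho^{-1} P_\delta^{*} r_\delta
        \;+\; \sum_{n \ge 0} (-\rho)^{n} H_\delta^{\,n+1} r_\delta ,
  \qquad 0 < \rho \text{ sufficiently small}.
\]
% In this special case, a policy is n-present-value optimal when it maximizes,
% lexicographically, the coefficients of \rho^{-1}, \rho^{0}, \ldots, \rho^{n}
% in this expansion, and it has strong maximum present value when
% V^{\delta}(\rho) is maximal for every sufficiently small \rho > 0.
% In the paper's setting the one-period rewards r_\delta(\rho) are themselves
% analytic (or rational) in \rho, so the corresponding coefficients arise from
% the product of the two expansions.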

Type
Research Article
Copyright
Copyright © Cambridge University Press 1995

References

1. Balinski, M.L. (1961). On solving discrete stochastic decision problems. Navy Supply System Research Study 2, Mathematica, Princeton, NJ.
2. Blackwell, D. (1962). Dynamic programming. Annals of Mathematical Statistics 33: 719–726.
3. Cantaluppi, L.J. (1981). Semi-Markov decision chains with holding-time-dependent policies. Ph.D. dissertation, Department of Operations Research, Stanford University, Stanford, CA.
4. Cantaluppi, L.J. (1984). Computation of optimal policies in discounted semi-Markov decision chains. OR Spektrum 6(3): 147–160.
5. Cantaluppi, L.J. (1984). Optimality of piecewise-constant policies in semi-Markov decision chains. SIAM Journal on Control and Optimization 22(5): 723–739.
6. Dantzig, G.B. (1955). Optimal solution of a dynamic Leontief model with substitution. Econometrica 23(3): 295–302.
7. DeCani, J.S. (1963). "On the Transient and Asymptotic Behavior of a Markov Chain 'Embedded' in Continuous Time" and "A Dynamic Programming Algorithm for Embedded Markov Chains," mimeographed. Aeronautical Computer Laboratory, U.S. Naval Air Development Center, Johnsville, PA.
8. Denardo, E.V. (1970). Computing a bias-optimal policy in a discrete-time Markov decision problem. Operations Research 18(2): 279–288.
9. Denardo, E.V. (1971). Markov renewal programs with small interest rates. Annals of Mathematical Statistics 42: 477–496.
10. Denardo, E.V. & Fox, B.L. (1968). Multichain Markov renewal programs. SIAM Journal on Applied Mathematics 16(3): 468–487.
11. d'Epenoux, F. (1960). Sur un problème de production et de stockage dans l'aléatoire. Revue Française de Recherche Opérationnelle 14: 3–16.
12. d'Epenoux, F. (1963). A probabilistic production and inventory problem. (Partly redrafted from a translation of d'Epenoux [11].) Management Science 10: 98–108.
13. Derman, C. (1962). On sequential decisions and Markov chains. Management Science 9: 16–24.
14. Derman, C. & Veinott, A.F. Jr. (1972). Constrained Markov decision chains. Management Science 19(4): 389–390.
15. Fox, B. (1966). Markov renewal programming by linear fractional programming. SIAM Journal on Applied Mathematics 14(6): 1418–1432.
16. Hordijk, A. & Kallenberg, L.C.M. (1979). Linear programming and Markov decision chains. Management Science 25(4): 352–362.
17. Hordijk, A. & Kallenberg, L.C.M. (1984). Constrained undiscounted stochastic dynamic programming. Mathematics of Operations Research 9(2): 276–289.
18. Howard, R.A. (1960). Dynamic programming and Markov processes. Cambridge, MA: MIT Press.
19. Howard, R.A. (1963). Semi-Markov decision processes. Bulletin of the International Statistical Institute 40: 625–652.
20. Jeroslow, R.G. (1973). Asymptotic linear programming. Operations Research 21(5): 1128–1141.
21. Jewell, W.S. (1963). Markov-renewal programming. I: Formulation, finite return models. Operations Research 11: 938–971.
22. Lamond, B.F. (1989). A generalized inverse method for asymptotic linear programming. Mathematical Programming 43: 71–86.
23. Manne, A.S. (1960). Linear programming and sequential decisions. Management Science 6(3): 259–267.
24. Miller, B.L. & Veinott, A.F. Jr. (1969). Discrete dynamic programming with a small interest rate. Annals of Mathematical Statistics 40: 366–370.
25. Rothblum, U.G. (1974). Multiplicative Markov decision chains. Ph.D. dissertation, Department of Operations Research, Stanford University, Stanford, CA.
26. Rothblum, U.G. (1975). Normalized Markov decision chains. I: Sensitive discount optimality. Operations Research 23(4): 785–795.
27. Rothblum, U.G. & Veinott, A.F. Jr. (1975). Cumulative average optimality for normalized Markov decision chains. Unpublished manuscript.
28. Rothblum, U.G. & Veinott, A.F. Jr. (1994). Markov population decision chains. Unpublished manuscript.
29. Shapley, L.S. (1953). Stochastic games. Proceedings of the National Academy of Sciences U.S.A. 39: 1095–1100.
30. Veinott, A.F. Jr. (1968). Extreme points of Leontief substitution systems. Linear Algebra and Its Applications 1: 181–194.
31. Veinott, A.F. Jr. (1969). Discrete dynamic programming with sensitive discount optimality criteria. Annals of Mathematical Statistics 40: 1635–1660.
32. Veinott, A.F. Jr. (1973). Linear programming formulation for average reward Markov decision chains and state-action frequencies. Unpublished manuscript.
33. Veinott, A.F. Jr. (1974). Markov decision chains. In G.B. Dantzig & B.C. Eaves (eds.), Studies in Optimization 10: 124–159. The Mathematical Association of America.