Anselmi, J., Dufour, F. and Prieto-Rumeau, T. (2016). Computable approximations for continuous-time Markov decision processes on Borel spaces based on empirical measures. J. Math. Anal. Appl. 443, 1323–1361.
Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
Boissard, E. (2011). Simple bounds for convergence of empirical and occupation measures in 1-Wasserstein distance. Electron. J. Prob. 16, 2296–2333.
Chang, H. S., Fu, M. C., Hu, J. and Marcus, S. I. (2007). Simulation-Based Algorithms for Markov Decision Processes. Springer, London.
Dufour, F. and Prieto-Rumeau, T. (2012). Approximation of Markov decision processes with general state space. J. Math. Anal. Appl. 388, 1254–1267.
Dufour, F. and Prieto-Rumeau, T. (2013). Finite linear programming approximations of constrained discounted Markov decision processes. SIAM J. Control Optimization 51, 1298–1324.
Dufour, F. and Prieto-Rumeau, T. (2014). Stochastic approximations of constrained discounted Markov decision processes. J. Math. Anal. Appl. 413, 856–879.
Dufour, F. and Prieto-Rumeau, T. (2015). Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities. Stochastics 87, 273–307.
Guo, X. and Rieder, U. (2006). Average optimality for continuous-time Markov decision processes in Polish spaces. Ann. Appl. Prob. 16, 730–756.
Guo, X. and Ye, L. (2010). New discount and average optimality conditions for continuous-time Markov decision processes. Adv. Appl. Prob. 42, 953–985.
Guo, X. and Zhang, W. (2014). Convergence of controlled models and finite-state approximation for discounted continuous-time Markov decision processes with constraints. Europ. J. Operat. Res. 238, 486–496.
Hinderer, K. (2005). Lipschitz continuity of value functions in Markovian decision processes. Math. Meth. Operat. Res. 62, 3–22.
Jacod, J. (1979). Calcul Stochastique et Problèmes de Martingales (Lecture Notes Math. 714). Springer, Berlin.
Piunovskiy, A. and Zhang, Y. (2014). Discounted continuous-time Markov decision processes with unbounded rates and randomized history-dependent policies: the dynamic programming approach. 4OR 12, 49–75.
Powell, W. B. (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley, Hoboken, NJ.
Prieto-Rumeau, T. and Hernández-Lerma, O. (2012). Discounted continuous-time controlled Markov chains: convergence of control models. J. Appl. Prob. 49, 1072–1090.
Prieto-Rumeau, T. and Lorenzo, J. M. (2010). Approximating ergodic average reward continuous-time controlled Markov chains. IEEE Trans. Automatic Control 55, 201–207.
Saldi, N., Linder, T. and Yüksel, S. (2015). Asymptotic optimality and rates of convergence of quantized stationary policies in stochastic control. IEEE Trans. Automatic Control 60, 553–558.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
Van Roy, B. (2002). Neuro-dynamic programming: overview and recent trends. In Handbook of Markov Decision Processes (Internat. Ser. Operat. Res. Manag. Sci. 40), eds E. A. Feinberg and A. Shwartz, Kluwer, Boston, MA, pp. 431–459.
Zhang, Y. (2014). Average optimality for continuous-time Markov decision processes under weak continuity conditions. J. Appl. Prob. 51, 954–970.