Model approximation in MDPs with unbounded per-step cost

Published 13 Feb 2024 in math.OC, cs.LG, cs.SY, and eess.SY | arXiv:2402.08813v1

Abstract: We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov decision process $\mathcal{M}$ when we only have access to an approximate model $\hat{\mathcal{M}}$. How well does an optimal policy $\hat{\pi}^{\star}$ of the approximate model perform when used in the original model $\mathcal{M}$? We answer this question by bounding a weighted norm of the difference between the value function of $\hat{\pi}^{\star}$ when used in $\mathcal{M}$ and the optimal value function of $\mathcal{M}$. We then extend our results and obtain potentially tighter upper bounds by considering affine transformations of the per-step cost. We further provide upper bounds that explicitly depend on the weighted distance between cost functions and weighted distance between transition kernels of the original and approximate models. We present examples to illustrate our results.
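The quantity the paper bounds can be computed directly in small finite examples. The sketch below is illustrative, not the paper's method: it builds a hypothetical "true" MDP $\mathcal{M}$ and a perturbed approximate model $\hat{\mathcal{M}}$ (all model parameters are invented assumptions), computes the optimal policy of the approximate model by value iteration, evaluates that policy exactly in the true model, and measures the resulting sub-optimality gap that results like the paper's would upper-bound.

```python
# Minimal sketch (assumed toy model, not from the paper): how well does the
# optimal policy of an approximate MDP perform in the true MDP?
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9          # states, actions, discount factor

# True model M: transition kernel P[a, s, s'] and per-step cost c[s, a].
P = rng.random((A, S, S)); P /= P.sum(axis=2, keepdims=True)
c = rng.random((S, A))

# Approximate model M_hat: a small perturbation of the true kernel and cost.
P_hat = P + 0.05 * rng.random((A, S, S))
P_hat /= P_hat.sum(axis=2, keepdims=True)
c_hat = c + 0.05 * rng.random((S, A))

def value_iteration(P, c, tol=1e-10):
    """Optimal value function and a greedy optimal policy (cost minimization)."""
    V = np.zeros(S)
    while True:
        Q = c + gamma * np.einsum('ast,t->sa', P, V)   # Q[s, a]
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new

def policy_value(P, c, pi):
    """Exact value of the stationary policy pi in the model (P, c)."""
    P_pi = P[pi, np.arange(S), :]                      # S x S kernel under pi
    c_pi = c[np.arange(S), pi]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)

V_star, _ = value_iteration(P, c)           # optimal value of the true model
_, pi_hat = value_iteration(P_hat, c_hat)   # optimal policy of the approximate model
V_pi_hat = policy_value(P, c, pi_hat)       # its value when used in the true model

gap = np.max(V_pi_hat - V_star)             # sup-norm sub-optimality gap
```

Since costs are minimized, `V_pi_hat >= V_star` componentwise, so `gap` is nonnegative; the paper's contribution is to bound such gaps a priori (in a weighted norm, which accommodates unbounded costs) in terms of the distance between the two models, without solving the true model.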
