Online POMDP Planning with Anytime Deterministic Optimality Guarantees
Abstract: Decision-making under uncertainty is a critical aspect of many practical autonomous systems operating with incomplete information. Partially Observable Markov Decision Processes (POMDPs) offer a mathematically principled framework for formulating decision-making problems under such conditions. However, finding an optimal solution to a POMDP is generally intractable. In recent years, there has been significant progress in scaling approximate solvers from small to moderately sized problems using online tree-search methods. Such approximate solvers, however, are often limited to probabilistic or asymptotic guarantees relative to the optimal solution. In this paper, we derive a deterministic relationship for discrete POMDPs between an approximate solution and the optimal one. We show that, at any point during the search, we can derive bounds that relate the current approximate solution to the optimal one. We further show that our derivations open an avenue for a new set of algorithms and can be attached to existing algorithms with a suitable structure, endowing them with deterministic guarantees at marginal computational overhead. In return, we not only certify the solution quality, but also demonstrate that making a decision based on the deterministic guarantee can yield performance superior to that of the original algorithm without the certification.
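To make the idea of anytime deterministic bounds concrete, the following is a minimal illustrative sketch, not the paper's algorithm: a partially expanded belief tree where every unexpanded subtree is replaced by deterministic bounds derived from the known reward range. The generative model `step`, the toy belief representation, and the constants are all assumptions for illustration. Because the bounds hold for any truncation depth, they remain valid if planning is interrupted at any time, and they tighten as the tree is expanded further.

```python
import math

GAMMA = 0.9
R_MIN, R_MAX = -1.0, 1.0          # assumed known per-step reward range

def tail_lo_hi():
    """Deterministic bounds on any policy's discounted return beyond the
    expanded horizon: the worst/best reward repeated forever."""
    return R_MIN / (1 - GAMMA), R_MAX / (1 - GAMMA)

def bounded_value(belief, depth, step, actions):
    """Return (lower, upper) bounds on the optimal value at `belief`.

    `step(belief, action) -> (reward, [(prob, next_belief), ...])` is a
    hypothetical generative model supplied by the caller. Expanding only
    `depth` levels still yields bounds that provably bracket the optimum.
    """
    if depth == 0:                 # unexpanded subtree: bound its value
        return tail_lo_hi()
    best_lo = best_hi = -math.inf
    for a in actions:
        r, branches = step(belief, a)
        lo = hi = r
        for p, nxt in branches:    # weight child bounds by branch probability
            clo, chi = bounded_value(nxt, depth - 1, step, actions)
            lo += GAMMA * p * clo
            hi += GAMMA * p * chi
        best_lo, best_hi = max(best_lo, lo), max(best_hi, hi)
    return best_lo, best_hi

# Toy model: belief = P(good state); "risky" pays 2p - 1, "safe" pays 0.1,
# and neither action changes the belief.
def step(p, a):
    return (2 * p - 1 if a == "risky" else 0.1), [(1.0, p)]

lo1, hi1 = bounded_value(0.75, 1, step, ["safe", "risky"])
lo3, hi3 = bounded_value(0.75, 3, step, ["safe", "risky"])
# Both intervals bracket the true optimum 0.5 / (1 - GAMMA) = 5.0,
# and the depth-3 interval is strictly tighter than the depth-1 one.
```

A planner can act on the certified lower bound (choose the action whose guaranteed value is highest), which is the kind of decision rule the deterministic guarantee enables.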