Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems
Abstract: We develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems, bridging the gap between the hard safety guarantees of control theory and the convergence guarantees of RL theory. Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints: model-free RL is used to learn a potentially unsafe controller, whose actions are projected onto safe sets prescribed, for example, by a control barrier function. Though safe, such approaches lose any convergence guarantees enjoyed by the underlying RL methods. In this paper, we develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees while satisfying hard safety constraints throughout training and deployment. We validate the efficacy of our approach in simulation, including safe control of a quadcopter in a challenging obstacle avoidance problem, and demonstrate that it outperforms existing benchmarks.
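The two-stage safety-filter baseline described above can be sketched concretely. Under the common assumption that, for a control-affine system, the control barrier function condition reduces to a single linear (half-space) constraint on the action, the projection step has a closed form. This is an illustrative sketch, not the paper's single-stage method; the function name and toy constraint are invented for the example.

```python
import numpy as np

def cbf_safety_filter(u_rl, a, b):
    """Project a nominal RL action onto the half-space {u : a @ u + b >= 0}
    induced by a CBF condition of the form Lf_h + Lg_h @ u + alpha * h >= 0.

    Closed-form solution of:  min ||u - u_rl||^2  s.t.  a @ u + b >= 0.
    (Hypothetical helper; real safety filters typically solve a QP with
    actuation limits, e.g. via CVXPY.)
    """
    slack = a @ u_rl + b
    if slack >= 0:
        return u_rl  # nominal action already satisfies the CBF constraint
    # minimal-norm correction onto the constraint boundary
    return u_rl - (slack / (a @ a)) * a

# Toy example: 2-D action with illustrative constraint u_x + u_y >= 1
a = np.array([1.0, 1.0])
b = -1.0
u_safe = cbf_safety_filter(np.array([0.0, 0.0]), a, b)  # projected to [0.5, 0.5]
```

Because the filter overwrites the learned policy's action whenever the constraint binds, the gradient information seen by the RL algorithm no longer corresponds to the deployed (filtered) policy, which is precisely why such approaches forfeit the convergence guarantees the paper aims to restore.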