
Pay Attention to How You Drive: Safe and Adaptive Model-Based Reinforcement Learning for Off-Road Driving

Published 12 Oct 2023 in cs.RO (arXiv:2310.08674v1)

Abstract: Autonomous off-road driving is challenging because risky actions taken by the robot may lead to catastrophic damage. Developing controllers in simulation is therefore desirable, as it provides a safer and more economical alternative. However, accurately modeling robot dynamics is difficult because of the complex vehicle-terrain interactions in unstructured environments. Domain randomization addresses this problem by randomizing the simulation's dynamics parameters, but it trades performance for robustness, yielding policies that are sub-optimal for any single target dynamics. We introduce a novel model-based reinforcement learning approach that aims to balance robustness with adaptability. Our approach trains a System Identification Transformer (SIT) and an Adaptive Dynamics Model (ADM) under a variety of simulated dynamics. The SIT uses attention mechanisms to distill state-transition observations from the target system into a context vector, which provides an abstraction of the target dynamics. Conditioned on this context, the ADM probabilistically models the system's dynamics. Online, a Risk-Aware Model Predictive Path Integral (MPPI) controller safely controls the robot under its current understanding of the dynamics. We demonstrate in simulation, as well as in multiple real-world environments, that this approach enables safer behaviors upon initialization and becomes less conservative (i.e., faster) as its understanding of the target system dynamics improves with more observations. In particular, our approach yields an approximately 41% improvement in lap time over the non-adaptive baseline while remaining safe across different environments.
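The SIT-to-ADM pipeline described in the abstract can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the authors' architecture: attention pooling with a single learned query stands in for the full System Identification Transformer, and a linear map predicting a mean and log-variance stands in for the probabilistic Adaptive Dynamics Model. All weight shapes and function names are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(transitions, Wq, Wk, Wv):
    """Distill N state-transition observations (N, d_in) into one context
    vector via a single learned query -- a toy stand-in for the SIT."""
    K = transitions @ Wk                     # keys,   shape (N, d_h)
    V = transitions @ Wv                     # values, shape (N, d_h)
    scores = K @ Wq / np.sqrt(len(Wq))       # attention logits, shape (N,)
    w = np.exp(scores - scores.max())        # numerically stable softmax
    w /= w.sum()
    return w @ V                             # context vector, shape (d_h,)

def adaptive_dynamics(state, action, context, W):
    """Toy stand-in for the ADM: predict the mean and log-variance of the
    next state, conditioned on the SIT context vector."""
    x = np.concatenate([state, action, context])
    out = W @ x
    mean, log_var = np.split(out, 2)
    return mean, log_var

d_state, d_action, d_h = 4, 2, 8
d_in = d_state + d_action                    # features per transition

# State-transition observations gathered from the target system.
transitions = rng.normal(size=(16, d_in))

Wq = rng.normal(size=d_h)                    # learned query
Wk = rng.normal(size=(d_in, d_h))
Wv = rng.normal(size=(d_in, d_h))
context = attention_pool(transitions, Wq, Wk, Wv)

W = rng.normal(size=(2 * d_state, d_state + d_action + d_h))
mean, log_var = adaptive_dynamics(rng.normal(size=d_state),
                                  rng.normal(size=d_action), context, W)
print(context.shape, mean.shape, log_var.shape)
```

Downstream, a sampling-based controller such as MPPI would roll this probabilistic model forward and, in the risk-aware variant the paper uses, penalize trajectories with high predicted variance, which is what makes the robot conservative while the context is still uninformative.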
