Learning to Solve Tasks with Exploring Prior Behaviours

Published 6 Jul 2023 in cs.RO, cs.AI, and cs.LG | arXiv:2307.02889v1

Abstract: Demonstrations are widely used in Deep Reinforcement Learning (DRL) to facilitate solving tasks with sparse rewards. However, real-world tasks often start from initial conditions that differ from those in the demonstration, requiring additional prior behaviours. For example, suppose we are given a demonstration for the task of picking up an object from an open drawer, but during training the drawer is closed. Without acquiring the prior behaviour of opening the drawer, the robot is unlikely to solve the task. To address this, we propose Intrinsic Rewards Driven Example-based Control (IRDEC). Our method endows agents with the ability to explore and acquire the required prior behaviours and then connect them to the task-specific behaviours in the demonstration, solving sparse-reward tasks without requiring additional demonstrations of the prior behaviours. Our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Code is available at https://github.com/Ricky-Zhu/IRDEC.
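The abstract names two ingredients: an intrinsic exploration reward that drives the agent to discover the missing prior behaviours, and an example-based (classifier) reward that takes over near demonstrated states. A minimal sketch of how such a combined signal can be assembled is given below. Everything in it (the RNDBonus and SuccessClassifier modules, the beta coefficient, and the additive combination) is an illustrative assumption drawn from the exploration and example-based-control literature the paper builds on, not IRDEC's actual implementation; see the linked repository for that.

```python
import torch
import torch.nn as nn


class RNDBonus(nn.Module):
    """Random-network-distillation-style novelty bonus: the prediction
    error of a trained network against a fixed random target is large on
    unfamiliar states, yielding an intrinsic exploration reward.
    (Illustrative stand-in, not the paper's exact bonus.)"""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(
            nn.Linear(obs_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim))
        self.predictor = nn.Sequential(
            nn.Linear(obs_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim))
        for p in self.target.parameters():  # the target stays frozen
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # High prediction error on rarely visited states -> big bonus.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)


class SuccessClassifier(nn.Module):
    """Classifier trained on success-state examples (in the spirit of
    example-based control); its output acts as a proxy task reward near
    demonstrated states."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(obs)).squeeze(-1)


def shaped_reward(obs: torch.Tensor, rnd: RNDBonus, clf: SuccessClassifier,
                  beta: float = 0.5) -> torch.Tensor:
    """Hypothetical combined signal: explore while far from anything
    resembling the demonstration, exploit the example-based reward once
    familiar states are reached. `beta` trades off the two terms."""
    with torch.no_grad():
        return clf(obs) + beta * rnd(obs)


if __name__ == "__main__":
    obs = torch.randn(8, 10)  # batch of 8 states with obs_dim = 10
    r = shaped_reward(obs, RNDBonus(10), SuccessClassifier(10))
    print(r.shape)  # torch.Size([8])
```

Used this way, the intrinsic term dominates in unfamiliar regions (e.g., in front of the closed drawer) and decays as states become familiar, while the classifier term grows as the agent reaches states resembling the demonstration, which matches the abstract's "explore, then connect to demonstrated behaviours" narrative.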
