Shaped Policy Search for Evolutionary Strategies using Waypoints

Published 30 May 2021 in cs.RO, cs.LG, and cs.NE | (2105.14639v2)

Abstract: In this paper, we aim to improve exploration in black-box methods, particularly Evolution Strategies (ES), when applied to Reinforcement Learning (RL) problems where intermediate waypoints/subgoals are available. Since ES is highly parallelizable, instead of extracting just a scalar cumulative reward, we use the state-action pairs from the trajectories obtained during rollouts/evaluations to learn the dynamics of the agent. The learned dynamics are then used in the optimization procedure to speed up training. Lastly, we show how our proposed approach is universally applicable by presenting results from experiments conducted on the CARLA driving and UR5 robotic arm simulators.
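
The idea of reusing rollout data can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It is a plain ES loop with antithetic sampling that, besides the scalar return, keeps the state-action transitions from every rollout and fits a dynamics model to them. The toy 1-D environment, the linear policy, the waypoint list, the hidden coefficient 0.1, and all function names (`rollout`, `fit_dynamics`, `es_train`) are assumptions made for illustration. The sketch stops at fitting the model and does not show how the learned dynamics would be fed back into the optimization.

```python
import random


def rollout(theta, waypoints, steps_per_waypoint=10):
    """Run a linear policy on a toy 1-D point (hypothetical environment).

    Returns the cumulative reward and the list of (x, a, x_next) transitions.
    """
    x, total, transitions = 0.0, 0.0, []
    for goal in waypoints:
        for _ in range(steps_per_waypoint):
            a = theta[0] * (goal - x) + theta[1]  # linear policy on waypoint error
            x_next = x + 0.1 * a                  # true dynamics, hidden from the learner
            transitions.append((x, a, x_next))
            total -= abs(goal - x_next)           # reward: stay close to current waypoint
            x = x_next
    return total, transitions


def fit_dynamics(transitions):
    """Least-squares estimate of b in the assumed model x_next - x = b * a."""
    num = sum(a * (x_next - x) for x, a, x_next in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den if den > 0 else 0.0


def es_train(waypoints, iters=60, pop=20, sigma=0.1, lr=0.03, seed=0):
    """Vanilla ES with antithetic perturbations; also collects transitions."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    data = []
    for _ in range(iters):
        grad = [0.0, 0.0]
        for _ in range(pop // 2):
            eps = [rng.gauss(0.0, 1.0) for _ in theta]
            plus = [t + sigma * e for t, e in zip(theta, eps)]
            minus = [t - sigma * e for t, e in zip(theta, eps)]
            r_plus, tr_plus = rollout(plus, waypoints)
            r_minus, tr_minus = rollout(minus, waypoints)
            data.extend(tr_plus)   # keep trajectories, not just the returns
            data.extend(tr_minus)
            for i, e in enumerate(eps):
                grad[i] += (r_plus - r_minus) * e
        # Antithetic gradient estimate: sum over pairs / (2 * n_pairs * sigma)
        theta = [t + lr * g / (pop * sigma) for t, g in zip(theta, grad)]
    return theta, fit_dynamics(data)
```

Calling `es_train([1.0, 2.0])` returns the trained policy parameters together with the fitted dynamics coefficient. Because every perturbed policy contributes transitions, the dynamics fit comes at no extra simulation cost beyond the rollouts ES already performs.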
