Shaped Policy Search for Evolutionary Strategies using Waypoints
Abstract: In this paper, we aim to improve exploration in black-box methods, particularly Evolution Strategies (ES), when applied to Reinforcement Learning (RL) problems where intermediate waypoints/subgoals are available. Because ES is highly parallelizable, many rollouts are evaluated in every iteration; instead of extracting only a scalar cumulative reward from each, we use the state-action pairs from the resulting trajectories to learn the dynamics of the agent. The learnt dynamics are then used in the optimization procedure to speed up training. Lastly, we show that the proposed approach applies across domains by presenting results from experiments on the CARLA driving simulator and a UR5 robotic arm simulator.
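The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D point-mass task, the waypoint values, the linear policy, the one-parameter dynamics model, and every name below (`rollout`, `fit_dynamics`, `es_step`) are assumptions made for illustration. It shows a vanilla ES update that keeps the (state, action, next state) tuples from each rollout alongside the scalar return, and then fits a dynamics model to them. How the learnt model enters the update is not specified in the abstract, so the sketch stops at fitting.

```python
import random

DT = 0.1                      # toy step size (assumption)
WAYPOINTS = [0.5, 1.0, 1.5]   # hypothetical 1-D waypoints


def rollout(theta, steps=30):
    """Run a linear policy a = theta[0] * (goal - s) + theta[1] on a 1-D point.

    Returns the scalar return and the (state, action, next_state) transitions.
    """
    s, total, transitions = 0.0, 0.0, []
    for t in range(steps):
        goal = WAYPOINTS[min(t // 10, len(WAYPOINTS) - 1)]
        a = theta[0] * (goal - s) + theta[1]
        a = max(-5.0, min(5.0, a))    # clip the action to keep the toy task bounded
        s_next = s + DT * a           # true toy dynamics, unknown to the learner
        total -= abs(goal - s_next)   # reward: negative distance to current waypoint
        transitions.append((s, a, s_next))
        s = s_next
    return total, transitions


def fit_dynamics(transitions):
    """Least-squares fit of k in (s_next - s) = k * a."""
    num = sum(a * (s2 - s) for s, a, s2 in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den if den > 0 else 0.0


def es_step(theta, rng, pop=20, sigma=0.1, lr=0.05):
    """One vanilla ES update; also returns every transition seen in the rollouts."""
    noises, returns, data = [], [], []
    for _ in range(pop):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        ret, trans = rollout([p + sigma * e for p, e in zip(theta, eps)])
        noises.append(eps)
        returns.append(ret)
        data.extend(trans)
    mean = sum(returns) / pop
    std = (sum((r - mean) ** 2 for r in returns) / pop) ** 0.5 or 1.0
    # Score-function gradient estimate with standardized returns.
    grad = [
        sum((r - mean) / std * eps[i] for r, eps in zip(returns, noises)) / (pop * sigma)
        for i in range(len(theta))
    ]
    return [p + lr * g for p, g in zip(theta, grad)], data


rng = random.Random(0)
theta, buffer = [0.0, 0.0], []
for _ in range(50):
    theta, data = es_step(theta, rng)
    buffer.extend(data)
k_hat = fit_dynamics(buffer)  # estimate of DT recovered from rollout data
```

In this toy setting the transitions come for free from rollouts that ES already performs, which is the point the abstract makes; a real task would need a richer model than the single coefficient `k_hat`.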