Proximal Policy Optimization with Adaptive Exploration
Abstract: We introduce Proximal Policy Optimization with Adaptive Exploration (axPPO), a novel reinforcement learning algorithm. The paper investigates the exploration-exploitation tradeoff in reinforcement learning and aims to contribute new insights into algorithm design. The proposed adaptive exploration framework dynamically adjusts the exploration magnitude during training based on the agent's recent performance. The method outperforms standard PPO in learning efficiency, particularly when significant exploratory behavior is needed at the beginning of the learning process.
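The abstract does not state the exact adaptation rule, so the following is only a minimal sketch of one way an exploration term could track recent performance. It assumes that exploration is controlled through the entropy coefficient of the usual PPO loss, that recent performance is the mean return over a short window, and that returns can be normalized with known bounds. The names `AdaptiveEntropyCoef`, `return_min`, `return_max`, and `window` are hypothetical and are not taken from the paper or its code.

```python
from collections import deque


class AdaptiveEntropyCoef:
    """Hypothetical schedule: shrink the entropy bonus as recent returns improve.

    This is an illustrative assumption, not the rule used by axPPO.
    """

    def __init__(self, base_coef=0.01, return_min=0.0, return_max=500.0, window=10):
        if return_max <= return_min:
            raise ValueError("return_max must exceed return_min")
        self.base_coef = base_coef
        self.return_min = return_min
        self.return_max = return_max
        # Only the last `window` episode returns count as "recent performance".
        self.recent = deque(maxlen=window)

    def record(self, episode_return):
        """Store the return of a finished episode."""
        self.recent.append(float(episode_return))

    def value(self):
        """Return the current entropy coefficient."""
        if not self.recent:
            # No evidence yet: explore at the full base rate.
            return self.base_coef
        mean_return = sum(self.recent) / len(self.recent)
        # Normalize recent performance to [0, 1] using the assumed bounds.
        score = (mean_return - self.return_min) / (self.return_max - self.return_min)
        score = min(1.0, max(0.0, score))
        # Poor recent performance keeps exploration high; good performance lowers it.
        return self.base_coef * (1.0 - score)


def ppo_loss(policy_loss, value_loss, entropy, entropy_coef, value_coef=0.5):
    """Combine the standard PPO loss terms (to be minimized) with a given entropy weight."""
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

In a training loop, `record` would be called at the end of each episode and `value()` passed as `entropy_coef` when the loss is computed for the next update. Whether exploration should rise or fall with performance is a design choice that this sketch fixes arbitrarily.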