ComSD: Balancing Behavioral Quality and Diversity in Unsupervised Skill Discovery
Abstract: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Unsupervised skill discovery seeks to acquire diverse useful skills without extrinsic reward via unsupervised Reinforcement Learning (RL), so that the discovered skills can efficiently adapt to multiple downstream tasks. However, recent advanced skill discovery methods struggle to balance state exploration and skill diversity, particularly when the potential skills are rich and hard to discern. In this paper, we propose \textbf{Co}ntrastive dyna\textbf{m}ic \textbf{S}kill \textbf{D}iscovery \textbf{(ComSD)}\footnote{Code and videos: https://github.com/liuxin0824/ComSD}, which generates diverse and exploratory unsupervised skills through a novel intrinsic incentive, named the contrastive dynamic reward. It combines a particle-based exploration reward, which drives agents to reach far-away states for exploratory skill acquisition, with a novel contrastive diversity reward that promotes discriminability between different skills. Moreover, a novel dynamic weighting mechanism between these two rewards is proposed to balance state exploration and skill diversity, further enhancing the quality of the discovered skills. Extensive experiments and analysis demonstrate that ComSD generates diverse behaviors at different exploratory levels for multi-joint robots, enabling state-of-the-art adaptation performance on challenging downstream tasks. It also discovers distinguishable, far-reaching exploration skills in a challenging tree-like 2D maze.
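The two reward terms and their dynamic combination described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not ComSD's actual implementation: the k-NN particle entropy estimate, the NCE-style contrastive score, the fixed `alpha` weighting, and all function names are assumptions about the general shape of such methods rather than details taken from the paper.

```python
import numpy as np

def logsumexp(x, axis=-1):
    # Numerically stable log-sum-exp over the given axis.
    m = x.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis) + np.log(np.exp(x - m).sum(axis=axis))

def particle_exploration_reward(states, k=12):
    # Particle-based (k-nearest-neighbor) entropy estimate over a batch of
    # state embeddings: reward grows with distance to the k nearest neighbors,
    # pushing the agent toward far-away, rarely visited states.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    knn = np.sort(dists, axis=1)[:, 1:k + 1]  # drop the zero self-distance
    return np.log(1.0 + knn.mean(axis=1))

def contrastive_diversity_reward(state_emb, skill_emb, temperature=0.5):
    # NCE-style contrastive score: log-probability that each state embedding
    # matches its own skill embedding rather than the other skills in the
    # batch, encouraging skills to produce discriminable behaviors.
    logits = state_emb @ skill_emb.T / temperature
    idx = np.arange(len(state_emb))
    return logits[idx, idx] - logsumexp(logits, axis=1)

def contrastive_dynamic_reward(states, state_emb, skill_emb, alpha):
    # Weighted combination of the two terms; in a dynamic weighting scheme,
    # alpha in [0, 1] would be adjusted over the course of training rather
    # than held fixed as in this sketch.
    return (alpha * particle_exploration_reward(states)
            + (1.0 - alpha) * contrastive_diversity_reward(state_emb, skill_emb))
```

Note that the exploration term depends only on where the batch of states lies, while the diversity term depends on how well states can be attributed to the skill that produced them; the weighting trades off these two objectives.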