Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
Abstract: Go-Explore is a powerful family of algorithms for hard-exploration problems, built on the principle of archiving discovered states and iteratively returning to and exploring from the most promising ones. This approach has achieved superhuman performance across a wide variety of challenging problems, including Atari games and robotic control, but requires manually designed heuristics to guide exploration (i.e., to determine which states to save and explore from, and which actions to consider next), which is time-consuming and infeasible in general. To resolve this, we propose Intelligent Go-Explore (IGE), which greatly extends the scope of the original Go-Explore by replacing these handcrafted heuristics with the intelligence and internalized human notions of interestingness captured by giant pretrained foundation models (FMs). This gives IGE a human-like ability to instinctively identify how interesting or promising any new state is (e.g., discovering new objects, locations, or behaviors), even in complex environments where heuristics are hard to define. Moreover, IGE offers the exciting opportunity to recognize and capitalize on serendipitous discoveries: states encountered during exploration that are valuable for further exploration, yet whose interesting qualities the human user did not anticipate. We evaluate our algorithm on a diverse range of language- and vision-based tasks that require search and exploration. Across these tasks, IGE strongly exceeds classic reinforcement learning and graph search baselines, and also succeeds where prior state-of-the-art FM agents like Reflexion completely fail. Overall, Intelligent Go-Explore combines the tremendous strengths of FMs with the powerful Go-Explore algorithm, opening up a new frontier of research into creating more generally capable agents with impressive exploration capabilities.
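The abstract describes a loop of archiving states, selecting a promising one, returning to it, and exploring from it, with a foundation model replacing the handcrafted heuristics at each choice point. The sketch below illustrates that loop's structure only; `ToyEnv`, `query_fm`, and all method names are hypothetical stand-ins (the FM call is stubbed with a random choice), not the paper's actual interfaces or prompts.

```python
import random

class ToyEnv:
    """Hypothetical stand-in environment: states are integers, actions add -1 or +1."""
    def __init__(self):
        self.state = 0
    def reset(self):
        self.state = 0
        return self.state
    def restore(self, state):        # Go-Explore assumes we can return to archived states
        self.state = state
    def legal_actions(self):
        return [-1, +1]
    def step(self, action):
        self.state += action
        return self.state

def query_fm(prompt, options):
    """Placeholder for a foundation-model call that would rank options by
    'interestingness'; stubbed here with a uniform random choice."""
    return random.choice(options)

def intelligent_go_explore(env, iterations=20, steps_per_explore=3):
    archive = [env.reset()]          # archive of discovered states
    for _ in range(iterations):
        # 1. FM picks the most promising archived state to explore from.
        state = query_fm("Which archived state is most promising?", archive)
        env.restore(state)           # 2. Return to that state.
        for _ in range(steps_per_explore):
            # 3. FM picks which action to try next.
            action = query_fm("Which action is most interesting?", env.legal_actions())
            new_state = env.step(action)
            # 4. FM judges whether the new state is interestingly new; if so, archive it.
            if new_state not in archive and query_fm("Is this state interestingly new?", [True, False]):
                archive.append(new_state)
    return archive

print(intelligent_go_explore(ToyEnv()))
```

In IGE itself, each `query_fm` call would prompt the FM with a textual or visual description of the candidate states or actions; the key design choice the abstract highlights is that all three decisions (state selection, action selection, and archiving) are delegated to the FM's notion of interestingness rather than to hand-designed heuristics.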
- Constitutional AI: Harmlessness from AI feedback, 2022.
- The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, June 2013. ISSN 1076-9757. doi: 10.1613/jair.3912. URL http://dx.doi.org/10.1613/jair.3912.
- Graph of thoughts: Solving elaborate problems with large language models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16):17682–17690, March 2024a. ISSN 2159-5399. doi: 10.1609/aaai.v38i16.29720. URL http://dx.doi.org/10.1609/aaai.v38i16.29720.
- On the opportunities and risks of foundation models. ArXiv, 2021. URL https://crfm.stanford.edu/assets/report.pdf.
- Quality-diversity through ai feedback, 2023.
- Language models are few-shot learners, 2020.
- Grounding large language models in interactive environments with online reinforcement learning. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 3676–3713. PMLR, 23–29 Jul 2023. URL https://proceedings.mlr.press/v202/carta23a.html.
- A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology, 15(3):1–45, 2024.
- Seth Cooper. A framework for scientific discovery through video games. Morgan & Claypool, 2014.
- TextWorld: A learning environment for text-based games. CoRR, abs/1806.11532, 2018.
- A survey on in-context learning. arXiv preprint arXiv:2301.00234, 2022.
- First return, then explore. Nature, 590:580–586, 02 2021a. doi: 10.1038/s41586-020-03157-9.
- Go-explore: a new approach for hard-exploration problems, 2021b.
- Cell-free latent go-explore, 2023.
- Stream of search (SoS): Learning to search in language, 2024.
- Thought Cloning: Learning to think while acting by imitating human thinking. Advances in Neural Information Processing Systems, 36, 2024.
- Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022.
- Imitation learning: A survey of learning methods. ACM Comput. Surv., 50(2), apr 2017. ISSN 0360-0300. doi: 10.1145/3054912. URL https://doi.org/10.1145/3054912.
- General intelligence requires rethinking exploration. Royal Society Open Science, 10(6):230539, 2023.
- Motif: Intrinsic motivation from artificial intelligence feedback, 2023.
- Can large language models explore in-context?, 2024.
- The NetHack learning environment. Advances in Neural Information Processing Systems, 33:7671–7684, 2020.
- Exploration in deep reinforcement learning: A survey. Information Fusion, 85:1–22, 2022.
- RLAIF: Scaling reinforcement learning from human feedback with AI feedback, 2024. URL https://openreview.net/forum?id=AAxIs3D2ZZ.
- Beyond A*: Better planning with transformers via search dynamics bootstrapping, 2024.
- Retrieval-augmented generation for knowledge-intensive NLP tasks. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 9459–9474. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf.
- AgentBench: Evaluating LLMs as agents, 2023.
- Go-explore complex 3-d game environments for automated reachability testing. IEEE Transactions on Games, 16(1):235–240, 2024. doi: 10.1109/TG.2022.3228401.
- In-context learning and induction heads. arXiv preprint arXiv:2209.11895, 2022.
- OpenAI. GPT-4 technical report, 2024.
- Neural map: Structured memory for deep reinforcement learning. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Bk9zbyZCZ.
- Reflexion: Language agents with verbal reinforcement learning, 2023.
- Reinforcement Learning: An Introduction. The MIT Press, second edition, 2018. URL http://incompleteideas.net/book/the-book-2nd.html.
- CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Jill Burstein, Christy Doran, and Thamar Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4149–4158, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1421. URL https://aclanthology.org/N19-1421.
- Gemini Team. Gemini: A family of highly capable multimodal models, 2024.
- Breadcrumbs to the goal: Goal-conditioned exploration from human-in-the-loop feedback. In A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 63222–63258. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/file/c7c7cf10082e454b9662a686ce6f1b6f-Paper-Conference.pdf.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
- A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6):1–26, 2024.
- Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022.
- A comprehensive study of multimodal large language models for image quality assessment. arXiv preprint arXiv:2403.10854, 2024.
- Tree of Thoughts: Deliberate problem solving with large language models, 2023a.
- ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, 2023b. URL https://openreview.net/forum?id=WE_vluYUL-X.
- OMNI: Open-endedness via models of human notions of interestingness. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=AgM3MzT99c.
- Calibrate before use: Improving few-shot performance of language models. In International conference on machine learning, pages 12697–12706. PMLR, 2021.
- Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 46595–46623. Curran Associates, Inc., 2023a. URL https://proceedings.neurips.cc/paper_files/paper/2023/file/91f18a1287b398d378ef22505bf41832-Paper-Datasets_and_Benchmarks.pdf.
- MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592, 2023.
- Bootstrap methods and applications. IEEE Signal Processing Magazine, 24(4):10–19, 2007.