diff History for Neural Language Agents
Abstract: Neural language models (LMs) offer an exciting solution for general-purpose embodied control. However, a key technical issue arises when using an LM-based controller: environment observations must be converted to text, which, coupled with interaction history, results in long and verbose textual prompts. As a result, prior work on LM agents is limited to restricted domains with small observation sizes and minimal need for interaction history or instruction tuning. In this paper, we introduce diff history, a simple and highly effective solution to these issues. By applying the Unix diff command to consecutive text observations in the interaction histories used to prompt LM policies, we both abstract away redundant information and focus the content of textual inputs on the salient changes in the environment. On NetHack, an unsolved video game that requires long-horizon reasoning for decision-making, LMs tuned with diff history match state-of-the-art performance for neural agents while needing 1800x fewer training examples than prior work. Even on the simpler BabyAI-Text environment with concise text observations, we find that although diff history increases the length of prompts, the representation it provides offers a 25% improvement in the efficiency of low-sample instruction tuning. Further, we show that diff history scales favorably across tuning dataset sizes. We open-source our code and data at https://diffhistory.github.io.
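The core idea above — replacing each full text observation in the history with its diff against the previous one — can be sketched with Python's standard-library `difflib`, which produces unified diffs in the same style as the Unix diff command. This is an illustrative sketch, not the paper's implementation; the `diff_history` helper and the toy observations are hypothetical:

```python
import difflib

def diff_history(observations):
    """Render an interaction history as the first full observation
    followed by unified diffs between consecutive observations
    (hypothetical helper illustrating the diff-history idea)."""
    history = [observations[0]]
    for prev, curr in zip(observations, observations[1:]):
        diff = difflib.unified_diff(
            prev.splitlines(), curr.splitlines(),
            lineterm="",  # no trailing newlines on diff lines
            n=0,          # zero context lines: keep only the changes
        )
        # Drop the "---"/"+++" file headers; keep only the hunks,
        # so the prompt contains just the salient changes.
        history.append("\n".join(list(diff)[2:]))
    return history

# Toy consecutive observations: only the agent's position changes,
# so the second history entry collapses to a two-line diff hunk.
obs = [
    "Agent @ (2, 3)\nHP: 14/14\nYou see a door.",
    "Agent @ (2, 4)\nHP: 14/14\nYou see a door.",
]
print(diff_history(obs)[1])
```

The redundant lines (`HP: 14/14`, `You see a door.`) vanish from every step after the first, which is how the representation keeps long interaction histories compact.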