
diff History for Neural Language Agents

Published 12 Dec 2023 in cs.AI, cs.CL, and cs.LG | arXiv:2312.07540v3

Abstract: Neural language models (LMs) offer an exciting solution for general-purpose embodied control. However, a key technical issue arises when using an LM-based controller: environment observations must be converted to text, which, coupled with history, results in long and verbose textual prompts. As a result, prior work in LM agents is limited to restricted domains with small observation size as well as minimal needs for interaction history or instruction tuning. In this paper, we introduce diff history, a simple and highly effective solution to these issues. By applying the Unix diff command on consecutive text observations in the interaction histories used to prompt LM policies, we can both abstract away redundant information and focus the content of textual inputs on the salient changes in the environment. On NetHack, an unsolved video game that requires long-horizon reasoning for decision-making, LMs tuned with diff history match state-of-the-art performance for neural agents while needing 1800x fewer training examples compared to prior work. Even on the simpler BabyAI-Text environment with concise text observations, we find that although diff history increases the length of prompts, the representation it provides offers a 25% improvement in the efficiency of low-sample instruction tuning. Further, we show that diff history scales favorably across different tuning dataset sizes. We open-source our code and data at https://diffhistory.github.io.


Summary

  • The paper introduces a diff-based textual compression method that streamlines extensive environment observations for language agents.
  • The proposed technique extends effective interaction histories up to four times, significantly enhancing performance in complex environments like NetHack.
  • The integration of residual histories supports improved decision-making, paving the way for generalized, embodied control in future research.

Introduction to Long-Context Language Agents

Language models (LMs) are increasingly being considered for applications beyond simple text generation, particularly in embodied settings where agents observe and act in an environment. A central challenge in this space is processing lengthy histories of environment observations to inform decision-making. Because these observational histories are long and inefficiently represented, traditional approaches run into computational constraints, restricting prior work to domains with small observation spaces or minimal reliance on interaction history.

A Novel Approach to Textual Compression

To address these limitations, the authors introduce a simple, highly effective method for compressing textual observations from consecutive environment interactions, which tend to be highly redundant. Drawing on the Unix diff command, the technique distills consecutive observations into a streamlined representation that retains only the differences (or residuals) between them. Applied to NetHack, a complex, decision-intensive video game environment, this compressive strategy, termed "diff history", delivers strong results: language agents using diff history could fit up to four times more interaction history into the same context, which translated directly into performance improvements.
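The core transformation can be sketched with Python's standard `difflib` module, whose `unified_diff` output closely mirrors the Unix diff command the paper builds on (a minimal illustrative sketch, not the authors' implementation; the observation strings below are made-up NetHack-style messages):

```python
import difflib

def diff_observation(prev_obs: str, curr_obs: str) -> str:
    """Return a unified diff of two consecutive text observations.

    Lines shared by both observations are dropped, so the result
    keeps only the salient changes in the environment.
    """
    diff_lines = difflib.unified_diff(
        prev_obs.splitlines(),
        curr_obs.splitlines(),
        lineterm="",  # no trailing newlines on hunk headers
        n=0,          # zero context lines: emit only changed lines
    )
    # Skip the "---"/"+++" file-name header lines.
    return "\n".join(
        line for line in diff_lines
        if not line.startswith(("---", "+++"))
    )

prev = "HP: 14/14\nYou see a door to the east.\nA kobold appears!"
curr = "HP: 12/14\nYou see a door to the east.\nYou hit the kobold."
print(diff_observation(prev, curr))
```

Only the changed HP line and the new event survive; the unchanged "door" line is elided. In a diff-history prompt, the first observation is kept in full and later observations are replaced by such diffs, interleaved with the agent's actions.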

The Benefits of Residual Interaction Histories

Integrating diff histories into language models yielded substantial gains over prior state-of-the-art baselines. Language agents tuned with diff history outperformed counterparts trained on raw observational history as well as those using visual observations, matching state-of-the-art neural agents on NetHack while requiring roughly 1800x fewer training examples. These findings underscore the value of compressed, residual-based interaction histories for integrating language models into decision-making tasks.

Conclusion and Future Directions

This paper introduces and applies diff history to language agents, demonstrating a considerable improvement in their ability to operate effectively in complex environments. The approach is a promising step toward general-purpose, embodied control through language models. Future research may extend it to broader applications, cross-modal integration, and planning tasks that combine residual understanding with predictions of future environment states.
