Handling Limited Context Windows in Approximate ICRL
Develop methods that let the Approximate In-Context Reinforcement Learning (ICRL) algorithm operate when the interaction history exceeds the language model's context window, enabling robust deployment over extended interactions without assuming unbounded context capacity.
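One simple baseline for this setting is to subsample the stored episode history so that the serialized prompt fits a fixed token budget. The sketch below is a hypothetical illustration, not the paper's method: the `fit_context` function, the `(observation, action, reward)` episode format, and the whitespace word count used as a token proxy are all assumptions.

```python
import random

def fit_context(episodes, max_tokens, seed=0):
    """Randomly subsample past (observation, action, reward) episodes so
    the serialized prompt stays within a token budget.  Token counts are
    approximated by whitespace word counts, a stand-in for a real
    tokenizer.  Hypothetical sketch; format and names are assumptions."""
    rng = random.Random(seed)

    def cost(ep):
        obs, action, reward = ep
        return len(f"{obs} -> {action} (reward {reward})".split())

    # Visit episodes in random order and greedily keep those that fit,
    # so the retained subset is a roughly uniform sample of history.
    order = list(range(len(episodes)))
    rng.shuffle(order)
    kept, used = [], 0
    for i in order:
        c = cost(episodes[i])
        if used + c <= max_tokens:
            kept.append(i)
            used += c

    kept.sort()  # restore chronological order in the prompt
    prompt = "\n".join(
        f"{obs} -> {a} (reward {r})"
        for obs, a, r in (episodes[i] for i in kept)
    )
    return prompt, used
```

Because the sample is drawn fresh each time the prompt is built, the policy still conditions on a representative slice of the full history even when only a fraction of it fits in the window.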
References
Our work also lays out open questions as far as the use of computational resources. However, Approximate ICRL left open the problem of working with a limited context window, a critical problem for deploying these methods for extended periods with many interactions.
— LLMs Are In-Context Bandit Reinforcement Learners
(Monea et al., 2024, arXiv:2410.05362), Section 6 (Discussion and Limitations)