Effect of observation history on longer-horizon tasks
Determine how incorporating observation history affects the performance of LLM-based web agents on tasks with horizons longer than 15 steps, and characterize how performance scales with history length in such settings.
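To make "history length" concrete, the following minimal sketch assumes that an agent's context at each step consists of the current observation plus a fixed number of preceding observations. The names build_context and history_length, and the placeholder observation strings, are hypothetical illustrations and are not taken from the cited paper.

```python
from typing import List


def build_context(observations: List[str], history_length: int) -> List[str]:
    """Return the current observation plus up to `history_length` earlier ones.

    Assumption: observations are ordered oldest to newest, and the last
    element is the current observation.
    """
    if history_length < 0:
        raise ValueError("history_length must be non-negative")
    if not observations:
        return []
    window = history_length + 1  # current observation + retained history
    return observations[-window:]


# Hypothetical 20-step episode, i.e. longer than the 15-step horizon.
episode = [f"obs_{t}" for t in range(20)]

# Contexts the agent would see at the final step for several history lengths.
contexts = {k: build_context(episode, k) for k in (0, 1, 4, 19)}
```

Sweeping history_length over such a grid and recording task success at each setting is one possible way to trace how performance scales with history length on longer-horizon tasks.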
References
Furthermore, WorkArena L1 involves tasks of up to 15 steps, and the effect of observation history on longer-horizon tasks remains unknown.
— Read More, Think More: Revisiting Observation Reduction for Web Agents
(2604.01535 - Enomoto et al., 2 Apr 2026) in Limitation, Section: Observation history with richer representations