Hierarchical Hindsight Reflection

This presentation introduces Hierarchical Hindsight Reflection (H²R), a methodology that enables large language model agents to learn from experience through a dual-memory architecture. By separating high-level planning knowledge from low-level execution strategies, H²R allows agents to distill, organize, and reuse insights from past interactions. We explore how this hierarchical design enables compositional learning, examine the reflection process that builds these memories, and demonstrate empirical results showing substantial performance gains on complex planning benchmarks.
Script
Large language model agents face a fundamental challenge: how do you learn from past experience in a way that transfers to new tasks? Most approaches store everything in one flat memory, losing the distinction between what to do and how to do it. Hierarchical Hindsight Reflection solves this by building two separate but complementary memories, one for strategic planning and one for precise execution.
H²R decouples agent memory into two complementary hierarchies. The high-level memory holds abstract task breakdowns and planning insights, answering the question of which subgoals to pursue. The low-level memory stores successful action sequences tied to specific subgoals, grounding those intentions into executable steps. This separation is the key: when facing a new task, the agent can independently retrieve relevant strategies and concrete skills, then recombine them in novel ways.
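The dual-memory layout described above can be sketched as two simple record types. This is an illustrative assumption about the schema, not the paper's actual data structures; the field names here are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of H²R's two memory hierarchies.
# Field names are illustrative, not the paper's schema.

@dataclass
class HighLevelEntry:
    task_description: str   # what the original task asked for
    subgoals: list          # the inferred task decomposition
    planning_insight: str   # lesson about which subgoals to pursue

@dataclass
class LowLevelEntry:
    subgoal: str            # the subgoal this skill achieves
    actions: list           # concrete action sequence that worked
    execution_insight: str  # lesson about how to execute it

@dataclass
class DualMemory:
    high: list = field(default_factory=list)  # strategic planning memory
    low: list = field(default_factory=list)   # execution skill memory
```

The separation means each list can be queried on its own: the high-level memory by task description, the low-level memory by subgoal.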
How does the agent construct these memories in the first place?
H²R builds its memories through a process of nested reflection over past interactions. First, it infers subgoals from both successful and failed episodes, then contrasts those decompositions to extract planning insights. Next, it partitions successful trajectories by subgoal, comparing execution details to distill low-level insights. Crucially, this entire process operates through iterative prompting: no trainable parameters are updated. The result is a structured, queryable memory built purely from experience.
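The nested reflection loop can be sketched in a few lines, assuming a generic `llm(prompt)` callable (stubbed in practice) and episodes that carry a `segments` list pairing each subgoal with its actions. The prompts and helpers here are hypothetical placeholders, not the paper's actual prompts.

```python
# Sketch of H²R-style nested reflection, under simplifying assumptions:
# `llm` is any text-in/text-out callable, and each successful episode
# already records (subgoal, actions) segments.
def reflect(episodes, llm):
    high_memory, low_memory = [], []

    # 1) Infer a subgoal decomposition for every episode via prompting.
    for ep in episodes:
        ep["subgoals"] = llm(f"Infer subgoals for trajectory: {ep['trajectory']}")

    # 2) Contrast successful vs. failed decompositions of the same task
    #    to extract high-level planning insights.
    successes = [e for e in episodes if e["success"]]
    failures = [e for e in episodes if not e["success"]]
    for s in successes:
        for f in (f for f in failures if f["task"] == s["task"]):
            insight = llm(f"Contrast plans: {s['subgoals']} vs {f['subgoals']}")
            high_memory.append({"task": s["task"],
                                "subgoals": s["subgoals"],
                                "insight": insight})

    # 3) Partition each successful trajectory by subgoal and distill
    #    a low-level execution insight per subgoal.
    for s in successes:
        for subgoal, actions in s["segments"]:
            low_memory.append({"subgoal": subgoal,
                               "actions": actions,
                               "insight": llm(f"Distill skill for {subgoal}: {actions}")})

    return high_memory, low_memory
```

Note that the only learning signal is the contrast between episodes; the underlying model weights never change, which keeps the method compatible with closed API-only models.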
At test time, H²R performs independent vector-based retrieval over each memory hierarchy. When planning, the agent queries high-level memory using the task description, pulling in subgoal decompositions and planning insights from similar past tasks. When executing, it queries low-level memory using the current subgoal, retrieving concrete action sequences that successfully achieved that objective before. This dual retrieval enables flexible, compositional transfer.
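The dual retrieval step can be sketched with plain cosine similarity. The `embed` function below is a deliberately crude bag-of-letters stand-in for a real sentence embedder; everything here is an assumption for illustration, not H²R's actual retriever.

```python
import math

def embed(text):
    """Toy embedding: letter-frequency vector (stand-in for a real encoder)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memory, query, key, k=1):
    """Return the k entries whose `key` field is most similar to `query`."""
    q = embed(query)
    return sorted(memory, key=lambda e: cosine(embed(e[key]), q), reverse=True)[:k]

# Planning: query high-level memory with the task description.
# Execution: query low-level memory with the current subgoal.
```

Because the two memories are queried independently, a plan retrieved from one past task can be grounded with skills retrieved from entirely different past tasks.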
The results validate the hierarchical approach. H²R achieves 75.9% success on household tasks in AlfWorld and 80.5% on strategic planning in PDDLGame, substantially outperforming flat memory baselines. Ablation studies reveal that both memory levels are essential: removing high-level memory drops performance by 27.7 points, and removing low-level memory costs 19.4 points. The architecture enables true compositional transfer: agents recombine abstract plans and concrete skills from different past experiences to solve novel challenges.
Hierarchical Hindsight Reflection shows that learning from experience is not just about storing more; it is about organizing knowledge at the right level of abstraction. By separating what to do from how to do it, agents gain the flexibility to recombine insights in ways that monolithic memories cannot support. Visit EmergentMind.com to explore this research further and create your own AI video presentations.