Memory-Augmented Planning
- Memory-Augmented Planning is a class of methods that integrate structured memory modules with planning agents, recording historical context to enhance decision-making.
- It employs techniques like constraint pinning, iterative feedback accumulation, and retrieval from past episodes to refine planning in complex environments.
- These approaches improve performance across applications such as robotics, navigation, and scheduling, yielding measurable gains in success rates and efficiency.
Memory-augmented planning refers to a class of methods and system architectures in artificial intelligence that explicitly leverage structured memory states to improve sequential decision-making, constraint tracking, and adaptation during planning tasks. Recent research demonstrates that incorporating external or context-sensitive memory—distinct from standard model parameters or short-term internal states—provides substantial gains in both classic algorithmic planning and new LLM-powered agent frameworks.
1. Foundational Concepts and Definitions
Memory-augmented planning is characterized by the integration of structured memory modules alongside (or within) a planning agent, which can take the form of neural controllers, LLM-based agents, or hybrid pipelines. The memory acts as an external or distributed workspace that records historical context: symbolic constraints, failed attempts, feedback logs, perceptual information, or environmental states. Architectures range from classic differentiable neural computers with hard attention (Tanneberg et al., 2019), to modular multi-agent frameworks with role-separated memory banks (Fan et al., 1 Nov 2025), and retrieval-augmented LLM agents storing episodic trajectories (Kagaya et al., 2024). Key mechanisms include:
- Constraint pinning: Ensuring all hard constraints persist across multistep reasoning cycles (e.g., via static rule memory).
- Iterative feedback accumulation: Recording verification output after each candidate plan, guiding successive plan refinements.
- Environmental knowledge graphs: Encoding spatio-temporal context for embodied or navigation agents (Yoo et al., 10 Sep 2025, Lei et al., 14 Feb 2025).
- Multimodal and hierarchical memory: Integrating textual, visual, and cross-step historical cues in deep memory banks (Dong et al., 9 Oct 2025, Wang et al., 2023).
This separation between memory and computation enables the agent to avoid "constraint drift," recover from trial-and-error, and generalize to long-horizon or partially observable domains.
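The first two mechanisms above can be made concrete with a minimal sketch. The `PlannerMemory` class and its method names are illustrative, not taken from any cited system:

```python
from dataclasses import dataclass, field

@dataclass
class PlannerMemory:
    """Minimal memory bank separating pinned constraints from mutable feedback."""
    constraints: frozenset = frozenset()          # written once per query, never mutated
    feedback: list = field(default_factory=list)  # log of (plan, score, errors) tuples

    def pin_constraints(self, constraints):
        # Constraint pinning: freeze hard constraints for the whole query,
        # so they cannot drift across reasoning cycles.
        self.constraints = frozenset(constraints)

    def record_feedback(self, plan, score, errors):
        # Iterative feedback accumulation: append verifier output each cycle.
        self.feedback.append((plan, score, errors))

    def reset(self):
        # Per-query reset: feedback is discarded; constraints are re-pinned
        # for the next query.
        self.feedback.clear()
        self.constraints = frozenset()

mem = PlannerMemory()
mem.pin_constraints(["start<=9am", "budget<=100"])
mem.record_feedback("plan-v1", 60, ["budget exceeded"])
```

The key design point is the asymmetry: constraints are immutable for the lifetime of a query, while the feedback log only grows and is cleared at query boundaries.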
2. Memory Module Design and Lifecycle
Memory-augmented planning frameworks instantiate memory banks with explicit representations, defined update rules, and controlled access protocols.
Dual-Evolving Memory (EvoMem) (Fan et al., 1 Nov 2025):
- Constraint Memory (CMem): Static set of constraints per query; written once by the constraint extractor and read on every plan generation/verification turn.
- Query-feedback Memory (QMem): Dynamic log of (plan, score, errors) tuples; updated at each failed verification, read by the actor for correction, reset per query.
Temporal Knowledge Graphs (ExRAP) (Yoo et al., 10 Sep 2025), STMA (Lei et al., 14 Feb 2025):
- Knowledge graphs encode (subject, relation, object, timestamp) quadruples, updating with new observations and removing contradicted facts.
- Temporal summarization via LLM modules or relation extractors yields compact beliefs for the planner.
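A toy version of such a temporal store, keeping only the most recent fact per (subject, relation) key, might look as follows; the class and its API are hypothetical stand-ins for the learned relation extractors in the cited systems:

```python
class TemporalKG:
    """Toy temporal knowledge graph over (subject, relation, object, t) quadruples."""

    def __init__(self):
        self.facts = {}  # (subject, relation) -> (object, timestamp)

    def observe(self, subject, relation, obj, t):
        # A newer observation of the same (subject, relation) overwrites the
        # older, now-contradicted fact.
        key = (subject, relation)
        if key not in self.facts or t >= self.facts[key][1]:
            self.facts[key] = (obj, t)

    def belief(self, subject, relation):
        # Compact current belief handed to the planner.
        entry = self.facts.get((subject, relation))
        return entry[0] if entry else None

kg = TemporalKG()
kg.observe("mug", "location", "table", t=1)
kg.observe("mug", "location", "sink", t=5)   # contradicts the t=1 fact
```

Real systems additionally summarize the surviving facts into natural-language beliefs via an LLM module; the eviction rule here is a deliberate simplification.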
Multimodal Key-Value Stores (JARVIS-1 (Wang et al., 2023), MapAgent (Kong et al., 29 Jul 2025), RAP (Kagaya et al., 2024)):
- Repositories store episodic task records, objects, visual context, and plans.
- Retrieval is performed via text and, when available, visual embedding similarity (typically CLIP or SBERT).
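Embedding-similarity retrieval over such a store reduces to ranking records by cosine similarity. In the sketch below the 3-d vectors are toy embeddings standing in for SBERT or CLIP outputs, and the function names are illustrative:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(memory, query_vec, k=2):
    # Return the k episodic records most similar to the query embedding.
    ranked = sorted(memory, key=lambda rec: cosine(rec["emb"], query_vec),
                    reverse=True)
    return ranked[:k]

episodes = [
    {"task": "make coffee",   "emb": [0.9, 0.1, 0.0]},
    {"task": "wash dishes",   "emb": [0.1, 0.9, 0.0]},
    {"task": "brew espresso", "emb": [0.8, 0.2, 0.1]},
]
top = retrieve(episodes, [1.0, 0.0, 0.0], k=2)
```

Production systems replace the linear scan with an approximate-nearest-neighbor index once the episode store grows.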
Hierarchical Memory (UniWM (Dong et al., 9 Oct 2025)):
- Intra-step (short-term) memory captures perceptual cues from the current scene.
- Cross-step (long-term) memory aggregates intra-step banks to encode extended trajectory context via timestamped layers.
The memory lifecycle includes initialization, writing/append, retrieval for in-context reasoning, and explicit reset or consolidation (per query, episode, or time horizon).
3. Algorithms and Planner Integration
Memory-augmented planners employ explicit interfaces between memory banks and decision modules—often in a multistage or multi-agent workflow.
Tri-agent Loop (EvoMem) (Fan et al., 1 Nov 2025):
```
function SolveQuery(query, T):
    CMem ← ConstraintExtractor(query)        # static constraint memory, written once
    QMem ← []                                # dynamic query-feedback memory
    for t in 1..T:
        plan_t ← Actor(query, CMem, QMem)
        (score_t, errors_t) ← Verifier(plan_t, CMem)
        if score_t == 100:
            return plan_t
        append (plan_t, score_t, errors_t) to QMem
    return BestOf(QMem) or plan_T            # fall back to the best-scoring attempt
```
Exploratory Retrieval-Augmented Planning (ExRAP) (Yoo et al., 10 Sep 2025):
- Planning is driven by maximizing a composite criterion integrating exploitation value (task progress) and exploration bonus (mutual information gain on environmental queries).
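A minimal sketch of this composite scoring, with an explicit exploration weight: the function, the weight `beta`, and the candidate values below are illustrative, not figures from the paper:

```python
def composite_score(task_value, info_gain, beta=0.5):
    # Exploitation (task progress) plus an exploration bonus proportional to
    # the expected information gain on environmental queries.
    return task_value + beta * info_gain

candidates = {
    "goto_kitchen": {"task_value": 0.6, "info_gain": 0.1},
    "open_cabinet": {"task_value": 0.3, "info_gain": 0.9},
}
best = max(candidates, key=lambda a: composite_score(**candidates[a]))
```

With this weighting the lower-progress but more informative action wins, which is exactly the behavior the exploration bonus is meant to induce in stale or partially observed environments.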
MapAgent Coarse-to-Fine Pipeline (Kong et al., 29 Jul 2025):
- Coarse-grain planning decomposes tasks into app-specific subtasks.
- Fine-grain planning uses retrieved page-memory chunks to generate GUI-aware plans.
- Execution alternates LLM-generated decision steps with judge-mediated evaluation.
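The three stages above can be sketched as a loop; every function, subtask name, and page-memory entry here is a hypothetical placeholder for what the LLM components would produce:

```python
def coarse_plan(task):
    # Coarse stage: decompose the task into app-specific subtasks
    # (hard-coded here in place of an LLM call).
    return ["open_maps_app", "search_destination", "start_navigation"]

def fine_plan(subtask, page_memory):
    # Fine stage: specialize a subtask using retrieved page-memory chunks.
    return {"subtask": subtask, "ui_hints": page_memory.get(subtask, [])}

def judge(step_result):
    # Judge-mediated evaluation: accept a step only if grounded in UI hints.
    return bool(step_result["ui_hints"])

page_memory = {
    "open_maps_app": ["home_screen: tap Maps icon"],
    "search_destination": ["maps: tap search bar", "type query"],
    "start_navigation": ["results: tap Directions"],
}
executed = []
for sub in coarse_plan("navigate to the museum"):
    step = fine_plan(sub, page_memory)
    if judge(step):
        executed.append(step["subtask"])
```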
Retrieval-Augmented Planning (RAP) (Kagaya et al., 2024):
- At each step, the Reasoner queries memory for similar successful episodes, retrieves locally relevant subsequences, and incorporates them as demonstrations for the LLM/VLM Executor.
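The distinctive step is subsequence retrieval: rather than injecting a whole episode, the Reasoner pulls only the locally relevant window around the best-matching step. A sketch, using token overlap as a stand-in for embedding similarity (names and trajectories are illustrative):

```python
def relevant_subsequence(episode, current_obs, window=1):
    # Find the stored step whose observation best matches the current one,
    # then return the surrounding window as an in-context demonstration.
    def overlap(a, b):
        return len(set(a.split()) & set(b.split()))
    best = max(range(len(episode)),
               key=lambda i: overlap(episode[i]["obs"], current_obs))
    lo, hi = max(0, best - window), min(len(episode), best + window + 1)
    return episode[lo:hi]

trajectory = [
    {"obs": "you are in the hallway",  "act": "go to kitchen"},
    {"obs": "you see a closed fridge", "act": "open fridge"},
    {"obs": "the fridge is open",      "act": "take milk"},
]
demo = relevant_subsequence(trajectory, "a closed fridge is in front of you")
```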
Memory Augmented Control Networks (MACN) (Khan et al., 2017), Symbolic Neural Architectures (Tanneberg et al., 2019):
- Explicit read/write heads implement search and backtrack routines, storing landmark states or planning traces for partial observability and long-horizon combinatorial tasks.
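A toy analogue of such a backtracking memory, with explicit writes on the way forward and reads when a dead end is hit; this is a hand-written stand-in for the learned read/write heads, not the actual architectures:

```python
class LandmarkMemory:
    """Stack of landmark states written during forward search and read back
    when the planner must backtrack from a dead end."""

    def __init__(self):
        self.trace = []

    def write(self, state):
        # Write head: push the current landmark onto the planning trace.
        self.trace.append(state)

    def backtrack(self):
        # Read head: drop the dead-end state and resume from the previous landmark.
        self.trace.pop()
        return self.trace[-1]

mem = LandmarkMemory()
for state in ["start", "corridor", "dead_end"]:
    mem.write(state)
resume = mem.backtrack()   # resume from "corridor"
```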
4. Empirical Evaluation and Benchmark Results
Quantitative analysis consistently indicates substantial performance gains from memory augmentation. Selected results from recent literature:
| Task/Benchmark | Method (Backbone) | Success Rate / Metric | Gain over Baseline |
|---|---|---|---|
| Trip Planning (NaturalPlan) | EvoMem (Gemini-1.5-Pro) | 52.08% (Exact Match) | +11.17 pt |
| Calendar Scheduling | EvoMem (Gemini-1.5-Pro) | 63.26% | +2.56 pt |
| VirtualHome (SR/PS) | ExRAP | 55.14% / 11.33 | +15.25 pp SR |
| Household Robotics QA | RAG-enabled LLM | 91.3% Validity (Qwen2.5-32B) | +37.5 pt |
| Minecraft DiamondPickaxe | JARVIS-1 | 8.99% (vs DEPS 2.42%) | ~5× |
| Mobile Automation (SPA) | MapAgent | 0.553 / 0.350 | +0.106 / +0.15 |
| Motion Planning time | Motion Memory | up to –89% (time) | up to 89% faster |
| TextWorld (Success Rate) | STMA (Qwen2.5-72b) | | +31.25 pp |
| ALFWorld (Text) | RAP | 85.8% (vs 52.2%) | +33.6 pp |
Across these domains, ablations show that both memory presence and the details of retrieval/update (dual memory, experience database, knowledge graph) are individually necessary for optimal results (Fan et al., 1 Nov 2025, Yoo et al., 10 Sep 2025, Kong et al., 29 Jul 2025, Kagaya et al., 2024).
5. Architectural Patterns and Theoretical Analysis
Memory-augmented planning systems implement a range of architectural paradigms:
- Explicit memory-controller separation (classic DNC-based, Harvard/von Neumann (Tanneberg et al., 2019, Khan et al., 2017)): ideal for algorithmic symbolic planning, zero-shot scaling, and partial observability challenges.
- Role-separated multi-agent frameworks (EvoMem (Fan et al., 1 Nov 2025)): constraint extraction, plan generation, and verification operate on distinct memory spaces, paralleling cognitive psychology models of working memory.
- Retrieval-augmented instruction following (ExRAP (Yoo et al., 10 Sep 2025), RAP (Kagaya et al., 2024), JARVIS-1 (Wang et al., 2023), MINDSTORES (Chari et al., 31 Jan 2025)): successful episodes, feedback, or plans are indexed and injected via prompt engineering or embedding similarity.
- Hierarchical and multimodal memory (UniWM (Dong et al., 9 Oct 2025), MapAgent (Kong et al., 29 Jul 2025)): enables effective reasoning across perceptual, contextual, and temporal axes, facilitating robust long-horizon navigation.
Theoretical analyses, including sampling bias preservation in motion planning (probabilistic completeness and optimality (Das et al., 2023)), underpin the reliability of retrieval-augmented frameworks. Overhead in memory storage and prompt length is noted as an active challenge (Fan et al., 1 Nov 2025, Glocker et al., 30 Apr 2025, Kagaya et al., 2024).
6. Limitations and Future Research Directions
Although memory augmentation yields clear gains, several persistent limitations have been identified:
- Query-locality and lack of lifelong memory: Most frameworks reset or discard memory at query/episode boundaries, limiting cross-task generalization (Fan et al., 1 Nov 2025).
- Manual/prompt-engineered retrieval and updates: Few systems learn retrieval policies or memory management end-to-end (Kagaya et al., 2024).
- Scalability of storage and retrieval: Linear scaling, limited token budgets, and noisy retrieval in uncurated or lengthy episodes remain unresolved (Glocker et al., 30 Apr 2025, Chari et al., 31 Jan 2025).
- Limited integration with external reward signals: Few works blend RL-style reward signals with memory update criteria (Chari et al., 31 Jan 2025).
- Multimodal fusion for perceptual environments: Dynamic environments and UI-driven contexts challenge purely language-based memory; future work foresees schema-validation and hierarchical consolidation (Dong et al., 9 Oct 2025, Glocker et al., 30 Apr 2025).
Proposed directions include:
- End-to-end tuning of memory modules and retrievers (Yoo et al., 10 Sep 2025, Chari et al., 31 Jan 2025).
- Hierarchical consolidation to bound memory and enhance abstraction (Kagaya et al., 2024).
- Online adaptation of retrieval weightings for exploration/exploitation tradeoffs (Yoo et al., 10 Sep 2025).
- Hybrid symbolic-vector, multimodal, and spatial memory schemas for robust embodied planning (Lei et al., 14 Feb 2025, Kong et al., 29 Jul 2025, Dong et al., 9 Oct 2025).
7. Context and Significance in AI Planning
Memory-augmented planning bridges foundational principles from symbolic AI, cognitive psychology, and deep learning. By explicitly structuring memory, these systems forestall constraint drift, enable causal correction, and support continual improvement—approaching human-like iterative reasoning. The ongoing shift from pure end-to-end sequence modeling to hybridized, memory-centric frameworks is evidenced across domains: combinatorial puzzles, household robotics, embodied navigation, and open-world gaming.
A plausible implication is that further advances in scalable, adaptive memory modules—especially ones that selectively summarize, consolidate, and retrieve over diverse modalities—will constitute the next growth frontier for interpretable, robust, and generalizable planning systems. Empirical results to date suggest that such systems are critical to overcoming the bottlenecks of vanilla LLM planning, especially in non-stationary, partially observable, and multi-agent environments.
Key references: (Fan et al., 1 Nov 2025, Yoo et al., 10 Sep 2025, Glocker et al., 30 Apr 2025, Tanneberg et al., 2019, Wang et al., 2023, Kong et al., 29 Jul 2025, Das et al., 2023, Khan et al., 2017, Dong et al., 9 Oct 2025, Lei et al., 14 Feb 2025, Chari et al., 31 Jan 2025, Kagaya et al., 2024)