
Memory-Augmented Prompting and Planning

Updated 29 January 2026
  • Memory-Augmented Prompting and Planning is a framework that integrates LLMs with external memory modules to enhance context-aware sequential decision-making.
  • It employs modular architectures that separate routing, task planning, and memory retrieval, enabling dynamic context management and robust multi-stage execution.
  • Empirical results demonstrate significant improvements in recall, planning accuracy, and long-horizon performance across domains like robotics, games, and automation.

Memory-augmented prompting and planning refers to a family of agent architectures, algorithms, and methodologies that combine LLMs, explicit external or internal memory systems, and retrieval-based prompt engineering to support robust, context-aware sequential decision-making in complex environments. This paradigm is a response to the limitations of standard in-context learning, which, due to bounded input length and the lack of persistent episodic memory, leads to failure modes in long-horizon reasoning, multi-stage planning, and continual task execution. Memory augmentation enables the retrieval and integration of relevant past experiences, observations, or heuristics into the LLM's prompt or internal processing, fundamentally improving both planning depth and recall.

1. Memory-Augmented Prompting Architectures

Contemporary memory-augmented prompting and planning systems structurally modularize the agent into functionally specialized roles, often separating (1) request handling/routing, (2) task planning, and (3) memory or knowledge base management. For instance, in embodied household robotics, a three-agent orchestration is deployed:

  • A Routing Agent classifies user queries (action, history, clarification) and dispatches to either a Task Planning Agent or a Knowledge Base Agent.
  • The Task Planning Agent uses past context and current scene objects—obtained via object detection modules such as Grounded SAM and VLMs—to build context-informed prompts for LLM-driven action planning.
  • The Knowledge Base Agent answers history/location queries by retrieving and presenting memory context through retrieval-augmented generation (RAG) (Glocker et al., 30 Apr 2025).
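The three-agent orchestration above can be sketched as a classify-then-dispatch loop. This is a minimal illustration, not the cited system: the keyword rules stand in for the LLM-based query classifier, and the agent names and handler strings are assumptions.

```python
# Sketch of the routing step: classify a user query as an action request,
# a history/location question, or something needing clarification, then
# dispatch it to the matching downstream agent.

def route(query: str) -> str:
    """Classify a query as 'action', 'history', or 'clarification'."""
    q = query.lower()
    if any(w in q for w in ("where", "when", "did", "last")):
        return "history"
    if any(w in q for w in ("bring", "fetch", "put", "clean", "go")):
        return "action"
    return "clarification"

def dispatch(query: str) -> str:
    kind = route(query)
    if kind == "action":
        return f"[TaskPlanningAgent] plan for: {query}"
    if kind == "history":
        return f"[KnowledgeBaseAgent] retrieve memory for: {query}"
    return f"[RoutingAgent] ask user to clarify: {query}"
```

In a real deployment the `route` function would itself be an LLM call with a classification prompt; the separation of concerns is the point, not the rule set.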

Agents in real-time strategy games, such as MASMP, leverage natural language-driven state machine prompting, pairing an explicit symbolic Finite State Machine (FSM) structure with light-weight episodic memory containing high-level tactical variables (Qi et al., 21 Oct 2025). Similarly, business document agents and mobile task automation frameworks such as Matrix and MapAgent use instruction-driven, multi-turn architectures with memory-augmented prompting layered over base LLMs (Liu et al., 2024, Kong et al., 29 Jul 2025).

2. Formal Memory Representations and Retrieval Schemes

A memory-augmented planner typically maintains one or more explicit memory stores (chronological logs, trajectory banks, or heuristic databases). At each planning or decision point, it retrieves a small, contextually relevant subset for in-context prompt augmentation:

  • Chronological Text/Action Memory: Growing log of past commands, actions, or question-answer pairs, indexed by timestamp (Glocker et al., 30 Apr 2025).
  • Key-Value or Structured Memory: Stores distilled heuristics, patterns, or domain-specific knowledge as (key, value) pairs, supporting similarity retrieval by key or vector-space encoding (Liu et al., 2024).
  • Trajectory/GUI Page Memory: Sequential page chunks or state-action pairs extracted from historical interaction trajectories, each summarized via multi-modal encoders and stored for retrieval (Kong et al., 29 Jul 2025).
  • Scene Graphs and Semantic Memories: Structured (often graphical) representations of persistent environmental state, such as 3D scene graphs for object layout in embodied agents (Wang et al., 2024), or Temporal Embodied Knowledge Graphs (TEKG) for factual world state in continual agents (Yoo et al., 10 Sep 2025).
  • Episodic Memory with Attention: Models such as JARVIS-1 combine text and visual embeddings (CLIP) over instructions, observations, and plans, enabling multi-modal memory retrieval (Wang et al., 2023).
  • Hierarchical Memory Banks: Transformers with intra-step (short-term) and cross-step (long-term) caches, as in UniWM, support both immediate observation context and long-range trajectory consistency (Dong et al., 9 Oct 2025).
  • Demonstration-derived Soft Prompts: As in MAP-VLA, stage-specific memory units are stored as learnable soft prompt embeddings, optimized through prompt tuning and dynamically retrieved during execution (Li et al., 12 Nov 2025).
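Two of the simpler memory types above, a chronological log and a key-value heuristic store, can be combined in a few lines. This is an illustrative sketch under assumed names and a naive token-overlap scorer; real systems use the embedding-based retrieval described below.

```python
# A chronological log indexed by timestamp plus a key-value store for
# distilled heuristics. recall() scores log entries by token overlap
# with the query (a stand-in for learned similarity).

import time

class AgentMemory:
    def __init__(self):
        self.log = []   # chronological (timestamp, entry) records
        self.kv = {}    # distilled heuristics: key -> value

    def record(self, entry: str) -> None:
        self.log.append((time.time(), entry))

    def store_heuristic(self, key: str, value: str) -> None:
        self.kv[key] = value

    def recall(self, query: str, k: int = 3) -> list:
        """Return the k log entries sharing the most tokens with the query."""
        q = set(query.lower().split())
        scored = sorted(self.log,
                        key=lambda t: len(q & set(t[1].lower().split())),
                        reverse=True)
        return [entry for _, entry in scored[:k]]
```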

Retrieval is generally executed by encoding the current query or decision context into a shared embedding space (e.g., BGE-M3, MiniLM, CLIP, or text-embedding-v3), computing similarity (typically cosine), and selecting the top-$k$ relevant memory chunks:

$$c = \mathrm{RAG}(M, q) = \{m_{i_1}, \dots, m_{i_k}\}, \quad i_j = \arg\max_{i}\,\mathrm{sim}(\mathrm{emb}(q),\,\mathrm{emb}(m_i))$$

Selecting $k$ balances prompt context-window constraints against recall coverage.
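The top-k selection above reduces to a few lines once an encoder is fixed. In this sketch a toy character-bigram `embed()` stands in for a real encoder such as MiniLM or BGE-M3; everything else follows the formula directly.

```python
# Minimal top-k RAG selection: embed the query and each memory chunk,
# score by cosine similarity, keep the k highest-scoring chunks.

import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag of character bigrams (placeholder encoder)."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[key] * b[key] for key in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rag(memory: list, query: str, k: int = 2) -> list:
    """Return the k memory chunks most similar to the query."""
    qv = embed(query)
    return sorted(memory, key=lambda m: cosine(qv, embed(m)), reverse=True)[:k]
```

Swapping `embed` for a sentence-transformer and the linear scan for an approximate-nearest-neighbor index is the usual path to scale; the selection logic is unchanged.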

3. Prompt Engineering, Planning, and Memory Fusion

Prompt construction in memory-augmented systems is highly structured, interleaving system-level instructions, the retrieved memory context, current observations, and the user or subtask command. Notably:

  • Chain-of-thought prompts: For planning agents, the context may include explicit chain-of-thought reasoning steps, structured objects lists, and justifications for action selection (Glocker et al., 30 Apr 2025, Liu et al., 2024).
  • Memory pre-pending and fusion: The retrieved memory (whether bullet points, plan snippets, state variables, or demonstration segments) is inserted immediately after system instructions and before new input (Wang et al., 2023, Qi et al., 21 Oct 2025).
  • Soft prompt augmentation: In MAP-VLA, retrieved memory prompts are added elementwise to base prompts encoding the current observation, with soft weighting based on stage similarity (Li et al., 12 Nov 2025).
  • Dynamic context management: Coarse-to-fine hierarchical prompting, as in MapAgent, supports decomposing a high-level user request into app-specific subtasks and further into UI-level interactions, injecting the most relevant GUI page traces at each stage to reduce hallucination and improve grounded planning (Kong et al., 29 Jul 2025).
  • Temporal and mutual-information filtering: ExRAP employs entropy thresholds and graph-based memory decay to prevent reliance on obsolete or stale context, ensuring temporal consistency in retrieval and planning (Yoo et al., 10 Sep 2025).
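The interleaved prompt layout described above can be made concrete with a small builder. The section headers here are illustrative assumptions, not a fixed format from any of the cited systems.

```python
# Assemble a memory-augmented prompt in the order described: system
# instructions, retrieved memory context, current observation, then the
# user or subtask command.

def build_prompt(system: str, memory: list, observation: str,
                 command: str) -> str:
    memory_block = "\n".join(f"- {m}" for m in memory) or "- (none)"
    return (
        f"{system}\n\n"
        f"Relevant memory:\n{memory_block}\n\n"
        f"Current observation:\n{observation}\n\n"
        f"Instruction:\n{command}"
    )
```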

4. Empirical Results and Domain Applications

Memory-augmented prompting and planning architectures demonstrate consistent gains in diverse domains:

| Framework | Application | Main Empirical Gains |
| --- | --- | --- |
| (Glocker et al., 30 Apr 2025) | Household robotics | +37.6 pp recall in QA, +27% planning accuracy with RAG |
| (Qi et al., 21 Oct 2025) | RTS games | 60% win rate vs 0% baseline on hardest AI; long-term tactical coherence |
| (Liu et al., 2024) | Document extraction | +30.3% success over CoT prompting, +35.2% over vanilla agent |
| (Kong et al., 29 Jul 2025) | Mobile automation | 80% success (English, CHOP), +10.6% over baseline; 20% relative gain from memory/verification modules |
| (Wang et al., 2024) | Embodied agents | ×1.3–2.3 composite/complex SR, ×3.4–62.7 speedup vs prior |
| (Wang et al., 2023) | Minecraft agent | 5× improvement on long-horizon tasks; scaling with memory size |
| (Dong et al., 9 Oct 2025) | Visual navigation | +30% navigation SR, halved pose error; largest marginal gain from cross-step memory |
| (Li et al., 12 Nov 2025) | Robotic manipulation | +7% simulation, +25% real-robot gains; strong robustness to visual/temporal shift |
| (Yoo et al., 10 Sep 2025) | Continual instruction following | +16–27 pp SR, 3–13 fewer steps; gains scale with nonstationarity/instruction load |

Across settings, the integration of memory enables statistically significant improvements in recall, sample efficiency, long-horizon performance, and consistency, without requiring additional model fine-tuning.

5. Taxonomy and Design Patterns

Memory-augmented prompting and planning systems instantiate several recurring design patterns:

  1. Explicit Memory Module: Textual, trajectory-based, key-value, graph-structured, or demonstration-indexed storage, decoupled from the LLM weights.
  2. Neural Retrieval Function: Embedding-based nearest neighbor, often with cross-modal fusion (text, image, UI layouts).
  3. Specialized Prompt Construction: Structured concatenation/interleaving of system directives, dynamic memory context, and live observations/actions.
  4. Modular Agent Orchestration: Routing, memory, and task-planning agents with clear separation of classification, reasoning, and retrieval responsibilities.
  5. Plug-and-Play Memory Augmentation: Compatibility with frozen backbones or off-the-shelf LLMs via prompt/injection, not end-to-end gradient updates (e.g., MAP-VLA, Matrix, JARVIS-1).
  6. Temporal Management and Decay: Confidence decay, entropy gating, or frequency-based memory replacement to counteract staleness and promote efficient exploration (Yoo et al., 10 Sep 2025, Wang et al., 2024).
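Pattern 6 can be illustrated with exponential confidence decay plus threshold-based eviction. The decay rate and threshold here are arbitrary assumptions for the sketch, not values from the cited systems.

```python
# Each memory item carries a confidence score; every maintenance step
# multiplies it by a decay rate, and items falling below a threshold
# are evicted to keep retrieval focused on fresh context.

def decay_and_evict(memories: dict, rate: float = 0.5,
                    threshold: float = 0.2) -> dict:
    """Decay each memory's confidence by `rate`; drop stale entries."""
    return {key: conf * rate
            for key, conf in memories.items()
            if conf * rate >= threshold}
```

Entropy gating (as in ExRAP) replaces the fixed threshold with an uncertainty measure over the knowledge graph, but the decay-then-filter structure is the same.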

6. Limitations, Future Directions, and Generalization

Identified limitations of current memory-augmented architectures include:

  • Memory scaling: Retrieval efficiency and relevance may degrade as memory size grows with tasks and episodes.
  • Prompt window constraints: The number and length of retrieved memory units are bounded by the LLM's context window.
  • Generalization and semantic retrieval: L2-based or simple vector similarity metrics can fail under state noise, fast dynamics, or highly variable semantics. There is an ongoing push toward learnable retrieval metrics and hierarchical aggregation (Li et al., 12 Nov 2025).
  • Domain-specific engineering: Stage segmentation, demonstration alignment, and crafting of in-context exemplars often require significant manual or semi-automated effort.
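The prompt-window constraint above is commonly handled by greedy budget-constrained selection: take memory units in relevance order until the token budget is exhausted. This is a generic sketch (whitespace splitting approximates a real tokenizer), not a method from the cited papers.

```python
# Select memories in relevance order, skipping any that would exceed
# the remaining token budget. Token counts are approximated by
# whitespace-separated words for illustration.

def select_within_budget(ranked_memories: list, budget: int) -> list:
    chosen, used = [], 0
    for m in ranked_memories:
        cost = len(m.split())
        if used + cost <= budget:
            chosen.append(m)
            used += cost
    return chosen
```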

Future extensions include:

  • Learnable, hierarchical, or continual memory aggregation: Summarizing many experiences into compact, adaptive prompts or prototypes.
  • Integration of temporal logic and event abstraction: E.g., combining TEKGs, episodic memory, and symbolic constraint tracking for continual agents (Yoo et al., 10 Sep 2025).
  • Stronger grounding via cross-modal and scene-graph memory fusion: Particularly in mobile robotics and complex manipulation.
  • Unified world-modeling and planning architectures: Jointly optimizing foresight and action in a single memory-augmented causal backbone (Dong et al., 9 Oct 2025).

Empirical evidence suggests that these ingredients are not limited to household robotics or game playing, but generalize to customer service, medical diagnosis, software agent orchestration, and other domains requiring context-aware, explainable, and robust agent behaviors (Glocker et al., 30 Apr 2025). The key is the explicit separation and efficient fusion of episodic memory, context retrieval, and prompt engineering.

7. Concluding Remarks

Memory-augmented prompting and planning, as implemented across multiple agent paradigms, constitutes a new baseline for LLM-based sequential decision-making. By equipping agents with explicit and efficiently retrievable memory, these systems consistently outperform memoryless or naive prompting strategies on long-horizon, multi-stage, and open-world tasks. The modularity of memory augmentation enables lightweight adaptation and integration with existing LLMs, with robust empirical improvements evidenced across document understanding, mobile automation, robotic manipulation, embodied navigation, and game domains (Glocker et al., 30 Apr 2025, Qi et al., 21 Oct 2025, Liu et al., 2024, Kong et al., 29 Jul 2025, Wang et al., 2024, Wang et al., 2023, Li et al., 12 Nov 2025, Dong et al., 9 Oct 2025, Yoo et al., 10 Sep 2025).
