MemGen: Generative Latent Memory for LLMs
- MemGen is a dynamic generative latent memory framework that interleaves explicit memory synthesis with LLM reasoning to augment contextual understanding.
- It integrates a Memory Trigger for detecting key reasoning junctures and a Memory Weaver for generating latent tokens that refine subsequent generation.
- Empirical studies show MemGen outperforms traditional memory systems by up to 38.22% and spontaneously develops cognitive substructures similar to human memory faculties.
MemGen is a dynamic generative latent memory framework designed to endow LLM-powered agents with self-evolving, human-like cognitive faculties. Distinct from parametric and retrieval-based memory systems, MemGen tightly interleaves memory construction and reasoning via explicit, on-demand synthesis of latent memory tokens, enabling agents to recall and augment context through internalized generative mechanisms. The framework comprises two principal modules: a memory trigger, which monitors the agent’s reasoning state to decide when explicit memory should be invoked, and a memory weaver, which generates a sequence of latent tokens that encapsulate relevant experience or state cues. These components function in concert to produce a fluid cycle of memory invocation, construction, and reintegration within the agent’s generative process. Extensive empirical studies demonstrate that MemGen delivers substantial performance gains over existing memory frameworks and spontaneously evolves distinct cognitive substructures, including planning, procedural, and working memory faculties, without explicit supervision (Zhang et al., 29 Sep 2025).
1. Architectural Modules and Workflow
MemGen consists of two interdependent submodules:
- Memory Trigger: Implemented as a lightweight LoRA adapter attached to a frozen LLM, the trigger continuously monitors the hidden-state sequence H_{t,<j} at each step of token generation. An invocation probability p_j is computed via a sigmoid function over the current metacognitive state, and a binary decision d_j is sampled from p_j. The trigger typically activates the memory weaver only at semantically significant locations (e.g., periods, commas), ensuring memory augmentation occurs precisely when it benefits reasoning.
- Memory Weaver: This module is likewise implemented with LoRA adapters (or, optionally, in a full-parameter configuration). Upon trigger activation, the weaver takes the contextual hidden states as a "hook" input and synthesizes a structured, fixed-length matrix of latent memory tokens M_t. These tokens are prepended to the decoding context and persist throughout subsequent generation, directly influencing the agent's behavior.
This orchestration produces the following cycle during generation:
| Step | Component | Operation |
|---|---|---|
| Token Generation | Frozen LLM | Updates the hidden-state sequence H_{t,<j} |
| Trigger Evaluation | Memory Trigger | Computes p_j, decides INVOKE/SKIP |
| Memory Construction | Memory Weaver | Synthesizes M_t (if INVOKE), prepends tokens |
| Context Integration | Reasoner + Weaver | Generates next tokens conditioned on memory |
In some variants, the weaver can incorporate semantically matched external embeddings by concatenating retrieved vectors with the hidden states H_{t,<j} before memory synthesis.
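The interplay of the two modules can be sketched in a few lines of numpy. Everything here is an illustrative stand-in, not the paper's implementation: a linear probe with a sigmoid replaces the LoRA-adapted trigger, and per-token random projections over a pooled context replace the learned weaver; the sizes D and K are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 4          # hidden size and number of latent memory tokens (illustrative)

# Hypothetical stand-ins for the LoRA-adapted modules described above.
W_trig = rng.normal(scale=0.1, size=D)           # trigger: linear probe + sigmoid
W_weave = rng.normal(scale=0.1, size=(K, D, D))  # weaver: one projection per latent token

def trigger_prob(h):
    """p_j = sigmoid(w . h): invocation probability from the current hidden state."""
    return 1.0 / (1.0 + np.exp(-W_trig @ h))

def weave_memory(H):
    """Synthesize a fixed-length matrix M_t of K latent tokens from the context H."""
    ctx = H.mean(axis=0)                         # pool the hidden-state sequence
    return np.stack([W_weave[k] @ ctx for k in range(K)])  # shape (K, D)

H = rng.normal(size=(10, D))                     # running hidden states H_{t,<j}
p = trigger_prob(H[-1])                          # invocation probability p_j
M = weave_memory(H)                              # latent memory M_t
d = rng.random() < p                             # sampled binary decision d_j
H_aug = np.concatenate([M, H], axis=0) if d else H  # prepend M_t on INVOKE
```

The key structural point is that M_t is a fixed-size block prepended to a variable-length context, so the memory cost of each invocation is constant regardless of how long the reasoning trace has grown.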
2. Mechanism of Interleaved Reasoning and Memory Generation
The framework executes memory integration on-demand and interleaved with autoregressive reasoning:
- Generation Loop: The primary LLM produces candidate tokens and maintains a running sequence of hidden states H_{t,<j}.
- Trigger Monitoring: At each step, the memory trigger inspects H_{t,<j} and samples the binary decision d_j from the invocation probability p_j.
- Weaver Activation: When d_j = INVOKE, the reasoner halts, and the weaver generates latent memory M_t from the current hidden-state context.
- Memory Integration: M_t is prepended to the hidden-state sequence, modifying the prompt for further token generation.
This produces a tightly woven progression in which context-rich latent memory dynamically augments reasoning. The adaptive nature of the trigger ensures that memory is introduced only at points deemed critical for subsequent reasoning, reducing unnecessary or distracting augmentation.
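The four steps above can be sketched as a single toy loop. The reasoner, trigger, and weaver below are placeholder functions (a tanh update, a sigmoid gate, and context pooling) under assumed toy dimensions, meant only to show the control flow: generate, check the trigger, and on INVOKE pause to prepend latent memory that persists for all later steps.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8  # toy hidden size

def reasoner_step(H):
    """Stand-in for one frozen-LLM decoding step: emits the next hidden state."""
    return np.tanh(H.mean(axis=0) + 0.1 * rng.normal(size=D))

def trigger(h):
    """Sigmoid gate over the current hidden state (illustrative, untrained)."""
    return 1.0 / (1.0 + np.exp(-h.sum()))

def weave(H, k=3):
    """Produce k latent memory tokens from the pooled context (illustrative)."""
    return np.tile(H.mean(axis=0), (k, 1))

H = rng.normal(size=(1, D))                    # initial hidden-state sequence
invocations = 0
for step in range(20):
    h = reasoner_step(H)                       # generate next token's hidden state
    H = np.vstack([H, h])
    p = trigger(h)                             # invocation probability p_j
    if rng.random() < p:                       # d_j = INVOKE: pause generation
        M = weave(H)                           # synthesize latent memory M_t
        H = np.vstack([M, H])                  # prepend; persists for later steps
        invocations += 1
```

Because each invocation prepends a fixed block of k tokens, the final context length is the token count plus k times the number of invocations, which makes the augmentation overhead easy to budget.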
3. Experimental Results and Performance Comparisons
MemGen achieves superior performance metrics compared to prevailing external and parametric memory systems across eight benchmarks, including ALFWorld (embodied action), GSM8K (math reasoning), GPQA (scientific reasoning), and KodCode (code generation):
- MemGen outperforms ExpeL and AWM by up to 38.22% and GRPO by up to 13.44% in benchmark accuracy.
- In continual learning regimes, MemGen preserves knowledge across sequential curricula, thereby mitigating catastrophic forgetting more effectively than baseline systems.
- Model variants trained on Qwen3-8B and SmolLM3-3B backbones exhibit notable improvements in task accuracy and generalization when compared to retrieval-based and parametric baselines.
The workflow can be summarized by the following diagram:
[ Frozen LLM (Reasoner) ]
│
Generate Tokens → Hidden States H_{t,<j}
│
──Memory Trigger──► Samples d_j based on p_j
│ If INVOKE:
└─────────────► Pause generation
│
        ▼
[ Memory Weaver ]
│
Synthesizes latent memory M_t
│
Prepend M_t to H_{t,<j} and resume generation
4. Emergence of Human-Analogous Memory Faculties
Unsupervised training of MemGen leads to spontaneous self-organization of distinct clusters of latent tokens that serve specialized cognitive functions:
- Planning Memory: Subgroups of memory tokens consistently facilitate high-level planning and the ordered execution of multi-step reasoning. Ablation of these clusters increases planning errors.
- Procedural Memory: Other tokens specialize in recalling procedures such as tool invocation or specific answer formatting. Their removal results in a measurable rise in operational failures.
- Working Memory: Dedicated clusters maintain short-term context fidelity and consistency over extended reasoning sequences.
These emergent faculties arise without explicit supervision or discrete labels, indicating that generative latent memory is capable of differentiating between cognitive archetypes in a manner analogous to human memory structures. This suggests that Memory Weaver–learned subspaces correspond to functional memory compartments, and points toward the plausibility of more naturalistic machine cognition in LLM agents.
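The ablation evidence above follows a simple recipe: zero out one hypothesized cluster of latent tokens and observe the resulting failure mode. A minimal sketch of that procedure, with an entirely hypothetical cluster assignment and random stand-in memory matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
K, D = 12, 16
M = rng.normal(size=(K, D))            # stand-in for learned latent memory tokens

# Hypothetical assignment of token indices to emergent faculties.
clusters = {
    "planning":   [0, 1, 2, 3],
    "procedural": [4, 5, 6, 7],
    "working":    [8, 9, 10, 11],
}

def ablate(M, idx):
    """Zero out one cluster of latent tokens, leaving the rest intact."""
    M_abl = M.copy()
    M_abl[idx] = 0.0
    return M_abl

M_no_plan = ablate(M, clusters["planning"])
```

In the reported experiments, running the agent with such an ablated memory matrix selectively degrades the corresponding capability (e.g., planning errors rise when the planning cluster is removed), which is what licenses the functional labels.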
5. Cross-Domain Adaptation and Generalization Capabilities
MemGen demonstrates robust transfer learning and cross-domain generalization:
- Training on specific domains such as math reasoning or code synthesis yields improvements not only within those tasks but also in unrelated benchmarks, e.g., training on KodCode enhances MATH problem performance.
- Latent memories synthesized by MemGen encode domain-agnostic features that are instrumental across disparate reasoning contexts.
- The adaptive frequency of trigger-invoked memory insertion modulates with domain familiarity; frequent invocation on familiar tasks leverages established knowledge, while infrequent usage in novel domains minimizes harmful interference.
A plausible implication is that MemGen's generative latent memory captures abstract representations that transcend task-specific detail, supporting more versatile and scalable self-evolving agents.
6. Significance and Implications for Agent Design
MemGen introduces a fundamentally distinct approach compared to retrieval-based or parametric memory agents by embedding memory synthesis within the generative process of reasoning. This methodology yields agent architectures with greater flexibility, improved long-term knowledge retention, and spontaneous cognitive stratification. The framework alleviates reliance on explicit retrieval databases and avoids the limitations of static parametric adaptation, thereby enabling memory to co-evolve with reasoning state in a context-sensitive, task-adaptive manner.
The empirical findings—specifically, the significant performance increases, resilience to catastrophic forgetting, and unsupervised emergence of human-like memory substructures—substantiate MemGen's suitability as a foundation for future LLM-based self-evolving agents. These capabilities position MemGen as a candidate framework for constructing agents that dynamically internalize experience for reasoning augmentation, with possible applications in decision-making, interactive problem solving, and continual learning contexts across diverse domains.