- The paper introduces FluxMem, a framework that redefines agent memory by evolving connectivity across multiple layers to improve task performance.
- It employs a three-stage process—initial connection formation, feedback-driven refinement, and long-term consolidation—to dynamically optimize memory structure.
- Experimental results show FluxMem outperforms static and meta-evolving baselines in accuracy and cross-task success rates across diverse platforms.
Rethinking Memory as Continuously Evolving Connectivity: FluxMem Framework
Introduction
The paper "Rethinking Memory as Continuously Evolving Connectivity" (2605.28773) addresses the limitations of static memory architectures for LLM-based agents and proposes FluxMem, a dynamic memory system whose connectivity evolves over time. Traditional memory frameworks for agents often rely on rigid repositories and fixed retrieval schemas, leading to brittle context integration and poor adaptability. Static memory systems exhibit three recurring failures: under-connection and over-connection between memory units, inaccurate granularity of those units, and a failure to consolidate structure over time. Together these failures hamper both precision and generalization in dynamic environments.
Figure 1: Depiction of typical failures in static memory architectures, highlighting under-connection and over-connection issues.
FluxMem Architecture
FluxMem operationalizes memory as a continuously editable heterogeneous graph spanning semantic, episodic, and procedural layers. The framework employs a three-stage evolutionary pipeline: (i) Initial Connection Formation, (ii) Feedback-Driven Connectivity Refinement, and (iii) Long-Term Connection Consolidation. This approach enables the memory substrate to adaptively repair missing links, prune redundant associations, reshape unit granularity, and distill task-recurrent successful trajectories into mature procedural circuits.
Figure 2: FluxMem architecture schematic highlighting online and offline stages, with Stage I/II acting per-step and Stage III enabling long-term consolidation.
The semantic layer stores static knowledge (documents, APIs). The episodic layer records full state-action trajectories per task. The procedural layer abstracts distilled skills and templates from experience clusters. During execution, the agent’s local context is dynamically constructed as a task-specific induced subgraph, wherein retrieval, editing, and consolidation operations are sequentially performed to optimize context usefulness.
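The layered graph and the task-specific induced subgraph described above can be pictured with a small data structure. This is a minimal sketch, not the paper's implementation: the class name `MemoryGraph`, its methods, the node identifiers, and the use of undirected edges are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

LAYERS = ("semantic", "episodic", "procedural")


@dataclass
class MemoryGraph:
    """Hypothetical container for a three-layer memory graph."""
    # node id -> (layer, content)
    nodes: dict = field(default_factory=dict)
    # undirected edges, each stored as a frozenset of node ids
    edges: set = field(default_factory=set)

    def add_node(self, node_id, layer, content):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.nodes[node_id] = (layer, content)

    def add_edge(self, a, b):
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both endpoints must exist")
        self.edges.add(frozenset((a, b)))

    def induced_subgraph(self, node_ids):
        """Keep the selected nodes and only the edges between them."""
        keep = set(node_ids) & set(self.nodes)
        sub = MemoryGraph()
        sub.nodes = {n: self.nodes[n] for n in keep}
        sub.edges = {e for e in self.edges if e <= keep}
        return sub


# Usage: build a tiny graph and extract a working context for one task.
g = MemoryGraph()
g.add_node("doc:pandas", "semantic", "pandas groupby API reference")
g.add_node("ep:17", "episodic", "trajectory of a past table-aggregation task")
g.add_node("skill:agg", "procedural", "template: load, group, aggregate")
g.add_edge("doc:pandas", "skill:agg")
g.add_edge("ep:17", "skill:agg")
ctx = g.induced_subgraph(["doc:pandas", "skill:agg"])
```

Here `ctx` contains two nodes and the single edge between them; the episodic node and its edge are left out because they fall outside the selected node set.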
Three-Stage Memory Evolution
Stage I: Initial Connection Formation
Stage I rapidly establishes preliminary cross-layer associations for the current step using hybrid relevance measures (embedding similarity, BM25 lexical matching, and LLM-based verification). Candidate semantic, episodic, and procedural nodes are retrieved and connected, forming an initial step-local subgraph.
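One way to combine the three relevance signals named above is sketched below. The summary does not say how the signals are fused, so the linear mix with weight `alpha`, the max-normalization of BM25 scores, the `top_k` cutoff, and the `verify` callback (a stand-in for the LLM-based check) are all assumptions. BM25 is written out by hand with the common Okapi defaults so the example has no external dependencies.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n if n else 0.0
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            denom = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / denom
        scores.append(s)
    return scores


def hybrid_candidates(query_vec, query_tokens, nodes, verify,
                      alpha=0.5, top_k=3):
    """nodes: list of (node_id, embedding, tokens) tuples.

    Ranks nodes by an assumed linear mix of cosine and normalized BM25,
    then keeps the top_k candidates accepted by the verify callback.
    """
    lex = bm25_scores(query_tokens, [toks for _, _, toks in nodes])
    max_lex = max(lex, default=0.0) or 1.0  # avoid dividing by zero
    scored = []
    for (node_id, emb, _), l in zip(nodes, lex):
        score = alpha * cosine(query_vec, emb) + (1 - alpha) * l / max_lex
        scored.append((score, node_id))
    scored.sort(reverse=True)
    return [nid for _, nid in scored[:top_k] if verify(nid)]
```

In practice the embeddings would come from an encoder and `verify` would wrap an LLM call; both are left abstract here because the summary does not specify them.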
Stage II: Feedback-Driven Refinement
Stage II leverages environmental and self-verification feedback to edit connectivity and unit content. Connection-level misalignments are corrected by link expansion and pruning. Unit-level abstraction mismatches are remedied via content reshaping, aligning semantic granularity with task requirements. Refinement iterations systematically optimize the evidence path within the working context for the current decision step.
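The connection-level part of this loop (link expansion and pruning) can be sketched as repeated set edits on the working context. This is an illustrative simplification under stated assumptions: `refine_context`, the `propose_edits` callback, and the `max_rounds` cap are hypothetical names, the callback stands in for environmental and self-verification feedback, and unit-level content reshaping is not modeled.

```python
def refine_context(context, propose_edits, max_rounds=3):
    """Iteratively expand and prune the set of linked node ids.

    context: node ids currently linked into the working subgraph.
    propose_edits: hypothetical callback mapping the current context to
        (to_add, to_prune); it stands in for feedback signals.
    Returns the final context and the history of intermediate contexts.
    """
    context = set(context)
    history = [frozenset(context)]
    for _ in range(max_rounds):
        to_add, to_prune = propose_edits(context)
        if not to_add and not to_prune:
            break  # feedback requests no further edits
        context = (context | set(to_add)) - set(to_prune)
        history.append(frozenset(context))
    return context, history


# Toy feedback: the step needs two nodes and one linked node is noise.
def toy_feedback(ctx):
    needed = {"api:groupby", "skill:agg"}
    noise = {"doc:unrelated"}
    return needed - ctx, ctx & noise


final, history = refine_context({"skill:agg", "doc:unrelated"}, toy_feedback)
```

With the toy feedback, the first round adds the missing API node and prunes the unrelated document, and the second round makes no edits, so the loop stops early.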
Stage III: Long-Term Consolidation
Stage III operates offline: it clusters episodic nodes into semantically similar groups and induces procedural skill nodes from each cluster via LLM-based abstraction. Skill induction is guided by the Procedure Evolution Maturity Score (PEMS), which integrates success rate, skill brevity, and evolution stability. The refinement cycle iterates until convergence (ΔPEMS below ϵ), yielding concise and robust procedural circuits that can be reused directly for recurring tasks.
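The maturity-guided stopping rule can be illustrated with a toy loop. The summary names the three ingredients of PEMS but not its formula, so the weighted sum, the weights, the brevity normalization by `max_steps`, and the `refine` and `evaluate` callbacks below are all assumptions chosen only to make the convergence check concrete.

```python
def pems(success_rate, num_steps, stability,
         max_steps=20, weights=(0.5, 0.25, 0.25)):
    """Assumed PEMS: weighted sum of success, brevity, and stability.

    All three terms are scaled to [0, 1]; the real definition may differ.
    """
    brevity = 1.0 - min(num_steps, max_steps) / max_steps
    w_s, w_b, w_e = weights
    return w_s * success_rate + w_b * brevity + w_e * stability


def consolidate(skill, refine, evaluate, eps=1e-3, max_rounds=10):
    """Refine a skill until the change in PEMS falls below eps.

    refine: hypothetical callback returning a revised skill.
    evaluate: hypothetical callback returning
        (success_rate, num_steps, stability) for a skill.
    Returns (final_skill, final_score, rounds_used).
    """
    prev = pems(*evaluate(skill))
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        skill = refine(skill)
        score = pems(*evaluate(skill))
        converged = abs(score - prev) < eps
        prev = score
        if converged:
            break
    return skill, prev, rounds


# Toy callbacks: drop one redundant step per round until three remain.
def toy_refine(skill):
    return skill[:-1] if len(skill) > 3 else skill


def toy_evaluate(skill):
    return 0.9, len(skill), 1.0


steps = ["load", "inspect", "group", "aggregate", "recheck"]
final_skill, final_score, rounds_used = consolidate(
    steps, toy_refine, toy_evaluate)
```

In this toy run the skill shrinks from five steps to three, and the loop stops on the third round, once a refinement no longer changes the score.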
Experimental Evaluation
FluxMem was benchmarked on three diverse platforms: LoCoMo (long-context reasoning), Mind2Web (web navigation), and GAIA (general assistant tasks). The framework consistently outperformed static and meta-evolving memory baselines across all metrics and subcategories.
- LoCoMo: FluxMem achieved 95.06 average accuracy (GPT-4.1-mini), substantially above Full Context (81.23) and specialized memory OS baselines (EverMemOS at 93.05). On Qwen3-30B-A3B-2507-Instruct, FluxMem scored 93.44, with the closest baseline dropping to 74.87.
- Mind2Web: Without manual element filtering, FluxMem delivered a Cross-Task Success Rate (SR) of 8.1 (GPT-4.1-mini), more than double that of the best baseline, AWM (3.6). On Gemini-2.5-flash, FluxMem reached 9.6 SR, indicating robust adaptability and retrieval precision in noisy, real-world web navigation.
- GAIA: FluxMem raised the average success rate from 52.12 (Flash-Searcher) to 64.85 on Kimi K2, an absolute gain of 12.73 percentage points, and performed comparably to or better than MemEvolve and closed-source frameworks, especially on Level 3 tasks.
Component and Dynamics Analysis
Ablation studies revealed that Stage II (feedback-driven refinement) was critical for memory-centric scenarios (LoCoMo): removing it caused a pronounced drop in accuracy (95.06% → 85.32%). In task environments requiring strong procedural abstraction (Mind2Web), Stage III (long-term consolidation) was most impactful. Performance improved steadily with additional Stage II refinement rounds and then saturated as the evidence paths approached their optimum.
Figure 3: Ablation and dynamics analysis of FluxMem: impact of each stage, improvements per refinement round, and convergence of memory maturity scores.
PEMS convergence indicated that evolutionary maturity was reached within a limited number of refinement rounds, which avoids redundant consolidation computation once the score has stabilized. Figure 3(e) shows both accuracy and PEMS trends, supporting the maturity-guided termination criterion.
Case Study
The paper presents a tabular reasoning task from GAIA to examine FluxMem's real-time context adaptation. The agent successfully parses the schema but fails at the aggregation step because of a misaligned API invocation. FluxMem identifies the gap, expands connectivity toward the relevant APIs, and refines the procedural skill nodes to handle finer-grained metric composition, correcting execution at the step level.
Figure 4: Case study of topological context adaptation and skill refinement during a GAIA tabular reasoning task.
Implications and Future Directions
Practically, FluxMem's connectivity-evolving paradigm enhances generalization, recall precision, and robustness to environmental feedback in diverse agentic environments. Theoretically, modeling memory as editable connectivity fosters self-organizing structures that facilitate information flow and procedural skill distillation. The evolutionary framework offers a principled foundation for work on agent self-evolution, aligning with trends in memory operating systems (Hu et al., 15 Dec 2025, Packer et al., 2023), meta-evolution (Zhang et al., 21 Dec 2025), and skill learning (Zhang et al., 2 Feb 2026).
Future work could address the computational overhead of closed-loop LLM operations, dynamic scheduling for offline consolidation, and adaptation to open-world streaming environments with decaying memory regimes. Integration with neurobiologically inspired memory substrates (Gutiérrez et al., 2024), collaborative multi-agent memory (Zhang et al., 9 Jun 2025), and fine-grained sensitivity analyses are further promising directions.
Conclusion
FluxMem redefines agent memory as dynamic, evolving connectivity, enabling autonomous and robust context engineering for LLM-based agents. By modeling memory evolution across semantic, episodic, and procedural layers, and performing targeted connectivity refinement, FluxMem achieves superior task adaptation and generalization. This work establishes a foundational paradigm for future research in lifelong, self-evolving agent systems.