- The paper introduces a hybrid memory design that integrates summarized event units with raw dialogue text, enabling adaptive retrieval through a dual-tier scheduling mechanism.
- It employs a lightweight vector search for simple queries and a deep LLM-based module for complex multi-hop reasoning, balancing speed and accuracy.
- Empirical benchmarks show HyMem improves multi-hop accuracy by 10.46% and reduces inference token usage by 92.6%, achieving state-of-the-art efficiency.
HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling
Long-context reasoning in LLM-based agents remains fundamentally constrained by memory management bottlenecks: static architectures either sacrifice retrieval quality through aggressive memory compression or waste computation by processing raw, redundant text for all queries. The inability to support dynamic, fine-grained, and context-adaptive memory retrieval leads to degraded reasoning, irreversible information loss, and high latency, especially in multi-hop and complex scenarios. HyMem addresses these limitations via a hybrid memory design equipped with dynamic retrieval scheduling, drawing explicit inspiration from the cognitive economy observed in human recall systems.
Architecture and Methodology
HyMem implements a dual-granularity memory architecture:
- Level-1 Memory (Summarized Event Units): Dialogue is automatically partitioned into discrete event-based units. Key facts (time, participants, event) are distilled into dense summaries, mapped to semantic vectors to support efficient similarity search. These units serve as both retrieval anchors and as lightweight context for simple queries.
- Level-2 Memory (Raw Dialogue Text): Uncompressed text for each event is preserved and linked to its Level-1 summary, ensuring lossless backtracking when complex queries demand more detailed reasoning.
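The dual-granularity layout above can be sketched as a simple linked store: each event keeps a Level-1 summary and embedding alongside its Level-2 raw text. This is an illustrative sketch only; the class and method names (`EventUnit`, `HybridMemory`, `add_event`) and the plain-list embeddings are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class EventUnit:
    event_id: int
    summary: str            # Level-1: distilled key facts (time, participants, event)
    embedding: list[float]  # semantic vector supporting similarity search
    raw_text: str           # Level-2: uncompressed dialogue, linked for lossless backtracking

class HybridMemory:
    """Toy dual-granularity store: summaries are retrieval anchors,
    raw text stays reachable from each summary."""

    def __init__(self) -> None:
        self.units: list[EventUnit] = []

    def add_event(self, summary: str, embedding: list[float], raw_text: str) -> int:
        uid = len(self.units)
        self.units.append(EventUnit(uid, summary, embedding, raw_text))
        return uid

    def level1(self, uid: int) -> str:
        # Lightweight context: enough for simple, single-fact queries
        return self.units[uid].summary

    def level2(self, uid: int) -> str:
        # Full detail: consulted only when complex reasoning demands it
        return self.units[uid].raw_text
```

Keeping the two tiers in one record (rather than separate stores) makes the summary-to-raw-text link trivial to follow during escalation.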
A dynamic two-tier retrieval system allocates memory resources based on real-time assessment:
- Lightweight Memory Module: All initial queries are processed via fast vector search over Level-1 summaries, supporting the majority of simple, single-fact retrievals at negligible computational cost.
- Deep Memory Module: If the lightweight pass does not achieve contextual completeness (judged by a strict assessment of relevance and sufficiency), the deep module is activated. It first narrows the search space with coarse-grained vector similarity, then uses an LLM-based retriever to explicitly select all causally and logically linked event units, reconstructing maximal context from Level-2 memories for complex multi-hop and compositional reasoning tasks.
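The two-tier scheduling can be sketched as follows. The cosine-similarity ranking, the fixed sufficiency threshold, the top-3 pre-filter, and the `deep_retriever` callback (standing in for the LLM-based selector) are all illustrative assumptions rather than the paper's exact procedure.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, units, sufficiency_threshold=0.8, deep_retriever=None):
    """Tier 1: fast vector search over Level-1 summaries.
    Tier 2: if the best match is judged insufficient, escalate to the
    deep module, which expands to linked Level-2 raw text."""
    ranked = sorted(units, key=lambda u: cosine(query_vec, u["embedding"]),
                    reverse=True)
    best = ranked[0]
    if cosine(query_vec, best["embedding"]) >= sufficiency_threshold:
        # Lightweight path: the summary alone answers a simple query
        return {"tier": 1, "context": best["summary"]}
    # Contextual completeness not achieved: activate the deep module.
    candidates = ranked[:3]  # coarse vector pre-filter narrows the search space
    if deep_retriever is not None:
        # An LLM-based retriever would select causally/logically linked events here
        selected = deep_retriever(candidates)
    else:
        selected = candidates
    return {"tier": 2, "context": "\n".join(u["raw_text"] for u in selected)}
```

In this sketch the threshold is static; the paper's "real-time assessment" presumably makes this decision with a learned or prompted judge rather than a single similarity cutoff.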
An auxiliary reflection module assesses answer completeness post-retrieval. If a response fails to address the core requirement or exhibits evidence of information loss or reasoning gaps, the reflection module autonomously rewrites the query, triggering further retrieval and enabling multi-turn iterative refinement.
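The reflection loop described above can be sketched as a bounded retrieve-generate-reflect cycle. The function name, the `max_rounds` budget, and the plain-callable stand-ins for the LLM-based judge and query rewriter are assumptions for illustration.

```python
def answer_with_reflection(query, retrieve, generate, judge_complete,
                           rewrite_query, max_rounds=3):
    """Retrieve -> generate -> reflect; if the answer is judged incomplete,
    rewrite the query and retry, enabling multi-turn iterative refinement."""
    answer = ""
    for _ in range(max_rounds):
        context = retrieve(query)
        answer = generate(query, context)
        if judge_complete(query, answer):   # post-retrieval completeness check
            return answer
        # Reflection: autonomously rewrite the query to trigger further retrieval
        query = rewrite_query(query, answer)
    return answer  # best effort after exhausting the reflection budget
```

Bounding the loop with `max_rounds` matters in practice: without it, a judge that never accepts an answer would retrieve indefinitely.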
Empirical Results
On the LOCOMO and LongMemEval benchmarks, HyMem establishes new state-of-the-art accuracy under strict computational constraints:
- On LOCOMO, it surpasses all baselines in overall accuracy and demonstrates substantial gains on multi-hop (+10.46%) and open-domain (+5.20%) tasks compared to full-context methods.
- On LongMemEval, HyMem maintains a leading score of 75.00%, confirming consistent robustness to ultra-long contexts.
Efficiency analysis reveals that HyMem reduces average inference token usage by 92.6% compared to naive full-context baselines, without sacrificing accuracy. The system delivers strong performance even under low memory-retrieval budgets; its efficiency stems directly from resolving over 70% of queries through the lightweight module alone.
Ablation studies highlight the key contributions of each architectural component. Removing the deep module or Level-2 memory results in up to a 17.4% accuracy deficit, confirming the necessity of dual-granularity design. The reflection module provides further resilience for multi-step dialogue reasoning, and the method's event-level compression maintains superior information density per token versus generic distillation techniques.
Theoretical and Practical Implications
HyMem formalizes a principle—dynamic cognitive economy—that has broad implications for scalable LLM agent memory:
- Expressive Adaptivity: The hybrid system enables agents to modulate information granularity in situ, allowing for context-appropriate recall in production deployments without retraining base models. This supports seamless integration with both open and closed-source LLMs.
- Robust Iterative Reasoning: The separation of context retrieval, answer generation, and post-hoc reflection not only maximizes resource utilization but also provides a strong defense against hallucination and error propagation, a critical aspect for safety-sensitive deployments.
Practically, HyMem’s architecture sets a new paradigm for retrieval-augmented systems: it balances the competing requirements of efficiency, flexibility, and reliability without relying on fine-tuning or massive engineering overhead. Its modular design admits future improvements—such as personalized memory granularity, integration with structured knowledge graphs, and active forgetting mechanisms—while providing theoretical insights into cognitive control for artificial agents.
Future Directions
The HyMem framework suggests several avenues for research advancement:
- Granularity Optimization: Dynamically learning optimal partitioning and compression strategies per user or task could yield further gains in both efficiency and stability for personalized agents.
- Integration with Planning and Decomposition: Tighter coupling between memory scheduling and advanced agent planning pipelines may facilitate more complex, multi-agent or multi-domain dialogues.
- Theoretical Analysis: Formal characterization of the trade-off surfaces between retrieval granularity, reasoning capacity, and agent reliability in long-horizon, open-domain settings remains an open challenge.
Conclusion
HyMem brings a principled, architecture-driven approach to the core challenge of adaptive long-term memory for LLM agents, demonstrating that cognitive-economical dynamic retrieval can achieve significant improvements in both statistical accuracy and computational efficiency. These results underscore the value of hybrid, modular frameworks for building scalable, resilient, and resource-aware AI systems, and lay foundational groundwork for the next generation of persistent, reasoning-capable agent architectures (2602.13933).