Mnemis: Dual-Route Memory Retrieval
- The paper introduces a dual-route retrieval framework that combines fast similarity search (System-1) with deliberate hierarchical search (System-2).
- Mnemis utilizes two orthogonal memory graphs—a base graph for local cue-driven retrieval and a hierarchical graph for global semantic traversal.
- Empirical results demonstrate state-of-the-art accuracy on long-term memory benchmarks, outperforming traditional RAG and related retrieval models.
Mnemis: Dual-Route Retrieval
Mnemis is a memory-augmented retrieval framework for LLMs that operationalizes the dual-route theory of memory retrieval by integrating parallel fast (System-1) and slow (System-2) mechanisms. Motivated by limitations of existing retrieval-augmented generation (RAG) and Graph-RAG methodologies—which are predominantly similarity-based and thus ill-suited for complex or global information needs—Mnemis formalizes retrieval as the joint output of a similarity-driven base graph and a hierarchical graph supporting deliberate, structured searches. This architecture is informed by cognitive models of human memory, empirical evidence for dual-store representation, and recent advances in adaptive and metacognitive reasoning in neural systems (Tang et al., 17 Feb 2026).
1. Theoretical Motivation and Cognitive Foundations
Dual-route retrieval draws on cognitive architectures that postulate two separable memory processes: a fast, local, cue-driven route (System-1) and a slower, more deliberative, globally planning route (System-2) (Tang et al., 17 Feb 2026, Zheng, 22 Jan 2026, Reimann, 13 May 2025, Yoshida et al., 17 Feb 2025). In Mnemis, these are instantiated as (i) efficient similarity search over raw memory episodes and entities (System-1), and (ii) global selection via top-down traversal of semantic hierarchies (System-2). Empirical results from psycholinguistics and computational neuroscience converge on the necessity of parallel stores and retrieval cues for both lexical and higher-order structural information (Yoshida et al., 17 Feb 2025). Competing architectures such as AMOR (entropy-gated attention over SSMs) and Decide–Then–Retrieve frameworks (uncertainty-gated dual-path RAG) further validate adaptive, context-dependent engagement of retrieval strategies (Zheng, 22 Jan 2026, Chen et al., 7 Jan 2026).
2. Architectural Components and Data Structures
Mnemis maintains two orthogonal, graph-structured memory representations:
- Base Graph (System-1): Comprises nodes for episodes (text chunks with embeddings and timestamps), entities (named concepts with vector and summary representations), and edges (relations with fact embeddings), as well as episodic links from entities to all associated episodes. Embeddings are indexed using approximate nearest neighbor methods; textual content is indexed via BM25.
- Hierarchical Graph (System-2): A multilayered, category-based abstraction where each node can represent conceptual groupings (categories) or lower-level entities; edges encode many-to-many hierarchical relations. The hierarchy is constructed and periodically updated using LLM-driven semantic clustering and guided by principles of minimum abstraction, redundancy, and compression efficiency.
This dual-representation architecture enables both flat, vector-based similarity search and top-down, semantically guided traversals, supporting recall at differing granularities and abstraction levels (Tang et al., 17 Feb 2026).
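The two memory representations described above can be sketched as plain data structures. This is a minimal illustration, not the paper's implementation; all class and field names (`Episode`, `Entity`, `Edge`, `BaseGraph`, `HierarchicalGraph`) are hypothetical, and the ANN/BM25 indexes are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    text: str                   # raw text chunk
    embedding: list[float]      # vector representation
    timestamp: float

@dataclass
class Entity:
    name: str
    summary: str
    embedding: list[float]
    episode_ids: list[int] = field(default_factory=list)  # episodic links

@dataclass
class Edge:
    source: str                 # entity name
    target: str                 # entity name
    fact: str                   # relation text
    fact_embedding: list[float]

@dataclass
class BaseGraph:
    """System-1 store: episodes, entities, relation edges."""
    episodes: dict[int, Episode] = field(default_factory=dict)
    entities: dict[str, Entity] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

@dataclass
class HierarchicalGraph:
    """System-2 store: layers[0] holds entity names, layers[l > 0] categories."""
    layers: list[set[str]] = field(default_factory=list)
    # many-to-many parent -> children links across adjacent layers
    children: dict[str, set[str]] = field(default_factory=dict)
```

The key structural choice this mirrors is orthogonality: the base graph stores raw, embedding-indexed content while the hierarchical graph stores only names and membership links, so the two can be queried and updated independently.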
3. Dual-Route Retrieval Algorithms
3.1 System-1 (Similarity Search)
Given a query $q$, Mnemis computes the query embedding $e_q$, retrieves top-$k$ candidates from the embedding index (episodes, entities, edges) and from BM25, and merges results via reciprocal rank fusion (RRF):

$$\mathrm{RRF}(d) = \sum_{m \in M} \frac{1}{k_0 + r_m(d)},$$

where $r_m(d)$ is the rank of item $d$ in retrieval method $m$ and $k_0$ is a smoothing constant. The output is a reranked candidate set under a fixed retrieval budget.
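The RRF combination can be sketched in a few lines. The smoothing constant `k0 = 60` is the value commonly used in the RRF literature, not one reported for Mnemis.

```python
def rrf_fuse(rankings, k0=60):
    """Reciprocal rank fusion over several ranked lists of item ids.

    Each item's score is the sum over methods of 1 / (k0 + rank),
    using 1-based ranks; items absent from a method contribute nothing.
    Returns item ids sorted by fused score, best first.
    """
    scores = {}
    for ranked in rankings:
        for rank, item in enumerate(ranked, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k0 + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fusing a hypothetical ANN ranking with a BM25 ranking:
fused = rrf_fuse([["e3", "e1", "e2"], ["e1", "e4", "e3"]])
```

An item ranked moderately well by both methods (here `e1`) outranks one ranked first by only a single method, which is the property that makes RRF a robust parameter-free merger.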
3.2 System-2 (Global Selection)
The hierarchical memory is traversed top-down. The process:
- Selects relevant top-layer categories via LLM-based selectors.
- Descends one layer at a time, aggregating all child nodes (entities) associated with relevant parents at each level.
- At layer 0, collects all entities, then retrieves all adjacent edges and episodes.
Pseudocode (summarized):
```
S_L = {c in C_L | LLM_Select(q, root, c)}
for l = L, L-1, ..., 1:
    S_{l-1} = {c in C_{l-1} | p in S_l, (p -> c) in CategoryEdges && LLM_Select(q, p, c)}
collect entities S_0, then all adjacent edges and episodes
```
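The traversal above can be made concrete with a pluggable selector. This is a sketch: `llm_select` stands in for the LLM-based relevance judgment, and the function and argument names are illustrative, not from the paper.

```python
def traverse_hierarchy(query, layers, children, llm_select):
    """Top-down System-2 traversal over a category hierarchy.

    layers: list of node-name sets; layers[-1] is the top layer,
            layers[0] holds entities.
    children: dict mapping a parent node to its child nodes (many-to-many).
    llm_select(query, parent, child) -> bool: relevance predicate, a
        stand-in for the LLM-based selector.
    Returns the selected entity set at layer 0.
    """
    top = len(layers) - 1
    # Select relevant top-layer categories (no parent at the top).
    selected = {c for c in layers[top] if llm_select(query, None, c)}
    # Descend one layer at a time, keeping children of relevant parents.
    for level in range(top, 0, -1):
        selected = {
            c
            for p in selected
            for c in children.get(p, ())
            if c in layers[level - 1] and llm_select(query, p, c)
        }
    return selected  # caller then gathers adjacent edges and episodes

# Toy hierarchy: one category layer over three entities.
layers = [{"cat", "dog", "car"}, {"animals", "vehicles"}]
children = {"animals": {"cat", "dog"}, "vehicles": {"car"}}
picked = traverse_hierarchy(
    "which pets appear?", layers, children,
    llm_select=lambda q, p, c: c in {"animals", "cat", "dog"},
)
```

Because selection is applied per parent-child pair, pruning an irrelevant category at a high layer cuts off its entire subtree, which is where the traversal's efficiency comes from.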
3.3 Integration and Hybrid Reranking
Let $R_1$ (System-1) and $R_2$ (System-2) be the retrieved sets. Final selection over $R_1 \cup R_2$ is via a learned reranker $f$, optionally interpolated with System-1's RRF scores:

$$s(d) = \lambda\, f(q, d) + (1 - \lambda)\,\mathrm{RRF}(d).$$

The reranked set is truncated to the final context window for answer generation (Tang et al., 17 Feb 2026).
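The interpolation-and-truncation step can be sketched as follows. The weight `lam` and `budget` values are illustrative hyperparameters, not values from the paper, and `reranker_score` stands in for the learned reranker.

```python
def hybrid_rerank(candidates, reranker_score, rrf_score, lam=0.7, budget=10):
    """Score candidates by interpolating a learned reranker with RRF scores,
    then truncate to the retrieval budget.

    candidates: iterable of item ids from both routes (R1 union R2).
    reranker_score: callable item -> float (learned relevance).
    rrf_score: dict item -> float; System-2-only items default to 0.0.
    """
    scored = [
        (lam * reranker_score(d) + (1 - lam) * rrf_score.get(d, 0.0), d)
        for d in candidates
    ]
    scored.sort(reverse=True)
    return [d for _, d in scored[:budget]]
```

Note that items reached only through System-2 have no RRF score and fall back to 0.0, so they must earn their place through the reranker alone.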
4. Algorithmic and Mathematical Formalism
The Mnemis model leverages the following retrieval, ranking, and traversal formulations:
- Vector similarity: $\mathrm{sim}(q, d) = \dfrac{e_q \cdot e_d}{\lVert e_q \rVert\, \lVert e_d \rVert}$ (cosine similarity between query and item embeddings).
- Multi-modal fusion: System-1 and System-2 outputs are combined into an unordered memory context, scored uniformly by a reranker.
Complexity per query is sublinear in corpus size $N$ for ANN search, sublinear for BM25 over inverted indexes, $O(L \cdot b)$ for hierarchy traversal with $L$ layers and branching factor $b$, and $O(k)$ for reranking $k$ candidates.
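The similarity formula above is plain cosine similarity, shown here without any library dependency for clarity:

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

In practice the ANN index computes this (or an equivalent inner product over normalized vectors) approximately rather than exhaustively, which is what keeps System-1 lookup sublinear in corpus size.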
5. Empirical Results and Comparative Evaluation
5.1 Long-term Memory Benchmarks
| Model | LoCoMo | LongMemEval-S |
|---|---|---|
| RAG | 73.8 | 72.6 |
| EMem-G | 85.3 | 84.9 |
| EverMemOS | 92.3 | 82.0 |
| Mnemis (k=30) | 93.9 | 91.6 |
On LoCoMo (multi-session human conversations: average 16k tokens/session, 2k test questions) and LongMemEval-S (500 sessions, 115k tokens/session), Mnemis achieves state-of-the-art retrieval accuracy (Tang et al., 17 Feb 2026). Ablations confirm additive contributions: System-1 only (89.1), System-2 only (87.7), joint (93.3). Multi-hop and enumerative queries show particular gains from System-2, while strictly local or temporal queries remain System-1 dominant.
5.2 Route-specific Gating and Adaptivity
Comparative results from AMOR and Decide–Then–Retrieve underscore the advantages of metacognitive gating: in AMOR, entropy-thresholded attention achieves perfect retrieval accuracy on synthetic copy tasks while firing expensive attention on only 22% of positions; in Decide–Then–Retrieve, uncertainty-gated dual-path retrieval improves both EM and F1 while reducing unnecessary retrievals (Zheng, 22 Jan 2026, Chen et al., 7 Jan 2026).
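The gating idea shared by these systems, firing the expensive path only when the model is uncertain, can be sketched generically. The entropy threshold here is illustrative and the function names are hypothetical; neither is taken from AMOR or Decide-Then-Retrieve.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def gated_retrieve(probs, retrieve_fn, answer_fn, threshold=1.0):
    """Engage the expensive retrieval path only when predictive entropy
    exceeds the threshold; otherwise answer from parametric memory."""
    if token_entropy(probs) > threshold:
        return retrieve_fn()   # slow, retrieval-augmented path
    return answer_fn()         # fast, direct path
```

A peaked distribution (the model is confident) stays on the fast path; a near-uniform one (the model is unsure) triggers retrieval, which is how such gates keep the expensive route to a small fraction of positions.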
6. Variants and Extensions Across Modalities
The dual-route paradigm appears across diverse architectures and domains:
- Non-associative algebraic memory: Reimann’s algebra produces two distinct states (L-“recency,” R-“primacy”) using non-associative bundling, mapping to short-term and long-term memory circuits, and demonstrating serial position curves observed in human recall (Reimann, 13 May 2025).
- Transformer induction heads: Discrete routes correspond to verbatim token-copying heads and concept-level semantic heads, mixed via a routing controller for maximum generality (Feucht et al., 3 Apr 2025).
- Psycholinguistic models: Dual-store attention over both token and syntactic representations independently predicts human reading times, providing cross-species and neural evidence for parallel retrieval architectures (Yoshida et al., 17 Feb 2025).
7. Limitations, Open Challenges, and Future Directions
Current Mnemis implementations rebuild the hierarchical graph in batch rather than incrementally, support only text modalities, and depend on LLM-driven prompt engineering for System-2 traversal, which may omit relevant nodes under suboptimal prompting. Promising directions include multimodal graph extensions (images, tables), incremental/hot updates, graph-active retrieval via learned policies, and tighter fusion with joint scoring models (Tang et al., 17 Feb 2026).
A plausible implication is that dual-route retrieval, in both cognitive systems and scalable machine memory, offers a generic and interpretable solution for efficient, robust long-horizon information access, reconciling fast similarity-based retrieval with global, structure-aware selection in a single integrated framework.