
System-2 Global Selection in Dual-Route Retrieval

Updated 20 February 2026
  • System-2 Global Selection is a dual-route retrieval approach that integrates deliberate hierarchical search with fast similarity-based methods.
  • It employs LLM-guided clustering and adaptive gating mechanisms, with benchmark scores of up to 93.9 and notable performance drops in ablation studies.
  • This method bridges cognitive theories and computational models, enhancing interpretability and efficiency in tackling complex long-term memory tasks.

Mnemis: Dual-Route Retrieval

Mnemis defines a hybrid memory retrieval architecture that unifies two complementary computational “routes”—a fast similarity-based (System-1) pathway and a deliberate global selection (System-2) pathway—enabling large-scale models and cognitive systems to achieve high retrieval accuracy, interpretability, and efficiency on complex long-term memory tasks. The core rationale is to integrate both high-throughput and deliberative search mechanisms, echoing dual-process theories in cognitive science, and to instantiate this integration with concrete data structures and adaptive routing mechanisms. Mnemis bridges neuroscientific, algorithmic, and engineering traditions, drawing directly on state-of-the-art implementations and formal analyses.

1. Dual-Route Retrieval Architecture

Mnemis is realized as a dual-graph framework for memory organization and retrieval, maintaining two parallel memory stores:

  • System-1 (Similarity-Based Base Graph):

A similarity-based store where memory items—episodes (raw text chunks), entities (named concepts), and edges (relations)—are embedded into a high-dimensional vector space and indexed for rapid nearest-neighbor retrieval. This route enables low-latency search analogous to "fast" System-1 cognition (Tang et al., 17 Feb 2026).

  • System-2 (Hierarchical Global Selection Graph):

A hierarchical structure of category nodes, defined by layered groupings of entities and relations, supporting top-down, global traversal based on semantic structure rather than only proximity. This route implements deliberate, coverage-oriented retrieval, mirroring “slow” System-2 deliberation (Tang et al., 17 Feb 2026).

Each route is associated with a distinct data structure and search algorithm. The base graph leverages approximate nearest neighbor (ANN) indexing (e.g., FAISS, Neo4j’s vector index) alongside textual indices (BM25), while the hierarchical graph is constructed via LLM-guided clustering and maintained as a multilevel, directed acyclic category tree.
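The pairing of data structures above can be made concrete with a minimal sketch: a brute-force cosine index stands in for the ANN/BM25-backed base graph, and a dictionary-based category tree stands in for the hierarchical graph. All class and method names here are hypothetical illustrations, not the Mnemis API, and LLM-guided clustering and gating are omitted.

```python
import numpy as np

class DualStore:
    """Toy dual-route memory store: a flat vector index (System-1)
    plus a category tree over item ids (System-2)."""

    def __init__(self):
        self.vectors = {}    # item_id -> embedding (System-1 base graph)
        self.children = {}   # category_id -> child category/item ids (System-2)

    def add_item(self, item_id, vec):
        self.vectors[item_id] = np.asarray(vec, dtype=float)

    def add_category(self, cat_id, child_ids):
        self.children[cat_id] = list(child_ids)

    def knn(self, query, k=2):
        """System-1: brute-force cosine nearest neighbours
        (a production system would use an ANN index such as FAISS)."""
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        scores = {i: float(v @ q) / np.linalg.norm(v)
                  for i, v in self.vectors.items()}
        return sorted(scores, key=scores.get, reverse=True)[:k]

    def descend(self, node_id):
        """System-2: top-down traversal collecting all leaf items
        under a category node (LLM gating at each step omitted)."""
        if node_id not in self.children:
            return [node_id]
        leaves = []
        for child in self.children[node_id]:
            leaves.extend(self.descend(child))
        return leaves
```

In a full implementation, `descend` would consult an LLM gating function at each layer to decide which children are worth expanding, rather than exhaustively collecting all leaves.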

2. System-1 and System-2 Retrieval Mechanisms

System-1 and System-2 routes employ different algorithms, corresponding to the representational and operational dichotomy:

  • System-1:
    • Embedding model $f: \text{text} \rightarrow \mathbb{R}^d$ projects queries and memory items into a shared vector space.
    • Cosine similarity retrieves the top-$k$ episodes, entities, and edges with score $\mathrm{sim}(\mathbf{q}, \mathbf{x}) = \frac{\mathbf{q}^\top \mathbf{x}}{\|\mathbf{q}\|_2\,\|\mathbf{x}\|_2}$.
    • Results from ANN and BM25 searches are merged and reranked with Reciprocal-Rank Fusion (RRF) (Tang et al., 17 Feb 2026).
  • System-2:
    • The LLM clusters entities into higher-level “category” nodes at multiple layers ($\ell = 1, \dots, L$).
    • Top-down traversal starts from relevant category nodes selected by LLM scoring, recursively descends to constituent entities, and aggregates their associated edges and episodes.
    • The selection criterion at each descent step is a prompt-based LLM gating function (Tang et al., 17 Feb 2026).
  • Hybrid Integration:

The union of System-1 and System-2 retrieval sets is passed to a learned reranker $g_\theta$, with final selection controlled via either exclusive or interpolated scoring (Tang et al., 17 Feb 2026):

$$\hat{s}(x) = \alpha\, s_1(x) + (1-\alpha)\, g_{\theta}(q, \mathrm{repr}(x)), \qquad \alpha \approx 0.5$$

This design ensures that both fast local similarity and broad global coverage are available, and the routing between these is tunable.
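The two fusion steps above admit a compact sketch: Reciprocal-Rank Fusion to merge the ANN and BM25 rankings, followed by the interpolated final score. The constant $k = 60$ is the conventional RRF default, not a value from the paper, and the rankings and scores are illustrative.

```python
def rrf(rankings, k=60):
    """Reciprocal-Rank Fusion: each item scores sum(1 / (k + rank))
    over every ranked list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_score(s1, reranker_score, alpha=0.5):
    """Interpolated final score: alpha * s1(x) + (1 - alpha) * g_theta(q, repr(x))."""
    return alpha * s1 + (1 - alpha) * reranker_score

# Merge a hypothetical ANN ranking with a BM25 ranking, then interpolate.
fused = rrf([["a", "b", "c"], ["b", "c", "d"]])
```

Items appearing high in both lists (here "b" and "c") dominate the fused ranking, which is why RRF is robust to score-scale mismatch between the dense and lexical retrievers.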

3. Metacognitive Gating and Information-Theoretic Routing

Dual-route architectures such as Mnemis incorporate adaptive routing mechanisms that determine which retrieval path to activate:

A metacognitive gate monitors predictive entropy at each inference position, routing to the computationally intensive System-2 slow path only when the base System-1 output is "uncertain." Specifically, the entropy of the state-space model's (SSM's) output is normalized and thresholded via a logistic sigmoid:

$$g_t = \mathbb{I}\left[\sigma\left(\alpha(\hat{H}_t - \tau)\right) > 0.5\right]$$

where $\hat{H}_t$ is the normalized entropy and $\alpha, \tau$ are learnable parameters (Zheng, 22 Jan 2026).
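The gate can be sketched as follows; the fixed values of $\alpha$ and $\tau$ stand in for the learnable parameters, and the 4-way distributions are purely illustrative.

```python
import math

def entropy_gate(probs, alpha=8.0, tau=0.5):
    """Route to the slow System-2 path iff sigmoid(alpha * (H_hat - tau)) > 0.5,
    where H_hat is the Shannon entropy of the output distribution
    normalized by its maximum, log(V)."""
    H = -sum(p * math.log(p) for p in probs if p > 0.0)
    H_hat = H / math.log(len(probs))
    return 1.0 / (1.0 + math.exp(-alpha * (H_hat - tau))) > 0.5
```

Since $\sigma(z) > 0.5$ exactly when $z > 0$, the hard gate reduces to the threshold test $\hat{H}_t > \tau$; the sigmoid matters when $\alpha$ and $\tau$ are trained with gradients through a soft gate.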

  • Uncertainty-Guided Triggering (DTR):

In Decide–Then–Retrieve (Chen et al., 7 Jan 2026), retrieval is triggered only when the LLM's autoregressive uncertainty $u(q) = -\frac{1}{T}\log P(\hat{a} \mid q)$ exceeds a preset threshold, thus invoking the more resource-intensive dual retrieval pass only when necessary.

  • Adaptive Fusion:

Retrieved items are fused and reranked adaptively via scoring functions that optimize joint relevance to both the query and context, e.g., geometric mean scores or LLM-based rerankers (Chen et al., 7 Jan 2026, Tang et al., 17 Feb 2026).
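Both the DTR trigger and the geometric-mean fusion option admit short sketches; the threshold value and token probabilities below are illustrative, not taken from the cited papers.

```python
import math

def uncertainty(token_probs):
    """DTR-style uncertainty u(q) = -(1/T) * log P(a_hat | q): the average
    negative log-probability over the model's T draft answer tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def retrieve_if_uncertain(token_probs, threshold=1.0):
    """Trigger the dual retrieval pass only when uncertainty exceeds
    a preset threshold (value here is illustrative)."""
    return uncertainty(token_probs) > threshold

def geometric_fusion(query_score, context_score):
    """Adaptive fusion (one option named in the text): geometric mean of
    an item's relevance to the query and to the context."""
    return math.sqrt(query_score * context_score)
```

A confident draft (all token probabilities near 1) keeps $u(q)$ near zero and skips retrieval; a hesitant draft pushes $u(q)$ past the threshold and activates the dual pass.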

These information-theoretic routing mechanisms provide interpretability by exposing when and why the system chooses to “think slow.”

4. Representation: Duality in Memory and Cognitive Models

Mnemis frameworks unify a range of dual-route phenomena recognized in both engineering and cognitive neuroscience settings:

  • Graph-Based Duality:

Practical memory graphs instantiate the "fast" (similarity) vs "slow" (semantic hierarchy) dualism structurally (Tang et al., 17 Feb 2026).

  • Non-Associative Algebraic Formalism:

In Reimann's framework (Reimann, 13 May 2025), two memory states are constructed by distinct non-associative bundling operators: the L-state (left-associative, online, recency-weighted) and R-state (right-associative, chunk-oriented, primacy-weighted). Retrieval relies on cue-specific mutual information:

$$I_M(c) = \rho\, I_R + \rho'\, I_L$$

providing a formal basis for recency and primacy gradients in recall performance.

  • Transformer-Based Dual Induction:

Empirical work identifies mechanistically separate attention heads for verbatim (token-level) and semantic (concept-level) copying, each with dedicated Q/K/V projections and gating mechanisms (Feucht et al., 3 Apr 2025).

  • Psycholinguistic Justification:

Human sentence processing exhibits dual memory representations—lexical tokens (sequence memory) and syntactic constituents (structure memory)—retrieved by variants of cue-based attention, each yielding separable contributions to observed processing variability (Yoshida et al., 17 Feb 2025).
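The non-associative bundling idea behind the L-/R-state distinction above can be demonstrated with a toy operator: folding the same blend left- versus right-associatively over one-hot items yields recency- versus primacy-weighted composites. The operator and $\gamma = 0.5$ are illustrative choices, not Reimann's actual construction.

```python
from functools import reduce

import numpy as np

def bundle(a, b, gamma=0.5):
    """A non-associative bundling operator (illustrative): with gamma = 0.5,
    (a∘b)∘c weights c twice as much as a or b, while a∘(b∘c) weights a most."""
    return gamma * a + (1 - gamma) * b

def l_state(items):
    """Left-associative fold ((x1∘x2)∘x3)∘... -> recency-weighted composite."""
    return reduce(bundle, items)

def r_state(items):
    """Right-associative fold x1∘(x2∘(x3∘...)) -> primacy-weighted composite."""
    return reduce(lambda acc, x: bundle(x, acc), reversed(list(items)))

# One-hot items make each component of the composite expose that item's weight.
items = list(np.eye(4))
L, R = l_state(items), r_state(items)
# L weights rise toward the last item (recency); R weights fall from the first (primacy).
```

Reading off the components, the L-state weights the most recent item highest while the R-state weights the first item highest; combining the two composites is what yields the U-shaped serial-position profile discussed below.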

5. Empirical Evaluation and Benchmark Results

The effectiveness of Mnemis-style dual-route retrieval has been demonstrated across multiple large-scale benchmarks and synthetic tasks:

  • Long-Term Memory QA:

On LoCoMo and LongMemEval-S, Mnemis achieves scores of 93.9 and 91.6 (GPT-4.1-mini backbone), outperforming RAG, Nemori, and EMem-G baselines (highest prior score: 92.3) (Tang et al., 17 Feb 2026).

  • Ablation Studies:

Removing either route causes substantial drops (System-1 plus graph only: 89.1; System-2 only: 87.7) versus the full dual-route configuration (93.3), confirming their complementary benefit (Tang et al., 17 Feb 2026).

  • Synthetic Retrieval Tasks:

AMOR attains perfect retrieval accuracy (100%) at only 22.3% attention activation, and a 1.09-nat entropy gap demonstrates metacognitive reliability (Zheng, 22 Jan 2026).

Dual-path retrieval in DTR boosts EM and F1 metrics across five QA datasets (e.g., Qwen2.5-7B: EM 37.81, F1 48.00) versus standard RAG (EM 35.81, F1 45.81) (Chen et al., 7 Jan 2026).

  • Human Cognitive Modeling:

Attention entropies over token and syntactic-constituent embeddings are independent, significant predictors of human reading times, supporting the necessity of dual-store representations in sentence processing (Yoshida et al., 17 Feb 2025).

6. Neural Interpretability, Cognitive Plausibility, and Mechanistic Insights

Mnemis aligns with mechanistic findings in neuroscience and neuropsychology:

  • Prefrontal Cortex and Hippocampus:

The L-state (recency) construction mirrors prefrontal working memory with high-dimensional persistent activity, while the R-state (primacy) models hippocampal episodic chunk encoding. Lesion studies match this dissociation: hippocampal lesions selectively impair primacy, while prefrontal lesions selectively impair recency (Reimann, 13 May 2025).

  • Gate Interpretability:

Metacognitive gates in AMOR and DTR can be linked to information-theoretic variables (entropy, negative log-likelihood), providing explicit computational transparency regarding route selection (Zheng, 22 Jan 2026, Chen et al., 7 Jan 2026).

  • Route Specialization and Ablations:

In Transformer decoders, ablations of token-level or concept-level induction heads produce sharply selective deficits in verbatim vs. semantic copying, validating the functional separation and control of each pathway (Feucht et al., 3 Apr 2025).

  • Behavioral Matching:

Non-associative algebra models reproduce the characteristic U-shaped Serial Position Curve of human recall (early primacy, late recency) by combining L- and R-states, parametrically matching empirical human data without auxiliary tuning (Reimann, 13 May 2025).

7. Connections to Broader Dual-Route Theories and Future Directions

Mnemis integrates, formalizes, and extends dual-route theories of memory and reasoning found in cognitive science (e.g., Kahneman’s “fast and slow thinking” (Zheng, 22 Jan 2026), cue-based retrieval in psycholinguistics (Yoshida et al., 17 Feb 2025)) and connects them to scalable computational architectures. Current limitations include hierarchical graph dynamism, restricted modality support, and sensitivity to prompt engineering in System-2 traversal (Tang et al., 17 Feb 2026). Future research directions involve multi-modal graph integration, incremental hierarchy updates, more flexible traversal policies, and tighter fusion with reranking and adaptive retrieval strategies.

Ongoing work is establishing the precision and generality of dual-route frameworks such as Mnemis, grounding them in both algorithmic and neural evidence and deploying them as state-of-the-art modules for LLM memory, long document question answering, and adaptive reasoning.
