
HippoRAG 2: Enhanced Memory for LLMs

Updated 14 December 2025
  • The paper introduces HippoRAG 2, a framework that unifies dense and sparse retrieval to boost factual recall, associative reasoning, and sense-making in LLMs.
  • It employs a dual-node knowledge graph with passage and phrase nodes, enhanced by Personalized PageRank and LLM-based triple filtering for deep contextual retrieval.
  • Benchmark experiments demonstrate a 7-point F1 gain over embedding retrievers on associative tasks while significantly reducing LLM token usage.

HippoRAG 2 is a non-parametric continual learning framework for LLMs that augments retrieval-augmented generation (RAG) with explicit, context-rich memory mechanisms, namely constructed knowledge graphs (KGs) and Personalized PageRank (PPR), achieving strong performance in factual recall, sense-making, and associative retrieval. Designed to mimic the dynamic, interconnected character of human long-term memory, HippoRAG 2 builds upon its predecessor HippoRAG by incorporating deeper passage integration, more contextualized query–triple linking, and an online LLM loop for both knowledge extraction/filtering and answer generation. The system addresses a deficiency of previous graph-augmented RAG architectures, which often traded factual recall or sense-making for associativity, by unifying dense and sparse representations, adding passage nodes to the KG, and leveraging LLM-powered recognition filtering. In benchmark experiments, HippoRAG 2 lifts associative QA F1 by 7 points over state-of-the-art embedding retrievers, while also excelling in factual and discourse-oriented tasks (GutiƩrrez et al., 20 Feb 2025).

1. Motivation and Design Principles

The core motivation for HippoRAG 2 is to endow LLMs with a non-parametric continual memory system capable of context-sensitive knowledge acquisition, recall, and integration—key features of human memory. Standard RAG workflows, reliant on nearest-neighbor vector retrieval, struggle with catastrophic forgetting in fine-tuning and lack the capacity for multi-hop associations ("associativity") and deep context interpretation ("sense-making"). Structure-augmented approaches with knowledge graphs partly address associativity but have yielded trade-offs, usually reducing performance on basic factual QA. HippoRAG 2 aims to unify memory recall mechanisms to optimize all three memory modalities—factual, associative, and sense-making—through innovations including:

  • Dense–sparse integration: Passage nodes and phrase nodes co-exist in the KG.
  • Deep contextualization: Queries directly link to full triples.
  • Recognition memory: LLM-based filtering of relevant triples.
  • Online LLM loop: LLM handles both knowledge graph maintenance and final-answer reading.

2. Memory Graph Construction

The HippoRAG 2 knowledge graph includes both phrase nodes $p \in P$ (text spans from OpenIE triple extraction) and passage nodes $d \in D$ (full passages or documents from the corpus). This dual-node structure enables dense–sparse integration. Edges fall into three categories:

  • Relation edges: Connect phrase nodes for each KG triple $(s, r, o)$, with undirected weight $w_{so} = 1$.
  • Synonym edges: Between phrase nodes whose embeddings $e_i, e_j$ satisfy $\mathrm{sim}(e_i, e_j) \geq \tau$ with $\tau = 0.8$; $w_{ij} = \mathrm{sim}(e_i, e_j)$.
  • Context edges: Link every phrase node extracted from a passage to that passage node; $w_{dp} = 1$.

The graph is represented as an adjacency matrix $A$ over the full node set $V = P \cup D$, normalized row-wise:

$$\tilde{A} = D^{-1} A$$

where $D_{ii} = \sum_j A_{ij}$. The KG is static offline; online, only the personalization vector for PPR changes following LLM-driven triple filtering.
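The graph construction above can be sketched in a few lines. This is a minimal toy implementation under the paper's stated conventions (undirected relation and context edges of weight 1, synonym edges gated by cosine similarity with $\tau = 0.8$); the function and argument names are illustrative, not from the paper.

```python
import numpy as np

def build_graph(phrase_nodes, passage_nodes, relation_edges,
                phrase_embeddings, context_edges, tau=0.8):
    """Build the row-normalized adjacency matrix over V = P ∪ D."""
    nodes = list(phrase_nodes) + list(passage_nodes)
    idx = {n: i for i, n in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))

    # Relation edges: undirected weight 1 between subject and object phrases.
    for s, _r, o in relation_edges:
        A[idx[s], idx[o]] = A[idx[o], idx[s]] = 1.0

    # Synonym edges: cosine similarity >= tau between phrase embeddings.
    for i, p in enumerate(phrase_nodes):
        for q in phrase_nodes[i + 1:]:
            ei, ej = phrase_embeddings[p], phrase_embeddings[q]
            sim = ei @ ej / (np.linalg.norm(ei) * np.linalg.norm(ej))
            if sim >= tau:
                A[idx[p], idx[q]] = A[idx[q], idx[p]] = sim

    # Context edges: phrase <-> source passage, weight 1.
    for d, p in context_edges:
        A[idx[d], idx[p]] = A[idx[p], idx[d]] = 1.0

    # Row-normalize: A_tilde = D^{-1} A, with D_ii = sum_j A_ij.
    degrees = A.sum(axis=1)
    degrees[degrees == 0] = 1.0  # avoid division by zero for isolated nodes
    return A / degrees[:, None], idx
```

Because the KG is static offline, this matrix is built once; only the PPR personalization vector changes per query.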

3. Personalized PageRank and Retrieval

HippoRAG 2 applies PPR over the normalized adjacency matrix $\tilde{A}$, producing a contextually ranked retrieval:

$$\mathbf{r} = \alpha \mathbf{v} + (1 - \alpha)\,\tilde{A}^\top \mathbf{r}$$

where $\alpha = 0.5$ is fixed and $\mathbf{v}$ is a personalization vector defined via relevant phrase and passage seed nodes informed by the query and triple scores. The vector $\mathbf{v}$ is:

$$v_i = \begin{cases} \dfrac{\mathrm{score}(i)}{\sum_{j \in S} \mathrm{score}(j)} & i \in S, \\ 0 & i \notin S, \end{cases}$$

with $\mathrm{score}(i)$ being the average retrieval score for triple-generating phrase nodes or weighted embedding similarity for passage nodes. PPR is solved by power iteration. The top-ranked passage nodes select the contextual passages for the LLM reader downstream.
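The two formulas above translate directly into code. The sketch below assumes a dense row-stochastic $\tilde{A}$ and seed scores keyed by node index; tolerance and iteration cap are illustrative choices, only $\alpha = 0.5$ comes from the paper.

```python
import numpy as np

def personalization_vector(seed_scores, n_nodes):
    """v_i = score(i) / sum_j score(j) for seed nodes i in S, 0 elsewhere."""
    v = np.zeros(n_nodes)
    for i, s in seed_scores.items():
        v[i] = s
    return v / v.sum()

def personalized_pagerank(A_tilde, v, alpha=0.5, tol=1e-8, max_iter=100):
    """Solve r = alpha*v + (1-alpha)*A_tilde^T r by power iteration."""
    r = v.copy()
    for _ in range(max_iter):
        r_next = alpha * v + (1 - alpha) * A_tilde.T @ r
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```

Ranking the passage-node entries of the converged $\mathbf{r}$ then yields the passages handed to the reader.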

4. Deep Passage Integration and Prompt Construction

ā€œDeepā€ passage integration involves encoding both passage and query into dense vectors (hj,hq)(h_j, h_q), concatenated to form the final context-aware prompt embedding:

H=[hq;h1;h2;…;hk]H = [h_q; h_1; h_2; \ldots; h_k]

In transformer-based LLMs, this concatenation is consumed in encoder layers or injected into cross-attention for the memory bank at each generation step:

gt=LLM(gtāˆ’1,[hq∣h1āˆ£ā€¦āˆ£hk]prompt)\mathbf{g}_t = \mathrm{LLM}(\mathbf{g}_{t-1}, [h_q | h_1 | \ldots | h_k]_{\text{prompt}})

Practically, passages are prepended (delimited) in natural language before the query for answering.
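In the practical natural-language form, prompt assembly is simple string concatenation. The delimiter wording below is a hypothetical template, not the paper's exact prompt:

```python
def build_prompt(query, passages):
    """Prepend retrieved passages, delimited, before the question."""
    context = "\n\n".join(
        f"Passage {i + 1}: {p}" for i, p in enumerate(passages)
    )
    return f"{context}\n\nQuestion: {query}\nAnswer:"
```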

5. Online Retrieval and Generation Workflow

The online operational logic comprises four sequential steps:

a. Query–Triple Linking and Passage Ranking: Compute embeddings for the query and match against KG triple texts and passage embeddings; retrieve top-$k$ triples $T$ and passages $D_0$.

b. Recognition Memory (Triple Filtering): Feed the query plus candidate triples $T$ into a secondary LLM (e.g., Llama-3.3-70B-Instruct) with a prompt designed to filter out triples irrelevant to the query (see paper Appendix A). The resulting set is $T' \subseteq T$.

c. Seed Node Selection and PPR: Extract up to five phrase nodes from $T'$, scored by triple score, plus all passage nodes with scaled embedding similarity; construct $\mathbf{v}$ for PPR, returning top-$k$ passages $D_1$.

d. Final Generation: Concatenate $D_1$ as context and prompt the LLM for the answer output.

Optional post-processing allows addition of new high-confidence facts back into the KG via OpenIE and synonym detection, supporting continual learning.
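The four steps compose into a pipeline like the following schematic. Here `embed`, `llm_filter`, and `llm_answer` are placeholders for the dense encoder and LLM calls, and `kg` stands in for the offline graph with its PPR machinery; none of these names come from the paper's code.

```python
def answer_query(query, kg, embed, llm_filter, llm_answer, k=5):
    # a. Query-triple linking and passage ranking.
    q_emb = embed(query)
    triples = kg.top_triples(q_emb, k=k)      # candidate set T
    # b. Recognition memory: LLM filters irrelevant triples, T' ⊆ T.
    kept = llm_filter(query, triples)
    # c. Seed selection and PPR over the graph, returning passages D1.
    seeds = kg.seed_scores(kept, q_emb)       # phrase + passage seed scores
    passages = kg.ppr_top_passages(seeds, k=k)
    # d. Final generation with retrieved context.
    return llm_answer(query, passages)
```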

6. Experimental Protocol and Evaluation

HippoRAG 2 was empirically validated on three major task types:

| Task Type | Benchmarks | Metrics |
| --- | --- | --- |
| Factual Recall | NaturalQuestions (NQ), PopQA | Recall@5, EM, F1 |
| Associativity | MuSiQue, 2Wiki, HotpotQA, LV-Eval | Recall@5, EM, F1 |
| Sense-making | NarrativeQA | EM, F1 |
Passage Recall@5 measures the percentage of queries for which a supporting passage is retrieved in the top 5. Exact Match (EM) and F1 reflect generation accuracy. Key result: on associative benchmarks, HippoRAG 2 achieves a mean +7 F1 gain over NV-Embed-v2, the embedding retriever baseline. Factual and sense-making tasks also show slight improvements.
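As a concrete reading of the retrieval metric, one simple formulation of Recall@k counts a query as a hit when any of its gold supporting passages appears in the top-k retrieved list (benchmarks vary in whether they require all supporting passages; this sketch uses the any-hit variant as an assumption):

```python
def recall_at_k(retrieved, gold, k=5):
    """Fraction of queries with a gold passage among the top-k retrieved.

    retrieved: list of ranked passage-id lists, one per query.
    gold: list of sets of supporting passage ids, one per query.
    """
    hits = sum(
        1 for ranked, support in zip(retrieved, gold)
        if any(p in support for p in ranked[:k])
    )
    return hits / len(gold)
```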

7. Comparative Evaluation and Limitations

Compared to state-of-the-art pure-embedding RAG (NV-Embed-v2 + LLM), HippoRAG 2 lifts multi-hop QA F1 (e.g., MuSiQue: 44.8 → 51.9) and Recall@5 (e.g., MuSiQue: 69.7% → 74.7%, 2Wiki: 76.5% → 90.4%). Structure-based approaches (RAPTOR, GraphRAG, LightRAG, HippoRAG) may improve associativity or sense-making, but generally reduce performance on simple QA by 5–10 F1, which HippoRAG 2 avoids. It also requires significantly fewer LLM tokens for indexing (e.g., 9M versus 115M for MuSiQue). Nevertheless, the LLM triple filter exhibits a ≈7% miss rate, and sparse seeds can limit PPR effectiveness.

8. Future Directions

Future work concentrates on:

  • Integrating episodic memory for extended dialogue contexts.
  • Automatic consolidation/pruning of memory over large document collections.
  • Dynamic graph adaptation reflecting ongoing conversation context.

These directions aim to further approximate human-like conversational memory and scalability in continual learning (GutiƩrrez et al., 20 Feb 2025).
