
Multi-hop Dense Retrieval

Updated 22 January 2026
  • Multi-hop Dense Retrieval (MDR) is a paradigm that iteratively refines queries by integrating evidence from multiple passages to build coherent document chains.
  • It leverages dual-encoder architectures, contrastive learning, and beam search to efficiently navigate large-scale corpora for complex multi-step reasoning.
  • Empirical results demonstrate MDR's state-of-the-art performance in multi-hop QA, fact verification, and relation extraction while highlighting challenges like error propagation and scalability.

Multi-hop Dense Retrieval (MDR) is a retrieval paradigm for answering complex information-seeking queries requiring reasoning over multiple discrete passages in a large corpus. MDR extends standard dense retrieval architectures by enabling iterative retrieval: at each hop, the system dynamically refines the query by integrating newly acquired evidence, thereby constructing a chain—or latent graph—of documents that collectively provide the necessary information to resolve the initial query. Research on MDR centers on neural methods—primarily dual-encoder models—optimized through multi-hop adaptations of contrastive learning, as well as decompositional and end-to-end strategies that address both efficiency and generalization challenges in open-domain multi-hop question answering (QA) and complex reasoning tasks.

1. Formal Definition and Core Principles

MDR models the evidence retrieval process as a sequence of dependent retrieval steps. The retrieval model seeks a chain of documents $[d_1, d_2, \ldots, d_L]$ such that each $d_t$ is selected from a corpus $\mathcal{D}$ conditioned on the question $q$ and the evidence $d_1, \ldots, d_{t-1}$ collected in previous hops. At each hop, the retriever forms a context-augmented query $q_t$ by composition—typically by concatenation or embedding-based fusion—of the question and previously retrieved evidence: $q_t = \text{Compose}(q, d_1, \ldots, d_{t-1})$. Retrieval operates as maximum inner product search (MIPS) in a dense embedding space. Formally, at each hop,

$$d_t = \arg\max_{d \in \mathcal{D}} \langle E_Q(q_t), E_D(d) \rangle$$

where $E_Q$ and $E_D$ are neural encoders for queries and documents, and $\langle \cdot, \cdot \rangle$ denotes the inner product or, in some variants, cosine similarity (Xiong et al., 2020, Sidiropoulos et al., 2021, Seonwoo et al., 2021).

Score aggregation across hops is typically additive or multiplicative, and candidate reasoning chains are expanded in a beam search or scored jointly over passage sequences. The ability to recursively update the retrieval query as a function of prior evidence is fundamental to MDR’s effectiveness in scenarios where no single passage directly answers the question.
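As an illustration, the greedy (beam size 1) variant of this loop can be sketched in a few lines of Python. The encoder below is a toy deterministic stand-in for trained $E_Q$/$E_D$ models, and the corpus and query are invented for the example:

```python
# Minimal sketch of greedy multi-hop dense retrieval.
# The "encoder" is an illustrative stand-in, not a trained model.

def encode(text, dim=8):
    """Toy deterministic text -> dense vector encoder (stand-in for E_Q / E_D)."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 100.0
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]

def inner(u, v):
    """Inner product <u, v> used as the retrieval score."""
    return sum(a * b for a, b in zip(u, v))

def retrieve_chain(question, corpus, hops=2):
    """Greedy MDR: at each hop, compose the query with prior evidence
    and take the argmax passage under inner-product search."""
    chain, query = [], question
    for _ in range(hops):
        q_vec = encode(query)
        best = max((d for d in corpus if d not in chain),
                   key=lambda d: inner(q_vec, encode(d)))
        chain.append(best)
        query = query + " " + best  # Compose(q, d_1, ..., d_{t-1}) by concatenation
    return chain

corpus = ["alpha passage about x", "bridge passage linking x to y", "final passage about y"]
print(retrieve_chain("what is y given x?", corpus, hops=2))
```

In a real system the argmax over $\mathcal{D}$ is replaced by approximate nearest-neighbor search over a precomputed passage index, and beam search replaces the single greedy chain.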

2. Methodological Taxonomy and Modeling Choices

2.1 Dual-Encoder Architectures

Most MDR systems employ dual-encoder (bi-encoder) architectures for efficient large-scale retrieval (Xiong et al., 2020, Sidiropoulos et al., 2021, Zhao et al., 2021, Seonwoo et al., 2021). For each text input $x$, the encoder produces a dense vector $v_x$ (e.g., via the [CLS] token of a Transformer, projected to dimension $d$). Training follows hard-negative contrastive learning:

$$\mathcal{L}_{\text{CL}} = -\log \frac{\exp(\operatorname{sim}(v_{q_t}, v_{d_t^+})/\tau)}{\sum_j \exp(\operatorname{sim}(v_{q_t}, v_{d_j})/\tau)}$$

with in-batch and/or mined negatives.
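The contrastive (InfoNCE-style) objective for a single query can be computed directly from the similarity scores. The vectors below are tiny invented examples, not encoder outputs:

```python
import math

def info_nce(q_vec, pos_vec, neg_vecs, tau=0.05):
    """InfoNCE-style contrastive loss for one query: negative log-softmax
    score of the positive passage against the negatives."""
    def sim(u, v):  # inner-product similarity
        return sum(a * b for a, b in zip(u, v))
    logits = [sim(q_vec, pos_vec) / tau] + [sim(q_vec, n) / tau for n in neg_vecs]
    m = max(logits)  # subtract the max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

q    = [1.0, 0.0]
pos  = [0.9, 0.1]
negs = [[0.0, 1.0], [-1.0, 0.0]]
print(info_nce(q, pos, negs))  # small loss: the positive dominates the softmax
```

A hard negative with similarity close to (or above) the positive's drives the loss up, which is why mined hard negatives are a stronger training signal than random in-batch ones.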

2.2 Query Update Mechanisms

Critical to MDR is query composition:

  • Simple concatenation: $q_t = [q; d_1; \ldots; d_{t-1}]$ (input-level) (Xiong et al., 2020, Seonwoo et al., 2021).
  • Embedding fusion: $z_q^{(t)} = \text{MLP}([z_q^{(t-1)}; z_d])$ or $z_q^{(t)} = z_q^{(t-1)} + \alpha z_d$ (Zhao et al., 2021).
  • Attention-based context fusion: $\mathbf{D}_{\text{agg}} = \sum_i \alpha_i d_i$ with $\alpha_i$ via softmax over similarity (Huang et al., 19 Jun 2025).
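The first two mechanisms above can be sketched directly; the strings, vectors, and the mixing weight `alpha` are illustrative stand-ins for learned components:

```python
# Sketch of two query-update mechanisms: input-level concatenation
# and additive embedding fusion. Values here are toy examples.

def concat_compose(question, docs):
    """Input-level composition: q_t = [q; d_1; ...; d_{t-1}]."""
    return " ".join([question] + docs)

def additive_fusion(z_q, z_d, alpha=0.5):
    """Embedding-level fusion: z_q^(t) = z_q^(t-1) + alpha * z_d."""
    return [q + alpha * d for q, d in zip(z_q, z_d)]

print(concat_compose("who wrote X?", ["X was written by Y."]))
print(additive_fusion([1.0, 0.0], [0.0, 1.0]))  # [1.0, 0.5]
```

Concatenation re-encodes the full text each hop (costlier but context-rich), while embedding fusion updates a cached query vector without re-running the encoder over the growing context.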

2.3 Beam Search over Reasoning Chains

Beam search expands multiple candidate reasoning chains, allowing the model to recover from early retrieval errors and explore alternative multi-hop paths. Each beam element holds a query state, accumulated score, and path: for t = 1 to H, for each (path, q, score) in the beam, retrieve the top-M passages, expand, and prune to the top-K beams (2104.05883). This contrasts with greedy iterative retrieval, which follows a single chain (Xiong et al., 2020).
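A beam-search loop of this shape can be sketched as follows. The corpus, the word-overlap scoring function, and the hyperparameters are toy stand-ins for a dense retriever:

```python
import heapq

def beam_search_retrieve(question, corpus, score_fn, hops=2, beam_size=2, top_m=3):
    """Beam search over multi-hop chains. Each beam entry is
    (cumulative_score, path); scores aggregate additively across hops."""
    beams = [(0.0, [])]
    for _ in range(hops):
        candidates = []
        for score, path in beams:
            query = " ".join([question] + path)  # concatenation-based compose
            # score every unused passage, keep the top-M expansions
            scored = sorted(((score_fn(query, d), d) for d in corpus if d not in path),
                            reverse=True)[:top_m]
            for s, d in scored:
                candidates.append((score + s, path + [d]))
        beams = heapq.nlargest(beam_size, candidates)  # prune to top-K beams
    return beams

def overlap_score(query, doc):
    """Toy relevance: word overlap (stand-in for a dense inner product)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

corpus = ["a b c", "b c d", "d e f"]
beams = beam_search_retrieve("a b", corpus, overlap_score)
print(beams[0])  # highest-scoring two-hop chain
```

Keeping K partial chains alive means a passage that scores second-best at hop 1 can still head the winning chain after hop 2, which is exactly the recovery behavior greedy retrieval lacks.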

2.4 Decomposition-Free versus Decomposition-Based

Decomposition-free MDR directly builds multi-hop evidence chains by recursive query updating and retrieval. Decomposition-based approaches, by contrast, rely on rule-based or generatively-produced sub-questions and retrieve each hop conditioned on sub-question outputs, often via external LLMs or complex planner–executor structures (Liu et al., 22 Aug 2025).

3. Supervision, Optimization, and Weak Labeling

3.1 Contrastive and Mixed Objectives

While InfoNCE-style contrastive loss is standard, several variants integrate additional objectives, e.g., combining the contrastive retrieval loss with verification-classification or relevance objectives in joint training (Bai et al., 2024).

3.2 Weak Supervision and Synthetic Data

Human annotation for multi-hop retrieval is exceptionally costly due to the combinatorial space of hop sequences. LOUVRE mitigates this via large-scale pseudo-labels generated by bridge-entity rephrasing on Wikipedia hyperlinks, facilitating massive weakly-supervised pretraining (Seonwoo et al., 2021).

3.3 Posterior Regularization and Distillation

MoPo regularizes the prior (inference) retriever toward a posterior (oracle) model at each hop using a KL divergence term, updating the posterior model by an exponential moving average of the prior parameters to stabilize distillation (Xia et al., 2024). This addresses the instability and performance gap when transferring strong posterior information (e.g., answer-aware queries) to real-world inference.
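The exponential-moving-average step used to keep the posterior model stable can be sketched as follows; the parameter vectors and decay value are illustrative, not taken from the paper:

```python
def ema_update(posterior_params, prior_params, decay=0.99):
    """Exponential-moving-average update of posterior parameters toward
    the prior, used to stabilize prior -> posterior distillation."""
    return [decay * post + (1.0 - decay) * pri
            for post, pri in zip(posterior_params, prior_params)]

post = [1.0, 0.0]
pri  = [0.0, 1.0]
print(ema_update(post, pri))  # posterior drifts slowly toward the prior
```

With a decay near 1, the posterior changes only slightly each step, so the KL target the prior retriever is regularized toward does not move abruptly during training.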

3.4 Label-Free Retriever Training with LLM Signals

ReSCORE dispenses with explicit hop-level labels, instead extracting per-passage pseudo-labels from LLM token probabilities that jointly reflect both relevance (p(q|d)) and answer consistency (p(a|q,d)), and updating the retriever to optimize KL(Q_LLM‖P_θ) in an iterative RAG loop (Lee et al., 27 May 2025). This increases adaptability in domains where annotations are scarce.
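The KL objective between an LLM-derived pseudo-label distribution and the retriever's distribution over passages can be computed directly; the two distributions below are invented for illustration:

```python
import math

def kl_divergence(q_dist, p_dist, eps=1e-12):
    """KL(Q || P) between an LLM-derived pseudo-label distribution Q over
    passages and the retriever's distribution P (the training signal)."""
    return sum(q * math.log((q + eps) / (p + eps))
               for q, p in zip(q_dist, p_dist) if q > 0)

q_llm    = [0.7, 0.2, 0.1]  # hypothetical LLM relevance/consistency weights
p_retrvr = [0.5, 0.3, 0.2]  # hypothetical retriever softmax scores
print(kl_divergence(q_llm, p_retrvr))  # > 0 since the distributions differ
```

Minimizing this term pushes the retriever's passage distribution toward the LLM's judgment of which passages are both relevant and answer-consistent, without any human hop-level labels.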

4. Architectural and Efficiency Advances

4.1 Condensed and Focused Retrieval

Baleen addresses exponential context growth by condensing each hop’s retrieved evidence into a compact set of relevant sentences, using a two-stage condenser coupled with a focused late-interaction retriever (FLIPR) for efficient multi-hop expansion without loss of retrieval accuracy (Khattab et al., 2021). FLIPR applies token-level max-similarity between query and passage representations, focusing on subsets to better match disjoint multi-hop needs.
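The token-level max-similarity (MaxSim) scoring that late-interaction retrievers like FLIPR build on can be sketched with toy token embeddings; real systems use contextualized Transformer outputs:

```python
def maxsim_score(query_tokens, passage_tokens):
    """Late-interaction scoring: for each query token embedding, take its
    max similarity over passage token embeddings, then sum over the query."""
    def sim(u, v):
        return sum(a * b for a, b in zip(u, v))
    return sum(max(sim(q, d) for d in passage_tokens) for q in query_tokens)

# Toy 2-d token embeddings (stand-ins for contextualized encoder outputs)
query   = [[1.0, 0.0], [0.0, 1.0]]
passage = [[0.9, 0.1], [0.2, 0.8]]
print(maxsim_score(query, passage))
```

Because each query token matches its own best passage token, disjoint parts of a multi-hop query can be satisfied by different regions of the passage, which single-vector bi-encoders cannot express.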

4.2 Decomposition-Free Generative Models

GRITHopper integrates dense retrieval training and causal language modeling on a unified decoder architecture, employing “retrieve–evaluate–act” instructions, post-retrieval final answer conditioning, and reward modeling. Unlike pure bi-encoders, this approach supports end-to-end differentiation and OOD generalization to longer reasoning chains (Erker et al., 10 Mar 2025).

4.3 Planner–Executor and Reinforcement Learning

OPERA explicitly models question decomposition, retrieval, and rewriting with specialized agent modules—Goal Planning, Analysis-Answer, Rewrite—trained via MAPGRPO. This orchestrates sub-goal formation and execution with dense retrieval tightly coupled to fine-grained reasoning rewards, allowing robust multi-hop search even for out-of-template queries (Liu et al., 22 Aug 2025).

5. Empirical Results and Benchmarking

MDR establishes state-of-the-art retrieval and QA performance across multiple open-domain QA and fact verification benchmarks.

Empirical ablations reveal beam size, negative mining, and query composition are critical. Weakly-supervised or LLM-pseudo labeling enables MDR to scale efficiently without human hop-level annotation (Seonwoo et al., 2021, Lee et al., 27 May 2025).

6. Scalability, Efficiency, and Limitations

The principal efficiency innovation of MDR is the use of dual-encoder architectures with offline passage indexing and sub-linear ANN search (e.g., FAISS HNSW), resulting in inference times orders of magnitude lower than rerank- or cross-attention-based pipelines (Xiong et al., 2020, Sidiropoulos et al., 2021, Khattab et al., 2021). Condensation techniques and compact context fusion further mitigate the context explosion as the number of hops increases (Khattab et al., 2021, Erker et al., 10 Mar 2025).

However, MDR’s susceptibility to error propagation (early-hop misretrieval), limited ability to self-correct, and challenges in scaling query composition to $T \gg 2$ hops remain active research areas. Non-differentiable or multi-stage decompositional strategies can induce computational overhead and break end-to-end learning (Erker et al., 10 Mar 2025, Liu et al., 22 Aug 2025). Integration of sparse (BM25) and dense signals remains a promising direction for robust and resource-constrained deployment (Sidiropoulos et al., 2021).

7. Future Directions and Open Challenges

  • Dynamic hop counting, stopping, and backtracking: Rather than fixing the number of hops, future MDR should dynamically determine when to stop or revisit earlier steps (Erker et al., 10 Mar 2025).
  • Hybrid lexical–dense architectures: New work targets seamless fusion of lexical matching and dense semantic retrieval in a single network (Sidiropoulos et al., 2021, Bai et al., 2024).
  • Multi-task and mixed-objective learning: Incorporating verification classification, relevance, and contrastive losses in joint training yields further gains (Bai et al., 2024).
  • Reinforcement and instruction-based approaches: Architectures that couple high-level planning, reasoning, and retrieval via policy optimization or generative instructions are gaining traction (Liu et al., 22 Aug 2025).
  • Label-free and weak supervision: Exploiting LLMs for pseudo-labels or synthetic data generation expands MDR applicability to domains with scarce annotation (Seonwoo et al., 2021, Lee et al., 27 May 2025).
  • Plug-and-play compositional retrievers: Robustness to distributional shift and higher-hop reasoning demands decomposition-free models with strong generalization (Erker et al., 10 Mar 2025).

MDR is a rapidly developing area at the intersection of representation learning, open-domain retrieval, and complex multi-step reasoning, with a rich set of methods and empirical validation spanning question answering, fact verification, and cross-document relation extraction.
