Effectiveness of retriever-aware training for deep research agents

Determine whether retriever-aware training of deep research agents (e.g., DR Tulu) improves end-to-end performance on the short-form and open-ended tasks of the SAGE scientific literature retrieval benchmark. Here, retriever-aware training means instruction-tuning or otherwise aligning the agent so that it adapts its sub-query generation strategy to the specific underlying retriever (such as BM25, gte-Qwen2-7B-instruct, or ReasonIR).
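
A minimal sketch of the comparison this question calls for, assuming hypothetical `run_agent` and `score_sage` harness functions (not the paper's actual evaluation code): run the same agent with and without retriever-aware tuning against each retriever, and score both SAGE task types.

```python
# Hedged sketch of the experimental grid; `run_agent` and `score_sage`
# are hypothetical placeholders, not the SAGE paper's harness.
RETRIEVERS = ["bm25", "gte-qwen2-7b-instruct", "reasonir"]
TASKS = ["short_form", "open_ended"]

def compare(run_agent, score_sage):
    """Score baseline vs. retriever-aware agents on every (retriever, task) cell."""
    results = {}
    for retriever in RETRIEVERS:
        for task in TASKS:
            for variant in ("baseline", "retriever_aware"):
                outputs = run_agent(variant=variant, retriever=retriever, task=task)
                results[(retriever, task, variant)] = score_sage(outputs, task=task)
    return results
```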

Background

The paper finds that BM25 substantially outperforms LLM-based retrievers within deep research agent workflows because current agents tend to generate keyword-oriented sub‑queries, leading to a query–retriever mismatch that limits the benefits of semantic retrievers.
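
To make the mismatch concrete, the sketch below conditions sub-query phrasing on the retriever: keyword bags for BM25, natural-language questions for instruction-tuned dense retrievers. The hint texts and prompt template are illustrative assumptions, not prompts from the paper.

```python
# Retriever-conditioned sub-query prompting (illustrative; the hints
# below are assumptions, not the paper's prompts).
RETRIEVER_HINTS = {
    "bm25": (
        "Write each sub-query as a short bag of discriminative keywords; "
        "BM25 matches on lexical terms, so drop filler words."
    ),
    "gte-qwen2-7b-instruct": (
        "Write each sub-query as a complete natural-language question; "
        "this dense retriever is instruction-tuned on such inputs."
    ),
    "reasonir": (
        "Write each sub-query as an explicit information need, stating "
        "what evidence would resolve it."
    ),
}

def make_subquery_prompt(research_question: str, retriever: str) -> str:
    """Build a sub-query generation prompt tailored to the retriever in use."""
    return (
        f"Research question: {research_question}\n"
        f"Retriever: {retriever}. {RETRIEVER_HINTS[retriever]}\n"
        "Generate three sub-queries, one per line."
    )
```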

The authors note that they did not instruction-tune or otherwise align open-source deep research agents to make their query generation strategies aware of the underlying retriever type. As a result, it remains an open question whether training agents to adapt their sub-queries to the retriever (retriever-aware training) would mitigate the mismatch and improve both retrieval quality and end-to-end performance on SAGE.
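
One plausible way to construct such training data, sketched under loud assumptions (`retrieve` and the prompt format are hypothetical, and selecting targets by retrieval recall is one option among many): for each question, keep whichever candidate sub-query phrasing retrieves the gold documents best under the target retriever, and train the agent on that (retriever-tagged prompt, sub-query) pair.

```python
from typing import Callable

def build_training_example(
    question: str,
    candidate_subqueries: list[str],   # e.g. keyword-style vs. natural-language variants
    gold_doc_ids: set[str],
    retriever_name: str,
    retrieve: Callable[[str, str, int], list[str]],  # (query, retriever, k) -> doc ids; hypothetical
    k: int = 20,
) -> dict:
    """Pick the sub-query phrasing with the best recall@k under this retriever."""
    def recall_at_k(subquery: str) -> float:
        hits = set(retrieve(subquery, retriever_name, k))
        return len(hits & gold_doc_ids) / max(len(gold_doc_ids), 1)

    best = max(candidate_subqueries, key=recall_at_k)
    # The retriever tag in the prompt is what makes the tuning "retriever-aware":
    # the agent learns to emit different phrasings for different tags.
    return {
        "prompt": f"[retriever: {retriever_name}] {question}",
        "target": best,
    }
```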

References

"As a result, we are unable to assess whether training agents to adapt their query generation strategies based on the underlying retriever type could improve performance."

SAGE: Benchmarking and Improving Retrieval for Deep Research Agents (Hu et al., arXiv:2602.05975, 5 Feb 2026), Section: Limitations and Future Work.