Robust and calibrated stop/continue criteria across retrievers, corpora, and LLM backbones
Develop stop/continue criteria for retrieval–reasoning procedures in multi-hop question answering that remain calibrated across different retrievers, corpora, and large language model (LLM) backbones, and evaluate these criteria under controlled variations of hop depth and retrieval noise to assess their reliability and transferability.
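One way to make "remains calibrated" measurable is to compute a calibration metric separately for each retriever/corpus/backbone combination and compare the values. The sketch below is illustrative only and is not taken from the cited paper. It assumes each stop decision comes with a confidence in [0, 1] and a binary label saying whether stopping at that point was correct (for example, the gathered evidence sufficed to answer). It uses equal-width binned expected calibration error (ECE) as the metric. The record layout, the condition labels, and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout (an assumption, not from the source):
# (condition label, stop confidence in [0, 1], 1 if stopping was correct else 0)
Record = Tuple[str, float, int]


def expected_calibration_error(
    confs: List[float], labels: List[int], n_bins: int = 10
) -> float:
    """Equal-width binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    if not confs:
        raise ValueError("need at least one prediction")
    bins = defaultdict(list)
    for c, y in zip(confs, labels):
        idx = min(int(c * n_bins), n_bins - 1)  # a confidence of 1.0 falls in the last bin
        bins[idx].append((c, y))
    total = len(confs)
    ece = 0.0
    for items in bins.values():
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(y for _, y in items) / len(items)
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece


def ece_by_condition(records: Iterable[Record], n_bins: int = 10) -> Dict[str, float]:
    """Group records by condition label and compute ECE within each group."""
    grouped = defaultdict(lambda: ([], []))
    for cond, conf, label in records:
        grouped[cond][0].append(conf)
        grouped[cond][1].append(label)
    return {
        cond: expected_calibration_error(c, y, n_bins)
        for cond, (c, y) in grouped.items()
    }


# Toy data with made-up condition names: the first condition is well calibrated,
# the second is overconfident.
records = (
    [("retriever-A|corpus-X|llm-1", 0.8, y) for y in (1, 1, 1, 1, 0)]
    + [("retriever-B|corpus-X|llm-1", 0.9, y) for y in (1, 0, 0, 0)]
)
scores = ece_by_condition(records)
```

A criterion that transfers well would show similar, low ECE across the groups. A large gap between groups, as in the toy data, suggests the criterion is calibrated for one configuration only. The same grouping could be keyed on hop depth or on an injected retrieval-noise level to run the controlled variations described above.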
References
An important open problem is to develop stop/continue criteria that remain calibrated across retrievers, corpora, and LLM backbones, and to evaluate them under controlled variations of hop depth and retrieval noise.
— Retrieval--Reasoning Processes for Multi-hop Question Answering: A Four-Axis Design Framework and Empirical Trends
(2601.00536 - Ji et al., 2 Jan 2026) in Section RQ4: Open Problems and Future Directions, Challenge 4