Performance of BAPO on larger-scale LLMs
Determine how Boundary-Aware Policy Optimization (BAPO) performs when applied to Large Language Models (LLMs) with more than 14B parameters in agentic search settings, and whether its reliability benefits persist at these larger model scales.
References
It remains to be seen how the proposed method performs on larger-scale LLMs.
— BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search
(2601.11037 - Liu et al., 16 Jan 2026) in Limitations