Depth behavior of per-step success probability in the Diligent Learner framework
Determine whether, as multi-step reasoning proceeds over longer horizons, the Diligent Learner stepwise success probability γ—defined as the probability that the generator policy proposes a next-step extension that keeps the current reasoning prefix completable—remains bounded below by a positive constant independent of depth, or whether there exist categories of problems for which γ degrades catastrophically with increasing depth.
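To make the two alternatives concrete, here is a minimal sketch of why the distinction matters. It is an illustrative toy model, not the analysis in the cited paper. It assumes that each step is retried independently until a completable extension is proposed, and that a perfect validator recognizes success, so the expected number of proposals at depth d is 1/γ_d. The names `expected_proposals`, `constant_gamma`, and `decaying_gamma`, the value 0.5, and the geometric decay schedule are all hypothetical choices made for illustration.

```python
from typing import Callable


def expected_proposals(gamma: Callable[[int], float], depth: int) -> float:
    """Expected total proposals needed to complete `depth` steps.

    Toy assumption: each step is retried independently until it succeeds,
    so the wait at depth d is geometric with mean 1 / gamma(d).
    """
    total = 0.0
    for d in range(1, depth + 1):
        g = gamma(d)
        if not 0.0 < g <= 1.0:
            raise ValueError(f"gamma({d}) = {g} is not a probability in (0, 1]")
        total += 1.0 / g
    return total


def constant_gamma(d: int) -> float:
    # Hypothetical case 1: gamma is bounded below by a depth-independent constant.
    return 0.5


def decaying_gamma(d: int) -> float:
    # Hypothetical case 2: gamma halves with every additional step of depth.
    return 0.5 ** d


if __name__ == "__main__":
    for depth in (5, 10, 20):
        bounded = expected_proposals(constant_gamma, depth)
        degraded = expected_proposals(decaying_gamma, depth)
        print(f"depth={depth:2d}  bounded={bounded:.0f}  degrading={degraded:.0f}")
```

Under these assumptions, the cost grows linearly with depth when γ is bounded below (20 expected proposals at depth 10) and exponentially when γ decays geometrically (2046 expected proposals at depth 10). Which regime real problem categories fall into is exactly what the question leaves open.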
References
However, a central question remains unresolved; as reasoning unfolds over longer horizons on tasks does the stepwise success probability γ always remain larger than a positive constant, or are there categories of problems for which it catastrophically degrades with depth?
— Tool Building as a Path to "Superintelligence"
(2602.21061 - Koplow et al., 24 Feb 2026) in Section 1, Introduction