Extent to which open-weight, small/medium LLMs benefit from self-evolving reasoning

Determine the extent to which open-weight large language models of small and medium scale can benefit from self-evolving reasoning paradigms to extend their reasoning limits on hard tasks, particularly in settings where verification and refinement capabilities are weak or unstable.

Background

The paper contrasts strong verification–refinement pipelines used by leading proprietary models with the weaker and less reliable verification and refinement abilities commonly found in open-weight, smaller-scale models. This gap raises uncertainty about whether and how such models can leverage iterative self-evolution to improve performance on difficult reasoning tasks.

Motivated by this uncertainty, the authors propose Deep Self-Evolving Reasoning (DSER), which models iterative verification and refinement as a Markov chain over solution states and shows that convergence toward correct solutions can occur whenever the per-step probability of improvement marginally exceeds the probability of degradation. While DSER demonstrates empirical gains for an 8B-parameter model on AIME benchmarks, the broader question of how far open-weight small/medium models benefit across tasks and settings remains unresolved.
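The Markov-chain intuition can be checked with a toy simulation. The sketch below (function name and parameter values are illustrative assumptions, not the paper's code or reported numbers) treats each verify-refine step as a two-state chain: an incorrect solution becomes correct with probability p, and a correct one degrades back with probability q. If p exceeds q even slightly, the long-run probability of being in the correct state, p / (p + q), exceeds one half:

```python
import random

def simulate_self_evolving_chain(p_improve, p_degrade, steps, trials, seed=0):
    """Toy two-state Markov chain over solution correctness.

    States: 0 = incorrect, 1 = correct. Each iteration flips
    incorrect -> correct with prob p_improve, and correct -> incorrect
    with prob p_degrade. Returns the fraction of trials that end in
    the correct state after `steps` iterations.
    """
    rng = random.Random(seed)
    correct_final = 0
    for _ in range(trials):
        state = 0  # every trial starts from an incorrect draft
        for _ in range(steps):
            r = rng.random()
            if state == 0 and r < p_improve:
                state = 1
            elif state == 1 and r < p_degrade:
                state = 0
        correct_final += state
    return correct_final / trials

if __name__ == "__main__":
    # Improvement only marginally exceeds degradation (0.12 vs 0.10),
    # yet the stationary accuracy p / (p + q) = 0.12 / 0.22 ~ 0.55
    # is well above the per-step margin.
    acc = simulate_self_evolving_chain(0.12, 0.10, steps=500, trials=2000)
    print(acc)
```

The point of the sketch is only the qualitative claim from the paper: even weak, noisy verification and refinement can compound into reliable convergence, provided improvement is marginally more likely than degradation at each step.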

References

It is still unclear to what extent open-weight reasoning models, especially small and medium-sized ones with broader accessibility, can benefit from self-evolving paradigms and extend their reasoning limits.

Deep Self-Evolving Reasoning (2510.17498 - Liu et al., 20 Oct 2025) in Section 1 (Introduction)