Scaling behavior of OPSD beyond 8B-parameter models
Determine whether the improvements from On-Policy Self-Distillation (OPSD) observed with increasing model size persist beyond 8 billion parameters, specifically for models of approximately 70 billion parameters and for larger frontier models, when OPSD is applied to reasoning tasks.
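The paper's exact training procedure is not reproduced here. As a point of reference, the sketch below illustrates a generic on-policy self-distillation step in PyTorch: the student samples its own rollout, and a teacher's token distribution is distilled onto the student over that self-generated span. The model name, the frozen-copy teacher, and the forward-KL loss are illustrative assumptions, not the paper's specification.

```python
# A minimal sketch of one on-policy self-distillation step (generic
# illustration; OPSD's teacher construction and loss may differ).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # hypothetical choice for a scale study
tokenizer = AutoTokenizer.from_pretrained(model_name)
student = AutoModelForCausalLM.from_pretrained(model_name)
# Assumption: the teacher is a frozen copy of the student. In practice the
# teacher must differ (e.g., conditioned on extra context, or an earlier /
# EMA checkpoint), otherwise the KL below is trivially zero at initialization.
teacher = AutoModelForCausalLM.from_pretrained(model_name)
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-6)

def opsd_step(prompt: str, max_new_tokens: int = 128, temperature: float = 1.0):
    """One step: sample on-policy from the student, then minimize
    KL(teacher || student) over the self-generated tokens."""
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs.input_ids.shape[1]

    # 1) On-policy rollout: the student generates its own continuation.
    with torch.no_grad():
        rollout = student.generate(
            **inputs,
            do_sample=True,
            temperature=temperature,
            max_new_tokens=max_new_tokens,
        )

    # 2) Score the rollout with both models (no gradients for the teacher).
    #    Logits at position i predict token i+1, so the logits that score
    #    the generated span live at positions [prompt_len - 1, L - 2].
    student_logits = student(rollout).logits[:, prompt_len - 1 : -1]
    with torch.no_grad():
        teacher_logits = teacher(rollout).logits[:, prompt_len - 1 : -1]

    # 3) Token-level forward KL from teacher to student.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.log_softmax(teacher_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Sampling the rollout from the student (rather than from a fixed dataset) is what makes the distillation on-policy: the loss is computed on the distribution of sequences the student actually produces, which is the property whose benefit at 70B-plus scale the open question concerns.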
References
While we observe that larger models benefit more from OPSD, consistent with our hypothesis that self-rationalization requires sufficient model capacity, it remains an open question whether this trend continues at scales beyond 8B parameters, such as 70B or larger frontier models.
— Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
(arXiv:2601.18734, Zhao et al., 26 Jan 2026), Section 7: Limitations and Future Directions