Principled A Priori Criteria for Dataset Size Selection in Long-CoT Supervised Fine-Tuning

Develop principled a priori criteria for selecting the dataset size used in supervised fine-tuning (SFT) of pretrained language models on long chain-of-thought demonstrations. The criteria should account for dependence on model and data properties, so that practitioners can decide, before running any experiments, how many unique samples to use and how many epochs to train.

Background

While the authors find that training token accuracy serves as a practical signal for when to stop increasing the number of epochs, they observe that the optimal dataset size varies with both the data and the model.

No established rule or criterion currently allows this choice to be made before training, leaving dataset-size selection an open practical challenge.
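The stopping signal described above can be sketched as a simple heuristic. The paper does not specify an exact rule, so the thresholds below (`saturation`, `min_delta`) and the plateau condition are illustrative assumptions, not the authors' criterion:

```python
def should_stop_epochs(token_accs, saturation=0.99, min_delta=0.002):
    """Decide whether to stop increasing epochs, given the training
    token accuracy observed at the end of each epoch so far.

    Assumed heuristic (not from the paper): stop once training token
    accuracy is near saturation, or once it improves by less than
    min_delta between consecutive epochs.
    """
    if not token_accs:
        return False
    # Near-saturation: further epochs are unlikely to change the model.
    if token_accs[-1] >= saturation:
        return True
    # Plateau: per-epoch improvement has become negligible.
    if len(token_accs) >= 2 and token_accs[-1] - token_accs[-2] < min_delta:
        return True
    return False
```

In a training loop, one would append the epoch's training token accuracy to a list and break out of the loop when `should_stop_epochs` returns `True`; the open question posed here is the complementary one of choosing the number of unique samples before such a loop ever runs.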

References

While training token accuracy provides a practical stopping signal for epoch scaling, the optimal dataset size is data- and model-dependent, and principled criteria for selecting it a priori remain elusive.

Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning  (2602.11149 - Kopiczko et al., 11 Feb 2026) in Conclusion