Downstream validation of predicted DCLM mix ratio sweet spot in OLMo-2 mid-training
Validate, through downstream evaluation, the prediction from relative critical sharpness analysis that a DCLM (pre-training data) ratio of about 0.6 in the OLMo-2 mid-training data mixture best balances specialization and retention.
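One way such a validation could be scored is sketched below. It is a minimal illustration, not the paper's procedure. It assumes a sweep in which each DCLM ratio has been evaluated on one specialization benchmark and one retention benchmark, with both scores scaled to the range 0 to 1. The function names, the input layout, and the use of a harmonic mean as the balance criterion are all assumptions introduced here.

```python
from typing import Dict, Tuple


def balanced_score(specialization: float, retention: float) -> float:
    """Harmonic mean of two scores; it is low whenever either score is low.

    Assumption: both scores are already normalized to the range 0..1.
    """
    if specialization <= 0 or retention <= 0:
        return 0.0
    return 2 * specialization * retention / (specialization + retention)


def best_mix_ratio(results: Dict[float, Tuple[float, float]]) -> float:
    """Return the DCLM ratio with the highest balanced score.

    `results` maps a DCLM ratio to a (specialization, retention) pair
    measured downstream. This layout is hypothetical.
    """
    if not results:
        raise ValueError("results must contain at least one ratio")
    return max(results, key=lambda ratio: balanced_score(*results[ratio]))
```

Under these assumptions, the prediction would be supported if `best_mix_ratio` applied to measured sweep results returned a ratio near 0.6. A different balance criterion, such as a weighted sum, could shift the selected ratio, so the criterion should be fixed before the sweep is run.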
References
We leave the validation of this prediction through downstream evaluation to future work.
— A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs
(2601.16979 - Kalra et al., 23 Jan 2026) in Section 4 (How much Pre-training data is needed to avoid Catastrophic forgetting?), end of section; referencing Appendix: relative_sharpness_olmo_midtraining