Downstream validation of predicted DCLM mix ratio sweet spot in OLMo-2 mid-training

Use downstream evaluation to test the prediction, derived from relative critical sharpness analysis, that a DCLM (pre-training data) ratio of about 0.6 in the OLMo-2 mid-training mixture best balances task specialization against retention of general capabilities.

Background

Using relative critical sharpness, the authors identify a predicted sweet spot near a 0.6 DCLM ratio for the OLMo-2 mid-training mix, close to the ratio used in the original training. This prediction is based on curvature analysis across tasks within the Dolmino mix.

They explicitly defer empirical confirmation of this prediction to future work, indicating the need for comprehensive downstream evaluations to test whether the proposed ratio truly optimizes the trade-off between task specialization and retention of general capabilities.
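One way such a validation could be organized is sketched below. It is a minimal illustration, not the authors' protocol: the function name `best_ratio`, the weighted-sum criterion, and the assumption that specialization and retention are each summarized by a single higher-is-better score per ratio are all hypothetical choices made here. The scores must come from the reader's own downstream evaluations of models mid-trained at each DCLM ratio.

```python
from typing import Dict, Tuple


def best_ratio(
    scores: Dict[float, Tuple[float, float]],
    weight: float = 0.5,
) -> float:
    """Return the DCLM ratio with the highest weighted combined score.

    `scores` maps a DCLM ratio to (specialization_score, retention_score),
    both measured by the reader on their own benchmarks, higher is better.
    `weight` is the share given to specialization; the remainder goes to
    retention. The weighted sum is an assumption of this sketch, not a
    criterion taken from the paper.
    """
    if not scores:
        raise ValueError("scores must contain at least one ratio")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    def combined(ratio: float) -> float:
        specialization, retention = scores[ratio]
        return weight * specialization + (1.0 - weight) * retention

    # max over dict keys, ranked by the combined score of each ratio
    return max(scores, key=combined)
```

The prediction would be supported if the returned ratio lies near 0.6 for a reasonable range of `weight` values, and weakened if the optimum moves far from 0.6 or depends strongly on the weighting.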

References

We leave the validation of this prediction through downstream evaluation to future work.

A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs (2601.16979 - Kalra et al., 23 Jan 2026) in Section 4 (How much Pre-training data is needed to avoid Catastrophic forgetting?), end of section; referencing Appendix: relative_sharpness_olmo_midtraining