Size-scaling of training stability and performance gains from learnable multipliers

Determine how the training stability and performance improvements produced by learnable multipliers scale with model size in the large-language-model regime.

Background

While learnable multipliers improved performance under both the Adam and Muon optimizers at the tested scale, it remains unknown how these benefits and their stability characteristics change as model width and depth grow, especially over the long training runs typical of LLM pretraining.
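
To make the object of study concrete, here is a minimal sketch of a matrix layer with a learnable output multiplier; this is an assumed, illustrative parameterization (the `ScaledLinear` name, the single scalar per layer, and its initialization are not taken from the paper), but it shows the kind of module whose scaling behavior the question targets.

```python
# Illustrative sketch only: a linear layer whose output is rescaled by a
# learnable scalar, decoupling the effective scale of the weight matrix from
# the optimizer's implicit scale constraints. Not the paper's exact method.
import torch
import torch.nn as nn


class ScaledLinear(nn.Module):
    """Linear layer with a learnable output multiplier (hypothetical example)."""

    def __init__(self, in_features: int, out_features: int, init_scale: float = 1.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        # One learnable scalar per layer, initialized so training starts at the
        # same operating point as a plain nn.Linear.
        self.multiplier = nn.Parameter(torch.tensor(init_scale))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale the matrix layer's output by the learned multiplier.
        return self.multiplier * self.linear(x)


if __name__ == "__main__":
    layer = ScaledLinear(512, 512)
    x = torch.randn(4, 512)
    y = layer(x)
    print(y.shape, layer.multiplier.item())
```

A size-scaling study would sweep model width and depth (and training length) while tracking both the learned multiplier values and training stability metrics, comparing against baselines without multipliers under Adam and Muon.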

References

Yet, many questions are left open. How the training stability and performance improvements of LRMs scale with model size is another practical question.

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers (2601.04890 - Velikanov et al., 8 Jan 2026) in Section 6: Conclusion and discussion