Empirical indicator of gradient-noise level governing scale adaptation
Identify an empirically measurable indicator of gradient noise level that predicts whether a given parameter tensor (e.g., a matrix vs. a vector) will exhibit scale-adaptation ability under standard pretraining with weight decay.
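To make "empirically measurable" concrete, the sketch below computes one candidate quantity: a per-tensor noise-to-signal ratio estimated from several minibatch gradients of the same tensor at fixed weights. This is an illustrative assumption, not an indicator proposed or validated by the cited paper. The function name `gradient_noise_ratio`, the flattened-list input format, and the `eps` clamp are all hypothetical choices. Whether such a ratio actually separates matrix from scalar/vector behavior is exactly what remains open.

```python
def gradient_noise_ratio(minibatch_grads, eps=1e-12):
    """Hypothetical per-tensor indicator: gradient noise variance / squared mean-gradient norm.

    minibatch_grads: list of K >= 2 gradients for ONE parameter tensor, each
    flattened to a list of floats and computed on a different minibatch at the
    same weights (an assumed measurement protocol, not taken from the paper).
    """
    k = len(minibatch_grads)
    if k < 2:
        raise ValueError("need at least two minibatch gradients")
    dim = len(minibatch_grads[0])

    # Sample mean of the gradient, coordinate by coordinate.
    mean = [sum(g[i] for g in minibatch_grads) / k for i in range(dim)]

    # Unbiased estimate of the trace of the per-minibatch gradient covariance.
    noise_var = sum(
        (g[i] - mean[i]) ** 2 for g in minibatch_grads for i in range(dim)
    ) / (k - 1)

    # ||mean||^2 overestimates the true squared gradient norm by noise_var / k
    # in expectation, so subtract that bias; clamp to keep the ratio finite.
    signal = sum(m * m for m in mean) - noise_var / k
    return noise_var / max(signal, eps)


# Usage: compare the ratio for different tensors at the same checkpoint.
ratio = gradient_noise_ratio([[3.0, 0.0], [1.0, 0.0]])
```

A large ratio means the minibatch noise dominates the mean gradient for that tensor. Because the ratio depends on batch size and training step, comparisons between tensors are only meaningful at matched settings.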
References
Yet, many questions are left open. Hence, an interesting direction for future work is to mechanistically understand the difference between matrix and scalar/vector dynamics, find an empirically measurable indicator of the noise level, or build a minimal mathematical model exhibiting both training regimes.
— Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
(2601.04890 - Velikanov et al., 8 Jan 2026) in Section 6: Conclusion and discussion