Empirical indicator of gradient-noise level governing scale adaptation

Identify an empirically measurable indicator of gradient noise level that predicts whether a given parameter tensor (e.g., a matrix vs. a vector) will exhibit scale-adaptation ability under standard pretraining with weight decay.

Background

The authors hypothesize that weight gradients span a continuous spectrum of signal-to-noise ratios, and that a tensor's position on this spectrum may determine whether it can adapt its scale. An empirical indicator would make it possible to diagnose when reparameterization with learnable multipliers is beneficial and when matrix weights may escape the equilibrium between gradient noise and weight decay (WD).
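To make "signal-to-noise ratio of a weight gradient" concrete, here is a minimal sketch of one candidate measurement: a per-tensor ratio between the squared norm of the mean microbatch gradient and the total variance across microbatches. This is an illustrative assumption, not an indicator proposed by the authors. The function name `gradient_snr` and its input layout are hypothetical, and whether such a quantity actually predicts scale adaptation is exactly what the problem leaves open.

```python
import numpy as np

def gradient_snr(per_batch_grads):
    """Candidate per-tensor gradient signal-to-noise ratio (hypothetical helper).

    per_batch_grads: array of shape (n_batches, ...) holding the gradient of
    one parameter tensor for each of n_batches >= 2 microbatches.
    """
    g = np.asarray(per_batch_grads, dtype=np.float64)
    n = g.shape[0]
    if n < 2:
        raise ValueError("need at least two microbatch gradients")
    g = g.reshape(n, -1)  # flatten each gradient to a vector
    mean = g.mean(axis=0)
    # Trace of the unbiased sample covariance: total noise variance per sample.
    noise = ((g - mean) ** 2).sum() / (n - 1)
    # ||mean||^2 overestimates the squared signal by noise / n; remove that bias
    # and clip at zero, since the corrected estimate can go negative.
    signal = max(float(mean @ mean) - noise / n, 0.0)
    return signal / noise if noise > 0 else float("inf")
```

In a training run, one could log this value separately for matrix and vector parameters at the same steps and compare the two groups. Gathering the per-microbatch gradients is left to the training framework and is not shown here.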

References

Yet, many questions are left open. Hence, an interesting direction for future work is to mechanistically understand the difference between matrix and scalar/vector dynamics, find an empirically measurable indicator of the noise level, or build a minimal mathematical model exhibiting both training regimes.

— Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers (2601.04890 - Velikanov et al., 8 Jan 2026) in Section 6: Conclusion and discussion
