Mechanistic differences between matrix and scalar/vector training dynamics
Characterize the mechanistic differences between the noise-dominated dynamics of matrix parameters under weight decay and the signal-dominated dynamics of scalar and vector parameters in large-language-model training. The goal is to explain why learnable scalar and vector multipliers adapt their scale while matrix weights remain trapped at the noise–weight-decay equilibrium.
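To make the two regimes concrete, here is a minimal toy sketch in Python. It is not the paper's model and does not resolve the open problem. Everything in it is an assumption made for illustration: plain SGD with weight decay instead of the optimizer used in practice, isotropic Gaussian gradient noise, a quadratic loss for the scalar, and the hypothetical function names and hyperparameter values. In the noise-dominated case the gradient is pure noise, so the squared norm settles at a value fixed by the learning rate and weight decay, whatever the initial scale. In the signal-dominated case the scalar follows its target.

```python
import random


def mean_sq_norm_noise_dominated(init_scale, lr=0.05, wd=0.2, sigma=1.0,
                                 dim=128, steps=3000, seed=0):
    """Toy 'matrix' regime: SGD with weight decay on pure-noise gradients.

    Returns the squared norm averaged over the second half of training.
    """
    rng = random.Random(seed)
    w = [init_scale * rng.gauss(0.0, 1.0) for _ in range(dim)]
    burn_in = steps // 2
    total = 0.0
    for step in range(steps):
        # The gradient carries no signal, only isotropic Gaussian noise (assumption).
        w = [wi - lr * (sigma * rng.gauss(0.0, 1.0) + wd * wi) for wi in w]
        if step >= burn_in:
            total += sum(wi * wi for wi in w)
    return total / (steps - burn_in)


def predicted_sq_norm(lr, wd, sigma, dim):
    """Stationary E||w||^2 of the linear recursion above (exact for this toy)."""
    return lr * sigma ** 2 * dim / (wd * (2.0 - lr * wd))


def signal_dominated_scalar(target, lr=0.05, wd=0.2, sigma=0.01,
                            steps=3000, seed=0):
    """Toy 'scalar' regime: quadratic loss 0.5 * (s - target)^2 with weak noise."""
    rng = random.Random(seed)
    s = 1.0
    for _ in range(steps):
        grad = (s - target) + sigma * rng.gauss(0.0, 1.0)
        s -= lr * (grad + wd * s)
    return s
```

Calling `mean_sq_norm_noise_dominated(0.01)` and `mean_sq_norm_noise_dominated(10.0)` should give values close to each other and to `predicted_sq_norm(0.05, 0.2, 1.0, 128)`, which scales as `lr / wd` for small `lr * wd`: the initial scale is forgotten. By contrast, `signal_dominated_scalar(target)` settles near `target / (1 + wd)`, so its scale is set by the loss rather than by the noise level.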
References
Yet, many questions are left open. Hence, an interesting direction for future work is to mechanistically understand the difference between matrix and scalar/vector dynamics, find an empirically measurable indicator of the noise level, or build a minimal mathematical model exhibiting both training regimes.
— Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
(2601.04890 - Velikanov et al., 8 Jan 2026) in Section 6: Conclusion and discussion