Feature learning in the infinite-width limit with learnable multipliers
Establish whether the use of learnable multipliers alone guarantees maximal feature learning in the infinite-width limit, thereby eliminating the need for manual μP scaling rules.
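To make the contrast concrete, the following is a minimal sketch, not the parametrization used in the cited paper. It assumes a readout layer whose pre-activation is scaled either by a fixed, width-dependent factor (here 1/width, one common way μP-style readout scaling is written) or by a single trainable scalar `s`. The function names, the scalar form of the multiplier, and the 1/width choice are all illustrative assumptions.

```python
import random

def matvec(w, h):
    # Plain matrix-vector product: w is a list of rows, h is a vector.
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in w]

def fixed_multiplier_readout(w, h, width):
    # Hand-derived scaling rule (assumed here to be 1/width).
    return [y / width for y in matvec(w, h)]

def learnable_multiplier_readout(w, h, s):
    # Hypothetical learnable variant: s is a trainable scalar, not a rule.
    return [s * y for y in matvec(w, h)]

random.seed(0)
width = 8
h = [random.gauss(0.0, 1.0) for _ in range(width)]
w = [[random.gauss(0.0, 1.0) for _ in range(width)] for _ in range(2)]

fixed = fixed_multiplier_readout(w, h, width)
# The learnable form can represent the fixed rule when s equals 1/width.
learned = learnable_multiplier_readout(w, h, 1.0 / width)
```

The sketch only shows that a learnable multiplier can express the hand-set scale. Whether training drives `s` to a value that yields maximal feature learning as the width grows is exactly the question left open here.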
References
Yet, many questions are left open. Or, does application of learnable multipliers automatically ensures maximal feature learning in infinite-width limit without manual scaling rules required in classical μP?
— Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
(2601.04890 - Velikanov et al., 8 Jan 2026) in Section 6: Conclusion and discussion