Disentangle Muon’s contribution from auxiliary techniques in reported improvements
Quantify how much of the performance improvement reported in prior studies is attributable to the Muon optimizer itself, rather than to auxiliary techniques applied alongside it, such as QK-Norm or QK-Clip.
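One way to frame this attribution is a 2x2 factorial ablation: train with and without Muon, and with and without one auxiliary technique, then split the total change in loss into a Muon effect, an auxiliary effect, and an interaction term. The sketch below is illustrative only and is not taken from the cited paper. The function name `factorial_effects`, the `"adam"` baseline key, and the use of final validation loss as the metric are all assumptions.

```python
from itertools import product


def factorial_effects(loss):
    """Decompose a 2x2 ablation into main effects and an interaction.

    `loss` maps (optimizer, aux_on) -> final validation loss, where
    optimizer is "adam" or "muon" (hypothetical labels) and aux_on says
    whether one auxiliary technique (e.g. QK-Norm) was enabled.
    Negative effects mean the factor lowered the loss.
    """
    required = set(product(("adam", "muon"), (False, True)))
    missing = required - set(loss)
    if missing:
        raise ValueError(f"missing ablation cells: {sorted(missing)}")

    # Effect of swapping the optimizer, measured at each auxiliary setting.
    muon_without_aux = loss[("muon", False)] - loss[("adam", False)]
    muon_with_aux = loss[("muon", True)] - loss[("adam", True)]

    # Effect of enabling the auxiliary technique, under each optimizer.
    aux_under_adam = loss[("adam", True)] - loss[("adam", False)]
    aux_under_muon = loss[("muon", True)] - loss[("muon", False)]

    return {
        # Average effect of each factor across the levels of the other.
        "muon_main_effect": (muon_without_aux + muon_with_aux) / 2,
        "aux_main_effect": (aux_under_adam + aux_under_muon) / 2,
        # Nonzero when Muon's benefit depends on the auxiliary technique.
        "interaction": muon_with_aux - muon_without_aux,
        # What a study reports if it only compares the two extremes.
        "combined_change": loss[("muon", True)] - loss[("adam", False)],
    }
```

In this decomposition the combined change equals the sum of the two main effects, so a study that compares only the baseline against Muon plus the auxiliary technique cannot tell the two apart. A large interaction term would indicate that Muon's benefit depends on whether the auxiliary technique is present. In practice each cell would need multiple seeds and matched hyperparameter tuning before the differences can be trusted.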
References
Consequently, it remains unclear to what extent the reported improvements can be attributed to Muon itself, how Muon relates to established adaptive optimizers such as Adam, and whether Muon can be systematically improved.
— Delving into Muon and Beyond: Deep Analysis and Extensions
(2602.04669 - Qi et al., 4 Feb 2026) in Section 1 (Introduction)