Circuit-specific impact of learnable multipliers on capabilities

Ascertain whether learnable multipliers preferentially enhance specific types of transformer circuits and explain the uneven improvements across capabilities observed in downstream benchmarks.

Background

The authors report larger gains from learnable multipliers on reasoning-heavy benchmarks (BBH, MATH, GSM8K) than on knowledge-centric ones (MMLU, ARC-C). They hypothesize that multipliers may differentially strengthen certain circuits, which motivates targeted mechanistic investigations.
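One way to quantify the uneven improvement is to compute the mean score gain per benchmark category and the gap between categories. The following is a minimal sketch under stated assumptions: the function names `category_gains` and `gain_gap` are hypothetical, the grouping simply follows the benchmarks named above, and the scores are placeholders supplied by the caller. No results from the paper are reproduced here.

```python
from statistics import mean

# Benchmark grouping as named in the Background paragraph (assumed split).
REASONING = ("BBH", "MATH", "GSM8K")
KNOWLEDGE = ("MMLU", "ARC-C")


def category_gains(baseline, with_multipliers):
    """Return the mean absolute score gain per benchmark category.

    Both arguments map benchmark name -> score. They are hypothetical
    inputs supplied by the caller, not numbers taken from the paper.
    """
    # Per-benchmark gain: model with multipliers minus baseline model.
    gains = {name: with_multipliers[name] - baseline[name] for name in baseline}
    return {
        "reasoning": mean(gains[b] for b in REASONING),
        "knowledge": mean(gains[b] for b in KNOWLEDGE),
    }


def gain_gap(baseline, with_multipliers):
    """Reasoning gain minus knowledge gain; positive means reasoning gained more."""
    g = category_gains(baseline, with_multipliers)
    return g["reasoning"] - g["knowledge"]
```

Absolute point gains are used for simplicity. Because benchmarks differ in baseline difficulty and score range, a relative or normalized gain may be a more defensible choice when testing the hypothesis.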

References

Yet, many questions are left open. It is practically relevant to further investigate the relation between learnable multipliers and the differences in improvement they provide across capabilities, which we have preliminarily observed in the evaluation table (tab:evals). An interesting hypothesis to explore is whether learned multipliers enhance only certain types of circuits learned by the model.

— Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers (2601.04890 - Velikanov et al., 8 Jan 2026) in Section 6: Conclusion and discussion
