Upper bound on Score Pathway override under targeted fine-tuning

Determine the upper bound on the capacity of the Score Pathway (the query/key-dependent routing through the softmax attention scores in decoder-only Transformers) to override the causal-residual topological baseline under aggressive, position-targeted fine-tuning protocols. This baseline is what induces the U-shaped positional influence profile.

Background

The paper proves that a U-shaped positional influence profile arises at initialization from causal masking and residual connections, forming a topological baseline that persists under standard pretraining. This baseline is derived by isolating the linear Value Pathway and modeling the positional routing via powers of the Cesàro matrix (and its continuous limit), while showing that the Score Pathway vanishes at initialization.
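To make the baseline concrete, here is a minimal sketch of how powers of a Cesàro-type matrix can produce a U-shaped profile. It is a toy under stated assumptions, not the paper's exact construction: attention is taken to be uniform over the causal prefix (Score Pathway switched off), each layer mixes the residual stream and the attention output with equal weight, M = (I + C) / 2, and the function names, the sequence length of 64, and the depth of 4 are all hypothetical choices made for illustration.

```python
def cesaro_matrix(n):
    """Lower-triangular matrix whose row i is 1/(i+1) on columns 0..i."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)]
            for i in range(n)]


def influence_profile(n, depth):
    """Weight of each input position on the last token after `depth` layers.

    Assumption (not taken from the paper): every layer acts as
    M = (I + C) / 2, an equal-weight residual connection plus uniform
    causal attention. The result is the last row of M ** depth.
    """
    c = cesaro_matrix(n)
    row = [0.0] * n
    row[-1] = 1.0  # start from the last token's own output
    for _ in range(depth):
        # row @ C: push the current weights back through uniform attention
        attended = [sum(row[i] * c[i][j] for i in range(n)) for j in range(n)]
        # equal-weight residual mix: row @ (I + C) / 2
        row = [0.5 * (row[j] + attended[j]) for j in range(n)]
    return row


profile = influence_profile(n=64, depth=4)
print("first :", profile[0])
print("middle:", profile[32])
print("last  :", profile[-1])
```

In this toy, repeated causal averaging piles weight onto early positions, while the identity term keeps a spike on the final position, so the middle ends up below both ends. The precise residual weighting and normalization used in the paper may differ.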

During training, the non-linear Score Pathway becomes active and creates localized spikes (e.g., at content boundaries), but the macroscopic U-shape persists. The authors note that their analysis of trained networks relies on empirical Jacobian measurements rather than closed-form bounds on the trained Softmax, and they specifically identify as open the problem of quantifying how much the Score Pathway can, in principle, overcome the topological baseline under aggressive, position-targeted fine-tuning.
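The open question can be phrased operationally with a toy probe. The sketch below is an illustration under strong assumptions and does not answer the question: it hand-sets a position-only logit bias on a single target key instead of learning query/key weights by fine-tuning, reuses an assumed equal-weight residual mix at every layer, and calls the baseline "overridden" when the target position outweighs both ends of the profile. All names, the bias values, and that override criterion are hypothetical.

```python
import math


def causal_softmax_row(i, target, bias):
    """Softmax weights of query i over keys 0..i, with `bias` added to one key."""
    logits = [bias if j == target else 0.0 for j in range(i + 1)]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def last_token_profile(n, depth, target, bias):
    """Influence of each position on the last token after `depth` layers.

    Assumption: each layer is (I + A) / 2, where A is the biased causal
    softmax matrix. With bias = 0, A reduces to uniform causal attention.
    """
    rows = [causal_softmax_row(i, target, bias) for i in range(n)]
    profile = [0.0] * n
    profile[-1] = 1.0
    for _ in range(depth):
        attended = [0.0] * n
        for i, weight in enumerate(profile):
            for j, a in enumerate(rows[i]):
                attended[j] += weight * a
        profile = [0.5 * (p + q) for p, q in zip(profile, attended)]
    return profile


def overrides_baseline(profile, target):
    """Hypothetical criterion: the target outweighs both ends of the profile."""
    return profile[target] > max(profile[0], profile[-1])


baseline = last_token_profile(n=64, depth=4, target=32, bias=0.0)
boosted = last_token_profile(n=64, depth=4, target=32, bias=20.0)
print(overrides_baseline(baseline, 32), overrides_baseline(boosted, 32))
```

Sweeping `bias` between these two settings gives the smallest hand-set logit gap at which the toy profile flips. Whether fine-tuning can realize such gaps in a trained network, and at what cost, is exactly what remains open.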

References

Determining the upper bound of the Score Pathway's ability to override the topological baseline under aggressive, position-targeted fine-tuning remains an open empirical question.

Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias (2603.10123 - Chowdhury, 10 Mar 2026) in Limitations