Step-resolved decomposition for curvature-based influence functions in looped transformers

Derive an interpretable, step-resolved decomposition for curvature-based influence estimates, that is, Influence Functions built on the inverse Hessian. The decomposition should attribute training-example influence across the recurrent loop steps of weight-tied looped transformers, analogous to the Step-Decomposed Influence (SDI) decomposition available for TracIn.

Background

The paper (Kaissis et al., 2026) introduces Step-Decomposed Influence (SDI), a method that losslessly decomposes TracIn’s scalar influence score into a trajectory over the recurrent steps of a weight-tied looped transformer, enabling step-resolved attribution without materializing per-example gradients.
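
The idea of a lossless step-wise split can be illustrated on a toy weight-tied recurrence. The following is a minimal sketch, not the paper's implementation: the model `h_{t+1} = tanh(W h_t)`, the squared-error loss, the single checkpoint, the learning rate `lr`, and the choice to split the training-side gradient are all assumptions made for illustration. Unlike SDI as described above, this sketch does materialize per-example gradients, purely to keep the arithmetic visible.

```python
import numpy as np

def loss(W, x, y, num_steps):
    """Squared-error loss of a toy looped model h_{t+1} = tanh(W h_t) (hypothetical)."""
    h = x
    for _ in range(num_steps):
        h = np.tanh(W @ h)
    return 0.5 * np.sum((h - y) ** 2)

def step_gradients(W, x, y, num_steps):
    """Return the per-step contributions to dL/dW, shape (num_steps, d, d).

    Because W is shared across loop steps, the full gradient is the sum
    of these contributions over the step axis.
    """
    hs = [x]
    for _ in range(num_steps):
        hs.append(np.tanh(W @ hs[-1]))
    delta_h = hs[-1] - y                            # dL/dh_T
    grads = [None] * num_steps
    for t in reversed(range(num_steps)):
        delta_a = delta_h * (1.0 - hs[t + 1] ** 2)  # dL/d(W h_t), via tanh'
        grads[t] = np.outer(delta_a, hs[t])         # step-t contribution to dL/dW
        delta_h = W.T @ delta_a                     # dL/dh_t
    return np.stack(grads)

rng = np.random.default_rng(0)
d, T, lr = 4, 3, 0.1                                # illustrative sizes and learning rate
W = rng.normal(scale=0.5, size=(d, d))
x_train, y_train = rng.normal(size=d), rng.normal(size=d)
x_test, y_test = rng.normal(size=d), rng.normal(size=d)

g_train_steps = step_gradients(W, x_train, y_train, T)     # (T, d, d)
g_test = step_gradients(W, x_test, y_test, T).sum(axis=0)  # full test gradient

# Single-checkpoint TracIn score: lr * <grad_train, grad_test>.
tracin_total = lr * np.sum(g_train_steps.sum(axis=0) * g_test)

# Step-resolved trajectory: one dot product per loop step.
sdi = lr * np.einsum("tij,ij->t", g_train_steps, g_test)
```

In this sketch `sdi` has one entry per loop step and `sdi.sum()` matches `tracin_total` up to floating-point error, which is what "lossless" means here: the dot product is linear in the training gradient, so it distributes over the per-step sum.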

SDI relies on TracIn’s gradient dot-product formulation, but many influence methods are curvature-based and depend on inverse-Hessian computations (Influence Functions). The authors note that, unlike TracIn, such curvature-based estimators have no obvious, similarly interpretable stepwise decomposition across recurrent iterations. This leaves an unresolved methodological gap.
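
For orientation, the two estimator families can be written side by side. This is a sketch in the standard textbook forms, not the paper's notation: the symbols \(g_t\) (the step-\(t\) contribution to the shared-parameter gradient), \(\eta_k\) (learning rate at checkpoint \(\theta_k\)), \(\hat\theta\) (converged parameters), and the choice to split the training-side gradient are assumptions made for illustration.

```latex
% TracIn: learning-rate-weighted gradient dot products over checkpoints k
\mathrm{TracIn}(z, z') = \sum_{k} \eta_k \,
  \nabla_\theta \ell(z; \theta_k)^{\top} \nabla_\theta \ell(z'; \theta_k)

% Weight tying: the gradient w.r.t. the shared parameters sums over loop steps
\nabla_\theta \ell(z; \theta) = \sum_{t=1}^{T} g_t(z; \theta)

% Substituting gives a per-step trajectory whose terms sum to the TracIn score
\mathrm{TracIn}(z, z') = \sum_{t=1}^{T} \sum_{k} \eta_k \,
  g_t(z; \theta_k)^{\top} \nabla_\theta \ell(z'; \theta_k)

% Influence Functions: the inverse Hessian sits between the two gradients
\mathcal{I}(z, z') = - \nabla_\theta \ell(z'; \hat\theta)^{\top}
  H_{\hat\theta}^{-1} \, \nabla_\theta \ell(z; \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} \ell(z_i; \hat\theta)
```

In the last expression the Hessian with respect to the shared parameters aggregates curvature from all loop steps, including cross-step terms, so it is not evident how an individual step's contribution should be read once \(H_{\hat\theta}^{-1}\) is applied. That is the gap the problem statement refers to.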

References

We rely on TracIn over Influence Functions for two reasons: (1) Influence Functions typically assume a converged model whereas TracIn operates on the optimisation trajectory, allowing us to attribute behaviour to specific training dynamics captured at any intermediate checkpoint; (2) TracIn admits a clean linear decomposition over the recurrent computation as shown below, directly enabling the unrolled attribution of SDI, while it is unclear how to derive a similarly interpretable decomposition for curvature-based influence estimates involving the inverse Hessian across recurrent steps.

— Step-resolved data attribution for looped transformers (2602.10097 - Kaissis et al., 10 Feb 2026) in Appendix, Section Extended Related Work, Data attribution
