Step-resolved decomposition for curvature-based influence functions in looped transformers
Derive an interpretable, step-resolved decomposition of curvature-based influence estimates (Influence Functions, which involve the inverse Hessian) that attributes a training example's influence to individual recurrent loop steps in weight-tied looped transformers, analogous to the Step-Decomposed Influence (SDI) decomposition available for TracIn.
References
We rely on TracIn over Influence Functions for two reasons: (1) Influence Functions typically assume a converged model whereas TracIn operates on the optimisation trajectory, allowing us to attribute behaviour to specific training dynamics captured at any intermediate checkpoint; (2) TracIn admits a clean linear decomposition over the recurrent computation as shown below, directly enabling the unrolled attribution of SDI, while it is unclear how to derive a similarly interpretable decomposition for curvature-based influence estimates involving the inverse Hessian across recurrent steps.
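To make the "clean linear decomposition" concrete, the following is a minimal sketch and not the SDI implementation from the quoted work. It assumes a toy scalar looped model h_t = tanh(w * h_{t-1} + b + x) with tied parameters (w, b), a squared-error loss on the final state, a single checkpoint with learning rate `lr`, and a split taken over the training example's gradient. All function names are hypothetical.

```python
import math

def forward(params, x, steps):
    """Run the weight-tied loop h_t = tanh(w * h_{t-1} + b + x), with h_0 = x."""
    w, b = params
    hs = [x]
    for _ in range(steps):
        hs.append(math.tanh(w * hs[-1] + b + x))
    return hs

def loss(params, x, y, steps):
    """Squared error on the final loop state (an assumption of this toy setup)."""
    return 0.5 * (forward(params, x, steps)[-1] - y) ** 2

def per_step_grads(params, x, y, steps):
    """Return [g_1, ..., g_T]; g_t is the gradient through the t-th use of (w, b)."""
    w, _ = params
    hs = forward(params, x, steps)
    delta = hs[-1] - y                          # dL/dh_T
    grads = [None] * steps
    for t in range(steps, 0, -1):
        pre = delta * (1.0 - hs[t] ** 2)        # dL/d(pre-activation at step t)
        grads[t - 1] = (pre * hs[t - 1], pre)   # direct gradient w.r.t. (w, b) at step t
        delta = pre * w                         # dL/dh_{t-1}
    return grads

def total_grad(params, x, y, steps):
    """Full gradient w.r.t. the tied parameters: the sum of the per-step terms."""
    gs = per_step_grads(params, x, y, steps)
    return (sum(g[0] for g in gs), sum(g[1] for g in gs))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def tracin_by_step(params, train, test, steps, lr):
    """Single-checkpoint TracIn score, split over loop steps of the training gradient."""
    g_test = total_grad(params, test[0], test[1], steps)
    return [lr * dot(g_test, g)
            for g in per_step_grads(params, train[0], train[1], steps)]

def tracin_total(params, train, test, steps, lr):
    """Ordinary single-checkpoint TracIn score: lr * <grad(test), grad(train)>."""
    g_test = total_grad(params, test[0], test[1], steps)
    g_train = total_grad(params, train[0], train[1], steps)
    return lr * dot(g_test, g_train)

if __name__ == "__main__":
    params, train, test = (0.7, -0.2), (0.5, 1.0), (-0.3, 0.4)
    by_step = tracin_by_step(params, train, test, steps=4, lr=0.1)
    print(by_step, sum(by_step), tracin_total(params, train, test, 4, 0.1))
```

Because the tied-weight gradient is a sum of per-step terms, the per-step scores add up to the ordinary single-checkpoint TracIn score. An inverse Hessian placed between the two gradients, as in Influence Functions, is the part for which the quoted passage says an equally interpretable split is unclear.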