Quantifying alignment between instruction edits and induced low-rank weight updates
Quantify the alignment between instruction edits in Instruction-Level Weight Shaping (ILWS) and the effective low-rank updates they induce in transformer models. Doing so calls for formal metrics or bounds that relate a specific instruction-space edit to the magnitude and direction of the corresponding parameter perturbation, and to the resulting behavioral effect.
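As a rough illustration of what "magnitude and direction" could mean in practice, the sketch below computes three standard linear-algebra diagnostics on a weight-update matrix. It is not a metric proposed by the source. It assumes the effective update for a layer is already available as a matrix `delta_w`; how to obtain that matrix from an instruction edit is exactly the open question. The function names and the choice of `rank` are hypothetical.

```python
import numpy as np


def update_magnitude(delta_w):
    """Frobenius norm of the update: one possible notion of 'magnitude'."""
    return float(np.linalg.norm(delta_w, ord="fro"))


def low_rank_energy(delta_w, rank):
    """Fraction of squared Frobenius norm captured by the top-`rank` singular values.

    Values near 1 mean the update is well approximated at that rank.
    """
    s = np.linalg.svd(delta_w, compute_uv=False)
    total = float(np.sum(s ** 2))
    if total == 0.0:
        return 1.0  # a zero update is trivially low-rank
    return float(np.sum(s[:rank] ** 2) / total)


def subspace_alignment(delta_a, delta_b, rank):
    """Mean squared cosine of the principal angles between the top-`rank`
    left singular subspaces of two updates (1 = identical, 0 = orthogonal).

    One possible notion of 'direction' agreement between two updates.
    """
    ua = np.linalg.svd(delta_a, full_matrices=False)[0][:, :rank]
    ub = np.linalg.svd(delta_b, full_matrices=False)[0][:, :rank]
    cosines = np.linalg.svd(ua.T @ ub, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)  # guard against round-off
    return float(np.mean(cosines ** 2))
```

Under these assumptions, one could compare the update attributed to an instruction edit against a reference update (for example, one obtained by fine-tuning toward the same behavior) and report `subspace_alignment` alongside `low_rank_energy`. Whether such quantities track behavioral effects is not established by the source.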
References
Finally, the theory-to-practice link is qualitative: while instruction edits influence effective low-rank updates, quantifying alignment remains an open problem.
— Instruction-Level Weight Shaping: A Framework for Self-Improving AI Agents
(2509.00251 - Costa, 29 Aug 2025) in Section 8 (Limitations and risks)