Scale sparse autoencoder decomposition of contribution vectors to LLM-scale features
Develop scalable training procedures and architectures for applying sparse autoencoder decomposition to contribution vectors at large language model scale, enabling CODEC to operate on high-dimensional LLM features for causal interpretability.
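For context, a minimal sketch of the kind of SAE training step that would need to scale: a single-layer sparse autoencoder with an L1 sparsity penalty, trained to reconstruct precomputed contribution vectors. All shapes, hyperparameters, and the `contributions` tensor below are illustrative assumptions; the paper does not specify an SAE architecture for LLM-scale features.

```python
# Sketch: sparse autoencoder over contribution vectors (assumed setup).
# Shapes and hyperparameters are placeholders, not values from the paper.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_contrib: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_contrib, d_dict)  # contribution vector -> sparse codes
        self.decoder = nn.Linear(d_dict, d_contrib)  # sparse codes -> reconstruction

    def forward(self, x: torch.Tensor):
        codes = torch.relu(self.encoder(x))          # non-negative, encouraged to be sparse
        recon = self.decoder(codes)
        return recon, codes

# Hypothetical stand-in for precomputed contribution vectors; in practice
# these would come from the separate contribution-computation step.
contributions = torch.randn(4096, 512)               # (n_samples, d_contrib)

sae = SparseAutoencoder(d_contrib=512, d_dict=4096)   # overcomplete dictionary
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3                                       # sparsity strength (assumed)

for step in range(100):
    recon, codes = sae(contributions)
    loss = ((recon - contributions) ** 2).mean() + l1_coeff * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Scaling this to LLM-scale features mainly stresses the dictionary width and the volume of contribution vectors; the sketch omits the batching, sharding, and dead-feature handling that such training would require in practice.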
References
We note that SAE training is a separate computational step, independent of contribution computation, and scaling it to LLM-scale features is left to future work.
— Causal Interpretation of Neural Network Computations with Contribution Decomposition
(2603.06557 - Melander et al., 6 Mar 2026) in Supplemental Material, Section “Runtime measurements and Complexity,” final paragraph