Scalable Dictionary Learning for Sparse Inference Under Superposition

Develop scalable dictionary-learning algorithms for sparse inference under superposition: learn overcomplete decoders (dictionaries) that enable accurate recovery of sparse latent factors from neural activations in the compressed-sensing regime.

Background

The paper demonstrates that sparse autoencoders (SAEs) fail to generalize compositionally out of distribution not because amortized inference is inherently flawed, but because the learned dictionaries themselves point in substantially incorrect directions. Replacing the SAE encoder with per-sample FISTA on the same dictionary does not close the gap, whereas an oracle baseline using per-sample FISTA with the ground-truth dictionary achieves near-perfect recovery.
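The per-sample FISTA baseline referenced above solves a lasso problem for each activation vector given a fixed dictionary. A minimal sketch of that inference step, assuming a column-normalized dictionary `D` and an activation `x` (function name, step sizes, and defaults are illustrative, not taken from the paper):

```python
import numpy as np

def fista(D, x, lam=0.1, n_iter=200):
    """Per-sample sparse inference: argmin_a 0.5*||x - D a||^2 + lam*||a||_1.

    D: (d, m) dictionary with atoms as columns; x: (d,) activation vector.
    Illustrative sketch of FISTA (proximal gradient with Nesterov momentum).
    """
    m = D.shape[1]
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term
    a = np.zeros(m)
    y = a.copy()
    t = 1.0
    for _ in range(n_iter):
        g = D.T @ (D @ y - x)                # gradient of 0.5*||x - D y||^2
        z = y - g / L
        a_next = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = a_next + ((t - 1) / t_next) * (a_next - a)              # momentum step
        a, t = a_next, t_next
    return a
```

With the ground-truth dictionary, this kind of per-sample solver is the oracle baseline the paper reports as near-perfect; with a learned dictionary, the same solver inherits the dictionary's errors.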

Classical alternating-minimization dictionary learning (DL-FISTA) can succeed at small scale but degrades as latent dimensionality grows, revealing dictionary learning—not inference—as the central bottleneck. The authors therefore identify the need for scalable dictionary learning methods that work in overcomplete, compressed-sensing settings as the key outstanding challenge.
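The alternating-minimization scheme described above can be sketched as follows: with the dictionary fixed, infer sparse codes by iterative soft thresholding; with the codes fixed, update the dictionary by regularized least squares and renormalize its atoms. This is an illustrative sketch in the spirit of DL-FISTA, not the paper's exact procedure (function names, the ISTA coding step, and all hyperparameters are assumptions):

```python
import numpy as np

def soft(z, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def dl_alternating(X, m, lam=0.1, n_outer=20, n_ista=50, seed=0):
    """Alternating-minimization dictionary learning (illustrative sketch).

    X: (d, n) data matrix; m: number of atoms (overcomplete when m > d).
    Alternates (i) ISTA sparse coding with D fixed and (ii) a least-squares
    dictionary update with codes fixed, renormalizing atoms each round.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.standard_normal((d, m))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    A = np.zeros((m, n))
    for _ in range(n_outer):
        L = np.linalg.norm(D, 2) ** 2
        for _ in range(n_ista):                      # sparse coding step
            A = soft(A - D.T @ (D @ A - X) / L, lam / L)
        # regularized least-squares dictionary update, then renormalize atoms
        D = X @ A.T @ np.linalg.pinv(A @ A.T + 1e-6 * np.eye(m))
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-8)
    L = np.linalg.norm(D, 2) ** 2
    for _ in range(n_ista):                          # recode against final D
        A = soft(A - D.T @ (D @ A - X) / L, lam / L)
    return D, A
```

The full batch updates shown here are what limit this approach at scale: both the coding and dictionary steps touch every sample per round, which is the regime where the paper reports degradation as latent dimensionality grows.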

References

From the paper: "Our results reframe the SAE failure as a dictionary learning challenge, not an amortisation problem, and point to scalable dictionary learning as the key open problem for sparse inference under superposition."