SAE Feature Dynamics Predict Decoding-Order Performance
Establish whether the dynamics of sparse autoencoder (SAE) features during denoising in diffusion language models provide a signal that correlates with task performance across remasking-based decoding orders, including ORIGIN (random order), TOPK-MARGIN (highest margin first), and ENTROPY (lowest entropy first).
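To make the three decoding orders concrete, the following is a minimal sketch of how each might choose which masked positions to commit at one denoising step. It is an illustration under stated assumptions, not the paper's implementation: the function names (`entropy`, `margin`, `pick_positions`), the strategy strings, and the toy distributions are all hypothetical, margin is assumed to mean the probability gap between the two most likely tokens, and entropy is assumed to be Shannon entropy of the predicted token distribution.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of one predicted token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def margin(probs):
    """Probability gap between the two most likely tokens (assumed definition)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def pick_positions(probs_by_pos, k, strategy, rng=None):
    """Choose k masked positions to commit at this denoising step.

    probs_by_pos: dict mapping a masked position to its predicted distribution.
    strategy: "origin" (random), "topk_margin", or "entropy" (hypothetical labels).
    """
    positions = sorted(probs_by_pos)
    if strategy == "origin":
        # Random order: ignore the model's confidence entirely.
        rng = rng or random.Random(0)
        return rng.sample(positions, k)
    if strategy == "topk_margin":
        # Highest margin first.
        ranked = sorted(positions, key=lambda i: margin(probs_by_pos[i]), reverse=True)
    elif strategy == "entropy":
        # Lowest entropy first.
        ranked = sorted(positions, key=lambda i: entropy(probs_by_pos[i]))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:k]


# Toy distributions (made up for illustration) over a 5-token vocabulary.
toy_probs = {
    0: [0.7, 0.3, 0.0, 0.0, 0.0],  # smaller margin, lower entropy
    1: [0.6, 0.1, 0.1, 0.1, 0.1],  # larger margin, higher entropy
    2: [0.2, 0.2, 0.2, 0.2, 0.2],  # uniform: zero margin, maximal entropy
}

margin_first = pick_positions(toy_probs, 1, "topk_margin")
entropy_first = pick_positions(toy_probs, 1, "entropy")
random_pair = pick_positions(toy_probs, 2, "origin")
```

On these toy inputs the margin and entropy criteria pick different positions first, which is why the orders can produce different denoising trajectories and, potentially, different SAE feature dynamics.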
References
We conjecture that these SAE-based dynamics provide a useful signal that correlates with task performance across decoding orders.
— DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
(2602.05859 - Wang et al., 5 Feb 2026) in Section 5.2 (Feature Dynamics Across Decoding Strategies)