Generalization of Iso-Energy structural invariants beyond dual-encoder VLMs
Determine whether the structural invariants and alignment properties identified by the Iso-Energy–aligned sparse autoencoder in dual-encoder vision–language models also hold in models that employ cross-attention mechanisms or are trained with generative objectives.
References
Finally, our experiments are limited to dual-encoder vision–language models. Whether the same structural invariants and alignment properties hold in models with cross-attention mechanisms or generative training objectives remains an open question.
— Cross-Modal Redundancy and the Geometry of Vision-Language Embeddings
(2602.06218 - Dhimoïla et al., 5 Feb 2026) in Conclusion, final paragraph