Efficient Vision Transformers for 3D Reconstruction
Develop efficient Vision Transformer architectures for 3D reconstruction that maintain long-range, multi-view geometric consistency across input image sequences.
References
Consequently, it remains an open challenge to design efficient Vision Transformers for 3D reconstruction, as it demands maintaining long-range, multi-view geometric consistency.
— FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor Attention
(arXiv:2512.01540, Wang et al., 1 Dec 2025), Section 2 (Related Work), "Efficient Vision Transformers" subsection