Learning unified 3D representations from unposed multi-view images
Establish methods that learn robust, unified 3D scene representations directly from unposed multi-view images. Such methods should integrate geometry, appearance, and semantics without requiring known camera poses or per-scene optimization, and should remain effective in sparse-view settings.
References
However, deriving such effective 3D representations directly from unposed multi-view images remains an open challenge.
— Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images
(2604.10573 - Zhou et al., 12 Apr 2026) in Section 1 (Introduction)