Geometry Prior as Cause of CRM’s Fast Convergence

Determine whether the strong geometry prior built into the Convolutional Reconstruction Model (CRM) architecture is responsible for the fast training convergence observed on single-image to 3D textured mesh reconstruction.

Background

In the ablation study, the authors report that CRM begins producing reasonable reconstructions very early in training, after approximately 280 iterations (about 20 minutes). They hypothesize that this rapid convergence stems from the strong geometry prior in the architecture design.

CRM’s architecture exploits the spatial alignment between six orthographic input images and the rolled-out triplane representation: a convolutional U-Net takes the images together with Canonical Coordinate Maps (CCMs) and produces the triplane. The authors offer the link to fast convergence only as a conjecture, so it remains to be validated whether these geometric priors causally drive the accelerated training dynamics.
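One way such a validation could be set up is to train the full model and an ablated variant over several seeds and compare how many iterations each needs before a validation metric first crosses a fixed threshold. The sketch below is illustrative only and is not taken from the paper: the function names, the variant labels passed in by the caller, and the choice of a lower-is-better metric are all assumptions.

```python
from statistics import mean


def iterations_to_threshold(curve, threshold):
    """Return the first iteration whose metric is <= threshold, or None.

    `curve` is a list of (iteration, metric) pairs sorted by iteration.
    Assumption: lower metric values are better (e.g. a validation loss).
    """
    for iteration, value in curve:
        if value <= threshold:
            return iteration
    return None


def compare_variants(runs_by_variant, threshold):
    """Summarize iterations-to-threshold per variant across seeds.

    `runs_by_variant` maps a hypothetical variant name (for example a
    full model versus one with the geometry prior removed) to a list of
    curves, one per random seed. Seeds that never reach the threshold
    are reported separately instead of being silently dropped.
    """
    summary = {}
    for name, curves in runs_by_variant.items():
        hits = [iterations_to_threshold(c, threshold) for c in curves]
        reached = [h for h in hits if h is not None]
        summary[name] = {
            "mean_iterations": mean(reached) if reached else None,
            "seeds_reached": len(reached),
            "seeds_total": len(curves),
        }
    return summary
```

A gap in mean iterations between variants would be consistent with the conjecture but would not settle causality on its own; the ablation would also need to hold parameter count, data, and optimizer settings fixed.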

References

We conjecture that the fast convergence results from the strong geometry prior in our architecture design.

CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model (2403.05034 - Wang et al., 2024) in Section 4.3.1 (Reconstruction Results on Early Training Stage)