CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

Published 16 Apr 2024 in cs.CV | (2404.10603v2)

Abstract: Leveraging multi-view diffusion models as priors for 3D optimization has alleviated 3D consistency problems, e.g., the Janus face problem or the content drift problem, in zero-shot text-to-3D models. However, the 3D geometric fidelity of the output remains an unresolved issue: although the rendered 2D views are realistic, the underlying geometry may contain errors such as unreasonable concavities. In this work, we propose CorrespondentDream, an effective method that leverages annotation-free, cross-view correspondences obtained from the diffusion U-Net to provide an additional 3D prior to the NeRF optimization process. We find that these correspondences are strongly consistent with human perception, and by adopting them in our loss design, we are able to produce NeRF models with geometries that are more coherent with common sense, e.g., smoother object surfaces, yielding higher 3D fidelity. We demonstrate the efficacy of our approach through various comparative qualitative results and a user study.
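
The abstract does not say how the cross-view correspondences are extracted from the U-Net features. As a rough illustration only, the sketch below uses one common matching scheme, mutual nearest neighbors under cosine similarity, on two sets of per-pixel feature vectors. The function name, the array shapes, and the choice of mutual nearest neighbors are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def mutual_nearest_neighbors(feat_a, feat_b, eps=1e-8):
    """Hypothetical helper: match features of view A (Na, C) to view B (Nb, C).

    Returns an (M, 2) integer array of index pairs (i, j) where feature i in A
    and feature j in B are each other's nearest neighbor by cosine similarity.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    a = feat_a / (np.linalg.norm(feat_a, axis=1, keepdims=True) + eps)
    b = feat_b / (np.linalg.norm(feat_b, axis=1, keepdims=True) + eps)
    sim = a @ b.T                      # (Na, Nb) similarity matrix

    nn_ab = sim.argmax(axis=1)         # best match in B for each feature in A
    nn_ba = sim.argmax(axis=0)         # best match in A for each feature in B

    # Keep only pairs that agree in both directions (mutual consistency).
    idx_a = np.arange(sim.shape[0])
    keep = nn_ba[nn_ab] == idx_a
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)

# Toy usage: view B holds the same four features as view A, in shuffled order.
feat_a = np.eye(4)
feat_b = feat_a[[2, 0, 3, 1]]
pairs = mutual_nearest_neighbors(feat_a, feat_b)
```

In a pipeline like the one the abstract describes, matched pixel pairs of this kind could then feed a loss term on the NeRF geometry; how that loss is actually formulated is not stated in the abstract, so it is left out here.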
