360$^\circ$ Reconstruction From a Single Image Using Space Carved Outpainting
Abstract: We introduce POP3D, a novel framework that creates a full $360^\circ$-view 3D model from a single image. POP3D resolves two prominent issues that limit single-view reconstruction. First, POP3D generalizes to arbitrary object categories, which previous methods struggle to achieve. Second, POP3D improves reconstruction fidelity and naturalness, where concurrent works fall short. Our approach combines the strengths of four primary components: (1) a monocular depth and normal predictor that estimates crucial geometric cues, (2) a space carving method that demarcates the potentially unseen portions of the target object, (3) a generative model pre-trained on a large-scale image dataset that completes unseen regions of the target, and (4) a neural implicit surface reconstruction method tailored to reconstructing objects from RGB images along with monocular geometric cues. The combination of these components enables POP3D to readily generalize across various in-the-wild images and generate state-of-the-art reconstructions, outperforming similar works by a significant margin. Project page: \url{http://cg.postech.ac.kr/research/POP3D}
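To make component (2) more concrete, the following is a minimal sketch of silhouette-based space carving on a voxel grid. It is not the authors' implementation: the function names `carve` and `visual_hull`, the boolean occupancy grid, and the assumption of orthographic cameras aligned with the grid axes are all illustrative simplifications. Voxels that project outside any silhouette are removed; whatever survives bounds the region where the object, including its unseen parts, may lie.

```python
import numpy as np


def carve(occupancy: np.ndarray, silhouette: np.ndarray, axis: int) -> np.ndarray:
    """Keep voxels whose orthographic projection along `axis` falls inside the silhouette.

    Assumption: the camera looks straight down one grid axis, so projecting
    a voxel simply drops its coordinate along that axis.
    """
    # Insert a length-1 dimension at the viewing axis so the 2D mask
    # broadcasts over every depth slice of the 3D grid.
    mask3d = np.expand_dims(silhouette, axis=axis)
    return occupancy & mask3d


def visual_hull(silhouettes: dict, n: int) -> np.ndarray:
    """Intersect the carved volumes of several axis-aligned views.

    `silhouettes` maps a viewing axis (0, 1, or 2) to an (n, n) boolean mask.
    """
    hull = np.ones((n, n, n), dtype=bool)  # start from a fully occupied grid
    for axis, sil in silhouettes.items():
        hull = carve(hull, sil, axis)
    return hull


if __name__ == "__main__":
    sil = np.zeros((4, 4), dtype=bool)
    sil[1:3, 1:3] = True  # a 2x2 square silhouette
    one_view = visual_hull({0: sil}, 4)
    two_views = visual_hull({0: sil, 1: sil}, 4)
    print(one_view.sum(), two_views.sum())
```

With a single view, the surviving volume is the silhouette extruded along the viewing direction; each additional view shrinks it further. A real pipeline would use perspective cameras with known poses rather than axis-aligned orthographic views.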