Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation

Published 21 Dec 2023 in cs.CV (arXiv:2312.14124v2)

Abstract: Controllable generation of 3D assets is important for many practical applications like content creation in movies, games and engineering, as well as in AR/VR. Recently, diffusion models have shown remarkable results in generation quality of 3D objects. However, none of the existing models enable disentangled generation to control the shape and appearance separately. For the first time, we present a suitable representation for 3D diffusion models to enable such disentanglement by introducing a hybrid point cloud and neural radiance field approach. We model a diffusion process over point positions jointly with a high-dimensional feature space for a local density and radiance decoder. While the point positions represent the coarse shape of the object, the point features allow modeling the geometry and appearance details. This disentanglement enables us to sample both independently and therefore to control both separately. Our approach sets a new state of the art in generation compared to previous disentanglement-capable methods by reduced FID scores of 30-90% and is on-par with other non disentanglement-capable state-of-the art methods.

References (49)
  1. Learning representations and generative models for 3D point clouds. In ICML, 2018.
  2. RenderDiffusion: Image diffusion for 3D reconstruction, inpainting and generation. In CVPR, 2023.
  3. Demystifying MMD GANs. In ICLR, 2018.
  4. Learning gradient fields for shape generation. In ECCV, 2020.
  5. Deep local shapes: Learning local SDF priors for detailed 3D reconstruction. In ECCV, 2020.
  6. Efficient geometry-aware 3D generative adversarial networks. In CVPR, 2022.
  7. ShapeNet: An information-rich 3D model repository. Technical Report arXiv:1512.03012, Stanford University — Princeton University — Toyota Technological Institute at Chicago, 2015.
  8. Single-stage diffusion NeRF: A unified approach to 3D generation and reconstruction, 2023.
  9. Diffusion models beat GANs on image synthesis. In NeurIPS, 2021.
  10. From data to functa: Your data point is a function and you can treat it like one. In ICML, 2022.
  11. HyperDiffusion: Generating implicit neural fields with weight-space diffusion, 2023.
  12. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, 2017.
  13. Denoising diffusion probabilistic models. In NeurIPS, 2020.
  14. CodeNeRF: Disentangled neural radiance fields for object categories. In ICCV, 2021.
  15. Shap-E: Generating conditional 3D implicit functions. arXiv:2305.02463, 2023.
  16. Elucidating the design space of diffusion-based generative models. In NeurIPS, 2022.
  17. SoftFlow: Probabilistic framework for normalizing flow on manifolds. In NeurIPS, 2020.
  18. Auto-Encoding Variational Bayes. In ICLR, 2014.
  19. Discrete point flow networks for efficient point cloud generation. In ECCV, 2020.
  20. Magic3D: High-resolution text-to-3D content creation. In CVPR, 2023.
  21. Zero-1-to-3: Zero-shot one image to 3D object. arXiv:2303.11328, 2023.
  22. Diffusion probabilistic models for 3D point cloud generation. In CVPR, 2021.
  23. RealFusion: 360° reconstruction of any object from a single image. In CVPR, 2023.
  24. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
  25. DiffRF: Rendering-guided 3D radiance field diffusion. In CVPR, 2023.
  26. Point-E: A system for generating 3D point clouds from complex prompts. arXiv:2212.08751, 2022.
  27. GIRAFFE: Representing scenes as compositional generative neural feature fields. In CVPR, 2021.
  28. Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision. In CVPR, 2020.
  29. PhotoShape: Photorealistic materials for large-scale shape collections. ACM TOG, 2018.
  30. DreamFusion: Text-to-3D using 2D diffusion. In ICLR, 2023.
  31. Zero-shot text-to-image generation. In ICML, 2021.
  32. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
  33. Photorealistic text-to-image diffusion models with deep language understanding. In NeurIPS, 2022.
  34. GRAF: Generative radiance fields for 3D-aware image synthesis. In NeurIPS, 2020.
  35. 3D neural field generation using triplane diffusion. In CVPR, 2023.
  36. Scene representation networks: Continuous 3D-structure-aware neural scene representations. In NeurIPS, 2019.
  37. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015.
  38. Generative modeling by estimating gradients of the data distribution. In NeurIPS, 2019.
  39. Disentangled3D: Learning a 3D generative model with disentangled geometry and appearance from monocular images. In CVPR, 2022.
  40. Attention is all you need. In NeurIPS, 2017.
  41. SimNP: Learning self-similarity priors between neural points. In ICCV, 2023.
  42. Point-NeRF: Point-based neural radiance fields. In CVPR, 2022.
  43. PointFlow: 3D point cloud generation with continuous normalizing flows. In ICCV, 2019.
  44. MVSNet: Depth inference for unstructured multi-view stereo. In ECCV, 2018.
  45. LION: Latent point diffusion models for 3D shape generation. In NeurIPS, 2022.
  46. Adding conditional control to text-to-image diffusion models. arXiv:2302.05543, 2023.
  47. 3D shape generation and completion through point-voxel diffusion. In ICCV, 2021.
  48. SparseFusion: Distilling view-conditioned diffusion for 3D reconstruction. In CVPR, 2023.
  49. Visual object networks: Image generation with disentangled 3D representations. In NeurIPS, 2018.

Summary

  • The paper introduces a diffusion model that disentangles 3D shape and appearance generation by combining neural point clouds with radiance fields.
  • The methodology leverages iterative denoising and volume rendering to enable independent control over coarse shapes and detailed appearances.
  • Results show reduced FID scores and enhanced diversity across datasets, establishing a new benchmark for 3D asset creation.

Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation

Abstract

The paper "Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation" (2312.14124) proposes a pioneering methodology that addresses the challenges of generating controllable 3D assets, which are critical in areas such as AR/VR, content creation, and engineering. The researchers introduce a diffusion model that allows for disentangled generation of 3D shapes and appearances, a capability not achievable with existing models. By utilizing a hybrid approach combining neural point clouds with neural radiance fields, the paper demonstrates a method enabling separate control over shape and appearance. This results in significant advancements in generation quality, evidenced by reduced FID scores compared to previous methods.

Introduction

The creation and manipulation of 3D assets have extensive applications across domains such as virtual reality (VR), augmented reality (AR), and media production. While diffusion models generate high-quality 3D objects, existing models do not allow shape and appearance to be controlled independently. This paper introduces neural point cloud diffusion (NPCD), which provides this capability through a hybrid representation that combines point clouds with neural radiance fields. The approach disentangles an object's coarse shape from its appearance, allowing both to be sampled and controlled separately.

The key innovation lies in modeling a diffusion process over point positions jointly with a high-dimensional feature space for density and radiance decoding. This enables independent sampling and control, and improves generation quality over previous disentanglement-capable methods such as GRAF and Disentangled3D.
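To make this concrete, the following is a minimal, illustrative sketch (in PyTorch, not the authors' implementation) of applying a standard DDPM forward and reverse process to a joint state of point positions and per-point features. The sizes N and D, the linear noise schedule, and the `eps_model` denoiser are placeholder assumptions.

```python
import torch

# Placeholder sizes: N neural points with 3D positions and D-dimensional features.
N, D = 512, 32
positions = torch.randn(N, 3)                    # coarse shape
features = torch.randn(N, D)                     # local geometry/appearance details
x0 = torch.cat([positions, features], dim=-1)    # joint diffusion state, shape (N, 3 + D)

# Standard DDPM linear noise schedule (typical values, not necessarily the paper's).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def q_sample(x0, t, noise):
    """Forward process: diffuse the clean state x0 to noise level t."""
    ab = alpha_bars[t]
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

def ddpm_loss(eps_model, x0):
    """Epsilon-prediction training objective for one sample (eps_model is a hypothetical denoiser)."""
    t = torch.randint(0, T, (1,)).item()
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return torch.mean((eps_model(x_t, t) - noise) ** 2)

@torch.no_grad()
def reverse_step(eps_model, x_t, t):
    """One ancestral sampling step x_t -> x_{t-1} during iterative denoising."""
    eps = eps_model(x_t, t)
    mean = (x_t - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
    return mean if t == 0 else mean + betas[t].sqrt() * torch.randn_like(x_t)
```

Because positions and features occupy separate, identifiable channels of this state, the two can be inspected and controlled as distinct factors of the generated object.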

Methodology

NPCD operates by leveraging neural point clouds that host a continuous radiance field. This setup includes:

  • Point Positions and Features: Each point in the cloud carries a 3D position and a high-dimensional feature vector. The positions define the coarse shape, while the features encode local geometry and appearance details.
  • Volume Rendering: Using a generalizable renderer, the model produces images from these neural point clouds by aggregating point features based on proximity and decoding them with multilayer perceptrons (a sketch of one such renderer follows Figure 1).
  • Denoising Diffusion: The diffusion process follows the DDPM formulation, generating samples through iterative denoising of point positions and features, which enables disentangled control over generation (Figure 1).

    Figure 1: Overview of neural point cloud diffusion (NPCD). At the center is the neural point cloud representation, where each point has a position and an appearance feature.
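The renderer description suggests a Point-NeRF-style pipeline: features of nearby neural points are aggregated at each ray sample and decoded by an MLP into density and radiance, which are then alpha-composited. The sketch below is one plausible reading of that description, not the paper's exact renderer; the decoder architecture, the k-nearest-neighbor inverse-distance weighting, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class PointRadianceDecoder(nn.Module):
    """Hypothetical decoder: query position + aggregated point feature -> (density, RGB)."""
    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                    # 1 density + 3 RGB channels
        )

    def forward(self, query_xyz, agg_feat):
        out = self.mlp(torch.cat([query_xyz, agg_feat], dim=-1))
        sigma = torch.relu(out[..., :1])             # non-negative density
        rgb = torch.sigmoid(out[..., 1:])            # colors in [0, 1]
        return sigma, rgb

def aggregate_features(query_xyz, positions, features, k=8):
    """Inverse-distance weighted average of the k nearest neural points (one plausible proximity rule)."""
    d = torch.cdist(query_xyz, positions)            # (Q, N) pairwise distances
    dist, idx = d.topk(k, dim=-1, largest=False)     # k nearest neural points per query
    w = 1.0 / (dist + 1e-8)
    w = w / w.sum(dim=-1, keepdim=True)              # normalized weights, (Q, k)
    neigh = features[idx]                            # gathered features, (Q, k, D)
    return (w.unsqueeze(-1) * neigh).sum(dim=1)      # aggregated feature per query, (Q, D)

def render_ray(decoder, positions, features, ray_pts, deltas):
    """NeRF-style alpha compositing of S samples along one ray."""
    agg = aggregate_features(ray_pts, positions, features)
    sigma, rgb = decoder(ray_pts, agg)
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * deltas)                       # (S,)
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10], dim=0), dim=0)[:-1]
    weights = alpha * trans
    return (weights.unsqueeze(-1) * rgb).sum(dim=0)                            # composited RGB
```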

Results

The researchers compare NPCD against existing models that enable disentangled generation and show substantial improvements in quality, reflected in lower FID scores across multiple datasets including SRN Cars, SRN Chairs, and PhotoShape Chairs. Qualitative evaluations demonstrate NPCD's ability to generate diverse shapes and appearances independently (Figure 2).

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2: Qualitative examples of disentangled generation on SRN Cars, SRN Chairs, and PhotoShape Chairs.
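For context, the FID scores reported in these comparisons follow the standard Fréchet Inception Distance of Heusel et al. (reference 12), which fits Gaussians to Inception features of real and generated images and measures

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\bigl(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\bigr)
```

where (μ_r, Σ_r) and (μ_g, Σ_g) are the feature means and covariances of the real and generated image sets; lower values indicate closer distributions.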

In addition, NPCD is competitive with generative models that do not support disentangled generation, achieving on-par performance in standard generation metrics (Figure 3).

Figure 3

Figure 3

Figure 3

Figure 3

Figure 3

Figure 3

Figure 3

Figure 3

Figure 3: Comparison against previous generative models that allow disentangled generation.

Implications and Future Directions

The NPCD model marks a significant step in generative modeling, notably in disentangled generation of 3D assets. Its impact extends to practical applications demanding fine control over object characteristics—essential for custom asset creation and modification in AR/VR environments. Future research could explore optimizing neural point cloud diffusion further or extending the framework to cater to more complex applications and larger datasets.

The disentanglement strategy employed by NPCD may inspire new architectures and learning strategies that emphasize modular and controllable learning processes across various generative domains, paving the way for more versatile applications in AI-driven spaces.

Conclusion

Neural Point Cloud Diffusion presents a novel and efficient method for disentangled 3D object generation, achieving superior results in independent control over shape and appearance. This establishes new standards in the generation quality for complex 3D assets, overcoming previous limitations found in GAN-based models. The research outlines promising potential for enhanced controls in asset creation, contributing significantly to the fields of artificial intelligence and computer graphics.
