DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models

Published 27 May 2024 in cs.GR and cs.AI | arXiv:2405.17176v1

Abstract: Textures distilled from a 2D diffusion model often contain unwanted baked-in shading effects, which produce unrealistic renderings in downstream applications. Generating Physically Based Rendering (PBR) materials instead of plain RGB textures is a promising remedy. However, directly distilling PBR material parameters from 2D diffusion models still suffers from incorrect material decomposition, such as shading effects baked into the albedo. We introduce DreamMat, an approach that resolves this problem and generates high-quality PBR materials from text descriptions. We find that the main cause of incorrect material distillation is that large-scale 2D diffusion models are trained only to generate final shading colors, providing insufficient constraints on material decomposition during distillation. To tackle this, we first finetune a light-aware 2D diffusion model that conditions on a given lighting environment and generates shading results under that specific lighting. Then, by applying the same environment lights during material distillation, DreamMat generates high-quality PBR materials that are consistent with the given geometry and free of baked-in shading effects in the albedo. Extensive experiments demonstrate that the resulting materials are more visually appealing to users and achieve significantly better rendering quality than baseline methods, making them preferable for downstream tasks such as game and film production.


Summary

  • The paper introduces DreamMat, a novel approach that uses geometry- and light-aware diffusion models to accurately decompose PBR materials into albedo, roughness, and metalness.
  • The method leverages a randomized HDR lighting context and classifier score distillation loss to minimize shading artifacts and ensure consistency with light conditions.
  • Experimental results show higher CLIP scores and lower FID metrics than previous methods, demonstrating superior semantic alignment and visual realism in material generation.


Introduction

The paper "DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models" addresses the challenge of generating photorealistic Physically Based Rendering (PBR) materials from textual descriptions. Conventional 2D diffusion models often bake unwanted shading effects into RGB textures, which leads to unrealistic renderings. To overcome this, the authors propose DreamMat, a novel approach that uses geometry- and light-aware diffusion models to generate high-quality PBR materials.

Problem and Approach

The primary issue with existing methods is their focus on generating final shading colors rather than accurately decomposing materials into distinct PBR parameters like albedo, roughness, and metalness. The authors identify that mainstream 2D diffusion models lack sufficient constraints for material decomposition due to their training on final shading colors alone.

To mitigate this, DreamMat introduces the following key innovations:

  1. Light-aware Diffusion Model: The diffusion model is finetuned to consider a specified lighting environment. This ensures that generated textures align with given lighting conditions, reducing baked-in shading effects.
  2. Random Lighting Context: The distillation process incorporates random selection from a set of predefined High-Dynamic-Range (HDR) images, guiding material generation to focus on consistent geometry and light conditions.
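As a concrete (and deliberately simplified) illustration of rendering under a known lighting condition, the sketch below shades a surface point with a Cook-Torrance-style BRDF under a single known light direction, standing in for a fixed HDR environment map; the function name and all parameters are illustrative, not from the paper:

```python
import numpy as np

def cook_torrance_shade(albedo, roughness, metallic, n, v, l, light_rgb):
    """Shade one surface point with a simplified Cook-Torrance BRDF
    under a single known light direction (a stand-in for a fixed,
    known HDR environment)."""
    h = (v + l) / np.linalg.norm(v + l)          # half vector
    nl = max(np.dot(n, l), 1e-4)
    nv = max(np.dot(n, v), 1e-4)
    nh = max(np.dot(n, h), 0.0)
    vh = max(np.dot(v, h), 0.0)

    # GGX normal distribution term
    a2 = max(roughness, 1e-3) ** 4
    d = a2 / (np.pi * (nh * nh * (a2 - 1.0) + 1.0) ** 2)
    # Schlick-GGX geometry term
    k = (roughness + 1.0) ** 2 / 8.0
    g = (nl / (nl * (1.0 - k) + k)) * (nv / (nv * (1.0 - k) + k))
    # Schlick Fresnel, base reflectance blended by the metallic value
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic
    f = f0 + (1.0 - f0) * (1.0 - vh) ** 5

    specular = d * g * f / (4.0 * nl * nv)
    diffuse = (1.0 - metallic) * albedo / np.pi   # metals have no diffuse lobe
    return (diffuse + specular) * light_rgb * nl
```

Because the illumination is fixed and known, any shading in the supervision signal can be attributed to the material parameters rather than to an unknown illuminant, which is the intuition behind conditioning the diffusion model on the same environment lights.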

Implementation Details

DreamMat adopts an inverse-rendering formulation in which the spatially varying BRDF (SVBRDF) is modeled with a hash-grid representation, and materials are evaluated and rendered via Monte Carlo sampling. Training optimizes this material representation through a distillation loss. A key technical contribution is finetuning the Stable Diffusion model to be geometry- and light-aware, so that, given these conditions, it accurately predicts the object's appearance.
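A minimal sketch of such a hash-grid material field is given below; for brevity it uses a single resolution, nearest-corner lookup, and a linear decoding head, whereas a production implementation would use a multiresolution hash encoding with interpolation. All names and hyperparameters here are illustrative, not the paper's:

```python
import numpy as np

class HashGridMaterialField:
    """Sketch of a hash-grid material field: 3D surface points map to
    learnable features, decoded to (albedo, roughness, metallic)."""

    PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

    def __init__(self, table_size=2**14, feat_dim=8, resolution=64, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(0.0, 1e-2, (table_size, feat_dim))
        # Linear head: 3 albedo channels + roughness + metallic
        self.head = rng.normal(0.0, 1e-2, (feat_dim, 5))
        self.res = resolution
        self.size = table_size

    def _hash(self, ijk):
        # Spatial hash of integer grid coordinates into the feature table
        h = np.zeros(ijk.shape[:-1], dtype=np.uint64)
        for d in range(3):
            h ^= ijk[..., d].astype(np.uint64) * self.PRIMES[d]
        return (h % np.uint64(self.size)).astype(np.int64)

    def query(self, xyz):
        # xyz in [0, 1]^3; nearest-corner lookup for brevity
        ijk = np.clip(np.round(xyz * self.res).astype(np.int64), 0, self.res)
        feat = self.table[self._hash(ijk)]
        out = 1.0 / (1.0 + np.exp(-(feat @ self.head)))  # sigmoid -> [0, 1]
        return out[..., :3], out[..., 3], out[..., 4]    # albedo, roughness, metallic
```

In the actual pipeline these decoded parameters would feed the Monte Carlo renderer, and gradients from the distillation loss would update the hash table and head.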

  • Material Representation: The SVBRDF is represented and optimized in a hash-grid format, which encodes albedo, roughness, and metallic properties.
  • Training Strategy: The pipeline utilizes Classifier Score Distillation (CSD) loss to iteratively improve the generated materials, focusing on alignment between rendered images and desired prompts under multiple lighting conditions.
  • Computational Tools: The pipeline is built on the threestudio framework, with large-scale GPU resources used to train the ControlNet that injects both geometry and lighting conditions.
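The CSD update above can be sketched as follows; `denoiser` is a hypothetical stand-in for the finetuned geometry- and light-aware diffusion model, and the key difference from standard score distillation (SDS) is that only the conditional/unconditional gap is back-propagated, not the residual against the injected noise:

```python
import numpy as np

def csd_step(render, denoiser, t, sigma, guidance=1.0, rng=None):
    """One Classifier Score Distillation update direction (sketch).
    `denoiser(x_t, t, cond)` is an assumed interface: cond=True gives the
    text/geometry/light-conditioned noise prediction, cond=None the
    unconditional one."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(render.shape)
    x_t = render + sigma * noise                 # noised rendering at step t
    eps_cond = denoiser(x_t, t, cond=True)
    eps_uncond = denoiser(x_t, t, cond=None)
    # CSD: back-propagate only the classifier direction,
    # not (eps_cond - noise) as in SDS
    return guidance * (eps_cond - eps_uncond)
```

The returned direction would then be applied to the rendered image and propagated back into the hash-grid material parameters through the differentiable renderer.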

Experimental Results

The paper provides extensive experimental validation, showcasing superior performance over techniques like TEXTure, Fantasia3D, and others. The results highlight DreamMat's ability to efficiently generate detailed, realistic PBR materials while maintaining consistency under varied lighting conditions.

  • Qualitative Comparisons: DreamMat produces textures that are visually appealing, with enhanced fidelity to geometric structures and environmental lighting, as compared to prior methods.
  • Quantitative Metrics: The method achieves higher CLIP scores and lower FID compared to competitors, indicating better semantic alignment and visual quality.
  • User Studies: Feedback from studies demonstrates a preference for DreamMat-generated materials in terms of overall quality, text fidelity, and realistic rendering capabilities.
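For reference, the CLIP score used above is, in its common CLIPScore form, a rescaled cosine similarity between an image embedding of the rendered result and the prompt embedding; a minimal sketch, assuming precomputed embeddings and the rescaling weight `w = 2.5` from the CLIPScore paper:

```python
import numpy as np

def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore-style metric (sketch): rescaled cosine similarity
    between image and text embeddings, clipped at zero."""
    cos = np.dot(image_emb, text_emb) / (
        np.linalg.norm(image_emb) * np.linalg.norm(text_emb)
    )
    return w * max(cos, 0.0)
```

FID, by contrast, compares the Gaussian statistics of feature distributions of rendered and reference image sets, so lower values indicate distributions closer to realistic imagery.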

Limitations and Future Work

Despite its advancements, DreamMat has limitations in handling materials with complex physical interactions like transparency and subsurface scattering. Additionally, the computational cost of distillation presents challenges for real-time applications. Future work may explore optimizing indirect lighting effects and reducing computational overhead for broader applicability.

Conclusion

DreamMat represents a significant step forward in the automated generation of high-quality, realistic PBR materials using diffusion models. By integrating geometry and light awareness into the distillation process, DreamMat advances the state of the art in computer graphics, making it a valuable tool for applications in gaming, film, and virtual reality production.
