ControlMat: A Controlled Generative Approach to Material Capture
Abstract: Material reconstruction from a photograph is a key component of 3D content creation democratization. We propose to formulate this ill-posed problem as a controlled synthesis one, leveraging the recent progress in generative deep networks. We present ControlMat, a method which, given a single photograph with uncontrolled illumination as input, conditions a diffusion model to generate plausible, tileable, high-resolution physically-based digital materials. We carefully analyze the behavior of diffusion models for multi-channel outputs, adapt the sampling process to fuse multi-scale information and introduce rolled diffusion to enable both tileability and patched diffusion for high-resolution outputs. Our generative approach further permits exploration of a variety of materials which could correspond to the input image, mitigating the unknown lighting conditions. We show that our approach outperforms recent inference and latent-space-optimization methods, and carefully validate our diffusion process design choices. Supplemental materials and additional details are available at: https://gvecchio.com/controlmat/.
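The abstract names rolled diffusion as the mechanism behind tileability but does not spell it out here. As a minimal sketch of one way such a roll could work, assume that the noisy image is circularly shifted by a random offset before each denoising step and shifted back afterwards, so that the image borders regularly land in the interior and are treated as continuous content. The names `rolled_step`, `toy_denoiser`, and `sample` are hypothetical, and the denoiser is a placeholder rather than the paper's network.

```python
import numpy as np


def rolled_step(x, denoise_fn, rng):
    """Run one denoising step on a randomly rolled copy of x, then undo the roll.

    x is an array whose last two axes are height and width.
    denoise_fn is any callable mapping an array to an array of the same shape.
    """
    h, w = x.shape[-2], x.shape[-1]
    # Random circular offset; assumption: drawn uniformly at every step.
    dy, dx = int(rng.integers(0, h)), int(rng.integers(0, w))
    rolled = np.roll(x, shift=(dy, dx), axis=(-2, -1))
    out = denoise_fn(rolled)
    # Undo the shift so the result stays aligned with the input.
    return np.roll(out, shift=(-dy, -dx), axis=(-2, -1))


def toy_denoiser(x):
    # Placeholder for a real diffusion network: shrink values toward zero.
    return 0.9 * x


def sample(shape, steps, seed=0):
    """Toy sampling loop: start from Gaussian noise and apply rolled steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for _ in range(steps):
        x = rolled_step(x, toy_denoiser, rng)
    return x


if __name__ == "__main__":
    result = sample((3, 16, 16), steps=10)
    print(result.shape)
```

With a purely pointwise denoiser like the placeholder above, rolling has no visible effect. It only matters once the denoiser has spatial context (convolutions, attention), since that is when the position of the seam changes what the network sees.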