Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Published 18 Dec 2023 in cs.CV, cs.AI, and cs.GR (arXiv:2312.11360v2)

Abstract: We present Paint-it, a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting the Score-Distillation Sampling (SDS). We observe that directly applying SDS yields undesirable texture quality due to its noisy gradients. We reveal the importance of texture parameterization when using SDS. Specifically, we propose Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization. We show that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS. In experiments, Paint-it obtains remarkable quality PBR texture maps within 15 min., given only a text description. We demonstrate the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine. Project page: https://kim-youwang.github.io/paint-it

Summary

  • The paper introduces a novel method that converts text descriptions into high-fidelity 3D textures using deep convolutional optimization and physically-based rendering.
  • It employs Score-Distillation Sampling with U-Net kernels to progressively refine texture details from low to high frequency.
  • Experimental results demonstrate enhanced texture coherence and lower FID scores compared to traditional synthesis methods.

"Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering"

Introduction to Paint-it

The paper "Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering" describes a novel method for synthesizing high-fidelity 3D textures through text-driven guidance. The core contribution of this work lies in leveraging Deep Convolutional Physically-Based Rendering (DC-PBR) for parameterizing texture maps, thereby enhancing the quality and realism of the synthesized textures.

Method Overview

Paint-it operates by transforming text descriptions into physically-based rendering (PBR) texture maps for 3D meshes. The process begins with an untextured 3D mesh and a textual description of the desired appearance. The system employs a deep convolutional model to re-parameterize the PBR texture maps, optimizing them using a Score-Distillation Sampling (SDS) process. The use of U-Net convolutional kernels enhances the optimization by prioritizing low-frequency textures initially and gradually adapting to high-frequency details (see Figure 1).

Figure 1: Paint-it's pipeline illustrating DC-PBR as an intermediary layer that enhances texture realism through convolutional re-parameterization.
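To make the re-parameterization idea concrete, the following is a minimal PyTorch-style sketch: the PBR maps are produced by a small, randomly initialized convolutional generator from a fixed noise input, and only the generator's weights are optimized. The tiny network, channel layout, and guidance-loss placeholder are illustrative assumptions, not the paper's actual U-Net architecture or rendering pipeline.

```python
# Minimal sketch of convolutional texture re-parameterization (illustrative only).
# Instead of optimizing texture pixels directly, PBR maps are the output of a
# randomly initialized conv net; only the network weights receive gradients.
import torch
import torch.nn as nn

class TinyConvGenerator(nn.Module):
    """Stand-in for the paper's U-Net: maps fixed noise to PBR texture maps."""
    def __init__(self, out_channels=8):  # 3 albedo + 3 normal + 1 roughness + 1 metalness
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(16, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_channels, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z)

def dummy_guidance_loss(albedo, normal, rough_metal):
    # Placeholder for differentiable rendering + SDS guidance from a diffusion model.
    return (albedo.mean() - 0.5) ** 2 + normal.var() + rough_metal.mean()

generator = TinyConvGenerator()
z = torch.randn(1, 16, 512, 512)                      # fixed input, never optimized
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(100):                               # synthesis-through-optimization loop
    maps = generator(z)                               # UV-space PBR maps
    albedo, normal, rough_metal = maps[:, :3], maps[:, 3:6], maps[:, 6:]
    loss = dummy_guidance_loss(albedo, normal, rough_metal)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The property DC-PBR exploits is that such convolutional generators fit low-frequency content before high-frequency content, which acts as an implicit optimization curriculum and filters out noisy SDS gradients.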

Score-Distillation Sampling

Score-Distillation Sampling is pivotal for ensuring that the synthesized texture, as seen through rendered images, aligns with the user-provided text description. By adding noise to rendered images and querying a pre-trained text-conditional noise estimator, SDS refines the 3D representation to best match the textual prompt. This supports the generation of textures with intricate material properties such as reflectance and surface normals.
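A common way to implement SDS, sketched below, is to build a surrogate loss whose gradient with respect to the rendered image equals the SDS gradient estimate. The generic frozen denoiser `eps_model(noisy, t, text_emb)`, the schedule tensor `alphas_cumprod`, and the weighting `w(t) = 1 - alpha_bar_t` are assumptions for illustration, not Paint-it's exact implementation.

```python
# Illustrative SDS surrogate loss (generic denoiser assumed; not Paint-it's exact code).
import torch

def sds_loss(rendered, text_emb, eps_model, alphas_cumprod):
    """rendered: (B, C, H, W) differentiable render of the textured mesh.
    Returns a scalar whose gradient w.r.t. `rendered` is the SDS gradient estimate."""
    B = rendered.shape[0]
    t = torch.randint(20, 980, (B,), device=rendered.device)      # random diffusion step
    a_bar = alphas_cumprod[t].view(B, 1, 1, 1)
    noise = torch.randn_like(rendered)
    noisy = a_bar.sqrt() * rendered + (1 - a_bar).sqrt() * noise  # forward diffusion
    with torch.no_grad():
        eps_pred = eps_model(noisy, t, text_emb)                  # frozen, text-conditioned
    w = 1.0 - a_bar                                               # one common weighting choice
    grad = w * (eps_pred - noise)                                 # SDS gradient estimate
    # "Detach trick": backprop of this loss sends exactly `grad` into `rendered`.
    return (grad.detach() * rendered).sum() / B
```

In Paint-it, this gradient flows through the differentiable renderer into the convolutional kernels of DC-PBR rather than into raw texture pixels.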

Practical Applications

The practical utilities of Paint-it are manifold, extending to industries such as gaming and cinematic production where realistic 3D assets are imperative. Paint-it's ability to synthesize diverse texture maps offers significant flexibility in applications requiring dynamic relighting and material property adjustments, integrating into existing graphics engines like Blender (see Figure 2).

Figure 2: Practical applications of Paint-it in managing dynamic lighting and material properties using PBR texture maps.

Comparative Advantages

Paint-it excels in producing vivid, consistent, and realistic textures compared to contemporaneous methodologies, specifically those relying on color-projected 3D textures. Techniques such as pixel-based optimization or mesh-based re-texturing often fall short in quality or require substantial post-processing. Paint-it circumvents these issues by employing a global gradient update mechanism, improving coherence across surfaces and reducing artifacts like texture seams (see Figure 3).

Figure 3: Comparison of PBR map decomposition, illustrating Paint-it's cleaner separation of material properties versus Fantasia3D.

Experimental Insights

Experiments conducted on standard datasets like Objaverse demonstrated Paint-it's superiority in generating realistic textures with lower Fréchet Inception Distance (FID) scores and higher user-study ratings than alternative methods. The fidelity in texture synthesis is further reinforced by ablation studies emphasizing the significance of convolutional re-parameterization.
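For reference, FID between renderings of synthesized textures and a set of reference images can be computed with an off-the-shelf metric implementation. The sketch below assumes the torchmetrics package and uses random placeholder images purely to show the call pattern; it is not the paper's evaluation protocol.

```python
# FID between two image sets using torchmetrics (placeholder data, illustrative only).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)          # InceptionV3 pooled features
real_images = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)
fid.update(real_images, real=True)                    # e.g. reference renderings
fid.update(fake_images, real=False)                   # e.g. renders of synthesized textures
print(float(fid.compute()))                           # lower is better
```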

Conclusion

Paint-it represents a substantive advance in text-to-texture synthesis, merging deep learning techniques with physically-based rendering to facilitate the creation of high-quality, text-driven 3D textures. Although the method provides a robust solution for texture synthesis, its optimization latency points to potential areas for future research, such as more efficient loss functions or pre-trained models to expedite the synthesis process. The approach lays a foundation for future developments in AI-assisted graphics design, pushing toward automated generation of sophisticated digital assets.
