
SKED: Sketch-guided Text-based 3D Editing

Published 19 Mar 2023 in cs.CV and cs.GR (arXiv:2303.10735v4)

Abstract: Text-to-image diffusion models are gradually introduced into computer graphics, recently enabling the development of Text-to-3D pipelines in an open domain. However, for interactive editing purposes, local manipulations of content through a simplistic textual interface can be arduous. Incorporating user guided sketches with Text-to-image pipelines offers users more intuitive control. Still, as state-of-the-art Text-to-3D pipelines rely on optimizing Neural Radiance Fields (NeRF) through gradients from arbitrary rendering views, conditioning on sketches is not straightforward. In this paper, we present SKED, a technique for editing 3D shapes represented by NeRFs. Our technique utilizes as few as two guiding sketches from different views to alter an existing neural field. The edited region respects the prompt semantics through a pre-trained diffusion model. To ensure the generated output adheres to the provided sketches, we propose novel loss functions to generate the desired edits while preserving the density and radiance of the base instance. We demonstrate the effectiveness of our proposed method through several qualitative and quantitative experiments. https://sked-paper.github.io/

Citations (55)

Summary

  • The paper presents a novel method using two perspective sketches to precisely guide the editing of 3D shapes represented by neural radiance fields.
  • The approach integrates text prompts with a pre-trained diffusion model while enforcing custom loss functions to maintain semantic integrity.
  • Experimental results show enhanced precision and control in interactive 3D editing, highlighting its potential in computer graphics and virtual reality applications.

The paper "SKED: Sketch-guided Text-based 3D Editing" addresses the challenges of interactive 3D shape editing through a novel approach that combines user-guided sketches with text-based pipelines grounded in diffusion models. Traditional Text-to-3D pipelines, which translate textual descriptions into 3D shapes using Neural Radiance Fields (NeRF), fall short when the precise, localized modifications essential for interactive design are needed.

Core Contribution:

SKED introduces a technique where minimal user input—specifically, two guiding sketches from different perspectives—can be used to precisely manipulate a neural field representation of a 3D shape. The principal innovation lies in the integration of these sketches with a pre-trained text-to-image diffusion model to guide the editing of 3D shapes while maintaining their underlying structure and semantic integrity.

Methodology:

  1. Sketch Integration: The system leverages as few as two sketches from distinct views to guide the modification process.
  2. Diffusion Model Utilization: A pre-trained diffusion model ensures that the semantics derived from text prompts are respected throughout the editing process.
  3. Loss Functions: The authors propose novel loss functions tailored to enforce adherence to the user's sketches while preserving the original density and radiance characteristics of the NeRF.
    • These loss functions balance fidelity to the sketches against the consistency of the rendered images across views, ensuring semantically meaningful and visually coherent edits.
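To make the role of the preservation term concrete, here is an illustrative sketch only, not the authors' implementation: a toy objective that penalizes density changes outside the sketch-defined edit region while passing a diffusion-guidance term through unchanged. The function names, the binary edit mask, and the weighting scheme are all assumptions for illustration.

```python
def preservation_loss(base_density, edited_density, edit_mask):
    """Penalize density changes at sample points outside the edited region.

    base_density / edited_density: per-point density samples from the
    original and edited neural fields; edit_mask[i] is True inside the
    region covered by the user's sketches (where changes are allowed).
    """
    total, count = 0.0, 0
    for base, edited, editable in zip(base_density, edited_density, edit_mask):
        if not editable:  # only constrain points the sketches leave untouched
            total += (base - edited) ** 2
            count += 1
    return total / max(count, 1)  # mean squared change outside the edit region


def total_edit_loss(guidance_term, base_density, edited_density, edit_mask,
                    lambda_preserve=1.0):
    """Combine a diffusion-guidance term with the preservation penalty."""
    return guidance_term + lambda_preserve * preservation_loss(
        base_density, edited_density, edit_mask)
```

In this toy form, raising `lambda_preserve` trades edit freedom for fidelity to the base instance; the actual method's losses operate on NeRF density and radiance fields rather than flat sample lists.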

Experimental Validation:

The paper validates the effectiveness of SKED through both qualitative and quantitative assessments:

  • Qualitative Results: Visual examples showcase the technique's ability to make detailed and precise edits on various 3D models, demonstrating a high degree of control offered by sketch-guided modifications.
  • Quantitative Metrics: Numerical evaluations emphasize the accuracy and consistency of the model in generating the edited 3D shapes.

Impact and Applications:

SKED opens new possibilities for intuitive, interactive 3D content creation, letting users perform complex edits with simple sketches. This has substantial implications for fields such as computer graphics, game development, and virtual reality.

By addressing the limitations of traditional textual interfaces in 3D editing, SKED represents a significant advancement in user-guided, interactive manipulation of neural radiance fields, bridging the gap between textual and visual input methods.
