Image Sculpting: Precise Object Editing with 3D Geometry Control
Abstract: We present Image Sculpting, a new framework for editing 2D images by incorporating tools from 3D geometry and graphics. This approach differs markedly from existing methods, which are confined to 2D spaces and typically rely on textual instructions, leading to ambiguity and limited control. Image Sculpting converts 2D objects into 3D, enabling direct interaction with their 3D geometry. After editing, these objects are re-rendered into 2D and merged into the original image, and a coarse-to-fine enhancement process produces high-fidelity results. The framework supports precise, quantifiable, and physically plausible editing options such as pose editing, rotation, translation, 3D composition, carving, and serial addition. It marks an initial step towards combining the creative freedom of generative models with the precision of graphics pipelines.
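To make "precise, quantifiable" editing concrete, the following minimal sketch shows what a rotation or translation looks like once an object is available as 3D geometry: an exact angle and offset applied to mesh vertices. It is an illustration only, not the paper's implementation. The function names `rotation_matrix` and `rigid_edit`, the choice to rotate about the centroid, and the toy vertex array are all assumptions made for this example.

```python
import numpy as np

def rotation_matrix(axis, angle_deg):
    """Rotation by angle_deg about a 3D axis (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    theta = np.deg2rad(angle_deg)
    x, y, z = axis
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rigid_edit(vertices, axis, angle_deg, translation=(0.0, 0.0, 0.0)):
    """Hypothetical helper: rotate vertices about their centroid, then translate."""
    vertices = np.asarray(vertices, dtype=float)
    centroid = vertices.mean(axis=0)
    R = rotation_matrix(axis, angle_deg)
    rotated = (vertices - centroid) @ R.T + centroid
    return rotated + np.asarray(translation, dtype=float)

# Toy stand-in for a reconstructed object mesh (four vertices of a tetrahedron).
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

# Rotate 30 degrees about the vertical (y) axis and shift 0.5 units along x.
edited = rigid_edit(verts, axis=(0, 1, 0), angle_deg=30, translation=(0.5, 0.0, 0.0))
```

Because the edit is a rigid transform, distances between vertices are unchanged and the centroid moves by exactly the requested offset, which is the kind of control a purely text-driven 2D edit cannot guarantee.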