SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Abstract: We propose Shap-Editor, a novel feed-forward framework for instruction-guided 3D editing. Prior research on editing 3D objects has primarily relied on off-the-shelf 2D image editing networks, transferring their knowledge to individual 3D assets through a process called distillation. Distillation requires at least tens of minutes per asset to attain satisfactory editing results and is therefore impractical at scale. In contrast, we ask whether 3D editing can be carried out directly by a feed-forward network, eschewing test-time optimisation. In particular, we hypothesise that editing can be greatly simplified by first encoding 3D objects in a suitable latent space. We validate this hypothesis by building upon the latent space of Shap-E: we demonstrate that direct 3D editing in this space is both possible and efficient by constructing a feed-forward editor network that requires only approximately one second per edit. Our experiments show that Shap-Editor generalises well to both in-distribution and out-of-distribution 3D assets across a variety of prompts, achieving performance comparable to methods that perform test-time optimisation for each edited instance.
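The core idea of the abstract, editing a 3D asset by a single forward pass over its latent code rather than by per-asset optimisation, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: all dimensions, weights, and function names are hypothetical stand-ins, and the randomly initialised MLP stands in for an editor network that would in practice be trained by distilling a 2D image-editing model.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 64   # hypothetical Shap-E-style latent size
TEXT_DIM = 32     # hypothetical instruction-embedding size
HIDDEN = 128

# Randomly initialised editor weights; in the paper's setting these would be
# learned once, so that no test-time optimisation is needed per asset.
W1 = rng.standard_normal((LATENT_DIM + TEXT_DIM, HIDDEN)) * 0.05
W2 = rng.standard_normal((HIDDEN, LATENT_DIM)) * 0.05

def edit_latent(latent: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """One feed-forward pass: predict a residual edit of the 3D latent."""
    h = np.tanh(np.concatenate([latent, text_emb]) @ W1)
    # Residual update keeps the edited latent close to the source asset.
    return latent + h @ W2

source_latent = rng.standard_normal(LATENT_DIM)  # stand-in for an encoded 3D asset
instruction = rng.standard_normal(TEXT_DIM)      # stand-in for an embedded edit instruction

edited = edit_latent(source_latent, instruction)
print(edited.shape)  # (64,)
```

The residual design choice reflects the setting described in the abstract: an edit should modify the source asset rather than regenerate it from scratch, and a single matrix-multiply pipeline is what makes the roughly one-second-per-edit cost plausible, in contrast to distillation loops over many rendering and denoising steps.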
- Sine: Semantic-driven image-based nerf editing with prior-guided editing field. In CVPR, 2023.
- Text2live: Text-driven layered image and video editing. In ECCV, 2022.
- Instructpix2pix: Learning to follow image editing instructions. In CVPR, 2023.
- Attend-and-excite: Attention-based semantic guidance for text-to-image diffusion models. In SIGGRAPH, 2023.
- Neuraleditor: Editing neural radiance fields via manipulating point clouds. In CVPR, 2023a.
- Training-free layout control with cross-attention guidance. In WACV, 2023b.
- Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. arXiv preprint arXiv:2303.13873, 2023c.
- Transformers as meta-learners for implicit neural representations. In ECCV, 2022.
- Progressive3d: Progressively local editing for text-to-3d content creation with complex semantic prompts. arXiv preprint arXiv:2310.11784, 2023.
- Diffusion self-guidance for controllable image generation. In NeurIPS, 2023.
- Shapecrafter: A recursive text-conditioned 3d shape generation model. NeurIPS, 2022.
- Stylegan-nada: Clip-guided domain adaptation of image generators. In SIGGRAPH, 2021.
- An image is worth one word: Personalizing text-to-image generation using textual inversion. In ICLR, 2023.
- Get3d: A generative model of high quality 3d textured shapes learned from images. NeurIPS, 2022.
- Recolornerf: Layer decomposed radiance field for efficient color editing of 3d scenes. arXiv preprint arXiv:2301.07958, 2023.
- Blended-nerf: Zero-shot object generation and blending in existing neural radiance fields. ICCV, 2023.
- Instruct-nerf2nerf: Editing 3d scenes with instructions. In ICCV, 2023.
- Prompt-to-prompt image editing with cross-attention control. In ICLR, 2023.
- Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
- Denoising diffusion probabilistic models. NeurIPS, 33:6840–6851, 2020.
- Zero-shot text-guided object generation with dream fields. In CVPR, 2022.
- Nerfshop: Interactive editing of neural radiance fields. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 6(1), 2023.
- Shap-e: Generating conditional 3d implicit functions. arXiv preprint arXiv:2305.02463, 2023.
- Instruct 3d-to-3d: Text instruction guided 3d-to-3d conversion. arXiv preprint arXiv:2303.15780, 2023.
- Conerf: Controllable neural radiance fields. In CVPR, 2022.
- Relu fields: The little non-linearity that could. In SIGGRAPH, 2022.
- Imagic: Text-based real image editing with diffusion models. In CVPR, 2023.
- Decomposing nerf for editing via feature field distillation. NeurIPS, 35:23311–23330, 2022.
- Nerf-vae: A geometry aware 3d scene generative model. In ICML, 2021.
- Palettenerf: Palette-based appearance editing of neural radiance fields. In CVPR, 2023.
- Multi-concept customization of text-to-image diffusion. In CVPR, 2023.
- Ice-nerf: Interactive color editing of nerfs via decomposition-aware weight optimization. In ICCV, 2023.
- Tango: Text-driven photorealistic and robust 3d stylization via lighting decomposition. NeurIPS, 2022.
- Language-driven semantic segmentation. In ICLR, 2022a.
- Grounded language-image pre-training. In CVPR, 2022b.
- Focaldreamer: Text-driven 3d editing via focal-fusion assembly. arXiv preprint arXiv:2308.10608, 2023a.
- Gligen: Open-set grounded text-to-image generation. In CVPR, 2023b.
- Magic3d: High-resolution text-to-3d content creation. In CVPR, 2023.
- Editing conditional radiance fields. In ICCV, 2021.
- Image segmentation using text and image prompts. In CVPR, 2022.
- SDEdit: Guided image synthesis and editing with stochastic differential equations. In ICLR, 2022.
- Latent-nerf for shape-guided generation of 3d shapes and textures. In CVPR, 2023.
- Text2mesh: Text-driven neural stylization for meshes. In CVPR, 2022.
- Sked: Sketch-guided text-based 3d editing. In ICCV, 2023.
- Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
- Spin-nerf: Multiview segmentation and perceptual inpainting with neural radiance fields. In CVPR, 2023.
- Clip-mesh: Generating textured meshes from text using pretrained image-text models. In SIGGRAPH Asia, 2022.
- Null-text inversion for editing real images using guided diffusion models. In CVPR, 2023.
- Point-e: A system for generating 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751, 2022.
- Ed-nerf: Efficient text-guided editing of 3d scene using latent space nerf. arXiv preprint arXiv:2310.02712, 2023.
- Zero-shot image-to-image translation. In SIGGRAPH, 2023.
- Cagenerf: Cage-based neural radiance field for generalized 3d deformation and animation. NeurIPS, 2022.
- Dreamfusion: Text-to-3d using 2d diffusion. In ICLR, 2023.
- Learning transferable visual models from natural language supervision. In ICML, 2021.
- High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
- Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In CVPR, 2023.
- Clip-sculptor: Zero-shot generation of high-fidelity and diverse shapes from natural language. In CVPR, 2023.
- Vox-e: Text-guided voxel editing of 3d objects. In ICCV, 2023.
- Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis. NeurIPS, 2021.
- Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015.
- Blending-nerf: Text-driven localized editing in neural radiance fields. In ICCV, 2023.
- Denoising diffusion implicit models. In ICLR, 2021.
- Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In CVPR, 2022.
- Neural feature fusion fields: 3d distillation of self-supervised 2d image representations. In 3DV, 2022.
- Splicing vit features for semantic appearance transfer. In CVPR, 2022.
- Plug-and-play diffusion features for text-driven image-to-image translation. In CVPR, 2023.
- Proteusnerf: Fast lightweight nerf editing using 3d-aware image context. arXiv preprint arXiv:2310.09965, 2023a.
- Clip-nerf: Text-and-image driven manipulation of neural radiance fields. In CVPR, 2022.
- Nerf-art: Text-driven neural radiance fields stylization. IEEE TVCG, 2023b.
- Inpaintnerf360: Text-guided 3d inpainting on unbounded neural radiance fields. arXiv preprint arXiv:2305.15094, 2023c.
- Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. arXiv preprint arXiv:2305.16213, 2023d.
- Removing objects from neural radiance fields. In CVPR, 2023.
- Omniobject3d: Large-vocabulary 3d object dataset for realistic perception, reconstruction and generation. In CVPR, 2023.
- Desrf: Deformable stylized radiance field. In CVPR, 2023.
- Deforming radiance fields with cages. In ECCV, 2022.
- Learning object-compositional neural radiance field for editable scene rendering. In ICCV, 2021.
- Neumesh: Learning disentangled neural mesh-based implicit field for geometry and texture editing. In ECCV, 2022.
- Edit-diffnerf: Editing 3d neural radiance fields using 2d diffusion model. arXiv preprint arXiv:2306.09551, 2023.
- Nerf-editing: geometry editing of neural radiance fields. In CVPR, 2022.
- Lion: Latent point diffusion models for 3d shape generation. arXiv preprint arXiv:2210.06978, 2022.
- Text-guided generation and editing of compositional 3d avatars. arXiv preprint arXiv:2309.07125, 2023a.
- Arf: Artistic radiance fields. In ECCV, 2022.
- Magicbrush: A manually annotated dataset for instruction-guided image editing. In NeurIPS, 2023b.
- Adding conditional control to text-to-image diffusion models. In ICCV, 2023c.
- Hive: Harnessing human feedback for instructional visual editing. arXiv preprint arXiv:2303.09618, 2023d.
- Editablenerf: Editing topologically varying neural radiance fields by key points. In CVPR, 2023.
- Repaint-nerf: Nerf editing via semantic masks and diffusion models. arXiv preprint arXiv:2306.05668, 2023.
- Dreameditor: Text-driven 3d scene editing with neural fields. arXiv preprint arXiv:2306.13455, 2023.