ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields

Published 31 Jan 2024 in cs.CV, cs.AI, and cs.GR | arXiv:2401.17895v1

Abstract: We introduce the ReplaceAnything3D model (RAM3D), a novel text-guided 3D scene editing method that enables the replacement of specific objects within a scene. Given multi-view images of a scene, a text prompt describing the object to replace, and a text prompt describing the new object, our Erase-and-Replace approach can effectively swap objects in the scene with newly generated content while maintaining 3D consistency across multiple viewpoints. We demonstrate the versatility of ReplaceAnything3D by applying it to various realistic 3D scenes, showcasing results of modified foreground objects that are well-integrated with the rest of the scene without affecting its overall integrity.


Summary

  • The paper introduces RAM3D, which leverages text prompts for targeted object detection, removal, and replacement in 3D scenes.
  • It employs a multi-stage erase-and-replace method combining text-guided inpainting and neural radiance fields to maintain consistent views.
  • RAM3D demonstrates robustness across varied scene types and supports custom asset integration, marking a significant advancement in 3D scene editing.

Introduction

One of the burgeoning challenges in the 3D content creation domain is the ability to edit and manipulate 3D scenes post-reconstruction. While significant strides have been made in the field of 3D reconstruction and generation, efficient and intuitive techniques for 3D content editing lag behind. The ReplaceAnything3D model (RAM3D) situates itself as a pioneering advancement in this area. RAM3D is a novel method that leverages text guidance to identify, erase, and replace objects within 3D scenes while ensuring consistency and realism across multiple viewpoints.

Erase-and-Replace Approach

RAM3D's core methodology revolves around an Erase-and-Replace paradigm executed in several stages. First, object detection and segmentation are driven by natural language prompts through the LangSAM framework, isolating the target object to be removed. Next, a text-guided 3D inpainting technique fills in the background vacated by the erased object. The third stage applies a similar technique to generate a new object matching the provided text description. Finally, the generated object is seamlessly composited into the scene, and a neural radiance field (NeRF) is trained on the edited multi-view images, yielding a 3D scene representation that can be rendered from novel viewpoints.
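The final re-rendering step relies on standard NeRF volume rendering: a pixel color is accumulated along a ray as C = Σᵢ Tᵢ αᵢ cᵢ with αᵢ = 1 − exp(−σᵢ δᵢ) and transmittance Tᵢ = Πⱼ₍ⱼ₍ᵢ₎ (1 − αⱼ). A minimal, self-contained illustration (the densities and colors below are hand-picked numbers, not outputs of a trained network):

```python
import numpy as np

# Minimal NeRF-style volume rendering along a single ray. The two
# high-density samples in the middle act as an opaque white surface.
sigma = np.array([0.0, 0.0, 5.0, 5.0, 0.0])   # density samples along the ray
color = np.array([0.0, 0.0, 1.0, 1.0, 0.0])   # per-sample grayscale color
delta = np.full(5, 0.5)                        # spacing between samples

alpha = 1.0 - np.exp(-sigma * delta)           # per-sample opacity
trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])  # T_i
weights = trans * alpha                        # contribution of each sample
pixel = (weights * color).sum()
print(round(pixel, 3))                         # 0.993
```

Because the weights sum to less than one, light that passes through all samples contributes background color in a full renderer; here it is simply lost.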

The proposed model excels at object replacement within 3D scenes, overcoming the multi-view consistency challenges that hinder traditional 2D methods. By leveraging the strengths of image diffusion models and learned 3D scene representations, alongside HiFA's text-to-3D distillation approach, RAM3D introduces a compositional structure that significantly enhances the visual coherence of the edited scenes.
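The distillation idea can be sketched in miniature. In score distillation sampling (the mechanism behind DreamFusion and refined by HiFA), the gradient of the loss with respect to the rendered image is approximated by w(t)(ε̂ − ε), where ε̂ is a diffusion model's noise prediction on a noised rendering. The "denoiser" below is a stand-in that knows the target appearance, so the example runs without a real diffusion model; it is illustrative only, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.full(16, 0.8)   # what the text prompt "wants" the pixels to be
x = rng.random(16)          # current rendering (flattened toy pixels)

def fake_denoiser(x_noisy, alpha, target):
    # A real model predicts the noise from x_noisy and the text prompt;
    # this stand-in derives it from the known target so the loop runs.
    return (x_noisy - np.sqrt(alpha) * target) / np.sqrt(1.0 - alpha)

for step in range(200):
    a = rng.uniform(0.1, 0.9)                       # random diffusion time
    eps = rng.standard_normal(16)
    x_noisy = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps
    eps_hat = fake_denoiser(x_noisy, a, target)
    grad = eps_hat - eps                            # SDS gradient, w(t) = 1
    x -= 0.05 * grad                                # update "scene parameters"

print(np.abs(x - target).max() < 0.1)               # True
```

Note that the injected noise ε cancels exactly in the gradient here; with a real denoiser it only cancels in expectation, which is why SDS updates are averaged over many random timesteps.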

Methodology

The ReplaceAnything3D framework introduces a unique pipeline:

  1. The Erase stage employs a novel text-guided 3D inpainting technique for background restoration, optimizing parameters to implicitly represent an accurately repainted scene behind the removed element.
  2. During the Replace stage, new objects prescribed by text prompts are generated and composited over the repainted background, employing pre-trained inpainting diffusion models.
  3. The final step involves creating a modified training dataset using the edited views to train a new NeRF, thereby synthesizing the modified scene from unexplored viewpoints.
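The three stages above can be condensed into a data-flow skeleton. Every function here is a trivial numpy stand-in for the real component (LangSAM segmentation, diffusion-based inpainting, text-to-3D generation, NeRF training); only the pipeline shape is faithful to the paper:

```python
import numpy as np

def segment(view, prompt):
    # Stand-in for LangSAM: treat the brighter-than-average region as
    # the prompted object.
    return view > view.mean()

def inpaint_background(view, mask):
    # Stand-in for text-guided inpainting: fill masked pixels with the
    # mean of the unmasked background.
    out = view.copy()
    out[mask] = view[~mask].mean()
    return out

def generate_object(prompt, shape):
    # Stand-in for text-driven object generation: a constant blob.
    return np.full(shape, 0.9)

def composite(background, obj, mask):
    # Composite the new object over the repainted background.
    out = background.copy()
    out[mask] = obj[mask]
    return out

rng = np.random.default_rng(0)
views = [rng.random((8, 8)) for _ in range(3)]   # toy multi-view "images"

edited = []
for v in views:
    mask = segment(v, "old object")
    bg = inpaint_background(v, mask)
    obj = generate_object("new object", v.shape)
    edited.append(composite(bg, obj, mask))

# The edited views would then serve as training data for a fresh NeRF.
print(len(edited), edited[0].shape)              # 3 (8, 8)
```

In the actual method the Erase and Replace stages each optimize an implicit 3D representation via distillation, rather than operating on views independently, which is what provides multi-view consistency.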

Results and Contributions

In evaluating RAM3D's capabilities, the paper outlines distinct contributions. The model performs localized edits, replacing user-specified objects in high-resolution scenes. Furthermore, it can remove or add multiple objects within a 3D scene, demonstrating robustness across scene types, including both forward-facing and 360-degree captures.

Through extensive experimentation, RAM3D has exhibited quantitatively impressive results. The model showcases its prowess not only in replacing objects but also in adept object removal and addition to scenes. An innovative feature of RAM3D is the capability for users to integrate personalized assets into scenes—by fine-tuning a diffusion model with images of an object, RAM3D is able to incorporate or replace objects with custom content.
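The personalization feature can also be sketched in miniature. The paper fine-tunes an inpainting diffusion model on a few photos of a user's object (in the spirit of DreamBooth) so that a prompt can invoke that specific asset. Below, the "model" is just a learnable embedding fitted by gradient descent to a handful of subject "images" (random vectors); only the fine-tuning loop shape is illustrative, and none of this is the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
subject_images = rng.random((4, 8))   # a few views of the custom asset
token_embedding = np.zeros(8)         # learnable personalization weights

for _ in range(300):
    # Full-batch gradient of the mean squared reconstruction error.
    grad = 2.0 * (token_embedding - subject_images).mean(axis=0)
    token_embedding -= 0.05 * grad

# The embedding converges to the subject's mean appearance.
print(np.allclose(token_embedding, subject_images.mean(axis=0)))  # True
```

In the real system the fine-tuned diffusion weights then drive the same Replace-stage distillation, so the generated 3D object resembles the user's asset rather than a generic one.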

Conclusion and Future Directions

ReplaceAnything3D emerges as a significant leap forward in the arena of text-guided 3D scene editing. Its multi-stage approach provides remarkable flexibility, enabling users to perform intricate edits that were previously challenging. Looking ahead, the paper identifies opportunities for extending RAM3D to handle other scene representations, further refine editing controls, and expedite the editing process. RAM3D thus sets the stage for substantial future advancements in 3D content creation and manipulation, promising new horizons in VR/MR, gaming, and digital media.
