
$E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation

Published 29 May 2024 in cs.CV | (2405.19203v2)

Abstract: This paper introduces 3D Gaussians for efficient, expressive, and editable digital avatar generation. The task faces two major challenges: (1) the unstructured nature of 3D Gaussians makes them incompatible with current generation pipelines; (2) expressive animation of 3D Gaussians in a generative setting, which involves training with multiple subjects, remains unexplored. We propose a novel avatar generation method, $E^{3}$Gen, to address these challenges. First, we propose a generative UV features plane representation that encodes unstructured 3D Gaussians onto a structured 2D UV space defined by the SMPL-X parametric model. This representation not only preserves the representation ability of the original 3D Gaussians but also introduces a shared structure among subjects, enabling generative learning with a diffusion model. To tackle the second challenge, we propose a part-aware deformation module that achieves robust and accurate full-body expressive pose control. Extensive experiments demonstrate that our method achieves superior performance in avatar generation and enables expressive full-body pose control and editing. Our project page is https://olivia23333.github.io/E3Gen.
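
To make the idea of a UV features plane concrete, here is a minimal sketch of how per-Gaussian attributes could be read from a structured 2D feature map. It is not the paper's implementation: the function names `sample_uv_plane` and `decode_gaussians`, the 14-channel layout (offset, scale, rotation, opacity, color), and the activation choices are all assumptions made for illustration. The abstract states only that 3D Gaussians are encoded onto a UV space defined by SMPL-X.

```python
import numpy as np


def sample_uv_plane(plane, uv):
    """Bilinearly sample a (C, H, W) feature plane at uv coords in [0, 1]^2.

    Returns an (N, C) array with one feature vector per query point.
    """
    _, height, width = plane.shape
    # Map uv in [0, 1] to continuous pixel coordinates.
    x = np.clip(uv[:, 0], 0.0, 1.0) * (width - 1)
    y = np.clip(uv[:, 1], 0.0, 1.0) * (height - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, width - 1)
    y1 = np.minimum(y0 + 1, height - 1)
    wx = x - x0
    wy = y - y0
    # Blend the four neighbouring texels; each indexed term has shape (C, N).
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x1] * wx
    bottom = plane[:, y1, x0] * (1 - wx) + plane[:, y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).T


def decode_gaussians(features, base_positions):
    """Split (N, 14) features into Gaussian attributes.

    The channel layout below is hypothetical, chosen only for this sketch.
    base_positions are (N, 3) surface points, e.g. on a template body mesh.
    """
    offset = features[:, 0:3]
    scale = np.exp(features[:, 3:6])  # keep scales positive
    quat = features[:, 6:10]
    quat = quat / (np.linalg.norm(quat, axis=1, keepdims=True) + 1e-8)
    opacity = 1.0 / (1.0 + np.exp(-features[:, 10]))  # sigmoid to (0, 1)
    color = 1.0 / (1.0 + np.exp(-features[:, 11:14]))
    return {
        "position": base_positions + offset,
        "scale": scale,
        "rotation": quat,
        "opacity": opacity,
        "color": color,
    }
```

Because every subject's Gaussians are indexed by the same UV coordinates in this sketch, the planes of different subjects share one image-like layout, which is the kind of structure a 2D diffusion model can be trained on.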

