
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization

Published 4 Oct 2023 in cs.CV and cs.AI | (2310.03205v2)

Abstract: We propose NeuFace, a method for pseudo-annotating 3D face meshes on videos via neural re-parameterized optimization. Despite substantial progress in 3D face reconstruction, generating reliable 3D face labels for in-the-wild dynamic videos remains challenging. Using NeuFace optimization, we annotate large-scale face videos with per-view and per-frame face meshes that are accurate and consistent; we call the result the NeuFace-dataset. Through gradient analysis, we investigate how neural re-parameterization helps reconstruct image-aligned facial details on 3D meshes. Exploiting the naturalness and diversity of the 3D faces in our dataset, we demonstrate its usefulness for two 3D face-related tasks: improving the reconstruction accuracy of an existing 3D face reconstruction model and learning a 3D facial motion prior. Code and datasets will be available at https://neuface-dataset.github.io.
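
The general idea of re-parameterized optimization can be illustrated with a toy sketch: rather than running gradient descent on a parameter vector directly, the parameters are produced by a small network and the network's weights are updated instead. The code below is not the NeuFace implementation. The least-squares loss, the linear map, and the names `fit_direct`, `fit_reparameterized`, and `feat` are all assumptions chosen only for illustration.

```python
import numpy as np


def fit_direct(A, y, steps=500, lr=0.1):
    """Gradient descent directly on the parameter vector theta."""
    theta = np.zeros(A.shape[1])
    for _ in range(steps):
        # Gradient of the mean squared residual ||A theta - y||^2 / n.
        grad_theta = 2.0 * A.T @ (A @ theta - y) / len(y)
        theta -= lr * grad_theta
    return theta


def fit_reparameterized(A, y, feat, steps=500, lr=0.1, seed=0):
    """Gradient descent on the weights of a map theta = W @ feat + b.

    `feat` is a fixed input feature (hypothetical stand-in for whatever
    a real network would consume); only W and b are updated.
    """
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((A.shape[1], feat.shape[0]))
    b = np.zeros(A.shape[1])
    for _ in range(steps):
        theta = W @ feat + b
        grad_theta = 2.0 * A.T @ (A @ theta - y) / len(y)
        # Chain rule: dL/dW = outer(dL/dtheta, feat), dL/db = dL/dtheta.
        W -= lr * np.outer(grad_theta, feat)
        b -= lr * grad_theta
    return W @ feat + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 3))
    theta_true = np.array([1.0, -2.0, 0.5])
    y = A @ theta_true            # noise-free synthetic targets
    feat = np.ones(4) / 2.0       # unit-norm feature vector
    print("direct:          ", fit_direct(A, y))
    print("re-parameterized:", fit_reparameterized(A, y, feat))
```

In this linear toy case the re-parameterized update reduces to a rescaled gradient step on `theta`, so both routines reach the same solution; with a nonlinear network the two update rules would differ.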
