- The paper introduces Pose2Mesh, a graph convolutional network that directly regresses 3D human poses and meshes from 2D poses.
- It employs a coarse-to-fine cascaded architecture that integrates PoseNet for 3D pose estimation and MeshNet for detailed mesh recovery.
- Results demonstrate reduced MPJPE and improved mesh quality on benchmarks like Human3.6M and 3DPW, underscoring its robustness in real-world scenarios.
Insights into Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery
The paper "Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose" presents a method to address challenges in 3D human pose and mesh estimation using a model-free graph convolutional approach. This research introduces Pose2Mesh, a novel system leveraging Graph Convolutional Networks (GraphCNN) to directly regress 3D coordinates of human mesh vertices from 2D human poses, overcoming limitations inherent in image-based methods.
Methodological Advancements
Pose2Mesh proposes a departure from traditional model-based or image-based approaches by focusing on the direct recovery of 3D human poses and meshes from 2D inputs. The method capitalizes on:
- Domain Homogeneity: By using 2D poses as input, Pose2Mesh mitigates the appearance domain gap between controlled (e.g., laboratory) and in-the-wild environments. This approach leverages the consistent geometric properties of 2D poses across domains.
- Representation Simplicity: Traditional methods often grapple with complex rotation representations like SMPL parameters. Pose2Mesh avoids these challenges by directly estimating 3D vertex coordinates using GraphCNN.
- Coarse-to-Fine Structure: The graph neural network in Pose2Mesh is designed to upsample mesh information in a coarse-to-fine manner, allowing for efficient computation and enhanced precision.
- Cascaded Architecture: The system comprises two modules. PoseNet lifts the input 2D pose to a root-relative 3D pose, and MeshNet then regresses the 3D mesh vertex coordinates, taking both the 2D and the lifted 3D pose as input.
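The cascade above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: `posenet_lift` is a stand-in for the trained PoseNet, and `graph_conv` is one vanilla graph-convolution layer over the skeleton/mesh graph (the paper uses deeper Chebyshev-filtered GraphCNN layers with learned weights).

```python
# Toy sketch of the Pose2Mesh cascade (illustrative only): PoseNet lifts a
# 2D pose to 3D, then a graph convolution propagates features over the graph.

def matmul(a, b):
    """Plain list-of-lists matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def posenet_lift(pose2d, depth=1.0):
    """Stand-in for PoseNet: attach a constant depth to each 2D joint.
    The real PoseNet is a trained network regressing root-relative depth."""
    return [[x, y, depth] for x, y in pose2d]

def graph_conv(adj, feats, weight):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W), where A_hat
    is the row-normalized adjacency with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    a_hat = [[v / sum(row) for v in row] for row in a_hat]
    h = matmul(matmul(a_hat, feats), weight)
    return [[max(0.0, v) for v in row] for row in h]

# Tiny example: a 3-joint chain skeleton (0-1-2) and identity weights.
pose2d = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
pose3d = posenet_lift(pose2d)                      # coarse 3D pose
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]            # skeleton connectivity
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
refined = graph_conv(adj, pose3d, identity)        # neighbor-averaged features
```

In the actual network, stacks of such layers operate on progressively upsampled mesh graphs, which is what gives the coarse-to-fine behavior described above.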
Pose2Mesh demonstrates superior performance compared to previous state-of-the-art methods across several benchmarks, including Human3.6M and 3DPW, achieving reduced MPJPE and enhanced mesh quality. Notably, Pose2Mesh excels in 3DPW, an in-the-wild dataset, showcasing robustness to real-world conditions without reliance on additional visual data.
The experimental results underscore the method's ability to deliver competitive accuracy with fewer parameters and a lower computational load. For instance, Pose2Mesh achieves an MPJPE of 64.9 mm on Human3.6M, outperforming several contemporary methods trained on more extensive datasets.
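For reference, MPJPE (Mean Per Joint Position Error) is simply the average Euclidean distance between predicted and ground-truth 3D joints, typically reported in millimetres after root alignment. A minimal version, with made-up joint coordinates:

```python
import math

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: average Euclidean distance between
    predicted and ground-truth joints, in the input's units (here mm)."""
    assert len(pred) == len(gt)
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

# Hypothetical 3-joint example (millimetres); joints assumed root-aligned.
pred = [[0.0, 0.0, 0.0], [100.0, 0.0, 0.0], [0.0, 100.0, 0.0]]
gt   = [[0.0, 0.0, 0.0], [130.0, 40.0, 0.0], [0.0, 100.0, 0.0]]
error = mpjpe(pred, gt)  # (0 + 50 + 0) / 3 mm
```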
Implications and Future Directions
The implications of Pose2Mesh are manifold. Practically, the technique reduces dependence on paired image-to-3D datasets, lowering the barrier to training high-quality models. Theoretically, it signals a shift toward leveraging more abstract inputs, such as 2D poses, for complex 3D reconstruction.
Moving forward, Pose2Mesh could inspire enhancements incorporating additional human body markers or segmentation data to further refine shape recovery. The coherent integration of diverse data points, including texture or environmental factors, might lead to even more accurate and detailed human mesh reconstructions, paving the way for applications in animation, AR/VR environments, and human-computer interaction interfaces.
In essence, Pose2Mesh marks a significant stride toward simplifying and strengthening 3D mesh and pose recovery, advancing both scientific understanding and practical deployment in AI-driven human modeling tasks.