MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
Abstract: We introduce MultiPhys, a method for recovering multi-person motion from monocular videos. Our focus lies in capturing coherent spatial placement between pairs of individuals across varying degrees of engagement. Because it is physics-aware, MultiPhys is robust to jitter and occlusions and effectively eliminates penetration between the two individuals. We devise a pipeline in which the motion estimated by a kinematics-based method is fed into a physics simulator in an autoregressive manner. We introduce distinct components that enable our model to harness the simulator's properties without compromising the accuracy of the kinematic estimates. This results in final motion estimates that are both kinematically coherent and physically compliant. Extensive evaluations on three challenging datasets characterized by substantial inter-person interaction show that our method significantly reduces errors associated with penetration and foot skating, while performing competitively with the state of the art on motion accuracy and smoothness. Results and code can be found on our project page (http://www.iri.upc.edu/people/nugrinovic/multiphys/).
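The autoregressive idea in the abstract can be illustrated with a deliberately tiny sketch. This is not the MultiPhys implementation: it assumes a 1D unit-mass point in place of a simulated humanoid, a hand-written PD controller in place of a learned policy, and a ground plane at height 0 as the only physical constraint. The names `simulate_step` and `track_autoregressive`, and the gains `kp` and `kd`, are hypothetical. It only shows how a per-frame kinematic estimate can serve as a target while the simulated state from one frame is carried into the next.

```python
from typing import List, Tuple


def simulate_step(pos: float, vel: float, target: float,
                  dt: float = 1.0 / 30.0, kp: float = 300.0,
                  kd: float = 30.0, substeps: int = 10) -> Tuple[float, float]:
    """Advance a unit-mass 1D point for one video frame.

    A PD controller pulls the point toward `target` (the kinematic
    estimate for this frame). The ground at height 0 is enforced as a
    hard constraint, standing in for contact handling in a real engine.
    """
    h = dt / substeps
    for _ in range(substeps):
        force = kp * (target - pos) - kd * vel  # PD tracking force
        vel += h * force                        # semi-implicit Euler, mass = 1
        pos += h * vel
        if pos < 0.0:                           # no ground penetration
            pos, vel = 0.0, 0.0
    return pos, vel


def track_autoregressive(kinematic: List[float]) -> List[float]:
    """Feed kinematic estimates to the toy simulator frame by frame.

    The simulated state (pos, vel) produced at frame t is the starting
    state for frame t + 1, which is what makes the loop autoregressive.
    """
    pos, vel = max(kinematic[0], 0.0), 0.0
    refined = []
    for target in kinematic:
        pos, vel = simulate_step(pos, vel, target)
        refined.append(pos)
    return refined


if __name__ == "__main__":
    # Hypothetical noisy height estimates; -0.3 would penetrate the ground.
    noisy = [0.5, 0.5, -0.3, 0.5, 0.5]
    print(track_autoregressive(noisy))
```

Because the simulated state has inertia and cannot pass through the ground, single-frame spikes in the input are damped and negative heights are never produced, which mirrors in miniature the jitter and penetration properties the abstract attributes to physics-aware estimation.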