Accurate SMPL-X Estimation from Multi-View Videos

Develop an accurate and robust method for estimating SMPL-X parametric human body model parameters from multi-view video data that minimizes or eliminates the need for careful camera calibration and extensive parameter tuning.

Background

The paper compares UNICA against Animatable Gaussians, a state-of-the-art animatable avatar approach that relies on SMPL-X parameters estimated from videos. The authors note that the visual and motion fidelity of such parametric-model-based methods depends critically on precise SMPL-X tracking.

They explicitly state that accurate SMPL-X estimation remains an open problem even with multi-view inputs, and that it often requires meticulous camera calibration and parameter tuning. This motivates UNICA’s design, which avoids dependence on SMPL-X estimation by directly generating geometry via an action-conditioned diffusion model.

References

However, accurate SMPL-X estimation—even from multi-view videos—remains an open problem that often demands careful camera calibration and parameter tuning.

UNICA: A Unified Neural Framework for Controllable 3D Avatars (2604.02799 - Zhu et al., 3 Apr 2026) in Supplementary Material, Additional Results and Analysis, Per-Avatar Comparisons