Incorporating joint-level hand pose conditioning into video diffusion models
Establish an effective method for incorporating joint-level hand pose conditioning into video diffusion models to enable precise, dexterous hand–object interactions in egocentric settings.
References
As a result, it remains an open question how to effectively incorporate joint-level hand pose conditioning into video diffusion models.
— Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control
(2602.18422 - Xie et al., 20 Feb 2026) in Section 1: Introduction