Extending Talking Avatars to Grounded Human-Object Interaction (GHOI)
Establish a text-driven grounded human-object interaction (GHOI) capability for talking avatar video generation, so that avatars can interact with surrounding objects in ways that follow textual commands.
References
Although existing methods can generate full-body talking avatars with simple human motion, extending this task to grounded human-object interaction (GHOI) remains an open challenge, requiring the avatar to perform text-aligned interactions with surrounding objects.
— Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars
(2602.01538 - Zhang et al., 2 Feb 2026) in Abstract