GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion
Abstract: In this work, we tackle the challenging problem of denoising hand-object interactions (HOI). Given an erroneous interaction sequence, the objective is to refine the incorrect hand trajectory to remove interaction artifacts, yielding a perceptually realistic sequence. This challenge involves intricate interaction noise, including unnatural hand poses and incorrect hand-object relations, alongside the necessity for robust generalization to new interactions and diverse noise patterns. We tackle these challenges through a novel approach, GeneOH Diffusion, incorporating two key designs: an innovative contact-centric HOI representation named GeneOH and a new domain-generalizable denoising scheme. The contact-centric representation GeneOH informatively parameterizes the HOI process, facilitating enhanced generalization across various HOI scenarios. The new denoising scheme consists of a canonical denoising model, trained to project noisy data samples from a whitened noise space to a clean data manifold, and a "denoising via diffusion" strategy, which can handle input trajectories with various noise patterns by first diffusing them to align with the whitened noise space and then cleaning them via the canonical denoiser. Extensive experiments on four benchmarks with significant domain variations demonstrate the superior effectiveness of our method. GeneOH Diffusion also shows promise for various downstream applications. Project website: https://meowuu7.github.io/GeneOH-Diffusion/.
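The "denoising via diffusion" idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear noise schedule, the step index `t`, and the function names are all assumptions made for illustration, and the trained canonical denoiser is replaced by a hypothetical stand-in that projects onto a known linear subspace playing the role of the clean data manifold. It only shows the two-stage structure: first diffuse the input with Gaussian noise, then map the result back to the clean manifold.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def diffuse(x, t, alpha_bar, rng):
    """Add Gaussian noise to x at step t, so that whatever noise pattern
    the input carried is pushed toward a Gaussian (whitened) distribution."""
    eps = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar[t]) * x + np.sqrt(1.0 - alpha_bar[t]) * eps

def toy_canonical_denoiser(x_t, t, alpha_bar, basis):
    """Hypothetical stand-in for a trained denoiser: undo the signal scaling,
    then project onto the subspace spanned by the orthonormal columns of
    `basis`, which plays the role of the clean data manifold."""
    x_rescaled = x_t / np.sqrt(alpha_bar[t])
    return x_rescaled @ basis @ basis.T

def denoise_via_diffusion(x_noisy, t, alpha_bar, denoiser, rng):
    """Two-stage scheme: diffuse the noisy input, then clean it."""
    x_t = diffuse(x_noisy, t, alpha_bar, rng)
    return denoiser(x_t, t)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    # Toy "clean manifold": a 2-D subspace of a 6-D space.
    basis = np.linalg.qr(rng.standard_normal((6, 2)))[0]
    clean = rng.standard_normal((5, 2)) @ basis.T
    noisy = clean + 0.3 * rng.standard_normal(clean.shape)
    cleaned = denoise_via_diffusion(
        noisy, 100, alpha_bar,
        lambda x, t: toy_canonical_denoiser(x, t, alpha_bar, basis), rng)
    print("off-manifold residual:",
          np.linalg.norm(cleaned - cleaned @ basis @ basis.T))
```

In this sketch the choice of `t` trades off two effects: a larger step hides more of the input's original noise pattern under Gaussian noise, but also discards more of the input signal that the denoiser has to recover.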