
Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture

Published 8 May 2024 in cs.MM (arXiv:2405.04963v1)

Abstract: In this paper, we address the problem of markerless multi-modal human motion capture, focusing on string performance capture, which involves inherently subtle hand-string contacts and intricate movements. To this end, we first collect a dataset, named the String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different views, audio signals, and detailed 3D motion annotations of the body, hands, instrument, and bow. To acquire these detailed motion annotations, we propose an audio-guided multi-modal motion capture framework that explicitly incorporates hand-string contacts detected from the audio signals when solving for detailed hand poses. This framework serves as a baseline for string performance capture in a completely markerless manner, imposing no external devices on performers and thereby eliminating the risk of distorting such delicate movements. We argue that the movements of performers, particularly their sound-producing gestures, carry subtle information that often eludes visual methods but can be inferred and recovered from audio cues. Consequently, we refine the vision-based motion capture results through our audio-guided approach, simultaneously clarifying the contact relationship between the performer and the instrument as deduced from the audio. We validate the proposed framework and conduct ablation studies to demonstrate its efficacy. Our results outperform current state-of-the-art vision-based algorithms, underscoring the feasibility of augmenting visual motion capture with the audio modality. To the best of our knowledge, SPD is the first dataset for musical instrument performance that covers fine-grained hand motion details in a multi-modal, large-scale collection.
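The core idea, recovering hand-string contact information from audio, can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows, under simplifying assumptions (standard cello tuning, an idealized string of assumed vibrating length, a single detected fundamental frequency from any pitch tracker), how a pitch estimate constrains which string is sounding and where it is stopped. The function and constant names are hypothetical.

```python
# Hypothetical sketch (not the authors' code): given a fundamental frequency
# detected from audio, infer which cello string is likely sounding and how far
# from the nut it is stopped. The stopped length of an ideal string scales
# inversely with frequency: L = L0 * f_open / f0.

CELLO_OPEN_STRINGS = {  # open-string fundamentals in Hz, standard C-G-D-A tuning
    "C": 65.41, "G": 98.00, "D": 146.83, "A": 220.00,
}
VIBRATING_LENGTH_M = 0.69  # assumed typical cello vibrating string length

def infer_contact(f0_hz):
    """Return (string_name, stop_distance_from_nut_m) for a detected pitch,
    or None if the pitch lies below the lowest open string.

    Chooses the highest-tuned open string not exceeding f0, i.e. the lowest
    left-hand position that could produce the note.
    """
    candidates = [(n, f) for n, f in CELLO_OPEN_STRINGS.items() if f <= f0_hz + 1e-6]
    if not candidates:
        return None
    name, f_open = max(candidates, key=lambda kv: kv[1])
    # contact point measured from the nut: L0 - L0 * (f_open / f0)
    stop = VIBRATING_LENGTH_M * (1.0 - f_open / f0_hz)
    return name, stop

print(infer_contact(220.00))  # open A string: stop distance 0.0
print(infer_contact(246.94))  # B3: A string stopped a few cm from the nut
```

In the actual framework such audio-derived contact cues are used as constraints when refining vision-based hand poses; the point of the sketch is only that pitch alone already narrows the space of plausible hand-string contacts.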

