LM2D: Lyrics- and Music-Driven Dance Synthesis
Abstract: Dance typically involves professional choreography with complex movements that follow a musical rhythm and can also be influenced by lyrical content. Integrating lyrics alongside the auditory dimension enriches the foundational tone and makes motion generation more responsive to semantic meaning. However, existing dance synthesis methods tend to model motion conditioned on audio signals alone. In this work, we make two contributions to bridge this gap. First, we propose LM2D, a novel probabilistic architecture that combines a multimodal diffusion model with consistency distillation, designed to create dance conditioned on both music and lyrics in a single diffusion generation step. Second, we introduce the first 3D dance-motion dataset that encompasses both music and lyrics, obtained with pose estimation technologies. We evaluate our model against music-only baselines with objective metrics and human evaluations involving dancers and choreographers. The results demonstrate that LM2D produces realistic and diverse dance matching both lyrics and music. A video summary can be accessed at: https://youtu.be/4XCgvYookvA.