End-to-end controllable song generation with multi-condition inputs
Establish an end-to-end controllable song generation approach that jointly conditions on textual style descriptions, lyrics, and reference audio to guide the music synthesis process.
References
"Furthermore, end-to-end controllable song generation jointly guided by style descriptions, lyrics, and reference audio remains an open challenge."
— HeartMuLa: A Family of Open Sourced Music Foundation Models
(2601.10547 - Yang et al., 15 Jan 2026) in Section 1 (Introduction)