
SingIt! Singer Voice Transformation

Published 7 May 2024 in eess.AS and cs.SD (arXiv:2405.04627v1)

Abstract: In this paper, we propose a model that can generate a singing voice from a normal speech utterance by harnessing zero-shot, many-to-many style transfer learning. Our goal is to give anyone the opportunity to sing any song in a timely manner. We present a system comprising several readily available blocks together with a modified auto-encoder, and show how this highly complex challenge can be met by tailoring rather simple solutions together. We demonstrate the applicability of the proposed system with a group of 25 non-expert listeners. Samples of the data generated by our model are provided.
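The abstract does not detail the architecture, but the zero-shot, many-to-many style transfer it describes is commonly realized with an AutoVC-style bottleneck autoencoder: a content encoder squeezes speaker identity out of the input mel-spectrogram through a narrow bottleneck, and a decoder reconstructs the spectrogram conditioned on an embedding of the target voice. The sketch below illustrates that general idea only; the class name, layer choices, and all dimensions (`bottleneck=32`, `spk_dim=256`) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class StyleTransferAE(nn.Module):
    """AutoVC-style autoencoder sketch (hypothetical, not the paper's model):
    a narrow content bottleneck discards speaker identity; the decoder
    reconstructs a mel-spectrogram conditioned on a target-voice embedding."""

    def __init__(self, n_mels=80, bottleneck=32, spk_dim=256, hidden=512):
        super().__init__()
        # Content encoder: mel frames -> low-dimensional content codes.
        self.encoder = nn.LSTM(n_mels, bottleneck, num_layers=2,
                               batch_first=True, bidirectional=True)
        # Decoder: content codes + broadcast speaker embedding -> mel frames.
        self.decoder = nn.LSTM(2 * bottleneck + spk_dim, hidden,
                               num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, n_mels)

    def forward(self, mel, spk_emb):
        # mel: (batch, frames, n_mels); spk_emb: (batch, spk_dim)
        content, _ = self.encoder(mel)
        spk = spk_emb.unsqueeze(1).expand(-1, mel.size(1), -1)
        out, _ = self.decoder(torch.cat([content, spk], dim=-1))
        return self.proj(out)

# Zero-shot usage: encode content from the source speech and condition the
# decoder on an embedding of an unseen target singer (e.g., produced by a
# pre-trained speaker-verification model).
model = StyleTransferAE()
mel = torch.randn(1, 200, 80)       # source speech mel-spectrogram (dummy)
target_emb = torch.randn(1, 256)    # target-voice embedding (dummy)
converted = model(mel, target_emb)  # -> (1, 200, 80)
```

A full speech-to-singing system would additionally need to separate the vocal from the backing track and to render the predicted mel-spectrogram back into a waveform with a vocoder; this sketch covers only the conversion step.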

