StreamVC: Real-Time Low-Latency Voice Conversion

Published 5 Jan 2024 in eess.AS, cs.LG, and cs.SD (arXiv:2401.03078v1)

Abstract: We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech. Unlike previous approaches, StreamVC produces the resulting waveform at low latency from the input signal even on a mobile platform, making it applicable to real-time communication scenarios like calls and video conferencing, and addressing use cases such as voice anonymization in these scenarios. Our design leverages the architecture and training strategy of the SoundStream neural audio codec for lightweight high-quality speech synthesis. We demonstrate the feasibility of learning soft speech units causally, as well as the effectiveness of supplying whitened fundamental frequency information to improve pitch stability without leaking the source timbre information.
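The abstract's "whitened fundamental frequency" conditioning can be read as per-utterance normalization of the pitch contour, removing the utterance-level mean and variance of F0 (which carry speaker timbre cues) while keeping the relative contour. The sketch below illustrates that idea only; the log-domain choice, voiced-frame masking, and the `whiten_f0` name are assumptions for illustration, not the paper's exact recipe (which must also be computed causally for streaming).

```python
import numpy as np

def whiten_f0(f0_hz, eps=1e-8):
    """Sketch of per-utterance F0 whitening.

    Normalizes log-F0 over voiced frames to zero mean and unit variance,
    so absolute pitch level (a speaker timbre cue) is removed while the
    relative pitch contour is preserved. Unvoiced frames (0 Hz) stay 0.
    """
    f0 = np.asarray(f0_hz, dtype=np.float64)
    voiced = f0 > 0  # unvoiced frames are commonly marked with 0 Hz
    out = np.zeros_like(f0)
    if voiced.any():
        log_f0 = np.log(f0[voiced])
        out[voiced] = (log_f0 - log_f0.mean()) / (log_f0.std() + eps)
    return out
```

A streaming implementation could not use whole-utterance statistics as done here; it would need running (causal) estimates of the mean and variance instead.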
