Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
Abstract: We propose a novel framework for enhancing the intelligibility of electrolaryngeal (EL) speech through robust linguistic encoders. Pretraining and fine-tuning approaches have proven effective for this task, but various mismatches, such as the speech-type mismatch (EL vs. typical) or a speaker mismatch between the datasets used in each stage, can degrade the conversion performance of this framework. To resolve this issue, we propose a linguistic encoder robust enough to project both EL and typical speech into the same latent space while still extracting accurate linguistic information, creating a unified representation that reduces the speech-type mismatch. Furthermore, we introduce HuBERT output features to the proposed framework to reduce the speaker mismatch, making it possible to effectively use a large-scale parallel dataset during pretraining. Compared to the conventional framework using mel-spectrogram input and output features, the proposed framework enables the model to synthesize more intelligible and natural-sounding speech, as shown by a significant 16% improvement in character error rate and a 0.83-point improvement in naturalness score.
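The unified-representation idea above can be illustrated with a minimal NumPy sketch. All dimensions and weights here are hypothetical, not taken from the paper: frame-level features of the kind a HuBERT-style model produces (768-dim per frame) from both an EL and a typical utterance are passed through one shared encoder, so the resulting latents live in the same space regardless of speech type.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frame-level features (768-dim per frame, as in HuBERT Base);
# sequence lengths differ because the utterances differ.
el_feats = rng.standard_normal((50, 768))       # electrolaryngeal utterance
typical_feats = rng.standard_normal((80, 768))  # typical utterance

# A single shared linear "linguistic encoder": the SAME weights project
# both speech types, placing their latents in one 256-dim space.
W = rng.standard_normal((768, 256)) / np.sqrt(768)

def encode(frames: np.ndarray) -> np.ndarray:
    """Project frame features into the shared latent space."""
    return np.tanh(frames @ W)

z_el = encode(el_feats)
z_typical = encode(typical_feats)

# Both sequences now share the same latent dimensionality.
print(z_el.shape, z_typical.shape)  # (50, 256) (80, 256)
```

In the actual framework the encoder is a trained network rather than a random projection, and it is optimized so that EL and typical inputs map to matching linguistic content; the sketch only shows the weight-sharing structure that makes the representation "unified."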