Exploration of Adapter for Noise Robust Automatic Speech Recognition
Abstract: Adapting an automatic speech recognition (ASR) system to unseen noise environments is crucial. Integrating adapters into neural networks has emerged as a potent technique for transfer learning. This study thoroughly investigates adapter-based ASR adaptation in noisy environments. We conducted experiments using the CHiME-4 dataset. The results show that inserting the adapter into a shallow layer yields superior effectiveness, and that there is no significant difference between adapting solely within the shallow layer and adapting across all layers. Simulated data helps the system improve its performance under real noise conditions; nonetheless, for the same amount of data, real data is more effective than simulated data. Multi-condition training remains useful for adapter training. Furthermore, integrating adapters into speech enhancement-based ASR systems yields substantial improvements.
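The adapter technique the abstract refers to can be illustrated with a minimal sketch of a bottleneck adapter block: a down-projection, a nonlinearity, an up-projection, and a residual connection, as typically inserted between the layers of a pre-trained encoder. All names and dimensions below are illustrative assumptions, not details from the paper, and plain Python lists stand in for framework tensors.

```python
# Minimal sketch of a bottleneck adapter block (down-project, ReLU,
# up-project, residual). In adapter-based transfer learning, only the
# two projection matrices are trained while the backbone stays frozen.
# Dimensions and function names are illustrative.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def adapter_forward(x, W_down, W_up):
    """Map d_model -> bottleneck -> d_model, then add the residual."""
    h = relu(matvec(W_down, x))          # d_model -> bottleneck
    u = matvec(W_up, h)                  # bottleneck -> d_model
    return [xi + ui for xi, ui in zip(x, u)]
```

Because of the residual path, an adapter whose up-projection is initialized to zero leaves the layer's output unchanged, which is why such blocks can be inserted into a trained ASR model (at shallow or deep layers) without disturbing it before adaptation begins.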