Efficient infusion of self-supervised representations in Automatic Speech Recognition

Published 19 Apr 2024 in cs.CL (arXiv:2404.12628v1)

Abstract: Self-supervised learning (SSL) models such as wav2vec and HuBERT yield state-of-the-art results on speech-related tasks. Given the effectiveness of such models, it is advantageous to use them in conventional ASR systems. While some approaches suggest incorporating these models as a trainable encoder or a learnable frontend, training such systems is extremely slow and computationally expensive. In this work, we propose two simple approaches that use (1) framewise addition and (2) cross-attention mechanisms to efficiently incorporate the representations from the SSL model(s) into the ASR architecture, resulting in models that are comparable in size to standard encoder-decoder Conformer systems while also avoiding the use of SSL models during training. Our approach results in faster training and yields significant performance gains on the LibriSpeech and TED-LIUM datasets compared to baselines. We further provide detailed analysis and ablation studies that demonstrate the effectiveness of our approach.
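The two fusion mechanisms named in the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the authors' implementation): `framewise_add` projects precomputed SSL features to the encoder dimension, aligns the SSL frame rate to the encoder frame rate by nearest-frame selection, and adds framewise; `cross_attention_fuse` lets each encoder frame attend over the SSL frames and adds the attended context residually. Projection matrices in the cross-attention variant are omitted (treated as identity, so SSL and encoder dimensions are assumed equal) to keep the sketch small.

```python
import math

def framewise_add(enc, ssl, proj):
    """Framewise-addition fusion: project each SSL frame to the encoder
    dimension with `proj` (d_ssl x d_enc), pick the nearest SSL frame for
    each encoder frame (the SSL sequence is usually shorter), then add."""
    d_enc = len(enc[0])
    ssl_p = [[sum(f[i] * proj[i][j] for i in range(len(f)))
              for j in range(d_enc)] for f in ssl]
    T, S = len(enc), len(ssl_p)
    out = []
    for t in range(T):
        s = round(t * (S - 1) / max(T - 1, 1))  # nearest SSL frame index
        out.append([e + p for e, p in zip(enc[t], ssl_p[s])])
    return out

def cross_attention_fuse(enc, ssl):
    """Single-head cross-attention fusion: encoder frames are queries,
    SSL frames are keys and values; the attended context is added back
    residually. Assumes enc and ssl share the same feature dimension."""
    d = len(enc[0])
    out = []
    for q in enc:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in ssl]
        m = max(scores)                      # for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        ctx = [sum(w[s] * ssl[s][j] for s in range(len(ssl))) / z
               for j in range(d)]
        out.append([qj + cj for qj, cj in zip(q, ctx)])
    return out
```

Because the SSL features are computed once offline, neither mechanism requires running the (large) SSL model during training, which is the source of the claimed speedup over trainable-encoder approaches.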

References (18)
  1. wav2vec 2.0: A framework for self-supervised learning of speech representations, 2020.
  2. Efficient Conformer: Progressive downsampling and grouped attention for automatic speech recognition, 2021.
  3. BERT: Pre-training of deep bidirectional transformers for language understanding, 2019.
  4. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pages 369–376, New York, NY, USA, 2006. Association for Computing Machinery. ISBN 1595933832. doi: 10.1145/1143844.1143891.
  5. Conformer: Convolution-augmented transformer for speech recognition, 2020.
  6. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units, 2021.
  7. Cross-modal distillation with audio-text fusion for fine-grained emotion classification using BERT and wav2vec 2.0. Neurocomputing, 506:168–183, 2022. doi: 10.1016/j.neucom.2022.07.035.
  8. Joint CTC-attention based end-to-end speech recognition using multi-task learning, 2017.
  9. Accent-robust automatic speech recognition using supervised and unsupervised wav2vec embeddings, 2021.
  10. IIITH-CSTD corpus: Crowdsourced strategies for the collection of a large-scale Telugu speech corpus. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(7):1–26, 2023.
  11. LibriSpeech: An ASR corpus based on public domain audio books. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5206–5210. IEEE, 2015.
  12. TED-LIUM: An automatic speech recognition dedicated corpus. In Conference on Language Resources and Evaluation (LREC), pages 125–129, 2012.
  13. Long-range acoustic detection and localization of blue whale calls in the northeast Pacific Ocean. The Journal of the Acoustical Society of America, 104(6):3616–3625, 1998. doi: 10.1121/1.423944.
  14. Attention is all you need, 2023.
  15. ESPnet: End-to-end speech processing toolkit. In Proceedings of Interspeech, pages 2207–2211, 2018. doi: 10.21437/Interspeech.2018-1456.
  16. LEAF: A learnable frontend for audio classification, 2021.
  17. Multi-level fusion of wav2vec 2.0 and BERT for multimodal emotion recognition, 2022.
  18. Incorporating BERT into neural machine translation, 2020.
