
Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models

Published 31 Mar 2024 in eess.AS, cs.LG, and cs.SD (arXiv:2404.07226v1)

Abstract: The Fearless Steps APOLLO Community Resource provides unparalleled opportunities to explore the potential of multi-speaker team communications from NASA Apollo missions. This study focuses on discovering the characteristics that make Apollo recordings more or less intelligible to Automatic Speech Recognition (ASR) methods. For each audio recording, we extract interpretable metadata derived from the recording itself (signal-to-noise ratio, spectral flatness, presence of pauses, sentence duration), from the transcript (number of words spoken, speaking rate), or known a priori (speaker). We identify subgroups of audio recordings based on combinations of these metadata and compute each subgroup's performance (e.g., Word Error Rate) and the difference in performance ("divergence") w.r.t. the overall population. We then apply the Whisper model in different sizes, trained on English-only or multilingual datasets, in zero-shot or after fine-tuning. We conduct several analyses to (i) automatically identify and describe the most problematic subgroups for a given model, (ii) examine the impact of fine-tuning w.r.t. zero-shot at the subgroup level, (iii) understand the effect of model size on subgroup performance, and (iv) analyze whether multilingual models are more sensitive than monolingual ones to subgroup performance disparities. The insights enhance our understanding of subgroup-specific performance variations, paving the way for advancements in optimizing ASR systems for Earth-to-space communications.
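The subgroup divergence idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions. Metadata are taken to be already discretized into categorical values (e.g., low/high SNR). Each recording is scored with its own word error rate. Subgroups are enumerated exhaustively over attribute combinations up to a hypothetical `max_len`, and only those above a hypothetical `min_support` fraction of recordings are kept. Divergence is defined here as the subgroup's mean WER minus the population's mean WER. The function names `word_error_rate` and `subgroup_divergence`, the thresholds, and the toy transcripts are all hypothetical. Exhaustive enumeration is used only for simplicity and does not scale to many attributes.

```python
from itertools import combinations
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


def subgroup_divergence(records, min_support=0.1, max_len=2):
    """Rank subgroups by WER divergence from the whole population (hypothetical helper).

    records: list of (metadata, wer) pairs, where metadata maps an attribute
    name to an already-discretized value, e.g. {"snr": "low", "rate": "fast"}.
    Returns the population mean WER and a list of
    (pattern, support, divergence) tuples, most problematic first.
    """
    n = len(records)
    overall = mean(w for _, w in records)
    attributes = sorted({a for meta, _ in records for a in meta})
    results = []
    for k in range(1, max_len + 1):
        for attrs in combinations(attributes, k):
            # Group recordings by the values they take on this attribute set.
            groups = {}
            for meta, w in records:
                if all(a in meta for a in attrs):
                    key = tuple(meta[a] for a in attrs)
                    groups.setdefault(key, []).append(w)
            for values, wers in groups.items():
                support = len(wers) / n
                if support >= min_support:  # skip subgroups that are too small
                    pattern = tuple(zip(attrs, values))
                    results.append((pattern, support, mean(wers) - overall))
    # Positive divergence means the subgroup's WER is above the population average.
    results.sort(key=lambda r: r[2], reverse=True)
    return overall, results


if __name__ == "__main__":
    # Hypothetical toy data: (discretized metadata, reference, hypothesis) per recording.
    toy = [
        ({"snr": "low", "rate": "fast"}, "roger go for landing", "roger for"),
        ({"snr": "low", "rate": "slow"}, "the eagle has landed", "the eagle as landed"),
        ({"snr": "high", "rate": "fast"}, "we copy you down eagle", "we copy you down eagle"),
        ({"snr": "high", "rate": "slow"}, "tranquility base here", "tranquility base here"),
    ]
    records = [(meta, word_error_rate(ref, hyp)) for meta, ref, hyp in toy]
    overall, ranked = subgroup_divergence(records, min_support=0.25)
    print(f"population WER: {overall:.3f}")
    for pattern, support, div in ranked[:3]:
        print(pattern, f"support={support:.2f}", f"divergence={div:+.3f}")
```

With this toy data, the subgroup combining a fast speaking rate and low SNR ranks first, because its WER (0.5) is well above the population mean (0.1875). Under the same assumptions, comparing two models (e.g., zero-shot versus fine-tuned) at the subgroup level would amount to running the ranking on each model's per-recording WERs and differencing the divergences of matching patterns.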
