ASR4REAL: An extended benchmark for speech models

Published 16 Oct 2021 in eess.AS, cs.AI, cs.CL, cs.LG, and cs.SD | (2110.08583v1)

Abstract: Popular ASR benchmarks such as Librispeech and Switchboard are limited in the diversity of settings and speakers they represent. We introduce a set of benchmarks matching real-life conditions, aimed at spotting possible biases and weaknesses in models. We find that even though recent models do not seem to exhibit a gender bias, they usually show substantial performance discrepancies by accent, and even larger ones depending on the socio-economic status of the speakers. Finally, all tested models show a strong performance drop when tested on conversational speech, and in this specific context even an LLM trained on a dataset as big as Common Crawl does not seem to have a significant positive effect, which reiterates the importance of developing conversational LLMs.