Leveraging LLMs for Scalable Non-intrusive Speech Quality Assessment

Published 8 Aug 2025 in eess.AS | (2508.06284v1)

Abstract: Non-intrusive speech quality assessment (SQA) systems suffer from limited training data and costly human annotations, hindering their generalization to real-time conferencing calls. In this work, we propose leveraging LLMs as pseudo-raters for speech quality to address these data bottlenecks. We construct LibriAugmented, a dataset consisting of 101,129 speech clips with simulated degradations labeled by a fine-tuned auditory LLM (Vicuna-7b-v1.5). We compare three training strategies: using human-labeled data, using LLM-labeled data, and a two-stage approach (pretraining on LLM labels, then fine-tuning on human labels), using both DNSMOS Pro and DeePMOS. We test on several datasets across languages and quality degradations. While LLM-labeled training yields mixed results compared to human-labeled training, we provide empirical evidence that the two-stage approach improves the generalization performance (e.g., DNSMOS Pro achieves 0.63 vs. 0.55 PCC on NISQA_TEST_LIVETALK and 0.73 vs. 0.65 PCC on Tencent with reverb). Our findings demonstrate the potential of using LLMs as scalable pseudo-raters for speech quality assessment, offering a cost-effective solution to the data limitation problem.
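The PCC figures quoted in the abstract are Pearson correlation coefficients between a model's predicted quality scores and the human ratings. As a minimal sketch of how such a number is computed (the function name `pearson_cc` and the example scores are hypothetical and not taken from the paper):

```python
import math


def pearson_cc(predicted, reference):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_r)


if __name__ == "__main__":
    # Hypothetical model predictions and human MOS labels on a 1-5 scale.
    model_scores = [3.1, 4.2, 2.0, 4.8, 3.6]
    human_mos = [3.0, 4.5, 2.4, 4.6, 3.2]
    print(f"PCC = {pearson_cc(model_scores, human_mos):.3f}")
```

A value closer to 1 means the model ranks and spaces clips more like the human raters do, which is the sense in which 0.63 is an improvement over 0.55.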
