Multi-Channel MOSRA: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and a Teacher Model
Abstract: Previous methods for predicting room acoustic parameters and speech quality metrics have focused on the single-channel case, where room acoustics and Mean Opinion Score (MOS) are predicted for a single recording device. However, quality-based device selection for rooms with multiple recording devices may benefit from a multi-channel approach in which the descriptive metrics are predicted for multiple devices in parallel. Following our hypothesis that a model may benefit from multi-channel training, we develop a multi-channel model for joint MOS and room acoustics prediction (MOSRA) that processes five channels in parallel. Because multi-channel audio data with ground-truth labels is lacking, we create simulated data using an acoustic simulator: room acoustic labels are extracted from the generated impulse responses, and MOS labels are generated in a student-teacher setup using a wav2vec2-based MOS prediction model. Our experiments show that, compared with the single-channel model, the multi-channel model improves the prediction of the direct-to-reverberation ratio, clarity, and speech transmission index while requiring roughly 5$\times$ less computation, at the cost of minimal performance losses on the other metrics.
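To make the idea of extracting room acoustic labels from impulse responses concrete, the following is a minimal sketch rather than the authors' pipeline. It computes a clarity value and a direct-to-reverberation ratio (DRR) from a single impulse response using the usual energy-ratio definitions. Several things here are assumptions not stated in the abstract: the 50 ms early/late boundary, the 2.5 ms half-window around the direct path, treating the strongest tap as the direct arrival, and the hypothetical helpers `clarity_and_drr` and `toy_rir`. The toy response is only a stand-in for the output of an acoustic simulator.

```python
import math
import random


def clarity_and_drr(rir, fs, early_ms=50.0, direct_ms=2.5):
    """Return (clarity, DRR) in dB for one impulse response (list of floats)."""
    energy = [x * x for x in rir]
    # Assumption: the strongest tap marks the direct-path arrival.
    peak = max(range(len(energy)), key=energy.__getitem__)

    # Clarity: energy up to early_ms after the direct path vs. energy after it.
    early_end = peak + int(round(early_ms * 1e-3 * fs))
    early = sum(energy[:early_end])
    late = sum(energy[early_end:])

    # DRR: energy in a short window around the direct path vs. everything else.
    half = int(round(direct_ms * 1e-3 * fs))
    lo, hi = max(0, peak - half), peak + half + 1
    direct = sum(energy[lo:hi])
    reverb = sum(energy[:lo]) + sum(energy[hi:])

    eps = 1e-12  # guards against log(0) and division by zero
    clarity = 10.0 * math.log10((early + eps) / (late + eps))
    drr = 10.0 * math.log10((direct + eps) / (reverb + eps))
    return clarity, drr


def toy_rir(fs, rt60, length_s=1.0, delay_s=0.005, seed=0):
    """Hypothetical stand-in for a simulated RIR: a unit direct tap
    followed by exponentially decaying Gaussian noise."""
    rng = random.Random(seed)
    n = int(length_s * fs)
    d = int(delay_s * fs)
    # Amplitude time constant chosen so energy drops 60 dB after rt60 seconds.
    tau = rt60 / (3.0 * math.log(10.0))
    rir = [0.0] * n
    rir[d] = 1.0
    for i in range(d + 1, n):
        t = (i - d) / fs
        rir[i] = 0.1 * rng.gauss(0.0, 1.0) * math.exp(-t / tau)
    return rir


if __name__ == "__main__":
    fs = 16000
    # Five toy "devices" with increasing reverberation time.
    for ch, rt60 in enumerate([0.2, 0.35, 0.5, 0.65, 0.8]):
        c, d = clarity_and_drr(toy_rir(fs, rt60, seed=ch), fs)
        print(f"channel {ch}: rt60={rt60:.2f}s clarity={c:.1f} dB DRR={d:.1f} dB")
```

In this toy setup, channels with longer reverberation time should yield lower clarity and DRR, which is the kind of per-device signal that quality-based device selection could rank on.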