Multi-Channel MOSRA: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and a Teacher Model

Published 21 Sep 2023 in eess.AS and cs.SD (arXiv:2309.11976v2)

Abstract: Previous methods for predicting room acoustic parameters and speech quality metrics have focused on the single-channel case, where room acoustics and Mean Opinion Score (MOS) are predicted for a single recording device. However, quality-based device selection in rooms with multiple recording devices may benefit from a multi-channel approach in which these descriptive metrics are predicted for multiple devices in parallel. Following our hypothesis that a model may benefit from multi-channel training, we develop a multi-channel model for joint MOS and room acoustics prediction (MOSRA) that handles five channels in parallel. Because multi-channel audio data with ground-truth labels is lacking, we created simulated data using an acoustic simulator. Room acoustic labels were extracted from the generated impulse responses, and MOS labels were generated in a student-teacher setup, with a wav2vec2-based MOS prediction model as the teacher. Our experiments show that the multi-channel model improves the prediction of the direct-to-reverberant ratio (DRR), clarity, and speech transmission index (STI) over the single-channel model at roughly one fifth of the computation, while suffering minimal losses in the performance of the other metrics.
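The abstract states that room acoustic labels were extracted from the simulated impulse responses. As a minimal sketch of what such extraction can look like, the snippet below computes clarity (C50) and the direct-to-reverberant ratio (DRR) from one channel's room impulse response. It is not the authors' code, and several choices are assumptions: the function names are hypothetical, the direct-sound onset is taken to be the largest-magnitude sample, and the 2.5 ms half-width of the direct-path window for DRR is one convention among several that appear in the literature.

```python
import math
from typing import Sequence


def _energy(h: Sequence[float], start: int, stop: int) -> float:
    """Sum of squared samples in h[start:stop]."""
    return sum(x * x for x in h[start:stop])


def _onset(h: Sequence[float]) -> int:
    """Assumption: the direct sound arrives at the largest-magnitude sample."""
    return max(range(len(h)), key=lambda i: abs(h[i]))


def _ratio_db(num: float, den: float) -> float:
    """10*log10(num/den); returns +inf when the denominator energy is zero."""
    return math.inf if den == 0.0 else 10.0 * math.log10(num / den)


def clarity_db(h: Sequence[float], fs: int, t_ms: float = 50.0) -> float:
    """Early-to-late energy ratio C_t in dB (C50 by default), measured from the onset."""
    onset = _onset(h)
    split = onset + int(round(t_ms * 1e-3 * fs))
    return _ratio_db(_energy(h, onset, split), _energy(h, split, len(h)))


def drr_db(h: Sequence[float], fs: int, window_ms: float = 2.5) -> float:
    """Direct-to-reverberant ratio in dB.

    Energy within +/- window_ms of the onset (assumed window) counts as
    direct sound; all remaining energy counts as reverberant.
    """
    onset = _onset(h)
    half = int(round(window_ms * 1e-3 * fs))
    lo, hi = max(0, onset - half), onset + half + 1
    direct = _energy(h, lo, hi)
    reverberant = _energy(h, 0, lo) + _energy(h, hi, len(h))
    return _ratio_db(direct, reverberant)


if __name__ == "__main__":
    # Toy response at fs = 16 kHz: a unit direct impulse followed by an exponential tail.
    fs = 16000
    h = [1.0] + [0.0] * 39 + [0.3 * math.exp(-n / 2000.0) for n in range(8000)]
    print(f"C50 = {clarity_db(h, fs):.2f} dB, DRR = {drr_db(h, fs):.2f} dB")
```

Passing `t_ms=80.0` yields C80, the variant usually reported for music. For a multi-channel label set, the same functions would simply be applied to each microphone's impulse response.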
