IANS: Intelligibility-aware Null-steering Beamforming for Dual-Microphone Arrays

Published 9 Jul 2023 in eess.AS and eess.SP | arXiv:2307.04179v1

Abstract: Beamforming techniques are popular in speech-related applications due to their effective spatial filtering capabilities. Nonetheless, conventional beamforming techniques generally depend heavily on the target's direction-of-arrival (DOA), relative transfer function (RTF), or covariance matrix. This paper presents a new approach, the intelligibility-aware null-steering (IANS) beamforming framework, which uses the STOI-Net intelligibility prediction model to improve speech intelligibility without prior knowledge of the speech signal parameters mentioned above. The IANS framework combines a null-steering beamformer (NSBF), which generates a set of beamformed outputs, with STOI-Net, which selects the optimal result. Experimental results indicate that IANS can produce intelligibility-enhanced signals using a small dual-microphone array, with performance comparable to that of null-steering beamformers given knowledge of the DOAs.
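The search loop described in the abstract can be sketched as follows: steer a null toward each candidate direction with a simple delay-and-subtract dual-microphone beamformer, score every candidate output with an intelligibility predictor, and keep the highest-scoring one. This is a minimal illustration, not the paper's implementation; `score_fn` is a hypothetical stand-in for STOI-Net, whose actual interface is not shown here.

```python
import numpy as np

def frac_delay(x, fs, tau):
    """Delay a 1-D signal by tau seconds via an FFT-domain phase shift."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    return np.fft.irfft(X * np.exp(-2j * np.pi * f * tau), n=len(x))

def null_steer(x, fs, mic_dist, theta, c=343.0):
    """Delay-and-subtract null-steering beamformer for a 2-mic array.

    x: (2, T) mic signals; theta: null direction in radians (broadside = 0).
    A far-field source at `theta` reaches mic 1 first and mic 2 after
    tau = d*sin(theta)/c, so delaying mic 1 by tau and subtracting mic 2
    cancels that source.
    """
    tau = mic_dist * np.sin(theta) / c
    return frac_delay(x[0], fs, tau) - x[1]

def ians_select(x, fs, mic_dist, candidate_thetas, score_fn):
    """IANS-style search: beamform toward each candidate null direction,
    score each output with an intelligibility predictor (here a
    hypothetical `score_fn(y, fs) -> float`), and return the best one."""
    outputs = [null_steer(x, fs, mic_dist, th) for th in candidate_thetas]
    scores = [score_fn(y, fs) for y in outputs]
    best = int(np.argmax(scores))
    return outputs[best], candidate_thetas[best]
```

As a sanity check, feeding in an interferer-only scene and scoring by residual energy (a crude proxy for the learned intelligibility score) makes the search pick the null direction matching the interferer's true DOA.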
