iPhonMatchNet: Zero-Shot User-Defined Keyword Spotting Using Implicit Acoustic Echo Cancellation
Abstract: In response to growing interest in human–machine communication across domains, this paper introduces iPhonMatchNet, a model that addresses barge-in scenarios, in which user speech overlaps with device playback audio and creates a self-referencing problem. The proposed model leverages implicit acoustic echo cancellation (iAEC) to improve the efficiency of user-defined keyword spotting, achieving a 95% reduction in mean absolute error with a negligible increase in model size (0.13%) over the baseline model, PhonMatchNet. We also present an efficient model structure and demonstrate its capability to learn iAEC functionality without requiring a clean signal. Our findings indicate that the proposed model achieves competitive performance under real-world deployment conditions on smart devices.
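The core idea of implicit AEC, as described in the abstract, is to skip the explicit echo-subtraction step: instead of estimating and removing the playback echo from the microphone signal, the keyword-spotting encoder is conditioned on both the microphone features and the playback-reference features, so echo suppression is learned end-to-end from the spotting objective. The following is a minimal NumPy sketch of that interface, not the authors' implementation; the feature extractor, the random-weight encoder, and all signal parameters are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 1 s of audio at 16 kHz: near-end (user) speech plus an echo
# of the device's own playback signal -- the barge-in condition.
sr = 16_000
playback = rng.standard_normal(sr)        # far-end/playback reference
near_speech = rng.standard_normal(sr)     # user's keyword utterance
echo = 0.5 * np.convolve(playback, 0.1 * rng.standard_normal(64), mode="same")
mic = near_speech + echo                  # what the microphone records

def frame_features(x, frame=400, hop=160):
    """Crude log-energy feature per 25 ms frame (10 ms hop); a stand-in
    for a real log-mel front end."""
    n = 1 + (len(x) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n)[:, None]
    return np.log(np.mean(x[idx] ** 2, axis=1) + 1e-8)[:, None]

# Implicit AEC: rather than subtracting an estimated echo, stack the
# microphone and playback-reference features along the channel axis and
# let the keyword encoder condition on both. No clean-signal target is
# ever needed; in training, only the spotting loss supervises the model.
feats = np.concatenate([frame_features(mic), frame_features(playback)], axis=1)
W = rng.standard_normal((feats.shape[1], 16))
hidden = np.maximum(feats @ W, 0.0)       # one ReLU layer as a stand-in encoder
print(hidden.shape)                       # → (98, 16)
```

The design point this illustrates is why the paper's model-size overhead can stay tiny (0.13%): adding the reference channel only widens the encoder's input, rather than adding a separate echo-cancellation network in front of it.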
- “ICA-based efficient blind dereverberation and echo cancellation method for barge-in-able robot audition,” in Proc. ICASSP 2009, pp. 3677–3680.
- “Development of a robot quizmaster with auditory functions for speech-based multiparty interaction,” in Proc. IEEE/SICE International Symposium on System Integration (SII), 2014, pp. 328–333.
- “A study for improving device-directed speech detection toward frictionless human-machine interaction,” in Proc. Interspeech 2019, pp. 3342–3346.
- “Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed,” in Proc. ICASSP 2019, pp. 7310–7314.
- “Acoustic echo canceller with high speech quality,” in Proc. ICASSP 1987, vol. 12, pp. 2125–2128.
- “Study of the general Kalman filter for echo cancellation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 8, pp. 1539–1549, 2013.
- “Deep learning for joint acoustic echo and noise cancellation with nonlinear distortions,” in Proc. Interspeech 2019, pp. 4255–4259.
- “Acoustic echo cancellation with the dual-signal transformation LSTM network,” in Proc. ICASSP 2021, pp. 7138–7142.
- “Low-complexity acoustic echo cancellation with neural Kalman filtering,” in Proc. ICASSP 2023, pp. 1–5.
- “ICASSP 2021 acoustic echo cancellation challenge: Datasets, testing framework, and results,” in Proc. ICASSP 2021, pp. 151–155.
- “ICASSP 2022 acoustic echo cancellation challenge,” in Proc. ICASSP 2022, pp. 9107–9111.
- “Implicit acoustic echo cancellation for keyword spotting and device-directed speech detection,” in Proc. 2022 IEEE Spoken Language Technology Workshop (SLT), 2023, pp. 1052–1058.
- “Query-by-example keyword spotting using long short-term memory networks,” in Proc. ICASSP 2015, pp. 5236–5240.
- “DONUT: CTC-based query-by-example keyword spotting,” in NeurIPS 2018 Workshop on Interpretability and Robustness in Audio, Speech, and Language (IRASL), 2018.
- “Query-by-example keyword spotting system using multi-head attention and soft-triple loss,” in Proc. ICASSP 2021, pp. 6858–6862.
- “Learning audio-text agreement for open-vocabulary keyword spotting,” in Proc. Interspeech 2022, pp. 1871–1875.
- “PhonMatchNet: Phoneme-guided zero-shot keyword spotting for user-defined keywords,” in Proc. Interspeech 2023, pp. 3964–3968.
- “Flexible keyword spotting based on homogeneous audio-text embedding,” arXiv preprint arXiv:2308.06472, 2023.
- “Training keyword spotters with limited and synthesized speech data,” in Proc. ICASSP 2020, pp. 7474–7478.
- Pete Warden, “Speech commands: A dataset for limited-vocabulary speech recognition,” arXiv preprint arXiv:1804.03209, 2018.
- “Query-by-example on-device keyword spotting,” in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019, pp. 532–538.
- “MUSAN: A music, speech, and noise corpus,” arXiv preprint arXiv:1510.08484, 2015.
- “A scalable noisy speech dataset and online subjective test framework,” in Proc. Interspeech 2019, pp. 1816–1820.
- “Pyroomacoustics: A Python package for audio room simulation and array processing algorithms,” in Proc. ICASSP 2018, pp. 351–355.