Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study

Published 6 Jul 2025 in eess.AS (arXiv:2507.04368v1)

Abstract: Advanced long-context modeling backbone networks, such as the Transformer, Conformer, and Mamba, have demonstrated state-of-the-art performance in speech enhancement. However, a systematic and comprehensive comparative study of these backbones within a unified speech enhancement framework remains lacking. In addition, xLSTM, a more recent and efficient variant of the LSTM, has shown promising results in language modeling and as a general-purpose vision backbone. In this paper, we investigate the capability of xLSTM for speech enhancement and conduct a comprehensive comparison and analysis of the Transformer, Conformer, Mamba, and xLSTM backbones within a unified framework, considering both causal and noncausal configurations. Overall, xLSTM and Mamba achieve better performance than the Transformer and Conformer. Mamba offers significantly better training and inference efficiency, particularly for long speech inputs, whereas xLSTM is the slowest of the four backbones.
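
To make the idea of a "unified framework with interchangeable backbones" concrete, here is a minimal PyTorch sketch. It is not the authors' code: the SpectralMaskEnhancer module and its parameter names are hypothetical, and a standard nn.TransformerEncoder stands in for whichever backbone (Conformer, Mamba, or xLSTM) would occupy the same (batch, time, features) slot.

```python
import torch
import torch.nn as nn

class SpectralMaskEnhancer(nn.Module):
    """Hypothetical enhancer: predicts a time-frequency mask from noisy spectra."""

    def __init__(self, num_freq_bins: int, d_model: int, backbone: nn.Module):
        super().__init__()
        self.encode = nn.Linear(num_freq_bins, d_model)  # project spectral frames
        self.backbone = backbone                         # pluggable long-context model
        self.decode = nn.Linear(d_model, num_freq_bins)  # back to frequency bins

    def forward(self, noisy_mag: torch.Tensor) -> torch.Tensor:
        # noisy_mag: (batch, time, freq) magnitude spectrogram
        h = self.backbone(self.encode(noisy_mag))
        mask = torch.sigmoid(self.decode(h))             # bounded [0, 1] mask
        return mask * noisy_mag                          # enhanced magnitude

# One concrete backbone; Conformer/Mamba/xLSTM blocks would plug in the same way.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=4,
)
model = SpectralMaskEnhancer(num_freq_bins=257, d_model=256, backbone=backbone)
enhanced = model(torch.randn(2, 100, 257))  # batch of 2, 100 frames, 257 bins
```

In a causal configuration, the Transformer variant would additionally take a causal attention mask, while recurrent and state-space backbones such as xLSTM and Mamba are causal by construction unless run bidirectionally.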
