Papers
Topics
Authors
Recent
Search
2000 character limit reached

Improving curriculum learning for target speaker extraction with synthetic speakers

Published 1 Oct 2024 in cs.SD and eess.AS | (2410.00811v2)

Abstract: Target speaker extraction (TSE) aims to isolate individual speaker voices from complex speech environments. The effectiveness of TSE systems is often compromised when the speaker characteristics are similar to each other. Recent research has introduced curriculum learning (CL), in which TSE models are trained incrementally on speech samples of increasing complexity. In CL training, the model is first trained on samples with low speaker similarity between the target and interference speakers, and then on samples with high speaker similarity. To further improve CL, this paper uses a $k$-nearest neighbor-based voice conversion method to simulate and generate speech of diverse interference speakers, and then uses the generated data as part of the CL. Experiments demonstrate that training data based on synthetic speakers can effectively enhance the model's capabilities and significantly improve the performance of multiple TSE systems.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.