Unreported processing speeds of prior in-the-wild speech preprocessing pipelines
Measure the processing speeds of previously proposed automatic preprocessing pipelines for in-the-wild speech data, specifically AutoPrep and WenetSpeech4TTS, whose efficiency was not reported. These measurements would allow fair comparisons of throughput and scalability with the open-source Emilia-Pipe.
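One way to make such speeds comparable is to express them relative to audio duration. A common choice is the real-time factor (RTF), defined as wall-clock processing time divided by input audio duration, or its inverse, the seconds of audio processed per wall-clock second. The sketch below is a minimal, hypothetical timing harness under stated assumptions. `process` stands in for any pipeline's per-clip entry point, and each clip is assumed to arrive with a known duration in seconds. None of these names come from AutoPrep, WenetSpeech4TTS, or Emilia-Pipe.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SpeedReport:
    audio_seconds: float  # total duration of the input audio that was processed
    wall_seconds: float   # total wall-clock time spent inside the pipeline

    @property
    def rtf(self) -> float:
        """Real-time factor: wall-clock seconds per second of audio (lower is faster)."""
        return self.wall_seconds / self.audio_seconds

    @property
    def speedup(self) -> float:
        """Seconds of audio processed per wall-clock second (the inverse of RTF)."""
        return self.audio_seconds / self.wall_seconds


def benchmark(process: Callable[[object], None],
              clips: Sequence[tuple[object, float]]) -> SpeedReport:
    """Time a hypothetical per-clip `process` callable over (clip, duration_seconds) pairs."""
    total_audio = 0.0
    start = time.perf_counter()
    for clip, duration in clips:
        process(clip)            # run the (assumed) pipeline on one clip
        total_audio += duration  # accumulate input audio duration, not output
    wall = time.perf_counter() - start
    if total_audio <= 0:
        raise ValueError("clips must have positive total duration")
    return SpeedReport(audio_seconds=total_audio, wall_seconds=wall)
```

For instance, a pipeline that spends 360 s of wall-clock time on 3600 s of audio has an RTF of 0.1, meaning it processes audio 10 times faster than real time. Such figures are only comparable when every pipeline runs on the same input set and on comparable hardware (for example, the same GPU type and the same number of workers). Scalability would additionally require repeating the measurement while varying the number of parallel workers.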
References
While previous works propose automatic preprocessing pipelines to address these issues, they rely heavily on proprietary models, making their pipelines less accessible to the broader community. Additionally, the processing speed of these pipelines remains unknown.
— Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
(2407.05361 - He et al., 2024) in Section 1 (Introduction)