Scalability of TTS-based augmentation as a substitute for real learner speech
Determine whether text-to-speech (TTS)-based mispronunciation augmentation can scale well enough to substitute for real learner speech when training Modern Standard Arabic mispronunciation detection and diagnosis systems, and use the answer to guide data collection and augmentation strategies for Arabic pronunciation assessment.
References
The IQRA 2026 challenge marks a significant milestone in Arabic pronunciation assessment, yet its results also surface open questions that the community must address to move this research toward real-world impact. Although Iqra_Extra_IS26 contains only 1,333 utterances, systems that carefully leveraged it consistently outperformed those relying on far larger synthetic corpora. This raises a fundamental question about whether TTS-based augmentation can scale as a substitute for real learner speech. It also points to the scarcity of real learner data as the primary bottleneck for future progress, which motivates investment in larger-scale human data collection.