
THCHS-30 : A Free Chinese Speech Corpus

Published 7 Dec 2015 in cs.CL and cs.SD | (1512.01882v2)

Abstract: Speech data is crucially important for speech recognition research. There are quite some speech databases that can be purchased at prices that are reasonable for most research institutes. However, for young people who just start research activities or those who just gain initial interest in this direction, the cost for data is still an annoying barrier. We support the 'free data' movement in speech recognition: research institutes (particularly supported by public funds) publish their data freely so that new researchers can obtain sufficient data to kick off their career. In this paper, we follow this trend and release a free Chinese speech database THCHS-30 that can be used to build a full-fledged Chinese speech recognition system. We report the baseline system established with this database, including the performance under highly noisy conditions.

Citations (223)

Summary

  • The paper introduces THCHS-30, a free Chinese speech corpus with over 30 hours of data recorded from 50 speakers to support ASR research.
  • The authors detail a baseline HMM-DNN system using Kaldi that achieves a CER of 30.11% and a PER of 14.81%, with a denoising auto-encoder boosting performance in noisy conditions.
  • The release of THCHS-30 democratizes ASR research by providing comprehensive resources, enhancing reproducibility, and advancing noise-robust modeling techniques.

Overview of "THCHS-30: A Free Chinese Speech Corpus"

The paper "THCHS-30: A Free Chinese Speech Corpus" by Dong Wang and Xuewei Zhang addresses a significant barrier in the field of Automatic Speech Recognition (ASR): access to large, high-quality speech datasets. The authors contribute to the 'free data' movement by releasing THCHS-30, a Chinese speech database designed to facilitate ASR research and innovation, particularly for resource-limited researchers and institutions.

Contribution Highlights

THCHS-30 is one of the first freely available Chinese corpora intended to support the construction of a complete Chinese ASR system. It comprises over 30 hours of speech data recorded from 50 participants. Accompanied by supporting resources such as lexica, language models (LMs), and training recipes, THCHS-30 offers a complete toolkit for building a large vocabulary continuous speech recognition system. The corpus emphasizes diversity in phone coverage, which makes it useful for developing and testing robust speech recognition models.

Baseline System and Results

The authors present a baseline ASR system developed using the THCHS-30 corpus with the Kaldi toolkit, employing a hidden Markov model-deep neural network (HMM-DNN) architecture. The initial results indicate a character error rate (CER) of 30.11% and a phone error rate (PER) of 14.81% on clean test data. However, performance significantly deteriorated under noisy conditions, a common challenge in ASR systems. Importantly, the application of a denoising auto-encoder (DAE) improved recognition accuracy under noise, showcasing a practical solution to the noise-related challenges in ASR.
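For readers unfamiliar with the metrics, both CER and PER are conventionally computed as the edit (Levenshtein) distance between the reference and the recognized token sequence, divided by the reference length; the only difference is whether the tokens are characters or phones. The following is a minimal sketch of that convention, not the scoring script used in the paper. The function names `edit_distance` and `error_rate` and the example sequences are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length.
    Pass lists of characters for CER, lists of phones for PER."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one character deleted out of six.
cer = error_rate(list("今天天气很好"), list("今天天气好"))

# Hypothetical example: one phone substituted out of four.
per = error_rate(["j", "in", "t", "ian"], ["j", "in", "d", "ian"])
```

Because insertions count as errors, this rate can exceed 100% when the hypothesis is much longer than the reference.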

Implications and Future Directions

This paper's release of THCHS-30 offers a pivotal resource for stimulating Chinese speech recognition research and lowers the entry barrier for researchers lacking substantial funding. The availability of a free corpus with comprehensive resources is expected to enhance reproducibility and comparability across research outputs, facilitating more objective evaluations of different models and techniques.

Furthermore, the application of the DAE reflects an emerging direction in noise-robust ASR that merits further exploration. The promising results from this approach suggest that continued advances in noise-reduction methods could substantially improve the practical deployment of ASR systems across diverse environments.
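A denoising auto-encoder is typically trained on pairs of noisy and clean inputs, so some way of producing noisy versions of clean speech is needed. This summary does not describe how the authors built their noisy data; a common approach, sketched below purely as an assumption, is to scale a noise signal so that the mixture reaches a target signal-to-noise ratio (SNR) in decibels. The function name `mix_at_snr` and the toy signals are hypothetical.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal at a target SNR (in dB).

    Assumes both inputs are equal-length lists of samples. The noise is
    scaled so that 10 * log10(P_clean / P_scaled_noise) == snr_db.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    # Solve P_clean / (scale^2 * P_noise) = 10^(snr_db / 10) for scale.
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(clean, noise)]


# Toy signals: at 0 dB the scaled noise has the same power as the speech.
clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = mix_at_snr(clean, noise, snr_db=0)
```

Each `(noisy, clean)` pair produced this way could then serve as one input/target example for training a denoising model.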

Speculation on Future Developments

Looking forward, resources like THCHS-30 may encourage work that applies newer machine learning techniques, such as more advanced neural architectures or unsupervised learning, to improve model performance. There could also be a push towards data collection and annotation methods that reduce the cost of creating large-scale datasets. Building on DAEs, future work may integrate more sophisticated noise-specific modeling or adversarial approaches to improve robustness across real-world conditions.

Overall, THCHS-30 advances the democratization of resources crucial to the field of ASR and stands as a testament to a shared commitment to collaborative progress in speech technology research.
