Concurrent Validity of Automatic Speech and Pause Measures During Passage Reading in ALS

Published 22 Aug 2022 in cs.SD and eess.AS | (2208.10597v1)

Abstract: The analysis of speech measures in individuals with amyotrophic lateral sclerosis (ALS) can provide essential information for early diagnosis and tracking disease progression. However, current methods for extracting speech and pause features are manual or semi-automatic, which makes them time-consuming and labour-intensive. The advent of speech-text alignment algorithms provides an opportunity for inexpensive, automated, and accurate analysis of speech measures in individuals with ALS. There is a need to validate speech and pause features calculated by these algorithms against current gold standard methods. In this study, we extracted 8 speech/pause features from 646 audio files of individuals with ALS and healthy controls performing passage reading. Two pretrained forced alignment models - one using transformers and another using a Gaussian mixture / hidden Markov architecture - were used for automatic feature extraction. The results were then validated against semi-automatic speech/pause analysis software, with further subgroup analyses based on audio quality and disease severity. Features extracted using transformer-based forced alignment had the highest agreement with gold standards, including in terms of audio quality and disease severity. This study lays the groundwork for future intelligent diagnostic support systems for clinicians, and for novel methods of tracking disease progression remotely from home.

Abstract PDF Upgrade to Chat

Citations (1)

View on Semantic Scholar

Summary

The paper demonstrates that transformer-based Wav2Vec2 reliably extracts speech and pause features, showing high concordance with manual assessments.
It compares Gaussian mixture/HMM-based approaches with Wav2Vec2, revealing that audio quality significantly impacts feature extraction accuracy.
The study highlights the potential for automated speech assessment in ALS to enable efficient remote monitoring and clinical intervention.

Concurrent Validity of Automatic Speech and Pause Measures During Passage Reading in ALS

Introduction

The study "Concurrent Validity of Automatic Speech and Pause Measures During Passage Reading in ALS" explores the deployment of automatic feature extraction techniques in evaluating speech and pause metrics in individuals suffering from Amyotrophic Lateral Sclerosis (ALS). Given the neurodegenerative nature of ALS and its profound impact on speech, this research addresses the inefficiency of current manual or semi-automatic speech assessment tools. The study leverages advanced speech-text alignment algorithms employing transformer-based models, presented as a potential automation to replace laborious conventional methods.

Methodology

The study was conducted on 646 audio samples from ALS patients and control subjects. The analysis focused on two computational models: a Gaussian mixture/hidden Markov model (GMM/HMM) as implemented in the Montreal Forced Aligner (MFA), and a transformer-based Wav2Vec2 model. Both models executed forced alignment for speech and pause feature extraction. A total of eight features were extracted, including pause duration, speech duration, and total duration. Validation was performed against the semi-automated Speech and Pause Analysis (SPA) software, assessing correlation across various subsets of audio quality and ALS severity.

Results

The transformer-based Wav2Vec2 model demonstrated superior performance in feature extraction accuracy compared to the gold-standard SPA software. Notably, the agreement of features like pause and speech duration with SPA outputs was robust across quality variants of audio samples, with the strongest correlations observed in high-quality recordings. This is evident in the Bland–Altman plots which articulate the close agreement of these parameters within acceptable limits (Figure 1).

Figure 1: The Bland-Altman plots between Wav2Vec2 and SPA features of pause duration, speech duration, and total duration on Good audio quality data. The dashed lines indicate the limits of agreement.

Discussion

These findings suggest a promising shift towards automated assessment using transformer-based models, particularly the Wav2Vec2, as they offer consistent performance regardless of disease severity. However, the sensitivity to audio quality suggests a need for higher quality data acquisition in clinical settings. The adaptation to utilize high-performing models like transformers highlights a path forward for remote and efficient ALS symptom tracking, marking significant potential for intervention and management strategies in home environments.

The study acknowledges limitations due to variance in audio sample quality and suggests further research into adaptive algorithms that can mitigate these variations. Future work could expand beyond ALS to encompass other neurodegenerative diseases affecting speech, such as Parkinson's disease and post-stroke conditions, further enhancing the generalizability of the findings.

Conclusion

This research underscores the viability of transformer-based automatic speech alignment in extracting reliable speech measures in ALS patients. The Wav2Vec2 model, in particular, demonstrates high validity against established methods, emphasizing the critical need for precise audio input. This advancement could ultimately revolutionize the landscape of clinical assessment tools, facilitating more accessible diagnostic and progression-monitoring methodologies for speech impairments arising from neurodegenerative disorders.

Markdown Report Issue