A multi-modal approach for identifying schizophrenia using cross-modal attention

Published 26 Sep 2023 in eess.SP, cs.MM, cs.SD, eess.AS, and eess.IV | (2309.15136v3)

Abstract: This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio, respectively, and were then used to compute high-level coordination features that served as inputs to the audio and video branches. Context-independent text embeddings extracted from transcriptions of speech were used as input to the text branch. The multi-modal system fuses a segment-to-session-level classifier for the audio and video modalities with a text model based on a Hierarchical Attention Network (HAN) with cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in weighted average F1 score.
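The cross-modal attention mentioned in the abstract can be illustrated with a minimal sketch: one modality's segment-level features (e.g. text) form the queries, and another modality's features (e.g. the audio/video coordination features) form the keys and values of a scaled dot-product attention. The function name, shapes, and the NumPy implementation below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, context_feats):
    """Scaled dot-product attention where one modality attends over another.

    query_feats:   (Tq, d) segment features of the querying modality (e.g. text)
    context_feats: (Tk, d) segment features of the attended modality (e.g. audio/video)
    Returns a (Tq, d) context-weighted representation of the attended modality.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (Tq, Tk) similarity
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    return weights @ context_feats                       # (Tq, d) fused features

# toy example: 4 text segments attending over 6 audio segments, feature dim 8
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))
audio = rng.standard_normal((6, 8))
fused = cross_modal_attention(text, audio)
print(fused.shape)  # (4, 8)
```

In a full session-level model, the fused features would typically be concatenated with (or gated against) the querying modality's own features before the final classifier; the sketch stops at the attention step itself.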
