
A Low-rank Matching Attention based Cross-modal Feature Fusion Method for Conversational Emotion Recognition

Published 16 Jun 2023 in cs.CV, cs.SD, and eess.AS | arXiv:2306.17799v2

Abstract: Conversational emotion recognition (CER) is an important research topic in human-computer interaction. Although recent transformer-based cross-modal fusion methods have shown promise on CER tasks, they tend to overlook crucial intra-modal and inter-modal emotional interactions or suffer from high computational complexity. To address this, we introduce a novel and lightweight cross-modal feature fusion method called the Low-Rank Matching Attention Method (LMAM). LMAM effectively captures contextual emotional semantic information in conversations while mitigating the quadratic complexity of the self-attention mechanism. Specifically, by setting a matching weight and calculating inter-modal feature attention scores row by row, LMAM requires only one third of the parameters of self-attention methods. We also apply low-rank decomposition to the weights to further reduce the number of parameters in LMAM. As a result, LMAM is lightweight and avoids the overfitting caused by a large number of parameters. Moreover, LMAM fully exploits the intra-modal emotional contextual information within each modality and integrates complementary emotional semantic information across modalities by computing and fusing intra-modal and inter-modal feature similarities simultaneously. Experimental results show that LMAM outperforms other popular cross-modal fusion methods while being more lightweight. In addition, LMAM can be embedded into existing state-of-the-art CER methods in a plug-and-play manner and applied to other multi-modal recognition tasks, e.g., session recommendation and humour detection, demonstrating its strong generalization ability.
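To make the core idea concrete, the following is a minimal, hypothetical PyTorch sketch of a cross-modal matching attention layer with a low-rank weight. It is not the paper's exact formulation (the paper additionally computes scores row by row with a matching weight and fuses intra-modal and inter-modal similarities jointly); it only illustrates how replacing the three Q/K/V projections of self-attention with a single low-rank factored matching weight W ≈ AB reduces the parameter count. All names (`LowRankMatchingAttention`, `rank`, the factors `A`, `B`) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LowRankMatchingAttention(nn.Module):
    """Illustrative sketch (not the paper's exact method).

    Standard self-attention learns three dim x dim projections (Q, K, V).
    Here a single matching weight W is kept and factored as W = A @ B with
    rank r << dim, so the parameter count is 2 * dim * r instead of 3 * dim^2.
    """

    def __init__(self, dim: int, rank: int):
        super().__init__()
        # Low-rank factors of the matching weight (assumed initialization).
        self.A = nn.Parameter(torch.randn(dim, rank) * 0.02)
        self.B = nn.Parameter(torch.randn(rank, dim) * 0.02)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # x_a, x_b: (batch, seq, dim) features from two modalities.
        W = self.A @ self.B                        # (dim, dim) matching weight
        scores = (x_a @ W) @ x_b.transpose(1, 2)   # (batch, seq, seq) cross-modal similarities
        attn = scores.softmax(dim=-1)              # normalize over modality-b positions
        return attn @ x_b                          # fuse modality-b features into modality-a
```

Under these assumptions, a layer with `dim=512` and `rank=16` holds 2 * 512 * 16 = 16,384 parameters versus 3 * 512^2 = 786,432 for full Q/K/V projections, which is the kind of reduction the abstract attributes to combining a single matching weight with low-rank decomposition.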
