
Decoding the Underlying Meaning of Multimodal Hateful Memes

Published 28 May 2023 in cs.CL, cs.AI, and cs.CV | arXiv:2305.17678v2

Abstract: Recent studies have proposed models that yield promising performance on the hateful meme classification task. Nevertheless, these models do not generate interpretable explanations that uncover the underlying meaning and support the classification output. A major reason for the lack of explainable hateful meme methods is the absence of a hateful meme dataset containing ground truth explanations for benchmarking or training. Intuitively, such explanations can educate and assist content moderators in interpreting and removing flagged hateful memes. This paper addresses this research gap by introducing the Hateful meme with Reasons Dataset (HatReD), a new multimodal hateful meme dataset annotated with the underlying hateful contextual reasons. We also define a new conditional generation task that aims to automatically generate the underlying reasons to explain hateful memes, and we establish baseline performance of state-of-the-art pre-trained language models on this task. We further demonstrate the usefulness of HatReD by analyzing the challenges of the new conditional generation task in explaining memes in seen and unseen domains. The dataset and benchmark models are available at: https://github.com/Social-AI-Studio/HatRed


Summary

  • The paper introduces HatReD, a multimodal dataset with annotated explanations to decode hateful memes.
  • It defines a conditional generation task in which PLMs such as T5 and VisualBERT are trained to generate contextual explanations of the hateful meaning.
  • Experimental results reveal that text-based models currently outperform vision-language counterparts, indicating key avenues for future research.

Decoding the Underlying Meaning of Multimodal Hateful Memes

The paper "Decoding the Underlying Meaning of Multimodal Hateful Memes" (arXiv:2305.17678) extends the hateful meme classification task by emphasizing interpretable explanations. It observes that the absence of datasets providing ground truth explanations hampers the development of models that can elucidate the reasoning behind their classifications. The paper fills this gap by introducing the Hateful meme with Reasons Dataset (HatReD), a multimodal dataset that includes not only hateful meme content but also annotated explanations of each meme's underlying hateful message.

Introduction and Motivation

The emergence of hateful memes on social media platforms poses significant challenges for content moderation. While automated hate detection models have progressed, many lack interpretability, leaving the rationale behind a hate classification unexplained. Addressing this disconnect, the paper presents HatReD, a dataset tailored to decoding the intrinsic hateful context of memes, thus allowing moderators to better understand and verify automated hate classifications.

Dataset Construction

The HatReD dataset comprises annotated multimodal memes covering five targets of hate: sex, race, religion, nationality, and disability. Each meme is paired with an explanation that elucidates its hateful context, often rooted in socio-cultural knowledge, which distinguishes HatReD from existing datasets (Figure 1).

Figure 1: Example of a hateful meme in HatReD.

Hateful Meme Explanation Task

The paper proposes a conditional generation task whose objective is to generate text explaining why a meme is hateful. The task benchmarks pre-trained language models (PLMs), such as T5 and VisualBERT, trained to parse multimodal inputs (text and images) and produce coherent hate explanations (Figure 2).
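The paper's exact preprocessing is not reproduced here, but a text-only PLM such as T5 can only consume a meme's visual content once it has been verbalized (e.g., as a caption and detected entities). A minimal sketch of this flattening step, where the field names and separator are illustrative assumptions rather than the paper's actual template:

```python
def build_model_input(meme_text: str, caption: str, entities: list[str]) -> str:
    """Flatten multimodal meme signals into one text sequence for a
    sequence-to-sequence PLM. Field names/separator are hypothetical."""
    parts = [f"meme text: {meme_text}", f"image caption: {caption}"]
    if entities:
        parts.append("entities: " + ", ".join(entities))
    return " ; ".join(parts)

# The resulting string would be tokenized and fed to the encoder,
# with the annotated reason as the decoder's target sequence.
example = build_model_input("example overlay text", "a photo of a crowd", ["crowd"])
```

This kind of verbalization is what lets a purely textual model compete with vision-language models on the task.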

Figure 2: Input saliency of the T5 model on an example meme from the qualitative analysis.

Evaluation Metrics and Experimentation

The models are evaluated with automatic metrics (BLEU, ROUGE-L, and BERTScore) and with human judgments of the fluency and relevance of the generated explanations, establishing foundational benchmarks for future research.
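To illustrate how one of these metrics scores a generated reason against the annotated gold reason, here is a minimal, dependency-free sketch of ROUGE-L F1 (whitespace tokenization is a simplifying assumption; an established implementation should be used for real evaluations):

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between a generated explanation and the gold reason."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Identical explanation and gold reason score 1.0:
# rouge_l("the meme mocks a group", "the meme mocks a group")  -> 1.0
```

Because ROUGE-L rewards subsequence overlap rather than exact wording, it tolerates paraphrase better than BLEU's n-gram matching, which is why both are reported alongside the embedding-based BERTScore.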

The experiments show that text-only models such as T5 outperform their vision-language counterparts on relevance, suggesting limitations in how current models integrate visual features with text.

Challenges and Future Work

The results reveal challenges in accurately generating contextual explanations, highlighting the need to address hallucination and improve multimodal comprehension. Future work aims to refine the HatReD dataset by expanding its domains and integrating retrieval augmentation to supplement PLM reasoning.

Conclusion

"Decoding the Underlying Meaning of Multimodal Hateful Memes" introduces a novel dataset and task critical for moderating hate speech in online media. The HatReD dataset and benchmarks foster opportunities to develop more interpretable and robust AI systems capable of contextually understanding and explaining hateful memes, ultimately assisting in the creation of safer online environments.
