
Decoding the Underlying Meaning of Multimodal Hateful Memes

Published 28 May 2023 in cs.CL, cs.AI, and cs.CV | arXiv:2305.17678v2

Abstract: Recent studies have proposed models that yield promising performance on the hateful meme classification task. Nevertheless, these models do not generate interpretable explanations that uncover the underlying meaning and support the classification output. A major reason for the lack of explainable hateful meme methods is the absence of a hateful meme dataset containing ground truth explanations for benchmarking or training. Intuitively, such explanations can educate and assist content moderators in interpreting and removing flagged hateful memes. This paper addresses this research gap by introducing the Hateful meme with Reasons Dataset (HatReD), a new multimodal hateful meme dataset annotated with the underlying hateful contextual reasons. We also define a new conditional generation task that aims to automatically generate the underlying reasons to explain hateful memes, and we establish baseline performance of state-of-the-art pre-trained language models on this task. We further demonstrate the usefulness of HatReD by analyzing the challenges of the new conditional generation task in explaining memes in seen and unseen domains. The dataset and benchmark models are available at: https://github.com/Social-AI-Studio/HatRed


Summary

  • The paper introduces HatReD, a multimodal dataset with annotated explanations to decode hateful memes.
  • It defines a conditional generation task in which PLMs such as T5 and VisualBERT are trained to generate contextual explanations of the hateful meaning.
  • Experimental results reveal that text-based models currently outperform vision-language counterparts, indicating key avenues for future research.

Decoding the Underlying Meaning of Multimodal Hateful Memes

The paper "Decoding the Underlying Meaning of Multimodal Hateful Memes" (arXiv:2305.17678) extends the hateful meme classification task by emphasizing interpretable explanations. It observes that the absence of datasets providing ground truth explanations hampers the development of models that can elucidate the reasoning behind their classifications. The paper fills this gap by introducing the Hateful meme with Reasons Dataset (HatReD), a multimodal dataset that includes not only hateful meme content but also annotated explanations of each meme's underlying hateful message.

Introduction and Motivation

The emergence of hateful memes on social media platforms poses significant challenges for content moderation. While automated hate detection models have progressed, many lack interpretability, leaving the rationale behind a hate classification unexplained. Addressing this disconnect, the paper presents HatReD, a dataset tailored to decoding the intrinsic hateful context of memes, thus allowing moderators to better understand and verify automated hate classifications.

Dataset Construction

The HatReD dataset comprises annotated multimodal memes covering five targets of hate: sex, race, religion, nationality, and disability. Each meme is paired with an explanation that elucidates its hateful context, often rooted in socio-cultural knowledge, which distinguishes HatReD from existing datasets (Figure 1).

Figure 1: Example of a hateful meme in HatReD.

Hateful Meme Explanation Task

The paper proposes a conditional generation task whose objective is to generate text explaining why a meme is hateful. The task benchmarks pre-trained language models (PLMs), such as T5 and VisualBERT, trained to parse multimodal inputs (text and images) and produce coherent hate explanations (Figure 2).
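The paper's exact preprocessing is not reproduced here, but a text-only PLM such as T5 can only consume a meme's visual content once it has been verbalized (e.g., as a caption and detected entities). A minimal sketch of this flattening step, where the field names and separator are illustrative assumptions rather than the paper's actual template:

```python
def build_model_input(meme_text: str, caption: str, entities: list[str]) -> str:
    """Flatten multimodal meme signals into one text sequence for a
    sequence-to-sequence PLM. Field names/separator are hypothetical."""
    parts = [f"meme text: {meme_text}", f"image caption: {caption}"]
    if entities:
        parts.append("entities: " + ", ".join(entities))
    return " ; ".join(parts)

# The resulting string would be tokenized and fed to the encoder,
# with the annotated reason as the decoder's target sequence.
example = build_model_input("example overlay text", "a photo of a crowd", ["crowd"])
```

This kind of verbalization is what lets a purely textual model compete with vision-language models on the task.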

Figure 2: Input saliency of the T5 model on an example meme from the qualitative analysis.

Evaluation Metrics and Experimentation

The models are evaluated with automatic metrics (BLEU, ROUGE-L, and BERTScore) and with human judgments of the fluency and relevance of the generated explanations, establishing foundational benchmarks for future research.
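To illustrate how one of these metrics scores a generated reason against the annotated gold reason, here is a minimal, dependency-free sketch of ROUGE-L F1 (whitespace tokenization is a simplifying assumption; an established implementation should be used for real evaluations):

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between a generated explanation and the gold reason."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Identical explanation and gold reason score 1.0:
# rouge_l("the meme mocks a group", "the meme mocks a group")  -> 1.0
```

Because ROUGE-L rewards subsequence overlap rather than exact wording, it tolerates paraphrase better than BLEU's n-gram matching, which is why both are reported alongside the embedding-based BERTScore.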

The experiments show that text-only models such as T5 outperform their vision-language counterparts on relevance, suggesting limitations in how current models integrate visual features with text.

Challenges and Future Work

The results reveal challenges in accurately generating contextual explanations, highlighting the need to address hallucination and improve multimodal comprehension. Future work aims to refine the HatReD dataset by expanding its domains and integrating retrieval augmentation to supplement PLM reasoning.

Conclusion

"Decoding the Underlying Meaning of Multimodal Hateful Memes" introduces a novel dataset and task critical for moderating hate speech in online media. The HatReD dataset and benchmarks foster opportunities to develop more interpretable and robust AI systems capable of contextually understanding and explaining hateful memes, ultimately assisting in the creation of safer online environments.
