Multimodal and Explainable Internet Meme Classification

Published 11 Dec 2022 in cs.AI, cs.CL, and cs.LG | (2212.05612v3)

Abstract: In the current context where online platforms have been effectively weaponized in a variety of geo-political events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging both textual and visual SOTA models to represent the individual cases. We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance between example- and prototype-based methods, and between text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.

Summary

The paper presents a novel explainable framework that integrates example-based and prototype-based methods for internet meme classification.
It combines state-of-the-art textual (BERT, BERTweet) and visual (CLIP) models to capture nuanced semantic features for improved classification accuracy.
Experimental results on MAMI and Hateful Memes datasets demonstrate that integrating multimodal cues enhances both performance and explainability.

Introduction

The problem of internet meme classification, particularly in detecting harmful content such as hate speech and misogyny, has gained attention due to the weaponization of online platforms in political and social contexts. Traditional black-box methods for meme classification often overlook the semantic and contextual nuances inherent to memes. This paper introduces a modular and explainable architecture that leverages example- and prototype-based reasoning to better understand internet memes.

Methodology

The approach utilizes both textual and visual state-of-the-art (SOTA) models in a multimodal setup to ensure a comprehensive representation of memes. The architecture includes:

Example-based Meme Classification: This method retrieves the training memes most similar to the input, using features from a pre-trained model, and presents them as the explanation for the classification result.
Figure 1: Classification and feature extraction model within the Example-based explanation model.
Figure 2: Example-based explanation based on similarity-based meme search. The Train meme-features database contains pre-computed features using the Feature Extractor module.
Prototype-based Meme Classification (xDNN): This inherently interpretable method generates class-wise prototypes and uses rule-based decision-making to classify new memes.
Figure 3: Our architecture for prototype-based explainable classification called Explainable Deep Neural Networks (xDNN). Figure reused from (Angelov et al., 2019).
Feature Extraction Models: BERT and BERTweet encode the textual content, and CLIP encodes the visual content. Combining these modalities captures the interplay between text and image in memes.
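
The prototype-based idea can be illustrated with a deliberately simplified sketch. This is not the xDNN algorithm of Angelov et al.: xDNN derives prototypes from data density, whereas the sketch below picks them greedily with a distance threshold (the hypothetical `radius` parameter). The 2-D vectors stand in for real text or image features, and the labels are toy values. What the sketch does share with the description above is the shape of the decision: class-wise prototypes, a winner-takes-all match, and a human-readable IF...THEN rule.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def learn_prototypes(features, labels, radius=0.5):
    """Greedy per-class prototype selection.

    Assumption: a training sample becomes a new prototype of its class when
    it is farther than `radius` from every existing prototype of that class.
    xDNN itself uses a density-based criterion instead.
    """
    prototypes = {}  # label -> list of (training index, feature vector)
    for idx, (vec, label) in enumerate(zip(features, labels)):
        class_protos = prototypes.setdefault(label, [])
        if all(distance(vec, proto) > radius for _, proto in class_protos):
            class_protos.append((idx, vec))
    return prototypes


def classify_by_prototypes(query, prototypes):
    """Winner-takes-all: the nearest prototype over all classes decides."""
    _, label, idx = min(
        (distance(query, vec), label, idx)
        for label, protos in prototypes.items()
        for idx, vec in protos
    )
    rule = f"IF meme ~ prototype #{idx} THEN {label}"
    return label, idx, rule


# Toy stand-ins for meme features and labels (not real data).
features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.0, 1.0]]
labels = ["benign", "benign", "harmful", "harmful", "benign"]

prototypes = learn_prototypes(features, labels, radius=0.5)
label, proto_idx, rule = classify_by_prototypes([0.8, 0.9], prototypes)
print(label, proto_idx, rule)
```

Because each prototype here is an actual training sample, the matched index can be used to show the corresponding training meme to the user as the justification for the prediction.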

Experimental Evaluation

Classification performance was evaluated on two datasets: MAMI (Multimedia Automatic Misogyny Identification) and Hateful Memes. The methods were assessed in terms of both accuracy and explainability.

Performance: The example-based method achieved higher accuracy than the prototype-based method. CLIP-based visual models outperformed text-based ones, and the combination of CLIP and BERTweet yielded the best results, indicating the importance of integrating visual and textual cues.
Explainability: Visualization tools were developed to aid in the understanding of the classification results, demonstrating that the combined model produced more reliable explanations through retrieval of relevant examples.
Figure 4: Explanatory interface for our Example-based classification method.
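
The retrieval step behind such example-based explanations might look like the following sketch. Several choices here are assumptions made for illustration rather than the paper's confirmed configuration: fusing modalities by concatenating L2-normalized vectors, ranking by cosine similarity, voting over the top `k` neighbours, and the toy 2-D vectors that stand in for BERTweet and CLIP features. The sketch uses only the standard library to stay self-contained.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec):
    """Assumed fusion: concatenate the normalized text and image vectors."""
    return l2_normalize(l2_normalize(text_vec) + l2_normalize(image_vec))


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def classify_by_examples(query, train_feats, train_labels, k=3):
    """Return a majority-vote label and the neighbours that support it."""
    scored = sorted(
        ((cosine(query, feat), i) for i, feat in enumerate(train_feats)),
        reverse=True,
    )[:k]
    votes = Counter(train_labels[i] for _, i in scored)
    prediction = votes.most_common(1)[0][0]
    # Evidence rows: (training index, its label, similarity to the query).
    evidence = [(i, train_labels[i], round(score, 3)) for score, i in scored]
    return prediction, evidence


# Toy training set: (text vector, image vector, label). Not real data.
train = [
    ([1.0, 0.0], [1.0, 0.1], "harmful"),
    ([0.9, 0.1], [0.8, 0.2], "harmful"),
    ([0.0, 1.0], [0.1, 1.0], "benign"),
    ([0.1, 0.9], [0.0, 1.0], "benign"),
]
train_feats = [fuse(text, image) for text, image, _ in train]
train_labels = [label for _, _, label in train]

query = fuse([0.95, 0.05], [0.9, 0.1])
prediction, evidence = classify_by_examples(query, train_feats, train_labels)
print(prediction)
for row in evidence:
    print(row)
```

The `evidence` rows are what an interface could render next to the query meme: each retrieved training meme, its label, and how similar it is. A neighbour with a conflicting label, like the third one in this toy run, is visible to the user rather than hidden inside the vote.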

Discussion

The paper highlights several challenges in meme classification:

Contextual Dependency: Memes often rely on cultural or real-world contexts that are not captured by current models, leading to potential misclassification.
Integration with Background Knowledge: For more robust classifications, future models need to incorporate background knowledge and address the nuanced use of language and imagery.
Subjectivity in Labeling: The subjective nature of tasks like misogyny detection requires critical examination of the ground truth labels, as biases in annotation can affect model training.

Prior work has mainly focused on basic meme tracking and hate speech detection using SOTA visual and textual models. This paper distinguishes itself by providing an explainable framework that balances performance with transparency. Explainable meme classification nonetheless remains underexplored, with significant potential for future research on integrating deep contextual understanding with machine learning models.

Conclusion

This research advances the field of meme classification by presenting methods that highlight the importance of combining different data representations and explainability techniques. Future work should address the limitations identified, such as context integration and subjective labeling, to improve classification robustness and transparency. The code and methods presented aim to inspire further development in the explainable classification of internet memes.