RepMD: DCG-Based Harmful Meme Detection
- RepMD is a method for harmful meme detection that uses a Design Concept Graph (DCG) to capture invariant design strategies underlying meme evolution.
- It constructs the DCG by adapting attack trees and applies an SVD-based pruning algorithm that removes redundant nodes, yielding compact, precise guidance for multimodal classifier input.
- RepMD shows around a 10-point increase in accuracy and reduces annotation time by about 25 seconds per meme, demonstrating robust performance both in-domain and out-of-domain.
RepMD is a method for ever-shifting harmful meme detection that operationalizes the principle that invariant design concepts underlie the type-shifting and temporal-evolving phenomena characteristic of online harmful memes. The core innovation is the Design Concept Graph (DCG), which encodes the underlying strategies malicious actors employ in meme creation, guiding multimodal LLM (MLLM) classifiers through graph-derived, interpretive prompts. RepMD delivers substantial improvements in accuracy and robustness on both in-domain and out-of-domain meme classification tasks, while markedly accelerating human annotation workflows (Jiang et al., 8 Jan 2026).
1. Formalization and Construction of the Design Concept Graph
1.1. Attack Trees and the RepMD Adaptation
RepMD draws conceptual inspiration from attack trees—formal structures representing how an adversary achieves goals via combinations of methods using AND/OR logic. The adaptation replaces the “fail-reason tree” (a tree explaining why memes evade detection) with a DCG encoding "how a malicious designer creates" such memes, making latent design strategies explicit.
1.2. Fail-Reason Tree: Precursor to DCG
The fail-reason tree is generated by identifying historical memes that induce failure in an ensemble of MLLMs (misclassified by at least 3 of 5 models). A large LLM (Qwen3VL-235B) then structures the tree with:
- Type nodes: hierarchical, spanning macro type, subtype, and further-subtype levels.
- Fail-reason nodes: one-sentence failure explanations plus extended text.
Edges include:
- Type edges, within the type hierarchy.
- Link edges, attaching fail-reason nodes to type nodes.
1.3. DCG Definitions and Structure
The Design Concept Graph formalizes historical meme analysis into a heterogeneous, logic-structured graph:
- Nodes:
  - Type nodes (inherited from the fail-reason tree)
  - Reproduction-method nodes ("steps a malicious user takes")
  - Logic-gate nodes (AND/OR combinators, as in attack trees)
  - Goal nodes (e.g., eliciting human-specified harmful reactions)
  - Each node carries a binary harmful indicator.
- Edges:
  - Type edges, within the type hierarchy
  - Achievement edges, linking reproduction methods through logic gates to goals
  - Link edges, connecting type nodes to associated methods and goals
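As an illustration, the heterogeneous node and edge types above could be modeled with a minimal graph structure; the class names and fields here are assumptions for exposition, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class DCGNode:
    node_id: str
    kind: str      # "type", "method", "gate", or "goal"
    text: str      # node description (e.g., a reproduction step)
    harmful: bool  # binary harmful indicator

@dataclass
class DCG:
    nodes: dict = field(default_factory=dict)  # node_id -> DCGNode
    edges: list = field(default_factory=list)  # (src, dst, edge_kind) triples

    def add_node(self, node: DCGNode) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, dst: str, edge_kind: str) -> None:
        # edge_kind is one of {"type", "achievement", "link"}
        self.edges.append((src, dst, edge_kind))
```

A small example: a "method" node feeding an "achievement" edge toward a "goal" node mirrors how a reproduction step is wired to the harm it serves.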
1.4. SVD-Based Graph Pruning
After initial DCG construction, redundancy is eliminated via an SVD-based pruning algorithm:
- Construct the adjacency-score matrix over DCG nodes.
- Compute node degrees.
- Iterate: (a) form the current adjacency-score submatrix; (b) compute its SVD to obtain the singular values; (c) choose a cut-off rank from the singular-value spectrum; (d) retain the top-ranked nodes; (e) terminate once the retained node set stabilizes.
- Return the pruned DCG.
Adjacency scoring uses TF-IDF cosine similarity between node texts, with root nodes providing the reference for substructure-similarity metrics; scaling coefficients control the influence of each edge type.
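A minimal numpy sketch of one pruning pass, assuming the adjacency-score matrix is already built; the energy threshold and the node-scoring rule are illustrative choices, not the paper's exact criteria:

```python
import numpy as np

def svd_prune_once(A, node_ids, energy=0.9):
    """One pruning pass: keep nodes dominating the leading singular directions.

    A        : (n, n) symmetric adjacency-score matrix (e.g., TF-IDF cosine scores)
    node_ids : list of n node identifiers
    energy   : fraction of singular-value mass the retained rank must cover
    """
    U, s, _ = np.linalg.svd(A)
    # smallest rank whose singular values cover `energy` of the total spectrum
    cum = np.cumsum(s) / s.sum()
    cut = int(np.searchsorted(cum, energy)) + 1
    # score each node by its weighted mass in the leading `cut` directions
    scores = np.linalg.norm(U[:, :cut] * s[:cut], axis=1)
    keep = sorted(np.argsort(scores)[::-1][:cut])
    return [node_ids[i] for i in keep]
```

Duplicated nodes produce linearly dependent rows, so the effective rank (and hence the cut) drops below the node count, which is what lets the pass discard redundancy.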
2. Design Concept Reproduction and DCG Derivation
2.1. Extraction and Reflection on Failures
Historical memes are processed through an ensemble of MLLMs (Doubao-1.5, GPT-4o, Qwen2.5VL, Gemini-Pro, InternVL3) under a shared classification prompt. Cases that at least three models misclassify are then analyzed via Qwen3VL, generating fail-reason nodes, hierarchically classified by type.
Keypoints are summarized and appended to prompts, yielding an optimized classifier prompt.
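The failure filter described above amounts to a majority-vote check over ensemble predictions; a minimal sketch, with illustrative function and argument names:

```python
def select_failure_cases(memes, ensemble_preds, gold_labels, threshold=3):
    """Return memes that at least `threshold` ensemble models misclassify.

    ensemble_preds : list of per-model prediction lists (one prediction per meme)
    gold_labels    : ground-truth labels, one per meme
    """
    failures = []
    for i, meme in enumerate(memes):
        # count how many models got this meme wrong
        wrong = sum(preds[i] != gold_labels[i] for preds in ensemble_preds)
        if wrong >= threshold:
            failures.append(meme)
    return failures
```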
2.2. From Fail-Tree to Compact DCG
For each fail-reason node, Qwen3VL is queried for:
- Reproduction methods and connecting logic, using targeted questions (e.g., replacement possibilities, rationale, and persistence of harm).
- Alignment with visual cues and extraction of goals.
Nodes and edges from multiple memes are aggregated into a raw DCG, subsequently pruned via the SVD procedure for compactness.
2.3. DCG Linearization for MLLM Input
Per meme, a relevant DCG subgraph is linearized as a logical formula (e.g., reproduction methods combined through AND/OR gates, terminating in a goal node), then rendered as stepwise natural-language guidance for downstream classifiers.
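The linearization step can be sketched as template expansion over retrieved triples; the (method, gate, goal) representation is an assumption for illustration, not the paper's exact subgraph format:

```python
def linearize_subgraph(triples):
    """Render (method, gate, goal) triples as ordered natural-language steps."""
    steps = []
    for i, (method, gate, goal) in enumerate(triples, start=1):
        steps.append(
            f"Step {i}: check whether the meme uses '{method}' "
            f"({gate}-combined with prior steps) to achieve the goal '{goal}'."
        )
    return "\n".join(steps)
```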
3. DCG-Guided Multimodal LLM Classification
3.1. Workflow
The classification pipeline comprises:
- Relevant-DCG retrieval: an LLM is prompted to retrieve the subset of DCG nodes (types, methods, goals) relevant to a given meme, yielding a per-meme subgraph.
- Guidance generation and classification: the subgraph is linearized into ordered instructions; the meme and this guidance form the input to an MLLM, which predicts harmfulness.
3.2. Prompt-based Operation
RepMD does not introduce further trainable fusion layers or custom loss functions. Fusion of meme content and DCG-derived context occurs through standard cross-modal attention mechanisms within the base MLLMs, operating in a zero- or few-shot setup.
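Since RepMD adds no trainable components, the classification step reduces to prompt assembly around a base MLLM. A minimal sketch of that assembly; the prompt wording and the injected `mllm_call` stub are illustrative assumptions, not the paper's prompts:

```python
def build_classification_prompt(meme_text, guidance):
    """Combine meme content and DCG-derived guidance into one classifier prompt."""
    return (
        "You are a harmful-meme classifier.\n"
        f"Meme text: {meme_text}\n"
        "Design-concept guidance:\n"
        f"{guidance}\n"
        "Answer 'harmful' or 'benign' with a one-sentence justification."
    )

def classify(meme_text, guidance, mllm_call):
    """`mllm_call` is any function mapping a prompt string to model text output."""
    prompt = build_classification_prompt(meme_text, guidance)
    reply = mllm_call(prompt)
    # parse the free-text verdict into a binary label
    return "harmful" in reply.lower()
```

Swapping `mllm_call` between MLLM backends is what makes the approach zero-shot and model-agnostic.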
4. Quantitative Performance and Ablation Analysis
4.1. Datasets
- Type-shifting: revised GOAT-Bench (5 categories: Racism, Misogyny, Offensiveness, Sarcasm, Toxicity); 56,192 DCG memes plus 7,626 target memes.
- Temporal-evolving: 2,000 Twitter memes (2025 Q1–Q4; 400 DCG + 100 target memes per quarter).
- Expert-labeled (Cohen's κ).
4.2. Baselines
- Few-shot SFT: Mod-Hate, RA-HMD
- RAG-based: MIND
- Vanilla MLLMs: Qwen2.5VL, GPT-4o, Doubao-1.5-Vision-Lite/Pro
4.3. Type-Shifting Results
| Metric | Vanilla | +RepMD_ID | +RepMD_OOD |
|---|---|---|---|
| F1 | 66.9% | 76.7% | 74.9% |
| Accuracy | 61.4% | 75.8% | 74.2% |
- In-domain gains: +9.8 F1, +14.4 Accuracy.
- Out-of-domain degradation: –1.8 F1, –1.6 Accuracy.
- Baselines: the best RAG/SFT baseline suffers a ~19.5-point accuracy drop crossing domains.
4.4. Temporal-Evolving Results
| Quarter | Vanilla (F1/Acc) | +RepMD_TF | +RepMD_TE |
|---|---|---|---|
| Apr–Jun | 51.3 / 50.0 | 60.2 / 60.0 | 67.4 / 65.0 |
| Oct–Dec | 67.4 / 65.0 | 83.5 / 80.0 | 82.4 / 78.0 |
- In-quarter gain: +13.7 F1, +14.3 Accuracy.
- Cross-quarter drop: –0.7 F1, –1.2 Accuracy.
4.5. Ablation Study
| Component Removed | Acc (ID) | Acc (TE) |
|---|---|---|
| –VoteMLLMs | –1.3 | –4.0 |
| –OptPrompt | –3.3 | –4.4 |
| DCGTree | –5.5 | –8.8 |
| –Retrieval | –6.0 | –9.2 |
SVD pruning is computationally more efficient than LLM-based pruning, which offers only marginal TE improvements at 10–14× the time cost per meme.
5. Human Evaluation and Qualitative Analysis
5.1. DCG Interpretability and Actionability
Likert-rated (1–5 scale, human evaluators):
| Criterion | Fail-Tree | DCG | Δ |
|---|---|---|---|
| Relevance | 3.1 | 4.3 | +1.2 |
| Correctness | 3.2 | 4.1 | +0.9 |
| Actionability | 2.9 | 4.0 | +1.1 |
| Uniqueness | 3.0 | 4.2 | +1.2 |
| Explainability | 1.8 | 4.4 | +2.6 |
The DCG scores higher than the fail-reason tree on all criteria, with the most pronounced improvement in Explainability (+2.6).
5.2. Annotation Efficiency
- Without DCG guidance: 45–60 s per meme.
- With DCG guidance: 15–30 s (mean savings 25 s/meme).
- Greatest time savings observed for Offensiveness and Sarcasm subtypes.
5.3. Case Studies
- Implicit violence ("Minecraft Creeper + Ignition tool"): Vanilla MLLMs fail; DCG guidance (weapon-implied violence motif) enables correct detection.
- Sarcastic "Stan abbreviation" memes: DCG highlights cultural subtypes, increasing correct Sarcasm detection.
6. Summary and Implications
RepMD introduces an explicit, explainable DCG framework, derived and pruned from historical MLLM failures, and operationalized via prompt-based, subgraph-guided input to base MLLMs. The approach consistently yields approximately +10 percentage point accuracy gains on in-domain harmful meme detection. These benefits degrade by only 2 points out-of-domain and confer mean annotation speedups of 25 seconds per instance, all without further model fine-tuning or additional fusion layers. The method demonstrates that making latent malicious design strategies explicit and actionable can measurably enhance both automation and human-in-the-loop evaluation in adversarial, dynamic content moderation (Jiang et al., 8 Jan 2026).