RepMD: DCG-Based Harmful Meme Detection
- RepMD is a method for harmful meme detection that uses a Design Concept Graph (DCG) to capture invariant design strategies underlying meme evolution.
- It constructs the DCG by adapting attack trees and applies an SVD-based pruning algorithm that removes redundant nodes, yielding compact, precise guidance for multimodal classifier input.
- RepMD shows around a 10-point increase in accuracy and reduces annotation time by about 25 seconds per meme, demonstrating robust performance both in-domain and out-of-domain.
RepMD is a method for ever-shifting harmful meme detection that operationalizes the principle that invariant design concepts underlie the type-shifting and temporal-evolving phenomena characteristic of online harmful memes. The core innovation is the Design Concept Graph (DCG), which encodes the underlying strategies malicious actors employ in meme creation, guiding multimodal LLM (MLLM) classifiers through graph-derived, interpretive prompts. RepMD delivers substantial improvements in accuracy and robustness on both in-domain and out-of-domain meme classification tasks, while markedly accelerating human annotation workflows (Jiang et al., 8 Jan 2026).
1. Formalization and Construction of the Design Concept Graph
1.1. Attack Trees and the RepMD Adaptation
RepMD draws conceptual inspiration from attack trees—formal structures representing how an adversary achieves goals via combinations of methods using AND/OR logic. The adaptation replaces the “fail-reason tree” (a tree explaining why memes evade detection) with a DCG encoding "how a malicious designer creates" such memes, making latent design strategies explicit.
1.2. Fail-Reason Tree: Precursor to DCG
The fail-reason tree is generated by identifying historical memes that induce failure in an ensemble of MLLMs (misclassified by at least 3 of 5 models). A large LLM (Qwen3VL-235B) then structures the tree with:
- Type nodes: hierarchical, spanning macro type, subtype, and further-subtype levels.
- Fail-reason nodes: one-sentence failure explanations plus extended text.
Edges include:
- Type edges, within the type hierarchy.
- Link edges, attaching fail-reason nodes to type nodes.
1.3. DCG Definitions and Structure
The Design Concept Graph formalizes historical meme analysis into a heterogeneous, logic-structured graph:
- Nodes:
  - Type nodes (inherited from the fail-reason tree)
  - Reproduction-method nodes ("steps a malicious user takes")
  - Logic-gate nodes (AND/OR combinators, as in attack trees)
  - Goal nodes (e.g., eliciting human-specified harmful reactions)
  - Each node carries a binary harmful indicator.
- Edges:
  - Type edges, within the type hierarchy
  - Achievement edges, linking reproduction methods through logic gates to goals
  - Link edges, connecting type nodes to associated methods and goals
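As an illustration, the heterogeneous node and edge types above could be modeled with a minimal graph structure; the class names and fields here are assumptions for exposition, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class DCGNode:
    node_id: str
    kind: str      # "type", "method", "gate", or "goal"
    text: str      # node description (e.g., a reproduction step)
    harmful: bool  # binary harmful indicator

@dataclass
class DCG:
    nodes: dict = field(default_factory=dict)  # node_id -> DCGNode
    edges: list = field(default_factory=list)  # (src, dst, edge_kind) triples

    def add_node(self, node: DCGNode) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, dst: str, edge_kind: str) -> None:
        # edge_kind is one of {"type", "achievement", "link"}
        self.edges.append((src, dst, edge_kind))
```

A small example: a "method" node feeding an "achievement" edge toward a "goal" node mirrors how a reproduction step is wired to the harm it serves.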
1.4. SVD-Based Graph Pruning
After initial DCG construction, redundancy is eliminated via an SVD-based pruning algorithm:
- Construct the adjacency-score matrix over DCG nodes.
- Compute node degrees.
- Iterate: (a) form the current adjacency-score submatrix; (b) compute its SVD to obtain the singular values; (c) choose a cut-off rank from the singular-value spectrum; (d) retain the top-ranked nodes; (e) terminate once the retained node set stabilizes.
- Return the pruned DCG.
Adjacency scoring uses TF-IDF cosine similarity between node texts, with root nodes providing the reference for substructure-similarity metrics; scaling coefficients control the influence of each edge type.
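A minimal numpy sketch of one pruning pass, assuming the adjacency-score matrix is already built; the energy threshold and the node-scoring rule are illustrative choices, not the paper's exact criteria:

```python
import numpy as np

def svd_prune_once(A, node_ids, energy=0.9):
    """One pruning pass: keep nodes dominating the leading singular directions.

    A        : (n, n) symmetric adjacency-score matrix (e.g., TF-IDF cosine scores)
    node_ids : list of n node identifiers
    energy   : fraction of singular-value mass the retained rank must cover
    """
    U, s, _ = np.linalg.svd(A)
    # smallest rank whose singular values cover `energy` of the total spectrum
    cum = np.cumsum(s) / s.sum()
    cut = int(np.searchsorted(cum, energy)) + 1
    # score each node by its weighted mass in the leading `cut` directions
    scores = np.linalg.norm(U[:, :cut] * s[:cut], axis=1)
    keep = sorted(np.argsort(scores)[::-1][:cut])
    return [node_ids[i] for i in keep]
```

Duplicated nodes produce linearly dependent rows, so the effective rank (and hence the cut) drops below the node count, which is what lets the pass discard redundancy.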
2. Design Concept Reproduction and DCG Derivation
2.1. Extraction and Reflection on Failures
Historical memes are processed through an ensemble of MLLMs (Doubao-1.5, GPT-4o, Qwen2.5VL, Gemini-Pro, InternVL3) under a shared classification prompt. Cases that at least three models misclassify are then analyzed via Qwen3VL, generating fail-reason nodes, hierarchically classified by type.
Keypoints are summarized and appended to prompts, yielding an optimized classifier prompt.
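The failure filter described above amounts to a majority-vote check over ensemble predictions; a minimal sketch, with illustrative function and argument names:

```python
def select_failure_cases(memes, ensemble_preds, gold_labels, threshold=3):
    """Return memes that at least `threshold` ensemble models misclassify.

    ensemble_preds : list of per-model prediction lists (one prediction per meme)
    gold_labels    : ground-truth labels, one per meme
    """
    failures = []
    for i, meme in enumerate(memes):
        # count how many models got this meme wrong
        wrong = sum(preds[i] != gold_labels[i] for preds in ensemble_preds)
        if wrong >= threshold:
            failures.append(meme)
    return failures
```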
2.2. From Fail-Tree to Compact DCG
For each fail-reason node, Qwen3VL is queried for:
- Reproduction methods and connecting logic, using targeted questions (e.g., replacement possibilities, rationale, and persistence of harm).
- Alignment with visual cues and extraction of goals.
Nodes and edges from multiple memes are aggregated into a raw DCG, subsequently pruned via the SVD procedure for compactness.
2.3. DCG Linearization for MLLM Input
Per meme, a relevant DCG subgraph is linearized as a logical formula (e.g., reproduction methods combined through AND/OR gates, terminating in a goal node), then rendered as stepwise natural-language guidance for downstream classifiers.
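The linearization step can be sketched as template expansion over retrieved triples; the (method, gate, goal) representation is an assumption for illustration, not the paper's exact subgraph format:

```python
def linearize_subgraph(triples):
    """Render (method, gate, goal) triples as ordered natural-language steps."""
    steps = []
    for i, (method, gate, goal) in enumerate(triples, start=1):
        steps.append(
            f"Step {i}: check whether the meme uses '{method}' "
            f"({gate}-combined with prior steps) to achieve the goal '{goal}'."
        )
    return "\n".join(steps)
```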
3. DCG-Guided Multimodal LLM Classification
3.1. Workflow
The classification pipeline comprises:
- Relevant-DCG retrieval: an LLM is prompted to retrieve the subset of DCG nodes (types, methods, goals) relevant to a given meme, yielding a per-meme subgraph.
- Guidance generation and classification: the subgraph is linearized into ordered instructions; the meme and this guidance form the input to an MLLM, which predicts harmfulness.
3.2. Prompt-based Operation
RepMD does not introduce further trainable fusion layers or custom loss functions. Fusion of meme content and DCG-derived context occurs through standard cross-modal attention mechanisms within the base MLLMs, operating in a zero- or few-shot setup.
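Since RepMD adds no trainable components, the classification step reduces to prompt assembly around a base MLLM. A minimal sketch of that assembly; the prompt wording and the injected `mllm_call` stub are illustrative assumptions, not the paper's prompts:

```python
def build_classification_prompt(meme_text, guidance):
    """Combine meme content and DCG-derived guidance into one classifier prompt."""
    return (
        "You are a harmful-meme classifier.\n"
        f"Meme text: {meme_text}\n"
        "Design-concept guidance:\n"
        f"{guidance}\n"
        "Answer 'harmful' or 'benign' with a one-sentence justification."
    )

def classify(meme_text, guidance, mllm_call):
    """`mllm_call` is any function mapping a prompt string to model text output."""
    prompt = build_classification_prompt(meme_text, guidance)
    reply = mllm_call(prompt)
    # parse the free-text verdict into a binary label
    return "harmful" in reply.lower()
```

Swapping `mllm_call` between MLLM backends is what makes the approach zero-shot and model-agnostic.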
4. Quantitative Performance and Ablation Analysis
4.1. Datasets
- Type-shifting: revised GOAT-Bench (5 categories: Racism, Misogyny, Offensiveness, Sarcasm, Toxicity); 56,192 DCG memes plus 7,626 target memes.
- Temporal-evolving: 2,000 Twitter memes (2025 Q1–Q4; 400 DCG + 100 target memes per quarter).
- Expert-labeled (Cohen's κ).
4.2. Baselines
- Few-shot SFT: Mod-Hate, RA-HMD
- RAG-based: MIND
- Vanilla MLLMs: Qwen2.5VL, GPT-4o, Doubao-1.5-Vision-Lite/Pro
4.3. Type-Shifting Results
| Metric | Vanilla | +RepMD_ID | +RepMD_OOD |
|---|---|---|---|
| F1 | 66.9% | 76.7% | 74.9% |
| Accuracy | 61.4% | 75.8% | 74.2% |
- In-domain gains: +9.8 F1, +14.4 Accuracy.
- Out-of-domain degradation: –1.8 F1, –1.6 Accuracy.
- Baselines: the best RAG/SFT baseline suffers a ~19.5-point accuracy drop crossing domains.
4.4. Temporal-Evolving Results
| Quarter | Vanilla (F1/Acc) | +RepMD_TF | +RepMD_TE |
|---|---|---|---|
| Apr–Jun | 51.3 / 50.0 | 60.2 / 60.0 | 67.4 / 65.0 |
| Oct–Dec | 67.4 / 65.0 | 83.5 / 80.0 | 82.4 / 78.0 |
- In-quarter gain: +13.7 F1, +14.3 Accuracy.
- Cross-quarter drop: –0.7 F1, –1.2 Accuracy.
4.5. Ablation Study
| Component Removed | Acc (ID) | Acc (TE) |
|---|---|---|
| –VoteMLLMs | –1.3 | –4.0 |
| –OptPrompt | –3.3 | –4.4 |
| DCGTree | –5.5 | –8.8 |
| –Retrieval | –6.0 | –9.2 |
SVD pruning is computationally more efficient than LLM-based pruning, which offers only marginal TE improvements at 10–14× the time cost per meme.
5. Human Evaluation and Qualitative Analysis
5.1. DCG Interpretability and Actionability
Likert-rated (1–5 scale, human evaluators):
| Criterion | Fail-Tree | DCG | Δ |
|---|---|---|---|
| Relevance | 3.1 | 4.3 | +1.2 |
| Correctness | 3.2 | 4.1 | +0.9 |
| Actionability | 2.9 | 4.0 | +1.1 |
| Uniqueness | 3.0 | 4.2 | +1.2 |
| Explainability | 1.8 | 4.4 | +2.6 |
The DCG scores higher than the fail-reason tree on all criteria, with the most pronounced improvement in Explainability (+2.6).
5.2. Annotation Efficiency
- Without DCG guidance: 45–60 s per meme.
- With DCG guidance: 15–30 s (mean savings 25 s/meme).
- Greatest time savings observed for Offensiveness and Sarcasm subtypes.
5.3. Case Studies
- Implicit violence ("Minecraft Creeper + Ignition tool"): Vanilla MLLMs fail; DCG guidance (weapon-implied violence motif) enables correct detection.
- Sarcastic "Stan abbreviation" memes: DCG highlights cultural subtypes, increasing correct Sarcasm detection.
6. Summary and Implications
RepMD introduces an explicit, explainable DCG framework, derived and pruned from historical MLLM failures, and operationalized via prompt-based, subgraph-guided input to base MLLMs. The approach consistently yields approximately +10 percentage point accuracy gains on in-domain harmful meme detection. These benefits degrade by only 2 points out-of-domain and confer mean annotation speedups of 25 seconds per instance, all without further model fine-tuning or additional fusion layers. The method demonstrates that making latent malicious design strategies explicit and actionable can measurably enhance both automation and human-in-the-loop evaluation in adversarial, dynamic content moderation (Jiang et al., 8 Jan 2026).