
RepMD: DCG-Based Harmful Meme Detection

Updated 15 January 2026
  • RepMD is a method for harmful meme detection that uses a Design Concept Graph (DCG) to capture invariant design strategies underlying meme evolution.
  • It constructs the DCG by adapting attack trees and applies an SVD-based pruning algorithm to streamline the graph into a compact form that precisely guides multimodal classifiers.
  • RepMD shows around a 10-point increase in accuracy and reduces annotation time by about 25 seconds per meme, demonstrating robust performance both in-domain and out-of-domain.

RepMD is a method for detecting ever-shifting harmful memes that operationalizes the principle that invariant design concepts underlie the type-shifting and temporal-evolving behavior characteristic of online harmful memes. The core innovation is the Design Concept Graph (DCG), which encodes the underlying strategies malicious actors employ in meme creation and guides multimodal LLM (MLLM) classifiers through graph-derived, interpretive prompts. RepMD delivers substantial improvements in accuracy and robustness on both in-domain and out-of-domain meme classification tasks, while markedly accelerating human annotation workflows (Jiang et al., 8 Jan 2026).

1. Formalization and Construction of the Design Concept Graph

1.1. Attack Trees and the RepMD Adaptation

RepMD draws conceptual inspiration from attack trees—formal structures representing how an adversary achieves goals via combinations of methods using AND/OR logic. The adaptation replaces the “fail-reason tree” (a tree explaining why memes evade detection) with a DCG encoding "how a malicious designer creates" such memes, making latent design strategies explicit.

1.2. Fail-Reason Tree ($G_F$): Precursor to DCG

The fail-reason tree $G_F$ is generated by identifying historical memes that induce failure in an ensemble of MLLMs (≥ 3/5 misclassification). A large LLM (Qwen3VL-235B) structures $G_F$ with:

  • Type nodes ($\mathcal{N}_T$): Hierarchical: macro type ($L_1$), subtype ($L_2$), further subtype ($L_3$).
  • Fail-reason nodes ($\mathcal{N}_F$): One-sentence failure explanations plus extended text.

Edges include:

  • Type edges $E_T$: $\mathcal{N}_T^{L_i} \rightarrow \mathcal{N}_T^{L_{i+1}}$
  • Link edges $E_{\mathrm{Link}}$: $\mathcal{N}_T \rightarrow \mathcal{N}_F$

1.3. DCG Definitions and Structure

The Design Concept Graph $\mathcal{G}_D$ formalizes historical meme analysis into a heterogeneous, logic-structured graph:

$$\mathcal{G}_D = \Big\langle \{\underbrace{\mathcal{N}_T}_{\text{types}},\ \underbrace{\mathcal{N}_M}_{\text{reproduction methods}},\ \underbrace{\mathcal{N}_G}_{\text{goals}},\ \underbrace{\mathcal{N}_\sigma}_{\text{logic gates}}\},\ \{\mathcal{E}_T,\ \mathcal{E}_A,\ \mathcal{E}_{\mathrm{Link}}\} \Big\rangle$$

  • Nodes:
    • $\mathcal{N}_T$: Type nodes (inherited from $G_F$)
    • $\mathcal{N}_M$: Reproduction-method nodes ("steps a malicious user takes")
    • $\mathcal{N}_\sigma$: Logic gates ($\wedge$, $\vee$, $\neg$)
    • $\mathcal{N}_G$: Goal nodes (e.g., eliciting human-specified harmful reactions)
  • Each node carries a binary "harmful" indicator.
  • Edges:
    • Type edges $\mathcal{E}_T$: $\mathcal{N}_T \rightarrow \mathcal{N}_T$
    • Achievement edges $\mathcal{E}_A$: $(\mathcal{N}_M, \mathcal{N}_\sigma, \mathcal{N}_G)$
    • Link edges $\mathcal{E}_{\mathrm{Link}}$: $\mathcal{N}_T \rightarrow \mathcal{N}_M$
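The node and edge types above can be sketched as a small data structure. This is an illustrative rendering only; the class and field names (`Node`, `DCG`, `harmful`, etc.) are invented for clarity and are not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    kind: str      # "type" | "method" | "goal" | "gate"
    text: str      # node description (e.g., a reproduction step)
    harmful: bool  # binary harmful indicator carried by every node

@dataclass
class DCG:
    nodes: dict = field(default_factory=dict)              # node_id -> Node
    type_edges: list = field(default_factory=list)         # (type, type)
    achievement_edges: list = field(default_factory=list)  # (method, gate, goal)
    link_edges: list = field(default_factory=list)         # (type, method)

    def add(self, node: Node) -> None:
        self.nodes[node.node_id] = node

# Toy example: one type node linked to a reproduction method that
# achieves a goal through an AND gate.
g = DCG()
g.add(Node("t1", "type", "implicit violence", True))
g.add(Node("m1", "method", "replace explicit weapon with a game icon", True))
g.add(Node("g1", "goal", "imply violence without depicting it", True))
g.link_edges.append(("t1", "m1"))
g.achievement_edges.append(("m1", "AND", "g1"))
```

The heterogeneous edge sets mirror the triple $\{\mathcal{E}_T, \mathcal{E}_A, \mathcal{E}_{\mathrm{Link}}\}$ in the definition above.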

1.4. SVD-Based Graph Pruning

After initial DCG construction, redundancy is eliminated via an SVD-based pruning algorithm:

  1. Construct the adjacency-score matrix $A$.
  2. Compute node degrees $D$.
  3. For $t = 1, \ldots, 5$:
     a) $L = I - D^{-1/2} A^t D^{-1/2}$
     b) SVD on $L$: $L = U \Lambda V^T$, obtaining singular values $\lambda_i$
     c) Select $\mathrm{cut} = \arg\max_i |\ln(\lambda_{i+1} - \lambda_i)|$
     d) Retain the top-$\mathrm{cut}$ nodes as $G_D'$
     e) Break if $\mathrm{cut} / |\mathrm{nodes}| \geq \theta$
  4. Return $G_D'$.

Adjacency scores use TF-IDF cosine similarity passed through a $\mathrm{ReLU}$, with root nodes providing the reference for substructure-similarity metrics. Scaling coefficients $\alpha < \beta < 1$ control the influence of the different edge types.
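The pruning loop above can be sketched in a few lines of NumPy. This is a minimal illustration of the eigengap-style cut, not the paper's code: the adjacency matrix here is random, whereas RepMD would fill it with ReLU-gated TF-IDF cosine similarities.

```python
import numpy as np

def svd_prune(A: np.ndarray, theta: float = 0.5, max_t: int = 5) -> int:
    """Return the number of top-ranked nodes to retain (steps 1-4 above)."""
    n = A.shape[0]
    keep = n
    for t in range(1, max_t + 1):
        At = np.linalg.matrix_power(A, t)            # t-hop score matrix
        deg = At.sum(axis=1)
        deg[deg == 0] = 1.0                          # guard isolated nodes
        D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
        L = np.eye(n) - D_inv_sqrt @ At @ D_inv_sqrt # normalized Laplacian
        s = np.linalg.svd(L, compute_uv=False)       # singular values, descending
        gaps = np.abs(np.log(np.abs(s[1:] - s[:-1]) + 1e-12))
        keep = int(np.argmax(gaps)) + 1              # largest log-gap = cut point
        if keep / n >= theta:                        # enough nodes retained: stop
            break
    return keep

rng = np.random.default_rng(0)
A = rng.random((8, 8))
A = (A + A.T) / 2          # symmetric adjacency-score matrix for 8 toy nodes
k = svd_prune(A)
```

The small epsilon inside the logarithm guards against a zero gap between adjacent singular values.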

2. Design Concept Reproduction and DCG Derivation

2.1. Extraction and Reflection on Failures

Historical memes are processed through an ensemble of MLLMs (Doubao-1.5, GPT-4o, Qwen2.5VL, Gemini-Pro, InternVL3) under a prompt $P_\mathrm{Harm}$. Cases with at least three model failures ($M_\mathrm{fail}$) are then analyzed via Qwen3VL, generating fail-reason nodes ($\mathcal{N}_F$), hierarchically classified by type.

Key points are summarized and appended to the prompt, yielding an optimized classifier prompt ($P'_\mathrm{Harm}$).
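The failure-mining criterion (a meme enters $M_\mathrm{fail}$ when at least 3 of the 5 ensemble models misclassify it) reduces to a simple vote count. The sketch below uses stand-in boolean predictions rather than real model calls; all identifiers are illustrative.

```python
def collect_failures(memes, predictions, labels, threshold=3):
    """memes: ids; predictions: {model: {meme_id: bool}}; labels: {meme_id: bool}.

    Returns the memes misclassified by >= threshold models (the M_fail set).
    """
    failures = []
    for meme in memes:
        wrong = sum(
            1 for model_preds in predictions.values()
            if model_preds[meme] != labels[meme]
        )
        if wrong >= threshold:
            failures.append(meme)
    return failures

# Toy predictions for two memes from the five ensemble models.
preds = {
    "Doubao-1.5": {"m1": False, "m2": True},
    "GPT-4o":     {"m1": False, "m2": True},
    "Qwen2.5VL":  {"m1": True,  "m2": True},
    "Gemini-Pro": {"m1": False, "m2": True},
    "InternVL3":  {"m1": False, "m2": True},
}
labels = {"m1": True, "m2": True}  # ground truth: both harmful
hard = collect_failures(["m1", "m2"], preds, labels)  # m1 fools 4/5 models
```

Only the memes in this set are forwarded to Qwen3VL for fail-reason extraction.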

2.2. From Fail-Tree to Compact DCG

For each fail-reason node $\mathcal{N}_F$, Qwen3VL (prompt $P_D$) is queried for:

  • Reproduction methods ($\mathcal{N}_M$) and logic ($\mathcal{N}_\sigma$), using targeted questions (e.g., replacement possibilities, rationale, and persistence of harm).
  • Alignment with visual cues and extraction of goals ($\mathcal{N}_G$).

Nodes and edges from multiple memes are aggregated into a raw DCG, subsequently pruned via the SVD procedure for compactness.

2.3. DCG Linearization for MLLM Input

Per meme, a relevant DCG subgraph ($G_{\mathrm{sub}} \subseteq G_D'$) is linearized as a logical formula, such as:

$$\mathcal{N}_{M_1} \land (\mathcal{N}_{M_2} \rightarrow \mathcal{N}_{M_3}) \rightarrow \mathcal{N}_G$$

This is rendered as stepwise natural-language guidance for downstream classifiers.
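A toy rendering of this linearization step might look as follows; the phrasing templates and example node texts are invented, and the real system derives the ordering from the subgraph's logic gates.

```python
def linearize(methods, goal):
    """Turn method nodes plus a goal node into ordered guidance S1 -> Sn."""
    steps = [
        f"S{i + 1}: Check whether the meme applies the method: {m}."
        for i, m in enumerate(methods)
    ]
    steps.append(
        f"S{len(methods) + 1}: If the methods above hold, judge whether "
        f"they achieve the goal: {goal}."
    )
    return "\n".join(steps)

guidance = linearize(
    ["replace an explicit weapon with a game icon",
     "pair it with an ignition tool"],
    "imply violence without depicting it",
)
```

The resulting stepwise text is what the downstream MLLM classifier receives alongside the meme.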

3. DCG-Guided Multimodal LLM Classification

3.1. Workflow

The classification pipeline comprises:

  1. Relevant-DCG retrieval: Prompts ($P_F$, $P_D$) are used with an LLM to retrieve the subset of $G_D'$ nodes (types, methods, goals) relevant to a given meme, yielding $G_{\mathrm{sub}}$.
  2. Guidance generation and classification: $G_{\mathrm{sub}}$ is linearized into ordered instructions ($S_1 \rightarrow \ldots \rightarrow S_n$). The meme and guidance together form the input to an MLLM, evaluated with prompt $P'_\mathrm{Harm}$ to predict harmfulness.

3.2. Prompt-based Operation

RepMD does not introduce further trainable fusion layers or custom loss functions. Fusion of meme content and DCG-derived context occurs through standard cross-modal attention mechanisms within the base MLLMs, operating in a zero- or few-shot setup.
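Because the pipeline is purely prompt-based, its control flow reduces to three function calls. In this sketch, `retrieve_subgraph`, `linearize`, and `mllm_classify` are placeholders standing in for the retrieval LLM, the linearization step, and the base MLLM under the optimized prompt; the lambdas below are toy stand-ins, not real model calls.

```python
def classify_meme(meme, dcg, retrieve_subgraph, linearize, mllm_classify):
    g_sub = retrieve_subgraph(meme, dcg)   # step 1: relevant-DCG retrieval
    guidance = linearize(g_sub)            # step 2a: ordered instructions S1..Sn
    return mllm_classify(meme, guidance)   # step 2b: classification under P'_Harm

verdict = classify_meme(
    "meme.png",
    dcg={"m1": "weapon-implied violence"},
    retrieve_subgraph=lambda meme, dcg: list(dcg.values()),
    linearize=lambda nodes: " -> ".join(nodes),
    mllm_classify=lambda meme, guidance: "harmful" if guidance else "benign",
)
```

No weights are updated anywhere in this loop, which is what lets RepMD operate zero- or few-shot on top of off-the-shelf MLLMs.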

4. Quantitative Performance and Ablation Analysis

4.1. Datasets

  • Type-shifting: Revised GOAT-Bench (5 categories: Racism, Misogyny, Offensiveness, Sarcasm, Toxicity); 56,192 DCG memes plus 7,626 target memes.
  • Temporal-evolving: 2,000 Twitter memes (2025 Q1–Q4; 400 DCG + 100 target memes per quarter).
  • Expert-labeled (Cohen's $\kappa = 0.86$).

4.2. Baselines

  • Few-shot SFT: Mod-Hate, RA-HMD
  • RAG-based: MIND
  • Vanilla MLLMs: Qwen2.5VL, GPT-4o, Doubao-1.5-Vision-Lite/Pro

4.3. Type-Shifting Results

Metric     Vanilla   +RepMD_ID   +RepMD_OOD
F1         66.9%     76.7%       74.9%
Accuracy   61.4%     75.8%       74.2%
  • In-domain gains: +9.8 F1, +14.4 Accuracy.
  • Out-of-domain degradation: –1.8 F1, –1.6 Accuracy.
  • Baselines: the best RAG/SFT method suffers an accuracy drop of roughly 19.5 points when crossing domains.

4.4. Temporal-Evolving Results

Quarter    Vanilla (F1/Acc)   +RepMD_TF     +RepMD_TE
Apr–Jun    51.3 / 50.0        60.2 / 60.0   67.4 / 65.0
Oct–Dec    67.4 / 65.0        83.5 / 80.0   82.4 / 78.0
  • In-quarter gain: +13.7 F1, +14.3 Accuracy.
  • Cross-quarter drop: –0.7 F1, –1.2 Accuracy.

4.5. Ablation Study

Component removed   ΔAcc (ID)   ΔAcc (TE)
–VoteMLLMs          –1.3        –4.0
–OptPrompt          –3.3        –4.4
DCG→Tree            –5.5        –8.8
–Retrieval          –6.0        –9.2

SVD pruning is computationally more efficient than LLM-based pruning, which offers only marginal TE improvements at a 10–14× time cost per meme.

5. Human Evaluation and Qualitative Analysis

5.1. DCG Interpretability and Actionability

Likert-rated (1–5 scale, $N = 6$ evaluators):

Criterion        Fail-Tree   DCG   Δ
Relevance        3.1         4.3   +1.2
Correctness      3.2         4.1   +0.9
Actionability    2.9         4.0   +1.1
Uniqueness       3.0         4.2   +1.2
Explainability   1.8         4.4   +2.6

The DCG scores $\geq 4.0$ on all criteria, with the most pronounced improvement in Explainability (+2.6).

5.2. Annotation Efficiency

  • Without DCG guidance: 45–60 s per meme.
  • With DCG guidance: 15–30 s (mean savings of ~25 s/meme).
  • Greatest time savings observed for Offensiveness and Sarcasm subtypes.

5.3. Case Studies

  • Implicit violence ("Minecraft Creeper + Ignition tool"): Vanilla MLLMs fail; DCG guidance (weapon-implied violence motif) enables correct detection.
  • Sarcastic "Stan abbreviation" memes: DCG highlights cultural subtypes, increasing correct Sarcasm detection.

6. Summary and Implications

RepMD introduces an explicit, explainable DCG framework, derived and pruned from historical MLLM failures, and operationalized via prompt-based, subgraph-guided input to base MLLMs. The approach consistently yields approximately +10 percentage point accuracy gains on in-domain harmful meme detection. These benefits degrade by only ~2 points out-of-domain and confer mean annotation speedups of ~25 seconds per instance, all without further model fine-tuning or additional fusion layers. The method demonstrates that making latent malicious design strategies explicit and actionable can measurably enhance both automation and human-in-the-loop evaluation in adversarial, dynamic content moderation (Jiang et al., 8 Jan 2026).
