Few-Anomaly Paradigm
- The Few-Anomaly Paradigm is a framework for anomaly detection that operates with few or no anomaly samples, integrating few-shot learning and one-class modeling.
- It employs diverse methodologies including contrastive prompt learning, deviation-based losses, and generative diffusion to achieve high accuracy under minimal supervision.
- Its unified, theoretically informed approach delivers state-of-the-art results across various modalities such as images, text, graphs, and industrial data.
The Few-Anomaly Paradigm
The Few-Anomaly Paradigm refers to a set of approaches, architectures, and theoretical frameworks for data-efficient anomaly detection, explicitly designed to function under conditions where only a small number of anomaly examples (or, in extreme cases, only normal samples) are available at training time. These methods aim to provide robust and accurate detection, localization, and generation of anomalies despite the scarcity of supervision, operating at the interface of few-shot learning, one-class modeling, domain adaptation, and anomaly-driven generative modeling. The paradigm spans modalities including images, text, graphs, video, and multimodal industrial data.
1. Problem Scope and Formalization
The Few-Anomaly Paradigm is characterized by settings where the anomaly sample pool is small (typically a handful of examples per category or defect type) or, in the one-class case, empty during training. It is agnostic to modality and encompasses:
- One-Class Few-Shot Detection: Models trained with only a handful of normal examples, with the objective to distinguish anomalies at test time (e.g., PromptAD (Li et al., 2024), FoundAD (Zhai et al., 2 Oct 2025), CIF (Lin et al., 8 Nov 2025)).
- Semi-Supervised Few-Anomaly Detection: Methods leveraging a small set of labeled anomalies, potentially augmented with a large normal or unlabeled pool (e.g., AE-SAD (Angiulli et al., 2023), FATE (Das et al., 2023), graph/multimodal approaches (Ding et al., 2021, Li et al., 9 Oct 2025)).
- Few-Anomaly Generation: Diffusion-based and generative models trained with a handful of anomalies, aimed at producing realistic, diverse synthetic anomalies for data augmentation or downstream detection tasks (e.g., AnomalyDiffusion (Hu et al., 2023), GAA (Lu et al., 13 Jul 2025), AnoGen (Gui et al., 14 May 2025)).
- Multimodal and Meta-Learning Extensions: Approaches exploiting modalities such as images, depth, text, or embedding transfer/structural priors, including meta-learning for rapid adaptation with minimal supervision (Sun et al., 2021, Li et al., 9 Oct 2025).
Problem formulations typically involve defining training pools D_norm (normal) and D_anom (few anomalies), then learning a scoring function f or a generative function g that can distinguish or synthesize anomalies at test time, subject to tight sample constraints.
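A minimal sketch of this one-class formulation, with illustrative helper names (`fit_normal_pool`, `anomaly_score` are our own) and a simple Gaussian reference standing in for a learned model:

```python
import numpy as np

def fit_normal_pool(X_norm):
    # Estimate a simple Gaussian reference from the small normal pool D_norm.
    return X_norm.mean(axis=0), X_norm.std(axis=0) + 1e-8

def anomaly_score(x, mu, sigma):
    # Scoring function f: mean absolute z-deviation from the normal reference.
    return float(np.mean(np.abs((x - mu) / sigma)))

rng = np.random.default_rng(0)
X_norm = rng.normal(0.0, 1.0, size=(32, 8))  # a handful of normal samples
mu, sigma = fit_normal_pool(X_norm)
```

Samples far from the fitted reference receive higher scores, which is the behavior every concrete instantiation below refines.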
2. Core Methodologies
A range of methodologies have received empirical and theoretical support for their data efficiency and robustness:
a. Prompt-Based Contrastive Learning with Normal-Only Data
PromptAD (Li et al., 2024) implements contrastive prompt learning using CLIP-based visual and text encoders with no anomaly images. Synthetic negative prompts are created via semantic concatenation: learnable prompt tokens for each normal class are augmented with anomaly-related suffixes (manual or learnable). A combination of contrastive loss and explicit anomaly margin (EAM) ensures prompt features for normal and anomaly prompts are separated by a tunable margin, enhancing discriminability even with only normal samples. An alignment term further harmonizes synthetic and human-defined anomaly prompt representations.
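The explicit anomaly margin can be sketched as a hinge on cosine distances between an image feature and the two prompt families; the function names and the margin value here are illustrative assumptions, not PromptAD's actual implementation:

```python
import numpy as np

def l2_normalize(v):
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

def eam_loss(img_feat, normal_prompts, anomaly_prompts, margin=0.2):
    # Explicit anomaly margin (sketch): a normal image feature must be at
    # least `margin` closer (in cosine distance) to the normal prompts than
    # to the synthetic anomaly prompts.
    f = l2_normalize(img_feat)
    d_norm = 1.0 - l2_normalize(normal_prompts) @ f   # cosine distances
    d_anom = 1.0 - l2_normalize(anomaly_prompts) @ f
    return float(np.maximum(0.0, margin - (d_anom.min() - d_norm.min())))
```

The loss vanishes once anomaly prompts are pushed past the margin, so only violating prompts contribute gradient.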
b. Deviation-Based and Margin Losses
Deviation loss, as applied in the text (FATE (Das et al., 2023)) and graph (GDN (Ding et al., 2021)) domains, models anomaly scores as z-deviations from a fitted normal prior (Gaussian), enforcing that labeled anomalies deviate by at least a specified margin. Normals are tightly clustered near the reference mean, while anomalies are forced into the upper tail. This formulation permits efficient exploitation of a handful of anomalies (as few as one), regularizing the score space.
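A minimal sketch of a deviation-style loss under these assumptions (scores standardized against a fixed Gaussian prior; the 5-standard-deviation margin is illustrative):

```python
import numpy as np

def deviation_loss(scores, labels, mu_ref=0.0, sigma_ref=1.0, margin=5.0):
    # Normals (label 0) are pulled toward the reference mean; the few labeled
    # anomalies (label 1) are pushed at least `margin` standard deviations
    # into the upper tail of the score distribution.
    dev = (scores - mu_ref) / sigma_ref                  # z-deviations
    loss_normal = np.abs(dev) * (1 - labels)             # normals: |z| -> 0
    loss_anom = np.maximum(0.0, margin - dev) * labels   # anomalies: z >= margin
    return float(np.mean(loss_normal + loss_anom))
```

Because the anomaly term is a hinge, a single labeled anomaly already shapes the score space without requiring exhaustive anomaly coverage.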
c. Autoencoder Contrast and Repulsion
Reconstruction error-based architectures (AE-SAD (Angiulli et al., 2023)) with contrastive augmentation use a small set of known anomalies to penalize their reconstructions, often by feeding a transformed version of the input (e.g., the complement 1 − x) as the reconstruction target for anomalies. This sharpens separation between normal and anomalous reconstructions even in the presence of anomaly contamination.
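A sketch of this target-switching scheme, assuming inputs scaled to [0, 1] and using the complement 1 − x as the anomaly target; `sad_targets` is our own name, not AE-SAD's API:

```python
import numpy as np

def sad_targets(X, y):
    # Normals (y=0) are trained to reconstruct themselves; known anomalies
    # (y=1) are mapped to a transformed target (here the complement 1 - x),
    # so a well-trained decoder yields large reconstruction error on
    # anomalies at test time.
    y = y.reshape(-1, 1)
    return (1 - y) * X + y * (1.0 - X)

def reconstruction_score(x, x_hat):
    # Anomaly score: mean squared reconstruction error.
    return float(np.mean((x - x_hat) ** 2))
```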
d. Generative Diffusion with Embedding Control
Few-anomaly generation frameworks such as AnomalyDiffusion (Hu et al., 2023), GAA (Lu et al., 13 Jul 2025), and AnoGen (Gui et al., 14 May 2025) utilize latent or conditional diffusion models and learnable anomaly embeddings. Key mechanisms include disentanglement of anomaly appearance from spatial location via spatial anomaly embeddings, adaptive mask-guided loss, and multi-round anomaly-concept clustering. These methods can synthesize accurately aligned, high-fidelity image-mask pairs for downstream detection/segmentation.
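The mask-guided loss idea can be sketched as an up-weighted denoising objective inside the anomaly mask; the weight value and function name are illustrative assumptions, not values from the cited papers:

```python
import numpy as np

def mask_guided_loss(pred_noise, true_noise, mask, w_anom=5.0):
    # Sketch of a mask-guided diffusion objective: the denoising error inside
    # the anomaly mask is up-weighted so the few anomalous pixels dominate
    # training rather than being drowned out by background pixels.
    w = 1.0 + (w_anom - 1.0) * mask          # weight map: w_anom inside mask
    return float(np.mean(w * (pred_noise - true_noise) ** 2))
```

In a real latent-diffusion setup `pred_noise` would come from the frozen denoiser conditioned on learnable anomaly embeddings; here it is just an array.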
e. Meta-Learning and Structural Priors
Meta-learning (MAML-style) is employed in settings where anomalies may differ in signature or domain across related tasks/networks (e.g., graph-level (Li et al., 9 Oct 2025), node-level/graph-level (Ding et al., 2021), cross-domain video (Sun et al., 2021)). Few-shot adaptation is enabled by learning initializations that are sensitized to anomaly-relevant features and can be rapidly tuned with sparse supervision.
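A first-order MAML sketch on a toy linear task illustrates the inner/outer structure described above; all names and hyperparameters are illustrative:

```python
import numpy as np

def inner_adapt(theta, X, y, lr=0.1, steps=1):
    # MAML inner loop: a few gradient steps on the task's small support set
    # (linear model, squared loss).
    for _ in range(steps):
        grad = 2 * X.T @ (X @ theta - y) / len(y)
        theta = theta - lr * grad
    return theta

def maml_fo_step(theta, tasks, inner_lr=0.1, outer_lr=0.05):
    # First-order MAML outer step: adapt per task, then update the shared
    # initialization with the adapted parameters' query-set gradients.
    meta_grad = np.zeros_like(theta)
    for Xs, ys, Xq, yq in tasks:
        th = inner_adapt(theta, Xs, ys, lr=inner_lr)
        meta_grad += 2 * Xq.T @ (Xq @ th - yq) / len(yq)
    return theta - outer_lr * meta_grad / len(tasks)
```

The learned initialization is the object transferred across related anomaly-detection tasks; only the cheap inner loop runs on a new task's sparse supervision.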
3. Unified Theoretical Foundations
A central theoretical hypothesis, articulated in (Sadrani et al., 2024), is that the manifolds of anomaly-free data in deep representation space lie close to those of anomalous data for most real-world surface/material-structure categories. This proximity justifies:
- The transferability of zero- or few-anomaly models pretrained on large normal datasets;
- The efficacy of margin-/deviation-based losses, which can operate without exhaustive anomaly coverage;
- The composition of joint losses combining representation learning with explicit regularization of anomaly-normal distances/divergences (e.g., Jensen-Shannon divergence, Wasserstein distance).
This supports the empirical finding that domain-specific pretraining followed by correlation-regularized fine-tuning with a handful of anomaly samples reliably improves detection performance, even in one-class or few-anomaly regimes.
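A sketch of such a joint objective, pairing a representation loss with a Jensen-Shannon term that rewards separation of normal and anomaly score distributions (the trade-off weight `lam` and function names are illustrative):

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two discrete distributions.
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def joint_loss(repr_loss, p_norm_scores, p_anom_scores, lam=1.0):
    # Joint objective (sketch): minimize the representation loss while
    # maximizing the divergence between normal and anomaly score histograms.
    return float(repr_loss - lam * js_divergence(p_norm_scores, p_anom_scores))
```

The divergence term is bounded (JS is at most log 2), which keeps the regularizer from dominating the representation loss.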
4. Algorithmic Implementations and Architectures
Key architectures instantiated in the Few-Anomaly Paradigm include:
- CLIP or foundation model backbones (PromptAD (Li et al., 2024), FoundAD (Zhai et al., 2 Oct 2025)): visual encoders remain frozen, with adaptation occurring via prompt tokens or nonlinear projectors on embedding spaces.
- Autoencoder variants (AE-SAD (Angiulli et al., 2023)): encoders and decoders trained to reconstruct normals, with explicit repulsion or contrast incorporation of anomaly samples.
- Meta-learned GNNs (GDN (Ding et al., 2021), MA-GAD (Li et al., 9 Oct 2025)): GNNs with (meta-)parameters optimized for rapid deviation-based discriminability and with graph condensation to improve data efficiency.
- Diffusion models with learnable prompt/embedding control (AnomalyDiffusion (Hu et al., 2023), GAA (Lu et al., 13 Jul 2025), AnoGen (Gui et al., 14 May 2025)): exploit frozen large-scale priors, with minimal learnable tokens, for strong anomaly synthesis from sparse real data.
- Hypergraph-enabled memory banks (CIF (Lin et al., 8 Nov 2025)): use hypergraph-based clustering of local patch embeddings to construct structurally-aware, compact memory representations for few-shot multimodal detection.
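A compact memory bank can be sketched with greedy farthest-point (coreset) selection as a simple stand-in for CIF's hypergraph-based clustering; the helper names are our own:

```python
import numpy as np

def build_memory(patches, k):
    # Compact memory bank: greedily pick k patch embeddings that cover the
    # normal patch distribution (farthest-point / coreset selection).
    mem = [0]
    d = np.linalg.norm(patches - patches[0], axis=1)
    for _ in range(k - 1):
        i = int(np.argmax(d))            # farthest patch from current memory
        mem.append(i)
        d = np.minimum(d, np.linalg.norm(patches - patches[i], axis=1))
    return patches[mem]

def patch_score(x, memory):
    # Anomaly score of a test patch = distance to its nearest memory entry.
    return float(np.min(np.linalg.norm(memory - x, axis=1)))
```

Test patches near the covered normal clusters score low; patches far from every memory entry score high, which is the few-shot detection signal.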
5. Empirical Results and Benchmarks
The paradigm yields state-of-the-art results across diverse domains, as substantiated by AUROC, AUPR, PRO, and classification accuracy on widely used benchmarks. Selected highlights include:
| Method & Domain | Few-Shot Regime | Metric | Performance | Reference |
|---|---|---|---|---|
| PromptAD (image) | 1-/4-shot | AUROC | 94.6–96.6 on MVTec, 86.9–89.1 on VisA | (Li et al., 2024) |
| FoundAD (image) | 1-shot | I-/P-AUROC | 96.1% I-AUROC (MVTec-AD), 96.8% P-AUROC (VisA) | (Zhai et al., 2 Oct 2025) |
| AE-SAD (image/tabular) | s=8–128 samples | AUC | 0.99 (MNIST), 18/23 ODDS datasets SOTA | (Angiulli et al., 2023) |
| FATE (text) | 10–40 anomalies | AUROC | 87.1 (20NG), 93.7 (AG News), 96.8 | (Das et al., 2023) |
| AnomalyDiffusion (gen) | 1–5 shots/class | AUROC | 99.2 image, 99.1 pixel (localization) | (Hu et al., 2023) |
| GAA (gen) | 3–5 shots | Pixel-AUROC | 96.3% (MVTec AD); classification acc. 84.7% | (Lu et al., 13 Jul 2025) |
| GDN/MA-GAD (graph) | 1–8-shot | ROC-AUC | up to 0.991 (AIDS), 5–15pts over baselines | (Ding et al., 2021, Li et al., 9 Oct 2025) |
| CIF (multimodal) | 1–4-shot | I-AUROC | 72.0% (MVTec 3D-AD), +12.1pp over prior | (Lin et al., 8 Nov 2025) |
Notably, many methods match or exceed the performance of many-shot supervised models with only 1–4 anomaly or normal exemplars per class.
6. Limitations, Practical Considerations, and Research Directions
While the Few-Anomaly Paradigm offers substantial data efficiency and theoretical justification, certain open challenges and limitations are recognized:
- Logical/complex anomalies: Methods based on synthetic prompt suffixes (e.g., PromptAD) may underperform on logical anomalies (part swaps/missing structures) not amenable to simple surface-text representations (Li et al., 2024).
- Resolution and granularity: Extremely small defects or subtle out-of-distribution signals may remain undetected, especially when encoder tokenization or prompt architecture is coarse.
- Domain shift: Success depends on the proximity of normal and anomaly manifolds; large domain gaps (e.g., in medical imaging) pose generalization risks, partially mitigated by domain-specific pretraining (Sadrani et al., 2024).
- Hyperparameter sensitivity: Choice of loss weights, margins, and balance terms (e.g., in deviation/correlation regularization) can affect few-shot performance.
- Computational considerations: Large-scale generative models (diffusion) often freeze all weights except a small subset; memory- and compute-efficient deployment is an active area, especially in edge/industrial scenarios.
- Unexplored modalities and tasks: Extensions to complex data such as point clouds, pure 3D, or cross-modal fusion remain under active investigation (Lin et al., 8 Nov 2025).
Future work emphasizes richer anomaly prompt mining, memory-efficient structure modeling, extension to continual/multi-class adaptation, and advanced meta-learning schedules.
7. Integrative Significance
The Few-Anomaly Paradigm unifies efforts across supervised/unsupervised anomaly detection, generative augmentation, transfer/meta-learning, and structure-based representation into an effective, scalable framework for detection, localization, and reasoning under profound data scarcity. This approach reduces dependence on large labeled anomaly corpora and is validated across vision, text, graph, and multimodal industrial applications. Its data-driven, theory-informed methodology is poised to become a standard for industrial and scientific anomaly detection settings where real anomaly events are rare but reliability is paramount (Li et al., 2024, Hu et al., 2023, Lin et al., 8 Nov 2025).