
Patch-SVDD for Anomaly Detection

Updated 7 February 2026
  • The paper introduces Patch SVDD, an unsupervised framework that achieves 0.921 detection and 0.957 segmentation AUROC on the MVTec AD benchmark.
  • It employs dual-scale hierarchical encoding with 32×32 and 64×64 patches, using a pull-together loss and context-prediction to generate robust, fine-grained representations.
  • The approach aggregates patch-level anomaly scores into image-level detection scores and pixel-level segmentation maps, balancing local clustering against feature informativeness via a weighting hyperparameter.

Patch SVDD (Patch-level Support Vector Data Description) is an unsupervised anomaly detection and segmentation framework for images, designed to enable fine-grained localization of anomalies by extending the deep learning variant of SVDD to operate at the patch level and incorporating a self-supervised learning component. The method achieves state-of-the-art results on the MVTec AD benchmark, substantially improving both anomaly detection and segmentation performance relative to prior approaches (Yi et al., 2020).

1. Mathematical Foundation

Patch SVDD evolves from classical SVDD, which seeks the minimal-volume hypersphere that contains the embeddings of normal samples in feature space. In the deep SVDD model, standard kernel mappings are replaced with neural encoders, typically yielding the objective:

$$L_{\text{SVDD}} = \sum_{i=1}^{N} \| f_\theta(x_i) - c \|^2$$

where $f_\theta(x)$ is the learned embedding and $c$ is the centroid of normal embeddings.

A naïve patch-based extension directly applies this principle to overlapping image patches; however, heterogeneity among patches (e.g., object, background, texture) leads to high intra-class variation, making a single-centroid approach unsuitable for dense anomaly localization.

Patch SVDD circumvents this by introducing a “pull-together” loss between spatially adjacent patches:

$$L_{\text{SVDD}}' = \| f_\theta(p) - f_\theta(p') \|_2$$

where $p'$ is a neighbor of $p$ in the 3×3 grid, thereby encouraging locally similar patch embeddings to cluster without enforcing a global unimodal structure.

To ensure meaningful, non-collapsed representations, Patch SVDD integrates a self-supervised context-prediction task. Given two patches $p, p_2$ from the 3×3 neighborhood with ground-truth relative position $y \in \{1, \dots, 8\}$, a classifier $C_\phi$ predicts $y$ from their embedding difference:

$$L_{\text{SSL}} = \mathrm{CrossEntropy}\big(y,\; C_\phi(f_\theta(p), f_\theta(p_2))\big)$$

The total loss for training is:

$$L_{\text{Patch SVDD}}(\theta, \phi) = \lambda L_{\text{SVDD}}'(\theta) + L_{\text{SSL}}(\theta, \phi)$$
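The two loss terms and their weighted sum can be sketched in NumPy. This is a minimal illustration under simplifying assumptions, not the paper's implementation: the encoder is abstracted away (the functions take precomputed embeddings), the classifier $C_\phi$ is reduced to a single linear layer (`W`, `b`) rather than the paper's two-layer MLP, and relative positions are indexed 0–7 here.

```python
import numpy as np

def svdd_prime_loss(z_p, z_neighbor):
    # Pull-together loss: L2 distance between embeddings of adjacent patches.
    return float(np.linalg.norm(z_p - z_neighbor))

def ssl_loss(z_p, z_p2, W, b, y):
    # Context-prediction loss: a (single-layer, for brevity) classifier maps
    # the embedding difference to logits over the 8 relative positions, then
    # takes the cross-entropy against the true position y.
    logits = W @ (z_p - z_p2) + b
    logits = logits - logits.max()              # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[y])

def patch_svdd_loss(z_p, z_neighbor, z_p2, W, b, y, lam=0.5):
    # Total objective: lambda * L_SVDD' + L_SSL.
    return lam * svdd_prime_loss(z_p, z_neighbor) + ssl_loss(z_p, z_p2, W, b, y)
```

With zero-initialized classifier weights, the SSL term equals $\ln 8$ (a uniform guess over the eight positions), which is a convenient sanity check during training.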

2. Architecture and Self-Supervision

The encoder $f_\theta$ consists entirely of convolutional layers without biases, each followed by LeakyReLU activations with slope $\alpha = 0.1$. Patch SVDD employs a two-level hierarchical encoder, summarized as follows:

  • "Small" encoder ($f_{\text{small}}$): receptive field of 32 pixels; operates on $32 \times 32$ patches.
  • "Big" encoder ($f_{\text{big}}$): processes $64 \times 64$ patches by subdividing each into four $32 \times 32$ sub-patches, embedding each with $f_{\text{small}}$, and aggregating via concatenation followed by $1 \times 1$ convolutions.
  • Embedding dimensionality $D = 64$.

The self-supervised classifier $C_\phi$ is a two-layer MLP with 128 hidden units, operating on embedding differences. To suppress spurious color cues, color channels are randomly perturbed during patch sampling.
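The hierarchical aggregation can be sketched as follows. This is a stand-in, not the paper's network: fixed random linear maps (`PROJ`, `AGG`) replace the learned bias-free convolutional stacks with LeakyReLU, and `AGG` plays the role of the $1 \times 1$ aggregation convolutions; only the split-embed-concatenate-aggregate structure is faithful.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # embedding dimensionality used in the paper

# Fixed random linear maps stand in for the learned convolutional encoders.
PROJ = rng.standard_normal((D, 32 * 32 * 3)) / np.sqrt(32 * 32 * 3)
AGG = rng.standard_normal((D, 4 * D)) / np.sqrt(4 * D)  # 1x1-conv stand-in

def f_small(patch32):
    # Embed a 32x32x3 patch into a D-dimensional vector.
    return PROJ @ patch32.reshape(-1)

def f_big(patch64):
    # Split a 64x64 patch into four 32x32 sub-patches, embed each with
    # f_small, concatenate the four embeddings, and aggregate back to D dims.
    subs = [patch64[i:i + 32, j:j + 32] for i in (0, 32) for j in (0, 32)]
    return AGG @ np.concatenate([f_small(s) for s in subs])
```

Because $f_{\text{big}}$ reuses $f_{\text{small}}$ on its sub-patches, both scales share parameters, which is the weight-sharing the ablations credit with a regularizing inductive bias.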

3. Training Methodology

Training is conducted exclusively on normal images resized to $256 \times 256$:

  • For $f_{\text{big}}$, extract overlapping $64 \times 64$ patches (stride 16).
  • For $f_{\text{small}}$, extract $32 \times 32$ patches (stride 4).
  • Each iteration samples a patch $p$, a neighbor $p'$ (for $L_{\text{SVDD}}'$), and another local patch $p_2$ (for $L_{\text{SSL}}$).
  • Optimization uses Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$), learning rate $10^{-4}$, batch size 256, and 50 epochs.
  • No geometric augmentations (e.g., flips, rotations) are used.
  • The hyperparameter $\lambda$ is selected per class; smaller values ($0.1$ to $0.5$) for object classes, larger values ($\lambda \geq 1$) for textures.
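The per-iteration sampling of a patch and a labeled neighbor from its 3×3 grid can be sketched as below. This is an illustrative sampler under assumed conventions (positions indexed 0–7, neighbors exactly one patch-size step away); the paper's exact sampling code may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# The 8 relative positions in the 3x3 neighborhood around a patch,
# as (row offset, column offset), excluding the center itself.
OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def sample_training_pair(image, patch_size=32):
    # Sample an anchor patch p and a neighbor p2 from the 3x3 grid around it;
    # y is the ground-truth relative position for the context-prediction task.
    H, W, _ = image.shape
    step = patch_size  # neighbors are one patch-size step away
    r = int(rng.integers(step, H - 2 * step))   # keep all neighbors in-bounds
    c = int(rng.integers(step, W - 2 * step))
    y = int(rng.integers(len(OFFSETS)))
    dr, dc = OFFSETS[y]
    p = image[r:r + patch_size, c:c + patch_size]
    p2 = image[r + dr * step:r + dr * step + patch_size,
               c + dc * step:c + dc * step + patch_size]
    return p, p2, y
```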

4. Anomaly Detection and Segmentation Pipeline

After training, all normal patches from the training set are encoded and stored for nearest-neighbor searches, forming databases $S_{\text{big}}$ and $S_{\text{small}}$.

For a test image:

  • Overlapping patches are extracted with the same strides and embedded via both encoders.
  • The patch anomaly score is $A^{\text{patch}}(p) = \min_{h \in S} \| f(p) - h \|_2$, computed independently at the small and big scales.
  • Pixel-level anomaly maps $M_{\text{big}}$ and $M_{\text{small}}$ are derived by distributing patch scores to pixels and averaging over coverage; they are fused via element-wise multiplication: $M_{\text{multi}} = M_{\text{big}} \odot M_{\text{small}}$.
  • The global image anomaly score is $A^{\text{image}}(x) = \max_{i,j} M_{\text{multi}}(x)_{i,j}$.
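The scoring steps above reduce to a few array operations, sketched here in NumPy. The brute-force pairwise distance is for clarity only; the score-to-pixel averaging step is omitted, and in practice an approximate nearest-neighbor index would replace the exhaustive search.

```python
import numpy as np

def patch_scores(test_embs, normal_db):
    # Nearest-neighbor anomaly score per patch: L2 distance to the closest
    # normal-patch embedding in the database (brute force over all pairs).
    d = np.linalg.norm(test_embs[:, None, :] - normal_db[None, :, :], axis=-1)
    return d.min(axis=1)

def fuse_maps(m_big, m_small):
    # Multi-scale fusion of pixel-level anomaly maps: element-wise product.
    return m_big * m_small

def image_score(m_multi):
    # Global image-level anomaly score: maximum over the fused map.
    return float(m_multi.max())
```

The element-wise product means a pixel is flagged only when both scales agree it is anomalous, which suppresses single-scale false positives.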

This design supports both image-level and fine-grained segmentation ROC analyses.

5. Experimental Results

Patch SVDD was evaluated on the MVTec AD dataset (15 industrial classes, comprising both objects and textures). Metrics are per-class AUROC for both detection and segmentation.

| Method | Detection AUROC | Segmentation AUROC |
|---|---|---|
| Deep SVDD (ICML '18) | 0.592 | |
| GEOM (NeurIPS '18) | 0.672 | |
| GANomaly (ACCV '18) | 0.762 | |
| ITAE (arXiv '19) | 0.839 | |
| L₂-AE | | 0.804 |
| SSIM-AE | | 0.818 |
| VE-VAE (CVPR '20) | | 0.861 |
| VAE Proj (ICLR '20) | | 0.893 |
| Patch SVDD (Yi et al., 2020) | 0.921 | 0.957 |

Patch SVDD improves detection AUROC by 9.8% and segmentation AUROC by 7.0% in relative terms over the best prior results (ITAE for detection, VAE Proj for segmentation).

6. Functional Analysis and Ablation Insights

  • Replacing the single-center loss with the "pull-together" loss ($L_{\text{SVDD}}'$) improves AUROC, and adding the context-prediction loss ($L_{\text{SSL}}$) yields the highest performance.
  • Ablation studies indicate that object classes, characterized by high intra-class patch variation, benefit more from self-supervised context prediction, while texture classes are less sensitive to this term.
  • Feature visualizations (t-SNE) show uni-modal clusters without $L_{\text{SSL}}$, and semantically meaningful, multi-modal clusters when both losses are used. The lowest intrinsic feature dimension occurs when both components are combined.
  • Hierarchical multi-scale encoding (combining the $64 \times 64$ and $32 \times 32$ scales) outperforms either scale alone and single-level multi-branch architectures, suggesting that shared sub-encoders induce regularization and a useful inductive bias.
  • The selection of $\lambda$ balances local clustering against feature informativeness; optimal values differ per class.
  • Embedding dimensionality exhibits diminishing returns beyond $D = 64$.
  • Surprisingly, nearest-neighbor anomaly detection on random CNN features or even raw pixel patches suffices for certain classes: for one-layer convolutions, Euclidean distances in feature space and pixel space are closely related.

7. Limitations and Potential Extensions

Patch SVDD requires maintaining large databases of normal patch embeddings and performing approximate nearest-neighbor search at inference time, which can be both memory- and compute-intensive. Hyperparameter tuning of $\lambda$ is manual and dataset-dependent. Extending the approach to additional scales, or incorporating further self-supervised objectives (e.g., rotation or jigsaw prediction), are plausible routes toward increased robustness and generalization (Yi et al., 2020).

Code and pretrained models are publicly available, facilitating adaptation and reproduction for diverse anomaly detection pipelines.
