Retrieval-based Anomaly Detection (RAD)
- Retrieval-based Anomaly Detection (RAD) is a nonparametric, memory-driven method that uses similarity matching between test samples and anomaly-free exemplars.
- It embeds data into a feature space via pretrained encoders and retrieves nearest neighbors to compute anomaly scores based on similarity gaps.
- RAD applies across modalities like vision, logs, time series, and tabular data, achieving state-of-the-art performance in unsupervised and few-shot settings.
Retrieval-based Anomaly Detection (RAD) refers to a family of nonparametric, memory-driven methods that forgo traditional model fitting in favor of explicit similarity-based matching between test data and a repository of anomaly-free exemplars. These approaches operate across modalities—including vision, logs, time series, and tabular data—by retrieving nearest-neighbor representations in an embedding space and using these matches to compute anomaly scores. Recent research on RAD demonstrates that such memory-centric mechanisms can yield state-of-the-art performance in unsupervised, few-shot, and zero-shot anomaly detection regimes, often outperforming or theoretically subsuming reconstructor-based or classifier-based methods (Zhang et al., 30 Jan 2026, Xu et al., 31 Jan 2026, No et al., 2023, Thimonier et al., 2024, Maru et al., 2 Jun 2025, Pan et al., 2023).
1. Core Principles and Methodological Foundations
All retrieval-based anomaly detection methods share two defining elements:
- Embedding and Memory Construction: An embedding function (often a frozen or pretrained encoder such as ViT, CLIP, BERT, or a deep tabular transformer) maps data samples into a feature space. A memory bank (or vector database) stores embeddings of anomaly-free (normal) data—at various granularities, including images, patches, time windows, or token-level logs (Zhang et al., 30 Jan 2026, Xu et al., 31 Jan 2026, No et al., 2023, Thimonier et al., 2024, Pan et al., 2023).
- Retrieval and Scoring: During inference, each query (test) sample is embedded and matched against the memory to find one or more nearest neighbors according to a specified similarity metric (cosine, dot-product, attention, cross-correlation, token-level maxSim, etc.). The anomaly score is then computed as a function of the distance or similarity gap—lower similarity to any normal memory item increases the anomaly likelihood (Zhang et al., 30 Jan 2026, No et al., 2023, Pan et al., 2023, Maru et al., 2 Jun 2025).
This process can be implemented in various data regimes:
- Pixel/patch (local) and image/window (global) for vision and time series (Zhang et al., 30 Jan 2026, Xu et al., 31 Jan 2026)
- Token-level and sequence-level for log analysis (No et al., 2023, Pan et al., 2023)
- Feature-vector and sample-level for tabular datasets (Thimonier et al., 2024)
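The embed-store-retrieve-score loop above can be sketched in a few lines. This is a minimal illustration, not any paper's implementation: the identity `encoder` on 2-D features is a stand-in for a real pretrained embedding model, and plain L2 distance stands in for whichever similarity metric a given method uses.

```python
import numpy as np

def build_memory(normal_samples, encoder):
    """Embed anomaly-free samples and stack them into a memory bank."""
    return np.stack([encoder(x) for x in normal_samples])

def anomaly_score(query, memory, encoder, k=1):
    """Mean L2 distance to the k nearest normal embeddings:
    lower similarity to every memory item -> higher anomaly score."""
    q = encoder(query)
    dists = np.linalg.norm(memory - q, axis=1)
    return float(np.sort(dists)[:k].mean())

# Toy demo: identity "encoder" on 2-D features stands in for a pretrained model.
encoder = lambda x: np.asarray(x, dtype=float)
memory = build_memory([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], encoder)
normal_q = anomaly_score([0.1, 0.0], memory, encoder)     # close to an exemplar
anomalous_q = anomaly_score([5.0, 5.0], memory, encoder)  # far from all exemplars
assert anomalous_q > normal_q
```

Everything else in RAD—multi-level memories, core-set filtering, attention-based metrics—is a refinement of this loop.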
2. Architectures and Retrieval Strategies
Vision and Multimodal RAD
In vision, RAD methods such as those in (Zhang et al., 30 Jan 2026) and (Xu et al., 31 Jan 2026) employ two-level memory architectures: global descriptors for image-level retrieval and patch-level (or pixel-level) representations for fine-grained localization. For example, a frozen backbone (ViT or CLIP) is used to extract multiscale features, which are stored per layer and per patch. In MRAD (Xu et al., 31 Jan 2026), both image-level and pixel-level feature-label pairs are stored, enabling retrieval-based anomaly classification as well as segmentation.
During inference, the query's global descriptor is first matched to select top-K candidate images or memory entries ("global retrieval"), followed by local matching of query patch descriptors to corresponding candidate regions ("spatially-conditioned patch retrieval"). Anomaly scoring then aggregates distance metrics (e.g., over candidate patches/layers) (Zhang et al., 30 Jan 2026, Xu et al., 31 Jan 2026).
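A minimal sketch of this two-stage scheme follows. The `two_stage_retrieval` helper, the array shapes, and the L2 metric are illustrative assumptions, not the exact pipeline of either cited method:

```python
import numpy as np

def two_stage_retrieval(global_q, patch_q, global_mem, patch_mem, top_k=2):
    """Stage 1: pick top_k memory images by global-descriptor distance.
    Stage 2: score each query patch only against patches of those candidates.
    Shapes: global_q (D,), patch_q (P, D), global_mem (N, D), patch_mem (N, P, D)."""
    g_dists = np.linalg.norm(global_mem - global_q, axis=1)
    cand = np.argsort(g_dists)[:top_k]                       # candidate image indices
    cand_patches = patch_mem[cand].reshape(-1, patch_q.shape[1])
    # Per-patch anomaly map: distance to the nearest candidate patch.
    d = np.linalg.norm(patch_q[:, None, :] - cand_patches[None, :, :], axis=2)
    patch_scores = d.min(axis=1)                             # (P,) localization map
    return patch_scores, float(patch_scores.max())           # image score = max patch

# Toy demo: a query close to memory image 0 gets a low image-level score.
rng = np.random.default_rng(1)
global_mem = rng.normal(size=(4, 8))
patch_mem = rng.normal(size=(4, 6, 8))
patch_scores, img_score = two_stage_retrieval(global_mem[0] + 0.01,
                                              patch_mem[0] + 0.01,
                                              global_mem, patch_mem)
assert img_score < 0.1
```

The global stage keeps the expensive patch matching restricted to a handful of plausible candidates, which is what makes fine-grained localization tractable at inference time.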
Log and Textual RAD
For unstructured or semi-structured logs, RAPID (No et al., 2023) leverages pre-trained LLMs (e.g., BERT) to generate sequence- and token-level embeddings. Token-level late-interaction similarity ("maxSim") is computed between query and candidate normal logs, allowing the method to localize subtle, single-token deviations. To improve computational efficiency, a "core-set" of candidate normals is selected by fast CLS-token similarity before running full token-level similarity scoring (No et al., 2023).
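The core-set prefilter plus token-level late interaction can be illustrated as follows. The `score_log` helper, the mean-pooled stand-in CLS vectors, and the random token embeddings are assumptions for the sketch, not RAPID's actual code:

```python
import numpy as np

def max_sim(query_toks, cand_toks):
    """Late interaction: each query token matches its best candidate token."""
    qn = query_toks / np.linalg.norm(query_toks, axis=1, keepdims=True)
    cn = cand_toks / np.linalg.norm(cand_toks, axis=1, keepdims=True)
    return float((qn @ cn.T).max(axis=1).sum())

def score_log(query_toks, query_cls, mem_toks, mem_cls, core_k=2):
    """Cheap CLS-similarity prefilter selects a core set, then full token-level
    maxSim runs on it. Score is the negated best match (higher = more anomalous)."""
    cls_sim = mem_cls @ query_cls / (np.linalg.norm(mem_cls, axis=1)
                                     * np.linalg.norm(query_cls))
    core = np.argsort(-cls_sim)[:core_k]
    return -max(max_sim(query_toks, mem_toks[i]) for i in core)

# Toy demo: a near-duplicate of a stored normal log scores lower than an unseen one.
rng = np.random.default_rng(2)
mem_toks = [rng.normal(size=(5, 4)) for _ in range(3)]   # 3 normal logs, 5 tokens each
mem_cls = np.stack([t.mean(axis=0) for t in mem_toks])   # mean-pooled stand-in CLS
q_seen = mem_toks[0] + 0.01
score_seen = score_log(q_seen, q_seen.mean(axis=0), mem_toks, mem_cls)
q_new = rng.normal(size=(5, 4))
score_unseen = score_log(q_new, q_new.mean(axis=0), mem_toks, mem_cls)
assert score_seen < score_unseen
```

Because the score is a sum of per-token maxima, a single deviant token lowers exactly one term, which is what lets the method localize single-token deviations.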
RAGLog (Pan et al., 2023) applies transformer-based embedding plus LLM classification. Retrieved nearest-neighbor normal logs are supplied as context to an LLM prompt, which outputs a normal/abnormal judgment in a retrieval-augmented zero-shot QA configuration.
Time Series RAD
Retrieval-augmented time series foundation models (e.g., RATFM (Maru et al., 2 Jun 2025)) retrieve similar normal time windows (using normalized cross-correlation) for each test window, then adapt a pretrained forecasting model to condition on the retrieved example. The absolute error between the forecasted and observed values is used as the anomaly score.
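A sketch of the retrieval step, using normalized cross-correlation to pick the conditioning window; the forecasting model itself is omitted, and the sinusoidal windows are illustrative:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-length windows."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def retrieve_window(test_win, normal_windows):
    """Pick the normal window most correlated with the test window; the
    forecaster then conditions on it, and |x_t - x_hat_t| is the score."""
    return max(normal_windows, key=lambda w: ncc(test_win, w))

# Toy demo: a slightly phase-shifted sine retrieves the matching normal window.
t = np.arange(64)
normals = [np.sin(0.2 * t), np.cos(0.2 * t), np.sin(0.5 * t)]
best = retrieve_window(np.sin(0.2 * t + 0.05), normals)
assert np.allclose(best, normals[0])
```

Mean/variance normalization inside `ncc` makes retrieval insensitive to level and amplitude shifts, so matching is driven by the window's shape.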
Tabular Data RAD
In structured data, retrieval modules are incorporated into deep masked-feature autoencoding pipelines. As in (Thimonier et al., 2024), a transformer reconstructs masked features, optionally augmented with retrieval modules (KNN, attention-based) over the encoded batch. Sample-to-sample dependencies (not just within-feature) are exploited by mixing the target encoding with a weighted sum of nearest neighbor sample encodings, thereby improving anomaly localization power.
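A minimal sketch of such sample-to-sample mixing over an encoded batch; the `retrieval_augment` name, the fixed mixing weight `alpha`, and plain KNN (rather than a learned attention metric) are illustrative assumptions:

```python
import numpy as np

def retrieval_augment(enc, k=2, alpha=0.5):
    """enc: (N, D) encoded batch. Mix each sample's encoding with the mean
    encoding of its k nearest neighbours (self excluded), so the reconstructor
    sees sample-to-sample context, not just within-feature structure."""
    d = np.linalg.norm(enc[:, None, :] - enc[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # never retrieve the sample itself
    nn = np.argsort(d, axis=1)[:, :k]          # (N, k) neighbour indices
    neigh_mean = enc[nn].mean(axis=1)          # (N, D) retrieved context
    return alpha * enc + (1 - alpha) * neigh_mean

# Toy demo: an outlier's encoding is pulled toward its retrieved neighbours.
enc = np.array([[0.0, 0.0], [0.0, 0.0], [4.0, 4.0]])
mixed = retrieval_augment(enc)
assert np.allclose(mixed[2], [2.0, 2.0])
```

For an anomalous row, the retrieved context disagrees with the row's own encoding, which inflates the masked-feature reconstruction error and sharpens detection.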
3. Mathematical Formulations and Scoring Functions
The mathematical basis for retrieval-based scoring is explicit in all domains.
- Core Distance-Based Anomaly Score:

$$s_{\mathrm{RAD}}(x) = \min_{m \in \mathcal{M}} d\big(f(x), m\big)$$

where $f(x)$ is the query embedding and $\mathcal{M}$ is the memory bank (Zhang et al., 30 Jan 2026).

- For vision, patch-level scoring (after global filtering and neighborhood restriction):

$$s(p) = \min_{m \in \mathcal{M}_{\mathcal{N}(p)}} d\big(f(p), m\big), \qquad s_{\mathrm{img}} = \max_{p} s(p)$$

where $\mathcal{M}_{\mathcal{N}(p)}$ restricts candidates to patch embeddings in a spatial neighborhood of $p$.

- For logs with token-level similarity:

$$\mathrm{maxSim}(q, c) = \sum_{i} \max_{j} \cos(q_i, c_j)$$

and the sequence-level anomaly score is the negated best match over the candidate (core-set) normals $C$:

$$s(q) = -\max_{c \in C} \mathrm{maxSim}(q, c)$$

- For time series, after retrieval and forecasting:

$$s_t = \lvert x_t - \hat{x}_t \rvert$$

where $\hat{x}_t$ is the forecast conditioned on the retrieved normal window.

- For tabular data, the anomaly score is the masked-feature reconstruction error averaged over the mask bank $\mathcal{K}$:

$$s(x) = \frac{1}{\lvert \mathcal{K} \rvert} \sum_{M \in \mathcal{K}} \big\lVert \hat{x}_{M} - x_{M} \big\rVert^{2}$$

where $\hat{x}_{M}$ denotes the reconstruction of the features masked by $M$.
4. Theoretical Guarantees, Analysis, and Limitations
RAD offers important theoretical advantages compared to reconstructor-based anomaly detectors.
- Upper-bound of Reconstruction Residuals: The canonical retrieval score always upper-bounds reconstruction-residual scores when the same encoder and anomaly-free set are used. For any nonnegative $1$-Lipschitz anomaly score $s$ vanishing on the normal set $\mathcal{N}$, $s(x) \le \min_{n \in \mathcal{N}} d(x, n) = s_{\mathrm{RAD}}(x)$ (Zhang et al., 30 Jan 2026).
- Non-expansiveness: The RAD anomaly score is non-expansive, with $\lvert s_{\mathrm{RAD}}(x) - s_{\mathrm{RAD}}(x') \rvert \le d(x, x')$. Thus, retrieval does not amplify benign perturbations, addressing the "fidelity–stability dilemma" in reconstructor-based approaches (Zhang et al., 30 Jan 2026).
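The non-expansiveness property is easy to sanity-check numerically, since a minimum over 1-Lipschitz distance maps is itself 1-Lipschitz (toy random data; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
memory = rng.normal(size=(50, 8))              # anomaly-free embeddings

def s_rad(x):
    """Canonical RAD score: distance to the nearest memory item."""
    return np.linalg.norm(memory - x, axis=1).min()

# |s(x) - s(x')| <= ||x - x'|| for every pair: perturbing the input can never
# change the score by more than the size of the perturbation.
for _ in range(1000):
    x, xp = rng.normal(size=8), rng.normal(size=8)
    assert abs(s_rad(x) - s_rad(xp)) <= np.linalg.norm(x - xp) + 1e-9
```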
Limitations include:
- Memory Overhead: Storage grows with memory size and patch/sample granularity (Zhang et al., 30 Jan 2026, Xu et al., 31 Jan 2026, No et al., 2023).
- Retrieval Cost: Nearest neighbor search, particularly at local/pixel level, introduces inference latency compared to single-pass methods (Zhang et al., 30 Jan 2026).
- Semantic Gaps: For vision, large geometric or semantic shifts may yield false positives; in text/logs, unseen word compositions can degrade retrieval (No et al., 2023).
- Scalability: For high-throughput log or streaming scenarios, hybrid pre-filtering or efficient vector-database indexing is often required (Pan et al., 2023, No et al., 2023).
5. Empirical Performance and Benchmarks
RAD matches or surpasses state-of-the-art baselines across multiple domains:
| Modality | Benchmark | RAD Variant | Main Metric(s) | Performance | Top Comparative Baseline |
|---|---|---|---|---|---|
| Vision (MUAD) | MVTec-AD | RAD (Zhang et al., 30 Jan 2026) | Pixel AUROC | 98.5% | Dinomaly: 98.4% |
| Vision (MUAD) | VisA, Real-IAD | RAD (Zhang et al., 30 Jan 2026) | Pixel F1-max | +2–10 pts over baselines | MambaAD, WinCLIP, IIPAD |
| Vision (MUAD) | 3D-ADAM | RAD (Zhang et al., 30 Jan 2026) | Pixel F1-max | 12.5 pts > Dinomaly | Dinomaly: 20.0 F1-max |
| Logs | BGL, Thunderbird | RAPID (No et al., 2023) | F1 | 0.9999 (BGL) | LogPal: 0.9900 (BGL) |
| Logs | Thunderbird | RAGLog (Pan et al., 2023) | F1 | 0.93 (Thund.) | LogPrompt: 0.38 (zero-shot) |
| Time Series | UCR Anomaly Archive | RATFM (Maru et al., 2 Jun 2025) | VUS-ROC/F1 | 76.1/13.2% (Time-MoE) | In-domain FT: 79.1/16.1% |
| Tabular | 31 real-world/ODDS | att-bsim (Thimonier et al., 2024) | F1/AUROC | 58.6%/84.4% | GOAD, DROCC, KNN, NPT-AD |
Ablation studies universally highlight the importance of multi-level memories, core-set or neighborhood restriction, and attention-based aggregation (Zhang et al., 30 Jan 2026, Thimonier et al., 2024, No et al., 2023, Xu et al., 31 Jan 2026).
6. Variants, Ablations, and Practical Deployment
Retrieval-based frameworks admit multiple variants and optimization techniques:
- Metric Learning and Prompt-Integration: In MRAD-FT and MRAD-CLIP, the retrieval metric is fine-tuned and prompt biases for CLIP are dynamically adjusted based on retrieved anomaly priors, leading to consistent 1–2 pt AUROC improvements even with minimal additional parameters (Xu et al., 31 Jan 2026).
- Attention-Based Retrieval: Attention-bsim (learned metric) on transformer encodings robustly improves F1 and AUROC beyond static KNN or vanilla self-attention for tabular data (Thimonier et al., 2024).
- Locality and Multi-Example Retrieval: Limiting retrieval to small spatial neighborhoods or expanding to multi-nearest-neighbors (k-NN) further improves both accuracy and efficiency (Zhang et al., 30 Jan 2026, Maru et al., 2 Jun 2025).
- Post-processing: Moving average smoothing of anomaly scores (via FFT-estimated periodicity) suppresses false positives, especially in periodic signals (Maru et al., 2 Jun 2025).
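A sketch of this post-processing step; the peak-picking heuristic and the fallback window size are assumptions, not the cited paper's exact procedure:

```python
import numpy as np

def smooth_scores(scores, default_window=5):
    """Estimate the dominant period from the FFT magnitude spectrum, then
    smooth anomaly scores with a moving average of that length, which damps
    periodic false positives."""
    x = scores - scores.mean()
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x))
    k = spec[1:].argmax() + 1                     # dominant non-DC frequency bin
    period = int(round(1.0 / freqs[k])) if freqs[k] > 0 else default_window
    kernel = np.ones(max(period, 1)) / max(period, 1)
    return np.convolve(scores, kernel, mode="same")

# Toy demo: purely periodic score fluctuations are averaged away.
scores = np.sin(np.linspace(0, 20 * np.pi, 200))  # 10 "periodic false alarms"
smoothed = smooth_scores(scores)
assert smoothed.std() < scores.std()
```

Averaging over exactly one estimated period cancels periodic components while leaving sustained score elevations (genuine anomalies) largely intact.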
Deployment considerations include vector database engine selection (FAISS, Pinecone), batching for log/text throughput, and core-set memory pruning or clustering to control latency/memory (No et al., 2023, Pan et al., 2023). Plug-and-play extensibility to unseen classes or new domains is a hallmark of RAD: updating memories with new normal samples suffices for adaptation (Zhang et al., 30 Jan 2026, No et al., 2023).
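Core-set memory pruning is often implemented with greedy k-center selection; the following numpy sketch is a minimal, illustrative version (the `kcenter_coreset` helper is hypothetical):

```python
import numpy as np

def kcenter_coreset(memory, m):
    """Greedy k-center selection: keep m exemplars that cover the bank, so
    nearest-neighbour distances change little while storage shrinks."""
    selected = [0]
    d = np.linalg.norm(memory - memory[0], axis=1)   # distance to current coreset
    for _ in range(m - 1):
        i = int(d.argmax())                          # farthest uncovered point
        selected.append(i)
        d = np.minimum(d, np.linalg.norm(memory - memory[i], axis=1))
    return memory[selected]

# Toy demo: near-duplicates collapse, extremes are retained.
bank = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0]])
core = kcenter_coreset(bank, 2)
assert np.allclose(core, [[0.0, 0.0], [10.0, 10.0]])
```

In production, the same selection would typically run over embeddings stored in a vector database, with approximate nearest-neighbor indexing handling the query side.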
7. Future Directions and Open Challenges
Key research directions include:
- Compact Memory and Fast Retrieval: Advancing memory efficiency via quantization, coreset sampling, and ANN search (Zhang et al., 30 Jan 2026).
- Richer Feature Spaces: Learning or fine-tuning task-adaptive embedding spaces to improve separation (Xu et al., 31 Jan 2026, Thimonier et al., 2024).
- Generalization to Multimodal and Multitask: Extending RAD to combine vision, text, time series, and other embeddings for joint anomaly reasoning (Pan et al., 2023).
- Learning Multi-Example and Contextual Representations: Incorporating richer, multi-shot retrieval, context-aware scoring, and hierarchical memory for domains with higher semantic variability (Maru et al., 2 Jun 2025, Pan et al., 2023).
- Hybrid Architectures: Grafting retrieval modules onto other deep anomaly detection paradigms (e.g., contrastive, diffusion, generative) (Thimonier et al., 2024).
Practical challenges remain in scaling memory and computation for very large or streaming datasets, and in handling heavy-tailed, distribution-shifting settings in production.
References: (Zhang et al., 30 Jan 2026, Xu et al., 31 Jan 2026, No et al., 2023, Thimonier et al., 2024, Maru et al., 2 Jun 2025, Pan et al., 2023)