Skoltech Anomaly Benchmark (SKAB)

Updated 2 February 2026
  • SKAB is a benchmark dataset for industrial anomaly detection, offering precisely labeled, highly imbalanced multivariate time series data.
  • It simulates realistic process anomalies in a water-pump circuit using controlled valve-switch operations and eight sensor features.
  • Benchmark evaluations include AE, ConAE, and LSTM models enhanced by ensemble techniques, showing significant improvements in F1 and AUC metrics.

The Skoltech Anomaly Benchmark (SKAB) provides a standardized testbed for evaluating anomaly detection algorithms on industrial multivariate time series. It is characterized by precisely labeled, highly imbalanced temporal data streams originating from a controlled water-pump circuit environment, with a focus on realistic process anomalies. SKAB has become a key resource for benchmarking time series anomaly detectors, particularly in predictive maintenance and related industrial contexts (Iliopoulos et al., 2023).

1. Dataset Composition and Structure

SKAB consists of 34 multivariate time series with an aggregate length of approximately 37,401 time points, averaging roughly 1,100 observations per series. Each record at time t includes a timestamp, a binary anomaly label (0 for normal, 1 for anomaly), and eight sensor features: Accelerometer1RMS, Accelerometer2RMS, Current, Pressure, Temperature, Thermocouple, Voltage, and VolumeFlowRateRMS. The dataset is engineered so that every sequence begins under normal operating conditions; anomalies are later introduced via valve-switch operations within the water-pump circuit.

The target variable is a ground-truth binary label attached to each time point, which enables supervised, semi-supervised, and unsupervised algorithmic evaluation. Anomalies are rare, accounting for less than 5% of total samples, resulting in a class-imbalance scenario characteristic of real-world fault monitoring (Iliopoulos et al., 2023).
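
The per-record layout described above can be sketched with the standard library alone. The miniature excerpt below is entirely synthetic and assumes a semicolon-separated column layout (as used by the public SKAB repository); the values are placeholders, not real measurements:

```python
import csv
import io

# Synthetic two-row excerpt in a hypothetical semicolon-separated SKAB-style
# layout: timestamp, eight sensor channels, and the binary anomaly label.
raw = (
    "datetime;Accelerometer1RMS;Accelerometer2RMS;Current;Pressure;"
    "Temperature;Thermocouple;Voltage;VolumeFlowRateRMS;anomaly\n"
    "2020-03-09 12:14:36;0.027;0.040;0.77;0.38;68.8;24.6;220.0;32.0;0\n"
    "2020-03-09 12:14:37;0.258;0.361;1.51;0.42;68.9;24.7;219.8;31.7;1\n"
)

rows = list(csv.DictReader(io.StringIO(raw), delimiter=";"))
features = [k for k in rows[0] if k not in ("datetime", "anomaly")]
labels = [int(r["anomaly"]) for r in rows]

print(len(features), labels)  # → 8 [0, 1]
```

Keeping the label as a separate column per time point is what allows the same files to serve supervised, semi-supervised, and unsupervised evaluation protocols.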

2. Anomaly Generation and Labeling Protocol

Anomalies in SKAB are induced through controlled valve-switching manipulations that create deviations in the water flow dynamics, leading to sensor readings departing from normality. The corresponding binary labels are annotated at each time point, providing granular temporal localization of anomalous intervals. This design ensures that each anomaly has a direct causal mechanism rooted in equipment operation, distinguishing SKAB from synthetic or random-noise-injected time series datasets.

Given that all series commence in a normal regime and anomalies are infrequent, SKAB is inherently imbalanced—a property that poses significant challenges for classic anomaly detection techniques and underpins the need for algorithmic innovations addressing both detection accuracy and robustness to minority-class events (Iliopoulos et al., 2023).

3. Data Preprocessing and Feature Engineering

The SKAB dataset contains no missing values or duplicates, minimizing preprocessing overhead and making it suitable for benchmarking both out-of-the-box and customized workflows. Sensor readings are normalized to zero mean and unit variance before any subsequent feature transformations or learning-pipeline stages.
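
Per-channel z-scoring is a one-line transform; the sketch below uses synthetic data and, as is standard practice, fits the statistics on the training split only:

```python
import numpy as np

# Synthetic stand-in for an (n_samples, n_features) block of sensor readings.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=5.0, size=(1000, 8))

# Fit normalization statistics on the (nominally normal) training data,
# then reuse the same mu/sigma for any evaluation split.
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mu) / sigma
```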

A common preprocessing step in algorithms evaluated on SKAB is the computation of reconstruction error for anomaly scoring, formalized as ASc(X_t) = ‖X_t − X̂_t‖², where X_t represents the sensor vector at time t and X̂_t its reconstruction by a trained model. Binary anomaly decisions are typically made by thresholding the anomaly score using a robust threshold, often defined relative to the interquartile range (IQR) of scores on the training data, e.g., δ = 1.5·IQR (Iliopoulos et al., 2023).
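
A minimal sketch of this scoring-plus-thresholding step, with synthetic data standing in for a trained model's reconstructions; the Tukey-fence form Q3 + 1.5·IQR used here is one common instantiation of an IQR-relative threshold:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))                     # sensor vectors X_t (synthetic)
X_hat = X + rng.normal(scale=0.1, size=X.shape)   # stand-in model reconstructions

# Anomaly score ASc(X_t) = ||X_t - X_hat_t||^2 per time point.
scores = np.sum((X - X_hat) ** 2, axis=1)

# Robust IQR-based threshold (Tukey-fence variant: Q3 + 1.5 * IQR).
q1, q3 = np.percentile(scores, [25, 75])
delta = q3 + 1.5 * (q3 - q1)
flags = scores > delta                            # binary anomaly decisions
```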

4. Algorithmic Methodologies and Baselines

SKAB enables systematic comparison of diverse anomaly detection techniques, with established baselines including feedforward autoencoders (AE), convolutional autoencoders (ConAE), and Long Short-Term Memory (LSTM) networks. These methods differ in their capacity to exploit temporal and feature correlations:

  • Autoencoder (AE): A feedforward network encodes input windows X_t ∈ ℝ^{d×w} into a latent code and reconstructs X_t; anomaly scores derive from the reconstruction loss ‖X_t − X̂_t‖².
  • Convolutional Autoencoder (ConAE): Utilizes 1D convolutions to capture both local temporal and spatial feature dependencies, facilitating detection of collective and contextual anomalies occurring over short time horizons.
  • LSTM: Sequence-to-one architectures model temporal dependencies across sliding windows X_{t−w+1:t}; final hidden states are mapped to outlier scores, identifying anomalous subsequences.
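
As a rough illustration of the AE baseline's scoring logic, the sketch below substitutes a linear PCA encode/decode for the trained network. This is an illustrative stand-in under that assumption, not the benchmark's actual model:

```python
import numpy as np

rng = np.random.default_rng(2)
X_train = rng.normal(size=(400, 8))                      # normal-regime data
X_test = np.vstack([rng.normal(size=(50, 8)),
                    rng.normal(loc=5.0, size=(5, 8))])   # 5 injected outliers

mu = X_train.mean(axis=0)
# Principal axes of the training data play the role of the encoder/decoder
# weights of a linear autoencoder with a k-dimensional bottleneck.
_, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
W = Vt[:4]                                               # k = 4 latent dims

def reconstruct(X):
    Z = (X - mu) @ W.T        # encode to the latent code
    return Z @ W + mu         # decode back to sensor space

# Squared reconstruction error as the anomaly score; the injected far-off
# points should typically receive much larger scores than the normal ones.
scores = np.sum((X_test - reconstruct(X_test)) ** 2, axis=1)
```

A trained neural AE, ConAE, or LSTM replaces the `reconstruct` step; the scoring and thresholding logic stays the same.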

Performance is further boosted through advanced ensemble strategies:

  • Feature Bagging: Individual learners are trained on randomly sampled subspaces (feature subsets of size N_m ∼ Uniform[2, d−1]), emphasizing anomalies present in only a few sensor channels and mitigating the curse of dimensionality.
  • Nested-Rotation via PCA: For each learner’s feature subset, features are partitioned and rotated using PCA-driven block-diagonal matrices, increasing diversity among classifiers by exposing them to decorrelated views of the data.
  • Ensemble Voting and Stacking: Aggregation of model predictions through majority vote or, in a semi-supervised regime, stacking with logistic regression on a held-out validation split (Iliopoulos et al., 2023).
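
The subspace-sampling and majority-voting steps can be sketched as follows; the per-learner predictions here are random placeholders standing in for real base detectors:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_learners = 8, 10

# Feature bagging: each learner sees a random subspace of N_m ~ Uniform[2, d-1]
# sensor channels (indices only; any base detector can be plugged in).
subspaces = []
for _ in range(n_learners):
    n_m = int(rng.integers(2, d))    # half-open [2, d) -> sizes in [2, d-1]
    subspaces.append(rng.choice(d, size=n_m, replace=False))

# Majority vote over per-learner binary predictions on 100 time points
# (random placeholders here, in place of the detectors' actual outputs).
votes = rng.integers(0, 2, size=(n_learners, 100))
ensemble = (votes.mean(axis=0) > 0.5).astype(int)
```

In the semi-supervised stacking variant, the per-learner scores would instead be fed as features to a logistic-regression meta-learner fitted on a labeled validation split.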

5. Evaluation Protocols and Metrics

The SKAB benchmark employs per-series F1-score and Area Under the ROC Curve (AUC) as primary metrics, averaged across the 34 time series. The highly imbalanced class distribution makes these metrics particularly relevant; F1 quantifies the trade-off between precision and recall in anomaly identification, while AUC captures the discriminatory power of score-based predictions.
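
Both metrics can be computed without external libraries; the AUC below uses the Mann-Whitney formulation, which is equivalent to the area under the ROC curve:

```python
import numpy as np

def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall, from the confusion-matrix counts.
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def auc_score(y_true, scores):
    # AUC equals the Mann-Whitney U statistic normalized by n_pos * n_neg.
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y = np.array([0, 0, 0, 0, 1, 1])
s = np.array([0.1, 0.2, 0.3, 0.9, 0.8, 0.95])
print(f1_score(y, (s > 0.5).astype(int)), auc_score(y, s))  # → 0.8 0.875
```

In the benchmark protocol these values are computed per series and then averaged over all 34 series.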

A summary of quantitative results is shown in the following table:

| Model                                | F1    | AUC   |
|--------------------------------------|-------|-------|
| Convolutional AE (plain)             | 0.762 | 0.812 |
| + Feature Bagging                    | 0.745 | 0.800 |
| + Feature Bagging + Rotation         | 0.787 | 0.832 |
| LSTM (plain)                         | 0.722 | 0.787 |
| + Feature Bagging + Rotation (LSTM)  | 0.701 | 0.770 |
| Semi-supervised stacking (all arch.) | 0.85  | 0.88  |

The best unsupervised method is Convolutional AE with feature-bagging and nested rotation (F1 ≈ 0.787, AUC ≈ 0.832), yielding 2–4% improvement over plain models. Semi-supervised stacking with logistic regression achieves F1 ≈ 0.85 and AUC ≈ 0.88, corresponding to ≈10% absolute F1 and ≥8% AUC improvements (Iliopoulos et al., 2023).

6. Analysis, Limitations, and Prospects

Algorithmic improvements observed on SKAB are attributed to the heterogeneity of anomaly manifestation across sensor channels, favoring techniques (e.g., feature bagging) that can isolate local anomalies. Nested rotations via PCA further enhance diversity and align model representations to dominant sensor variance axes. Ensemble methods—both voting-based and stacking—effectively pool complementary strengths of different architectures.

However, these advantages are counterbalanced by increased computational demand (due to the multiplicity of learners and PCA decompositions), sensitivity to hyperparameters (such as subspace size, number of learners, and aggregation thresholds), and the requirement for a labeled validation set in semi-supervised schemes. A plausible implication is that further scaling to higher-dimensional sensor suites or real-world deployments would require careful calibration of computational and labeling resources.

Suggested directions for future SKAB-based research include exploration of alternative aggregation strategies (e.g., veto-based voting), studying the scalability of PCA, varying diversity parameters (number of blocks K and subspace sizes), and examining the effectiveness of different projections (random projections, Independent Component Analysis) and non-linear meta-learners (Iliopoulos et al., 2023).

7. Significance and Context in the Literature

SKAB provides a realistic, controlled platform for comparative evaluation of anomaly detection methods on industrial time series, enabling reproducible benchmarks that reflect practical challenges such as input imbalance, local anomaly sparsity, and sensor intercorrelations. Methods validated on SKAB, such as ensemble-based deep neural architectures with feature subspace bagging and rotation, demonstrate empirical advances over single-model or single-space baselines.

The dataset’s careful curation—labeled anomalies with known physical origins, absence of missing data, and inclusion of multiple correlated sensors—has positioned it as a reference tool for both methodological development and critical analysis of time series anomaly detectors in industrial settings (Iliopoulos et al., 2023).
