
STREAM-VAE: Streaming VAE for Anomaly Detection

Updated 26 November 2025
  • STREAM-VAE is a family of VAE models that separate slow-drift and fast-spike dynamics to improve anomaly detection in streaming telemetry data.
  • The framework leverages a dual-path encoder with EMA filtering, attention mechanisms, and a Mixture-of-Experts decoder to achieve efficient real-time inference.
  • Adaptive extensions integrate a Dirichlet process prior for online clustering, enabling robust continual learning without catastrophic forgetting.

STREAM-VAE refers to a distinct set of Variational Autoencoder (VAE) architectures designed for time-series anomaly detection or streaming data clustering, with specific focus on separating multi-scale temporal dynamics in signals or adapting online to new clusters. This entry consolidates models termed STREAM-VAE and related streaming-VAEs across the literature, with emphasis on the dual-path, slow/fast-dynamics approach for telemetry anomaly detection (Özer et al., 19 Nov 2025) and the streaming adaptive Dirichlet process VAE for online clustering (Zhao et al., 2019).

1. Conceptual Foundations

STREAM-VAE architectures aim to overcome the limitations of conventional VAEs in the context of streaming, high-throughput, or highly non-stationary time-series data. Standard sequence VAEs typically encode all temporal variation (slow drift, abrupt spikes, regime changes) into a single latent process, resulting in the entanglement of heterogeneous dynamics and impaired anomaly or change-point detection. In contrast, the dual-path STREAM-VAE explicitly separates latent representations of slow (drift) and fast (spike) temporal dynamics, while streaming nonparametric VAEs combine VAEs with Bayesian nonparametrics for adaptive clustering under data streams (Özer et al., 19 Nov 2025, Zhao et al., 2019).

2. Dual-Path Architecture for Telemetry Anomaly Detection

The STREAM-VAE for vehicle telemetry anomaly detection (Özer et al., 19 Nov 2025) incorporates a dual-path encoder and a specialized decoder structure. The pipeline is as follows:

  • Windowed Input and Bi-LSTM Encoder: Process input windows X = [x_1, \ldots, x_T] \in \mathbb{R}^{T \times F} with a two-layer bidirectional LSTM to extract features H_E.
  • Attention-Driven Dual Paths:
    • The slow-drift path applies an Exponential Moving Average (EMA) to H_E to estimate baseline drift, computes differences, and applies multi-head attention to yield A_{\mathrm{slow}}.
    • The fast-spike path computes high-pass features H_{\mathrm{hp}} by subtracting an EMA baseline, uses these as queries/keys for multi-head attention, producing A_{\mathrm{fast}}.
  • Latent Process Splitting: Posterior Gaussians per time-step are split into slow and fast components via EMA filtering: z_{\mathrm{slow},t} = \mathrm{EMA}_{\alpha_s}(z_{1:t}), z_{\mathrm{fast},t} = z_t - z_{\mathrm{slow},t}.
  • Gated Fusion: The outputs A_{\mathrm{slow}} and A_{\mathrm{fast}} are fused per time-step with a learned gate, outputting context latents for decoding.
  • MoE Decoder and Event Residual: The decoder uses another Bi-LSTM and a Mixture-of-Experts head for the per-feature mean, with a single shared variance. An event-residual block, driven by the first difference in latent codes, explicitly models transients using gated soft-thresholding.
  • Generative Distribution: The likelihood p_{\theta}(x_t \mid z_t) uses the fused mean and residual and a diagonal Gaussian covariance.
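
The latent-process split in the third step can be sketched with a causal EMA filter. This is a minimal numpy illustration, not the paper's code; the smoothing constant `alpha_s` and the toy latent trajectory are assumptions:

```python
import numpy as np

def ema(z, alpha):
    """Causal exponential moving average along the time axis."""
    out = np.empty_like(z)
    out[0] = z[0]
    for t in range(1, len(z)):
        out[t] = alpha * z[t] + (1 - alpha) * out[t - 1]
    return out

def split_latents(z, alpha_s=0.1):
    """Split per-step latents into a slow path (EMA baseline) and a
    fast path (residual), so z_slow + z_fast reconstructs z exactly."""
    z_slow = ema(z, alpha_s)
    z_fast = z - z_slow
    return z_slow, z_fast

# toy latent trajectory: slow linear drift plus one sharp spike at t=50
T, D = 100, 8
z = np.linspace(0, 1, T)[:, None] * np.ones((T, D))
z[50] += 5.0
z_slow, z_fast = split_latents(z)
```

By construction the decomposition is lossless, and an isolated spike lands almost entirely in the fast residual while the drift stays in the EMA path.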

This architecture enables explicit modeling of both protracted drift phenomena and sharp, isolated deviations, yielding improved anomaly localization and interpretability in telemetry data (Özer et al., 19 Nov 2025).

3. Variational Objective, Inference, and Regularization

STREAM-VAE is trained by minimizing a composite objective:

\mathcal{L} = -\mathbb{E}_{q_{\phi}(z|X)}[\log p_{\theta}(X|z)] + \beta\,D_{\mathrm{KL}}(q_{\phi}(z|X)\,\|\,p(z)) + \lambda\,\|r\|_1 + \eta\,[H^* - H_{\mathrm{MoE}}]_+

where:

  • The reconstruction loss is a negative Gaussian log-likelihood.
  • KL-divergence between posterior and prior controls the expressiveness of the latent space. The coefficient \beta is dynamically adjusted (Control-VAE strategy) to maintain a target KL.
  • \lambda\,\|r\|_1 enforces sparsity in the event residuals, discouraging explanations of smooth drift via sparse spike activity.
  • The entropy regularizer \eta\,[H^* - H_{\mathrm{MoE}}]_+ prevents collapse in the MoE expert allocation, encouraging the use of multiple experts.
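
The composite objective above can be written out directly. This is a numpy sketch under stated assumptions: the argument names (`gate_probs`, `r`), default coefficients, and the hinge formulation are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def stream_vae_loss(x, mu, sigma2, kl, r, gate_probs,
                    beta=1.0, lam=1e-3, eta=1e-2, h_star=0.5):
    """Composite objective: Gaussian NLL + beta * KL
    + L1 penalty on event residuals + MoE entropy hinge."""
    # reconstruction: negative diagonal-Gaussian log-likelihood
    nll = np.sum((x - mu) ** 2 / (2 * sigma2)
                 + 0.5 * np.log(2 * np.pi * sigma2))
    # sparsity penalty on the event residual r
    l1 = lam * np.sum(np.abs(r))
    # hinge on mean MoE gate entropy: penalise only below the target H*
    h_moe = -np.sum(gate_probs * np.log(gate_probs + 1e-12), axis=-1).mean()
    entropy_pen = eta * max(h_star - h_moe, 0.0)
    return nll + beta * kl + l1 + entropy_pen
```

With a uniform gate over two experts the entropy is ln 2 ≈ 0.693, so a target H* below that leaves the hinge inactive; the penalty only activates once the gate starts collapsing onto a single expert.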

The explicit balancing of slow and fast latent dynamics, combined with regularization, enables the budgeted allocation of representational resources and prevents one latent path from absorbing both slow and fast phenomena (Özer et al., 19 Nov 2025).

4. Anomaly Scoring and Calibration

For anomaly detection, test windows are scored by their negative Gaussian log-likelihood under the trained model:

s(X) = -\log p_{\theta}(X) = \sum_{t=1}^T \sum_{f=1}^F \left[ \frac{(x_{t,f} - \hat\mu_{t,f})^2}{2\hat\sigma^2_{t,f}} + \frac{1}{2}\log\left(2\pi \hat\sigma^2_{t,f}\right) \right]

To ensure stable per-series thresholds, the framework fits a Peaks-Over-Threshold Generalized Pareto Distribution (GPD) to score distributions on normal data and analytically computes the alert quantile. Fleet-wide analytics leverage these fixed entity-level thresholds for consistent cross-vehicle comparison (Özer et al., 19 Nov 2025).
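
The scoring and peaks-over-threshold calibration can be sketched as follows. This is a self-contained numpy illustration; the GPD is fitted by the method of moments as a stand-in for the maximum-likelihood fit a production system would use, and the quantile levels are assumptions:

```python
import numpy as np

def window_score(x, mu, sigma2):
    """Negative Gaussian log-likelihood of a window (higher = more anomalous)."""
    return np.sum((x - mu) ** 2 / (2 * sigma2)
                  + 0.5 * np.log(2 * np.pi * sigma2))

def pot_threshold(scores, u_quantile=0.98, q=1e-3):
    """Peaks-over-threshold: fit a GPD to exceedances over u (moment
    estimators, illustrative) and return the analytic alert threshold
    for tail probability q."""
    scores = np.asarray(scores)
    u = np.quantile(scores, u_quantile)
    y = scores[scores > u] - u            # exceedances over the threshold u
    m, v = y.mean(), y.var()
    xi = 0.5 * (1.0 - m * m / v)          # GPD shape (method of moments)
    sigma = 0.5 * m * (1.0 + m * m / v)   # GPD scale (method of moments)
    n, n_u = len(scores), len(y)
    if abs(xi) < 1e-6:                    # exponential-tail limit
        return u + sigma * np.log(n_u / (q * n))
    return u + (sigma / xi) * ((q * n / n_u) ** (-xi) - 1.0)

rng = np.random.default_rng(0)
scores = rng.exponential(1.0, size=20_000)  # synthetic "normal" score stream
thr = pot_threshold(scores)
```

For an exponential score tail, the true q = 1e-3 quantile is -ln(1e-3) ≈ 6.9, which the fitted threshold should approximate; in deployment the fit runs once per entity on calibration-stage normal data.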

5. Deployment and Practical Considerations

STREAM-VAE is architected for high-throughput, real-time, and low-latency inference scenarios:

  • Sliding windows (length T, stride 1) for streaming input.
  • EMA routines and gating offer computationally efficient, causal signal separation.
  • Pruned Bi-LSTMs and reduced attention heads ensure execution on MCUs or edge devices—e.g., ∼2.5 ms per 100-step window on commodity hardware, supporting sampling rates up to 400 Hz.
  • Deployment includes programmable thresholds per entity based on initial, calibration-stage normal data, and globally stable hyperparameters for all entities.
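
The stride-1 streaming loop amounts to a bounded buffer that emits one score per incoming sample. A minimal sketch, where `model_score` stands in for the trained model's window negative log-likelihood (an assumption, not the paper's API):

```python
import numpy as np
from collections import deque

class StreamingScorer:
    """Causal sliding-window scorer: keep the last T samples and emit
    one anomaly score per new sample once the window is full."""

    def __init__(self, model_score, T=100):
        self.model_score = model_score
        self.buf = deque(maxlen=T)  # oldest sample drops out automatically

    def push(self, x_t):
        self.buf.append(x_t)
        if len(self.buf) < self.buf.maxlen:
            return None  # warm-up: window not yet full
        return self.model_score(np.stack(self.buf))

# toy stand-in score: sum of squares over the window
scorer = StreamingScorer(lambda w: float(np.sum(w ** 2)), T=4)
out = [scorer.push(np.array([float(i)])) for i in range(6)]
```

Because the buffer is bounded and each push touches one window, per-sample cost is constant, which is what makes the millisecond-scale latencies above feasible.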

Multi-query and grouped-query attention (MQA/GQA) variants further reduce compute and memory usage (Özer et al., 19 Nov 2025).

6. Comparative Evaluation and Empirical Results

Experiments on synthetic and public datasets demonstrate consistent improvements in anomaly detection metrics over established baselines:

| Dataset | Oracle PA-F1 | PA-F1 (POT thr.) | AUC-PR | AUC-ROC |
|---|---|---|---|---|
| Automobile Telemetry | 0.857 | 0.794 | 0.532 | 0.755 |
| SMD Benchmark | 0.935 | 0.493 | 0.430 | 0.812 |

Baselines include GDN, TFT-Residual, VASP, VS-VAE, SIS-VAE, OmniAnomaly, MA-VAE, and Anomaly Transformer, among others. On thresholded and F1-type metrics (suited for strict anomaly separation), STREAM-VAE is superior to or on par with the best alternatives. Ablation studies confirm the necessity of each architectural component: removing the event residual, the MoE, or either attention branch systematically degrades F1 and AUC-PR, and component-wise MSE analyses verify the targeted separation of spike and drift explanations (Özer et al., 19 Nov 2025).

7. STREAM-VAE for Streaming Clustering and Continual Learning

Another STREAM-VAE line—termed Streaming Adaptive Nonparametric VAE or AdapVAE (Zhao et al., 2019)—adapts VAEs for nonparametric clustering in streaming data:

  • The model introduces a Dirichlet-Process Gaussian Mixture (DP-GMM) prior on latent representations, allowing dynamic birth and merging of clusters as the stream evolves.
  • Variational inference is performed via a mean-field posterior over continuous latents, discrete cluster assignments, stick weights, and DP parameters, with an evidence lower bound (ELBO) optimized per-batch.
  • Catastrophic forgetting is mitigated via generative replay: synthetic samples generated from the current model act as anchors when updating with new data batches.
  • The inference loop alternates between VAE stochastic gradient steps (updating network weights) and Expectation-Maximization-like DP mixture refinement (cluster assignment, param updates, birth/merge moves), updating priors and sufficient statistics in a strictly online fashion.
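
The generative-replay step can be sketched as batch augmentation: samples drawn from the current generator are mixed into each new batch so that earlier clusters remain represented during the update. A minimal numpy illustration, where `decode` stands in for the trained decoder and standard-normal latents approximate draws from the prior (both assumptions):

```python
import numpy as np

def replay_batch(decode, z_dim, new_batch, n_replay, rng):
    """Generative replay: augment the incoming batch with synthetic
    samples from the current model to anchor previously seen clusters."""
    z = rng.standard_normal((n_replay, z_dim))  # latents from the prior
    replayed = decode(z)                        # decode to data space
    return np.concatenate([new_batch, replayed], axis=0)

rng = np.random.default_rng(1)
new = rng.standard_normal((32, 5))
# toy linear "decoder" mapping 3-d latents to 5-d observations
batch = replay_batch(lambda z: z @ np.ones((3, 5)), 3, new, 16, rng)
```

The update then trains on the combined batch, so no stored historical data is needed; only the generator itself carries the memory of past clusters.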

This method achieves adaptive, robust clustering on streaming data without revisiting previous examples, with hyperparameters and network architectures tuned per dataset demands (Zhao et al., 2019).


Together, STREAM-VAE frameworks represent a class of VAE-based models tailored for streaming environments—either for disentangled modeling of multi-scale temporal dynamics in anomaly detection, or for adaptive clustering with nonparametric Bayesian regularization—achieving robust empirical results in challenging online settings (Özer et al., 19 Nov 2025, Zhao et al., 2019).
