Conditional Diffusion Models for Time Series
- Conditional diffusion models for time series are generative frameworks that add noise and denoise data using conditioning signals like history and exogenous variables.
- They employ a forward Markov process to perturb data and a neural network-parameterized reverse process to accurately reconstruct or forecast temporal patterns.
- These models are applied in forecasting, imputation, anomaly detection, and synthesis, often outperforming traditional methods in practical time series tasks.
Conditional diffusion models for time series are a class of generative modeling paradigms that leverage a structured noising–denoising (diffusion) process, augmented by explicit conditioning information, to synthesize, impute, forecast, or analyze temporal data. These models invert a forward Markov process that gradually perturbs time-series data into noise, learning to reconstruct original sequences by integrating context signals (such as history, exogenous features, metadata, or partial observations) during the denoising process. This conditional design enables precise, context-aware generative modeling in highly structured, sequential domains including forecasting, imputation, anomaly detection, data augmentation, and simulation.
1. Mathematical Foundations of Conditional Diffusion for Time Series
Let $x_0$ denote a clean time-series segment and $c$ the conditioning context (e.g., observed anchors, historical windows, exogenous covariates). Conditional diffusion models consist of a two-stage process:
a) Forward (noising) process:
A (typically fixed) Markov chain adds Gaussian noise in $T$ discrete steps:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),$$

which has the closed form

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\,I\big), \qquad \text{where } \bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s).$$

In most approaches, the forward process itself is independent of $c$, but several models (e.g., S²DBM (Yang et al., 2024), TimeBridge (Park et al., 2024), Diff-MTS (Ren et al., 2024)) also allow the noise schedule or endpoint to depend on conditioning information, forming a diffusion bridge.
b) Reverse (denoising) process:
A neural network parameterizes the reverse transition, usually through noise (score) prediction:

$$p_\theta(x_{t-1} \mid x_t, c) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t, c),\ \sigma_t^2 I\big),$$

with the noise-prediction parametrization (as per Ho et al.):

$$\mu_\theta(x_t, t, c) = \frac{1}{\sqrt{1-\beta_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t, c)\right).$$

The conditional score function is $\nabla_{x_t} \log q(x_t \mid c) \approx -\,\epsilon_\theta(x_t, t, c)\,/\,\sqrt{1-\bar\alpha_t}$.
c) Training objective:
The network is trained via conditional denoising score matching:

$$\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, c,\, t,\, \epsilon \sim \mathcal{N}(0, I)}\left[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t,\ c\big)\big\|^2\right].$$
Conditioning signals may be fixed in advance (as in imputation or forecasting) or generated dynamically (e.g., from partial observations, metadata, or context windows) (Yang et al., 2024).
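The forward noising and the training objective above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular paper's implementation: `eps_theta` is a hypothetical stand-in for the trained denoising network, and the linear beta schedule and dimensions are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule over T diffusion steps (a standard DDPM choice).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, eps):
    """Closed-form forward noising: x_t = sqrt(abar_t) x0 + sqrt(1-abar_t) eps."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

def eps_theta(x_t, t, cond):
    """Hypothetical stand-in for the denoising network; a real model would be
    a neural net taking (x_t, t, c). Here, a fixed linear map for illustration."""
    return 0.1 * x_t + 0.05 * cond

def denoising_loss(x0, cond, t):
    """Conditional denoising score-matching loss for one (x0, c, t) draw."""
    eps = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, eps)
    pred = eps_theta(x_t, t, cond)
    return np.mean((eps - pred) ** 2)

# Toy example: a sine segment conditioned on its own history window.
x0 = np.sin(np.linspace(0, 2 * np.pi, 32))
cond = np.sin(np.linspace(-2 * np.pi, 0, 32))  # "past" window as context c
loss = denoising_loss(x0, cond, t=50)
```

In training, `t` is sampled uniformly per batch and the expectation in $\mathcal{L}(\theta)$ is approximated by Monte Carlo over $(x_0, c, t, \epsilon)$ draws.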
2. Conditioning Mechanisms and Model Architectures
Conditional diffusion for time series diverges from unconditional generative diffusion by the explicit infusion of context information at every denoising step:
- Condition sources: Past history (autoregressive or sliding windows), exogenous variables (weather, price, calendar), labels or metadata (location, asset type), partial observations (imputation masks), or structured context (graph topology, event sequences).
- Injection schemes: Conditioning variables are injected either by concatenation with the noised input, via cross-attention mechanisms (as in U-Net or Transformer architectures), or via embedding networks whose outputs are broadcast at each denoising layer (Yang et al., 2024, Narasimhan et al., 2024, Shankar et al., 8 Mar 2025, Yang et al., 2024).
- Architectural backbones: Modern approaches utilize temporal U-Nets (Yang et al., 2024), encoder–decoder transformers (with channel/time attention) (Yuan et al., 2024, Ma et al., 2024), graph neural networks for spatio-temporal scenarios (Yang et al., 2024), or structured state-space (S4) layers for long-range dependencies (Shankar et al., 8 Mar 2025).
- Advanced parameterizations: Brownian bridge priors (S²DBM) and scale-preserving endpoints (TimeBridge) provide direct anchoring between conditional “start” and “end” points, reducing denoising stochasticity and improving stability (Yang et al., 2024, Park et al., 2024). Score-based SDEs with conditional drift/variance schedules further generalize beyond DDPMs.
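The two simplest injection schemes above, concatenation and broadcast embedding, can be sketched concretely. This is a hedged illustration with made-up names and dimensions; the FiLM-style scale/shift is one common broadcast variant and stands in here for the heavier cross-attention mechanisms used by the cited models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions: sequence length, input channels, raw context width, model width.
L_seq, d_x, d_cond, d_model = 32, 4, 6, 16

def embed_condition(cond, W):
    """Project raw context (history, covariates, metadata) to the model width;
    the result is broadcast to every denoising layer."""
    return cond @ W                                  # (L, d_cond) -> (L, d_model)

def inject_by_concat(x_t, cond_emb):
    """Concatenation scheme: stack the noised input and the condition
    embedding on the feature axis before the first layer."""
    return np.concatenate([x_t, cond_emb], axis=-1)  # (L, d_x + d_model)

def inject_by_film(h, cond_emb, Wg, Wb):
    """FiLM-style broadcast injection: the condition produces a per-feature
    scale and shift that modulate hidden activations at each layer."""
    gamma, beta = cond_emb @ Wg, cond_emb @ Wb
    return gamma * h + beta

x_t = rng.standard_normal((L_seq, d_x))              # noised series at step t
cond = rng.standard_normal((L_seq, d_cond))          # e.g. history-window features
W = rng.standard_normal((d_cond, d_model)) * 0.1
Wg = rng.standard_normal((d_model, d_model)) * 0.1
Wb = rng.standard_normal((d_model, d_model)) * 0.1

cond_emb = embed_condition(cond, W)
fused = inject_by_concat(x_t, cond_emb)              # input to the denoiser
h = rng.standard_normal((L_seq, d_model))            # some hidden activations
modulated = inject_by_film(h, cond_emb, Wg, Wb)
```

Concatenation is cheapest but only conditions the input; broadcast/FiLM and cross-attention let the context steer every denoising layer, which matters when the condition is high-dimensional or only loosely aligned with the target sequence.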
Representative methods:
| Model | Conditioning Modes | Special Features |
|---|---|---|
| CSDI | Observations, masks | 2D attention (time×feat) |
| TimeDiff | History, future mixup, AR | Non-autoregressive, mixup |
| S²DBM | History (prior+encoder) | Brownian bridge |
| Diff-MTS | Exogenous, label encoding | Adaptive kernel-MMD |
| WaveStitch | Aux. feats, known values | Parallel "stitching" |
| TimeBridge | Trends, fixed-points | Diffusion bridge, scale |
| Time Weaver | Heterog. categorical/meta | Metadata tokenizers, J-FTSD |
| CCDM | History (channel-wise) | Channel-aware contrastive |
3. Core Methodological Innovations
Conditional time series diffusion models have introduced several innovations to address key challenges in temporal generative modeling:
- Conditional imputation: Masked (partial) observations are used as context, ensuring that imputations are consistent with known values. CSDI implements this through a masked conditional denoising network, providing state-of-the-art probabilistic and deterministic imputation accuracy (Tashiro et al., 2021).
- Non-autoregressive conditional forecasting: TimeDiff employs split conditioning—autoregressive estimates and “future mixup”—to break the error accumulation of autoregressive decoders. This improves long-horizon fidelity and sample efficiency (Shen et al., 2023).
- Temporal feature disentanglement: Diffusion-TS and DS-Diffusion explicitly decompose latent variables into trend, seasonality, and residual components, enabling enhanced interpretability and improved sample quality through hierarchical denoising (Yuan et al., 2024, Sun et al., 23 Sep 2025).
- Contrastive conditioning: Methods such as CCDM (2410.02168) and MTSCI (Zhou et al., 2024) augment score-matching objectives with (i) contrastive intra- and inter-view consistency (masks, mixups), and (ii) InfoNCE-style variational mutual information regularization, which boosts OOD generalization and ensures distributional alignment between observed/imputed regions.
- Domain adaptation and bridges: Cross-domain conditional diffusion (CD²-TSI (Zhang et al., 14 Jun 2025)) combines spectral priors, shared/branch-specific encoders, and output-level domain alignment to support adaptation between different data sources under high missing rates. TimeBridge bridges conventional diffusion with priors that preserve trends or hard constraints, making the prior endpoint data-aware (Park et al., 2024).
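The self-supervised masking that underlies conditional imputation training can be sketched as follows. This is a simplified illustration of the CSDI-style idea, not its exact implementation; `make_imputation_pair` and the masking fractions are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_imputation_pair(obs_mask, target_frac=0.3):
    """Split the observed entries of a series into conditioning points and
    artificial imputation targets (CSDI-style self-supervised masking)."""
    obs_idx = np.flatnonzero(obs_mask)
    n_target = max(1, int(target_frac * obs_idx.size))
    target_idx = rng.choice(obs_idx, size=n_target, replace=False)
    cond_mask = obs_mask.copy()
    cond_mask[target_idx] = False         # hide these from the conditioner
    target_mask = obs_mask & ~cond_mask   # the loss is computed only here
    return cond_mask, target_mask

x = np.sin(np.linspace(0, 4 * np.pi, 48))
obs_mask = rng.random(48) < 0.8           # ~20% of points genuinely missing
cond_mask, target_mask = make_imputation_pair(obs_mask)

# The denoiser sees x * cond_mask as context and is trained to recover the
# noise on target positions only, e.g. (schematically):
#   loss = (((eps - eps_theta(x_t, t, x * cond_mask)) ** 2) * target_mask).sum() / target_mask.sum()
```

Because the conditioning and target masks are disjoint by construction, the model never sees the values it must impute, which is what makes imputations consistent with the observed anchors at test time.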
4. Principal Application Domains
Conditional diffusion models for time series have unlocked new SOTA performance and methodological flexibility for a spectrum of application domains:
- Forecasting: By conditioning on observed history or exogenous covariates, models such as TimeDiff (Shen et al., 2023), UTSD (Ma et al., 2024), and S²DBM (Yang et al., 2024) generate multi-step, scenario-consistent forecasts, yielding improved accuracy and uncertainty quantification.
- Imputation: CSDI (Tashiro et al., 2021), MTSCI (Zhou et al., 2024), and CD²-TSI (Zhang et al., 14 Jun 2025) employ conditioning on incomplete/masked data to reconstruct missing values, achieving 10–20% lower errors versus VAE, GP, and GAN-based baselines, and supporting block, arbitrary, or cross-domain missingness.
- Synthesis under constraints: WaveStitch (Shankar et al., 8 Mar 2025) and Time Weaver (Narasimhan et al., 2024) support conditional synthesis under auxiliary or metadata constraints (e.g., region, year, class), with parallelized inference and compact categorical encoding for fast scenario generation.
- Anomaly detection: DiffAD (Yang et al., 2024), ImDiffusion, and other conditional generators are trained to reconstruct or complete under context, with discrepancy (residual) metrics flagging outlier observations.
- Causal and counterfactual generation: CaTSG (Xia et al., 25 Sep 2025) operationalizes the Pearl ladder (associational/interventional/counterfactual) in conditional diffusion, incorporating backdoor-adjusted score ensembles to generate under interventions or counterfactuals given observed scenes and hypothetical contexts.
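The residual-based anomaly-detection recipe mentioned above reduces to a simple scoring loop. As a hedged sketch, `reconstruct` here is a smoothing stand-in for a conditional diffusion reconstruction, and the 3-sigma flagging rule is one arbitrary thresholding choice.

```python
import numpy as np

rng = np.random.default_rng(3)

def reconstruct(x):
    """Hypothetical stand-in for context-conditioned diffusion reconstruction;
    here, a 5-point moving average plus a little sampling noise."""
    kernel = np.ones(5) / 5
    return np.convolve(x, kernel, mode="same") + 0.01 * rng.standard_normal(x.size)

def anomaly_scores(x, reconstruct, n_samples=8):
    """Reconstruct the series several times under its own context and score
    each point by its residual against the mean reconstruction."""
    recons = np.stack([reconstruct(x) for _ in range(n_samples)])
    return np.abs(x - recons.mean(axis=0))

x = np.sin(np.linspace(0, 4 * np.pi, 200))
x[100] += 3.0                                  # inject a point anomaly
scores = anomaly_scores(x, reconstruct)
flags = scores > scores.mean() + 3 * scores.std()
```

A faithful conditional generator reproduces in-distribution points closely, so large residuals concentrate on observations the context cannot explain; averaging several stochastic reconstructions stabilizes the score.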
5. Performance, Efficiency, and Practical Considerations
Conditional diffusion models offer several advantages, as well as novel trade-offs:
Advantages:
- Task-specific generations that accurately reflect conditioning information (e.g., forecasts that respect known regimes, imputations that exactly match observed points) (Yang et al., 2024).
- Flexibility to fuse multi-modal and heterogeneous context (continuous/categorical/meta) in Time Weaver, WaveStitch, and DS-Diffusion (Narasimhan et al., 2024, Sun et al., 23 Sep 2025).
- Model-agnostic conditional inference: methods such as SemGuide (Ding et al., 3 Aug 2025) and TSDiff (Kollovieh et al., 2023) enable post hoc guidance without retraining the diffusion backbone.
Limitations and challenges:
- Network complexity and training cost increase as conditioning becomes high-dimensional or involves attention between many features/steps (Yang et al., 2024, Narasimhan et al., 2024).
- Sampling time in vanilla DDPMs is high (tens to thousands of steps); fast samplers (DDIM, hybrid SDE/ODE, parallel stitching/windowing as in WaveStitch) reduce computation by up to two orders of magnitude (Shankar et al., 8 Mar 2025).
- Overfitting to specific conditioning types if context vectors are not regularized; cross-attention and contrastive regularization mitigate this risk (2410.02168).
- Robustness to distribution shift and OOD generalization demands contrastive or domain-alignment losses (Zhang et al., 14 Jun 2025, 2410.02168).
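The fast-sampler point above is easy to see mechanically: a DDIM-style deterministic update visits only a strided subsequence of the training timesteps, so the expensive network is called `n_steps` times instead of `T`. This sketch uses a hypothetical counting stand-in for the denoiser; the update itself is the standard DDIM (eta = 0) step.

```python
import numpy as np

rng = np.random.default_rng(4)

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

calls = {"n": 0}

def eps_theta(x_t, t, cond):
    """Stand-in denoiser that counts its invocations; the trained network is
    the expensive part a fast sampler tries to call less often."""
    calls["n"] += 1
    return 0.1 * x_t + 0.0 * cond

def ddim_sample(x_T, cond, n_steps=50):
    """Deterministic DDIM-style sampling over a strided subsequence of the
    T training timesteps: n_steps - 1 network calls instead of T."""
    ts = np.linspace(T - 1, 0, n_steps).astype(int)
    x = x_T
    for t, t_prev in zip(ts[:-1], ts[1:]):
        eps = eps_theta(x, t, cond)
        # Predict x0 from the current noised state, then jump to t_prev.
        x0_pred = (x - np.sqrt(1 - alphas_bar[t]) * eps) / np.sqrt(alphas_bar[t])
        x = np.sqrt(alphas_bar[t_prev]) * x0_pred + np.sqrt(1 - alphas_bar[t_prev]) * eps
    return x

x_T = rng.standard_normal(64)   # start from pure noise
cond = np.zeros(64)             # conditioning signal (e.g. a history embedding)
sample = ddim_sample(x_T, cond, n_steps=50)
```

With 50 strided steps against T = 1000, the network-call count drops by roughly 20x, which is where the one-to-two-orders-of-magnitude savings quoted for fast samplers come from; parallel windowing schemes such as WaveStitch compound this across segments.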
Architecture/data-modality alignment:
| Data Type | Typical Model Backbone | Conditioning Integration |
|---|---|---|
| Univariate series | 1D CNN/U-Net, Transformer | Concatenation, MLP |
| Multivariate | 2D U-Net (time × feature), 2D Transformer | Channel-wise, Cross-attn |
| Graph/trajectory | GNN/Graph-attn U-Nets | Node/edge, topology |
6. Recent Directions and Open Challenges
The evolution of conditional diffusion models for time series continues at a rapid pace. Key frontiers include:
- Scalability: Reductions in sequential sampling steps (through fast solvers, non-autoregressive inference, parallel windowing), model distillation, and distributed sampling are critical for deployment in high-throughput and real-time domains (Shankar et al., 8 Mar 2025, Ma et al., 2024).
- Prior-knowledge and constraint injection: Embedding domain-specific structural priors (e.g., physical conservation laws, spatial/temporal graph topology) directly into diffusion networks is an active area (Yang et al., 2024, Park et al., 2024).
- Robust, adaptive conditioning: Methods to dynamically adapt conditioning mechanisms to OOD or dynamically shifting contexts are required for practical reliability (2410.02168, Zhang et al., 14 Jun 2025).
- Multimodal and hierarchical conditionality: Integrating text, audio, images, and time series within a single diffusion architecture remains open (Yang et al., 2024). Hierarchical denoising modules (as in DS-Diffusion, CHIME) show promise for long-range compositionality (Sun et al., 23 Sep 2025, Chen et al., 4 Jun 2025).
- Interpretable and foundation models: New approaches (UTSD (Ma et al., 2024), DS-Diffusion (Sun et al., 23 Sep 2025)) target cross-domain foundation models with compact adapters or style-guided layers, aiming for high interpretability and universal coverage.
- Causal, interventional, and counterfactual simulation: Incorporation of explicit structural causal modeling, as in CaTSG, will be crucial for reliable simulation, policy evaluation, and scientific discovery (Xia et al., 25 Sep 2025).
References
- (Yang et al., 2024) A Survey on Diffusion Models for Time Series and Spatio-Temporal Data
- (Tashiro et al., 2021) CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation
- (Shen et al., 2023) Non-autoregressive Conditional Diffusion Models for Time Series Prediction (TimeDiff)
- (Yang et al., 2024) Series-to-Series Diffusion Bridge Model (S²DBM)
- (Park et al., 2024) TimeBridge: Better Diffusion Prior Design with Bridge Models for Time Series Generation
- (Shankar et al., 8 Mar 2025) WaveStitch: Flexible and Fast Conditional Time Series Generation with Diffusion Models
- (Sun et al., 23 Sep 2025) DS-Diffusion: Data Style-Guided Diffusion Model for Time-Series Generation
- (Zhang et al., 14 Jun 2025) Cross-Domain Conditional Diffusion Models for Time Series Imputation
- (2410.02168) Channel-aware Contrastive Conditional Diffusion for Multivariate Probabilistic Time Series Forecasting (CCDM)
- (Narasimhan et al., 2024) Time Weaver: A Conditional Time Series Generation Model
- (Yuan et al., 2024) Diffusion-TS: Interpretable Diffusion for General Time Series Generation
- (Zhou et al., 2024) MTSCI: A Conditional Diffusion Model for Multivariate Time Series Consistent Imputation
- (Ren et al., 2024) Diff-MTS: Temporal-Augmented Conditional Diffusion-based AIGC for Industrial Time Series
- (Ding et al., 3 Aug 2025) Semantically-Guided Inference for Conditional Diffusion Models: Enhancing Covariate Consistency in Time Series Forecasting
- (Xia et al., 25 Sep 2025) Causal Time Series Generation via Diffusion Models
- (Ma et al., 2024) UTSD: Unified Time Series Diffusion Model
- (Chen et al., 4 Jun 2025) CHIME: Conditional Hallucination and Integrated Multi-scale Enhancement for Time Series Diffusion Model
For a comprehensive review of conditional diffusion architectures for time series, including theoretical frameworks, taxonomy, representative methods, and future research challenges, see (Yang et al., 2024).