Drift Estimation for Stochastic Differential Equations with Denoising Diffusion Models

Published 19 Feb 2026 in stat.ML and cs.LG | (2602.17830v1)

Abstract: We study the estimation of time-homogeneous drift functions in multivariate stochastic differential equations with known diffusion coefficient, from multiple trajectories observed at high frequency over a fixed time horizon. We formulate drift estimation as a denoising problem conditional on previous observations, and propose an estimator of the drift function which is a by-product of training a conditional diffusion model capable of simulating new trajectories dynamically. Across different drift classes, the proposed estimator was found to match classical methods in low dimensions and remained consistently competitive in higher dimensions, with gains that cannot be attributed to architectural design choices alone.


Summary

The paper’s main contribution is modeling drift estimation as a conditional denoising problem using diffusion models to overcome low Fisher information in high dimensions.
It employs a VPSDE-based forward process and Monte Carlo averaging with analytically derived coefficients to robustly approximate drift across varying noise levels.
Empirical results show that the denoising approach outperforms classical regression and neural baselines, especially in complex, coupled, and chaotic SDE systems.

Drift Estimation for SDEs Using Denoising Diffusion Models: A Technical Analysis

Problem Formulation and Motivation

The paper addresses the estimation of time-homogeneous drift functions in multivariate SDEs, assuming known diffusion coefficients, based on high-frequency observations from multiple i.i.d. trajectories. Classical approaches to drift estimation struggle when the sampling interval $\Delta$ is small: the drift contributes $\mathcal{O}(\Delta)$ to each increment while the noise contributes $\mathcal{O}(\sqrt{\Delta})$, so the per-observation Fisher information about the drift vanishes as $\Delta \to 0$. The problem is exacerbated in high dimension $D$, where standard nonparametric regression deteriorates rapidly because convergence rates worsen with dimension (cf. Tsybakov 2009). The paper recasts drift estimation as a conditional denoising problem, linking it directly to recent advances in denoising diffusion models (DDMs) and score-based generative modeling.
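
To make the scaling argument concrete, here is a scalar sketch under the Euler-Maruyama approximation. It assumes a constant, known diffusion coefficient $\sigma$ and treats the local drift value as a fixed parameter; $\xi$, $\mathrm{SNR}$, $I_\Delta$, and the horizon $T$ are notation introduced here for illustration, not taken from the paper.

$$
Z_t = Y_{t+\Delta} - Y_t \approx \mu(Y_t)\,\Delta + \sigma\sqrt{\Delta}\,\xi, \qquad \xi \sim \mathcal{N}(0,1),
$$

$$
\mathrm{SNR} = \frac{|\mu(Y_t)|\,\Delta}{\sigma\sqrt{\Delta}} = \frac{|\mu(Y_t)|\sqrt{\Delta}}{\sigma} \xrightarrow{\ \Delta \to 0\ } 0, \qquad I_\Delta(\mu) = \frac{\Delta^2}{\sigma^2\Delta} = \frac{\Delta}{\sigma^2}.
$$

Here $I_\Delta(\mu)$ is the Fisher information a single Gaussian increment carries about the drift value. Summing over the $T/\Delta$ increments of one path gives roughly $T/\sigma^2$, which does not grow as $\Delta$ shrinks. This is consistent with the setting of many trajectories over a fixed horizon rather than a single, finely sampled path.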

Methodological Advancements

The proposed estimator is derived by training a conditional diffusion model (CDM), which learns a denoising operator for increments of observed SDE trajectories. Formally, for each increment $Z_t = Y_{t+\Delta} - Y_t$, the conditional law given the previous observations is modeled, and the core denoiser $\mathbb{E}[X_0 \mid X_\tau, Y]$, the expected clean increment given its noised version at diffusion time $\tau$ and the conditioning state, is estimated via neural networks. The forward process is modeled as a Variance-Preserving SDE (VPSDE), which keeps the denoising objective well conditioned across noise levels.
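
As a rough illustration of a variance-preserving forward process, the sketch below perturbs a batch of increments using the closed-form VP marginal with a linear noise schedule. The linear schedule, the values `beta_min=0.1` and `beta_max=20.0`, and the function names `vp_marginal` and `vp_perturb` are assumptions made for this example; the paper's exact schedule and any rescaling of the increments may differ.

```python
import numpy as np


def vp_marginal(tau, beta_min=0.1, beta_max=20.0):
    """Return (alpha, sigma) with X_tau = alpha * X_0 + sigma * eps.

    Assumes a linear schedule beta(tau) = beta_min + tau * (beta_max - beta_min);
    these values are illustrative defaults, not taken from the paper.
    """
    tau = np.asarray(tau, dtype=float)
    # log(alpha) = -0.5 * integral of beta(s) ds from 0 to tau
    log_alpha = -0.25 * tau**2 * (beta_max - beta_min) - 0.5 * tau * beta_min
    alpha = np.exp(log_alpha)
    # Variance preserving: alpha^2 + sigma^2 = 1 at every tau
    sigma = np.sqrt(1.0 - alpha**2)
    return alpha, sigma


def vp_perturb(x0, tau, rng, **schedule):
    """Diffuse clean increments x0 to a scalar diffusion time tau."""
    alpha, sigma = vp_marginal(tau, **schedule)
    eps = rng.standard_normal(np.shape(x0))
    return alpha * x0 + sigma * eps, eps


rng = np.random.default_rng(0)
increments = 0.1 * rng.standard_normal((8, 3))  # toy batch of 3-D increments
x_tau, eps = vp_perturb(increments, 0.5, rng)
```

Because the squared signal and noise coefficients always sum to one, the scale of the network input stays bounded across diffusion times, which is one common reason for preferring a variance-preserving schedule.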

The drift estimator for $\mu(Y)$, under an Euler-Maruyama approximation, is computed by Monte Carlo averaging across diffused increments at a fixed diffusion time $\tau=1$: $\bar{\mu}(y) = \frac{1}{\mathcal{K}} \sum_{k=1}^{\mathcal{K}} \left[ a(1)\,x_1^{(k)} + b_\Delta(1)\, D_{\theta^*}(1, x_1^{(k)}, y) \right]$, where $a(1)$ and $b_\Delta(1)$ are analytically derived coefficients, $x_1^{(k)}$ are the $\mathcal{K}$ diffused samples, and $D_{\theta^*}$ is the trained denoiser network. Empirical results indicate that fixing $\tau=1$ is a robust choice: it allows fast, parallelizable Monte Carlo estimation, with performance comparable to choosing $\tau$ optimally for each sample.
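
The structure of this Monte Carlo average can be sketched as follows. This is only an outline under stated assumptions: the coefficients $a(1)$ and $b_\Delta(1)$ are passed in as plain numbers because their closed forms are not reproduced here, the samples $x_1^{(k)}$ are drawn from a standard normal (an assumed approximation to the terminal marginal of a variance-preserving process), and `toy_denoiser` is a hypothetical stand-in for a trained network.

```python
import numpy as np


def mc_drift_estimate(denoiser, y, a1, b1, num_samples=1024, rng=None):
    """Average a1 * x1 + b1 * D(1, x1, y) over Monte Carlo draws of x1."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    # Assumption: x1 ~ N(0, I), approximating the diffused increment at tau = 1.
    x1 = rng.standard_normal((num_samples, y.shape[0]))
    # A single batched denoiser call handles all K samples at once.
    denoised = denoiser(1.0, x1, y)
    return np.mean(a1 * x1 + b1 * denoised, axis=0)


def toy_denoiser(tau, x, y, delta=0.01):
    """Hypothetical denoiser: returns delta * mu(y) for the toy drift mu(y) = -y."""
    return np.broadcast_to(delta * (-np.asarray(y, dtype=float)), x.shape)


y = np.array([1.0, -2.0])
# Placeholder coefficients for the toy case only (a1 = 0, b1 = 1 / delta);
# they are not the paper's analytically derived values.
estimate = mc_drift_estimate(toy_denoiser, y, a1=0.0, b1=1.0 / 0.01,
                             rng=np.random.default_rng(0))
```

With these placeholder coefficients the toy estimate recovers the drift $-y$ at the query point. With a real trained denoiser, the same batched call evaluates all $\mathcal{K}$ samples in parallel.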

Relation to Prior Work

The paper situates itself relative to classical kernel, penalized spline, and projection estimators for SDE drift estimation as well as recent deep learning-based approaches (e.g., [Zhao, Liu, Hoffmann 2025]). It criticizes existing neural approaches for their restrictive assumptions (e.g., separable drifts) and for incomplete empirical evaluation in high dimension and on coupled, chaotic systems. The method builds on the equivalence between denoising and score matching under Gaussian perturbations (Vincent 2011), and treats drift estimation as learning a conditional expectation of increments given highly noisy inputs, which effectively regularizes the regression via multi-scale corruption.
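
The link between denoising and score matching can be written out for a Gaussian perturbation of the form $X_\tau = \alpha_\tau X_0 + \sigma_\tau \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$. The symbols $\alpha_\tau$, $\sigma_\tau$, $p_\tau$, and $D^*$ are notation assumed here for illustration. The minimizer of the squared denoising loss is the conditional mean, and Tweedie's formula relates that mean to the conditional score:

$$
D^* = \arg\min_{D}\; \mathbb{E}\,\big\| D(\tau, X_\tau, Y) - X_0 \big\|^2
\quad\Longrightarrow\quad
D^*(\tau, x, y) = \mathbb{E}[X_0 \mid X_\tau = x,\, Y = y],
$$

$$
\mathbb{E}[X_0 \mid X_\tau = x,\, Y = y] = \frac{x + \sigma_\tau^2\, \nabla_x \log p_\tau(x \mid y)}{\alpha_\tau}.
$$

In this form, learning the denoiser and learning the score of the noised conditional law are the same task up to an affine transformation.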

This framework explicitly draws from advances in conditional diffusion modeling, emphasizing the flexibility in conditioning, feature embedding, and convolutional inductive biases. The paper also dissects how forward noise schedules (VPSDE vs. VESDE) impact estimator training and bias/variance trade-offs, demonstrating empirically that uniform training coverage across signal-to-noise ratios is critical.

Empirical Evaluation

Benchmarking is performed across multiple drift classes:

1D drifts: non-monotone, oscillatory, and cubic polynomial forms,
High-dimensional $\mu_4$ (bistable potential with optional cross-dimension coupling),
Chaotic drifts $\mu_5$ (Lorenz 96 system).
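
For orientation, the sketch below shows the standard Lorenz 96 vector field used as a drift, together with a plain Euler-Maruyama simulator for generating trajectories. The forcing value `forcing=8.0`, the constant scalar diffusion coefficient `sigma`, and the function names are assumptions for this example; the paper's exact parameterization of $\mu_5$ may differ.

```python
import numpy as np


def lorenz96_drift(x, forcing=8.0):
    """Standard Lorenz 96 field: (x[i+1] - x[i-2]) * x[i-1] - x[i] + F (cyclic)."""
    x = np.asarray(x, dtype=float)
    # np.roll(x, -1)[i] = x[i+1], np.roll(x, 2)[i] = x[i-2], np.roll(x, 1)[i] = x[i-1]
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing


def euler_maruyama(drift, y0, delta, n_steps, sigma, rng):
    """Simulate one path of dY = drift(Y) dt + sigma dW on a grid of step delta."""
    y0 = np.asarray(y0, dtype=float)
    path = np.empty((n_steps + 1, y0.shape[0]))
    path[0] = y0
    for n in range(n_steps):
        noise = rng.standard_normal(y0.shape[0])
        path[n + 1] = path[n] + drift(path[n]) * delta + sigma * np.sqrt(delta) * noise
    return path


rng = np.random.default_rng(0)
start = 8.0 + 0.01 * rng.standard_normal(10)   # small perturbation of the fixed point
trajectory = euler_maruyama(lorenz96_drift, start, delta=0.01, n_steps=200,
                            sigma=0.5, rng=rng)
increments = np.diff(trajectory, axis=0)        # the Z_t used for drift estimation
```

Each coordinate of this drift depends on three neighbours, so it is not separable across dimensions, which is the property that makes it a demanding test for estimators.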

The denoising estimator ($\mathrm{DN}$) is shown to match or outperform classical methods (Nadaraya-Watson, Hermite projections, ridge regression) in 1D regimes. In higher dimensions, $\mathrm{DN}$ yields the lowest in-sample MSE for both separable and coupled drifts, and generalizes more robustly out-of-sample than comparable feedforward regression baselines (even when controlling for network size and inductive bias). For strongly coupled and chaotic dynamics, denoising yields substantial gains in predictive stability and in controlling error growth during extrapolation.
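
For reference, a Nadaraya-Watson drift baseline of the kind mentioned above can be sketched in a few lines: it is a kernel-weighted average of observed increments divided by $\Delta$. The Gaussian kernel, the single shared bandwidth, and the function name are assumptions for this example rather than the paper's exact configuration.

```python
import numpy as np


def nadaraya_watson_drift(y_query, states, increments, delta, bandwidth):
    """Kernel regression of increments / delta on states.

    y_query: (Q, D) query points; states, increments: (N, D) arrays.
    Uses a Gaussian kernel with one shared bandwidth (an assumed choice).
    """
    diff = y_query[:, None, :] - states[None, :, :]       # (Q, N, D)
    sq_dist = np.sum(diff**2, axis=-1)                    # (Q, N)
    log_w = -0.5 * sq_dist / bandwidth**2
    # Subtract the row maximum before exponentiating for numerical stability.
    log_w -= log_w.max(axis=1, keepdims=True)
    weights = np.exp(log_w)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ increments) / delta                 # (Q, D)


# Toy check with noise-free increments from the drift mu(y) = -y.
delta = 0.01
states = np.linspace(-2.0, 2.0, 401)[:, None]
increments = -states * delta
estimate = nadaraya_watson_drift(np.array([[0.5]]), states, increments,
                                 delta, bandwidth=0.05)
```

The pairwise distance computation is what makes this estimator degrade in high dimension: with a fixed sample size, fewer and fewer states fall within a bandwidth of any query point as $D$ grows.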

Architectural sensitivity is also investigated: convolutional embedding is crucial for chaotic systems, while denoising remains competitive where standard regression is sufficient. Control baselines with matched capacity and feature embeddings indicate that the gains from denoising cannot be explained by architecture alone.

Extensive ablation studies quantify robustness to i.i.d. path count, kernel bandwidth, spline order, noise schedule, and sampling frequency, demonstrating estimator stability and numerical efficiency.

Implications and Future Directions

Practically, the approach enables scalable, high-dimensional drift estimation in SDE settings without parametric restrictions or stringent regularity assumptions. The Monte Carlo estimator's structure is amenable to parallel computation and avoids integration over intractable transition densities.

On the theoretical side, the empirical results suggest that denoising-based training improves generalization in low-Fisher-information regimes and for systems with strong inter-dimensional coupling or chaotic behavior. However, the paper leaves open the development of formal error bounds and bias-variance analyses for denoising drift estimators, particularly with respect to noise schedule optimization and non-Euler increments.

Future research should focus on the statistical properties of denoising drift estimators under alternate corruption processes and architectures (e.g., Transformers or more exotic sequence models), higher-order incremental schemes, adaptive noise schedules, and the integration of learned diffusion coefficients. It would also be worthwhile to explore denoising objectives for irregularly sampled, missing, or imputed time series, as well as real-world applications in model-based reinforcement learning and scientific ML (e.g., physics-informed SDE learning).

Conclusion

The paper presents a principled and empirically validated methodology for drift function estimation in SDEs, leveraging conditional denoising diffusion models and Monte Carlo expectation approximation. This approach addresses core limitations of traditional estimators in high-dimensional, coupled, and chaotic systems, showing that architectural bias and denoising objectives can complement each other to achieve strong generalization. The results establish new baselines for high-dimensional drift recovery and highlight promising directions for theory and applications in generative modeling for stochastic processes (2602.17830).