Variance-Exploding SDEs: Theory & Applications
- Variance-Exploding SDEs are defined as drift-free stochastic processes with a time-dependent diffusion coefficient, forming the basis of generative diffusion models.
- The ER-SDE framework employs a parameterized noise schedule via φ(σ), allowing interpolation between deterministic ODE solvers and fully stochastic SDE solvers.
- Efficient VE ER-SDE solvers strike a balance between rapid sampling and high sample diversity, demonstrated by state-of-the-art empirical performance on benchmarks.
Variance-Exploding Stochastic Differential Equations (VE-SDEs) define a class of stochastic processes characterized by increasing variance over time and form a foundational component of generative diffusion models. In VE-SDEs, the forward stochastic differential equation is drift-free with a time-dependent diffusion coefficient, and the backward or reverse-time equation involves a score-based drift. Under the Extended Reverse-Time SDE (ER-SDE) framework, VE-SDEs admit semi-linear solutions with a parameterized noise schedule, allowing interpolation between fully stochastic and deterministic (ODE) solvers. Approximate closed-form solutions, efficient solvers, and error analyses within this framework provide mathematical and practical insights into the speed, quality, and diversity of sampling algorithms (Cui et al., 2023).
1. Formal Definition and Structure of VE-SDEs
The Variance-Exploding SDE is defined by a forward equation of the form
$$\mathrm{d}\mathbf{x} = g(t)\,\mathrm{d}\mathbf{w}_t,$$
where the drift $f(\mathbf{x}, t) \equiv \mathbf{0}$, and the diffusion coefficient is $g(t) = \sqrt{\mathrm{d}\sigma^2(t)/\mathrm{d}t}$ for an increasing noise schedule $\sigma(t)$. The process is initialized from a data distribution $\mathbf{x}_0 \sim p_{\mathrm{data}}$, and $\mathbf{w}_t$ is the standard Wiener process.
For generative modeling tasks, the reverse-time SDE (specialized from Song et al. 2021 to the VE case) is given by
$$\mathrm{d}\mathbf{x} = -g^2(t)\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}_t,$$
with reverse drift $-g^2(t)\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ and an independent reverse-time Wiener process $\bar{\mathbf{w}}_t$.
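The variance-exploding behavior of the forward equation can be checked numerically. The following is a minimal Euler–Maruyama sketch under an assumed geometric schedule $\sigma(t) = \sigma_{\min}(\sigma_{\max}/\sigma_{\min})^t$ (a common but purely illustrative choice; any increasing $\sigma(t)$ defines a VE-SDE):

```python
import numpy as np

# Assumed geometric noise schedule; illustrative only, not prescribed by the framework.
SIGMA_MIN, SIGMA_MAX = 0.01, 10.0

def sigma(t):
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def g(t, eps=1e-6):
    # Diffusion coefficient g(t) = sqrt(d sigma^2(t) / dt), via central differences.
    return np.sqrt((sigma(t + eps) ** 2 - sigma(t - eps) ** 2) / (2 * eps))

def forward_ve(x0, n_steps=1000, seed=0):
    # Euler-Maruyama on the drift-free forward SDE dx = g(t) dw over t in [0, 1].
    rng = np.random.default_rng(seed)
    x, dt = x0.astype(float), 1.0 / n_steps
    for i in range(n_steps):
        t_mid = (i + 0.5) * dt  # midpoint rule reduces bias in the variance integral
        x = x + g(t_mid) * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

# The terminal marginal satisfies Var[x(1)] = Var[x(0)] + sigma(1)^2 - sigma(0)^2,
# so starting from zeros the terminal std should be close to sigma_max.
x1 = forward_ve(np.zeros(50_000))
```

Because the drift vanishes, the marginal variance is simply the accumulated diffusion, which is why the schedule is said to "explode" as $\sigma(t)$ grows.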
2. Solution Structure in the Extended Reverse-Time SDE Framework
The ER-SDE framework generalizes both SDE- and ODE-based solvers by introducing a freely chosen reverse-time noise scale $\phi(\sigma)$ and substituting the learned data-prediction network $\hat{\mathbf{x}}_\theta(\mathbf{x}, \sigma)$ for the score term.
Switching to the noise-level parameter $\sigma$, the ER-SDE reads
$$\mathrm{d}\mathbf{x} = \frac{\phi'(\sigma)}{\phi(\sigma)}\bigl(\mathbf{x} - \hat{\mathbf{x}}_\theta(\mathbf{x}, \sigma)\bigr)\,\mathrm{d}\sigma + \sqrt{2\sigma\Bigl(\frac{\sigma\,\phi'(\sigma)}{\phi(\sigma)} - 1\Bigr)}\,\mathrm{d}\bar{\mathbf{w}}_\sigma,$$
with $\phi(\sigma) > 0$, such that $\phi(\sigma)/\sigma$ is non-decreasing (so the diffusion coefficient is real-valued). Define the data-prediction parameterization
$$\hat{\mathbf{x}}_\theta(\mathbf{x}, \sigma) = \mathbf{x} + \sigma^2\,\mathbf{s}_\theta(\mathbf{x}, \sigma),$$
with $\mathbf{s}_\theta(\mathbf{x}, \sigma) \approx \nabla_{\mathbf{x}} \log p_\sigma(\mathbf{x})$. The choice $\phi(\sigma) = \sigma$ recovers the probability-flow ODE, while $\phi(\sigma) = \sigma^2$ recovers the standard reverse-time SDE.
The exact solution for evolving from $\sigma_s$ to $\sigma_t < \sigma_s$ (Proposition 1) is
$$\mathbf{x}_t = \frac{\phi(\sigma_t)}{\phi(\sigma_s)}\,\mathbf{x}_s + \phi(\sigma_t)\int_{\sigma_t}^{\sigma_s} \frac{\phi'(\sigma)}{\phi^2(\sigma)}\,\hat{\mathbf{x}}_\theta(\mathbf{x}_\sigma, \sigma)\,\mathrm{d}\sigma + \sqrt{\sigma_t^2 - \frac{\phi^2(\sigma_t)}{\phi^2(\sigma_s)}\,\sigma_s^2}\;\mathbf{z}_s,$$
where $\mathbf{z}_s \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. In practice, the nonlinear integral is approximated by a Taylor expansion of $\hat{\mathbf{x}}_\theta$ around $\sigma_s$.
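As a sanity check on this semi-linear structure, the integral term can be evaluated numerically: for a data prediction that is constant over the step, it must collapse to the closed-form weight $1 - \phi(\sigma_t)/\phi(\sigma_s)$. This is an illustrative check, not code from the paper:

```python
import numpy as np

def integral_weight(phi, dphi, sigma_t, sigma_s, n=100_000):
    # phi(sigma_t) * int_{sigma_t}^{sigma_s} phi'(s)/phi(s)^2 ds, i.e. the weight
    # multiplying a constant data prediction in the exact solution.
    s = np.linspace(sigma_t, sigma_s, n)
    f = dphi(s) / phi(s) ** 2
    trap = 0.5 * ((f[:-1] + f[1:]) * np.diff(s)).sum()  # trapezoidal rule
    return phi(sigma_t) * trap

# phi(sigma) = sigma^2, one step from sigma_s = 2 down to sigma_t = 1:
w = integral_weight(lambda s: s ** 2, lambda s: 2 * s, 1.0, 2.0)
print(w)  # ≈ 1 - phi(1)/phi(2) = 1 - 1/4 = 0.75
```

The collapse holds for any admissible $\phi$, since the integrand is the exact derivative of $-1/\phi(\sigma)$.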
3. Efficient VE ER-SDE Solvers: Algorithmic Construction
The first-order VE ER-SDE-Solver has the following update for each step:
$$\mathbf{x}_{t_i} = \frac{\phi(\sigma_{t_i})}{\phi(\sigma_{t_{i-1}})}\,\mathbf{x}_{t_{i-1}} + \Bigl(1 - \frac{\phi(\sigma_{t_i})}{\phi(\sigma_{t_{i-1}})}\Bigr)\hat{\mathbf{x}}_\theta(\mathbf{x}_{t_{i-1}}, \sigma_{t_{i-1}}) + \sqrt{\sigma_{t_i}^2 - \frac{\phi^2(\sigma_{t_i})}{\phi^2(\sigma_{t_{i-1}})}\,\sigma_{t_{i-1}}^2}\;\mathbf{z}_i,$$
where $\mathbf{z}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, and $\{\sigma_{t_i}\}_{i=0}^{N}$ is a decreasing discretization of the noise schedule. Each update utilizes a single evaluation of $\hat{\mathbf{x}}_\theta$.
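A single first-order step can be sketched as follows. The helper name `er_sde_step` and the stand-in `x_pred` for the network output are hypothetical; the coefficients follow the update rule described in this section:

```python
import numpy as np

def er_sde_step(x, x_pred, sigma_s, sigma_t, phi, rng):
    # One first-order VE ER-SDE step from noise level sigma_s down to sigma_t.
    # x_pred stands in for the data-prediction network output at (x, sigma_s).
    r = phi(sigma_t) / phi(sigma_s)              # semi-linear transition coefficient
    var = sigma_t ** 2 - r ** 2 * sigma_s ** 2   # variance injected by the stochastic part
    noise = np.sqrt(max(var, 0.0)) * rng.standard_normal(x.shape)
    return r * x + (1.0 - r) * x_pred + noise

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# phi(sigma) = sigma: injected variance vanishes, the step is deterministic (ODE limit).
ode = er_sde_step(x, np.zeros(4), sigma_s=2.0, sigma_t=1.0, phi=lambda s: s, rng=rng)

# phi(sigma) = sigma**2: an ancestral-style stochastic step.
sde = er_sde_step(x, np.zeros(4), sigma_s=2.0, sigma_t=1.0, phi=lambda s: s ** 2, rng=rng)
```

Note that the three coefficients are fully determined by $\phi$ and the two noise levels, so the per-step cost is dominated by the single network call producing `x_pred`.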
Pseudocode for the algorithmic workflow (as described in Algorithm 1) is:
| Step | Description | Details |
|---|---|---|
| 1 | Initialization | $\mathbf{x}_{t_0} \sim \mathcal{N}(\mathbf{0}, \sigma_{t_0}^2 \mathbf{I})$ (initial noise sample) |
| 2 | For $i = 1, \dots, N$ | repeat the three sub-steps below |
| 2a | Drift term | $\mathbf{d}_i = \frac{\phi(\sigma_{t_i})}{\phi(\sigma_{t_{i-1}})}\,\mathbf{x}_{t_{i-1}} + \bigl(1 - \frac{\phi(\sigma_{t_i})}{\phi(\sigma_{t_{i-1}})}\bigr)\hat{\mathbf{x}}_\theta(\mathbf{x}_{t_{i-1}}, \sigma_{t_{i-1}})$ |
| 2b | Noise term | $\mathbf{n}_i = \sqrt{\sigma_{t_i}^2 - \frac{\phi^2(\sigma_{t_i})}{\phi^2(\sigma_{t_{i-1}})}\,\sigma_{t_{i-1}}^2}\;\mathbf{z}_i$, with $\mathbf{z}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ |
| 2c | Update | $\mathbf{x}_{t_i} = \mathbf{d}_i + \mathbf{n}_i$ (drift $+$ noise) |
| 3 | Output | $\mathbf{x}_{t_N}$ |
The method is tunable via the choice of $\phi(\sigma)$, the number of steps $N$, and the schedule $\{\sigma_{t_i}\}$. Higher-order versions reuse past evaluations of $\hat{\mathbf{x}}_\theta$ and add finite-difference corrections at the cost of additional network calls per step.
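The workflow above can be sketched end to end with a toy analytic denoiser standing in for the learned network. For unit-Gaussian data, $\hat{\mathbf{x}}_\theta(\mathbf{x}, \sigma) = \mathbf{x}/(1+\sigma^2)$ is the exact posterior mean, so the sampler should return roughly unit-variance output; the geometric schedule and all helper names here are illustrative assumptions:

```python
import numpy as np

def toy_denoiser(x, sig):
    # Stand-in for the learned data-prediction network: for data ~ N(0, I) and
    # x = x0 + sig * z, the exact posterior mean is E[x0 | x] = x / (1 + sig^2).
    return x / (1.0 + sig ** 2)

def er_sde_sample(phi, n_steps=200, dim=4000, sigma_max=10.0, sigma_min=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    # Geometric sigma schedule (an illustrative assumption, not prescribed by the method).
    sigmas = sigma_max * (sigma_min / sigma_max) ** (np.arange(n_steps + 1) / n_steps)
    x = sigma_max * rng.standard_normal(dim)      # x_{t_0} ~ N(0, sigma_max^2 I)
    for s, t in zip(sigmas[:-1], sigmas[1:]):
        r = phi(t) / phi(s)                       # drift coefficient
        var = max(t ** 2 - r ** 2 * s ** 2, 0.0)  # injected noise variance
        x = r * x + (1.0 - r) * toy_denoiser(x, s) + np.sqrt(var) * rng.standard_normal(dim)
    return x

# Both endpoints of the family run through the same loop:
stochastic = er_sde_sample(phi=lambda s: s ** 2)   # ancestral-style SDE sampling
deterministic = er_sde_sample(phi=lambda s: s)     # probability-flow ODE sampling
```

Swapping `toy_denoiser` for a trained data-prediction network and choosing $\phi$, $N$, and the schedule gives the practical solver; only the network call changes.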
4. Local Discretization Error and Theoretical Analysis
Discretization error is governed by the First-Order Euler Integral (FEI) coefficient
$$\Delta(\sigma_{t_i}, \sigma_{t_{i-1}}) = \phi(\sigma_{t_i}) \int_{\sigma_{t_i}}^{\sigma_{t_{i-1}}} \frac{\phi'(\sigma)}{\phi^2(\sigma)}\,(\sigma_{t_{i-1}} - \sigma)\,\mathrm{d}\sigma,$$
which quantifies the dominant local one-step error:
$$\bigl\|\tilde{\mathbf{x}}_{t_i} - \mathbf{x}_{t_i}\bigr\| = \Delta(\sigma_{t_i}, \sigma_{t_{i-1}})\,\bigl\|\hat{\mathbf{x}}_\theta^{(1)}\bigr\| + \mathcal{O}\bigl((\sigma_{t_{i-1}} - \sigma_{t_i})^3\bigr),$$
where $\hat{\mathbf{x}}_\theta^{(1)}$ denotes the first derivative of the data prediction with respect to $\sigma$. The minimum FEI is achieved by choosing $\phi(\sigma) = \sigma$, corresponding to the deterministic probability-flow ODE, resulting in the lowest discretization error among the ER-SDE family. Any faster-growing $\phi$ increases the FEI and the corresponding global error for equal step size.
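This ordering can be checked numerically for the family $\phi(\sigma) = \sigma^k$: the FEI coefficient should be smallest at $k = 1$ (the ODE) and grow with $k$. An illustrative quadrature check, not taken from the paper's code:

```python
import numpy as np

def fei(phi, dphi, sigma_t, sigma_s, n=200_000):
    # FEI coefficient: phi(sigma_t) * int phi'(s)/phi(s)^2 * (sigma_s - s) ds.
    s = np.linspace(sigma_t, sigma_s, n)
    f = dphi(s) / phi(s) ** 2 * (sigma_s - s)
    return phi(sigma_t) * 0.5 * ((f[:-1] + f[1:]) * np.diff(s)).sum()  # trapezoid rule

# Compare phi(sigma) = sigma^k over one step from sigma = 2 down to sigma = 1.
deltas = [fei(lambda s, k=k: s ** k, lambda s, k=k: k * s ** (k - 1), 1.0, 2.0)
          for k in (1, 2, 3)]
print([round(float(d), 3) for d in deltas])  # increasing in k; k = 1 gives 1 - ln 2 ≈ 0.307
```

For $k = 1$ the integral evaluates in closed form to $\sigma_s - \sigma_t - \sigma_t \ln(\sigma_s/\sigma_t)$, which is the quoted minimum.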
A change of variable demonstrates that VE and VP ER-SDEs share the same FEI coefficient, establishing parity between these formulations for a given pretrained model and fixed number of function evaluations (NFE).
5. Stochasticity, Sample Quality, and Diversity
ODE-based samplers with $\phi(\sigma) = \sigma$ have minimal local error but inject no noise, leading to reduced sample diversity. Choosing $\phi(\sigma)$ close to $\sigma$ but growing slightly faster, so that a controlled amount of noise is injected, allows ER-SDE-based VE solvers to achieve near-ODE fidelity without sacrificing diversity. This stochasticity-efficiency tradeoff is central: VE ER-SDE solvers interpolate between pure SDE and ODE processes, balancing rapid low-NFE sampling with high sample quality and output variability.
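The stochasticity knob is easy to see concretely. A small sketch (assuming, for illustration, the power family $\phi(\sigma) = \sigma^k$) of the per-step injected noise standard deviation:

```python
import numpy as np

def injected_noise_std(k, sigma_s, sigma_t):
    # Std of the noise added in one first-order step when phi(sigma) = sigma^k.
    r = (sigma_t / sigma_s) ** k
    return float(np.sqrt(max(sigma_t ** 2 - r ** 2 * sigma_s ** 2, 0.0)))

# k = 1 is the ODE endpoint (no noise); larger k injects progressively more stochasticity.
stds = [injected_noise_std(k, sigma_s=2.0, sigma_t=1.0) for k in (1.0, 1.5, 2.0, 3.0)]
print(stds[0])  # 0.0: the ODE endpoint injects no noise
```

Intermediate exponents between the ODE ($k = 1$) and the ancestral SDE ($k = 2$) realize exactly the interpolation described above.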
6. Practical Considerations and Empirical Findings
VE ER-SDE-Solvers are parameterized by the noise-scale function $\phi(\sigma)$, the number of steps $N$, and the schedule mapping $t_i \mapsto \sigma_{t_i}$. First-order solvers require just one network evaluation per step. Higher-order variants increase per-step cost for potentially improved empirical accuracy.
Empirical evaluation on the ImageNet benchmark demonstrates that ER-SDE-Solvers attain state-of-the-art performance across stochastic samplers while maintaining the efficiency typical of deterministic samplers (e.g., $8.33$ FID in $20$ function evaluations) (Cui et al., 2023). This suggests that appropriate tuning of $\phi(\sigma)$ enables simultaneous optimization of sample quality and computational efficiency.
7. Significance and Theoretical Summary
The ER-SDE framework unifies ODE and SDE sampling methodologies for VE-SDEs, providing a family of semi-linear solutions whose error and stochasticity are parametrically controlled by $\phi(\sigma)$. The key theorem asserts that, among all extended reverse-time SDEs with a given drift-score model, the ODE ($\phi(\sigma) = \sigma$) uniquely minimizes the local discretization error. A plausible implication is that careful functional choice of $\phi$ allows constructing samplers that closely approach ODE performance while preserving the stochastic effects essential for output variability and model robustness (Cui et al., 2023).