
DiffusionDrive: Truncated Diffusion Model

Updated 17 February 2026
  • DiffusionDrive is a truncated diffusion model that halts the forward process early to reduce computation while maintaining high generative fidelity.
  • It integrates techniques like adversarial regularization, trajectory anchoring, and KL expansion to optimize reverse generation across various domains.
  • Empirical results show significant speedups and competitive sample quality in tasks such as autonomous driving, medical imaging, and image generation.

A truncated diffusion model, often termed "DiffusionDrive" in the literature, refers to a class of generative models in which the standard forward diffusion process is halted after a small number of steps and the reverse generative process is run starting from this truncated state, rather than from a maximally random (pure noise) state. This paradigm, developed across multiple domains including probabilistic modeling, trajectory generation for autonomous driving, and medical image processing, retains generative fidelity while reducing computation and inference time. The concept unifies methods such as Truncated Diffusion Probabilistic Models (TDPM), anchor-based trajectory diffusion, truncated Karhunen-Loève expansions, and normalizing flow-based truncated reverse diffusion chains (Zheng et al., 2022, Liao et al., 2024, Ren et al., 22 Mar 2025, Dong et al., 2024).

1. Mathematical Foundations of Truncated Diffusion

Standard diffusion probabilistic models generate data by running a forward process that iteratively corrupts data $x_0$ with additive Gaussian noise over $T$ timesteps:

$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(\sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right),$$

resulting in a terminal distribution $q(x_T)$ that is approximately $\mathcal{N}(0, I)$ for large $T$. The reverse (generative) process, parameterized by neural networks, denoises from $x_T$ back to $x_0$ in $T$ steps (Zheng et al., 2022).
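The forward corruption chain above can be sketched directly; this is a minimal NumPy illustration (the function name `forward_diffuse` and the toy constant-vector "data" are choices made here, not from the papers), using the standard linear $\beta$ schedule:

```python
import numpy as np

def forward_diffuse(x0, betas, rng):
    """Iteratively corrupt x0 with Gaussian noise:
    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps."""
    x = x0
    states = []
    for beta in betas:
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
        states.append(x)
    return states

rng = np.random.default_rng(0)
x0 = np.ones(2048)                      # toy "data": a constant vector
betas = np.linspace(1e-4, 0.02, 1000)   # common linear schedule, T = 1000
states = forward_diffuse(x0, betas, rng)
# After the full chain the marginal is approximately N(0, I); an early
# state such as states[49] still retains most of the signal.
```

Running the full chain drives the mean of the state toward 0 and its standard deviation toward 1, which is exactly why truncation at a small $T'$ leaves usable signal behind.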

In truncated diffusion, the forward process is stopped at $T' \ll T$. Instead of diffusing to pure noise, the forward chain's marginal after $T'$ steps, $q(x_{T'})$, becomes the starting distribution. The generative process runs only $T'$ reverse steps:

$$p_\theta(x_{0:T'}) = p_\psi(x_{T'}) \prod_{t=1}^{T'} p_\theta(x_{t-1} \mid x_t),$$

where $p_\psi(x_{T'})$ is a learnable/implicit distribution (often parameterized by a generator). The loss combines the standard denoising MSE for $t = 1, \ldots, T'$ with a divergence penalty matching $q(x_{T'})$ and $p_\psi(x_{T'})$ (Zheng et al., 2022).
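Because $q(x_{T'} \mid x_0)$ has the closed form $\mathcal{N}(\sqrt{\bar\alpha_{T'}}\,x_0,\ (1-\bar\alpha_{T'})I)$ with $\bar\alpha_{T'} = \prod_{t \le T'}(1-\beta_t)$, the truncated start can be sampled in one shot. A small sketch (the helper name `truncated_start` is an assumption for illustration):

```python
import numpy as np

def truncated_start(x0, betas, t_prime, rng):
    """Sample x_{T'} directly from q(x_{T'} | x0) via the closed form
    x_{T'} = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps,
    where alpha_bar = prod_{t <= T'} (1 - beta_t)."""
    alpha_bar = float(np.prod(1.0 - betas[:t_prime]))
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, alpha_bar

rng = np.random.default_rng(1)
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.ones(1024)
x_trunc, ab_trunc = truncated_start(x0, betas, 50, rng)    # T' = 50: signal retained
x_full, ab_full = truncated_start(x0, betas, 1000, rng)    # full chain: near-pure noise
```

Comparing `ab_trunc` (close to 1) with `ab_full` (close to 0) makes the computational argument concrete: at $T' = 50$ the reverse process only has to undo a mild corruption.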

2. Architectural Instantiations and Variants

Adversarially Regularized Truncation

The TDPM framework interprets the fixed forward diffusion encoder $q(x_{T'} \mid x_0)$ and reverse decoder $p_\theta(x_0 \mid x_{T'})$ as an adversarial autoencoder. An implicit generator $G_\psi(z)$ (with latent $z \sim \mathcal{N}(0, I)$) produces samples at the truncated time, and a discriminator $D_\phi$ ensures that $p_\psi(x_{T'})$ aligns with $q(x_{T'})$ (Zheng et al., 2022):

$$\min_\psi \max_\phi\ \mathbb{E}_{x \sim q(x_{T'})}\!\left[\log D_\phi(x)\right] + \mathbb{E}_{z \sim \mathcal{N}(0,I)}\!\left[\log\!\left(1 - D_\phi(G_\psi(z))\right)\right]$$
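The minimax objective can be evaluated numerically given discriminator outputs; this small sketch (function name `adversarial_objective` is an assumption here) also checks the classic GAN equilibrium, where $D_\phi \equiv 1/2$ and the objective equals $-2\log 2$:

```python
import numpy as np

def adversarial_objective(d_real, d_fake):
    """Monte Carlo estimate of the minimax value
    E_{x ~ q(x_{T'})}[log D(x)] + E_z[log(1 - D(G(z)))],
    given discriminator outputs on real truncated states (d_real)
    and on generator samples (d_fake)."""
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# At the equilibrium the discriminator outputs 1/2 everywhere,
# and the objective evaluates to -2 * log(2).
v = adversarial_objective(np.full(8, 0.5), np.full(8, 0.5))
```

In TDPM training this adversarial term is alternated with the standard denoising MSE over the $T'$ remaining steps.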

Trajectory Anchoring and Truncated Schedules in Driving

In DiffusionDrive for autonomous driving, the action space is partitioned using $N_\mathrm{anc}$ K-means anchors computed from trajectory data. Noising starts at each anchor, producing

$$\tau_k^{T_\mathrm{trunc}} = \sqrt{\bar\alpha^{T_\mathrm{trunc}}}\,\mathbf{a}_k + \sqrt{1-\bar\alpha^{T_\mathrm{trunc}}}\,\epsilon,$$

and truncated reverse steps denoise these states to generate diverse, scene-conditioned trajectories (Liao et al., 2024, Zou et al., 8 Dec 2025). A cascade diffusion decoder with cross-attention and feedforward modules processes the noisy trajectories in steps:

  1. Compute spatial/agent cross-attentions.
  2. Predict trajectory offsets $\Delta\tau_k$ and confidence scores $\hat s_k$.
  3. Apply a DDIM-style update to obtain $\tau_k^{i-1}$; stacked layers refine the trajectories across steps.
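The anchor-noising step above is a one-liner once $\bar\alpha^{T_\mathrm{trunc}}$ is fixed. A hedged sketch (the anchor shapes and the value $\bar\alpha = 0.9$ are illustrative assumptions, not values from the paper):

```python
import numpy as np

def noise_anchors(anchors, alpha_bar_trunc, rng):
    """Start the chain at each K-means anchor a_k rather than at pure noise:
    tau_k = sqrt(alpha_bar) * a_k + sqrt(1 - alpha_bar) * eps."""
    eps = rng.standard_normal(anchors.shape)
    return np.sqrt(alpha_bar_trunc) * anchors + np.sqrt(1.0 - alpha_bar_trunc) * eps

rng = np.random.default_rng(2)
# Hypothetical anchors: N_anc = 20 trajectories of 8 (x, y) waypoints each.
anchors = rng.standard_normal((20, 8, 2))
noisy = noise_anchors(anchors, alpha_bar_trunc=0.9, rng=rng)
# The noisy trajectories stay close to their anchors, so only a few
# reverse steps are needed to denoise them into valid plans.
```

Because each mode starts near a plausible trajectory, diversity is preserved across anchors while the reverse chain stays short.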

Truncated KL Expansion of the Forward Process

A distinct methodology replaces the Brownian-driven forward SDE in diffusion with a truncated Karhunen-Loève (KL) expansion:

$$W_t^{(M)} = \sum_{n=1}^{M} Z_n\, \phi_n(t), \qquad Z_n \sim \mathcal{N}(0, 1),$$

yielding an ODE driven by $M$ mode coefficients rather than i.i.d. Gaussian noise at every timestep. Training under this forward dynamics accelerates convergence, improves FID, and enables highly parallelized computation (Ren et al., 22 Mar 2025). The DDIM sampler and U-Net remain unchanged; only the loss reparameterization and noise reconstruction are adapted to the basis coefficients.
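A truncated KL expansion of Brownian motion is easy to sample; the sketch below uses the classical sine basis for Brownian motion on $[0,1]$ (the paper's exact basis may differ, so treat this as an assumption):

```python
import numpy as np

def kl_brownian(ts, M, rng):
    """Truncated Karhunen-Loeve expansion of Brownian motion on [0, 1]:
    W_t^(M) = sum_{n=1}^M Z_n * phi_n(t),
    with the classical basis
    phi_n(t) = sqrt(2) * sin((n - 1/2) * pi * t) / ((n - 1/2) * pi)."""
    Z = rng.standard_normal(M)                                 # M mode coefficients
    freqs = (np.arange(1, M + 1) - 0.5) * np.pi
    phi = np.sqrt(2.0) * np.sin(np.outer(ts, freqs)) / freqs   # shape (len(ts), M)
    return phi @ Z

rng = np.random.default_rng(3)
ts = np.linspace(0.0, 1.0, 101)
path = kl_brownian(ts, M=10, rng=rng)   # one approximate Brownian path from 10 modes
```

The whole path is determined by $M$ Gaussians drawn up front, which is what makes the forward dynamics cheap to parallelize: all basis coefficients can be handled in one batched computation.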

Flow-based Truncated Denoising

In flow-based truncation for medical super-resolution, the prior for $x_{T_\mathrm{trunc}}$ is learned by an invertible flow $F_\phi$, which maps $\mathcal{N}(\mu_z, \sigma_z^2 I)$ latent variables to the truncated forward state. The generative process first samples via the flow and then runs $T_\mathrm{trunc}$ reverse steps with the score-based network (Dong et al., 2024).
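The key structural requirement on $F_\phi$ is exact invertibility. A real FTDDM flow is a deep normalizing flow; the dimension-wise affine map below is only a toy stand-in illustrating that requirement (the class name and parameters are assumptions):

```python
import numpy as np

class AffineFlow:
    """Toy invertible map standing in for the learned flow F_phi that
    transports Gaussian latents to the truncated state x_{T_trunc}."""
    def __init__(self, shift, log_scale):
        self.shift = np.asarray(shift, dtype=float)
        self.log_scale = np.asarray(log_scale, dtype=float)

    def forward(self, z):
        # z -> x_{T_trunc}: elementwise scale and shift
        return self.shift + np.exp(self.log_scale) * z

    def inverse(self, x):
        # exact inversion, as required of a normalizing flow
        return (x - self.shift) * np.exp(-self.log_scale)

rng = np.random.default_rng(4)
flow = AffineFlow(shift=rng.standard_normal(16),
                  log_scale=0.1 * rng.standard_normal(16))
z = rng.standard_normal(16)
x_trunc = flow.forward(z)        # a sample of the learned truncated prior
z_back = flow.inverse(x_trunc)   # recovers z exactly
```

Invertibility is what allows the flow to be trained by exact maximum likelihood on truncated forward states and then sampled from at inference time.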

3. Algorithmic Workflow

The canonical truncated diffusion sampling procedure is as follows (Zheng et al., 2022, Liao et al., 2024):

  1. Sample $z \sim \mathcal{N}(0, I)$ (or select an anchor $\mathbf{a}_k$ in trajectory models).
  2. Obtain $x_{T'} \leftarrow G_\psi(z)$, or initialize around the prior anchor.
  3. For $t = T', T'-1, \ldots, 1$:
    • Predict the noise estimate $\epsilon_\theta(x_t, t)$.
    • Compute the posterior mean $\mu_\theta(x_t, t)$.
    • Draw $x_{t-1} \sim \mathcal{N}(\mu_\theta(x_t, t), \tilde\beta_t I)$.
  4. Return $x_0$ (or the trajectory).
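The loop above can be sketched as a short ancestral sampler; the trained network is stood in for by a placeholder `eps_model` callable (an assumption for illustration), with the standard DDPM posterior mean and variance:

```python
import numpy as np

def truncated_reverse(x_start, betas_trunc, eps_model, rng):
    """Run T' ancestral reverse steps from a truncated start x_{T'}.
    eps_model(x, t) stands in for the trained noise-prediction network;
    betas_trunc holds beta_1 .. beta_{T'} of the truncated schedule."""
    alphas = 1.0 - betas_trunc
    alpha_bars = np.cumprod(alphas)
    x = x_start
    for t in range(len(betas_trunc) - 1, -1, -1):
        eps = eps_model(x, t)
        # posterior mean mu_theta(x_t, t)
        mu = (x - betas_trunc[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # posterior variance beta_tilde_t
            beta_tilde = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas_trunc[t]
            x = mu + np.sqrt(beta_tilde) * rng.standard_normal(x.shape)
        else:
            x = mu  # final step is deterministic
    return x

rng = np.random.default_rng(5)
betas_trunc = np.linspace(1e-4, 0.02, 50)   # T' = 50 truncated steps
x_Tp = rng.standard_normal(64)              # stand-in for G_psi(z)
x0 = truncated_reverse(x_Tp, betas_trunc, lambda x, t: np.zeros_like(x), rng)
```

With a trained $\epsilon_\theta$ in place of the zero predictor, the same loop yields samples from the truncated model while executing only $T'$ network evaluations.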

For trajectory models, the decoder predicts both confidence scores and trajectory reconstructions, selecting the highest confidence output (Liao et al., 2024).

4. Comparative Performance and Computational Gains

Empirical results consistently demonstrate that truncated diffusion achieves similar or superior generative quality to full-chain diffusion, with substantial acceleration in inference:

  • On CIFAR-10, TDPM with $T' = 99$ matches or improves on full-chain DDPM FID (e.g., FID 2.88 at $T' = 99$ vs. a baseline of 3.21) while reducing sampling steps by $10\times$ (Zheng et al., 2022).
  • On LSUN $256^2$ with the ADM backbone, TDPM with $T' = 99$ nearly matches the baseline FID at a $10\times$ speedup.
  • DiffusionDrive for planning achieves 88.1 PDMS at 45 FPS (NVIDIA RTX 4090 GPU), exceeding strong baselines with $400\times$ fewer anchors and only 2–3 denoising steps (Liao et al., 2024).
  • Flow-based truncation in MRSI improves PSNR/SSIM and delivers a $9\times$ sampling acceleration: 1.33 s/slice vs. 12.4 s/slice for the baseline DDPM (Dong et al., 2024).

These results validate that properly learning or anchoring the truncated prior allows order-of-magnitude reductions in sampling and reverse steps, with minor or no impairment to sample diversity and fidelity—a key advantage in latency-critical applications.

5. Domain-Specific Innovations and Extensions

End-to-End Autonomous Driving

DiffusionDrive integrates multi-mode anchor priors, joint conditional scene features, and cascade decoders to generate robust, high-diversity trajectory candidates in real-time (Liao et al., 2024). The method is further extended in DiffusionDriveV2, where reinforcement learning constraints (intra- and inter-anchor group-relative policy optimization, or GRPO) are used to constrain quality and avoid mode collapse, while scale-adaptive multiplicative noise retains trajectory smoothness and multimodality (Zou et al., 8 Dec 2025).

Medical Imaging

Flow-based truncated denoising allows for efficient, high-fidelity multi-scale super-resolution of MRSI, with uncertainty estimation, radiologist-rated improvements, and flexible sharpness controls (Dong et al., 2024).

General-Purpose Generation

The truncated KL expansion provides a principled alternative for the forward process, reducing the temporal noise complexity from $T$ i.i.d. draws to $M \ll T$ basis coefficients while remaining compatible with existing sampler and network architectures. This enhances parallelization and convergence speed, with significant FID gains on MNIST, CelebA, and CIFAR-10 (Ren et al., 22 Mar 2025).

6. Implementation Considerations

Key practical aspects include:

  • Choice of truncation step $T'$ or mode number $M$ (in KL approaches): moderate values (e.g., $T' = 49$ or $M = 8$–$10$) usually suffice for high-quality outputs (Zheng et al., 2022, Ren et al., 22 Mar 2025).
  • Approximating (and learning) the distribution of $x_{T'}$ via an adversarial prior, a normalizing flow, or a Gaussian mixture anchored on domain priors.
  • For trajectory generation, K-means clustering for mode anchoring and a cascade decoder for iterative refinement.
  • For parallelized KL approaches, all $M$ basis coefficients can be predicted in a single batched forward pass.

7. Summary Table: Truncated Diffusion Model Variants

| Methodology | Truncated Model Type | Application Domain |
| --- | --- | --- |
| Adversarial TDPM (Zheng et al., 2022) | Implicit prior + MSE | Image/Text-to-Image Generation |
| Anchor + Cascade (Liao et al., 2024) | Anchored prior, truncated reverse chain | Autonomous Driving |
| Flow-based FTDDM (Dong et al., 2024) | Flow prior, truncated U-Net | Medical Imaging (MRSI) |
| KL Expansion (Ren et al., 22 Mar 2025) | Truncated basis expansion | General Image Generation |

All implementations demonstrate that carefully designed truncated diffusion schedulers—via learnable or anchored priors, architectural adaptation, or efficient forward process truncation—provide a favorable trade-off between sample quality, diversity, and efficiency compared to standard full-chain diffusion. This strategy enables strong results in computationally demanding or latency-sensitive generative tasks across disciplines.
