DiffusionDrive: Truncated Diffusion Model
- DiffusionDrive is a truncated diffusion model that halts the forward process early to reduce computation while maintaining high generative fidelity.
- It integrates techniques like adversarial regularization, trajectory anchoring, and KL expansion to optimize reverse generation across various domains.
- Empirical results show significant speedups and competitive sample quality in tasks such as autonomous driving, medical imaging, and image generation.
A truncated diffusion model, often termed "DiffusionDrive" in the literature, refers to a class of generative models in which the standard forward diffusion process is halted after a small number of steps and the reverse generative process is run starting from this truncated state, rather than from a maximally random (pure noise) state. This paradigm, developed across multiple domains including probabilistic modeling, trajectory generation for autonomous driving, and medical image processing, retains generative fidelity while reducing computation and inference time. The concept unifies methods such as Truncated Diffusion Probabilistic Models (TDPM), anchor-based trajectory diffusion, truncated Karhunen-Loève expansions, and normalizing flow-based truncated reverse diffusion chains (Zheng et al., 2022, Liao et al., 2024, Ren et al., 22 Mar 2025, Dong et al., 2024).
1. Mathematical Foundations of Truncated Diffusion
Standard diffusion probabilistic models generate data by running a forward process that iteratively corrupts data $x_0$ with additive Gaussian noise over $T$ timesteps, $q(x_t \mid x_{t-1}) = \mathcal{N}\big(\sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big)$, resulting in a terminal distribution approximately $\mathcal{N}(0, I)$ for large $T$. The reverse (generative) process, parameterized by neural networks, denoises from $x_T$ back to $x_0$ in $T$ steps (Zheng et al., 2022).
In truncated diffusion, the forward process is stopped at a step $T_{\text{trunc}} \ll T$. Instead of diffusing to pure noise, the forward chain's marginal at $T_{\text{trunc}}$ steps, $q(x_{T_{\text{trunc}}} \mid x_0)$, becomes the starting distribution. The generative process runs only $T_{\text{trunc}}$ reverse steps, starting from $x_{T_{\text{trunc}}} \sim p_\psi(x_{T_{\text{trunc}}})$, where $p_\psi$ is a learnable/implicit distribution (often parameterized by a generator). The loss combines the standard denoising MSE for $t \le T_{\text{trunc}}$ plus a divergence penalty matching $p_\psi(x_{T_{\text{trunc}}})$ to $q(x_{T_{\text{trunc}}} \mid x_0)$ (Zheng et al., 2022).
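As a concrete illustration, the truncated starting state can be sampled in closed form from the usual DDPM marginal. The following is a minimal NumPy sketch; the function name and schedule are illustrative, not taken from the cited papers:

```python
import numpy as np

def truncated_forward_marginal(x0, betas, t_trunc, rng=None):
    """Sample x_{t_trunc} ~ q(x_{t_trunc} | x_0) in closed form.

    Uses the standard DDPM identity: with alpha_bar_t = prod_{s<=t}(1 - beta_s),
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    """
    rng = rng or np.random.default_rng(0)
    alpha_bar = float(np.prod(1.0 - betas[:t_trunc]))
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, alpha_bar

# With an early truncation, alpha_bar stays close to 1, so x_{t_trunc}
# retains most of the data signal that the learned prior must match.
betas = np.linspace(1e-4, 0.02, 1000)        # standard linear schedule
x_trunc, ab = truncated_forward_marginal(np.ones(4), betas, t_trunc=100)
```

The key point is that the starting distribution is still data-dependent (it is far from pure noise), which is precisely why it must be learned or anchored rather than fixed to $\mathcal{N}(0, I)$.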
2. Architectural Instantiations and Variants
Adversarially Regularized Truncation
The TDPM framework interprets the fixed forward diffusion encoder and reverse decoder as an adversarial autoencoder. An implicit generator $G_\psi$ (with latent $z \sim \mathcal{N}(0, I)$) produces samples at the truncated time, and a discriminator ensures that the generator's output distribution aligns with the diffused marginal $q(x_{T_{\text{trunc}}} \mid x_0)$ (Zheng et al., 2022).
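Conceptually, this matching can be sketched with standard non-saturating GAN losses; the sketch below is a simplification rather than the exact TDPM objective, and all names in it are illustrative:

```python
import numpy as np

def adversarial_prior_losses(d, g, x0_batch, alpha_bar_trunc, rng):
    """Non-saturating GAN losses that push the implicit prior g(z) toward
    the diffused marginal q(x_trunc | x0). Conceptual sketch only, not the
    exact TDPM objective; d returns real-valued logits.
    """
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    # "Real" samples: the true truncated forward marginal.
    eps = rng.standard_normal(x0_batch.shape)
    x_real = np.sqrt(alpha_bar_trunc) * x0_batch + np.sqrt(1 - alpha_bar_trunc) * eps
    # "Fake" samples: the implicit generator's proposal at the truncated step.
    x_fake = g(rng.standard_normal(x0_batch.shape))
    d_loss = -np.mean(np.log(sigmoid(d(x_real)) + 1e-8)
                      + np.log(1.0 - sigmoid(d(x_fake)) + 1e-8))
    g_loss = -np.mean(np.log(sigmoid(d(x_fake)) + 1e-8))
    return d_loss, g_loss
```

In the full model this adversarial term is added to the standard denoising MSE over the truncated steps.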
Trajectory Anchoring and Truncated Schedules in Driving
In DiffusionDrive for autonomous driving, the action space is partitioned using K-means anchors computed from trajectory data. Noising starts from each anchor to produce a set of truncated noisy trajectories, and a small number of truncated reverse steps denoise these to generate diverse, scene-conditioned trajectories (Liao et al., 2024, Zou et al., 8 Dec 2025). A cascade diffusion decoder with cross-attention and feedforward modules processes the noisy trajectories in steps:
- Compute spatial/agent cross-attentions.
- Predict trajectory offsets and confidence scores.
- Apply a DDIM-style update to the trajectory estimate. Stacked layers refine the trajectories across successive denoising steps.
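The per-layer DDIM-style update reduces to the standard deterministic DDIM formula given a noise prediction. A minimal sketch (names illustrative; eta = 0):

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM-style update (eta = 0): recover the clean
    estimate x0_hat from the predicted noise, then re-noise it to the
    previous (lower) noise level alpha_bar_prev.
    """
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_hat + np.sqrt(1.0 - alpha_bar_prev) * eps_pred
```

With only 2–3 such steps per trajectory, the cascade stays fast enough for real-time planning.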
Truncated KL Expansion of the Forward Process
A distinct methodology replaces the Brownian-driven forward SDE in diffusion with a truncated Karhunen-Loève (KL) expansion of the driving noise, $W_t \approx \sum_{k=1}^{K} \xi_k\, \phi_k(t)$ with i.i.d. coefficients $\xi_k \sim \mathcal{N}(0, 1)$ and orthonormal basis functions $\phi_k$, yielding an ODE driven by $K$ mode coefficients rather than per-step i.i.d. Gaussian noise. Training under this forward dynamics accelerates convergence, improves FID, and enables highly parallelized computation (Ren et al., 22 Mar 2025). The DDIM sampler and U-Net remain unchanged, with only the loss reparameterization and noise reconstruction adapted for the basis coefficients.
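The KL construction itself is classical: Brownian motion on $[0, 1]$ admits the expansion with $\phi_k(t) = \sqrt{2}\,\sin((k - 1/2)\pi t) / ((k - 1/2)\pi)$. A NumPy sketch of the truncated expansion follows (illustrative, not the paper's exact parameterization):

```python
import numpy as np

def kl_brownian(ts, K, rng):
    """Truncated Karhunen-Loeve expansion of Brownian motion on [0, 1]:
    W_t ~= sum_{k=1..K} xi_k * sqrt(2) * sin((k - 1/2)*pi*t) / ((k - 1/2)*pi),
    with i.i.d. xi_k ~ N(0, 1). All K mode coefficients are drawn at once,
    so the whole noise path is a function of one small batch of Gaussians.
    """
    xi = rng.standard_normal(K)
    k = np.arange(1, K + 1) - 0.5
    basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(ts, k)) / (np.pi * k)
    return basis @ xi
```

Because every time point depends on the same $K$ coefficients, noise for all timesteps can be materialized in one batched operation, which is the source of the parallelization benefit.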
Flow-based Truncated Denoising
In flow-based truncation for medical super-resolution, the prior over the truncated state $x_{T_{\text{trunc}}}$ is learned by an invertible normalizing flow $f_\phi$, mapping latent variables $z$ to the truncated forward state. The generative process first samples $x_{T_{\text{trunc}}} = f_\phi(z)$ via the flow and then runs $T_{\text{trunc}}$ reverse steps with the score-based network (Dong et al., 2024).
3. Algorithmic Workflow
The canonical truncated diffusion sampling procedure is as follows (Zheng et al., 2022, Liao et al., 2024):
- Sample a latent $z \sim \mathcal{N}(0, I)$ (or select a trajectory anchor in trajectory models).
- Obtain $x_{T_{\text{trunc}}} = G_\psi(z)$ from the learned prior, or initialize around the prior anchor.
- For $t = T_{\text{trunc}}, \dots, 1$:
- Predict the noise estimate $\epsilon_\theta(x_t, t)$.
- Compute the posterior mean $\mu_\theta(x_t, t)$.
- Draw $x_{t-1} \sim \mathcal{N}\big(\mu_\theta(x_t, t), \sigma_t^2 I\big)$.
- Return $x_0$ (or the denoised trajectory).
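The steps above can be sketched as a generic truncated ancestral sampler; `generator` and `eps_model` below are stand-ins for the learned networks, not APIs from the cited works:

```python
import numpy as np

def truncated_sample(generator, eps_model, betas, t_trunc, shape, rng):
    """Generic truncated ancestral sampler: draw the starting state from a
    learned prior instead of pure noise, then run only t_trunc reverse steps.
    """
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = generator(rng.standard_normal(shape))        # x_{t_trunc} from the prior
    for t in range(t_trunc - 1, -1, -1):
        eps = eps_model(x, t)
        # Posterior mean mu_theta(x_t, t) of the reverse transition.
        mu = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mu + np.sqrt(betas[t]) * noise
    return x
```

Note that the loop cost scales with $T_{\text{trunc}}$, not $T$, which is where the inference speedup comes from.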
For trajectory models, the decoder predicts both confidence scores and trajectory reconstructions, selecting the highest confidence output (Liao et al., 2024).
4. Comparative Performance and Computational Gains
Empirical results consistently demonstrate that truncated diffusion achieves similar or superior generative quality to full-chain diffusion, with substantial acceleration in inference:
- On CIFAR-10, TDPM with a small truncation step matches or improves on the full-DDPM FID (baseline $3.21$) while substantially reducing the number of sampling steps (Zheng et al., 2022).
- On LSUN benchmarks with the ADM backbone, truncated TDPM nearly matches the baseline FID at a large sampling speedup.
- DiffusionDrive for planning achieves $88.1$ PDMS at $45$ FPS (4090 GPU), exceeding strong baselines with fewer anchors and only $2$–$3$ denoising steps (Liao et al., 2024).
- Flow-based truncation in MRSI improves PSNR/SSIM and achieves a substantial per-slice sampling acceleration over the baseline DDPM (Dong et al., 2024).
These results validate that properly learning or anchoring the truncated prior allows order-of-magnitude reductions in sampling and reverse steps, with minor or no impairment to sample diversity and fidelity—a key advantage in latency-critical applications.
5. Domain-Specific Innovations and Extensions
End-to-End Autonomous Driving
DiffusionDrive integrates multi-mode anchor priors, joint conditional scene features, and cascade decoders to generate robust, high-diversity trajectory candidates in real-time (Liao et al., 2024). The method is further extended in DiffusionDriveV2, where reinforcement learning constraints (intra- and inter-anchor group-relative policy optimization, or GRPO) are used to constrain quality and avoid mode collapse, while scale-adaptive multiplicative noise retains trajectory smoothness and multimodality (Zou et al., 8 Dec 2025).
Medical Imaging
Flow-based truncated denoising allows for efficient, high-fidelity multi-scale super-resolution of MRSI, with uncertainty estimation, radiologist-rated improvements, and flexible sharpness controls (Dong et al., 2024).
General-Purpose Generation
The truncated KL expansion provides a principled forward-process alternative, reducing the temporal noise complexity from one independent Gaussian draw per timestep to a small set of basis coefficients shared across the trajectory, while remaining compatible with existing sampler and network architectures. This enhances parallelization and convergence speed, with significant FID gains on MNIST, CelebA, and CIFAR10 (Ren et al., 22 Mar 2025).
6. Implementation Considerations
Key practical aspects include:
- Choice of truncation step $T_{\text{trunc}}$ (or mode number $K$ in KL approaches): moderate values, on the order of ten or fewer, usually suffice for high-quality outputs (Zheng et al., 2022, Ren et al., 22 Mar 2025).
- Approximating (and learning) the distribution of the truncated state $x_{T_{\text{trunc}}}$ via an adversarial prior, a normalizing flow, or a Gaussian mixture anchored on domain priors.
- For trajectory generation, clustering for mode anchoring and cascading for decoder refinement.
- For parallelized KL approaches, all basis coefficients can be predicted in a batched forward pass.
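One practical heuristic for choosing the truncation step, sketched below, is to keep the marginal's signal retention (the cumulative alpha-bar) above a threshold. Both the function and the threshold value are hypothetical illustrations, not procedures from the cited papers:

```python
import numpy as np

def pick_truncation_step(betas, min_signal=0.7):
    """Return the largest step t such that the forward marginal still keeps
    alpha_bar_t >= min_signal of the data variance. `min_signal` is a
    hypothetical tuning knob.
    """
    alpha_bars = np.cumprod(1.0 - betas)
    valid = np.flatnonzero(alpha_bars >= min_signal)
    return int(valid[-1]) + 1 if valid.size else 0
```

Higher thresholds yield shorter reverse chains but place more of the modeling burden on the learned prior.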
7. Summary Table: Truncated Diffusion Model Variants
| Methodology | Truncated Model Type | Application Domain |
|---|---|---|
| Adversarial TDPM (Zheng et al., 2022) | Implicit prior + MSE | Image/Text-to-Image Gen. |
| Anchor + Cascade (Liao et al., 2024) | Anchored prior, truncated schedule | Autonomous Driving |
| Flow-based FTDDM (Dong et al., 2024) | Flow prior, truncated UNet | Medical Imaging (MRSI) |
| KL Expansion (Ren et al., 22 Mar 2025) | Truncated basis expansion | General Image Generation |
All implementations demonstrate that carefully designed truncated diffusion schedulers—via learnable or anchored priors, architectural adaptation, or efficient forward process truncation—provide a favorable trade-off between sample quality, diversity, and efficiency compared to standard full-chain diffusion. This strategy enables strong results in computationally demanding or latency-sensitive generative tasks across disciplines.