Time-Annealed Perturbation Sampling (TAPS)
- TAPS is a framework that introduces time-varying, annealed perturbations into diffusion models to enhance sample diversity and maintain output fidelity.
- It adapts controlled noise injection across applications—such as diffusion language models and inverse problems—to overcome mode collapse and improve posterior inference.
- Empirical results show that TAPS outperforms traditional methods in both diversity metrics and quality measures while offering theoretical convergence guarantees.
Time-Annealed Perturbation Sampling (TAPS) refers to a family of algorithms for training-free, inference-time diversification or conditional sampling in generative diffusion models. The core principle is to introduce controlled, explicitly-annealed perturbations into the generative process, enabling increased sample diversity or efficient posterior inference while ensuring convergence to coherent, high-quality outputs. TAPS has been developed across different modalities and use cases, with notable implementations in diffusion LLMs (Diffusion-LMs) (Wu et al., 30 Jan 2026), posterior sampling for linear inverse problems (Xun et al., 30 Oct 2025), and equilibrium sampling of structured densities via progressive inference-time annealing (Akhound-Sadegh et al., 19 Jun 2025).
1. Motivation and Foundations
Diffusion-based generative models, whether applied to language or high-dimensional structured data, operate by iteratively denoising starting from noise toward a signal aligned with a conditioning prompt or distribution. Empirical studies in text diffusion models have revealed a temporal division of labor: early denoising steps establish global semantics, while later refinement stages polish lexical or local attributes. Standard decoding strategies often result in mode collapse, with repeated generations converging on similar global trajectories due to fixed conditioning throughout the denoising chain (Wu et al., 30 Jan 2026).
In conditional sampling scenarios such as inverse problems or energy-based models, direct posterior sampling is computationally intractable, and diffusion models in their standard forms are insufficiently flexible to handle hard constraints or multi-modal distributions, absent explicit annealing or perturbation schedules (Akhound-Sadegh et al., 19 Jun 2025, Xun et al., 30 Oct 2025).
TAPS deploys time-varying, typically monotonic perturbations to the conditioning or the underlying energy/measurement parameters. This strategy leverages two key insights:
- Early, strong perturbations drive the generative process along divergent global (semantic or geometric) paths, thus increasing sample diversity or mixing across modes of the target density.
- Gradually annealing the perturbation to zero (or low values) at later stages realigns the process to the true conditioning, ensuring fidelity, fluency, and adherence to instructions or data constraints.
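The annealed-perturbation principle above can be sketched as a simple schedule function. This is a minimal illustration, assuming a cosine decay over an injection window; the parameter names (`sigma_max`, `t_start`, `t_end`) and values are illustrative, not taken from the cited papers.

```python
import math

def sigma(t, sigma_max=1.0, t_start=0.0, t_end=0.5):
    """Cosine-annealed perturbation scale.

    t runs from 0 (start of denoising) toward 1 (final output).
    Returns sigma_max at t_start, decays to 0 at t_end, and is 0
    outside the injection window [t_start, t_end].
    """
    if not (t_start <= t <= t_end):
        return 0.0  # outside the injection window: no perturbation
    phase = (t - t_start) / (t_end - t_start)
    return sigma_max * 0.5 * (1.0 + math.cos(math.pi * phase))
```

The schedule is strong early (driving divergent global trajectories) and vanishes late (re-anchoring the process to the true conditioning), matching the two insights above.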
Prior approaches to increasing diversity or posterior coverage, such as temperature scaling, top-k/p sampling, or truncation, inject randomness at each output step (token, pixel) but rapidly degrade fluency and coherence under aggressive settings. Prompt-based diversification is inconsistent and can induce semantic drift. Diffusion-specific entropy-driven techniques require careful tuning and may fail to avoid low-diversity or degenerate outputs (Wu et al., 30 Jan 2026).
2. Algorithmic Formulations
TAPS implementations across modalities share the general principle of temporally scheduled perturbation but differ in the source and application of noise:
2.1 Diffusion LLMs
In Diffusion-LMs, TAPS operates in the embedding space of the prompt:
- Let c denote the prompt embeddings.
- For diffusion time t, a perturbation schedule σ(t) is defined, typically as a monotonic decay (e.g., cosine annealing), spanning an injection window [t_start, t_end].
- The perturbed embedding at step t: c̃_t = c + σ(t)·ε with ε ~ N(0, I) if t ∈ [t_start, t_end], otherwise c̃_t = c.
- Three quality-preservation layers are sequentially applied: (1) batch mean/variance rescaling to match the original embedding statistics, (2) convex λ-mix interpolation between the noisy and clean embeddings, and (3) per-token norm projection.
Pseudo-algorithmic steps:
- Initialize token sequence with prompt and masks; compute the prompt embeddings c.
- At each diffusion step, sample noise and perturb if within the scheduled window; apply preservation layers.
- Input perturbed embedding into the LM denoiser at every step or block, producing new token completions.
- Output the final decoded sequence.
The method is instantiated for both non-autoregressive (LLaDA-8B, 256 steps) and semi-autoregressive (TraDo-8B, 4 blocks) diffusion text models (Wu et al., 30 Jan 2026).
2.2 Posterior Sampling for Inverse Problems
Here, TAPS comprises a chain of short unadjusted Langevin dynamics (ULA) samplers, each targeting a smoothed, annealed posterior:
- Linear inverse problem: y = Ax + n, with Gaussian noise n ~ N(0, σ²I) and a prior p(x).
- The measurement-noise parameter is decreased according to a schedule σ_0 > σ_1 > ⋯ > σ_K.
- At each noise level σ_k, auxiliary measurements y_k are constructed, and the conditional posterior p(x | y_k) is the target at that level.
- Posterior score: ∇_x log p(x | y_k) = ∇_x log p(x) + ∇_x log p(y_k | x).
- At each level, a short ULA chain is run: x_{j+1} = x_j + η ∇_x log p(x_j | y_k) + √(2η) ξ_j, with ξ_j ~ N(0, I).
- The chain at the highest noise level is initialized on unconditional samples from the prior, drawn using a diffusion model.
The annealing ensures that each chain is warm-started near its target, requiring only local mixing and an L² bound on score estimation errors to ensure correct marginal coverage (Xun et al., 30 Oct 2025).
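The annealed-ULA chain can be illustrated on a toy 1-D problem. This is a minimal sketch, assuming y = x + n with a standard-normal prior so the posterior score is available in closed form; the schedule, step size, and function name are illustrative, not values from the paper.

```python
import math
import random

def taps_posterior_sample(y, sigmas=(2.0, 1.0, 0.5), n_steps=200,
                          eta=0.01, seed=0):
    """Annealed ULA for the toy 1-D inverse problem y = x + n.

    Prior p(x) = N(0, 1); each stage targets the smoothed posterior at
    measurement-noise level sigma_k and warm-starts the next stage.
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # unconditional prior sample as initialization
    for sig in sigmas:       # anneal the noise level downward
        for _ in range(n_steps):
            # posterior score = prior score + likelihood score at level sig
            score = -x + (y - x) / sig ** 2
            x += eta * score + math.sqrt(2 * eta) * rng.gauss(0.0, 1.0)
    return x
```

For y = 2 and final level σ = 0.5, the exact posterior is Gaussian with mean y/(1 + σ²) = 1.6, so samples averaged over many independent chains should concentrate there — each short chain only needs to mix locally around its warm start.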
2.3 Progressive Inference-Time Annealing for Boltzmann Densities
PITA (progressive inference-time annealing), classified here as a TAPS algorithm, utilizes a sequence of diffusion models trained over a temperature ladder T_0 > T_1 > ⋯ > T_M. At each stage:
- A diffusion model is trained on samples at the current temperature T_m.
- SMC/Feynman–Kac inference-time annealing is run to transition samples from T_m to T_{m+1}.
- Stochastic differential equations governing particle propagation, importance weights, and normalization are derived from the associated Feynman–Kac PDE.
Particles are resampled according to their normalized weights to accurately approximate each successive lower-temperature distribution (Akhound-Sadegh et al., 19 Jun 2025).
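The reweight-and-resample step can be sketched in isolation. This is a minimal illustration of one annealing transition, assuming Boltzmann densities p_T(x) ∝ exp(−E(x)/T) so the importance weight between temperatures is available in closed form; it omits the SDE propagation and is not the full PITA machinery.

```python
import math
import random

def smc_reanneal(particles, energy, T_hi, T_lo, seed=0):
    """One SMC annealing step: reweight particles drawn at temperature
    T_hi toward T_lo, then multinomially resample by normalized weights.

    energy : Boltzmann energy function E(x), with p_T(x) ~ exp(-E(x)/T)
    """
    rng = random.Random(seed)
    # log importance weight: log p_{T_lo}(x) - log p_{T_hi}(x)
    #                      = -E(x) * (1/T_lo - 1/T_hi)  (up to a constant)
    log_w = [-energy(x) * (1.0 / T_lo - 1.0 / T_hi) for x in particles]
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    probs = [wi / z for wi in w]
    # multinomial resampling according to normalized weights
    return rng.choices(particles, weights=probs, k=len(particles))
```

With E(x) = x², lowering the temperature concentrates the resampled population near the energy minimum at x = 0, which is exactly the role resampling plays in approximating each successive lower-temperature distribution.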
3. Theoretical Properties and Guarantees
The core theoretical contribution of TAPS in the context of posterior sampling is that it enables polynomial-time approximate sampling from the target posterior under only an L² bound on score estimation error, provided the prior (or the smoothed target densities in PITA) is (locally) strongly log-concave and has a Lipschitz score. The iterative annealing and re-anchoring at each level limit divergence from the manifold of valid solutions, in contrast with direct Langevin or reverse SDE sampling, which can fail under estimation error or drift "off-manifold" (Xun et al., 30 Oct 2025).
In Diffusion-LMs, the method exploits the "branch–refine" trade-off: early strong noise creates global semantic diversity across samples ("branching"), while the annealing schedule ensures convergence to instruction-adherent and fluent outputs. No formal convergence proofs are supplied in the text domain, but empirical ablations demonstrate that late-stage perturbation harms both diversity and quality, whereas early-annealed perturbation maximizes high-level variance with minimal quality loss (Wu et al., 30 Jan 2026).
4. Empirical Results and Benchmarks
Diffusion LLMs
Experimental evaluation of TAPS for generation diversity and quality utilizes several benchmarks:
- NoveltyBench (Curated & WildChat): measures multi-sample diversity under open-ended instructions.
- WritingPrompts & Arena-Hard-Auto: tests long-form story generation evaluated for lexical/semantic diversity, using GPT-4o or Skywork-Reward for scoring.
- GSM8K: assesses multi-step mathematical reasoning, measuring Pass@1 and majority-vote accuracy.
Key findings for TAPS:
- Outperforms baselines (top-k/p, EDT, prompt augmentation) by 5–15 points in diversity (IntraDistinct, Div-BLEU, Sent-BERT, EAD).
- Matches or exceeds generation quality on fluency, relevance, and writing quality, with increased creativity.
- On GSM8K, yields the highest majority-vote accuracy (+5–8 points), illustrating improved error diversity despite a slight decrease in Pass@1 (Wu et al., 30 Jan 2026).
Posterior and Boltzmann Sampling
- Posterior Sampling (Xun et al., 30 Oct 2025):
- TAPS provably converges to the true posterior in total variation in polynomial time.
- The method avoids severe slow-mixing bottlenecks and can handle both globally and locally log-concave priors.
- PITA (Akhound-Sadegh et al., 19 Jun 2025):
- Enables equilibrium sampling of molecular systems (e.g., Alanine dipeptide/tripeptides) with significantly improved energy and interatomic distance metrics compared to baseline MCMC and MD-diffusion schemes.
- Demonstrates that progressive annealing captures both major basins and interconversion transition dynamics in realistic physical systems.
5. Practical Recommendations and Limitations
Empirical studies yield concrete recommendations:
- For Diffusion-LMs: the key hyperparameters are the noise scale σ_max, the injection window [t_start, t_end], and the mix coefficient λ (specific recommended values are reported in the source); both cosine and linear decay schedules are effective.
- Tuning of temperature ladders and block lengths in structured densities must be adapted to model-specific constraints.
- SMC resampling is central in physical density estimation to sustain effective sample size and avoid bias collapse (Akhound-Sadegh et al., 19 Jun 2025).
Principal limitations noted:
- Uniform embedding-level noise disregards token saliency, motivating future token-wise or saliency-guided perturbation schemes.
- High-temperature decoders may degrade quality unless coupled with quality-aware adaptive annealing.
- In PITA, the need for energy-based models and temperature schedule selection increases storage and tuning complexity.
- TAPS' guaranteed mixing relies on log-concavity, and theoretical bounds in non-log-concave or multimodal regimes are left open (Xun et al., 30 Oct 2025, Akhound-Sadegh et al., 19 Jun 2025, Wu et al., 30 Jan 2026).
6. Future Directions
Proposed research extensions include:
- Adaptive or reward-driven annealing schedules responsive to sample quality metrics or downstream reward signals.
- Token- or saliency-guided perturbation injection for more targeted diversity increases in LLMs.
- Integration of TAPS with classifier-free guidance or advanced diffusion controls.
- Scaling analysis and extension to multimodal, non-log-concave, or very high-dimensional generative diffusion backbones.
- Joint optimization of conditional or multi-temperature score and energy models in structured regimes (Wu et al., 30 Jan 2026, Akhound-Sadegh et al., 19 Jun 2025).
TAPS strategies establish a general and theoretically grounded paradigm for enhancing diversity or sampling efficiency in diffusion generative models, with applications spanning creative text generation, conditional inference, and physical density simulations.