
Rectified Gradient Guidance (REG)

Updated 23 January 2026
  • The paper introduces REG, a method that replaces the flawed scaled marginal guidance with a mathematically valid scaled joint objective to reduce guidance error.
  • REG employs a first-order approximation using local surrogate and analytic Jacobian corrections to improve fidelity across diverse conditional diffusion tasks.
  • Empirical results across synthetic and large-scale datasets show REG consistently reduces FID and boosts Inception Scores, demonstrating practical gains over traditional methods.

Rectified Gradient Guidance (REG) is a method for improving conditional generation in diffusion models, motivated by a rigorous re-examination of the statistical foundations underlying gradient-based guidance. REG replaces the empirically successful but theoretically flawed practice of “scaled marginal” guidance with an approximation to the mathematically valid “scaled joint” objective. This results in a lightweight, model-agnostic correction that reduces guidance error and improves conditional sample fidelity across diverse architectures and conditional tasks (Gao et al., 31 Jan 2025).

1. Theoretical Foundation and Motivation

Traditional diffusion model guidance methods (e.g., classifier guidance, classifier-free guidance) typically operate by modifying the noise prediction $\epsilon_{\theta,t}$ at each reverse-chain step using a reward term $R_t(x_t, y)$. The motivation is to encourage samples from a tilted conditional distribution,

$$\bar p_\theta(x_0\mid y) \propto p_\theta(x_0\mid y)\, R_0(x_0, y).$$

This is operationalized by the update

$$\bar\epsilon_{\theta,t} = \epsilon_{\theta,t} - \sqrt{1-\bar\alpha_t}\,\nabla_{x_t}\log R_t(x_t, y).$$
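As a concrete sketch of this update in NumPy, using a toy quadratic log-reward $\log R_t(x_t) = -\|x_t - c\|^2/2$ (an illustrative stand-in of ours, not from the paper, chosen so the gradient is simply $c - x_t$):

```python
import numpy as np

# Toy scaled-marginal guidance step. The quadratic log-reward
# log R_t(x_t) = -||x_t - c||^2 / 2 has gradient (c - x_t) and stands in
# for a real classifier or CFG guidance signal.
abar_t = 0.5                        # cumulative alpha-bar at step t
x_t = np.array([0.3, -0.7])         # current noisy sample
eps = np.array([0.1, 0.2])          # noise prediction eps_theta(x_t)
c = np.array([1.0, 1.0])            # reward target

grad_log_R = c - x_t                # gradient of log R_t at x_t
eps_bar = eps - np.sqrt(1 - abar_t) * grad_log_R   # guided prediction
```

The guided `eps_bar` then replaces `eps` in the ordinary reverse-step formula.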

However, this “scaled marginal” guidance is mathematically inconsistent with the Markov structure of diffusion probabilistic models (DDPMs). Imposing an arbitrary $R_t$ at each step induces dependence between reverse transitions, making such marginal tilting invalid except in trivial cases (Gao et al., 31 Jan 2025, Appendix A.2).

The valid approach is to reweight the joint chain,

$$\bar p_\theta(x_{0:T}\mid y) \propto p_\theta(x_{0:T}\mid y)\, R_0(x_0, y),$$

where the reward modifies the terminal state only. This induces unique Markovian reverse transitions,

$$\bar p_\theta(x_{t-1}\mid x_t, y) = \frac{E_{t-1}(x_{t-1}, y)}{E_t(x_t, y)}\, p_\theta(x_{t-1}\mid x_t, y),$$

with $E_t(x_t, y) = \int p_\theta(x_0 \mid x_t, y)\, R_0(x_0, y)\, dx_0$ and optimal prediction update

$$\epsilon^{\star}_{\theta,t}(x_t, y) = \epsilon_{\theta,t}(x_t, y) - \sqrt{1-\bar\alpha_t}\,\nabla_{x_t}\log E_t(x_t, y).$$

The intractability of $E_t(x_t, y)$, which requires “future foresight”, necessitates approximation. Existing practices can be seen as zeroth-order, foresight-free approximations that substitute $R_t$ for $E_t$, with error bounds established under mild Lipschitz assumptions (Gao et al., 31 Jan 2025, Theorems 4.2–4.3).
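To make the foresight requirement concrete, a small numerical sketch (a hypothetical 1D Gaussian posterior and Gaussian-shaped reward, chosen by us because $E_t$ then has a closed form) estimates $E_t(x_t, y) = \int p_\theta(x_0\mid x_t, y)\,R_0(x_0, y)\,dx_0$ by Monte Carlo over the model's own predictive distribution:

```python
import numpy as np

# Toy 1D setup: p_theta(x0 | x_t, y) = N(mu, s^2) and reward
# R_0(x0) = exp(-x0^2 / 2). A Gaussian integral gives the closed form
#   E_t = exp(-mu^2 / (2 * (1 + s^2))) / sqrt(1 + s^2).
mu, s = 0.8, 0.5
rng = np.random.default_rng(0)
x0 = rng.normal(mu, s, size=200_000)      # samples of the "future" x0
E_mc = np.exp(-x0 ** 2 / 2).mean()        # Monte Carlo estimate of E_t
E_exact = np.exp(-mu ** 2 / (2 * (1 + s ** 2))) / np.sqrt(1 + s ** 2)
```

Evaluating `E_mc` requires sampling the terminal $x_0$, which is exactly the foresight that is unavailable mid-chain; zeroth-order methods instead evaluate the reward at the current $x_t$.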

2. Derivation of REG: From Theory to Practical Update

Under deterministic samplers such as DDIM or Heun, $E_t(x_t, y)$ reduces to an evaluation of $R_0$ at the predicted $\hat x_0(x_t)$. Chain-rule expansion gives

$$\nabla_{x_t}\log E_t = \frac{\partial \hat x_0}{\partial x_t}\,\nabla_{x_0}\log R_0(\hat x_0, y).$$

Substituting into the optimal update gives

$$\epsilon^{\star}_{\theta,t} = \epsilon_{\theta,t} - \sqrt{1-\bar\alpha_t}\,\frac{\partial \hat x_0}{\partial x_t}\,\nabla_{x_0}\log R_0.$$

To operationalize this in all guidance regimes, three key approximations are adopted:

  • Local Surrogate: Replace $\nabla_{x_0}\log R_0(\hat x_0, y)$ with $\nabla_{x_t}\log R_t(x_t, y)$.
  • Analytic Jacobian: For DDPM, with

$$\hat x_0 = \frac{1}{\sqrt{\bar\alpha_t}}\left(x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_{\theta,t}\right),$$

the Jacobian is

$$\frac{\partial \hat x_0}{\partial x_t} = \frac{1}{\sqrt{\bar\alpha_t}}\left[I - \sqrt{1-\bar\alpha_t}\,\partial_{x_t}\epsilon_{\theta,t}\right].$$

  • Diagonal Dominance: Discard off-diagonal Jacobian terms, retaining only the elementwise scaling:

$$\operatorname{diag}\bigl(\partial_{x_t}\epsilon_{\theta,t}\bigr) = \partial_{x_t}\bigl(\mathbf{1}^{\top}\epsilon_{\theta,t}\bigr).$$

The resulting REG correction is

$$\boxed{\;\bar\epsilon^{\mathrm{REG}}_{\theta,t} = \epsilon_{\theta,t} - \sqrt{1-\bar\alpha_t}\,\nabla_{x_t}\log R_t(x_t, y)\;\odot\;\left[1 - \sqrt{1-\bar\alpha_t}\,\partial_{x_t}\bigl(\mathbf{1}^{\top}\epsilon_{\theta,t}\bigr)\right]\;}$$

where $\odot$ denotes the elementwise product.

This formulation generalizes the standard guidance update by weighting the gradient term by the diagonal deviation of the conditional score, providing a tractable, first-order approximation to the intractable optimal solution.
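A minimal NumPy sketch of the boxed update, under a hypothetical elementwise-linear prediction $\epsilon_{\theta,t}(x) = a \odot x + b$ of our choosing (so that $\partial_{x_t}(\mathbf{1}^{\top}\epsilon_{\theta,t})$ is exactly $a$ and the correction factor can be checked by hand):

```python
import numpy as np

abar_t = 0.36                            # cumulative alpha-bar at step t
a = np.array([0.5, 0.0, -0.3])           # elementwise slope of eps_theta
b = np.array([0.1, 0.2, 0.0])
x_t = np.array([1.0, -1.0, 0.5])
grad_log_R = np.array([0.4, -0.2, 0.1])  # toy guidance gradient

eps = a * x_t + b                        # eps_theta(x_t), linear toy model
# For this model d(sum eps)/dx == a, so the REG factor is analytic.
factor = 1 - np.sqrt(1 - abar_t) * a
eps_vanilla = eps - np.sqrt(1 - abar_t) * grad_log_R            # plain guidance
eps_reg = eps - np.sqrt(1 - abar_t) * grad_log_R * factor       # REG update
```

Where the local slope is zero (second coordinate), the factor is 1 and REG coincides with the vanilla update, illustrating that REG is a per-coordinate reweighting rather than a different guidance direction.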

3. Practical Deployment: Implementation and Integration

REG is compatible with all major conditional diffusion guidance methods and requires only minor modifications:

| Guidance method | $\nabla_{x_t}\log R_t(x_t, y)$ | Notes |
| --- | --- | --- |
| Classifier guidance | $w\,\nabla_{x_t}\log p_\phi(y\mid x_t)$ | $\phi$: auxiliary classifier |
| Classifier-free (CFG) | $\frac{w}{\sqrt{1-\bar\alpha_t}}\,(\epsilon_{\text{uncond}} - \epsilon_{\text{cond}})$ | Tune $w$; no further hyperparameters |
  • Hyperparameters: Guidance scale $w \in [1, 20]$, typically in the 4–8 range. No custom tuning of the schedule parameters $\alpha(t), \beta(t)$ is required.
  • Rectification overhead: The Jacobian–vector product $\partial_{x_t}\bigl(\sum \epsilon_{\theta,t}\bigr)$ is implemented with standard autograd in major frameworks, costing one extra backward pass per sampling step. Runtime is approximately $1.1\times$ that of uncorrected guidance.

Implementation snippet (PyTorch):

```python
x = x.detach().requires_grad_(True)   # sample must carry grad for the JVP
eps = model(x, t, y)                  # conditional noise prediction
# REG factor: 1 - sqrt(1 - abar_t) * d(sum eps)/dx  (diagonal Jacobian term)
corr = 1 - torch.sqrt(1 - abar[t]) * torch.autograd.grad(eps.sum(), x)[0]
```

Elementwise multiplication of `corr` against the guidance gradient is handled via broadcasting.
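As a quick numerical sanity check on the CFG row of the table above (our own check, not from the paper): substituting its reward gradient into the base update $\bar\epsilon = \epsilon_{\text{cond}} - \sqrt{1-\bar\alpha_t}\,\nabla_{x_t}\log R_t$ recovers the familiar CFG extrapolation $\epsilon_{\text{cond}} + w\,(\epsilon_{\text{cond}} - \epsilon_{\text{uncond}})$, since the $\sqrt{1-\bar\alpha_t}$ factors cancel:

```python
import numpy as np

abar_t, w = 0.7, 4.0
eps_c = np.array([0.2, -0.1])   # conditional noise prediction
eps_u = np.array([0.5, 0.3])    # unconditional noise prediction

# Table's reward gradient for CFG: (w / sqrt(1 - abar_t)) * (eps_u - eps_c)
grad_log_R = w / np.sqrt(1 - abar_t) * (eps_u - eps_c)
eps_bar = eps_c - np.sqrt(1 - abar_t) * grad_log_R   # base guided update
cfg = eps_c + w * (eps_c - eps_u)                    # standard CFG form
```

The two expressions agree for any `w` and `abar_t`, confirming that CFG is a special case of the reward-gradient template.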

4. Empirical Validation Across Tasks

REG has been evaluated extensively:

  • 1D Mixture task: REG’s guidance curve closely tracks the “oracle” (numerically integrated) $\nabla\log E_t$, outperforming vanilla CFG in guidance error.
  • 2D Shape task: Across time steps and classes, REG achieves the lower average per-pixel guidance error in the majority of cases; e.g., at $t = 20$ the win rates are 60.2% vs. 39.8% on Class 1 and 65.2% vs. 34.8% on Class 2.
  • ImageNet class-conditional (256×256, DiT-XL/2):
    • No guidance: FID 9.62, IS 121.5
    • Vanilla CFG: FID 2.21, IS 248.4
    • CFG + REG: FID 2.04, IS 276.3
    • Cosine CFG: FID 2.30, IS 300.7
    • Cosine+REG: FID 1.76, IS 287.5
    • Interval CFG: FID 1.95, IS 250.4
    • Interval+REG: FID 1.86, IS 259.6

In all cases, REG reduces FID by 0.1–0.6 and increases IS by up to ∼30. Pareto-front analysis demonstrates that REG induces a strict left-down shift of the (FID, IS) front (Gao et al., 31 Jan 2025, Figs. 4, 6).

  • COCO2017 Text-to-Image (SD-v1.4, SD-XL):
    • SD-v1.4 + vanilla CFG: FID 20.27, CLIP 30.68
    • + REG: FID 19.63, CLIP 30.75
    • ... (other configurations report similar monotonic improvements)

Qualitative samples show sharper detail and improved prompt adherence.

5. Comparative Assessment and Limitations

REG advances the theory and practice of diffusion guidance:

  • Establishes the invalidity of “scaled marginal” recipes in DDPMs.
  • Recovers a provably optimal (but intractable) “scaled joint” guidance strategy.
  • Demonstrates that all existing solutions (classifier guidance, CFG, AutoG, interval CFG) are zeroth-order, foresight-free approximations.
  • REG implements a diagonal-Jacobian correction, yielding a first-order, computationally feasible approximation.
  • Empirical gains span synthetic and large-scale datasets, reducing guidance error by 5–20%, FID by up to ∼0.6, and boosting IS by up to ∼30 points.

Limitations include:

  • Modest extra runtime ($\sim 1.1\times$), primarily from the additional gradient computation.
  • Full (non-diagonal) Jacobian correction remains intractable.
  • Open questions remain regarding the optimal handling of the initial noise prior $p(x_T)$ under joint scaling.

A plausible implication is that REG configurations may serve as a new “default” for diffusion guidance in both research and production.

6. Broader Implications and Future Directions

REG provides a mathematically coherent, easily deployed upgrade path for conditional guidance in DDPM and related architectures. Its theoretical foundation resolves longstanding discrepancies between practice and optimality, while its empirical benefits establish practical value across both synthetic control tasks and large-scale conditional generation.

Remaining open problems include the efficient computation of full Jacobian corrections and rigorous assessment of re-weighted initial priors. Extending REG to non-Markovian or alternative generative architectures may also yield further advances.

REG stands as a reference implementation for any Markov chain guidance regime seeking first-order optimality and tractable conditioning, independent of task, model family, or conditional modality (Gao et al., 31 Jan 2025).
