- The paper extends the classical Tweedie formula to elliptical noise, linking Stein scores with the path-derivative of energy scores.
- It introduces an algebraic framework that unifies denoising, score estimation, and robust generative modeling under varying noise geometries.
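- Experiments on the Eight Gaussians multimodal benchmark validate the identities: Monte Carlo score estimates are compared against analytic scores across noise levels for both Gaussian and generalized Gaussian noise, and show high directional accuracy.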
Energy-Tweedie: Extending Tweedie's Identity to Elliptical Noise and Energy Score Connections
Introduction
This paper generalizes the classical Tweedie formula, connecting score estimation and denoising under a broad class of noise distributions: elliptical distributions, treated here as energy models whose potential depends on the Mahalanobis norm. The main contribution is a set of identities that link the Stein score of the noisy marginal not only to the conditional expectation (as in the standard Tweedie formula) but also to the path-derivative of energy scoring rules, giving a general, algebraically grounded framework. The identities are compatible with strict properness of the scoring rule and with parameterization via the Mahalanobis distance, and they provide tools for estimation, calibration, and generative modeling under heavy-tailed or anisotropic noise.
The core denoising context involves estimating a clean variable X from noisy observations Y=X+ϵ, with ϵ∼q for some location-equivariant noise family q. Traditionally, for Gaussian noise, the Tweedie formula links the posterior mean to the gradient of the log marginal likelihood, and methods such as score-matching and denoising autoencoders exploit this result.
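For reference, the Gaussian case mentioned above can be written out explicitly. This is the standard statement of Tweedie's formula, assuming isotropic noise ϵ∼N(0, σ²I); the noise scale σ is introduced here only for illustration, and m denotes the density of Y.
$\mathbb{E}[X \mid Y = y] = y + \sigma^2\, \nabla_y \log m(y)$
Read in the other direction, the Stein score of the noisy marginal is the rescaled residual between the posterior mean and the observation: $\nabla_y \log m(y) = (\mathbb{E}[X \mid Y = y] - y)/\sigma^2$.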
The paper extends this machinery by:
- Generalizing Tweedie’s identity to elliptical distributions (i.e., energy-based models parameterized by centrally symmetric potentials over the Mahalanobis norm).
- Demonstrating that, for elliptical q, the Stein score $s_m(y)$ of the noisy marginal $m(y) = (p * q)(y)$ equals the negative path-derivative, with respect to y, of the posterior expectation of the noise potential:
$s_m(y) = -\nabla^{PD}_y\, \mathbb{E}_{X \mid Y=y}\left[V\left(\|y-X\|_{\Sigma^{-1}}\right)\right]$
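Here the path-derivative differentiates only through the explicit argument y of the potential and holds the posterior of X given Y=y fixed. In terms of the generalized Gaussian shape parameter β, this result recovers as special cases the Gaussian mean-seeking regime (β=2) and the geometric-median-seeking regime for Laplace-like noise (β=1).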
Additionally, for generalized Gaussian noise (parameterized by shape β and scale λ), the score formula incorporates nonlinearity in the residual, interpolating between mean and median behavior, and supporting heavy-tailed or robust denoising objectives.
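Because the path-derivative acts only on the potential inside the expectation, the score can be approximated by averaging per-sample gradients over posterior draws. The sketch below is a minimal illustration, not the paper's implementation: it assumes a potential of the form V(r) = λ r^β (the paper's exact parameterization of the generalized Gaussian may differ), and the function name and arguments are hypothetical.

```python
import numpy as np

def mc_stein_score(y, posterior_samples, sigma_inv, beta=2.0, lam=0.5, eps=1e-12):
    """Monte Carlo estimate of s_m(y) = -E[grad_y V(||y - X||_{Sigma^-1}) | Y = y].

    Assumes V(r) = lam * r**beta (hypothetical parameterization).
    posterior_samples: array (n, d) of draws from X | Y = y, held fixed while
    differentiating in y (this is what makes it a path-derivative).
    sigma_inv: symmetric positive-definite (d, d) matrix.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(posterior_samples, dtype=float)
    resid = y[None, :] - x                          # (n, d) residuals y - x_i
    w_resid = resid @ sigma_inv                     # rows: Sigma^-1 (y - x_i)
    r = np.sqrt(np.einsum("nd,nd->n", resid, w_resid))  # Mahalanobis norms
    # grad_y V = lam * beta * r**(beta - 2) * Sigma^-1 (y - x)
    weights = lam * beta * np.maximum(r, eps) ** (beta - 2.0)
    return -(weights[:, None] * w_resid).mean(axis=0)
```

With beta=2, lam=0.5, and sigma_inv equal to the identity divided by σ², the weights are constant and the estimate reduces to the Gaussian Tweedie form (posterior mean minus y, divided by σ²). With beta=1 each sample contributes a normalized residual direction, which is the median-seeking behavior described above.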
Energy Score Identity and Its Consequences
A primary result is the identification of a new energy-score-based identity, which holds for generalized Gaussian (and more generally, for elliptical) noise:
$s_m(y) = -\beta\lambda\, \nabla^{PD}_y\, \mathrm{ES}_{\Sigma^{-1},\beta}\left(P(X \mid Y=y),\, y\right)$
where $\mathrm{ES}_{\Sigma^{-1},\beta}$ is an energy score with Mahalanobis geometry and shape parameter β. This extends the classical connection between the marginal score and reconstruction loss gradients beyond MSE to a much broader set of strictly proper scoring rules. For the Gaussian case, this recovers the original Tweedie formula, as the energy score reduces to the MSE.
This relation has strong implications:
- Score Estimation: Provides a consistent method for estimating the Stein score via energy-score gradients evaluated on samples from any calibrated or strictly proper-trained denoising posterior model, obviating the need for closed-form densities.
- Parameter Estimation and Calibration: The algebraic identity allows for the estimation of noise parameters (β,λ,Σ) by minimizing calibration error between independently estimated scores.
- Noise-Adaptivity: The identity holds for arbitrary choices of noise parameters, enabling denoiser adaptation and flexible calibration for changing noise distributions.
- Geometry: The framework illuminates the relationship between noise distribution geometry (induced by Σ) and the energy landscape guiding denoising/vector fields.
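The sample-based energy score that the items above rely on can be sketched as follows. This assumes the standard two-term form of the energy score, E‖X−y‖^β − ½E‖X−X′‖^β, evaluated with the Mahalanobis norm; the paper's sign and normalization conventions may differ, and the function names are hypothetical.

```python
import numpy as np

def mahalanobis_norms(diff, sigma_inv):
    """||d||_{Sigma^-1} = sqrt(d^T Sigma^-1 d) over the last axis of diff."""
    sq = np.einsum("...i,ij,...j->...", diff, sigma_inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))   # guard against tiny negative round-off

def energy_score(samples, y, sigma_inv, beta=1.0):
    """Sample estimate of ES(P, y) = E||X - y||^beta - 0.5 * E||X - X'||^beta.

    samples: array (n, d) of draws from P (e.g. a denoising posterior model).
    The pairwise term averages over all n*n pairs, including i == j, so it is
    a simple (slightly biased) V-statistic.
    """
    x = np.asarray(samples, dtype=float)
    y = np.asarray(y, dtype=float)
    term_obs = mahalanobis_norms(x - y[None, :], sigma_inv) ** beta
    pair = x[:, None, :] - x[None, :, :]            # (n, n, d) pairwise differences
    term_pair = mahalanobis_norms(pair, sigma_inv) ** beta
    return term_obs.mean() - 0.5 * term_pair.mean()
```

With the samples held fixed, only the first term depends on y, so differentiating this estimate in y gives a path-derivative in the sense used above. For beta=2 the estimate equals the squared Mahalanobis distance between the sample mean and y, which matches the remark that the energy score reduces to the MSE in the Gaussian case.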

Figure 1: Score fields at a fixed early denoising stage (σ=0.8) for Gaussian (mean-seeking) and generalized Gaussian (median-seeking) noise, showing the influence of the residual weighting.
Denoising and Generative Modeling Applications
The established identities unify and generalize denoising functionals. The optimal denoisers in the MSE (Gaussian) and MAE-like (Laplace/generalized Gaussian) regimes are shown to be conservative vector fields, generated by gradients of potential functions that depend on the energy score. These fields are self-adjoint in the appropriate geometry (e.g., Mahalanobis).
Critically, the results enable the development of energy-score diffusion models:
- Training: Models can be trained via matched (possibly heavy-tailed, anisotropic) energy score objectives at each time step, accommodating varying noise distributions.
- Generation: The sampling procedure leverages Monte Carlo gradient approximations of the energy score to compute the Stein score for each step, which is then used in standard SDE/ODE-based generative pipelines, analogous to denoising score matching.
As a result, diffusion models can now formally support non-Gaussian or robust noise schedules (with explicit score computation), and posterior samplers such as Engression or Distributional Principal Autoencoders can supply the denoising conditional, even when trained using energy or distributional losses.
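The generation step described above can be illustrated with a generic annealed Langevin loop that takes any score estimate as a callable. This is a minimal sketch under stated assumptions: the step-size rule (proportional to the squared noise level) is a common heuristic rather than something specified in the summary, and `score_fn` is a hypothetical placeholder for a Monte Carlo energy-score gradient.

```python
import numpy as np

def annealed_langevin(score_fn, x0, noise_levels, steps_per_level=50,
                      step_scale=0.1, rng=None):
    """Annealed Langevin dynamics driven by a user-supplied score function.

    score_fn(x, level) -> array shaped like x, an estimate of the Stein score
    of the noisy marginal at the given noise level (placeholder interface).
    noise_levels: sequence of noise scales, typically decreasing.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for level in noise_levels:
        step = step_scale * level**2        # assumed heuristic: smaller steps at low noise
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            # Unadjusted Langevin update: drift along the score plus Gaussian noise.
            x = x + 0.5 * step * score_fn(x, level) + np.sqrt(step) * z
    return x
```

For example, passing `lambda x, level: -x` (the score of a standard normal) with a single noise level drives the particles toward an approximately standard normal distribution, up to the discretization bias of the unadjusted update.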

Figure 2: Diffusion progress for both Gaussian (mean-seeking, left) and generalized Gaussian (median-seeking, right) noise models, tracking sample reconvergence from highly noised states to the target distribution.


Figure 3: Mean energy distance to the clean data over the course of denoising for the Gaussian and generalized Gaussian models. Each model shows convergence behavior that reflects its target statistic (mean or median).
Experimental Validation
Empirical evaluation is conducted on the Eight Gaussians dataset, a canonical multimodal benchmark. The energy-score identity is empirically validated by comparing analytic and Monte Carlo-based scores across noise levels for both Gaussian and generalized Gaussian noise. The experimental setup includes:
- Trained conditional models for each noise parameterization.
- Annealed Langevin sampling using (estimated) energy score gradients.
- Quantitative evaluation (MSE, cosine similarity) of the score field estimators, showing high directional accuracy across varying noise levels.
- Visualization of denoising vector fields illustrating the geometrical and statistical differences between mean and median-seeking regimes.
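The comparison of estimated and reference score fields listed above can be sketched for the Gaussian-noise case, where the reference score is available in closed form. The benchmark parameters here (ring radius, per-mode standard deviation), the per-point definition of MSE, and all function names are assumptions for illustration; the summary does not specify them.

```python
import numpy as np

def eight_gaussians_centers(radius=2.0):
    """Eight mode centers equally spaced on a circle (assumed radius)."""
    angles = 2 * np.pi * np.arange(8) / 8
    return radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

def analytic_score(y, centers, mode_std, noise_std):
    """Score of an equal-weight Gaussian mixture after adding isotropic Gaussian noise.

    Each noisy mode is N(mu_k, (mode_std^2 + noise_std^2) I), so the score is
    the responsibility-weighted average of (mu_k - y) / variance.
    """
    var = mode_std**2 + noise_std**2
    diff = centers[None, :, :] - y[:, None, :]            # (n, 8, 2): mu_k - y
    logw = -0.5 * (diff**2).sum(axis=-1) / var            # unnormalized log weights
    w = np.exp(logw - logw.max(axis=1, keepdims=True))    # stable softmax
    w /= w.sum(axis=1, keepdims=True)
    return (w[:, :, None] * diff).sum(axis=1) / var

def score_metrics(est, ref, eps=1e-12):
    """Mean squared error (per point, summed over coordinates) and mean cosine similarity."""
    mse = np.mean(np.sum((est - ref) ** 2, axis=1))
    norms = np.linalg.norm(est, axis=1) * np.linalg.norm(ref, axis=1)
    cos = np.sum(est * ref, axis=1) / (norms + eps)
    return mse, cos.mean()
```

In use, `ref` would come from `analytic_score` on a grid of query points and `est` from a sample-based estimator evaluated at the same points. Cosine similarity isolates directional agreement, which is the quantity the summary reports as high across noise levels.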



Figure 4: Comparison of estimated Stein score fields and ground truth for both noise models, highlighting the accuracy of MC-based estimation via energy-score differentiation.
Relation to Prior Work and Theoretical Outlook
This work unifies and extends the classical Tweedie and Fisher identities, which underpin exponential-family and score-matching-based denoising and generative modeling, to a more general class of noise distributions, geometries, and scoring losses. It establishes connections with recent generative models (energy-based models, distributional autoencoders), robust regression, and adaptive denoising. The derived identities go beyond previous approaches by enabling both analytic and sample-based score computation for arbitrary strictly proper scoring rules associated with the noise model, and by supporting parameter inference and calibration.
Potential future extensions include:
- Further development of heavy-tailed or robust generative models using energy score diffusion under arbitrary elliptical noise.
- Application to inverse problems, uncertainty quantification, or test-time adaptation under unknown or shifting noise.
- Exploration of connections to learned Riemannian metrics and geometric flows for data manifolds.
Conclusion
The paper delivers a significant analytical generalization of Tweedie’s formula, connecting denoising, Stein scores, and energy scoring rules under elliptical noise. The results provide tools for theory and practice, covering estimation, generative modeling, and calibration in challenging noise regimes—enabling, for the first time, principled score computation and diffusion modeling for non-Gaussian, robust, and geometrically structured noise.
These theoretical results are supported by experiments that demonstrate the accuracy of the proposed MC-based score estimation and illustrate noise-adaptive generative modeling, with distinct behaviors for mean-seeking versus median-seeking denoising and generative processes. The unified perspective developed here can inform future work on robust, adaptive, and geometry-aware generative modeling.