What's Inside Your Diffusion Model? A Score-Based Riemannian Metric to Explore the Data Manifold

Published 16 May 2025 in cs.LG and cs.CV | (2505.11128v3)

Abstract: Recent advances in diffusion models have demonstrated their remarkable ability to capture complex image distributions, but the geometric properties of the learned data manifold remain poorly understood. We address this gap by introducing a score-based Riemannian metric that leverages the Stein score function from diffusion models to characterize the intrinsic geometry of the data manifold without requiring explicit parameterization. Our approach defines a metric tensor in the ambient space that stretches distances perpendicular to the manifold while preserving them along tangential directions, effectively creating a geometry where geodesics naturally follow the manifold's contours. We develop efficient algorithms for computing these geodesics and demonstrate their utility for both interpolation between data points and extrapolation beyond the observed data distribution. Through experiments on synthetic data with known geometry, Rotated MNIST, and complex natural images via Stable Diffusion, we show that our score-based geodesics capture meaningful transformations that respect the underlying data distribution. Our method consistently outperforms baseline approaches on perceptual metrics (LPIPS) and distribution-level metrics (FID, KID), producing smoother, more realistic image transitions. These results reveal the implicit geometric structure learned by diffusion models and provide a principled way to navigate the manifold of natural images through the lens of Riemannian geometry.


Summary

The paper introduces a score-based Riemannian metric that captures anisotropic geometry of data manifolds in diffusion models for effective geodesic interpolation.
It leverages the Stein score function to compute optimized geodesic paths, outperforming traditional methods on synthetic and real image datasets.
Empirical evaluations show higher PSNR and SSIM on Rotated MNIST and better LPIPS, FID, and KID on Stable Diffusion (MorphBench), together with improved perceptual quality.

Score-Based Riemannian Metrics for Data Manifold Exploration in Diffusion Models

Introduction

The paper presents a principled approach for characterizing and traversing the low-dimensional data manifolds implicitly learned by diffusion models. By introducing a score-based Riemannian metric, the authors formulate geometric tools such as a metric tensor and geodesics directly in ambient pixel or latent space. This framework avoids explicit manifold parameterization and leverages the Stein score function, extracted from a trained diffusion model, to adapt local geometry for manifold-aware operations. The methodology is evaluated across synthetic data, Rotated MNIST, and Stable Diffusion benchmarks, with quantitative and qualitative evidence showing improved fidelity and perceptual quality over baseline interpolation techniques.

Mathematical Framework

Stein Score Metric Tensor

The central mathematical contribution is the definition of the Stein score metric tensor:

$g(\boldsymbol{x}) = \mathbf{I} + \lambda \cdot \mathbf{s}(\boldsymbol{x})\mathbf{s}(\boldsymbol{x})^\top$

where $\mathbf{s}(\boldsymbol{x}) = \nabla_{\boldsymbol{x}} \log p(\boldsymbol{x})$ is obtained via the neural noise predictor in a diffusion model and $\lambda > 0$ is a penalty parameter. This leads to anisotropic geometry that stretches ambient space along the score direction, which near the data is approximately normal to the manifold, heavily penalizing off-manifold transitions while leaving tangential movement nearly unaffected. The metric is symmetric and positive definite by construction: its eigenvalues are $1 + \lambda\|\mathbf{s}(\boldsymbol{x})\|^2$ along $\mathbf{s}(\boldsymbol{x})$ and $1$ in every orthogonal direction.
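As a minimal sketch of this definition (not the authors' implementation), the NumPy snippet below builds the metric for a hand-picked score vector and compares the cost of a unit step along the score with a step orthogonal to it. The function names and the example values of `s` and `lam` are assumptions chosen for illustration. Because the metric is a rank-one update of the identity, its inverse has the closed form given by the Sherman-Morrison formula, which the second helper uses.

```python
import numpy as np

def stein_metric(score, lam):
    """Return g = I + lam * s s^T for a score vector s."""
    s = np.asarray(score, dtype=float)
    return np.eye(s.size) + lam * np.outer(s, s)

def stein_metric_inverse(score, lam):
    """Sherman-Morrison inverse: g^{-1} = I - lam * s s^T / (1 + lam * ||s||^2)."""
    s = np.asarray(score, dtype=float)
    return np.eye(s.size) - lam * np.outer(s, s) / (1.0 + lam * (s @ s))

# Hypothetical score pointing along the first axis (the "normal" direction here).
s = np.array([2.0, 0.0, 0.0])
lam = 100.0
g = stein_metric(s, lam)

normal = np.array([1.0, 0.0, 0.0])
tangent = np.array([0.0, 1.0, 0.0])
normal_cost = normal @ g @ normal      # 1 + lam * (s . normal)^2
tangent_cost = tangent @ g @ tangent   # 1, because s . tangent = 0
eigenvalues = np.linalg.eigvalsh(g)    # all >= 1, so g is positive definite
```

In this toy setting the squared length of the normal step is inflated by the factor $1 + \lambda\|\mathbf{s}\|^2$, while the tangential step keeps its Euclidean length.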

Optimization of Geodesics

Geodesic paths between two data points are computed by minimizing the energy functional:

$\mathcal{E}[\gamma] = \frac{1}{2}\int_0^1 \left( \|\dot{\gamma}(\tau)\|^2 + \lambda(\mathbf{s}(\gamma(\tau))^\top\dot{\gamma}(\tau))^2 \right) d\tau$

The authors discretize the curve, apply regularization for smoothness and monotonicity, and solve the resulting optimization using a Riemannian Adam variant. Score extraction and metric evaluation are performed at each discretized path segment, using the Sherman-Morrison formula for efficient matrix inversion. For extrapolation, they employ a momentum-guided walk that combines the previous tangent direction with score guidance, controlled by hyperparameters $\varepsilon$ and $\beta$.
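To make the energy functional concrete, the sketch below evaluates a midpoint-rule discretization of it on two candidate paths between points on the unit circle. It covers only the energy evaluation, not the regularizers or the Riemannian Adam optimizer described above. The analytic `toy_score` (the gradient of an unnormalized log-density concentrated near the circle), the value of `sigma`, and the number of segments are assumptions standing in for a learned diffusion score.

```python
import numpy as np

def toy_score(x, sigma=0.1):
    """Gradient of log exp(-(||x|| - 1)^2 / (2 sigma^2)); an assumed toy model."""
    r = np.linalg.norm(x)
    return -(r - 1.0) / sigma**2 * x / r

def path_energy(points, score_fn, lam):
    """Midpoint-rule estimate of E = 1/2 * int (|v|^2 + lam * (s . v)^2) d tau."""
    points = np.asarray(points, dtype=float)
    n_seg = len(points) - 1
    dtau = 1.0 / n_seg
    energy = 0.0
    for a, b in zip(points[:-1], points[1:]):
        v = (b - a) / dtau              # finite-difference velocity on the segment
        s = score_fn(0.5 * (a + b))     # score evaluated at the segment midpoint
        energy += 0.5 * (v @ v + lam * (s @ v) ** 2) * dtau
    return energy

n = 64
theta = np.linspace(0.0, np.pi / 2, n + 1)
arc = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # stays on the circle
t = np.linspace(0.0, 1.0, n + 1)[:, None]
chord = (1 - t) * arc[0] + t * arc[-1]                   # straight line (LERP)

arc_energy = path_energy(arc, toy_score, lam=1000.0)
chord_energy = path_energy(chord, toy_score, lam=1000.0)
```

Under these assumptions the arc's energy stays close to its Euclidean value of $\frac{1}{2}(\pi/2)^2$, whereas the chord, although shorter in the Euclidean sense, pays a large penalty wherever its velocity has a component along the score.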

Empirical Validation

Synthetic Manifold: Embedded Sphere

On synthetic 2-sphere and 50-sphere datasets embedded in high-dimensional pixel space, the method achieves geodesic approximation errors below 0.1% for $\lambda\geq 1000$, with score vectors nearly antiparallel (angle $\approx$ 173°) to the true manifold normals at appropriate diffusion timesteps, consistent with scores that point inward toward the manifold. The geodesic interpolations remain on the manifold, preserving image structure, while linear and spherical interpolations rapidly depart from the manifold, causing blurring and artifacts.
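An angle near 180° between a score vector and an outward normal means the two are nearly antiparallel, so the score points back toward the manifold. The sketch below shows how such an angle could be measured. It uses an assumed analytic score for a density concentrated near the unit sphere in three dimensions rather than a trained model, and the helper names are hypothetical.

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def toy_sphere_score(x, sigma=0.1):
    """Gradient of log exp(-(||x|| - 1)^2 / (2 sigma^2)); an assumed toy model."""
    r = np.linalg.norm(x)
    return -(r - 1.0) / sigma**2 * x / r

rng = np.random.default_rng(0)
direction = rng.normal(size=3)
outward_normal = direction / np.linalg.norm(direction)

outside = 1.05 * outward_normal   # slightly off the sphere, on the outer side
inside = 0.95 * outward_normal    # slightly off the sphere, on the inner side

angle_outside = angle_deg(toy_sphere_score(outside), outward_normal)
angle_inside = angle_deg(toy_sphere_score(inside), outward_normal)
```

For this idealized score the angle is 180° outside the sphere and 0° inside it, since the score always points back toward the surface. A learned score is only approximately radial, which is one plausible reading of a measured value such as 173° rather than exactly 180°.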

Rotated MNIST

For Rotated MNIST, interpolation trajectories computed via the score-based metric accurately follow the true rotational trajectory, demonstrating that the diffusion model has learned the underlying manifold structure (group of digit rotations). The geodesic approach outperforms LERP, SLERP, and Noise Diffusion in perceptual metrics and pixel-wise measures:

PSNR: Geodesic 14.98 vs. LERP 14.08
SSIM: Geodesic 0.650 vs. LERP 0.578
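For reference, PSNR is conventionally defined as $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. The sketch below implements that standard definition with NumPy, assuming images scaled to $[0, 1]$. It is not the evaluation code used in the paper, and the test images are synthetic placeholders.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images with peak value max_value."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")             # identical images
    return float(10.0 * np.log10(max_value**2 / mse))

# Synthetic 28x28 example: a constant error of 0.1 gives MSE = 0.01.
reference = np.zeros((28, 28))
estimate = np.full((28, 28), 0.1)
value = psnr(reference, estimate)

# MSE ratio implied by a 0.90 dB PSNR gap at the same peak value.
mse_ratio = 10 ** (-(14.98 - 14.08) / 10)
```

Under this definition, and for a single image pair with the same peak value, the 0.90 dB gap between 14.98 and 14.08 corresponds to an MSE ratio of about 0.81. If the reported values are averages over many pairs, this is only a rough reading.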

Extrapolation experiments show the method can generate novel rotations that maintain digit identity beyond the observed examples, which is key evidence of manifold-aware extrapolation.

Stable Diffusion and MorphBench

For high-resolution, natural image data modeled by Stable Diffusion, the geodesic framework applied in latent space yields smoother, perceptually coherent morphs as measured by LPIPS, FID, and KID:

LPIPS: Geodesic 0.358 vs. LERP 0.361
FID: Geodesic 140.6 vs. LERP 148.2
KID: Geodesic 0.086 vs. LERP 0.094
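Since lower is better for all three of these metrics, the gaps can be expressed as relative reductions with respect to LERP. The short script below does that arithmetic on the values listed above. The dictionary layout and the helper name are illustrative choices.

```python
# (geodesic, lerp) pairs copied from the reported results; lower is better.
reported = {
    "LPIPS": (0.358, 0.361),
    "FID": (140.6, 148.2),
    "KID": (0.086, 0.094),
}

def relative_reduction_percent(geodesic, lerp):
    """Percentage by which the geodesic value is below the LERP value."""
    return 100.0 * (lerp - geodesic) / lerp

reductions = {
    name: round(relative_reduction_percent(geo, lerp), 2)
    for name, (geo, lerp) in reported.items()
}
print(reductions)  # {'LPIPS': 0.83, 'FID': 5.13, 'KID': 8.51}
```

The LPIPS difference is under 1%, so the distribution-level metrics (FID and KID) carry most of the measured improvement.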

Although LERP achieves slightly higher pixel-wise scores (PSNR/SSIM), the geodesic interpolations are visually cleaner and avoid composite artifacts, substantiating the claim that pixel-wise metrics can be misleading for perceptual quality in complex data.

Computational Considerations

The score-based geodesic algorithm costs orders of magnitude more compute than direct methods (e.g., LERP, SLERP): the cost scales linearly with the number of discretization points, and the optimization requires several hundred gradient steps. Despite this, the approach remains tractable for images up to 512×512 pixels (via latents), and numerical stability is improved by regularizing the energy landscape and choosing diffusion timesteps appropriately.

Implications and Theoretical Impact

The introduced framework advances the theoretical bridge between probabilistic generative modeling and differential geometry by directly extracting anisotropic metrics from neural network-learned scores. This enables manifold-respecting interpolations and extrapolations, which are critical for tasks such as semantic editing, morphing, and data analysis. The approach reveals the implicit geometric structure captured in diffusion models, offering novel insights into how generative models encode distributional support and transformations beyond the reach of classical Euclidean methods.

Contradictory Evidence

Pixel-level metrics on complex images sometimes favor baseline methods due to the limitations of those metrics, but the geodesic method consistently excels in more robust, perception-oriented metrics. Additionally, the method demonstrates that score vectors reliably approximate inward-pointing normals to the manifold even in high-dimensional, real-image data, a hypothesis that was previously open to question.

Limitations and Future Directions

Computational Complexity: Substantially higher than baselines; improvements in Riemannian optimization and neural surrogate geodesic prediction are suggested.
Quantitative Extrapolation Metrics: Lack of ground truth for extrapolation hinders objective assessment.
Manifold-Aware Operations: Future work could leverage the geometric approach for bias detection, manifold visualization, and real-time editing tools, possibly coupled with neural ODEs for acceleration.

Conclusion

The paper establishes a foundational approach for extracting and traversing the low-dimensional data manifolds encoded by diffusion models by introducing a score-based Riemannian metric directly in pixel or latent space. The geodesic framework avoids explicit parameterization, achieves clear improvements in interpolation quality, and offers new tools for both theoretical investigation and practical manipulation of generative models. The connections to general relativity, Riemannian geometry, and optimally efficient sampling highlight the interdisciplinary nature of the method and its potential for broad impact in model interpretability and controllable generation.
