- The paper introduces a score-based Riemannian metric that captures anisotropic geometry of data manifolds in diffusion models for effective geodesic interpolation.
- It leverages the Stein score function to compute optimized geodesic paths, outperforming traditional methods on synthetic and real image datasets.
- Empirical evaluations on Rotated MNIST and Stable Diffusion benchmarks demonstrate enhanced perceptual quality and improved metrics such as PSNR, SSIM, and FID.
Score-Based Riemannian Metrics for Data Manifold Exploration in Diffusion Models
Introduction
The paper presents a principled approach for characterizing and traversing the low-dimensional data manifolds implicitly learned by diffusion models. By introducing a score-based Riemannian metric, the authors formulate geometric tools such as a metric tensor and geodesics directly in ambient pixel or latent space. This framework avoids explicit manifold parameterization and leverages the Stein score function, extracted from a trained diffusion model, to adapt local geometry for manifold-aware operations. The methodology is evaluated across synthetic data, Rotated MNIST, and Stable Diffusion benchmarks, with quantitative and qualitative evidence showing improved fidelity and perceptual quality over baseline interpolation techniques.
Mathematical Framework
Stein Score Metric Tensor
The central mathematical contribution is the definition of the Stein score metric tensor:
$$g(x) = I + \lambda \, s(x)\, s(x)^\top$$
where $s(x) = \nabla_x \log p(x)$ is the Stein score, obtained from the neural noise predictor of a diffusion model, and $\lambda > 0$ is a penalty parameter. The resulting geometry is anisotropic: it stretches ambient space in directions normal to the data manifold, heavily penalizing off-manifold transitions while leaving tangential movement nearly unaffected. The metric is symmetric and positive definite by construction, since $v^\top g(x)\, v = \|v\|^2 + \lambda\,(s(x)^\top v)^2 \ge \|v\|^2 > 0$ for every nonzero $v$.
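As a minimal sketch of how this rank-one metric can be used without ever forming a full matrix, the NumPy snippet below applies $g(x)$ and its inverse to a vector. The function names and the hand-picked score vector are illustrative assumptions, not taken from the paper. The inverse uses the standard Sherman-Morrison identity $(I + \lambda s s^\top)^{-1} = I - \lambda s s^\top / (1 + \lambda\, s^\top s)$.

```python
import numpy as np

def metric_apply(s, v, lam):
    """Return g(x) v = v + lam * s * (s^T v) without building the d x d matrix."""
    return v + lam * s * (s @ v)

def metric_inverse_apply(s, v, lam):
    """Return g(x)^{-1} v via the Sherman-Morrison identity for I + lam * s s^T."""
    return v - lam * s * (s @ v) / (1.0 + lam * (s @ s))

def squared_length(s, v, lam):
    """Squared Riemannian length v^T g(x) v = ||v||^2 + lam * (s^T v)^2."""
    return v @ v + lam * (s @ v) ** 2

# Hypothetical score vector pointing along the third axis (the "normal" direction).
s = np.array([0.0, 0.0, 2.0])
lam = 1000.0
tangent = np.array([1.0, 0.0, 0.0])  # orthogonal to the score
normal = np.array([0.0, 0.0, 1.0])   # parallel to the score

len_tangent = squared_length(s, tangent, lam)  # unaffected by the penalty
len_normal = squared_length(s, normal, lam)    # inflated by lam * (s^T v)^2
```

A unit step orthogonal to the score keeps its Euclidean length, while a unit step along the score is inflated by $\lambda \|s\|^2$, which is the anisotropy described above.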
Optimization of Geodesics
Geodesic paths between two data points are computed by minimizing the energy functional:
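$$E[\gamma] = \frac{1}{2} \int_0^1 \left( \|\dot{\gamma}(\tau)\|^2 + \lambda \left( s(\gamma(\tau))^\top \dot{\gamma}(\tau) \right)^2 \right) d\tau$$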
The authors discretize the curve, apply regularization for smoothness and monotonicity, and solve the resulting optimization using a Riemannian Adam variant. Score extraction and metric evaluation are performed at each discretized path segment, using the Sherman-Morrison formula for efficient matrix inversion. For extrapolation, they employ a momentum-guided walk that combines previous tangent direction and score guidance, controlled by hyperparameters ε and β.
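To make the discretized energy concrete, here is a small sketch that evaluates it on a piecewise-linear path. The midpoint discretization, the function names, and the analytic toy score (the gradient of a log-density concentrated near the unit circle) are assumptions for illustration; the paper uses scores from a trained diffusion model and adds regularization terms that are omitted here.

```python
import numpy as np

def toy_score(x, sigma=0.1):
    """Assumed toy score: gradient of log p for p(x) ~ exp(-(||x|| - 1)^2 / (2 sigma^2))."""
    r = np.linalg.norm(x)
    return -(r - 1.0) / sigma**2 * x / r

def discrete_energy(path, score_fn, lam):
    """Approximate E[gamma] for a path given as an (N+1, d) array of points.

    With N equal segments, gamma_dot ~ N * delta_i and d_tau = 1/N, so
    E ~ 0.5 * N * sum_i (||delta_i||^2 + lam * (s(m_i)^T delta_i)^2),
    where m_i is the midpoint of segment i.
    """
    n_seg = len(path) - 1
    deltas = path[1:] - path[:-1]
    mids = 0.5 * (path[1:] + path[:-1])
    scores = np.stack([score_fn(m) for m in mids])
    euclid = np.sum(deltas**2, axis=1)
    penalty = np.sum(scores * deltas, axis=1) ** 2
    return 0.5 * n_seg * np.sum(euclid + lam * penalty)

# Two candidate paths from (1, 0) to (0, 1): a quarter arc on the circle and a chord.
t = np.linspace(0.0, 1.0, 51)
arc = np.stack([np.cos(t * np.pi / 2), np.sin(t * np.pi / 2)], axis=1)
chord = (1 - t)[:, None] * np.array([1.0, 0.0]) + t[:, None] * np.array([0.0, 1.0])

e_arc = discrete_energy(arc, toy_score, lam=1.0)
e_chord = discrete_energy(chord, toy_score, lam=1.0)
e_arc_flat = discrete_energy(arc, toy_score, lam=0.0)
e_chord_flat = discrete_energy(chord, toy_score, lam=0.0)
```

With λ = 0 the chord has the lower (purely Euclidean) energy, whereas with the score penalty switched on the arc that stays on the circle wins. An optimizer over the interior path points would be run on this same quantity.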
Empirical Validation
Synthetic Manifold: Embedded Sphere
On synthetic 2-sphere and 50-sphere datasets embedded in high-dimensional pixel space, the method achieves geodesic approximation errors below 0.1% for λ≥1000. At appropriate diffusion timesteps, the score vectors lie almost exactly along the true manifold normals (angle ≈ 173°, i.e. nearly antiparallel). The geodesic interpolations remain on the manifold and preserve image structure, while linear and spherical interpolations rapidly depart from it, causing blurring and artifacts.
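For reference, the two closed-form baselines named above can be written in a few lines. This is a sketch of the standard textbook definitions of linear interpolation (LERP) and spherical linear interpolation (SLERP), not code from the paper; the function names are illustrative.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between vectors a and b at fraction t in [0, 1]."""
    return (1.0 - t) * a + t * b

def slerp(a, b, t):
    """Spherical linear interpolation along the great-circle arc between a and b."""
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    # Angle between the two directions, clipped for numerical safety.
    omega = np.arccos(np.clip(a_unit @ b_unit, -1.0, 1.0))
    if omega < 1e-8:
        return lerp(a, b, t)  # nearly parallel inputs: fall back to LERP
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mid_lerp = lerp(a, b, 0.5)
mid_slerp = slerp(a, b, 0.5)
```

SLERP preserves the norm of unit vectors, but preserving the ambient norm is a different property from staying on a learned data manifold, which is consistent with both baselines drifting off the manifold in the experiments described above.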
Rotated MNIST
For Rotated MNIST, interpolation trajectories computed via the score-based metric accurately follow the true rotational trajectory, demonstrating that the diffusion model has learned the underlying manifold structure (group of digit rotations). The geodesic approach outperforms LERP, SLERP, and Noise Diffusion in perceptual metrics and pixel-wise measures:
- PSNR: Geodesic 14.98 vs. LERP 14.08
- SSIM: Geodesic 0.650 vs. LERP 0.578
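For readers unfamiliar with the pixel-wise measure reported above, PSNR is computed from the mean squared error between a reference image and an estimate. The sketch below uses the standard definition and assumes images scaled to [0, 1]; the function name and the toy arrays are illustrative and do not reproduce the paper's evaluation pipeline.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value**2 / mse)

# Toy example: a constant error of 0.1 on a 28 x 28 image gives MSE = 0.01.
reference = np.zeros((28, 28))
estimate = np.full((28, 28), 0.1)
value = psnr(reference, estimate)
```

Higher PSNR means a smaller pixel-wise error with respect to the reference frame.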
Extrapolation experiments show the method can generate novel rotations, maintaining digit identity beyond observed examples, which is key evidence of manifold-aware extrapolation.
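The momentum-guided walk is described only as a combination of the previous tangent direction and score guidance, controlled by ε and β. The sketch below shows one plausible form. The specific update rule, the reading of ε as a step size and β as a momentum weight, and the analytic toy score are all assumptions made for illustration and may differ from the authors' procedure.

```python
import numpy as np

def toy_score(x, sigma=0.3):
    """Assumed toy score pulling points back toward the unit circle."""
    r = np.linalg.norm(x)
    return -(r - 1.0) / sigma**2 * x / r

def momentum_walk(x0, d0, score_fn, eps, beta, n_steps):
    """Hypothetical extrapolation rule (not confirmed by the source).

    d_{k+1} = normalize(beta * d_k + (1 - beta) * s(x_k))
    x_{k+1} = x_k + eps * d_{k+1}
    """
    x = np.asarray(x0, dtype=float).copy()
    d = np.asarray(d0, dtype=float) / np.linalg.norm(d0)
    path = [x.copy()]
    for _ in range(n_steps):
        d = beta * d + (1.0 - beta) * score_fn(x)
        d = d / np.linalg.norm(d)
        x = x + eps * d
        path.append(x.copy())
    return np.stack(path)

start = np.array([1.0, 0.0])
heading = np.array([0.0, 1.0])  # initial tangent direction on the circle
guided = momentum_walk(start, heading, toy_score, eps=0.05, beta=0.9, n_steps=60)
unguided = momentum_walk(start, heading, toy_score, eps=0.05, beta=1.0, n_steps=60)

guided_drift = np.abs(np.linalg.norm(guided, axis=1) - 1.0).max()
unguided_radius = np.linalg.norm(unguided[-1])
```

In this toy setting, the guided walk bends around the circle, while setting β = 1 removes the score term and the walk leaves the circle along a straight line.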
Stable Diffusion and MorphBench
For high-resolution, natural image data modeled by Stable Diffusion, the geodesic framework applied in latent space yields smoother, perceptually coherent morphs as measured by LPIPS, FID, and KID (lower is better for all three):
- LPIPS: Geodesic 0.358 vs. LERP 0.361
- FID: Geodesic 140.6 vs. LERP 148.2
- KID: Geodesic 0.086 vs. LERP 0.094
Although LERP achieves slightly higher pixel-wise scores (PSNR/SSIM), the geodesic interpolations are visually cleaner and avoid composite artifacts, substantiating the claim that pixel-wise metrics can be misleading for perceptual quality in complex data.
Computational Considerations
The score-based geodesic algorithm incurs orders of magnitude higher computational cost compared to direct methods (e.g., LERP, SLERP), scaling linearly with the number of discretization points and requiring several hundred gradient steps. Despite this, the approach remains tractable for images up to 512×512 pixels (via latents), and the numerical stability is enhanced by regularizing the energy landscape and properly choosing diffusion timesteps.
Implications and Theoretical Impact
The introduced framework advances the theoretical bridge between probabilistic generative modeling and differential geometry by directly extracting anisotropic metrics from neural network-learned scores. This enables manifold-respecting interpolations and extrapolations, which are critical for tasks such as semantic editing, morphing, and data analysis. The approach reveals the implicit geometric structure captured in diffusion models, offering novel insights into how generative models encode distributional support and transformations beyond the reach of classical Euclidean methods.
Contradictory Evidence
Pixel-level metrics on complex images sometimes favor baseline methods due to the limitations of those metrics, but the geodesic method consistently excels in more robust, perception-oriented metrics. Additionally, the method demonstrates that score vectors reliably approximate inward-pointing normals to the manifold even in high-dimensional, real-image data, a hypothesis previously open to question.
Limitations and Future Directions
- Computational Complexity: Substantially higher than baselines; improvements in Riemannian optimization and neural surrogate geodesic prediction are suggested.
- Quantitative Extrapolation Metrics: Lack of ground truth for extrapolation hinders objective assessment.
- Manifold-Aware Operations: Future work could leverage the geometric approach for bias detection, manifold visualization, and real-time editing tools, possibly coupled with neural ODEs for acceleration.
Conclusion
The paper establishes a foundational approach for extracting and traversing the low-dimensional data manifolds encoded by diffusion models, by introducing a score-based Riemannian metric directly in pixel or latent space. The geodesic framework eschews explicit parameterization, achieves manifest improvements in interpolation quality, and offers new tools for both theoretical investigation and practical manipulation of generative models. The connections to general relativity, Riemannian geometry, and optimally efficient sampling highlight the interdisciplinary nature of the method and its potential for broad impact in model interpretability and controllable generation.