Riemannian AmbientFlow: Towards Simultaneous Manifold Learning and Generative Modeling from Corrupted Data
Published 26 Jan 2026 in cs.LG, math.DG, math.OC, and math.ST | (2601.18728v1)
Abstract: Modern generative modeling methods have demonstrated strong performance in learning complex data distributions from clean samples. In many scientific and imaging applications, however, clean samples are unavailable, and only noisy or linearly corrupted measurements can be observed. Moreover, latent structures, such as manifold geometries, present in the data are important to extract for further downstream scientific analysis. In this work, we introduce Riemannian AmbientFlow, a framework for simultaneously learning a probabilistic generative model and the underlying, nonlinear data manifold directly from corrupted observations. Building on the variational inference framework of AmbientFlow, our approach incorporates data-driven Riemannian geometry induced by normalizing flows, enabling the extraction of manifold structure through pullback metrics and Riemannian Autoencoders. We establish theoretical guarantees showing that, under appropriate geometric regularization and measurement conditions, the learned model recovers the underlying data distribution up to a controllable error and yields a smooth, bi-Lipschitz manifold parametrization. We further show that the resulting smooth decoder can serve as a principled generative prior for inverse problems with recovery guarantees. We empirically validate our approach on low-dimensional synthetic manifolds and on MNIST.
The paper introduces a Riemannian AmbientFlow framework that jointly recovers smooth generative models and latent manifolds from corrupted observations.
It employs diffeomorphic normalizing flows and geometric regularization to ensure stable inverse mappings with quantifiable recovery errors under RIP conditions.
Empirical results on synthetic and MNIST data demonstrate superior denoising, interpolation, and inverse problem recovery compared to classical methods.
Riemannian AmbientFlow: Simultaneous Manifold Learning and Generative Modeling from Corrupted Data
Motivation and Context
The problem of learning generative models from corrupted or incomplete observations is critical in scientific and imaging domains where access to clean data is limited or infeasible. Traditional generative modeling techniques presuppose high-quality, clean training samples, whereas real-world scenarios commonly involve noisy, undersampled, or linearly transformed measurements. Furthermore, data in high-dimensional spaces frequently reside near low-dimensional nonlinear manifolds, whose geometric structures are crucial for downstream analytic and interpretability goals. Recent ambient distribution learning paradigms (e.g., AmbientGAN, AmbientFlow, Ambient Diffusion) have enabled generative model learning from corrupted data, but do not provide direct access to latent geometric features. The Riemannian AmbientFlow framework bridges this gap by jointly learning a smooth generative model and the underlying data manifold from corrupted observations.
Theoretical Framework
Riemannian AmbientFlow extends AmbientFlow [kelkarambientflow] with explicit geometric regularization driven by normalizing flows and their data-driven pullback metrics. Specifically, the method assumes that the true data distribution is concentrated near the range of a Riemannian Autoencoder (RAE) generated by a diffeomorphism φθ trained to match the distribution of the corrupted measurements. The optimization objective is an augmented variational lower bound:

$$\min_{\theta,\phi}\; -\mathcal{L}_{\mathrm{VLB}}(\theta,\phi) \;+\; \lambda\,\bigl\lVert D_0\,\varphi_\theta^{-1}\bigr\rVert_F,$$

where $\mathcal{L}_{\mathrm{VLB}}$ is the variational lower bound on the marginal likelihood of the measurement data, φθ is a neural normalizing flow parameterizing the data manifold, and the penalty term controls the concentration of the decoder mapping.
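To fix ideas, here is a minimal PyTorch-style sketch of this objective for a single measurement. The `flow` and `posterior` interfaces, the Gaussian noise model, and the reading of D0 as the Jacobian of φθ⁻¹ at the origin are assumptions for illustration, not the paper's implementation:

```python
import torch

def ambientflow_objective(flow, posterior, A, y, lam, sigma):
    """Sketch of -L_VLB(theta, phi) + lam * ||D_0 phi_theta^{-1}||_F
    for a single corrupted measurement y = A x + n."""
    # Variational lower bound on log p_theta(y):
    # E_{q_phi(x|y)} [ log p(y|x) + log p_theta(x) - log q_phi(x|y) ]
    x = posterior.rsample()                                    # x ~ q_phi(x | y)
    log_lik = -0.5 * torch.sum((A @ x - y) ** 2) / sigma ** 2  # Gaussian noise model
    vlb = log_lik + flow.log_prob(x) - posterior.log_prob(x)

    # Geometric penalty: Frobenius norm of the Jacobian of the inverse flow,
    # evaluated at the origin (our reading of D_0); create_graph=True keeps
    # the penalty differentiable w.r.t. the flow parameters.
    x0 = torch.zeros_like(x)
    J = torch.autograd.functional.jacobian(flow.inverse, x0, create_graph=True)
    penalty = torch.linalg.matrix_norm(J, ord="fro")

    return -vlb + lam * penalty
```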
Corrupted measurements y=Ax+n induce significant ambiguity unless additional structure on x is imposed. Rather than classical sparsity, the framework enforces that x lives near a low-dimensional manifold produced by the range of the RAE decoder. Pullback Riemannian geometry then provides closed-form expressions for geodesics, logarithmic and exponential maps, and barycenters, enabling both efficient geometric analysis and stable manifold mappings.
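The closed forms follow because the flow is a global diffeomorphism and the latent metric is Euclidean, so geodesics, logarithmic, and exponential maps reduce to straight-line operations in latent coordinates. A minimal sketch under that assumption, with tangent vectors expressed in latent coordinates (the `flow.forward`/`flow.inverse` interface is an illustrative naming choice):

```python
import torch

def geodesic(flow, x1, x2, t):
    """Pullback geodesic between x1 and x2: the image under the flow of the
    straight latent line, evaluated at t in [0, 1]."""
    z1, z2 = flow.inverse(x1), flow.inverse(x2)
    return flow.forward((1.0 - t) * z1 + t * z2)

def log_map(flow, x_base, x):
    """Logarithmic map at x_base, expressed in latent coordinates."""
    return flow.inverse(x) - flow.inverse(x_base)

def exp_map(flow, x_base, v):
    """Exponential map at x_base: shoot along the latent tangent vector v."""
    return flow.forward(flow.inverse(x_base) + v)
```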
Recovery Guarantees and RAE Properties
A central result analytically bounds the recovery error in terms of the expected RAE projection error and a geometric regularity condition on the measurement operator. Under a Restricted Isometry Property (RIP) over the manifold, it is shown that

$$W_1\bigl(p_{\hat{\theta}},\, p_{\mathrm{data}}\bigr) \;\le\; 2\omega\left(1 + \frac{\lVert A \rVert}{1 - \delta}\right),$$

where $p_{\hat{\theta}}$ is the learned model, $p_{\mathrm{data}}$ the true underlying distribution, ω quantifies the geometric regularization error, and δ is the RIP constant. Thus, the model recovers the true data distribution up to a quantifiable and controllable error, offering guarantees even for indirect or noisy measurements.
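For intuition, plugging purely illustrative values (not figures reported in the paper) into the bound:

```latex
% Illustrative values only: \omega = 0.1, \lVert A \rVert = 1, \delta = 0.5.
W_1\bigl(p_{\hat{\theta}}, p_{\mathrm{data}}\bigr)
  \;\le\; 2\omega\left(1 + \frac{\lVert A \rVert}{1 - \delta}\right)
  \;=\; 2(0.1)\left(1 + \frac{1}{0.5}\right) \;=\; 0.6.
```

Halving ω halves the bound, while a better-conditioned measurement operator (smaller δ) tightens it further.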
The RAE decoder mapping is shown to be injective, bi-Lipschitz, and smooth, with explicit bounds on its Lipschitz and Jacobian regularity constants, supporting its use as a parameterization in downstream inference and optimization. The construction relies on diffeomorphic neural normalizing flows with invertible architectures. The decoder properties enable efficient projection onto the manifold and robust inverse problem formulations.
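These constants can also be probed numerically: for a smooth decoder, the extreme singular values of its Jacobian bound the local distortion. A sketch of such a check (an empirical probe under this first-order reasoning, not the paper's certified bounds):

```python
import torch

def local_bilipschitz_bounds(decoder, z_samples):
    """Empirical probe of a smooth decoder's bi-Lipschitz constants via the
    extreme singular values of its Jacobian at sampled latent points:
    sigma_min(J) ||dz|| <= ||D(z + dz) - D(z)|| <= sigma_max(J) ||dz||
    to first order."""
    lo, hi = float("inf"), 0.0
    for z in z_samples:
        J = torch.autograd.functional.jacobian(decoder, z)  # (dim_out, dim_in)
        s = torch.linalg.svdvals(J)                         # descending order
        lo, hi = min(lo, s[-1].item()), max(hi, s[0].item())
    return lo, hi  # estimated lower / upper local Lipschitz constants
```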
Manifold Structure and Generative Capabilities
The framework leverages the diffeomorphic flow φθ and geometric projections both to generate samples and to compute geometric quantities (geodesics, barycenters, intrinsic dimensions). Empirical analyses confirm that the learned generative model not only fits the corrupted data but also recovers a smooth underlying manifold. Geodesic curves interpolate between points through regions of high likelihood, yielding structurally meaningful transitions in the latent space.
Figure 1: Points along the geodesic γx1,x2 illustrate smooth interpolation between two samples on the learned manifold.
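Under the pullback geometry, barycenters likewise admit a closed form: encode, average in latent space, decode. A sketch reusing the maps above:

```python
import torch

def barycenter(flow, xs):
    """Riemannian barycenter under the flow-induced pullback metric: the flow
    is an isometry onto a Euclidean latent space, so the Frechet mean is the
    decoded average of the encoded points."""
    zs = torch.stack([flow.inverse(x) for x in xs])
    return flow.forward(zs.mean(dim=0))

# Geodesic interpolation as in Figure 1, reusing geodesic() from above:
# path = [geodesic(flow, x1, x2, t) for t in torch.linspace(0.0, 1.0, 8)]
```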
Samples generated from the learned prior exhibit notable denoising and regularization, surpassing the visual quality of the corrupted training images.
Figure 2: Top: Corrupted training images. Bottom: Synthetic samples from the learned prior pθ demonstrate fidelity to the underlying data manifold.
Application to Inverse Problems
Given the smooth bi-Lipschitz decoder Dε, inverse problems are solved via nonlinear least-squares optimization constrained to the manifold:

$$\min_{p}\; \tfrac{1}{2}\,\bigl\lVert A\, D_\varepsilon(p) - y \bigr\rVert^{2}.$$
Theoretical results establish linear convergence of gradient descent under the measurement-operator RIP and decoder smoothness. Empirically, this approach yields improved reconstructions of corrupted MNIST images compared to classical TV-based regularization, with lower mean squared error and semantically more faithful results.
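A minimal sketch of this scheme with plain gradient descent (the optimizer, step size, and iteration budget are illustrative choices, not the paper's settings):

```python
import torch

def solve_inverse_problem(decoder, A, y, p0, lr=1e-2, n_iters=500):
    """Gradient descent on f(p) = 0.5 * ||A D(p) - y||^2 over the latent p.
    Plain SGD stands in for the scheme whose linear convergence the paper
    establishes under the manifold RIP and decoder smoothness."""
    p = p0.clone().requires_grad_(True)
    optimizer = torch.optim.SGD([p], lr=lr)
    for _ in range(n_iters):
        optimizer.zero_grad()
        loss = 0.5 * torch.sum((A @ decoder(p) - y) ** 2)
        loss.backward()
        optimizer.step()
    return decoder(p).detach()  # reconstruction on the learned manifold
```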
Figure 3: RAE-based inverse problem recovery outperforms TV baseline on MNIST, providing lower reconstruction error.
Empirical Analysis
Experiments on synthetic low-dimensional manifolds and corrupted MNIST illustrate the method's efficacy. For synthetic sinusoidal data corrupted by Gaussian noise and projection, the method reconstructs a manifold and generative model matching the clean data using as few as 5% clean reference samples, given suitable regularization. For MNIST, the learned prior is used to deblur and denoise digits, and the manifold structure enables semantically interpretable geodesic interpolations and meaningful inverse problem solutions.
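For concreteness, a hypothetical recreation of the synthetic corruption pipeline; the dimensions, operator, and noise level below are our choices rather than the paper's exact protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sinusoidal 1-D manifold embedded in R^2, observed through a random
# linear projection plus Gaussian noise: y = A x + n.
n_samples, sigma = 2000, 0.05
t = rng.uniform(-np.pi, np.pi, size=n_samples)
x = np.stack([t, np.sin(t)], axis=1)                       # clean manifold samples
A = rng.standard_normal((1, 2)) / np.sqrt(2)               # undersampling operator
y = x @ A.T + sigma * rng.standard_normal((n_samples, 1))  # corrupted training data
```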
Implications and Future Directions
Practically, Riemannian AmbientFlow enables robust generative modeling and geometric data analysis directly from corrupted or incomplete samples, facilitating its application in domains where clean data acquisition is impractical (e.g., medical imaging, astronomy, geoscience). The smooth decoder mappings have immediate utility for optimization and inference over manifolds, with theoretical convergence guarantees.
Theoretically, advances hinge on extending diffeomorphic priors to handle multimodal and highly variable data distributions, potentially via non-constant determinant architectures, and on sharpening geometric regularization mechanisms. Future investigations include:
Handling sample-dependent corruption operators and black-box corruption settings.
Refining manifold assumptions and decoder architectures for broader data domains.
Exploring optimization landscapes and potential sources of non-convexity in deep manifold models.
Riemannian AmbientFlow provides a mathematically principled approach for joint manifold learning and generative modeling from corrupted data, blending ambient distribution learning with data-driven Riemannian geometry. The framework delivers measurable recovery guarantees, smooth and stable manifold parameterizations, and robust empirical performance in both synthetic and real data settings. Its decoder can be immediately employed as a prior for inverse problems, outperforming classical regularization, and its geometric foundations open new avenues for interpretability and sample-efficient downstream learning.