Papers
Topics
Authors
Recent
Search
2000 character limit reached

Local Averaging Accurately Distills Manifold Structure From Noisy Data

Published 23 Jun 2025 in stat.ML, cs.CG, and cs.LG | (2506.18761v1)

Abstract: High-dimensional data are ubiquitous, with examples ranging from natural images to scientific datasets, and often reside near low-dimensional manifolds. Leveraging this geometric structure is vital for downstream tasks, including signal denoising, reconstruction, and generation. However, in practice, the manifold is typically unknown and only noisy samples are available. A fundamental approach to uncovering the manifold structure is local averaging, which is a cornerstone of state-of-the-art provable methods for manifold fitting and denoising. However, to the best of our knowledge, there are no works that rigorously analyze the accuracy of local averaging in a manifold setting in high-noise regimes. In this work, we provide theoretical analyses of a two-round mini-batch local averaging method applied to noisy samples drawn from a $d$-dimensional manifold $\mathcal M \subset \mathbb{R}D$, under a relatively high-noise regime where the noise size is comparable to the reach $\tau$. We show that with high probability, the averaged point $\hat{\mathbf q}$ achieves the bound $d(\hat{\mathbf q}, \mathcal M) \leq \sigma \sqrt{d\left(1+\frac{\kappa\mathrm{diam}(\mathcal {M})}{\log(D)}\right)}$, where $\sigma, \mathrm{diam(\mathcal M)},\kappa$ denote the standard deviation of the Gaussian noise, manifold's diameter and a bound on its extrinsic curvature, respectively. This is the first analysis of local averaging accuracy over the manifold in the relatively high noise regime where $\sigma \sqrt{D} \approx \tau$. The proposed method can serve as a preprocessing step for a wide range of provable methods designed for lower-noise regimes. Additionally, our framework can provide a theoretical foundation for a broad spectrum of denoising and dimensionality reduction methods that rely on local averaging techniques.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.