Scale-Dependent Intrinsic Dimensions

Updated 6 February 2026
  • Scale-dependent intrinsic dimensions measure how the effective dimensionality of data or geometric spaces changes with the observation scale, capturing hierarchical and multiscale structure.
  • Methods like the Grassberger–Procaccia estimator, ABIDE, and persistent homology enable precise quantification of local and global manifold properties while accounting for noise.
  • These concepts have practical applications in manifold learning, network science, and quantum gravity, guiding algorithm design and in-depth multiscale data analysis.

Scale-dependent intrinsic dimensions quantify how the effective dimensionality of data, networks, or geometrical spaces varies as a function of the observation or probing scale. Unlike classical intrinsic dimension, which presumes a constant, manifold-like geometry, scale-dependent frameworks reveal multiscale or hierarchical structure, noise regimes, manifold curvature, or transitions between discrete and continuum behaviors. These concepts are used in manifold learning, topological data analysis, optimal transport, network science, quantum gravity, and high-dimensional statistics.

1. Conceptual Foundations and Core Definitions

Intrinsic dimension (ID) is formally the minimal number of independent coordinates required to describe local neighborhoods in data or geometric spaces. For many datasets and models, ID varies non-trivially with scale: at fine scales, noise or sampling artifacts may dominate, while at coarse scales, global curvature, boundary, or multi-component structure may emerge. Scale-dependent intrinsic dimension captures this variation by associating to each scale parameter (distance, diffusion time, grid size, neighborhood size, etc.) an effective dimension estimate controlling geometric, statistical, or topological properties.

Multiple paradigms exist for defining and estimating scale-dependent ID:

  • Multiscale correlation dimension: Given $N$ points $x_i\in\mathbb{R}^D$, the probability $C(r)$ that a random pair lies within distance $r$ often obeys $C(r)\sim r^d$ locally, so $d\approx d(r)=\mathrm{d}\ln C(r)/\mathrm{d}\ln r$. Plateaus in $d(r)$ identify scales of homogeneous dimension, while transitions indicate multiscale or hierarchical structure (Montalvão et al., 2020).
  • Spectral dimension: For spaces (continuous, random, or discrete) where diffusion or random walk is well-defined, the return probability $P(t)\sim t^{-d_s/2}$ for time $t$, and the "spectral dimension" $d_s(t) = -2\,\mathrm{d}\ln P(t)/\mathrm{d}\ln t$ encodes how many directions a diffusing particle perceives at each scale (Atkin et al., 2011, Arzano et al., 2017).
  • Covering- or doubling-dimension at scale: The number of balls of radius $\delta$ needed to cover a set, $N(\delta)$, yields $d_*(\delta)=\log N(\delta)/\log(1/\delta)$. When restricted to $r\leq t$ for a scale parameter $t$, one obtains the $t$-restricted doubling dimension (Choudhary et al., 2014).
  • Topological homology dimension: Persistent local homology quantifies topological features (cycles, voids) that appear at various spatial scales, assigning to each point and scale the largest nontrivial homology observed in annuli of varying inner and outer radii (Rohrscheidt et al., 2022).
  • Statistical or process-based dimension: Local, relative, and global dimension definitions via dynamical processes (diffusion, epidemic spreading) probe network structure at dynamically meaningful scales (Peach et al., 2021).
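
As an illustration of the multiscale correlation dimension defined in the first item, the following is a minimal sketch that estimates $C(r)$ from pairwise distances and takes the local log–log slope $d(r)$ by finite differences. The test data (a circle in $\mathbb{R}^3$ with isotropic noise of standard deviation 0.01), the radius grid, and the function names are assumptions chosen for illustration; this is not the bias-corrected estimator of the cited work.

```python
import numpy as np

def pairwise_distances(points):
    """Sorted Euclidean distances between all distinct pairs of points."""
    sq = (points ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.maximum(d2, 0.0, out=d2)            # clip tiny negative rounding errors
    iu = np.triu_indices(len(points), k=1)  # each unordered pair once
    return np.sort(np.sqrt(d2[iu]))

def correlation_integral(points, radii):
    """C(r): fraction of point pairs whose distance is at most r."""
    pair_d = pairwise_distances(points)
    counts = np.searchsorted(pair_d, radii, side="right")
    return counts / pair_d.size

def local_dimension(radii, C):
    """d(r) = d ln C(r) / d ln r, estimated by finite differences."""
    return np.gradient(np.log(C), np.log(radii))

# Assumed toy data: a unit circle (1-D manifold) in R^3 plus small noise.
rng = np.random.default_rng(0)
n = 1500
theta = rng.uniform(0.0, 2.0 * np.pi, n)
circle = np.stack([np.cos(theta), np.sin(theta), np.zeros(n)], axis=1)
points = circle + 0.01 * rng.normal(size=(n, 3))

radii = np.logspace(-2.0, -0.3, 25)
C = correlation_integral(points, radii)
d_r = local_dimension(radii, C)

for r, d in zip(radii[::6], d_r[::6]):
    print(f"r = {r:.3f}   d(r) = {d:.2f}")
```

On this toy set the slope should be inflated toward the ambient dimension at radii comparable to the noise level and settle near 1 at intermediate radii, which is the kind of plateau-and-transition pattern described above.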

2. Scale-dependent Estimators and Multiscale Methodologies

Estimation of scale-dependent intrinsic dimension typically relies on analyzing geometric, topological, or probabilistic statistics across varying scales:

  • Grassberger–Procaccia (GP) estimator and joint differential entropy estimation: Joint log–log plots $(\log r, \log C(r))$ allow identification of multiple linear regimes, revealing plateaus (homogeneous dimensions) and transitions (e.g., from local tubular to global ring structure as scale increases) (Montalvão et al., 2020).
  • Adaptive Binomial and likelihood-based methods: The ABIDE protocol selects for each data point the largest neighborhood where density is statistically homogeneous, by performing likelihood ratio tests at each scale and coupling this with intrinsic dimension estimation via consistent MLE on binomial point-in-ball counts. This self-consistent procedure isolates a "sweet spot" scale range where ID estimation is both statistically valid and robust to noise or curvature (Noia et al., 2024).
  • Connectivity-based estimators: The eDCF method computes the local connectivity factor (CF) on a spatial grid, scanning scales to balance noise smoothing with preservation of geometric structure. Dimension is then inferred from comparison to analytically or empirically calibrated reference values for each candidate dimension (Gupta et al., 18 Oct 2025).
  • Persistent homology and topological filtration: The PID (Persistent Intrinsic Dimension) algorithm detects, as a function of annular scale, the dimension via the highest nontrivial local persistent homology, while the Euclidicity score quantifies to what degree neighborhoods match the topology of a Euclidean ball in the candidate dimension (Rohrscheidt et al., 2022).
  • Diffusion- and random-walk-based metrics: Network settings employ the local peak response of the heat kernel (or Green's function) to estimate dimension from peak time and amplitude, with global or local averages yielding dimension curves as functions of diffusion time or process scale (Peach et al., 2021, Burgess, 2022).
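
The diffusion-based idea in the last item can be illustrated with the spectral-dimension formula $d_s(t) = -2\,\mathrm{d}\ln P(t)/\mathrm{d}\ln t$. The sketch below is a toy setting, not the heat-kernel peak method of the cited papers: it assumes a lazy random walk (stay put with probability 1/2) on a periodic $L \times L$ grid, where laziness is an assumption made to avoid even/odd parity oscillations in the return probability.

```python
import numpy as np

def return_probability(L, steps):
    """P(t) for t = 1..steps: probability that a lazy random walk on an
    L x L periodic grid is back at its starting node after t steps."""
    p = np.zeros((L, L))
    p[0, 0] = 1.0                      # walker starts at the origin
    P = np.empty(steps)
    for t in range(steps):
        # average over the four nearest neighbours (periodic boundaries)
        hop = 0.25 * (np.roll(p, 1, axis=0) + np.roll(p, -1, axis=0)
                      + np.roll(p, 1, axis=1) + np.roll(p, -1, axis=1))
        p = 0.5 * p + 0.5 * hop        # lazy step: stay with probability 1/2
        P[t] = p[0, 0]
    return P

L, steps = 32, 4000
t = np.arange(1, steps + 1)
P = return_probability(L, steps)

# Scale-dependent spectral dimension from the local log-log slope of P(t).
d_s = -2.0 * np.gradient(np.log(P), np.log(t))

for ti in (10, 50, 200, 1000, 4000):
    print(f"t = {ti:5d}   d_s(t) = {d_s[ti - 1]:.2f}")
```

At diffusion times short compared with the time needed to explore the whole grid, the walker perceives a two-dimensional space; once the distribution has spread over the full torus, the return probability saturates at $1/L^2$ and the estimated dimension decays toward zero. This finite-size crossover is a simple instance of a dimension that depends on the probing scale.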

3. Theoretical Guarantees and Statistical Implications

Multiple scale-dependent ID estimators have theoretical consistency and performance guarantees:

  • Bias correction: Multiscale correlation-based dimension estimators exhibit finite-sample bias, especially severe in high dimension or small sample size ($N\ll10^{d/2}$), which can be analytically corrected using known relationships between sample size, neighborhood scale, and observed (apparent) dimension (Montalvão et al., 2020).
  • Convergence and self-consistency: The ABIDE method is proven to converge with high probability, is consistent ($d^*\to d$ as $n\to\infty$), and is asymptotically normal, provided data is not extremely high-dimensional or highly inhomogeneous (Noia et al., 2024).
  • Minimum Intrinsic Dimension scaling: For high-dimensional data, entropic optimal transport results demonstrate that statistical convergence rates (e.g., for Sinkhorn costs, maps, densities) depend solely on the single-scale covering number at the entropic regularization scale, highlighting a form of "scale-dependent dimension collapse"—the effective statistical resolution is governed by the intrinsic dimension at the scale set by entropic regularization (Stromme, 2023).
  • Scale selection: Several frameworks (ABIDE, eDCF) provide principled procedures for automatic determination of the optimal (statistically homogeneous) scale at each point, ensuring that dimension estimates are neither noise-dominated (fine-scale inflation) nor under-resolved (coarse-scale misestimation).
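
To make the role of scale selection concrete, the sketch below uses a deliberately simplified point-in-ball counting estimator. It is not the ABIDE protocol: there is no likelihood-ratio test and no per-point adaptive neighborhood. The assumption is that, under locally uniform density, each neighbor inside a ball of radius $r$ falls inside the concentric ball of radius $\tau r$ with probability $\tau^d$, so the pooled binomial maximum-likelihood estimate of that probability gives $d \approx \log(\hat p)/\log\tau$. The function names, the value $\tau = 0.5$, and the noisy-sheet data are illustrative choices.

```python
import numpy as np

def squared_distances(points):
    """Matrix of squared pairwise distances, with self-pairs excluded."""
    sq = (points ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.maximum(d2, 0.0, out=d2)      # clip tiny negative rounding errors
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbour
    return d2

def ball_ratio_dimension(d2, r_outer, tau=0.5):
    """Pooled binomial estimate: p_hat = (#pairs within tau*r) / (#pairs within r),
    then d = log(p_hat) / log(tau)."""
    k_outer = (d2 <= r_outer ** 2).sum()
    n_inner = (d2 <= (tau * r_outer) ** 2).sum()
    return np.log(n_inner / k_outer) / np.log(tau)

# Assumed toy data: a flat 2-D sheet in R^5 with noise (std 0.02)
# in the three directions orthogonal to the sheet.
rng = np.random.default_rng(1)
n = 2000
points = np.zeros((n, 5))
points[:, :2] = rng.uniform(0.0, 1.0, size=(n, 2))
points[:, 2:] = 0.02 * rng.normal(size=(n, 3))

d2 = squared_distances(points)
d_fine = ball_ratio_dimension(d2, r_outer=0.04)    # radius comparable to the noise
d_coarse = ball_ratio_dimension(d2, r_outer=0.3)   # radius well above the noise

print(f"fine-scale estimate:   {d_fine:.2f}")
print(f"coarse-scale estimate: {d_coarse:.2f}")
```

With a radius comparable to the noise amplitude the estimate is pulled toward the ambient dimension (fine-scale inflation), while at a radius well above the noise it should come out close to the sheet's dimension of 2, slightly lowered by boundary effects. A fixed radius is used here only to expose this dependence; choosing it automatically is what the adaptive frameworks above address.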

4. Representative Applications Across Domains

Scale-dependent intrinsic dimension has broad applications:

  • Manifold learning and feature selection: Understanding how ID varies with scale guides the choice of neighborhood in local embedding algorithms, optimal selection of reduced feature sets, and detection of outlier or singular regions (Rohrscheidt et al., 2022, Noia et al., 2024).
  • Topological data analysis and singularity detection: Persistent local homology and multi-scale Euclidicity measures allow fine-grained detection of non-manifold singularities, complex geometric features, and the regions where the manifold hypothesis fails (e.g., in high-density image or cytometry data) (Rohrscheidt et al., 2022).
  • Network science: Relative, local, and global dimension curves, evaluated as a function of diffusion time or multi-hop reachability, quantitatively expose multi-scale network organization, functional modules, topological constraints, or communication bottlenecks (e.g. in protein graphs, economic networks, internet topology) (Peach et al., 2021, Burgess, 2022).
  • Quantum gravity and spacetime geometry: The spectral dimension's scale dependence is a universal feature in approaches to quantum gravity. Running dimensions serve as signatures of flow from high (infrared) to reduced (ultraviolet) effective geometries, with implications for entanglement entropy finiteness, emergence of gravitational laws, and potential observational signatures in cosmic or particle physics (Atkin et al., 2011, Arzano et al., 2017).
  • Combinatorial and topological optimization: The t-restricted doubling dimension supports efficient construction of hierarchical net-forests, approximate Čech complexes, and fast topological data structures for computational geometry and metric learning, with computational costs depending on the relevant scale-dependent dimension rather than global worst-case dimension (Choudhary et al., 2014).
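
The covering-based quantity $d_*(\delta)=\log N(\delta)/\log(1/\delta)$ from Section 1 can be approximated with a greedy $\delta$-net, the basic building block behind net constructions. The sketch below is not the net-forest algorithm of the cited work. It assumes the data have roughly unit diameter, and it uses the net size in place of the true covering number (the net size lies between the covering numbers at radius $\delta$ and at radius $\delta/2$). The thin-strip data set is an assumed toy example.

```python
import numpy as np

def greedy_net(points, delta):
    """Greedy delta-net: selected centers are pairwise more than delta apart,
    and every input point lies within delta of some center."""
    centers = np.empty_like(points)
    centers[0] = points[0]
    m = 1                                   # number of centers chosen so far
    for x in points[1:]:
        if np.linalg.norm(centers[:m] - x, axis=1).min() > delta:
            centers[m] = x                  # x is not covered yet: new center
            m += 1
    return centers[:m]

def covering_dimension(points, delta):
    """d_*(delta) = log N(delta) / log(1/delta), with N taken as the net size."""
    n_balls = len(greedy_net(points, delta))
    return np.log(n_balls) / np.log(1.0 / delta)

# Assumed toy data: a thin strip [0, 1] x [0, 0.02] in the plane.
rng = np.random.default_rng(2)
strip = np.column_stack([rng.uniform(0.0, 1.0, 3000),
                         rng.uniform(0.0, 0.02, 3000)])

for delta in (0.2, 0.1, 0.05, 0.02, 0.01, 0.005):
    n_balls = len(greedy_net(strip, delta))
    print(f"delta = {delta:.3f}   N = {n_balls:4d}   "
          f"d_* = {covering_dimension(strip, delta):.2f}")
```

At radii well above the strip width the set is covered like a line segment, so the estimate stays near 1; once $\delta$ drops below the width, the number of balls grows faster and the estimate rises. The ratio form converges slowly in $\delta$, so the values are best read as a trend across scales rather than as precise dimensions.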

5. Comparative Analysis of Key Methods and Diagnostics

The following table summarizes several classes of scale-dependent ID estimators, the scale parameter each one varies, and the diagnostic output it produces:

| Estimator / Framework | Scale Parameterization | Diagnostic Output |
|---|---|---|
| Grassberger–Procaccia / Ma (entropy) | Pairwise distance threshold $r$ | Log–log slope (ID) vs. $r$ |
| ABIDE (adaptive binomial) | $k$-NN radius (stat. homogeneity) | Adaptive $d^*$, $k_i^*$ |
| eDCF (local connectivity) | Grid spacing $s$, neighbor radius | Scale–membership curve |
| Persistent ID / Euclidicity | Annular radii $(r,s)$ (multi-scale) | $i_x(\epsilon)$, Euclidicity score |
| Diffusion-based network dimension | Diffusion time $t,\tau$ | $\mathcal{D}_i(\tau)$, $D_{i\to j}$ vs. $\tau$ |
| $t$-restricted doubling dimension | Max radius $t$ | $\Delta_t$ up to $t$ |

Interpretation guidelines stress the identification of plateaus (indicative of scale-invariant structure), transition regions (multiscale or fractal geometry), and anomalies (singular or non-manifold structure). For reliable geometric characterization, look for plateaus in ID estimates, stability of diagnostic scores under resampling or scale perturbation, and consistency across multiple methods.
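
Plateau identification can be automated with a simple heuristic. The sketch below is an illustrative assumption, not a procedure taken from the cited works: it scans a dimension-versus-scale curve and reports maximal runs of consecutive scales over which the estimate varies by no more than a tolerance. The function name `find_plateaus`, the tolerance `tol`, the minimum run length `min_len`, and the example curve are all hypothetical.

```python
import numpy as np

def find_plateaus(scales, dims, tol=0.15, min_len=4):
    """Return (start_scale, end_scale, mean_dim) for each run of at least
    min_len consecutive scales whose estimates span at most tol."""
    plateaus = []
    i, n = 0, len(dims)
    while i < n:
        j = i
        lo = hi = dims[i]
        # extend the run while the spread of values stays within tol
        while j + 1 < n:
            new_lo = min(lo, dims[j + 1])
            new_hi = max(hi, dims[j + 1])
            if new_hi - new_lo > tol:
                break
            lo, hi = new_lo, new_hi
            j += 1
        if j - i + 1 >= min_len:
            plateaus.append((float(scales[i]), float(scales[j]),
                             float(np.mean(dims[i:j + 1]))))
        i = j + 1                      # continue after the current run
    return plateaus

# Hypothetical curve: noise-dominated plateau near 3, then a transition,
# then a plateau near 1 at larger scales.
scales = np.logspace(-2, 0, 12)
dims = np.array([2.95, 3.0, 3.05, 2.98, 2.4, 1.8, 1.3,
                 1.02, 0.98, 1.0, 1.01, 0.99])

for start, end, mean_dim in find_plateaus(scales, dims):
    print(f"plateau from r = {start:.3f} to r = {end:.3f}: d ~ {mean_dim:.2f}")
```

Scales that fall between reported plateaus correspond to the transition regions mentioned above. In practice the tolerance would need to reflect the estimator's sampling variability, for example as measured by resampling.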

6. Broader Implications and Universal Features

Scale-dependent intrinsic dimension reveals several universal properties across disciplines:

  • No unique global dimension: Complex spaces, data clouds, and networks possess a spectrum of dimensions, not a single value. Geometry, topology, and statistical structure evolve with scale, sometimes featuring plateaus, sometimes non-monotonic or hierarchical transitions (Montalvão et al., 2020, Rohrscheidt et al., 2022, Peach et al., 2021).
  • Finiteness and physical meaning: In physical and quantum gravity contexts, the requirement that spectral or entanglement dimension never vanish at any scale is crucial for the finiteness of key physical quantities (e.g., black hole entropy), the emergence of classical laws, and the universality of area–entropy relations (Arzano et al., 2017).
  • Algorithmic and statistical design: The statistical efficiency, computational cost, and robustness of data processing pipelines depend critically on the relevant scale-dependent dimension, with automatic scale selection and multiscale adaptation necessary for high-dimensional, noisy, or geometrically complex data (Noia et al., 2024, Gupta et al., 18 Oct 2025, Choudhary et al., 2014, Stromme, 2023).
  • Multiscale diagnostics: Comparative multiscale analysis provides crucial insights: singularity detection, phase transitions, functional modularity, and intervention points are often detectable only through scale-resolved geometric and topological properties (Rohrscheidt et al., 2022, Peach et al., 2021).

7. Practical Recommendations and Limitations

Practical guidelines for scale-dependent dimension estimation include:

  • Exclude extreme scales: very small (noise-dominated, discretization effects) and very large (global wrap-around, inhomogeneity) regimes are unreliable. Focus on intermediate scales with locally constant density and linearity in log–log diagnostics (Montalvão et al., 2020, Noia et al., 2024).
  • Validation via multiple estimators, cross-method consistency, and empirical assessment of local homogeneity or scale-invariance is advised for robust dimension assignment (Rohrscheidt et al., 2022, Gupta et al., 18 Oct 2025).
  • Automatic, statistically justified scale selection (as in ABIDE and eDCF) is preferable to fixed-scale or purely visual choices, especially in heterogeneous or high-noise data.
  • In extremely high dimension ($d\gg50$), all neighborhood-based methods are sensitive to density inhomogeneity and may exhibit bias; interpret results with appropriate caution (Noia et al., 2024).
  • When computational complexity matters, use algorithms (e.g., net-forest, covering-based filtrations) whose cost scales with the relevant $\Delta_t$ instead of global $d$ (Choudhary et al., 2014).

In conclusion, scale-dependent intrinsic dimension provides a mathematically rigorous and practically indispensable toolkit for diagnosing and exploiting the geometric and topological structure of data, networks, and physical spaces across scales, with foundational implications extending from high-dimensional statistics to quantum spacetime (Montalvão et al., 2020, Rohrscheidt et al., 2022, Atkin et al., 2011, Noia et al., 2024, Gupta et al., 18 Oct 2025, Stromme, 2023, Choudhary et al., 2014, Peach et al., 2021, Burgess, 2022, Arzano et al., 2017).
