Exact Recovery Threshold in High Dimensions
- Exact recovery threshold is a phase transition phenomenon that defines when the recovery of hidden structures (such as community assignments or sparse supports) becomes statistically feasible.
- In models like the Stochastic Block Model and sparse recovery, recovery is achieved when divergence measures such as Kullback-Leibler and Chernoff-Hellinger exceed critical values.
- Extensions to hypergraph, weighted, and geometric models show that leveraging local-to-global amplification and tailored metrics is key to precisely attaining the recovery threshold.
Exact recovery threshold refers to the sharp phase transition in high-dimensional inference problems, distinguishing model parameters for which it is possible to recover hidden combinatorial structure (e.g., community assignment, permutation, or sparse support) with probability tending to one, from those for which even information-theoretically optimal (possibly computationally intractable) estimators fail with probability bounded away from zero. This threshold typically manifests in random graph models, mixture models, sparse recovery, hypergraphs, geometric networks, and related structures, and is fundamentally captured by information-theoretic measures such as Kullback-Leibler, Chernoff-Hellinger, or Rényi divergences.
1. Classical Stochastic Block Model Thresholds
For the symmetric two-community Stochastic Block Model (SBM) with n vertices partitioned into two equal clusters and edge probabilities p = a log n / n (within) and q = b log n / n (between), the exact recovery threshold is characterized by the sharp condition (Abbe–Mossel theorem):
√a − √b > √2.
Exact recovery is possible with high probability if and only if this inequality is satisfied; conversely, if √a − √b < √2, not even maximum likelihood (ML) estimation succeeds. This explicit threshold is tight and can be efficiently achieved by a relaxation based on semidefinite programming (SDP) (Hajek et al., 2014, Abbe et al., 2014).
For generalized SBMs with k equal-sized clusters, the threshold becomes √a − √b > √k (Hajek et al., 2015).
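As a minimal sketch (not from the cited papers; notation as above, with p = a log n / n and q = b log n / n), the threshold condition can be checked directly:

```python
import math

def sbm_exact_recovery_possible(a: float, b: float, k: int = 2) -> bool:
    """Check the sharp exact-recovery condition sqrt(a) - sqrt(b) > sqrt(k)
    for the symmetric SBM with k equal clusters and edge probabilities
    p = a*log(n)/n (within) and q = b*log(n)/n (between)."""
    return math.sqrt(a) - math.sqrt(b) > math.sqrt(k)

# Two communities: threshold is sqrt(a) - sqrt(b) > sqrt(2).
print(sbm_exact_recovery_possible(9, 1))        # 3 - 1 = 2 > sqrt(2): recoverable
print(sbm_exact_recovery_possible(4, 1))        # 2 - 1 = 1 < sqrt(2): not recoverable
print(sbm_exact_recovery_possible(16, 1, k=3))  # 4 - 1 = 3 > sqrt(3): recoverable
```

Only the positions of a and b relative to the √k boundary matter; the function name is illustrative.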
2. Algorithmic Attainment and Computational Barriers
In the sparse regime (average degree of order log n, i.e., p, q = Θ(log n / n)), SDP relaxations of ML, dual certificate analysis, and two-stage partial-recovery plus local-improvement algorithms all achieve the exact recovery threshold. There is no gap between the statistical and computational thresholds in this regime for SBMs and planted dense subgraph (PDS) models of linear cluster size (Hajek et al., 2014). For the PDS case with cluster size K = ρn and probabilities p = a log n / n, q = b log n / n (a > b), the threshold is again an explicit function of a, b, and ρ.
But when the planted subgraph is much smaller (K = o(n)), a gap emerges: unbounded-time algorithms can succeed below thresholds inaccessible to any polynomial-time procedure unless the planted clique problem is tractable (Hajek et al., 2014).
3. Extensions to General Models
Hypergraph and Non-Uniform Block Models
For the general d-uniform hypergraph SBM, the threshold is governed by the generalized Chernoff–Hellinger (GCH) divergence between community degree profiles: a minimum pairwise GCH divergence exceeding 1 guarantees exact recovery, while recovery fails when it is less than 1, except in an explicitly characterized exceptional regime (Zhang et al., 2021, Dumitriu et al., 2023).
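The divergence criterion can be evaluated numerically. The sketch below is an illustration rather than the papers' implementation: it computes the standard Chernoff–Hellinger divergence CH(θ, θ') = sup over t in (0,1) of Σ_i [tθ_i + (1−t)θ'_i − θ_i^t θ'_i^(1−t)] between expected-degree profiles (in units of log n) on a grid over t; for the ordinary two-community SBM it recovers the √2 threshold of Section 1.

```python
import numpy as np

def chernoff_hellinger(theta1, theta2, grid=2001):
    """Numerically evaluate the Chernoff-Hellinger divergence
    CH = sup_{t in (0,1)} sum_i [t*th1_i + (1-t)*th2_i - th1_i**t * th2_i**(1-t)]
    between two nonnegative expected-degree profiles, by a grid search over t."""
    th1 = np.asarray(theta1, dtype=float)
    th2 = np.asarray(theta2, dtype=float)
    t = np.linspace(1e-6, 1 - 1e-6, grid)[:, None]
    vals = (t * th1 + (1 - t) * th2 - th1**t * th2**(1 - t)).sum(axis=1)
    return vals.max()

# Symmetric two-community SBM: degree profiles (a/2, b/2) vs (b/2, a/2).
# CH reduces to (sqrt(a) - sqrt(b))**2 / 2, so CH > 1 is exactly the
# condition sqrt(a) - sqrt(b) > sqrt(2).
a, b = 9.0, 1.0
print(chernoff_hellinger([a / 2, b / 2], [b / 2, a / 2]))  # ~ (3 - 1)**2 / 2 = 2.0
```

The objective is concave in t, so a grid search suffices here; identical profiles give divergence 0, matching the intuition that indistinguishable communities cannot be recovered.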
In non-uniform models (mixtures of different uniformities), aggregation across layers can achieve recovery even when every uniform layer separately fails, because the GCH divergence sums contributions across layers (Dumitriu et al., 2023).
Correlated and Geometric Models
Correlated SBMs and multi-network models present richer threshold phenomena involving the interplay between alignment, matching, and community structure. For two correlated SBMs obtained by edge-subsampling a parent graph at rate s, the threshold combines single-graph, matching, and union-graph exponents (Gaudio et al., 2022): exact recovery is possible only if both the union and matching-vote exponents exceed one, and regimes emerge where a given number of correlated graphs is insufficient but one additional graph suffices, even when no pairwise matching is information-theoretically possible (Rácz et al., 2024).
For geometric models (GSBM, GHCM), where spatial embedding and triangle counts play a role, the sharp threshold again requires a spatially averaged Chernoff–Hellinger divergence, taken between within-community and between-community edge (or pairwise weight) distributions, to exceed 1 (Gaudio et al., 28 Dec 2025, Gaudio et al., 22 Jan 2025, Gaudio et al., 24 Jan 2026, Gaudio et al., 2024).
4. Mixtures and Weighted Graph Models
For Gaussian mixture models with equal-size clusters, the necessary and sufficient center separation for exact recovery exhibits a sharp cutoff determined by the Gaussian noise covariance, the dimension, the number of clusters, and the sample size n (Chen et al., 2020).
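A toy illustration of why sufficient separation forces every label to be recovered (illustrative parameters, not the paper's estimator: labels are assigned by the oracle nearest-center rule with the true centers known):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma, delta = 400, 10, 1.0, 12.0  # separation well above the noise scale

# Two equal-size clusters with centers +/-(delta/2)*e_1 and isotropic noise.
labels = np.repeat([0, 1], n // 2)
centers = np.zeros((2, d))
centers[0, 0], centers[1, 0] = -delta / 2, delta / 2
X = centers[labels] + sigma * rng.standard_normal((n, d))

# Oracle maximum-likelihood assignment: nearest center in Euclidean distance.
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
recovered = dists.argmin(axis=1)
print((recovered == labels).all())  # per-point error prob ~ Phi(-delta/(2*sigma))
```

With delta/(2σ) = 6, each point is misclassified with probability about Φ(−6) ≈ 1e−9, so a union bound over all 400 points still leaves every label correct; shrinking delta toward the cutoff is exactly where this union bound, and exact recovery, break down.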
In weighted SBMs with community-dependent weight distributions (e.g., Gaussian weights), a fundamental signal-to-noise ratio (SNR) controls the threshold: in the two-community case, recovery is possible when the SNR exceeds the critical value one and impossible when it falls below one. For planted dense subgraph in this model, there is an explicit achievability threshold in terms of the cluster size and the SNR, with statistical impossibility below it (Pandey et al., 2024).
5. Sparse Recovery and Compressed Sensing
In linear sparse recovery with i.i.d. Gaussian measurement matrices, the so-called "Donoho–Tanner weak threshold" gives the maximal sparsity for exact recovery via ℓ1 minimization at measurement ratio δ = m/n. Two-step reweighted ℓ1 minimization strictly increases the threshold beyond the Donoho–Tanner value, provably improving over plain ℓ1 in the random Gaussian case (Khajehnejad et al., 2010).
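Basis pursuit itself can be sketched as a linear program. The instance below is illustrative (assumed dimensions m = 20, n = 40, sparsity k = 3, well below the weak threshold at δ = 0.5), using `scipy.optimize.linprog` as a generic LP solver:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, k = 40, 20, 3  # ambient dimension, measurements, sparsity

x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)  # i.i.d. Gaussian measurements
y = A @ x_true

# Basis pursuit: min ||x||_1 subject to Ax = y, as an LP via the
# split x = u - v with u, v >= 0 and objective sum(u) + sum(v).
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]

print(np.allclose(x_hat, x_true, atol=1e-6))  # True when l1 recovers exactly
```

At this sparsity level the ℓ1 minimizer coincides with the true sparse vector with overwhelming probability; pushing k above the Donoho–Tanner weak threshold for δ = 0.5 makes the same LP start returning denser, incorrect solutions.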
For coordinate-wise sequential detection in high-dimensional sparse models, the exact recovery threshold for the average sample size per coordinate scales with the logarithm of the sparsity level, compared with the logarithm of the ambient dimension n for non-sequential (fixed sample size) methods, yielding a potentially much smaller requirement in the highly sparse regime (Malloy et al., 2012).
6. Thresholds with Side-Information, Attributes, and Generalizations
When graph models include vertex-associated data (attributes, side information, etc.), the exact recovery threshold is determined by a generalized Chernoff–TV divergence, which strictly improves on the graph-alone threshold and can yield exact recovery in previously impossible regimes. For example, in the Data Block Model (DBM) the pairwise Chernoff–TV separation must exceed one for every pair of distinct communities (Asadi et al., 5 Feb 2026).
7. Interpretations and Local-to-Global Amplification
Across all these settings, the fundamental principle is the local-to-global amplification phenomenon: the error rate for recovering a single label or component decays as n^(−I) for some problem-dependent exponent I, and exact recovery demands I > 1, so that a union bound over the n elements gives vanishing total error. This is universally reflected in the requirement that the relevant divergence or SNR parameter exceed unity.
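The amplification arithmetic is a one-line union bound; the toy calculation below (not from any cited paper) tabulates the bound n · n^(−I) = n^(1−I) on either side of I = 1:

```python
def total_error_bound(n: float, I: float) -> float:
    """Union bound on the exact-recovery failure probability when each of
    the n labels is misclassified with probability n**(-I):
    n * n**(-I) = n**(1 - I)."""
    return n ** (1.0 - I)

# I < 1: the bound diverges with n; I > 1: it vanishes, giving exact recovery.
for I in (0.9, 1.0, 1.1):
    print(I, [total_error_bound(n, I) for n in (1e3, 1e6, 1e9)])
```

This is why every threshold in this article is normalized so that the critical value of the divergence or SNR exponent is exactly one.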
Key References
- "Achieving Exact Cluster Recovery Threshold via Semidefinite Programming" (Hajek et al., 2014)
- "Exact Recovery in the Stochastic Block Model" (Abbe et al., 2014)
- "Cutoff for exact recovery of Gaussian mixture models" (Chen et al., 2020)
- "Exact Recovery in the General Hypergraph Stochastic Block Model" (Zhang et al., 2021)
- "Sharp exact recovery threshold for two-community Euclidean random graphs" (Gaudio et al., 22 Jan 2025)
- "Exact recovery in Gaussian weighted stochastic block model and planted dense subgraphs: Statistical and algorithmic thresholds" (Pandey et al., 2024)
- "Improved Sparse Recovery Thresholds with Two-Step Reweighted Minimization" (Khajehnejad et al., 2010)
- "Sequential Testing for Sparse Recovery" (Malloy et al., 2012)
- "Exact Recovery in the Data Block Model" (Asadi et al., 5 Feb 2026)
The exact recovery threshold is a unifying concept in high-dimensional statistical inference, determining both the fundamental and sometimes algorithmically achievable phase transitions for reconstructive estimation in random combinatorial models.