Exact Recovery Threshold in High Dimensions

Updated 7 February 2026
  • Exact recovery threshold is a phase transition phenomenon that defines when the recovery of hidden structures (such as community assignments or sparse supports) becomes statistically feasible.
  • In models like the Stochastic Block Model and sparse recovery, recovery is achieved when divergence measures such as Kullback-Leibler and Chernoff-Hellinger exceed critical values.
  • Extensions to hypergraph, weighted, and geometric models show that leveraging local-to-global amplification and tailored metrics is key to precisely attaining the recovery threshold.

Exact recovery threshold refers to the sharp phase transition in high-dimensional inference problems, distinguishing model parameters for which it is possible to recover hidden combinatorial structure (e.g., community assignment, permutation, or sparse support) with probability tending to one, from those for which even information-theoretically optimal (possibly computationally intractable) estimators fail with probability bounded away from zero. This threshold typically manifests in random graph models, mixture models, sparse recovery, hypergraphs, geometric networks, and related structures, and is fundamentally captured by information-theoretic measures such as Kullback-Leibler, Chernoff-Hellinger, or Rényi divergences.

1. Classical Stochastic Block Model Thresholds

For the symmetric two-community Stochastic Block Model (SBM) with $n$ vertices partitioned into two equal clusters and edge probabilities $p = a\log n/n$ (within) and $q = b\log n/n$ (between), the exact recovery threshold is characterized by the sharp condition (Abbe–Mossel theorem):

$$(\sqrt{a}-\sqrt{b})^2 > 2$$

Exact recovery is possible with high probability when this inequality holds; conversely, if $(\sqrt{a}-\sqrt{b})^2 < 2$, not even the maximum likelihood (ML) estimator succeeds. This explicit threshold is tight and can be achieved efficiently by a semidefinite programming (SDP) relaxation (Hajek et al., 2014, Abbe et al., 2014).

For generalized SBMs with $r$ equal-sized clusters, the threshold becomes $(\sqrt{a} - \sqrt{b})^2 > r$ (Hajek et al., 2015).
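The threshold conditions above reduce to a one-line numerical check. The following is a minimal sketch, assuming the parametrization $p = a\log n/n$, $q = b\log n/n$ used in this section; the function names are illustrative and are not taken from the cited papers.

```python
import math


def sbm_threshold_margin(a: float, b: float, r: int = 2) -> float:
    """Return (sqrt(a) - sqrt(b))^2 - r for an SBM with r equal-sized clusters.

    A positive margin means the parameters lie above the exact recovery
    threshold stated in the text; a negative margin means they lie below it.
    """
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    if r < 2:
        raise ValueError("r must be at least 2")
    return (math.sqrt(a) - math.sqrt(b)) ** 2 - r


def sbm_exact_recovery_possible(a: float, b: float, r: int = 2) -> bool:
    """True when the strict threshold inequality holds."""
    return sbm_threshold_margin(a, b, r) > 0
```

For instance, $a = 9$ and $b = 1$ give $(3-1)^2 = 4$, which clears the two-community threshold of 2 but does not satisfy the strict inequality for $r = 4$. The check mirrors the strict inequality as written and makes no claim about behavior exactly at the boundary.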

2. Algorithmic Attainment and Computational Barriers

In the sparse regime ($p, q \sim \log n/n$), SDP relaxations of ML, dual certificate analysis, and two-stage partial-recovery plus local-improvement algorithms achieve the exact recovery threshold. There is no algorithmic gap between the statistical and computational thresholds in this regime for SBMs and planted dense subgraph (PDS) models of linear cluster size (Hajek et al., 2014). For the PDS case with cluster size $K = \rho n$ and probabilities $p = a\log n/n$, $q = b\log n/n$, the threshold is

$$\rho\,f(a,b) > 1, \quad f(a,b) = a - \tau^*\log\frac{e\,a}{\tau^*} = b - \tau^*\log\frac{e\,b}{\tau^*}, \quad \tau^* = \frac{a-b}{\log a - \log b}$$

But when the planted subgraph is much smaller ($K \ll n$), a gap emerges: unbounded-time algorithms can succeed below thresholds inaccessible to any polynomial-time procedure unless the planted clique problem is tractable (Hajek et al., 2014).

3. Extensions to General Models

Hypergraph and Non-Uniform Block Models

For the general $d$-uniform hypergraph SBM, the threshold is governed by the generalized Chernoff–Hellinger (GCH) divergence $D_+(i,j)$ between community degree profiles. The condition

$$\min_{i\neq j} D_+(i,j) > 1$$

guarantees exact recovery, and recovery fails when the minimum is less than 1, except in an explicitly characterized exceptional regime (Zhang et al., 2021, Dumitriu et al., 2023).

In non-uniform models (mixtures of different uniformities), aggregation across layers can achieve recovery even when all the uniform layers separately fail, due to the sum of contributions in the GCH divergence (Dumitriu et al., 2023).

Correlated and Geometric Models

Correlated SBMs and multi-network models present richer threshold phenomena that couple alignment, matching, and community structure. For two correlated SBMs, the threshold combines single-graph, matching, and union terms:

$$(\sqrt{a}-\sqrt{b})^2\left[\frac{s^2}{2} + s(1-s)\right] > 1$$

where $s$ is the edge-subsampling rate (Gaudio et al., 2022). With $K \ge 3$ correlated networks, exact recovery is only possible if both the union and matching-vote exponents exceed one, and regimes emerge where $K-1$ graphs are insufficient, but $K$ suffice even if no pairwise matching is information-theoretically possible (Rácz et al., 2024).
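The two-graph inequality can be evaluated directly to see how the subsampling rate moves a parameter pair across the threshold. This is a rough sketch that checks only the displayed inequality; any further conditions in the cited work are not modeled, and the function name is hypothetical.

```python
import math


def correlated_sbm_exponent(a: float, b: float, s: float) -> float:
    """Left-hand side of the two-graph condition:
    (sqrt(a) - sqrt(b))^2 * (s^2 / 2 + s * (1 - s)).

    The displayed condition holds when the returned value exceeds 1.
    """
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    return (math.sqrt(a) - math.sqrt(b)) ** 2 * (s * s / 2 + s * (1 - s))
```

At $s = 1$ the bracket equals $1/2$, so the condition reduces to $(\sqrt{a}-\sqrt{b})^2 > 2$, the single-graph threshold. With $a = 9$, $b = 1$ the exponent is $1.5$ at $s = 0.5$ and $0.72$ at $s = 0.2$, so the smaller subsampling rate falls below the threshold.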

For geometric models (GSBM, GHCM) where spatial embedding and triangle counts play a role, the sharp threshold takes the form:

$$\lambda\,\nu_d\,r^d\,D_+ > 1$$

where $D_+$ is a spatially-averaged Chernoff–Hellinger divergence between within-community and between-community edge (or pairwise weight) distributions (Gaudio et al., 28 Dec 2025, Gaudio et al., 22 Jan 2025, Gaudio et al., 24 Jan 2026, Gaudio et al., 2024).

4. Mixtures and Weighted Graph Models

For Gaussian mixture models with $K$ equal-size clusters, the necessary and sufficient separation for exact recovery is:

$$\Delta^2 \geq 4\sigma^2\left(1 + \sqrt{1 + \frac{Kd}{n\log N}}\right)\log N$$

where $\Delta$ is the minimum distance between cluster centers, the Gaussian noise covariance is $\sigma^2 I_d$, and the sample size is $N = Kn$ (Chen et al., 2020).
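To get a feel for how the required separation scales with dimension, the right-hand side of the bound can be computed numerically. The sketch below simply evaluates the formula as stated, taking $n$ as the per-cluster sample size so that $N = Kn$; the function name and argument order are assumptions made for illustration.

```python
import math


def gmm_squared_separation_threshold(sigma: float, K: int, d: int, n: int) -> float:
    """Evaluate 4 * sigma^2 * (1 + sqrt(1 + K*d / (n * log N))) * log N, N = K*n.

    Returns the value that the squared separation Delta^2 is compared against.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if K < 2 or d < 1 or n < 1:
        raise ValueError("need K >= 2, d >= 1, n >= 1")
    log_N = math.log(K * n)
    return 4 * sigma**2 * (1 + math.sqrt(1 + K * d / (n * log_N))) * log_N
```

Two features of the formula are visible in the code: the threshold always exceeds $8\sigma^2\log N$, which it approaches when $Kd \ll n\log N$, and it scales quadratically in $\sigma$.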

In weighted SBMs with community-dependent weights (e.g., Gaussian weights), the fundamental signal-to-noise ratio (SNR) controls the threshold:

$$\mathrm{SNR} = \frac{(\mu_1 - \mu_2)^2}{8\tau^2}$$

Recovery is possible if $\mathrm{SNR} > 1$ and impossible if $\mathrm{SNR} < 1$ in the two-community case. For the planted dense subgraph in this model with cluster size $\gamma n$, recovery is achievable when $\gamma\,\mathrm{SNR} > 1$ and statistically impossible when $\gamma\,\mathrm{SNR} < 3/4$ (Pandey et al., 2024).
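The SNR conditions above can be turned into a small classifier. This is an illustrative sketch only: it assumes $\mu_1$, $\mu_2$ are the two mean weights and $\tau$ the common standard deviation of the Gaussian weights, and it labels the range between the two quoted planted-dense-subgraph bounds as undetermined, since the bounds as stated do not cover it. Function names are hypothetical.

```python
def gaussian_weight_snr(mu1: float, mu2: float, tau: float) -> float:
    """SNR = (mu1 - mu2)^2 / (8 * tau^2), as in the formula above."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    return (mu1 - mu2) ** 2 / (8 * tau**2)


def planted_dense_subgraph_regime(gamma: float, snr: float) -> str:
    """Classify gamma * SNR against the two bounds quoted in the text."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    level = gamma * snr
    if level > 1.0:
        return "achievable"
    if level < 0.75:
        return "impossible"
    return "undetermined"  # not covered by the bounds as stated
```

With $\mu_1 = 4$, $\mu_2 = 0$, $\tau = 1$ the SNR is 2, so a planted cluster with $\gamma = 0.6$ is in the achievable regime, $\gamma = 0.25$ is in the impossible regime, and $\gamma = 0.4$ falls between the two bounds.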

5. Sparse Recovery and Compressed Sensing

In linear sparse recovery with i.i.d. Gaussian measurement matrices, the so-called "Donoho–Tanner weak threshold" $\rho_W(\delta)$ gives the maximal sparsity $\rho = k/n$ for exact recovery via $\ell_1$ minimization with measurement ratio $\delta = m/n$. Two-step reweighted $\ell_1$ minimization strictly increases the threshold beyond $\rho_W(\delta)$, provably improving over plain $\ell_1$ in the random Gaussian case (Khajehnejad et al., 2010).

For coordinate-wise sequential detection in high-dimensional sparse models, the exact recovery threshold for the average sample size $m$ per coordinate is

$$m > \frac{\log s}{D(P_0\|P_1)}$$

compared to $m > \frac{\log n}{D(P_1\|P_0)}$ for non-sequential (fixed sample size) methods, yielding a potentially much smaller requirement in the highly sparse regime (Malloy et al., 2012). Here $s$ is the number of nonzero coordinates, $n$ is the dimension, and $D(\cdot\|\cdot)$ is the Kullback–Leibler divergence between the null distribution $P_0$ and the signal distribution $P_1$.
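A quick numerical comparison makes the gap between the two requirements concrete. The sketch below assumes, purely for illustration, that $P_0 = \mathcal{N}(0, \sigma^2)$ and $P_1 = \mathcal{N}(\mu, \sigma^2)$, for which both KL divergences equal $\mu^2/(2\sigma^2)$; the helper names are hypothetical.

```python
import math


def kl_gaussian_mean_shift(mu: float, sigma: float = 1.0) -> float:
    """KL divergence between N(0, sigma^2) and N(mu, sigma^2).

    For equal variances it is symmetric: D(P0||P1) = D(P1||P0) = mu^2 / (2 sigma^2).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return mu**2 / (2 * sigma**2)


def sample_size_thresholds(n: int, s: int, d01: float, d10: float) -> tuple:
    """Return (sequential, non_sequential) threshold values for m.

    sequential     = log(s) / D(P0||P1)
    non_sequential = log(n) / D(P1||P0)
    """
    if not 1 < s <= n:
        raise ValueError("need 1 < s <= n")
    if d01 <= 0 or d10 <= 0:
        raise ValueError("divergences must be positive")
    return math.log(s) / d01, math.log(n) / d10


d = kl_gaussian_mean_shift(mu=1.0)
sequential, non_sequential = sample_size_thresholds(n=10**6, s=10**2, d01=d, d10=d)
```

In this symmetric Gaussian case the ratio of the two thresholds is $\log s/\log n$, which is $1/3$ for $s = 10^2$ and $n = 10^6$.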

6. Thresholds with Side-Information, Attributes, and Generalizations

When graph models include vertex-associated data (attributes, side-channel information, etc.), the exact recovery threshold is determined by a generalized Chernoff–TV divergence, which strictly improves on the graph-alone threshold and can yield exact recovery in previously impossible regimes. For example, in the Data Block Model (DBM), the pairwise Chernoff–TV divergence $D_{\mathrm{CT}}$ between communities $s$ and $t$, written $D_{s,t}$, must satisfy $D_{s,t} > 1$ for all $s \neq t$ (Asadi et al., 5 Feb 2026).

7. Interpretations and Local-to-Global Amplification

Across all these settings, the fundamental principle is the local-to-global amplification phenomenon: the error rate for recovering a single label or component decays as $n^{-C}$ for some problem-dependent constant $C$, and exact recovery demands that $C > 1$ so that a union bound over $n$ elements gives vanishing total error. This is universally reflected in the requirement that the relevant divergence or SNR parameter exceeds unity.
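The union-bound step can be written out as a short calculation. This is a sketch of the sufficiency direction only, assuming the per-element error probability is at most $n^{-C}$ with lower-order factors ignored; the events $E_i$ (element $i$ is misrecovered) are notation introduced here for illustration.

```latex
\Pr\Big[\bigcup_{i=1}^{n} E_i\Big]
  \;\le\; \sum_{i=1}^{n} \Pr[E_i]
  \;\le\; n \cdot n^{-C}
  \;=\; n^{1-C}
  \;\xrightarrow[n\to\infty]{}\; 0
  \quad\text{if } C > 1 .
```

For $C < 1$ the bound $n^{1-C}$ diverges and carries no information. Showing that recovery actually fails below the threshold requires a separate converse argument, which the union bound does not supply.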

The exact recovery threshold is a unifying concept in high-dimensional statistical inference, determining both the fundamental (information-theoretic) and, in some regimes, the algorithmically achievable phase transitions for reconstructive estimation in random combinatorial models.
