
Correlated Stochastic Block Model

Updated 10 February 2026
  • The correlated stochastic block model is a probabilistic framework that extends traditional SBMs to generate multiple graphs with shared latent community structure and correlated edge patterns.
  • It employs joint edge laws and divergence measures to precisely characterize phase transitions for tasks such as community detection and graph matching under various sparsity regimes.
  • Algorithmic strategies including subgraph counting and low-degree polynomial tests bridge the gap between information-theoretic limits and computational feasibility in CSBM.

A correlated stochastic block model (CSBM) is a probabilistic framework that extends the classical stochastic block model (SBM) to multiple random graphs, often sharing latent structure, whose edges are correlated across graphs because each graph depends on a shared parent network. The CSBM and its variants (including the multi-view SBM and correlated edge-subsampled models) capture the interplay between community structure and inter-graph edge correlation. These models are central to the study of joint graph inference tasks such as graph matching and joint community detection, and to the computational-statistical tradeoffs arising in sparse network regimes.

1. Formal Definitions and Model Classes

The archetypal correlated stochastic block model is generated as follows. First, a parent graph $G_0$ is sampled from an SBM defined by $n$ vertices, $k$ communities (with distribution $\pi$), and intra- and inter-community edge probabilities $p_\mathrm{in}, p_\mathrm{out}$. For each unordered node pair $\{i,j\}$:

  • $\Pr[\{i,j\}\in E(G_0)] = p_\mathrm{in}$ if $i$ and $j$ share a community, and $p_\mathrm{out}$ otherwise.

From $G_0$, $D$ observed graphs (views) $G^1, \ldots, G^D$ are formed by independent subsampling of the edges of $G_0$:

$$\Pr[\{i,j\} \text{ kept in } G^d] = s_d, \quad \text{independently across } d.$$

Thus, across observed graphs, each edge has a joint law characterized by the parent and subsampling parameters, inducing correlations.
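The generative procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from any cited paper: the function name `sample_csbm_views` is hypothetical, and the community prior is assumed uniform over the $k$ communities (the text allows a general $\pi$).

```python
import numpy as np

def sample_csbm_views(n, k, p_in, p_out, s, seed=0):
    """Sample a parent SBM and len(s) edge-subsampled views.

    Returns (labels, parent, views); parent and each view are symmetric
    0/1 adjacency matrices with zero diagonal.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)              # assumption: uniform prior
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)              # per-pair edge probability
    upper = np.triu(np.ones((n, n), dtype=bool), 1)  # unordered pairs i < j
    parent_upper = (rng.random((n, n)) < probs) & upper
    parent = (parent_upper | parent_upper.T).astype(int)

    views = []
    for s_d in s:
        # Each parent edge is kept in view d with probability s_d,
        # independently of the other views.
        keep = (rng.random((n, n)) < s_d) & parent_upper
        views.append((keep | keep.T).astype(int))
    return labels, parent, views
```

For instance, `sample_csbm_views(200, 2, 0.1, 0.02, [0.8, 0.8])` returns two views whose edge sets are both subsets of the same parent graph, which is the source of their correlation.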

A unifying generalization is the multi-view SBM (MVSBM), which prescribes for each edge a joint law $p$ (same-community) and $q$ (different-community) over $\{0,1\}^D$:

$$\mathbb{P}[A_{ij}=(a_1,\ldots,a_D)] = \begin{cases} p(a_1,\ldots,a_D) & \text{if } X(i)=X(j), \\ q(a_1,\ldots,a_D) & \text{otherwise.} \end{cases}$$

This formalism accommodates arbitrary correlation among the edge indicators across views (Zhang et al., 2024).

Another canonical CSBM variant considers two observed graphs, $A$ and $B$, generated by subsampling edges from $G_0$ with retention probability $s$ and independently permuting $B$ by a latent matching (vertex permutation) $\pi_*$ (Racz et al., 2021, Yang et al., 2023).
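The two-graph variant can be sketched as follows, assuming the parent graph is given as a symmetric 0/1 NumPy adjacency matrix. The name `correlated_pair` is hypothetical, and drawing the permutation uniformly at random is an assumption of this sketch.

```python
import numpy as np

def correlated_pair(parent, s, seed=0):
    """Subsample two views of `parent` and relabel the second one.

    Returns (a, b, pi) where vertex i of the parent appears as
    vertex pi[i] in b.
    """
    rng = np.random.default_rng(seed)
    n = parent.shape[0]
    upper = np.triu(parent, 1).astype(bool)

    def subsample():
        keep = upper & (rng.random((n, n)) < s)
        return (keep | keep.T).astype(int)

    a, b_aligned = subsample(), subsample()
    pi = rng.permutation(n)            # assumption: uniform latent matching
    b = np.zeros_like(b_aligned)
    b[np.ix_(pi, pi)] = b_aligned      # b[pi[i], pi[j]] = b_aligned[i, j]
    return a, b, pi
```

Graph matching is the task of recovering `pi` from `a` and `b` alone; given `pi`, the aligned copy is obtained as `b[np.ix_(pi, pi)]`.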

2. Information-Theoretic Thresholds for Recovery

A hallmark of CSBM theory is the sharp characterization of when one can exactly recover the underlying community labels $X$ or the vertex correspondence $\pi_*$.

Community Exact Recovery: In the MVSBM with $n$ nodes and $D$ views, exact recovery of $X$ is possible if and only if

$$\lim_{n\to\infty} \frac{n\,I(p,q)}{\log n} > 2,$$

where $I(p,q) = -2\log\left(\sum_{d\in\{0,1\}^D} \sqrt{p(d)\,q(d)}\right)$ is the order-$1/2$ Rényi (Hellinger) divergence between the joint edge-laws (Zhang et al., 2024).
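To make the divergence concrete, the sketch below builds the joint edge law induced by independent subsampling of a parent edge and evaluates $I(p,q)$. All function names are hypothetical, and the finite-$n$ ratio returned by `exact_recovery_ratio` is only a rough indicator, since the condition above is stated as a limit.

```python
import itertools
import math

def subsampled_joint_law(p_edge, s):
    """Joint law over {0,1}^D of one pair's edge indicators when a parent
    edge (present with probability p_edge) is kept in view d w.p. s[d]."""
    law = {}
    for a in itertools.product((0, 1), repeat=len(s)):
        keep = math.prod(sd if ad else 1 - sd for sd, ad in zip(s, a))
        law[a] = p_edge * keep
    law[(0,) * len(s)] += 1 - p_edge   # no parent edge: every view is empty
    return law

def renyi_half(p, q):
    """Order-1/2 Renyi divergence -2 log sum_d sqrt(p(d) q(d))."""
    return -2 * math.log(sum(math.sqrt(p[a] * q[a]) for a in p))

def exact_recovery_ratio(n, p, q):
    """Finite-n value of n I(p,q) / log n, to be compared with 2."""
    return n * renyi_half(p, q) / math.log(n)
```

With one view and `s = [1.0]`, `renyi_half` reduces to the single-graph quantity $-2\log\big(\sqrt{pq}+\sqrt{(1-p)(1-q)}\big)$.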

Graph Matching: For two subsampled graphs whose parent graph has intra- and inter-community edge probabilities $p = \alpha \frac{\log n}{n}$ and $q = \beta \frac{\log n}{n}$ (so that average degrees are of order $\log n$), and subsampling probability $s$ controlling the edge correlation, exact recovery of $\pi_*$ is possible if and only if

$$s^2 \cdot \frac{\alpha+\beta}{2} > 1.$$

If $s^2 \cdot \frac{\alpha+\beta}{2}<1$, isolated vertices emerge in the intersection graph, precluding perfect matching (Racz et al., 2021, Yang et al., 2023).
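The role of isolated vertices can be illustrated with a back-of-the-envelope calculation. The sketch below treats $p$ and $q$ as parent edge probabilities and $s$ as the subsampling probability (an interpretation of the notation above), and assumes two equal-size communities. Under those assumptions a pair is an edge of the intersection graph with probability $s^2 p$ or $s^2 q$, so the expected number of isolated vertices is roughly $n^{1 - s^2(\alpha+\beta)/2}$. The function names are hypothetical.

```python
import math

def matching_margin(alpha, beta, s):
    """Left-hand side of the exact-matching condition; compare with 1."""
    return s**2 * (alpha + beta) / 2

def expected_isolated(n, alpha, beta, s):
    """Expected number of isolated vertices in the intersection graph,
    assuming two equal-size communities (a heuristic first-moment count)."""
    p = alpha * math.log(n) / n
    q = beta * math.log(n) / n
    half = n / 2
    # A vertex is isolated if none of its ~n/2 same-community and ~n/2
    # cross-community pairs survives in both views.
    return n * (1 - s**2 * p) ** (half - 1) * (1 - s**2 * q) ** half
```

When `matching_margin` is below 1 the expected count grows polynomially in `n`, and when it is above 1 the count tends to zero.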

Joint Thresholds: For community detection from two graphs, if one first aligns the graphs (by matching), the union graph retains each parent edge with probability $1-(1-s)^2$, which exceeds the single-view retention probability $s$. The critical recovery threshold becomes

$$|\sqrt\alpha-\sqrt\beta| > \sqrt{\frac{2}{1 - (1-s)^2}},$$

which reduces to the classical single-graph SBM threshold $|\sqrt\alpha-\sqrt\beta|>\sqrt{2}$ at $s=1$, where both views coincide with the parent graph, and diverges as $s\to0$ (Racz et al., 2021, Gaudio et al., 2022, Yang et al., 2023).
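The density gain from aligning and merging two views can be checked numerically. The sketch below simulates, for a single parent edge, whether it survives in the union or in the intersection of two independently subsampled views; the expected rates are $1-(1-s)^2$ and $s^2$ respectively. The function name and the default number of trials are illustrative choices.

```python
import numpy as np

def union_intersection_rates(s, trials=200_000, seed=0):
    """Monte Carlo estimate of the probability that a parent edge appears
    in the union / intersection of two views with retention probability s."""
    rng = np.random.default_rng(seed)
    in_a = rng.random(trials) < s
    in_b = rng.random(trials) < s
    return (in_a | in_b).mean(), (in_a & in_b).mean()
```

For `s = 0.6` the estimates should be close to 0.84 and 0.36: the union is denser than either view, while the intersection is sparser.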

3. Correlation Structure and Divergence Additivity

The fundamental limit on recovery is governed by the joint divergences of edge laws:

  • For classical SBM ($D=1$): $I(p,q) = (\sqrt{p} - \sqrt{q})^2 + o(p+q)$.
  • For $D$ independent SBMs (independent edge observations), the total divergence is additive:

$$I(p,q) = \sum_{k=1}^D (\sqrt{p_k} - \sqrt{q_k})^2 + o(\cdot)$$

  • When graph views are correlated, $I(p,q)$ compresses all cross-view dependence into a joint Hellinger term, which “automatically accounts for the extra information (or redundancy)” (Zhang et al., 2024).

This divergence perspective subsumes both classical and multi-view scenarios, showing that correlated observations synergistically enhance statistical power, but the mutual dependencies must be correctly incorporated.
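Additivity and redundancy can both be checked numerically. For independent views the sum $\sum_d \sqrt{p(d)q(d)}$ factorizes across views, so the divergence of the product law equals the sum of the per-view divergences exactly; for a fully duplicated view it equals the single-view divergence. The helper names below are hypothetical.

```python
import itertools
import math

def renyi_half(p, q):
    """Order-1/2 Renyi divergence between two laws on the same support."""
    return -2 * math.log(sum(math.sqrt(p[a] * q[a]) for a in p))

def bernoulli(x):
    """Single-view edge law with edge probability x."""
    return {(0,): 1 - x, (1,): x}

def product_law(laws):
    """Joint law of independent views."""
    joint = {}
    for combo in itertools.product(*(law.items() for law in laws)):
        key = tuple(bit for outcome, _ in combo for bit in outcome)
        joint[key] = math.prod(prob for _, prob in combo)
    return joint

def duplicated_law(x):
    """Two perfectly correlated views: the second copies the first."""
    return {(0, 0): 1 - x, (1, 1): x}
```

Comparing `renyi_half(product_law(...), product_law(...))` with the sum of single-view values shows additivity, while `duplicated_law` shows that a redundant view contributes no extra divergence.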

4. Algorithmic Developments and Computational-Statistical Gaps

A central line of CSBM research targets polynomial-time algorithms that achieve information-theoretic limits, especially in sparse regimes where community signal is weak and edge correlation is crucial.

  • Tree/counting statistics: For sparse CSBMs, efficient detection and matching may be achieved by counting small subgraphs (trees, chandeliers, or more generally decorated trees) in the intersection or union of observed graphs. Otter’s constant $\alpha\approx0.338$ (distinct from the density parameter $\alpha$ of Section 2) governs the exponential growth of the number of unlabeled trees, which grows like $(1/\alpha)^K$ in the number of edges $K$, and gives a computational threshold for count-based tests: $s^2>\alpha$ (Chai et al., 2024, Chen et al., 9 Mar 2025, Chen et al., 2024).
  • Low-degree likelihood analysis: The polynomial-time testability of correlation is characterized by low-degree polynomial statistics. No $O(\log n)$-degree polynomial test can distinguish the correlated and null models when $s < \min\{\sqrt{\alpha}, 1/(\lambda\epsilon^2)\}$, which results in a computational-statistical gap in regimes where information is present but computational resources are insufficient (Chen et al., 9 Mar 2025, Chen et al., 2024).
  • Graph matching via structured signatures: Recent algorithms for exact graph matching in dense SBMs use partition-tree or chandelier-based signature vectors for candidate vertex pairs, tracking edge patterns within and between communities. Efficient color-coding and combinatorial enumeration techniques enable subgraph counting in $n^{O(1)}$ time for fixed parameters (Yang et al., 2023, Chai et al., 2024).
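As a toy illustration of count-based statistics, the sketch below computes centered ("signed") counts of the two smallest trees, the single edge and the two-edge star, in each graph and multiplies them across graphs. This is a deliberate simplification: the statistics in the cited works sum over all trees of a growing size with appropriate weights, which is not attempted here. The function names and the `density` argument (the edge density used for centering) are assumptions of this sketch.

```python
import numpy as np

def centered(adj, density):
    """Adjacency matrix with off-diagonal entries centered by `density`."""
    c = adj.astype(float) - density
    np.fill_diagonal(c, 0.0)
    return c

def signed_edge_count(adj, density):
    """Sum over unordered pairs {i, j} of (A_ij - density)."""
    return np.triu(centered(adj, density), 1).sum()

def signed_two_star_count(adj, density):
    """Sum over centers v and unordered leaf pairs {u, w} of the product
    of centered entries (A_vu - density)(A_vw - density)."""
    c = centered(adj, density)
    row = c.sum(axis=1)
    return ((row**2 - (c**2).sum(axis=1)) / 2).sum()

def tree_correlation_statistic(a, b, density):
    """Toy test statistic: correlate signed tree counts across two graphs."""
    return (signed_edge_count(a, density) * signed_edge_count(b, density)
            + signed_two_star_count(a, density) * signed_two_star_count(b, density))
```

Because each count sums over all vertex placements, the statistic is unchanged when either graph is relabeled, so it can be evaluated without knowing the latent matching.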

The table below summarizes the main phase transition thresholds for key tasks:

| Problem | Information-theoretic threshold | Computational threshold (sparse regime) |
|---|---|---|
| Exact community recovery | $nI(p,q)/\log n > 2$ (MVSBM) | Polynomial in $n$ achievable; no gap for recovery (Zhang et al., 2024) |
| Exact graph matching | $s^2 (\alpha+\beta)/2 > 1$ (two-block) | $s^2 > \alpha$ (Otter’s constant) (Chai et al., 2024, Chen et al., 2024) |
| Correlation detection | Statistical: any $s>0$; Computation: $s > \min\{\sqrt{\alpha}, 1/(\lambda\epsilon^2)\}$ | Low-degree polynomials match this cut (Chen et al., 2024) |

5. Multi-View and Generalizations

CSBM connects naturally to broader random graph models:

  • Multi-view community detection: MVSBM generalizes both multiple independent SBMs and fully joint, correlated edge models. The phase transition and proof techniques (union bounds, change of measure, Hellinger divergence) recover prior results on both single-view and multi-view models as special cases (Zhang et al., 2024).
  • Graph matching with degree-correction and labeled edges: Other extensions include analysis of labeled SBMs (with edge labels/attributes) and degree-corrected SBMs, leading to more nuanced thresholds and algorithms (Lelarge et al., 2015, Mossel et al., 2015).
  • Contextual SBMs: Incorporating covariate information yields a phase boundary for weak recovery at $\lambda^2 + \tau/\gamma = 1$ in the Gaussian and sparse analogues, necessitating optimal use of both graph and feature data (Deshpande et al., 2018).

6. Information-Theoretic and Computational Proof Techniques

Analytical approaches leverage:

  • Second-moment and contiguity methods: Impossibility proofs typically show that the planted model is contiguous w.r.t. a null model below threshold, preventing both recovery and even parameter estimation (Mossel et al., 2012).
  • Reconstruction on random trees: The broadcast Ising channel and Kesten–Stigum bound underpin the theory of phase transitions and link CSBM thresholds to statistical physics (Mossel et al., 2012).
  • Cycle/subgraph enumeration: Detectability proofs above threshold exploit the existence of signature subgraphs whose count is information-revealing only in the planted case (Chen et al., 9 Mar 2025).
  • Amalgamation of spectral, SDP, and combinatorial routines: Polished algorithms interleave spectral initialization, subgraph centering, and seed-boosted matching for tight performance up to the computational barrier (Chai et al., 2024, Yang et al., 2023).

7. Open Directions and Practical Significance

Despite recent progress, key open problems persist:

  • Tight polynomial-time algorithms at the information-theoretic limits for all parameter regimes—particularly for exact matching and weak recovery in the correlated sparse SBM.
  • Generalization to more complex correlation structures, unbalanced communities, degree-heterogeneous, or labeled networks.
  • Quantification of statistical-to-computational phase diagrams: The presence of regimes where community or matching recovery information is latent but algorithmically inaccessible due to complexity barriers (as in low-degree polynomial hardness or Otter threshold transitions).

Correlated SBMs crystallize the interplay between statistical inference, random graph combinatorics, and computational complexity—serving as a unifying model class for analyzing multi-network data across disciplines (Zhang et al., 2024, Chai et al., 2024, Gaudio et al., 2022, Yang et al., 2023, Chen et al., 2024, Chen et al., 9 Mar 2025).
