Correlated Stochastic Block Model

Updated 10 February 2026

The correlated stochastic block model is a probabilistic framework extending traditional SBMs to generate multiple graphs with shared latent community structure and correlated edge patterns.
It employs joint edge laws and divergence measures to precisely characterize phase transitions for tasks such as community detection and graph matching under various sparsity regimes.
Algorithmic strategies including subgraph counting and low-degree polynomial tests bridge the gap between information-theoretic limits and computational feasibility in CSBM.

A correlated stochastic block model (CSBM) is a probabilistic framework that extends the classical stochastic block model (SBM) to multiple random graphs, often sharing latent structure, whose edges are correlated across graphs because they depend on a shared parent network. The CSBM and its variants (including the multi-view SBM and correlated edge-subsampled models) capture the interplay between community structure and inter-graph edge correlation. CSBMs are central to the study of joint graph inference tasks such as graph matching and joint community detection, and to the computational-statistical tradeoffs arising in sparse network regimes.

1. Formal Definitions and Model Classes

The archetypal correlated stochastic block model is generated as follows: First, a parent graph $G_0$ is sampled from an SBM defined by $n$ vertices, $k$ communities (with distribution $\pi$ ), and intra- and inter-community edge probabilities $p_\mathrm{in}, p_\mathrm{out}$ . For each unordered node pair $\{i,j\}$ :

$\Pr[\{i,j\}\in E(G_0)] = p_\mathrm{in}$ if $i$ and $j$ share a community, $p_\mathrm{out}$ otherwise.

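From $G_0$ , $D$ observed graphs (views) $G^1, \ldots, G^D$ are formed by independent subsampling: each edge of $G_0$ is retained in $G^d$ with probability $s_d$ , that is, $\Pr[\{i,j\} \in E(G^d) \mid \{i,j\} \in E(G_0)] = s_d,$ independently across edges and across $d$ , while non-edges of $G_0$ are absent from every view. Because all views inherit their edges from the same parent, the edge indicators of a given pair are correlated across the observed graphs, with a joint law determined by the parent and subsampling parameters.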
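The generative procedure above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `sample_csbm`, the dense boolean adjacency matrices, and the argument names are assumptions made for this example.

```python
import numpy as np

def sample_csbm(n, pi, p_in, p_out, s, rng):
    """Sample a parent SBM graph and len(s) edge-subsampled views.

    pi : community prior (length k), s : retention probabilities s_1..s_D.
    Returns (labels, parent, views) with boolean symmetric adjacency matrices.
    """
    labels = rng.choice(len(pi), size=n, p=pi)        # latent community labels
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)               # edge probability per pair
    iu = np.triu_indices(n, k=1)                      # unordered pairs {i, j}, i < j
    parent = np.zeros((n, n), dtype=bool)
    parent[iu] = rng.random(iu[0].size) < probs[iu]
    parent = parent | parent.T
    views = []
    for s_d in s:
        keep = np.zeros((n, n), dtype=bool)
        keep[iu] = rng.random(iu[0].size) < s_d       # fresh retention coin per pair and view
        keep = keep | keep.T
        views.append(parent & keep)                   # only parent edges can survive
    return labels, parent, views
```

A call such as `sample_csbm(100, [0.5, 0.5], 0.3, 0.05, [0.8, 0.8], np.random.default_rng(0))` returns two views that are each a subgraph of the same parent, which is the source of the cross-view correlation.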

A unifying generalization is the multi-view SBM (MVSBM), which prescribes for each vertex pair a joint edge law over $\{0,1\}^D$ : $p$ for pairs in the same community and $q$ for pairs in different communities. Writing $X(i)$ for the community label of vertex $i$ , $\mathbb{P}[A_{ij}=(a_1,\ldots,a_D)] = \begin{cases} p(a_1,\ldots,a_D) & \text{if } X(i)=X(j),\\ q(a_1,\ldots,a_D) & \text{otherwise.} \end{cases}$ This formalism accommodates arbitrary correlation among the edge indicators across views (Zhang et al., 2024).

Another canonical CSBM variant considers two observed graphs, $A$ and $B$ , generated by subsampling edges from $G_0$ with retention probability $s$ and independently permuting $B$ by a latent matching (vertex permutation) $\pi_*$ (Racz et al., 2021, Yang et al., 2023).
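The latent relabeling in this two-graph variant can be illustrated as follows. The helper names `permute_graph` and `unpermute_graph` are hypothetical, and the convention that vertex `i` is renamed `perm[i]` is an assumption chosen for the sketch; the cited works may use the inverse convention.

```python
import numpy as np

def permute_graph(adj, perm):
    """Relabel vertex i as perm[i], so that out[perm[i], perm[j]] = adj[i, j]."""
    out = np.empty_like(adj)
    out[np.ix_(perm, perm)] = adj
    return out

def unpermute_graph(adj_perm, perm):
    """Undo permute_graph when the permutation is known."""
    return adj_perm[np.ix_(perm, perm)]
```

Graph matching is the task of recovering `perm` from `A` and the permuted `B` alone; `unpermute_graph` only shows what a correct answer makes possible.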

2. Information-Theoretic Thresholds for Recovery

A hallmark of CSBM theory is the sharp characterization of when one can exactly recover the underlying community labels $X$ or the vertex correspondence $\pi_*$ .

Community Exact Recovery: In the MVSBM with $n$ nodes and $D$ views, exact recovery of $X$ is possible if and only if

$\lim_{n\to\infty} \frac{n\,I(p,q)}{\log n} > 2,$

where $I(p,q) = -2\log\left(\sum_{d\in\{0,1\}^D} \sqrt{p(d)q(d)}\right)$ is the order-$1/2$ Rényi (Hellinger) divergence between the joint edge-laws (Zhang et al., 2024).
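As a rough numerical companion to this criterion, the divergence and the left-hand side of the threshold can be evaluated directly. The function names below are illustrative, and the joint laws are assumed to be given as two sequences listing the probabilities of the same outcomes in $\{0,1\}^D$ in the same order.

```python
import math

def renyi_half(p, q):
    """I(p, q) = -2 log( sum_d sqrt(p(d) q(d)) ) for two laws on the same outcomes."""
    bc = sum(math.sqrt(pd * qd) for pd, qd in zip(p, q))   # Bhattacharyya coefficient
    return -2.0 * math.log(bc)

def exact_recovery_margin(n, p, q):
    """Finite-n value of n I(p, q) / log n, to be compared against 2."""
    return n * renyi_half(p, q) / math.log(n)

def single_view_law(prob):
    """Law of one Bernoulli edge indicator, ordered as (no edge, edge)."""
    return [1.0 - prob, prob]
```

For a single view with edge probabilities $a\log n/n$ and $b\log n/n$ , `exact_recovery_margin` comes out close to $(\sqrt a-\sqrt b)^2$ for large $n$ . The criterion itself is asymptotic, so a finite-$n$ value is only indicative.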

Graph Matching: For two subsampled graphs whose parent has intra-/inter-community edge probabilities $p = \alpha \frac{\log n}{n}$ , $q = \beta \frac{\log n}{n}$ , and subsampling probability $s$ : $\text{Exact recovery of } \pi_* \text{ is possible iff } s^2 \cdot \frac{\alpha+\beta}{2} > 1.$ If $s^2 \cdot \frac{\alpha+\beta}{2}<1$ , isolated vertices emerge in the intersection graph, precluding perfect matching (Racz et al., 2021, Yang et al., 2023).
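The role of isolated vertices can be made explicit with a heuristic first-moment calculation, sketched here under the assumption of two balanced communities of size $n/2$ . A pair is an edge of the intersection graph only if it is an edge of the parent and is retained in both observed graphs, which happens with probability $s^2 p$ within a community and $s^2 q$ across communities. A fixed vertex $v$ is therefore isolated with probability

$\Pr[v \text{ isolated}] \approx (1 - s^2 p)^{n/2}\,(1 - s^2 q)^{n/2} \approx \exp\left(-s^2\,\frac{\alpha+\beta}{2}\,\log n\right) = n^{-s^2(\alpha+\beta)/2},$

so the expected number of isolated vertices is about

$n \cdot n^{-s^2(\alpha+\beta)/2} = n^{\,1 - s^2(\alpha+\beta)/2},$

which grows with $n$ exactly when $s^2 \cdot \frac{\alpha+\beta}{2} < 1$ and vanishes when $s^2 \cdot \frac{\alpha+\beta}{2} > 1$ .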

Joint Thresholds: For community detection from two graphs, if one first aligns the graphs (by matching), their union is a denser SBM in which each parent edge survives with probability $1-(1-s)^2$ rather than $s$ . Exact community recovery is then possible when $|\sqrt\alpha-\sqrt\beta| > \sqrt{\frac{2}{1 - (1-s)^2}},$ which relaxes the single-graph condition $|\sqrt\alpha-\sqrt\beta|>\sqrt{2/s}$ . As $s\to1$ both conditions reduce to the parent-graph threshold $|\sqrt\alpha-\sqrt\beta|>\sqrt2$ , and as $s\to0$ both right-hand sides diverge (Racz et al., 2021, Gaudio et al., 2022, Yang et al., 2023).
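The density gain from taking the union of two aligned views is easy to check numerically. The sketch below assumes the two graphs are already correctly aligned and uses the hypothetical helper names `union_retention` and `simulate_union_fraction`; it simulates only the retention coins of the parent edges.

```python
import numpy as np

def union_retention(s):
    """Probability that a parent edge survives in at least one of two views."""
    return 1.0 - (1.0 - s) ** 2          # equals s * (2 - s)

def simulate_union_fraction(s, num_parent_edges, rng):
    """Empirical fraction of parent edges present in the union of two views."""
    in_a = rng.random(num_parent_edges) < s
    in_b = rng.random(num_parent_edges) < s
    return float(np.mean(in_a | in_b))
```

Since $1-(1-s)^2 = s(2-s)$ , the union is denser than either single view by a factor $2-s$ , which is the sense in which alignment strengthens the community signal.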

3. Correlation Structure and Divergence Additivity

The fundamental limit on recovery is governed by the joint divergences of edge laws:

For classical SBM ( $D=1$ ): $I(p,q) = (\sqrt{p} - \sqrt{q})^2 + o(p+q)$ .
For $D$ independent SBMs (independent edge observations), the total divergence is additive:

$I(p,q) = \sum_{k=1}^D (\sqrt{p_k} - \sqrt{q_k})^2 + o(\cdot)$

When graph views are correlated, $I(p,q)$ compresses all cross-view dependence into a joint Hellinger term, which “automatically accounts for the extra information (or redundancy)” (Zhang et al., 2024).

This divergence perspective subsumes both classical and multi-view scenarios: additional views increase statistical power only to the extent that they carry non-redundant information, so the dependence between views must be built into the joint edge laws rather than treated as independent noise.
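Both the additivity for independent views and the effect of correlation can be checked on small examples. In the sketch below, joint laws are dictionaries keyed by outcomes in $\{0,1\}^D$ ; the helper names and the edge-subsampled construction of the correlated law (one parent edge, independent retention per view) are assumptions made for illustration.

```python
import itertools
import math

def renyi_half(p, q):
    """I(p, q) = -2 log( sum_a sqrt(p(a) q(a)) ) for laws stored as dicts."""
    return -2.0 * math.log(sum(math.sqrt(p[a] * q[a]) for a in p))

def independent_law(probs):
    """Joint law of independent Bernoulli(probs[k]) edge indicators."""
    law = {}
    for a in itertools.product((0, 1), repeat=len(probs)):
        law[a] = math.prod(pk if ak else 1.0 - pk for pk, ak in zip(probs, a))
    return law

def subsampled_law(parent_prob, s):
    """Joint law of the views of one pair: parent edge w.p. parent_prob,
    then independent retention with probability s[d] in view d."""
    law = {}
    for a in itertools.product((0, 1), repeat=len(s)):
        given_edge = math.prod(sd if ad else 1.0 - sd for sd, ad in zip(s, a))
        given_no_edge = 0.0 if any(a) else 1.0     # no parent edge: all views empty
        law[a] = parent_prob * given_edge + (1.0 - parent_prob) * given_no_edge
    return law
```

For independent views, `renyi_half(independent_law([p1, p2]), independent_law([q1, q2]))` equals the sum of the two single-view divergences exactly, because the Bhattacharyya coefficient factorizes over product laws. With `subsampled_law`, the joint divergence generally differs from what independent views with the same marginals would give.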

4. Algorithmic Developments and Computational-Statistical Gaps

A central line of CSBM research targets polynomial-time algorithms that achieve information-theoretic limits, especially in sparse regimes where community signal is weak and edge correlation is crucial.

Tree/counting statistics: For sparse CSBMs, efficient detection and matching can be achieved by counting small subgraphs (trees, chandeliers, or more generally decorated trees) in the intersection or union of the observed graphs. Otter’s constant $\alpha\approx0.338$ (unrelated to the density parameter $\alpha$ of Section 2) is the reciprocal of the exponential growth rate of the number of unlabeled trees, and it sets the computational threshold for count-based tests: $s^2>\alpha$ (Chai et al., 2024, Chen et al., 2025, Chen et al., 2024).
Low-degree likelihood analysis: Low-degree polynomial statistics pin down when correlation can be tested in polynomial time: no $O(\log n)$ -degree polynomial test can distinguish the correlated and null models when $s < \min\{\sqrt{\alpha}, 1/(\lambda\epsilon^2)\}$ , where $\alpha$ is Otter’s constant and $\lambda$ , $\epsilon$ are parameters of the parent SBM in the notation of the cited works. This points to a computational-statistical gap: regimes in which the correlation is statistically detectable but apparently out of reach of polynomial-time tests (Chen et al., 2025, Chen et al., 2024).
Graph matching via structured signatures: Recent algorithms for exact graph matching in dense SBMs use partition-tree or chandelier-based signature vectors for candidate vertex pairs, tracking edge patterns within and between communities. Efficient color-coding and combinatorial enumeration techniques enable subgraph counting in $n^{O(1)}$ time for fixed parameters (Yang et al., 2023, Chai et al., 2024).
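To make the constant in the tree-counting threshold concrete, the following sketch counts unlabeled trees exactly and estimates Otter’s constant from the ratio of consecutive counts. It uses the classical recurrence for rooted trees and Otter’s formula relating rooted and unrooted counts; the function names are illustrative, and the code is not taken from the cited algorithms.

```python
def rooted_tree_counts(n_max):
    """r[m] = number of unlabeled rooted trees on m vertices, for m <= n_max."""
    r = [0, 1]
    sig = [0] * (n_max + 1)                 # sig[k] = sum over divisors d of k of d * r[d]
    for n in range(1, n_max):
        for m in range(n, n_max + 1, n):    # r[n] is now known: add its divisor terms
            sig[m] += n * r[n]
        total = sum(sig[k] * r[n - k + 1] for k in range(1, n + 1))
        r.append(total // n)                # exact: the sum is divisible by n
    return r

def unrooted_tree_counts(n_max):
    """t[m] = number of unlabeled (free) trees on m vertices, via Otter's formula."""
    r = rooted_tree_counts(n_max)
    t = [0]
    for n in range(1, n_max + 1):
        conv = sum(r[i] * r[n - i] for i in range(1, n))
        if n % 2 == 0:
            conv -= r[n // 2]
        t.append(r[n] - conv // 2)
    return t

def otter_estimate(n):
    """Ratio t[n-1] / t[n], which tends to Otter's constant as n grows."""
    t = unrooted_tree_counts(n)
    return t[n - 1] / t[n]
```

The ratio converges slowly because of a polynomial correction factor in the asymptotic count, so moderate values of `n` give the constant only to about two decimal places.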

The table below summarizes the main phase transition thresholds for key tasks:

Problem	Information-theoretic threshold	Computational threshold (sparse regime)
Exact community recovery	$nI(p,q)/\log n > 2$ (MVSBM)	Polynomial in $n$ achievable; no gap for recovery (Zhang et al., 2024)
Exact graph matching	$s^2 (\alpha+\beta)/2 > 1$ (two-block)	$s^2 > \alpha$ , where $\alpha \approx 0.338$ here denotes Otter’s constant rather than the density parameter (Chai et al., 2024, Chen et al., 2024)
Correlation detection	Any $s>0$	$s > \min\{\sqrt{\alpha}, 1/(\lambda\epsilon^2)\}$ , with $\alpha$ Otter’s constant; low-degree polynomial tests fail below this value (Chen et al., 2024)
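The thresholds in the table translate into simple predicates. The sketch below is only a convenience for exploring parameter regimes: the function names are hypothetical, `OTTER` is an approximate numerical value, and `lam` and `eps` stand for the $\lambda$ and $\epsilon$ of the text, whose precise definitions follow the cited works.

```python
import math

OTTER = 0.33832   # approximate value of Otter's constant

def matching_information_feasible(alpha, beta, s):
    """Information-theoretic condition for exact matching (two-block case)."""
    return s ** 2 * (alpha + beta) / 2 > 1

def tree_counting_feasible(s):
    """Condition s^2 > Otter's constant for count-based tests."""
    return s ** 2 > OTTER

def low_degree_hard(s, lam, eps):
    """True when s lies below min{sqrt(Otter), 1 / (lam * eps^2)}."""
    return s < min(math.sqrt(OTTER), 1.0 / (lam * eps ** 2))
```

The first predicate uses the density parameters $\alpha$ , $\beta$ of Section 2, while the other two use Otter’s constant, so the two meanings of $\alpha$ never appear in the same function.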

5. Multi-View and Generalizations

CSBM connects naturally to broader random graph models:

Multi-view community detection: MVSBM generalizes both multiple independent SBMs and fully joint, correlated edge models. The phase transition and proof techniques (union bounds, change of measure, Hellinger divergence) recover prior results on both single-view and multi-view models as special cases (Zhang et al., 2024).
Graph matching with degree-correction and labeled edges: Other extensions include analysis of labeled SBMs (with edge labels/attributes) and degree-corrected SBMs, leading to more nuanced thresholds and algorithms (Lelarge et al., 2015, Mossel et al., 2015).
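Contextual SBMs: Incorporating covariate information yields a phase boundary for weak recovery at $\lambda^2 + \mu^2/\gamma = 1$ in the Gaussian and sparse analogues, where $\lambda$ and $\mu$ measure the signal strength of the graph and of the covariates and $\gamma$ relates the feature dimension to the number of vertices; attaining this boundary requires optimal use of both graph and feature data (Deshpande et al., 2018).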

6. Information-Theoretic and Computational Proof Techniques

Analytical approaches leverage:

Second-moment and contiguity methods: Impossibility proofs typically show that, below the threshold, the planted model is contiguous with respect to a null model, which rules out both recovery and consistent parameter estimation (Mossel et al., 2012).
Reconstruction on random trees: The broadcast Ising channel and Kesten–Stigum bound underpin the theory of phase transitions and link CSBM thresholds to statistical physics (Mossel et al., 2012).
Cycle/subgraph enumeration: Detectability proofs above the threshold rely on signature subgraphs whose counts differ detectably between the planted and null models (Chen et al., 2025).
Combining spectral, SDP, and combinatorial routines: Practical algorithms interleave spectral initialization, subgraph centering, and seed-boosted matching to perform well up to the computational barrier (Chai et al., 2024, Yang et al., 2023).
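As a minimal illustration of the spectral ingredient only, the sketch below clusters a two-community graph by the sign pattern of the eigenvector attached to the second-largest adjacency eigenvalue. It is a generic textbook step, not the procedure of the cited papers, and it assumes two roughly balanced assortative communities and a reasonably dense input such as the union of two aligned views.

```python
import numpy as np

def spectral_two_communities(adj):
    """Label vertices 0/1 by the sign of the second leading adjacency eigenvector."""
    vals, vecs = np.linalg.eigh(adj.astype(float))   # eigenvalues in ascending order
    second = vecs[:, -2]                             # eigenvector of the second-largest eigenvalue
    return (second > 0).astype(int)

def agreement(estimate, truth):
    """Fraction of correctly labeled vertices, up to swapping the two labels."""
    acc = float(np.mean(estimate == truth))
    return max(acc, 1.0 - acc)
```

In a fuller pipeline such an estimate would serve only as an initialization, to be refined by the combinatorial or matching steps listed above.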

7. Open Directions and Practical Significance

Despite recent progress, key open problems persist:

Tight polynomial-time algorithms at the information-theoretic limits for all parameter regimes—particularly for exact matching and weak recovery in the correlated sparse SBM.
Generalization to more complex correlation structures, unbalanced communities, degree-heterogeneous, or labeled networks.
Quantification of statistical-to-computational phase diagrams: Characterizing the regimes in which community or matching information is statistically present but algorithmically inaccessible because of complexity barriers (as in low-degree polynomial hardness or Otter threshold transitions).

Correlated SBMs crystallize the interplay between statistical inference, random graph combinatorics, and computational complexity, and they serve as a unifying model class for analyzing multi-network data across disciplines (Zhang et al., 2024, Chai et al., 2024, Gaudio et al., 2022, Yang et al., 2023, Chen et al., 2024, Chen et al., 2025).