Detecting correlation efficiently in stochastic block models: breaking Otter's threshold by counting decorated trees

Published 9 Mar 2025 in cs.DS, math.PR, math.ST, and stat.TH | (2503.06464v1)

Abstract: Consider a pair of sparse correlated stochastic block models $\mathcal S(n,\tfrac{\lambda}{n},\epsilon;s)$ subsampled from a common parent stochastic block model with two symmetric communities, average degree $\lambda=O(1)$ and divergence parameter $\epsilon \in (0,1)$. For all $\epsilon\in(0,1)$, we construct a statistic based on the combination of two low-degree polynomials and show that there exists a sufficiently small constant $\delta=\delta(\epsilon)>0$ and a sufficiently large constant $\Delta=\Delta(\epsilon,\delta)$ such that when $\lambda>\Delta$ and $s>\sqrt{\alpha}-\delta$ where $\alpha\approx 0.338$ is Otter's constant, this statistic can distinguish this model and a pair of independent stochastic block models $\mathcal S(n,\tfrac{\lambda s}{n},\epsilon)$ with probability $1-o(1)$. We also provide an efficient algorithm that approximates this statistic in polynomial time. The crux of our statistic's construction lies in a carefully curated family of multigraphs called \emph{decorated trees}, which enables effective aggregation of the community signal and graph correlation from the counts of the same decorated tree while suppressing the undesirable correlations among counts of different decorated trees.