Erdős–Rényi Subgraph Pair Model
- The Erdős–Rényi Subgraph Pair Model is a formal framework that couples random graphs via subgraph extraction and vertex correspondence for network matching.
- It establishes sharp information-theoretic phase transitions for both exact and partial recovery, guiding practical and adversarial network de-anonymization.
- The model underpins studies in graph alignment and community detection using methods like brute-force MAP estimators and tail degree signature algorithms.
The Erdős–Rényi Subgraph Pair Model is a formal framework for studying random graphs coupled through subgraph extraction and vertex correspondence. Crucially, it also quantifies the information-theoretic limits of subgraph alignment and network matching. It provides the mathematical substrate for a wide range of statistical and computational analyses of network alignment, planted subgraph recovery, and correlated random graph pairs. The framework now plays a central role in the rigorous treatment of exact and partial graph alignment, particularly for assessing the feasibility and optimality of recovery in both practical and adversarial regimes (Shiu et al., 8 Jan 2026, Du, 17 Feb 2025, Bozorg et al., 2019).
1. Formal Model Definitions
1.1 Subgraph-Pair Model (Alignment/Recovery Setting)
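Let $n \ge m \ge 1$ be integers and $p \in (0,1)$. A base random graph $G \sim G(n,p)$ is sampled on the vertex set $[n] = \{1, \dots, n\}$. An $m$-subset $S \subseteq [n]$ is chosen uniformly at random, and $G[S]$ is the induced subgraph. $G[S]$ is then anonymized by a uniformly random permutation $\pi$ (a bijection from $S$ onto $[m]$) to produce $H$. The observer sees $(G, H)$ but neither $S$ nor $\pi$, and aims to recover $S$ (set recovery) and/or $\pi$ (permutation recovery) (Shiu et al., 8 Jan 2026).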
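The generative process can be sketched in Python. This is a minimal illustration under a few assumptions:

- edges are stored as a set of frozenset pairs;
- vertices are numbered from 0 rather than 1;
- the function names are hypothetical and not taken from Shiu et al.

The sampler returns the base graph `G`, the anonymized subgraph `H`, the hidden subset `S`, and the hidden bijection `pi`.

```python
import random
from itertools import combinations


def sample_er(n, p, rng):
    """Sample G(n, p) as a set of frozenset edges on vertices 0..n-1."""
    return {frozenset((u, v)) for u, v in combinations(range(n), 2) if rng.random() < p}


def sample_subgraph_pair(n, m, p, seed=None):
    """Return (G, H, S, pi) for one draw of the subgraph-pair model.

    G: base graph on 0..n-1; S: hidden sorted m-subset; pi: hidden bijection
    S -> {0, ..., m-1}; H: the induced subgraph G[S] relabeled by pi.
    """
    rng = random.Random(seed)
    G = sample_er(n, p, rng)
    S = sorted(rng.sample(range(n), m))
    labels = list(range(m))
    rng.shuffle(labels)  # uniformly random anonymizing permutation
    pi = dict(zip(S, labels))
    H = {frozenset((pi[u], pi[v])) for u, v in combinations(S, 2) if frozenset((u, v)) in G}
    return G, H, S, pi
```

The observer in the model would receive only `G` and `H`. `S` and `pi` are returned here so that a recovery procedure can be scored.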
1.2 Correlated Erdős–Rényi Subgraph Pair (Graph Matching Setting)
A "parent" graph $G \sim G(n,p)$ is generated. Two edge-subsampled graphs $G_1$ and $G_2$ are then created, each including every parent edge independently with probability $s$. One of the graphs, say $G_2$, is vertex-permuted by an unknown bijection $\pi^*$ to give $G_2'$. The analyst receives $(G_1, G_2')$ and aims to recover $\pi^*$ (Du, 17 Feb 2025, Bozorg et al., 2019).
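A minimal sampler for this correlated pair is sketched below. It assumes the second graph is the one that gets permuted. The returned dictionary `pi_star` maps each vertex of the first graph to its label in the permuted copy. All names are hypothetical.

```python
import random
from itertools import combinations


def sample_correlated_pair(n, p, s, seed=None):
    """Parent G(n, p), two independent s-subsamples, second one vertex-permuted.

    Returns (G1, G2_perm, pi_star), where pi_star[v] is the label of G1's vertex v
    in G2_perm.
    """
    rng = random.Random(seed)
    parent = [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]
    G1 = {frozenset(e) for e in parent if rng.random() < s}  # keep each edge w.p. s
    G2 = {frozenset(e) for e in parent if rng.random() < s}  # independent second subsample
    perm = list(range(n))
    rng.shuffle(perm)
    pi_star = dict(enumerate(perm))  # hidden vertex correspondence
    G2_perm = {frozenset(pi_star[x] for x in e) for e in G2}
    return G1, G2_perm, pi_star
```

Each parent edge appears in both subsamples with probability $s^2$. This shared structure is what any matching algorithm must exploit.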
1.3 Agglomerated Subgraph-Pair (Super-vertex Construction)
Start from a binomial random graph $G(n,p)$ and partition its vertex set $[n]$ into disjoint nonempty subsets $V_1, \dots, V_N$ ("super-vertices"). A subgraph-pair model is then defined on the super-vertex set. Two super-vertices are connected iff at least one edge of the original graph runs between their constituent vertices. This construction yields an inhomogeneous random graph at the super-vertex level, with edge probabilities depending on the subset sizes (Kang et al., 2013).
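The contraction step can be illustrated with the sketch below. It assumes `parts` is a list of disjoint vertex sets covering the graph. Super-vertex `i` corresponds to `parts[i]`, and the helper name is hypothetical.

```python
def agglomerate(edges, parts):
    """Contract each part of a vertex partition into a single super-vertex.

    edges: iterable of 2-element tuples or frozensets on the original vertices.
    parts: list of disjoint, nonempty vertex sets covering the vertex set.
    Returns {frozenset((i, j))}: super-vertices i and j are adjacent iff at least
    one original edge runs between parts[i] and parts[j].
    """
    owner = {v: i for i, part in enumerate(parts) for v in part}
    super_edges = set()
    for e in edges:
        u, v = tuple(e)
        i, j = owner[u], owner[v]
        if i != j:  # edges inside a single part vanish under contraction
            super_edges.add(frozenset((i, j)))
    return super_edges
```

Several parallel original edges collapse into one super-edge. This is why the relevant event is "at least one edge" rather than an edge count.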
2. Information-Theoretic Recovery Thresholds
Sharp information-theoretic phase transitions delimit when exact or partial recovery is possible:
2.1 Exact Subgraph Set and Permutation Recovery
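- Set recovery: Achievable when an explicit achievability condition on the model parameters holds, and provably impossible (converse) when a corresponding converse condition holds. Under mild conditions the two meet, giving a sharp threshold (Shiu et al., 8 Jan 2026).
- Permutation recovery: Additionally requires unique labeling, i.e. the induced subgraph must be asymmetric (have only the trivial automorphism). Recovery fails if either the set-recovery converse applies or the induced subgraph has nontrivial automorphisms.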
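Automorphisms defeat permutation recovery for a simple reason. If the induced subgraph has $k$ automorphisms, then $k$ different bijections reproduce the anonymized graph equally well. Even a MAP estimator that knows the subset therefore succeeds with probability at most $1/k$. The brute-force counter below illustrates this on small graphs. It is a sketch, costs $m!$ work, and the function name is hypothetical.

```python
from itertools import permutations


def automorphism_count(m, edges):
    """Count permutations sigma of {0, ..., m-1} that map the edge set onto itself."""
    E = {frozenset(e) for e in edges}
    count = 0
    for sigma in permutations(range(m)):
        image = {frozenset((sigma[u], sigma[v])) for u, v in map(tuple, E)}
        if image == E:
            count += 1
    return count
```

For example, a 3-vertex path has 2 automorphisms (identity and reversal), so its labeling can never be recovered exactly. By contrast, the 6-vertex graph with edges 0-1, 1-2, 2-3, 3-4, 1-5, 2-5 has only the identity.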
2.2 Partial Recovery in Correlated Graphs
For correlated pairs in a suitable parameter regime, one cannot recover all vertices. However, the fraction of recoverable correspondences is characterized tightly in terms of a limiting "balanced-load" distribution:
- The maximal fraction of correctly aligned vertices converges to an explicit limit determined by this distribution (Du, 17 Feb 2025).
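Partial recovery is scored by the fraction of vertices whose estimated image is correct. A minimal version of this overlap metric is sketched below. It assumes both correspondences are dictionaries from vertices of the first graph to labels in the second. Vertices the estimator leaves unmatched count as errors, and the function name is hypothetical.

```python
def alignment_overlap(pi_hat, pi_star):
    """Fraction of vertices v with pi_hat[v] == pi_star[v]; unmatched vertices count as wrong."""
    if not pi_star:
        return 0.0
    correct = sum(1 for v, target in pi_star.items() if pi_hat.get(v) == target)
    return correct / len(pi_star)
```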
These thresholds mark the information-theoretic limits of subgraph alignment and network de-anonymization. Whether an efficient algorithm can reach them is a separate, computational question.
3. Structural and Statistical Properties
3.1 Degree and Clustering Structure
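- For two independent Erdős–Rényi graphs $G(n,p_1)$ and $G(n,p_2)$ on a common vertex set, their edge union is distributed as $G(n,p)$ with $p = 1-(1-p_1)(1-p_2) = p_1 + p_2 - p_1 p_2$. Degrees are therefore $\mathrm{Binomial}(n-1, p)$, and the expected clustering coefficient is $p$ (Wen et al., 2012).
- Agglomerated super-vertex models produce inhomogeneous graphs. Super-vertices of sizes $a$ and $b$ are connected with probability $1-(1-p)^{ab}$, where $p$ is the edge probability of the underlying graph. This enables explicit degree and connectivity computations at the super-vertex level (Kang et al., 2013).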
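Both formulas follow from independence of the individual vertex pairs. An edge is absent from the union only if it is absent from both graphs. Two super-vertices are disconnected only if all $ab$ pairs between them are non-edges. The sketch below encodes both formulas and adds a Monte Carlo check of the first; the function names are hypothetical.

```python
import random
from itertools import combinations


def union_edge_probability(p1, p2):
    """Edge probability of the union of independent G(n, p1) and G(n, p2)."""
    return 1 - (1 - p1) * (1 - p2)


def supervertex_edge_probability(p, a, b):
    """P(at least one of the a*b pairs between two super-vertices is an edge of G(n, p))."""
    return 1 - (1 - p) ** (a * b)


def empirical_union_density(n, p1, p2, seed=0):
    """Monte Carlo check: sample both graphs pair by pair and measure the union's density."""
    rng = random.Random(seed)
    hits = 0
    total = 0
    for _ in combinations(range(n), 2):
        in_g1 = rng.random() < p1  # pair is an edge of the first graph
        in_g2 = rng.random() < p2  # pair is an edge of the second, independent graph
        hits += in_g1 or in_g2
        total += 1
    return hits / total
```

For instance, `union_edge_probability(0.2, 0.3)` gives $1 - 0.8 \cdot 0.7 = 0.44$ up to floating-point rounding. The simulated density for a few hundred vertices lands close to that value.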
3.2 Emergence of Community and Heavy-Tailed Structures
When community sizes are heavy-tailed (for example, power-law distributed), the induced super-vertex network has a scale-free (power-law) degree distribution, with details depending on the partition (Kang et al., 2013).
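A back-of-the-envelope calculation suggests why the degree tail tracks the size tail. This is a heuristic, not a result quoted from Kang et al. Assume:

- super-vertex sizes $a_1, \dots, a_N$ with $\sum_j a_j = n$;
- an underlying $G(n,p)$;
- $p\,a_i a_j \ll 1$ for all pairs.

Then, using $1-(1-p)^k \approx kp$ for small $kp$:

$$
\mathbb{E}[\deg(i)] \;=\; \sum_{j \neq i} \left(1 - (1-p)^{a_i a_j}\right) \;\approx\; \sum_{j \neq i} p\, a_i a_j \;=\; p\, a_i\,(n - a_i).
$$

Under these assumptions the expected degree of a super-vertex grows roughly linearly in its size, so a power-law tail in community sizes carries over to super-vertex degrees. The approximation weakens for the largest communities, where $p\,a_i a_j$ is no longer small and connection probabilities saturate near 1.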
4. Methodologies and Algorithms
4.1 Brute-force (MAP) Estimator
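For subgraph alignment, the optimal MAP estimator tests every vertex subset of the base graph that has the same size as the anonymized subgraph. For each subset it also tests every bijection onto the anonymized vertex labels. It returns the pairs for which relabeling the induced subgraph by the bijection reproduces the observed subgraph. This is computationally intractable but achieves the information-theoretic threshold (Shiu et al., 8 Jan 2026).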
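The exhaustive search can be written directly for tiny instances. The sketch below assumes edges are sets of frozenset pairs, with `G` on vertices `0..n-1` and `H` on `0..m-1`. It returns every consistent (subset, labeling) pair rather than choosing one. Because the edge count is invariant under relabeling, it also skips subsets whose edge count differs from that of `H`. The function name is hypothetical.

```python
from itertools import combinations, permutations


def brute_force_map(n, G, m, H):
    """Return every (S, labels) whose relabeled induced subgraph equals H.

    S is a sorted m-tuple of G-vertices; labels[k] is the H-label assigned to S[k].
    Enumerates C(n, m) * m! candidates, so it is only usable for tiny n.
    """
    solutions = []
    for S in combinations(range(n), m):
        sub = [(u, v) for u, v in combinations(S, 2) if frozenset((u, v)) in G]
        if len(sub) != len(H):
            continue  # edge count is relabeling-invariant, so no bijection can match
        for labels in permutations(range(m)):
            relabel = dict(zip(S, labels))
            if {frozenset((relabel[u], relabel[v])) for u, v in sub} == H:
                solutions.append((S, labels))
    return solutions
```

Consider $G$ with edges 0-1, 1-3, 3-4, 2-5 and $H$ a 3-vertex path centered at label 0. The search returns four solutions: two different subsets, each with the path's two automorphisms. The output is not unique, so such a small instance lies below any exact-recovery threshold.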
4.2 Tail Degree Signature (TDS)
TDS is a polynomial-time, seedless matching algorithm that exploits the robustness of tail-degree statistics in correlated ER graphs. Each vertex's feature vector consists of the sorted extremes (tails) of the degree distributions of its neighbors across several neighborhood shells. Theoretical analysis shows that it achieves the information-theoretic threshold in a particular sparse parameter regime (Bozorg et al., 2019).
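The idea can be illustrated with a simplified, hypothetical version of the feature and a greedy matcher. This sketch does not reproduce the exact TDS definitions or guarantees from Bozorg et al. It rests on three assumptions:

- For each shell at distance 1 to `h`, the feature keeps the `t` largest degrees, zero-padded.
- Signatures are compared by L1 distance.
- Vertices are paired greedily, in the spirit of TDS–g; a Hungarian assignment would replace the greedy loop in a TDS–h-style variant.

```python
from collections import deque


def bfs_shells(adj, root, h):
    """Vertices at exact distance 1, ..., h from root (one list per shell)."""
    dist = {root: 0}
    shells = [[] for _ in range(h)]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue  # do not expand past the last shell
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                shells[dist[w] - 1].append(w)
                queue.append(w)
    return shells


def tail_signature(adj, v, h=2, t=3):
    """Hypothetical TDS-style feature: per shell, the t largest degrees, descending, zero-padded."""
    feats = []
    for shell in bfs_shells(adj, v, h):
        top = sorted((len(adj[w]) for w in shell), reverse=True)[:t]
        feats.extend(top + [0] * (t - len(top)))
    return feats


def greedy_match(adj1, adj2, h=2, t=3):
    """Pair vertices greedily by increasing L1 distance between their signatures."""
    f1 = {v: tail_signature(adj1, v, h, t) for v in adj1}
    f2 = {v: tail_signature(adj2, v, h, t) for v in adj2}
    pairs = sorted(
        (sum(abs(x - y) for x, y in zip(f1[u], f2[w])), u, w) for u in f1 for w in f2
    )
    used1, used2, matching = set(), set(), {}
    for _, u, w in pairs:
        if u not in used1 and w not in used2:
            matching[u] = w
            used1.add(u)
            used2.add(w)
    return matching
```

On a relabeled copy of a graph whose signatures are all distinct, every correct pair has distance 0 and is picked first. In correlated (noisy) pairs, exact ties disappear and the method relies on the tails of the degree distribution being perturbed only slightly by edge subsampling.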
4.3 Complexity
| Algorithm | Time Complexity | Achieves IT Threshold |
|---|---|---|
| Brute-force MAP | Exponential (enumerates all subset/bijection pairs) | Yes (exact recovery); impractical for large graphs |
| TDS–h (Hungarian) | Polynomial | Yes (matching threshold in the sparse regime) |
| TDS–g (Greedy) | Polynomial | Yes, with high probability under the threshold conditions |
5. Phase Transitions and Limit Theorems
5.1 Phase Diagrams in Alignment
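In terms of a suitably normalized control parameter, set recovery is feasible above an upper critical value and infeasible below a lower one, with a grey zone in between. Where the two critical values coincide, the transition is sharp. It separates the regime in which exact recovery is possible from the one in which no estimator can succeed (Shiu et al., 8 Jan 2026).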
5.2 Community Graph Phase Transitions
For agglomerated super-vertex graphs, thresholds for connectivity and giant-component emergence follow from inhomogeneous random graph (IRG) theory (Kang et al., 2013). The key parameter is the average squared community size times the edge probability:
- Below the critical value of this parameter, the largest component contains a vanishing fraction of the super-vertices. Above it, the largest component occupies a linear fraction of the super-vertices.
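The vertex pairs between different pairs of super-vertices are disjoint. Super-vertex edges are therefore independent, and the contracted graph can be sampled directly with connection probability $1-(1-p)^{ab}$. The sketch below does this, using union-find to find components, and reports the fraction of super-vertices in the largest component. Sweeping `p` at fixed sizes traces the transition empirically. The function name is hypothetical, and the sketch does not compute Kang et al.'s critical parameter.

```python
import random
from itertools import combinations


def largest_component_fraction(sizes, p, seed=0):
    """Sample the super-vertex graph (i ~ j w.p. 1 - (1 - p)**(sizes[i] * sizes[j]))
    and return |largest component| / number of super-vertices."""
    rng = random.Random(seed)
    N = len(sizes)
    parent = list(range(N))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j in combinations(range(N), 2):
        if rng.random() < 1 - (1 - p) ** (sizes[i] * sizes[j]):
            parent[find(i)] = find(j)  # merge the two components
    counts = {}
    for v in range(N):
        root = find(v)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values()) / N
```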
6. Connections to Broader Random Graph Models
The ER subgraph-pair model is a special case of subgraph generated models (SUGMs) in which the only generated subgraphs are links. In that case the SUGM reduces exactly to an ER graph whose edge probability equals the link-formation rate. More general SUGMs encode dependence on motifs such as triangles, stars, and cliques, bridging ER structure and higher-order motif-based randomness (Chandrasekhar et al., 2016).
By tuning the types and rates of subgraph "atoms," the model generalizes ER, permitting tractable closed-form expressions for expectations, variances, and parameter inference.
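A minimal SUGM sketch with only two atom types, links and triangles, shows both the reduction to ER and the kind of closed form the model admits. The sketch assumes each pair forms a link with probability `p_link` and each triple forms a triangle with probability `p_tri`, all independently. The network is the union of all generated edges. A pair is then absent only if its own link and all $n-2$ triangles through it fail to form. With `p_tri = 0` the sampler is exactly $G(n, p_\text{link})$. Names are hypothetical.

```python
import random
from itertools import combinations


def sample_sugm(n, p_link, p_tri, seed=0):
    """Union of independently generated links and triangles on vertices 0..n-1."""
    rng = random.Random(seed)
    edges = {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p_link}
    for a, b, c in combinations(range(n), 3):
        if rng.random() < p_tri:  # triangle atom adds all three of its edges
            edges |= {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))}
    return edges


def sugm_edge_probability(n, p_link, p_tri):
    """P(pair is an edge) = 1 - P(no link) * P(none of its n - 2 triangles)."""
    return 1 - (1 - p_link) * (1 - p_tri) ** (n - 2)
```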
7. Applications and Implications
The ER subgraph-pair model underpins rigorous analysis of biological network alignment, privacy and de-anonymization of social networks, and statistical models of network community structure. Its phase diagrams and thresholds provide foundational guarantees for algorithmic graph matching and motif-based inference. Recent work shows that seedless, polynomial-time algorithms built on robust local statistics can attain the fundamental information-theoretic limits in certain regimes. This opens pathways to tractable recovery even under substantial edge noise (Shiu et al., 8 Jan 2026, Du, 17 Feb 2025, Bozorg et al., 2019).
References:
- (Shiu et al., 8 Jan 2026) Information-Theoretic Limits on Exact Subgraph Alignment Problem
- (Du, 17 Feb 2025) Optimal recovery of correlated Erdős-Rényi graphs
- (Bozorg et al., 2019) Seedless Graph Matching via Tail of Degree Distribution for Correlated Erdos-Renyi Graphs
- (Wen et al., 2012) Edge Union of Networks on the Same Vertex Set
- (Chandrasekhar et al., 2016) A Network Formation Model Based on Subgraphs
- (Kang et al., 2013) Evolution of a modified binomial random graph by agglomeration