Erdős–Rényi Subgraph Pair Model
- The Erdős–Rényi Subgraph Pair Model is a formal framework that couples random graphs via subgraph extraction and vertex correspondence for network matching.
- It establishes sharp information-theoretic phase transitions for both exact and partial recovery, guiding practical and adversarial network de-anonymization.
- The model underpins studies in graph alignment and community detection using methods like brute-force MAP estimators and tail degree signature algorithms.
The Erdős–Rényi Subgraph Pair Model is a formal framework for studying random graphs coupled through subgraph extraction and vertex correspondence, and for quantifying the information-theoretic limits of subgraph alignment and network matching. It forms the mathematical substrate for statistical and computational analyses of network alignment, planted subgraph recovery, and correlated random graph pairs. The framework plays a central role in the rigorous treatment of exact and partial graph alignment, particularly for assessing the feasibility and optimality of recovery in both practical and adversarial regimes (Shiu et al., 8 Jan 2026, Du, 17 Feb 2025, Bozorg et al., 2019).
1. Formal Model Definitions
1.1 Subgraph-Pair Model (Alignment/Recovery Setting)
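Let $[n]=\{1,\ldots,n\}$, let $m \le n$ be the subgraph size, and let $p \in (0,1)$ be the edge probability. A base random graph $G \sim G(n,p)$ is sampled on vertex set $[n]$. An $m$-subset $S \subseteq [n]$ is chosen uniformly; $G[S]$ is the induced subgraph, which is then anonymized by a uniformly random permutation $\pi$ to produce $H$. The observer sees $(G, H)$ but neither $S$ nor $\pi$, and aims to recover $S$ (set recovery) and/or $\pi$ (permutation recovery) (Shiu et al., 8 Jan 2026).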
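The sampling procedure above can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited paper: the function name `sample_subgraph_pair`, the 0-based vertex labels, and the symbols `n`, `m`, `p`, `S`, `pi`, `G`, `H` are assumptions chosen for readability.

```python
import random
from itertools import combinations

def sample_subgraph_pair(n, m, p, seed=None):
    """Sample (G, H, S, pi) from a subgraph-pair model (illustrative sketch).

    G  : set of frozenset edges on vertices 0..n-1, each present with prob. p
    S  : sorted list of the m hidden vertices, chosen uniformly at random
    pi : dict mapping each vertex of S to an anonymized label in 0..m-1
    H  : edge set of the induced subgraph G[S], relabeled by pi
    """
    rng = random.Random(seed)
    # Base Erdős–Rényi graph: include each of the C(n, 2) edges independently.
    G = {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p}
    # Hidden uniformly random m-subset.
    S = sorted(rng.sample(range(n), m))
    # Uniformly random relabeling of S onto 0..m-1.
    labels = list(range(m))
    rng.shuffle(labels)
    pi = dict(zip(S, labels))
    # Induced subgraph on S, expressed in the anonymized labels.
    H = {
        frozenset((pi[u], pi[v]))
        for u, v in combinations(S, 2)
        if frozenset((u, v)) in G
    }
    return G, H, S, pi
```

In this sketch the observer would be handed only `G` and `H`; `S` and `pi` are returned so that a recovery procedure can be checked against the ground truth.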
1.2 Correlated Erdős–Rényi Subgraph Pair (Graph Matching Setting)
A "parent" graph $G \sim G(n,p)$ is generated. Two edge-subsampled graphs $G_1, G_2$ are created by independently including each parent edge with probability $s$. One of the graphs is vertex-permuted by an unknown bijection $\pi$. The analyst receives the resulting pair of graphs and aims to recover $\pi$ (Du, 17 Feb 2025, Bozorg et al., 2019).
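A minimal Python sketch of this generative process follows. The function name `sample_correlated_pair` and the parameter names `n` (number of vertices), `p` (parent edge probability), and `s` (subsampling probability) are assumptions introduced for illustration; the cited papers may use different notation.

```python
import random
from itertools import combinations

def sample_correlated_pair(n, p, s, seed=None):
    """Sample two correlated graphs from a common parent (illustrative sketch).

    Returns (G1, G2, perm), where G1 and G2 are sets of frozenset edges and
    perm[v] is the hidden label that vertex v receives in G2.
    """
    rng = random.Random(seed)
    # Parent graph: each edge present independently with probability p.
    parent = [e for e in combinations(range(n), 2) if rng.random() < p]
    # Each child keeps every parent edge independently with probability s.
    G1 = {frozenset(e) for e in parent if rng.random() < s}
    G2_edges = [e for e in parent if rng.random() < s]
    # Hide the vertex correspondence of the second graph.
    perm = list(range(n))
    rng.shuffle(perm)
    G2 = {frozenset((perm[u], perm[v])) for u, v in G2_edges}
    return G1, G2, perm
```

Setting `s = 1.0` makes the two graphs isomorphic copies of the parent, which is a convenient sanity check; smaller `s` weakens the correlation between them.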
1.3 Agglomerated Subgraph-Pair (Super-vertex Construction)
Given a partition of $[n]$ into $k$ disjoint nonempty subsets ("super-vertices"), a subgraph–pair model is defined on the super-vertex set: two super-vertices are connected iff at least one edge exists between their constituent nodes in the original $G(n,p)$ graph. This construction creates an effective inhomogeneous random graph on the super-vertex level, with edge probabilities depending on the subset sizes (Kang et al., 2013).
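The agglomeration rule can be sketched as follows. The function name `agglomerate` and the input representation (an edge list plus a list of vertex blocks) are assumptions made for this illustration.

```python
def agglomerate(edges, blocks):
    """Collapse a graph onto super-vertices (illustrative sketch).

    edges  : iterable of (u, v) vertex pairs of the original graph
    blocks : list of disjoint, nonempty vertex lists; block i is super-vertex i
    Returns the set of super-edges as frozensets {i, j} with i != j.
    """
    # Map every original vertex to the index of its super-vertex.
    block_of = {v: i for i, block in enumerate(blocks) for v in block}
    super_edges = set()
    for u, v in edges:
        a, b = block_of[u], block_of[v]
        # Edges inside a block do not create a super-edge.
        if a != b:
            super_edges.add(frozenset((a, b)))
    return super_edges
```

For example, with edges `[(0, 1), (1, 2), (3, 4)]` and blocks `[[0, 1], [2], [3, 4]]`, only the edge `(1, 2)` crosses blocks, so the result is a single super-edge between super-vertices 0 and 1.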
2. Information-Theoretic Recovery Thresholds
Sharp information-theoretic phase transitions delimit when exact or partial recovery is possible:
2.1 Exact Subgraph Set and Permutation Recovery
- Set Recovery: Achievable iff 4, impossible (converse) if 5, where 6. Under mild conditions, the sharp threshold is 7 (Shiu et al., 8 Jan 2026).
- Permutation Recovery: Requires, in addition, 8 (unique labeling). Fails if either the set recovery converse applies or 9.
2.2 Partial Recovery in Correlated Graphs
For correlated pairs with 0, 1, and 2, one cannot recover all vertices, but the fraction of recoverable correspondences is bounded tightly in terms of a limiting "balanced-load" distribution 3:
- The maximal fraction of accurately aligned vertices approaches 4, with 5 (Du, 17 Feb 2025).
These thresholds delineate computational and information-theoretic feasibility in subgraph alignment and network de-anonymization.
3. Structural and Statistical Properties
3.1 Degree and Clustering Structure
- For two independent Erdős–Rényi graphs $G(n,p_1)$ and $G(n,p_2)$ on a common vertex set, their union is $G(n,p_\cup)$ with $p_\cup = 1-(1-p_1)(1-p_2)$. Degree distributions are binomial, and the clustering coefficient is $p_\cup$, the edge probability of the union (Wen et al., 2012).
- Agglomerated super-vertex models produce inhomogeneous graphs, where the connection probability between super-vertices of sizes $s_i, s_j$ is $1-(1-p)^{s_i s_j}$, enabling explicit degree and connectivity computations at the super-vertex level (Kang et al., 2013).
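Both probabilities in the list above follow from the same complement argument. As a short worked derivation, with $p_1, p_2$ denoting the edge probabilities of the two independent graphs, $p$ the edge probability of the original graph, and $s_i, s_j$ the super-vertex sizes (symbol names assumed here for illustration):

$$
\Pr[\text{edge in union}] = 1 - \Pr[\text{absent in both}] = 1 - (1-p_1)(1-p_2) = p_1 + p_2 - p_1 p_2,
$$

$$
\Pr[\text{super-vertices } i, j \text{ connected}] = 1 - \Pr[\text{all } s_i s_j \text{ cross edges absent}] = 1 - (1-p)^{s_i s_j} \approx p\, s_i s_j \quad \text{when } p\, s_i s_j \ll 1.
$$

The second line uses the fact that two disjoint blocks of sizes $s_i$ and $s_j$ have exactly $s_i s_j$ potential edges between them, each present independently with probability $p$.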
3.2 Emergence of Community and Heavy-Tailed Structures
When community sizes are heavy-tailed (e.g., 2), the induced super-vertex network has a scale-free (power-law) degree distribution, depending on the partition (Kang et al., 2013).
4. Methodologies and Algorithms
4.1 Brute-force (MAP) Estimator
For subgraph alignment, the optimal MAP estimator tests all $m$-subsets $S \subseteq [n]$ of the base graph's vertex set and all bijections $\pi$ from $S$ onto the vertices of the observed graph $H$, returning those for which relabeling the induced subgraph $G[S]$ by $\pi$ reproduces $H$. This is computationally intractable but achieves the information-theoretic threshold (Shiu et al., 8 Jan 2026).
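The exhaustive search can be sketched directly, which also makes its cost visible: the loops run over $\binom{n}{m} \cdot m!$ candidates. This is a toy illustration under assumed conventions (0-based labels, graphs as sets of `frozenset` edges, the hypothetical name `brute_force_align`), not the estimator's implementation from the cited paper.

```python
from itertools import combinations, permutations

def brute_force_align(G, H, n, m):
    """Enumerate all (S, pi) consistent with the observation (illustrative sketch).

    G : set of frozenset edges on vertices 0..n-1
    H : set of frozenset edges on anonymized labels 0..m-1
    Returns every pair (S, pi) such that relabeling G[S] by pi equals H.
    """
    matches = []
    for S in combinations(range(n), m):          # candidate vertex subsets
        for image in permutations(range(m)):     # candidate relabelings
            pi = dict(zip(S, image))
            relabeled = {
                frozenset((pi[u], pi[v]))
                for u, v in combinations(S, 2)
                if frozenset((u, v)) in G
            }
            if relabeled == H:
                matches.append((S, pi))
    return matches
```

When several candidates are returned, the observation alone does not distinguish between them; a unique subset among the matches corresponds to successful set recovery, and a unique pair to successful permutation recovery.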
4.2 Tail Degree Signature (TDS)
TDS is a polynomial-time, seedless matching algorithm exploiting the robustness of tail-degree statistics in correlated ER graphs. Feature vectors consist of sorted extremes of neighbor degree distributions across multiple neighborhood shells. Theoretical analysis shows it achieves the information-theoretic threshold 9 in regime 0 (Bozorg et al., 2019).
Complexity
| Algorithm | Time Complexity | Achieves IT Threshold |
|---|---|---|
| Brute-force MAP | Exponential (1) | Yes (exact recovery), not practical for large 2 |
| TDS–h (Hungarian) | 3 | Yes (matching threshold for 4, sparse regime) |
| TDS–g (Greedy) | 5 | Yes, with high probability under threshold conditions |
5. Phase Transitions and Limit Theorems
5.1 Phase Diagrams in Alignment
Define 6. Set recovery is feasible for 7, infeasible for 8, with a grey zone in between. Sharp phase transitions demarcate algorithmic possibility from impossibility (Shiu et al., 8 Jan 2026).
5.2 Community Graph Phase Transitions
For agglomerated super-vertex graphs, thresholds for connectivity and giant component emergence follow from inhomogeneous random graph (IRG) theory (Kang et al., 2013). The key parameter is 9, the average squared community size times edge probability:
- Largest component vanishes if 0, occupies 1 super-vertices if 2.
6. Connections to Broader Random Graph Models
The ER subgraph-pair model is a special case of subgraph generated models (SUGMs), where the only generated subgraphs are links (3 type), with SUGM reducing exactly to ER(4). More general SUGMs encode dependency on motifs such as triangles, stars, and cliques, bridging ER structure and higher-order motif-based randomness (Chandrasekhar et al., 2016).
By tuning the types and rates of subgraph "atoms," the model generalizes ER, permitting tractable closed-form expressions for expectations, variances, and parameter inference.
7. Applications and Implications
The ER subgraph-pair model underpins rigorous analysis of biological network alignment, privacy and de-anonymization of social networks, and statistical models of network community structure. Its phase diagrams and thresholds provide foundational guarantees for algorithmic graph matching and motif-based inference. Recent advances demonstrate that truly seedless and polynomial-time algorithms can saturate the fundamental information-theoretic limits via robust local statistics, revealing new pathways for tractable recovery in high-noise regimes (Shiu et al., 8 Jan 2026, Du, 17 Feb 2025, Bozorg et al., 2019).
References:
- (Shiu et al., 8 Jan 2026) Information-Theoretic Limits on Exact Subgraph Alignment Problem
- (Du, 17 Feb 2025) Optimal recovery of correlated Erdős-Rényi graphs
- (Bozorg et al., 2019) Seedless Graph Matching via Tail of Degree Distribution for Correlated Erdos-Renyi Graphs
- (Wen et al., 2012) Edge Union of Networks on the Same Vertex Set
- (Chandrasekhar et al., 2016) A Network Formation Model Based on Subgraphs
- (Kang et al., 2013) Evolution of a modified binomial random graph by agglomeration