Duplication-Divergence Growing Graphs
- Duplication-divergence growing graphs are stochastic network models where nodes duplicate and selectively retain or drop edges to mirror real-world systems.
- They employ probabilistic edge copying with parameters p and r, leading to heterogeneous degree distributions, phase transitions, and rich connectivity regimes.
- Analytical techniques such as master equations, martingale concentration, and Markov chain embeddings rigorously establish scaling laws and structural properties of these graphs.
Duplication-divergence growing graphs are a class of stochastic network models in which network evolution is driven by the mechanisms of node (vertex) duplication and subsequent divergence, typically via partial retention or deletion of edges. This framework was developed to capture structural properties observed in real-world systems such as biological (e.g., protein–protein interaction) and social networks, where new vertices arise by copying the interaction patterns of existing ones and then undergo random loss or rewiring of links. These models are mathematically tractable and serve as canonical descriptions for the emergence of heterogeneous degree distributions, high clustering, and modular connectivity in large-scale sparse networks.
1. Formal Model Definition
A canonical duplication-divergence model, such as the DD(t, p, r) model, operates as follows (Frieze et al., 2023):
- Start with an initial simple graph on a small number of vertices.
- For each time step t: pick an existing vertex u uniformly at random and add a new vertex v.
- If w is a neighbor of u, attach v to w independently with probability p (partial edge retention).
- If w is not a neighbor of u, attach v to w independently with probability r/t (for background rewiring).
- All Bernoulli trials are independent.
This encompasses pure and partial duplication, allowance for background mutation (r > 0), as well as tunable divergence probability (1 − p).
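The growth rule just described can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the two-vertex seed graph and the exact background-mutation convention (an r/t trial for every vertex not already linked to the copy) are assumptions of this sketch, since published variants differ in these details.

```python
import random

def duplication_divergence(t_max, p, r, seed=None):
    """Grow a duplication-divergence graph with t_max vertices.

    At each step a uniformly random parent is duplicated; each parental
    edge is retained by the copy with probability p, and the copy attaches
    to each remaining vertex with probability r / t (background mutation).
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}              # seed graph: a single edge (assumption)
    for t in range(2, t_max):
        parent = rng.randrange(t)       # uniform random existing vertex
        adj[t] = set()
        for w in adj[parent]:           # duplication + divergence
            if rng.random() < p:
                adj[t].add(w)
                adj[w].add(t)
        for w in range(t):              # background mutation / rewiring
            if w not in adj[t] and rng.random() < r / t:
                adj[t].add(w)
                adj[w].add(t)
    return adj
```

The dict-of-sets representation keeps the example dependency-free; any adjacency structure would do.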
Several generalizations exist:
- Asymmetric/symmetric coupled divergence: Different probabilities for edge loss from parent and copy (Borrelli, 2024, Borrelli, 11 Jan 2026).
- Additional mutation/dimerization steps: new random links or an enforced parent–copy (dimerization) connection (Borrelli, 18 Jun 2025).
- Edge/vertex deletion or rewiring between steps (Barbour et al., 2021).
For directed graphs, edges may be duplicated only in the outgoing or incoming direction, possibly also adding deterministic citations to the parent (Steinbock et al., 2018).
2. Degree Distribution and Concentration Phenomena
The asymptotic degree distribution in duplication-divergence models exhibits rich phenomenology:
- Concentration of Maximum and Average Degree:
- For the DD(t, p, r) model and any p in (0, 1):
- The maximum degree is concentrated around t^p, up to polylogarithmic factors, with failure probability smaller than any polynomial: for any constant A > 0, the concentration bounds hold with probability at least 1 − t^(−A).
- The average degree concentrates around its polynomial growth rate, with similar high-probability bounds (Frieze et al., 2023).
- Threshold and Phase Regimes:
- For p < 1/2, the average degree remains bounded as t grows.
- For p > 1/2, the average degree sharply concentrates around t^(2p−1) (Frieze et al., 2023).
- There is no phase transition in the scaling of the maximum degree at p = 1/2, in contrast to the average degree (Frieze et al., 2023).
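The phase transition in the average degree can be made plausible by a mean-field heuristic: the expected parent degree is 2E_t/t, so the expected edge count evolves roughly as E_{t+1} = E_t(1 + 2p/t) + r, giving a bounded average degree for p < 1/2 and growth like t^(2p−1) for p > 1/2. The recurrence below is an illustration of that heuristic, not a result taken from the cited papers.

```python
def mean_field_avg_degree(t_max, p, r):
    """Iterate the heuristic recurrence E_{t+1} = E_t * (1 + 2p/t) + r
    and return the average degree 2 * E_t / t at t = t_max."""
    edges = 1.0                       # seed graph: one edge at t = 2
    for t in range(2, t_max):
        edges = edges * (1.0 + 2.0 * p / t) + r
    return 2.0 * edges / t_max

# Subcritical p < 1/2: average degree saturates near 2r / (1 - 2p).
# Supercritical p > 1/2: average degree keeps growing like t^(2p - 1).
```

Doubling t barely changes the subcritical value but multiplies the supercritical one by roughly 2^(2p−1).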
- Tail Behavior:
- In basic models with background mutation (r > 0), the limiting degree distribution is not always a pure power law:
- With specific duplication/deletion rules, the degree distribution decays as a stretched exponential rather than a power law (Backhausz et al., 2013).
- In mean-field models, a power-law tail k^(−γ) can emerge in specific partial-duplication regimes, with the exponent γ solving a transcendental fixed-point equation in the retention probability p (Borrelli, 18 Jun 2025).
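A commonly quoted mean-field characterization of the partial-duplication exponent is the classical fixed-point equation p(γ − 1) = 1 − p^(γ−1) (in the style of Chung–Lu–Dewey–Galas); whether this is the exact equation intended by the cited mean-field analysis is an assumption of this sketch. Its nontrivial root can be found by bisection:

```python
def power_law_exponent(p, lo=1.5, hi=20.0, tol=1e-10):
    """Bisection for the nontrivial root of f(g) = p*(g-1) - 1 + p**(g-1),
    a mean-field fixed-point equation for the degree exponent.  A
    nontrivial root is bracketed only for small enough retention
    probability p (roughly p < 0.567)."""
    f = lambda g: p * (g - 1.0) - 1.0 + p ** (g - 1.0)
    assert f(lo) < 0 < f(hi), "no sign change: nontrivial root not bracketed"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A useful sanity check on this form: at p = 1/2 the nontrivial root is exactly γ = 2, since 0.5·(2−1) = 1 − 0.5^(2−1).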
- Central Limit Theorem for Log-Degree:
- For robust supercritical duplication-divergence models, a central limit theorem holds for the log-degree: centered by its mean of order log t and scaled by sqrt(log t), the log-degree converges in distribution to a Gaussian, with centering and variance constants determined by the effective birth and catastrophe rates (Barbour et al., 2021).
3. Parameter Regimes, Structural Transitions, and Component Behavior
Parameter tuning in duplication-divergence models controls key network properties (Borrelli, 18 Jun 2025, Borrelli, 2024, Borrelli, 11 Jan 2026):
- Divergence Rate and Densification:
- Each parental edge is retained with probability p; the divergence (edge-loss) rate is 1 − p.
- For p > 1/2, the expected number of edges grows superlinearly with time; the network densifies.
- For p < 1/2, graphs remain sparse; average degree and edge count grow sublinearly.
- Asymmetry Parameter in Divergence (Borrelli, 2024):
- At its extreme values, the asymmetry parameter yields completely asymmetric divergence (edges lost only from the copy or only from the parent; a single giant component).
- At the balanced, symmetric setting, edges are lost from parent and copy with equal probability, generating fragmented components with power-law distributed sizes.
- Component Size and Percolation:
- Models with symmetric divergence exhibit a nontrivial phase transition in the emergence of a giant component as the divergence rate increases.
- For the symmetric coupled divergence model, the critical divergence rate for the appearance of a giant component has been estimated numerically (Borrelli, 11 Jan 2026).
- The component-size distribution follows a power law in the component size, evidencing heavy-tailed modularity (Borrelli, 2024).
- Euler Characteristic:
- The locus where the Euler characteristic (for a graph, the number of vertices minus the number of edges) vanishes marks a singularity in network structure and coincides with the percolation transition (Borrelli, 11 Jan 2026).
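For a graph, the Euler characteristic reduces to χ = N − E (vertices minus edges), and the percolation transition can be probed by tracking the size of the largest connected component. A small self-contained sketch over an adjacency-set representation (the dict-of-sets input format is an assumption of this illustration):

```python
from collections import deque

def euler_characteristic(adj):
    """chi = N - E for a simple undirected graph given as {v: set_of_neighbors}."""
    n = len(adj)
    e = sum(len(nbrs) for nbrs in adj.values()) // 2   # each edge counted twice
    return n - e

def largest_component_fraction(adj):
    """Fraction of vertices in the largest connected component (BFS)."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        comp, queue = 0, deque([s])
        seen.add(s)
        while queue:
            v = queue.popleft()
            comp += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, comp)
    return best / len(adj)
```

Scanning these two quantities along a divergence-rate sweep is one way to locate the χ = 0 locus and the percolation point in simulations.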
4. Analytical Techniques and Proof Schemes
Multiple rigorous and mean-field analytical techniques have been deployed (Frieze et al., 2023):
- Martingale and Chernoff-type Concentration:
- Proofs for maximum and average degree concentration use telescoping meshes, careful deterministic envelopes, and repeated application of Chernoff bounds on degree increments.
- Martingale methods establish convergence (and concentration) of the total degree and its increments.
- Central recurrences for degree growth mimic polynomial trajectories in t (for example, a tagged vertex's expected degree grows like t^p).
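In expectation, a tagged vertex's degree obeys the recurrence d_{t+1} = d_t(1 + p/t): at each step it gains a neighbor when the duplicated parent is adjacent to it and the edge is retained. Iterating this (a heuristic sketch that ignores the r-term and all fluctuations) exhibits the polynomial trajectory d_t ≈ C·t^p that the concentration proofs envelope:

```python
def expected_degree_trajectory(d0, t0, t_max, p):
    """Iterate d_{t+1} = d_t * (1 + p / t) from (t0, d0) up to t_max and
    return d_t / t**p, which stabilizes to a constant C > 0 -- i.e. the
    expected degree follows a t^p trajectory."""
    d = float(d0)
    for t in range(t0, t_max):
        d *= 1.0 + p / t
    return d / t_max ** p
```

The returned ratio barely changes once t is large, which is exactly the "deterministic envelope" the martingale arguments concentrate around.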
- Master Equation Analysis:
- Degree distributions are derived from master equations for vertex counts of given degree (often using binomial thinning per copied edge).
- Linear recurrences, eigen-decomposition of the transition matrix, and asymptotics (via generating functions) yield stationary or non-stationary degree laws (Sudbrack et al., 2017, Borrelli, 18 Jun 2025).
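The "binomial thinning per copied edge" ingredient means: if the duplicated parent has degree j, the copy's retained degree is Binomial(j, p). Applied to a whole degree distribution, the thinning map looks as follows (this sketch covers only the newcomer's degree, not the simultaneous degree increments of retained neighbors):

```python
from math import comb

def binomial_thinning(dist, p):
    """Push a degree distribution {j: prob} through per-edge retention:
    each of j parental edges is kept independently with probability p,
    so a parent of degree j yields a copy of degree k ~ Binomial(j, p)."""
    out = {}
    for j, pj in dist.items():
        for k in range(j + 1):
            out[k] = out.get(k, 0.0) + pj * comb(j, k) * p**k * (1 - p)**(j - k)
    return out
```

Because thinning is linear in the distribution, it is exactly the kind of operator whose eigen-decomposition and generating functions the master-equation analyses exploit.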
- Markov Chain and Birth-Catastrophe Process Embeddings:
- Tagged-vertex Markov chains with duplication (birth) and divergence (catastrophe) transitions map the evolution of degree for specific vertices.
- Quasi-stationary distributions and critical behavior are characterized by spectral equations for the Markov transition generator.
- Union Bounds and High-Probability Analysis:
- Application of Chernoff-type union bounds over vertices and time steps ensures superpolynomial concentration of extremal quantities (Frieze et al., 2023).
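The Chernoff-type bounds used in such union-bound arguments take the standard form P(X ≥ (1 + δ)μ) ≤ exp(−δ²μ/3) for X ~ Binomial(n, q) with mean μ = nq and 0 < δ ≤ 1. A quick empirical sanity check of that inequality (an illustration of the tool, not of any specific proof in the cited papers):

```python
import math, random

def chernoff_upper(mu, delta):
    """Upper-tail Chernoff bound exp(-delta^2 * mu / 3), valid for 0 < delta <= 1."""
    return math.exp(-delta * delta * mu / 3.0)

def empirical_upper_tail(n, q, delta, trials, seed=0):
    """Empirical frequency of X >= (1 + delta) * n * q for X ~ Binomial(n, q)."""
    rng = random.Random(seed)
    threshold = (1.0 + delta) * n * q
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() < q for _ in range(n))
        if x >= threshold:
            hits += 1
    return hits / trials
```

In the proofs, bounds of this shape are summed over all vertices and time steps, which is why the failure probability can be driven below any polynomial.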
5. Open Problems and Structural Invariants
Despite extensive progress, several open questions persist (Frieze et al., 2023, Borrelli, 18 Jun 2025):
- The exact limiting law and support of the normalized maximum degree (the maximum degree divided by t^p) remain undetermined.
- Proving the existence of a true power-law degree tail in the generic DD(t, p, r) model is unresolved, with known special cases showing only stretched-exponential decay (Backhausz et al., 2013).
- The full degree distribution, especially in models with background mutation or asymmetric divergence, lacks a comprehensive description.
- Further analysis is needed on component sizes, motif frequencies, graph automorphism groups, and efficient encoding schemes for duplication-divergence-generated networks.
6. Biological and Network Science Relevance
Duplication-divergence models capture central aspects of biological network evolution, such as gene or protein duplication followed by interaction loss (Borrelli, 18 Jun 2025). Empirical tests on biological datasets, such as protein-protein interaction or genetic regulatory networks, reveal signatures (e.g., negative deviation from expected distinguishability number) consistent with pure duplication–deletion histories (Crawford-Kahrl et al., 2021).
The general class of models unifies fundamental mechanisms in network science:
- Emergence of scale-free (or nearly scale-free) degree distributions.
- High clustering and modularity, exceeding those of preferential-attachment or Erdős–Rényi graphs.
- Phase transitions in connectivity structure driven by model parameters.
The duplication-divergence framework thus underpins a statistical-mechanics approach to natural network formation and has become a touchstone for analytic exploration of non-equilibrium network growth (Borrelli, 18 Jun 2025, Borrelli, 11 Jan 2026, Frieze et al., 2023).
Key References:
- "On the concentration of the maximum degree in the duplication-divergence models" (Frieze et al., 2023)
- "Duplication-divergence growing graph models" (Borrelli, 18 Jun 2025)
- "Divergence asymmetry and connected components in a general duplication-divergence graph model" (Borrelli, 2024)
- "Largest connected component in duplication-divergence growing graphs with symmetric coupled divergence" (Borrelli, 11 Jan 2026)
- "Asymptotic properties of a random graph with duplications" (Backhausz et al., 2013)
- "Genetic Networks Encode Secrets of Their Past" (Crawford-Kahrl et al., 2021)
- "The expected degree distribution in transient duplication divergence models" (Barbour et al., 2021)
- "Large-scale behavior of the partial duplication random graph" (Hermann et al., 2014)