Rank–Nexus: Invariants in Phylogenetic Networks
- Rank–Nexus is a framework extending rank invariants from tree structures to phylogenetic networks by using tensor flattenings to identify network clades.
- It employs tensor flattenings of joint site distributions under the general Markov model to derive rank constraints, ensuring all 5×5 minors vanish for genuine network clades.
- The framework adapts to equivariant models by decomposing the flattening matrix into block-diagonal forms, which facilitates efficient detection of reticulate events and network topologies.
Rank–Nexus describes a framework for extending rank invariants from phylogenetic trees to phylogenetic networks, particularly in the context of molecular sequence evolution. This approach generalizes the classical rank-based invariants that characterize tree-like evolutionary relationships, providing analogous constraints (termed "Rank–Nexus invariants") for partially reticulated, acyclic, rooted graphs under both the general Markov model (GMM) and group-equivariant submodels. The principal technical device is the rank of certain tensor flattenings of the joint site distribution, which discriminates between different network topologies and reveals network-clades embedded in reticulate histories (Casanellas et al., 2020).
1. Structural Foundations and Notation
Phylogenetic networks in Rank–Nexus are specifically tree-child binary networks: rooted, acyclic, directed graphs representing labeled taxa. The structure satisfies the following:
- The root has indegree 0 and outdegree 2.
- Each leaf has indegree 1 and outdegree 0, uniquely labeled from $1$ to $n$.
- Interior nodes are either tree-vertices (indegree 1, outdegree 2) or reticulation-vertices (indegree 2, outdegree 1), with every reticulation’s child required to be a tree-vertex.
Evolutionary processes on the network are modeled by assigning a Markov transition matrix $M_e$ to each edge $e$ and an initial distribution $\pi$ on the nucleotide states $\{\mathtt{A},\mathtt{C},\mathtt{G},\mathtt{T}\}$ at the root. This parameterization governs the distribution over character patterns at the leaves.
Reticulations preclude the network from specifying a unique tree. Each of the $r$ reticulation vertices admits a binary choice selecting one of its two incoming edges, so each $\sigma \in \{0,1\}^r$ reduces the network $\mathcal{N}$ to a tree $T_\sigma$. The induced leaf-pattern distribution is then a mixture over the $2^r$ possible tree resolutions:
$$p_{\mathcal{N}} \;=\; \sum_{\sigma \in \{0,1\}^r} w_\sigma \, p_{T_\sigma}, \qquad w_\sigma \geq 0, \quad \sum_\sigma w_\sigma = 1,$$
where $w_\sigma$ denotes the site-inheritance probabilities.
2. Tensor Flattenings and Rank-Invariants
For a bipartition $A|B$ of the leaf set, the observed joint distribution $p$ can be viewed as a vector in $(\mathbb{C}^4)^{\otimes n}$, and the flattening operation reshapes $p$ into a $4^{|A|} \times 4^{|B|}$ matrix with entries
$$\left[ \mathrm{flatt}_{A|B}(p) \right]_{i,j} \;=\; p\left(\text{states } i \text{ at the leaves of } A,\ \text{states } j \text{ at the leaves of } B\right).$$
In the traditional GMM on a phylogenetic tree, if $A|B$ corresponds to an edge split at an internal node $v$, the flattening factors as $\mathrm{flatt}_{A|B}(p) = M_A^{\top}\,\mathrm{diag}(\pi_v)\,M_B$, with $M_A$, $M_B$ the conditional probability matrices from $v$ to the leaves of $A$ and $B$, and $\pi_v$ the distribution at $v$. This structure implies:
$$\mathrm{rank}\,\mathrm{flatt}_{A|B}(p) \;\leq\; 4.$$
Casanellas and Fernández-Sánchez extend this invariant to networks: if $A$ forms a network-clade (a subtree present in every tree resolution), the factorization remains valid after the network-to-tree mixture. Consequently, the rank bound $\mathrm{rank}\,\mathrm{flatt}_{A|B}(p) \leq 4$ applies, and all $5\times 5$ minors must vanish: these polynomial relations constitute the Rank–Nexus invariants.
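As a concrete check of the tree case, the sketch below (illustrative only, not from the source; all parameter choices are arbitrary random draws) builds a quartet distribution under the GMM on the tree $((1,2),(3,4))$, flattens along the matching split, and confirms the rank bound numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_markov(k=4):
    """Random k x k row-stochastic (Markov) matrix."""
    M = rng.random((k, k))
    return M / M.sum(axis=1, keepdims=True)

# GMM parameters on the quartet tree ((1,2),(3,4)):
# hidden node u (ancestor of leaves 1,2) and v (ancestor of 3,4).
pi = rng.random(4); pi /= pi.sum()          # distribution at u
Muv = rand_markov()                          # transition u -> v
M1, M2, M3, M4 = (rand_markov() for _ in range(4))

# Joint leaf distribution p(i1, i2, i3, i4).
p = np.einsum('u,uv,ua,ub,vc,vd->abcd', pi, Muv, M1, M2, M3, M4)

# Flattening along the split 12|34: a 16 x 16 matrix.
F = p.reshape(16, 16)

# Rank check: at most 4 singular values above numerical noise.
sv = np.linalg.svd(F, compute_uv=False)
print(np.sum(sv > 1e-12))   # rank: at most 4, generically exactly 4
```

The key structural fact is that `F` factors through the 4-dimensional state space at the hidden node `u`, which is exactly why only four singular values survive.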
3. Equivariant Model Extensions
For many biological models, transition matrices exhibit symmetry under a permutation group $G \leq \mathfrak{S}_4$ of the nucleotide states. These G-equivariant models—including Jukes–Cantor ($G = \mathfrak{S}_4$), Kimura 2-parameter, Kimura 3-parameter, and strand-symmetric models—constrain the distribution to the $G$-invariant subspace of $(\mathbb{C}^4)^{\otimes n}$. Maschke’s theorem yields a simultaneous decomposition of the state spaces into isotypic components and, correspondingly, a change of basis in which the flattening becomes block-diagonal:
$$\mathrm{flatt}_{A|B}(p) \;\cong\; \bigoplus_{\omega} F_\omega,$$
with block sizes determined by the isotypic decomposition of $(\mathbb{C}^4)^{\otimes |A|}$ and $(\mathbb{C}^4)^{\otimes |B|}$. The Rank–Nexus bound now reads:
$$\mathrm{rank}\, F_\omega \;\leq\; m_\omega$$
for each block $F_\omega$, where $m_\omega$ is the multiplicity of the irreducible representation $\omega$ in $\mathbb{C}^4$, with the vanishing $(m_\omega+1)\times(m_\omega+1)$ minors constituting the system of invariants specific to the equivariant model.
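The symmetry machinery can be illustrated in its simplest case: the Kimura 3-parameter model is group-based over $\mathbb{Z}/2 \times \mathbb{Z}/2$, so a single equivariant transition matrix is already diagonalized (blocks of size $1$, since the group is abelian) by the character table, which is the $4\times 4$ Hadamard matrix. A minimal numpy check of this fact (the rate values below are arbitrary):

```python
import numpy as np

# Kimura 3-parameter matrices are "circulant" over Z/2 x Z/2:
# M[i, j] = f(i XOR j) for a function f on {0, 1, 2, 3}.
f = np.array([0.85, 0.07, 0.05, 0.03])   # identity + 3 substitution classes
M = np.array([[f[i ^ j] for j in range(4)] for i in range(4)])

# Characters of Z/2 x Z/2 form the 4x4 Hadamard matrix:
# H[k, g] = (-1)^popcount(k AND g), with H @ H = 4 * I.
H = np.array([[(-1) ** bin(k & g).count('1') for g in range(4)]
              for k in range(4)])

# Change of basis: H M H^{-1} = H M H / 4 is diagonal, with
# eigenvalues given by the discrete Fourier transform of f.
D = H @ M @ H / 4
off = D - np.diag(np.diag(D))
print(np.allclose(off, 0))               # True: M is diagonalized
print(np.allclose(np.diag(D), H @ f))    # True: eigenvalues = H f
```

For non-abelian groups such as $\mathfrak{S}_4$ (Jukes–Cantor) the same construction yields genuine blocks rather than scalars, and the flattening block-diagonalizes in tensor powers of the symmetry-adapted basis.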
4. Theorems and Proof Structure
Theorem 2.1 (GMM Rank–Nexus)
Let $\mathcal{N}$ be a tree-child binary network and $A$ a subset of its leaves forming a true subtree (no internal reticulations). Denoting $B$ the complementary set of leaves, then for any GMM parameters on $\mathcal{N}$, $\mathrm{flatt}_{A|B}(p)$ has rank at most $4$; thus, all $5\times 5$ minors vanish.
Sketch: Reroot every tree resolution $T_\sigma$ at the root of the subtree on $A$. Because this subtree is reticulation-free and common to all resolutions, every $\mathrm{flatt}_{A|B}(p_{T_\sigma})$ shares the same $A$-side factor; the mixture therefore factors through the common $4$-dimensional state space at that root and has rank at most $4$. (The shared factor is essential: a convex sum of rank-$4$ matrices is not rank-bounded in general.)
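The point of the argument can be checked numerically (arbitrary random factors, illustrative only): mixands sharing a left factor stay at rank $4$, while a generic convex combination of two rank-$4$ matrices reaches rank $8$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two rank-4 "flattenings" sharing the left (A-side) factor L:
L = rng.random((16, 4))                    # common A-side factor
N0, N1 = rng.random((4, 16)), rng.random((4, 16))
F0, F1 = L @ N0, L @ N1

w = 0.3
shared = w * F0 + (1 - w) * F1             # = L @ (w*N0 + (1-w)*N1)

# Contrast: mixands with independent left factors.
G0 = rng.random((16, 4)) @ rng.random((4, 16))
G1 = rng.random((16, 4)) @ rng.random((4, 16))
generic = w * G0 + (1 - w) * G1

rank = lambda X: np.linalg.matrix_rank(X, tol=1e-10)
print(rank(shared), rank(generic))         # 4 8
```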
Theorem 2.2 (Equivariant Rank–Nexus)
Under a $G$-equivariant model for $G \leq \mathfrak{S}_4$ and the same clade partition $A|B$, the block-diagonal flattening has block ranks bounded by the multiplicities $m_\omega$, with all corresponding $(m_\omega+1)\times(m_\omega+1)$ minors vanishing.
Sketch: Equivariance ensures the mixture factors through the appropriate spaces on each isotypic component, enforcing the blockwise rank constraints (Casanellas et al., 2020).
5. Illustrative Example
Consider a four-leaf network $\mathcal{N}$ with a single reticulation vertex on the $\{3,4\}$ side, so that leaves $1,2$ (assigned to $A$) form a tree-clade and leaves $3,4$ are assigned to $B$:
- $\mathcal{N}$ admits two tree resolutions $T_0, T_1$, indexed by the choice of incoming reticulation edge $\sigma \in \{0,1\}$.
- Given a root distribution $\pi$ and edge Markov matrices $M_e$, the flattening $\mathrm{flatt}_{12|34}(p)$ is a $16 \times 16$ matrix.
- For each resolution $T_\sigma$, the flattening factors through a $4$-dimensional inner space with the same $A$-side factor, so the mixture $w\,\mathrm{flatt}_{12|34}(p_{T_0}) + (1-w)\,\mathrm{flatt}_{12|34}(p_{T_1})$ has rank at most $4$.
- Numerically, singular value decomposition confirms at most four nonzero singular values; all $5\times 5$ minors vanish.
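A numerical version of this example (illustrative parameters, not from the source): two tree resolutions share the root distribution and the edges above leaves $1,2$, while the reticulation varies the $\{3,4\}$ side; the mixed flattening still has rank at most $4$.

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_markov():
    M = rng.random((4, 4))
    return M / M.sum(axis=1, keepdims=True)

# Shared parameters for the clade {1,2}: both resolutions use the
# same root distribution and the same edges above leaves 1 and 2.
pi = rng.random(4); pi /= pi.sum()
M1, M2 = rand_markov(), rand_markov()

def tree_dist(Muv, M3, M4):
    """Quartet distribution for one resolution with cherry {1,2}."""
    return np.einsum('u,uv,ua,ub,vc,vd->abcd', pi, Muv, M1, M2, M3, M4)

# Two tree resolutions T0, T1 (reticulation on the {3,4} side),
# mixed with inheritance probability w.
w = 0.6
p = w * tree_dist(rand_markov(), rand_markov(), rand_markov()) \
    + (1 - w) * tree_dist(rand_markov(), rand_markov(), rand_markov())

sv = np.linalg.svd(p.reshape(16, 16), compute_uv=False)
print(sv[3] > 1e-10, sv[4] < 1e-10)   # True True: rank is exactly 4
```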
6. Applications and Computational Practices
Detecting Tree-Clades and Reticulations
Given site-frequency data $\hat{p}$ from $n$ taxa, if the data arise from a GMM on a network containing a tree-clade $A$, then $\mathrm{flatt}_{A|B}(\hat{p})$ should empirically reflect the rank bound. Conversely, partitions not supported by a true subtree generally exhibit ranks significantly exceeding $4$. Scanning bipartitions and evaluating the decay in singular values beyond the theoretical threshold reveals tree-like subsets and, in turn, network topology.
Distinguishing Network Topologies
As different networks may share certain clades but not others, their collections of Rank–Nexus invariants differ. Non-vanishing of specified minors can rule out network hypotheses, thus extending tree-invariant methodology to reticulate models.
Algorithmic and Practical Considerations
Enumerating all $5\times 5$ minors is intractable; typical practice involves:
- Selecting a relevant bipartition $A|B$
- Forming the empirical flattening $\mathrm{flatt}_{A|B}(\hat{p})$
- Deploying SVD (worst-case cost $O(mn\min(m,n))$ for an $m\times n$ flattening) and inspecting the singular-value spectrum
- For equivariant models, first block-diagonalizing via the symmetry-adapted change of basis, reducing the problem to analyzing smaller blockwise SVDs.
For realistic numbers of taxa and alignment lengths, these procedures are computationally feasible (Casanellas et al., 2020).
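One possible shape for such a scan is sketched below; the helper names `clade_score` and `scan` are hypothetical, and the normalized tail singular-value mass is only an illustrative statistic (a principled test would account for sampling error):

```python
import numpy as np
from itertools import combinations

def clade_score(p_hat, A):
    """Normalized singular-value mass beyond rank 4 for the A|B flattening.

    Near-zero scores mean the bipartition respects the rank-4 bound,
    i.e. A behaves like a tree-clade. Illustrative statistic only.
    """
    n = p_hat.ndim
    B = [i for i in range(n) if i not in A]
    F = np.transpose(p_hat, axes=list(A) + B).reshape(4 ** len(A), -1)
    sv = np.linalg.svd(F, compute_uv=False)
    return float(np.sqrt((sv[4:] ** 2).sum() / (sv ** 2).sum()))

def scan(p_hat):
    """Score every 2-element subset of the taxa."""
    n = p_hat.ndim
    return {A: clade_score(p_hat, A) for A in combinations(range(n), 2)}

# Synthetic sanity check on a quartet tree ((0,1),(2,3)) under the GMM.
rng = np.random.default_rng(3)
def rand_markov():
    M = rng.random((4, 4))
    return M / M.sum(axis=1, keepdims=True)

pi = rng.random(4); pi /= pi.sum()
p = np.einsum('u,uv,ua,ub,vc,vd->abcd', pi, rand_markov(),
              rand_markov(), rand_markov(), rand_markov(), rand_markov())

scores = scan(p)
# Only the true split {0,1} (and its complement {2,3}) score near zero.
print({A: s < 1e-8 for A, s in scores.items()})
```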
7. Perspective and Significance
Rank–Nexus invariants provide a linear-algebraic means to generalize robust identifiability conditions and reconstruction techniques from trees to networks. Their key principle is the preservation of flattening-rank constraints in any reticulation-free subtree across all tree resolutions of a network. These polynomial invariants afford practical, statistically grounded tests for the identification of tree-like structure in complex reticulate histories—detecting clades, distinguishing network hypotheses, and ultimately advancing phylogenomic inference under rich evolutionary scenarios.