Normalized Excess Co-occurrence Matrix
- The normalized excess co-occurrence matrix measures how much more often tuples of nodes co-occur than would be expected if the nodes occurred independently.
- It employs incidence matrices, marginal probability normalization, and the face-splitting product to extend pairwise counts to higher-order tensors.
- Applications span word embedding, market basket analysis, and hypergraph community detection by revealing statistically significant interactions.
A normalized excess co-occurrence matrix (and its higher-order tensor analogs) provides a measure of how much more frequently tuples of nodes (typically words, items, or graph vertices) co-occur within a collection of groups (hyperedges, contexts, baskets) than would be expected if these nodes occurred independently. This construct is foundational in hypergraph theory, NLP, and data mining, generalizing the standard co-occurrence matrix and mutual information to arbitrary tuple size via the face-splitting (rowwise Khatri–Rao) product, yielding excess co-occurrence tensors and multivariate pointwise mutual information (Bischof, 2020).
1. Incidence Matrix and Pairwise Co-occurrence
The starting point is a bipartite relationship between a finite set of nodes $V = \{1, \dots, n\}$ and a family of hyperedges $E = \{e_1, \dots, e_m\}$. The binary incidence matrix $A \in \{0,1\}^{m \times n}$ is defined by $A_{ei} = 1$ if node $i$ is present in hyperedge $e$, and $0$ otherwise.
The unnormalized pairwise co-occurrence matrix is calculated as $C = A^\top A$, an $n \times n$ matrix where entry $C_{ij}$ counts the number of hyperedges containing both $i$ and $j$. The degree $d_i = C_{ii}$ records the number of hyperedges containing node $i$, and $T = \sum_i d_i = \sum_{e,i} A_{ei}$ gives the total number of node-edge incidences (Bischof, 2020).
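These definitions can be sketched in a few lines of NumPy on a toy hypergraph (the data below is illustrative, not from the source):

```python
import numpy as np

# Toy hypergraph: 4 nodes, 3 hyperedges (rows = hyperedges, columns = nodes).
# Edge 0 = {0, 1}, edge 1 = {0, 1, 2}, edge 2 = {2, 3}.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

C = A.T @ A     # pairwise co-occurrence: C[i, j] = # edges containing both i and j
d = np.diag(C)  # node degrees: # edges containing each node
T = d.sum()     # total number of node-edge incidences

print(d)  # [2 2 2 1]
```

Note that the degrees sit on the diagonal of $C$, since an edge "co-occurs" with itself exactly when it contains the node.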
2. Normalization and the Excess Co-occurrence Matrix
To interpret co-occurrence significance, a null model assumes independent node participation in edges. The empirical marginal probability for node $i$ is $p_i = d_i / m$. The matrix of empirical pairwise probabilities is $P = C / m$, and under independence, the reference probability is the rank-one matrix $p p^\top$ with entries $p_i p_j$.
The excess (or normalized) co-occurrence quantifies deviation from this independent baseline:
- Probability form: $X = P - p p^\top = C/m - p p^\top$
- Raw-count form: $X_{\text{raw}} = C - \tfrac{1}{m}\, d d^\top$, where $d = (d_1, \dots, d_n)$ is the vector of node degrees. These forms highlight pairs co-occurring more or less frequently than independence predicts (Bischof, 2020).
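Both forms are direct to compute; a minimal NumPy sketch (toy incidence matrix, all data illustrative) showing that the raw-count form is just $m$ times the probability form:

```python
import numpy as np

A = np.array([          # toy incidence matrix: rows = hyperedges, cols = nodes
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])
m = A.shape[0]                  # number of hyperedges
C = A.T @ A                     # raw pairwise co-occurrence counts
d = np.diag(C).astype(float)    # node degrees

p = d / m                       # empirical marginals p_i = d_i / m
X = C / m - np.outer(p, p)      # probability-form excess
X_raw = C - np.outer(d, d) / m  # raw-count form (equals m * X)
```

Here nodes 0 and 1 co-occur in both of their edges, so `X[0, 1]` is positive, while nodes 0 and 3 never co-occur and `X[0, 3]` is negative.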
3. Higher-Order Co-occurrence via Face-Splitting Product
Pairwise co-occurrence generalizes to $k$-way co-occurrence for arbitrary order $k \ge 2$ through the face-splitting (transpose Khatri–Rao) product $\bullet$. For $k \ge 3$, define $F^{(k-1)} = A \bullet A \bullet \cdots \bullet A$ (with $k-1$ factors), yielding an $m \times n^{k-1}$ matrix with $F^{(k-1)}_{e,(i_1,\dots,i_{k-1})} = \prod_{j=1}^{k-1} A_{e i_j}$.
The order-$k$ tensor of raw counts is then constructed as $C^{(k)} = A^\top F^{(k-1)}$, which can be indexed and reshaped as
$$C^{(k)}_{i_1 i_2 \cdots i_k} = \sum_{e} \prod_{j=1}^{k} A_{e i_j},$$
representing the number of hyperedges containing all nodes $i_1, \dots, i_k$ (Bischof, 2020).
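For $k = 3$, the face-splitting construction can be sketched in NumPy (toy incidence matrix, illustrative only): each row of the product is the Kronecker product of the corresponding row of $A$ with itself, and multiplying by $A^\top$ then reshaping yields the triple-count tensor.

```python
import numpy as np

A = np.array([          # toy incidence matrix: rows = hyperedges, cols = nodes
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])
m, n = A.shape

# Face-splitting (row-wise Kronecker) product A • A:
# row e of F is the Kronecker product of row e of A with itself.
F = np.einsum('ei,ej->eij', A, A).reshape(m, n * n)

# Order-3 tensor: C3[i, j, k] = # hyperedges containing all of i, j, k.
C3 = (A.T @ F).reshape(n, n, n)

# Sanity check against direct triple counting.
assert np.array_equal(C3, np.einsum('ei,ej,ek->ijk', A, A, A))
```

As expected, the super-diagonal entries $C^{(3)}_{iii}$ recover the node degrees, mirroring the pairwise case.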
4. Normalized Excess for k-way Co-occurrence Tensors
The normalization framework for $k = 2$ extends directly to higher orders. Given $C^{(k)}$ (raw counts), empirical probabilities are computed via normalization $P^{(k)} = C^{(k)} / m$, with $p_i = d_i / m$ as before. The "excess" form for the $k$-way case subtracts the completely independent model from the observed probability:
$$X^{(k)}_{i_1 \cdots i_k} = P^{(k)}_{i_1 \cdots i_k} - \prod_{j=1}^{k} p_{i_j},$$
or, for raw counts, $C^{(k)}_{i_1 \cdots i_k} - m \prod_{j=1}^{k} p_{i_j}$. This formulation generalizes pointwise mutual information (PMI) to $k$-tuples:
$$\mathrm{PMI}(i_1, \dots, i_k) = \log \frac{P^{(k)}_{i_1 \cdots i_k}}{\prod_{j=1}^{k} p_{i_j}}.$$
The multivariate PMI thus obtained connects to generalized mutual information measures (Bischof, 2020).
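Continuing the toy example for $k = 3$, the excess tensor and the multivariate PMI differ only in whether the baseline is subtracted or divided out (data illustrative; triples that never co-occur give a PMI of $-\infty$):

```python
import numpy as np

A = np.array([          # toy incidence matrix: rows = hyperedges, cols = nodes
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])
m, n = A.shape
p = A.sum(axis=0) / m                        # marginals p_i = d_i / m

C3 = np.einsum('ei,ej,ek->ijk', A, A, A)     # raw 3-way counts
P3 = C3 / m                                  # empirical 3-way probabilities
P3_indep = np.einsum('i,j,k->ijk', p, p, p)  # independence baseline

X3 = P3 - P3_indep                           # excess co-occurrence tensor
with np.errstate(divide='ignore'):
    PMI3 = np.log(P3 / P3_indep)             # 3-way pointwise mutual information
                                             # (-inf for never-observed triples)
```

For instance, nodes 0, 1, 2 appear together in one of three edges while the independent model predicts $(2/3)^3 = 8/27$, so their excess is positive and their PMI is $\log(9/8)$.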
5. Applications in Word Representations, Recommendation, and Hypergraph Analysis
Normalized excess co-occurrence matrices and their high-order analogs are central to several domains:
- Word Embedding Models: The skip-gram with negative sampling (word2vec) and GloVe algorithms can be viewed as implicitly factorizing a transformed excess co-occurrence matrix or its PMI variant. Extending to $k = 3$ enables modeling of not just word-context pairs but also triple co-occurrences (e.g., short phrases, syntactic triples) via so-called word tensors, permitting richer compositional embeddings (Bischof, 2020).
- Market-Basket and Recommendation Systems: Transactions form a natural hypergraph, with nodes as items and edges as baskets. The third-order tensor $C^{(3)}$ enumerates tri-item co-occurrences, and the excess $X^{(3)}$ reveals surprising triple relationships, supporting higher-order clustering and embedding (Bischof, 2020).
- Hypergraph Community Detection and Similarity: Beyond standard Laplacians based on $C$, higher-order excess co-occurrence tensors serve as refined similarity kernels, facilitating detection of community structure and similarity patterns not observable at the pairwise level (Bischof, 2020).
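The market-basket use case can be illustrated end to end with a hypothetical mini dataset (item names and baskets below are invented for illustration): build the incidence matrix, form the order-3 excess tensor, and rank distinct triples by how far they exceed the independence baseline.

```python
import numpy as np

# Hypothetical mini market-basket data: columns = items, rows = baskets.
items = ['bread', 'butter', 'jam', 'milk', 'tea']
baskets = [
    {'bread', 'butter', 'jam'},
    {'bread', 'butter', 'jam'},
    {'bread', 'milk'},
    {'milk', 'tea'},
    {'butter', 'milk', 'tea'},
    {'bread', 'butter', 'milk'},
]
A = np.array([[int(it in b) for it in items] for b in baskets])
m = len(baskets)
p = A.sum(axis=0) / m  # per-item marginal probabilities

# Order-3 excess tensor: observed triple probability minus independence baseline.
X3 = (np.einsum('ei,ej,ek->ijk', A, A, A) / m
      - np.einsum('i,j,k->ijk', p, p, p))

# Most surprising distinct triple (indices i < j < k).
n = len(items)
i, j, k = max(((i, j, k) for i in range(n) for j in range(i + 1, n)
               for k in range(j + 1, n)), key=lambda t: X3[t])
print(items[i], items[j], items[k])  # → bread butter jam
```

Here bread, butter, and jam co-occur in a third of the baskets against an independence baseline of $4/27$, so they surface as the most over-represented triple.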
6. Algorithmic Properties and Computational Considerations
The face-splitting product provides an algorithmically tractable and highly parallelizable method for constructing $k$-way co-occurrence tensors. By leveraging rowwise Kronecker products, it systematically encodes all $k$-tuple co-occurrence frequencies without explicit enumeration of hyperedge membership configurations, enabling scalable learning and statistical testing of higher-order interactions. The excess normalization subtracts the high background rate expected under independence, isolating signals of statistical dependence among nodes across a diverse range of graph- and hypergraph-based data structures (Bischof, 2020).