Normalized Excess Co-occurrence Matrix
- The normalized excess co-occurrence matrix measures how much more often tuples of nodes co-occur than would be expected if the nodes occurred independently.
- It employs incidence matrices, marginal probability normalization, and the face-splitting product to extend pairwise counts to higher-order tensors.
- Applications span word embedding, market basket analysis, and hypergraph community detection by revealing statistically significant interactions.
A normalized excess co-occurrence matrix (and its higher-order tensor analogs) provides a measure of how much more frequently tuples of nodes (typically words, items, or graph vertices) co-occur within a collection of groups (hyperedges, contexts, baskets) than would be expected if these nodes occurred independently. This construct is foundational in hypergraph theory, NLP, and data mining, generalizing the standard co-occurrence matrix and mutual information to arbitrary tuple size via the face-splitting (rowwise Khatri–Rao) product, yielding excess co-occurrence tensors and multivariate pointwise mutual information (Bischof, 2020).
1. Incidence Matrix and Pairwise Co-occurrence
The starting point is a bipartite relationship between a finite set of nodes $V = \{1, \dots, n\}$ and a family of hyperedges $E = \{e_1, \dots, e_m\}$. The binary incidence matrix $A \in \{0,1\}^{m \times n}$ is defined by $A_{ei} = 1$ if node $i$ is present in hyperedge $e$, and $0$ otherwise.
The unnormalized pairwise co-occurrence matrix is calculated as $C = A^\top A$, an $n \times n$ matrix where entry $C_{ij}$ counts the number of hyperedges containing both $i$ and $j$. The degree $d_i = C_{ii}$ records the number of hyperedges containing node $i$, and $T = \sum_i d_i = \sum_{e,i} A_{ei}$ gives the total number of node-edge incidences (Bischof, 2020).
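These definitions can be sketched in a few lines of NumPy on a toy hypergraph (the data below is illustrative, not from the source):

```python
import numpy as np

# Toy hypergraph: 4 nodes, 3 hyperedges (rows = hyperedges, columns = nodes).
# Edge 0 = {0, 1}, edge 1 = {0, 1, 2}, edge 2 = {2, 3}.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

C = A.T @ A     # pairwise co-occurrence: C[i, j] = # edges containing both i and j
d = np.diag(C)  # node degrees: # edges containing each node
T = d.sum()     # total number of node-edge incidences

print(d)  # [2 2 2 1]
```

Note that the degrees sit on the diagonal of $C$, since an edge "co-occurs" with itself exactly when it contains the node.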
2. Normalization and the Excess Co-occurrence Matrix
To interpret co-occurrence significance, a null model assumes independent node participation in edges. The empirical marginal probability for node $i$ is $p_i = d_i / m$. The matrix of empirical pairwise probabilities is $P = C / m$, and under independence, the reference probability is the rank-one matrix $p p^\top$ with entries $p_i p_j$.
The excess (or normalized) co-occurrence quantifies deviation from this independent baseline:
- Probability form: $X = P - p p^\top = C/m - p p^\top$
- Raw-count form: $X_{\text{raw}} = C - \tfrac{1}{m}\, d d^\top$, where $d = (d_1, \dots, d_n)$ is the vector of node degrees. These forms highlight pairs co-occurring more or less frequently than independence predicts (Bischof, 2020).
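Both forms are direct to compute; a minimal NumPy sketch (toy incidence matrix, all data illustrative) showing that the raw-count form is just $m$ times the probability form:

```python
import numpy as np

A = np.array([          # toy incidence matrix: rows = hyperedges, cols = nodes
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])
m = A.shape[0]                  # number of hyperedges
C = A.T @ A                     # raw pairwise co-occurrence counts
d = np.diag(C).astype(float)    # node degrees

p = d / m                       # empirical marginals p_i = d_i / m
X = C / m - np.outer(p, p)      # probability-form excess
X_raw = C - np.outer(d, d) / m  # raw-count form (equals m * X)
```

Here nodes 0 and 1 co-occur in both of their edges, so `X[0, 1]` is positive, while nodes 0 and 3 never co-occur and `X[0, 3]` is negative.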
3. Higher-Order Co-occurrence via Face-Splitting Product
Pairwise co-occurrence generalizes to $k$-way co-occurrence for arbitrary order $k \ge 2$ through the face-splitting (transpose Khatri–Rao) product $\bullet$. For $k \ge 3$, define $F^{(k-1)} = A \bullet A \bullet \cdots \bullet A$ (with $k-1$ factors), yielding an $m \times n^{k-1}$ matrix with $F^{(k-1)}_{e,(i_1,\dots,i_{k-1})} = \prod_{j=1}^{k-1} A_{e i_j}$.
The order-$k$ tensor of raw counts is then constructed as $C^{(k)} = A^\top F^{(k-1)}$, which can be indexed and reshaped as
$$C^{(k)}_{i_1 i_2 \cdots i_k} = \sum_{e} \prod_{j=1}^{k} A_{e i_j},$$
representing the number of hyperedges containing all nodes $i_1, \dots, i_k$ (Bischof, 2020).
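For $k = 3$, the face-splitting construction can be sketched in NumPy (toy incidence matrix, illustrative only): each row of the product is the Kronecker product of the corresponding row of $A$ with itself, and multiplying by $A^\top$ then reshaping yields the triple-count tensor.

```python
import numpy as np

A = np.array([          # toy incidence matrix: rows = hyperedges, cols = nodes
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])
m, n = A.shape

# Face-splitting (row-wise Kronecker) product A • A:
# row e of F is the Kronecker product of row e of A with itself.
F = np.einsum('ei,ej->eij', A, A).reshape(m, n * n)

# Order-3 tensor: C3[i, j, k] = # hyperedges containing all of i, j, k.
C3 = (A.T @ F).reshape(n, n, n)

# Sanity check against direct triple counting.
assert np.array_equal(C3, np.einsum('ei,ej,ek->ijk', A, A, A))
```

As expected, the super-diagonal entries $C^{(3)}_{iii}$ recover the node degrees, mirroring the pairwise case.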
4. Normalized Excess for k-way Co-occurrence Tensors
The normalization framework for $k = 2$ extends directly to higher orders. Given $C^{(k)}$ (raw counts), empirical probabilities are computed via normalization $P^{(k)} = C^{(k)} / m$, with $p_i = d_i / m$ as before. The "excess" form for the $k$-way case subtracts the completely independent model from the observed probability:
$$X^{(k)}_{i_1 \cdots i_k} = P^{(k)}_{i_1 \cdots i_k} - \prod_{j=1}^{k} p_{i_j},$$
or, for raw counts, $C^{(k)}_{i_1 \cdots i_k} - m \prod_{j=1}^{k} p_{i_j}$. This formulation generalizes pointwise mutual information (PMI) to $k$-tuples:
$$\mathrm{PMI}(i_1, \dots, i_k) = \log \frac{P^{(k)}_{i_1 \cdots i_k}}{\prod_{j=1}^{k} p_{i_j}}.$$
The multivariate PMI thus obtained connects to generalized mutual information measures (Bischof, 2020).
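Continuing the toy example for $k = 3$, the excess tensor and the multivariate PMI differ only in whether the baseline is subtracted or divided out (data illustrative; triples that never co-occur give a PMI of $-\infty$):

```python
import numpy as np

A = np.array([          # toy incidence matrix: rows = hyperedges, cols = nodes
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])
m, n = A.shape
p = A.sum(axis=0) / m                        # marginals p_i = d_i / m

C3 = np.einsum('ei,ej,ek->ijk', A, A, A)     # raw 3-way counts
P3 = C3 / m                                  # empirical 3-way probabilities
P3_indep = np.einsum('i,j,k->ijk', p, p, p)  # independence baseline

X3 = P3 - P3_indep                           # excess co-occurrence tensor
with np.errstate(divide='ignore'):
    PMI3 = np.log(P3 / P3_indep)             # 3-way pointwise mutual information
                                             # (-inf for never-observed triples)
```

For instance, nodes 0, 1, 2 appear together in one of three edges while the independent model predicts $(2/3)^3 = 8/27$, so their excess is positive and their PMI is $\log(9/8)$.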
5. Applications in Word Representations, Recommendation, and Hypergraph Analysis
Normalized excess co-occurrence matrices and their high-order analogs are central to several domains:
- Word Embedding Models: The skip-gram with negative sampling (word2vec) and GloVe algorithms can be viewed as implicitly factorizing a transformed excess co-occurrence matrix or its PMI variant. Extending to $k = 3$ enables modeling of not just word-context pairs but also triple co-occurrences (e.g., short phrases, syntactic triples) via so-called word tensors, permitting richer compositional embeddings (Bischof, 2020).
- Market-Basket and Recommendation Systems: Transactions form a natural hypergraph, with nodes as items and edges as baskets. The third-order tensor $C^{(3)}$ enumerates tri-item co-occurrences, and the excess $X^{(3)}$ reveals surprising triple relationships, supporting higher-order clustering and embedding (Bischof, 2020).
- Hypergraph Community Detection and Similarity: Beyond standard Laplacians based on $C$, higher-order excess co-occurrence tensors serve as refined similarity kernels, facilitating detection of community structure and similarity patterns not observable at the pairwise level (Bischof, 2020).
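The market-basket use case can be illustrated end to end with a hypothetical mini dataset (item names and baskets below are invented for illustration): build the incidence matrix, form the order-3 excess tensor, and rank distinct triples by how far they exceed the independence baseline.

```python
import numpy as np

# Hypothetical mini market-basket data: columns = items, rows = baskets.
items = ['bread', 'butter', 'jam', 'milk', 'tea']
baskets = [
    {'bread', 'butter', 'jam'},
    {'bread', 'butter', 'jam'},
    {'bread', 'milk'},
    {'milk', 'tea'},
    {'butter', 'milk', 'tea'},
    {'bread', 'butter', 'milk'},
]
A = np.array([[int(it in b) for it in items] for b in baskets])
m = len(baskets)
p = A.sum(axis=0) / m  # per-item marginal probabilities

# Order-3 excess tensor: observed triple probability minus independence baseline.
X3 = (np.einsum('ei,ej,ek->ijk', A, A, A) / m
      - np.einsum('i,j,k->ijk', p, p, p))

# Most surprising distinct triple (indices i < j < k).
n = len(items)
i, j, k = max(((i, j, k) for i in range(n) for j in range(i + 1, n)
               for k in range(j + 1, n)), key=lambda t: X3[t])
print(items[i], items[j], items[k])  # → bread butter jam
```

Here bread, butter, and jam co-occur in a third of the baskets against an independence baseline of $4/27$, so they surface as the most over-represented triple.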
6. Algorithmic Properties and Computational Considerations
The face-splitting product provides an algorithmically tractable and highly parallelizable method for constructing $k$-way co-occurrence tensors. By leveraging rowwise Kronecker products, it systematically encodes all $k$-tuple co-occurrence frequencies without explicit enumeration of hyperedge membership configurations, enabling scalable learning and statistical testing of higher-order interactions. The excess normalization subtracts the high background rate expected under independence, isolating signals of statistical dependence among nodes across a diverse range of graph- and hypergraph-based data structures (Bischof, 2020).