A maximum entropy approach to separating noise from signal in bimodal affiliation networks

Published 6 Jul 2016 in physics.soc-ph, physics.data-an, and stat.AP | (1607.01735v1)

Abstract: In practice, many empirical networks, including co-authorship and collocation networks are unimodal projections of a bipartite data structure where one layer represents entities, the second layer consists of a number of sets representing affiliations, attributes, groups, etc., and an inter-layer link indicates membership of an entity in a set. The edge weight in the unimodal projection, which we refer to as a co-occurrence network, counts the number of sets to which both end-nodes are linked. Interpreting such dense networks requires statistical analysis that takes into account the bipartite structure of the underlying data. Here we develop a statistical significance metric for such networks based on a maximum entropy null model which preserves both the frequency sequence of the individuals/entities and the size sequence of the sets. Solving the maximum entropy problem is reduced to solving a system of nonlinear equations for which fast algorithms exist, thus eliminating the need for expensive Monte-Carlo sampling techniques. We use this metric to prune and visualize a number of empirical networks.