Network Filtering for Big Data: Triangulated Maximally Filtered Graph

Published 10 May 2015 in cs.DS, cond-mat.stat-mech, and cs.IR | (1505.02445v2)

Abstract: We propose a network-filtering method, the Triangulated Maximally Filtered Graph (TMFG), that provides an approximate solution to the Weighted Maximal Planar Graph problem. The underlying idea of TMFG consists in building a triangulation that maximizes a score function associated with the amount of information retained by the network. TMFG uses as weights any arbitrary similarity measure to arrange data into a meaningful network structure that can be used for clustering, community detection and modeling. The method is fast, adaptable and scalable to very large datasets, it allows online updating and learning as new data can be inserted and deleted with combinations of local and non-local moves. TMFG permits readjustments of the network in consequence of changes in the strength of the similarity measure. The method is based on local topological moves and can therefore take advantage of parallel and GPUs computing. We discuss how this network-filtering method can be used intuitively and efficiently for big data studies and its significance from an information-theoretic perspective.

Summary

Triangulated Maximally Filtered Graph for Network Filtering in Big Data

The paper by Guido Previde Massara et al. introduces a network-filtering algorithm, the Triangulated Maximally Filtered Graph (TMFG), designed for very large datasets. TMFG gives an approximate solution to the Weighted Maximal Planar Graph (WMPG) problem: starting from a dense matrix of weights (any similarity measure), it retains a sparse planar network that keeps as much of the total weight as possible. The resulting network is intended for clustering, community detection, and modeling of complex systems.

TMFG Methodology and Construction

The TMFG algorithm builds the network as a planar graph by iteratively inserting vertices with the $T_2$ local move, which preserves planarity. Each insertion is guided by an objective, called the score function, chosen to maximize the information retained in the graph. The local operators $T_1$, $T_2$, and $A$ also make the method adaptable: they support dynamic updates as incoming data change the network. Because these operations are local, the algorithm lends itself to parallelization, which helps it scale to the large datasets typical of big data applications.
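The construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `tmfg_sketch` is hypothetical, the score is taken to be the plain sum of edge weights, the seed tetrahedron is chosen by largest row sums, and the best vertex and face are found by brute-force search. The brute-force search is simpler than, and does not reproduce, the roughly $O(p^2)$ scaling reported for TMFG.

```python
import itertools
import numpy as np

def tmfg_sketch(W):
    """Greedy TMFG-style filtering of a symmetric similarity matrix W (p x p, p >= 4).

    Returns the retained edges as a sorted list of (i, j) pairs with i < j.
    """
    W = np.asarray(W, dtype=float)
    p = W.shape[0]
    # Assumption: seed with the four vertices of largest total similarity,
    # fully connected as a tetrahedron (6 edges, 4 triangular faces).
    seed = [int(v) for v in np.argsort(W.sum(axis=1))[-4:]]
    edges = {frozenset(e) for e in itertools.combinations(seed, 2)}
    faces = [tuple(f) for f in itertools.combinations(seed, 3)]
    remaining = set(range(p)) - set(seed)
    while remaining:
        # Brute-force search for the (vertex, face) pair with the largest gain,
        # where the gain is the weight of the three edges the insertion adds.
        best = None
        for v in remaining:
            for i, (a, b, c) in enumerate(faces):
                gain = W[v, a] + W[v, b] + W[v, c]
                if best is None or gain > best[0]:
                    best = (gain, v, i)
        _, v, i = best
        # T2-style insertion: place v inside face (a, b, c), connect it to the
        # three corners, and replace the face with three new triangular faces.
        a, b, c = faces.pop(i)
        edges.update({frozenset((v, a)), frozenset((v, b)), frozenset((v, c))})
        faces.extend([(v, a, b), (v, a, c), (v, b, c)])
        remaining.remove(v)
    return sorted(tuple(sorted(e)) for e in edges)
```

For example, calling `tmfg_sketch(W)` on a $10 \times 10$ symmetric matrix returns $3(10-2) = 24$ edges, the edge count of any maximal planar graph on 10 vertices. Each insertion adds exactly three edges and a net two faces, so planarity is preserved by construction and no planarity test is needed.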

Computational Efficiency and Performance

A key advantage of TMFG is its computational cost, which scales approximately as $O(p^2)$ in the dimension $p$ of the weight matrix, compared with $O(p^3)$ for the existing Planar Maximally Filtered Graph (PMFG) method. The lower cost makes TMFG practical for datasets whose size would otherwise be an obstacle to network analysis. TMFG also performs competitively with PMFG in terms of the total weight of the retained edges, across a range of weight distributions including beta, Pareto, random matrix correlations, and real market data.
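The comparison metric mentioned above, the weight sum of retained edges, is straightforward to compute. The sketch below assumes a symmetric weight matrix and an edge list of index pairs; the names `planar_edge_count` and `retained_weight` are hypothetical helpers introduced for illustration, not part of any published implementation.

```python
import numpy as np

def planar_edge_count(p):
    """Number of edges in a maximal planar graph on p >= 3 vertices."""
    return 3 * p - 6

def retained_weight(W, edges):
    """Return (sum of retained weights, fraction of the total off-diagonal weight).

    W is a symmetric p x p matrix; edges is an iterable of (i, j) pairs, i != j.
    """
    W = np.asarray(W, dtype=float)
    kept = float(sum(W[i, j] for i, j in edges))
    # Count each undirected pair once by summing the strict upper triangle.
    total = float(W[np.triu_indices_from(W, k=1)].sum())
    return kept, kept / total
```

Since a maximal planar graph keeps only $3p - 6$ of the $p(p-1)/2$ possible edges, the retained fraction of edges shrinks roughly as $6/p$, which is why the retained weight, rather than the edge count, is the informative quantity when comparing filters.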

Information-Theoretic Implications and Modeling

From an information-theoretic standpoint, the paper suggests building the TMFG so as to minimize the Kullback-Leibler divergence between the unknown true probability distribution and the model distribution represented by the filtered graph. In practice this means selecting, at each step, the move that gives the smallest increase in model uncertainty, measured by the change in entropy associated with the graph structure. This view points to a role for TMFG in probabilistic modeling of joint distributions, where its chordal structure allows efficient inference.
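The link between the divergence and the entropy change can be made explicit with the standard factorization for chordal (decomposable) graphs. The notation here is an assumption of this note rather than the paper's: $\mathcal{C}$ is the set of maximal cliques, $\mathcal{S}$ the set of separators counted with multiplicity, $p$ the true distribution, and $q$ the model built from the clique and separator marginals of $p$.

$$
q(x) = \frac{\prod_{c \in \mathcal{C}} p(x_c)}{\prod_{s \in \mathcal{S}} p(x_s)},
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = -H(X) + \sum_{c \in \mathcal{C}} H(X_c) - \sum_{s \in \mathcal{S}} H(X_s)
$$

Because $H(X)$ does not depend on the graph, minimizing the divergence over structures amounts to minimizing the clique-minus-separator entropy sum. Under the assumption that a $T_2$ insertion of vertex $v$ into the triangular face $\{a, b, c\}$ adds the clique $\{v, a, b, c\}$ and the separator $\{a, b, c\}$, the change in that sum is a conditional entropy:

$$
\Delta = H(X_v, X_a, X_b, X_c) - H(X_a, X_b, X_c) = H(X_v \mid X_a, X_b, X_c)
$$

Read this way, the greedy step picks the vertex and face for which the new variable is best explained by the three variables it attaches to.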

Future Directions and Applications

The TMFG framework opens new perspectives for network filtering and sparse modeling, with implications that extend beyond planar topologies. The authors indicate that the technique could be extended to higher-genus embeddings and applied in domains such as finance, biology, and social systems. Such applications could give insight into the dynamics of complex systems and provide tools for risk management and structural analysis.

Overall, TMFG is a promising network-filtering method that balances computational cost against fidelity in retaining network information, which addresses a central difficulty in analyzing large, complex datasets.