Covering a Graph with Dense Subgraph Families, via Triangle-Rich Sets
Abstract: Graphs are a fundamental data structure used to represent relationships in domains as diverse as the social sciences, bioinformatics, cybersecurity, and the Internet. One of the central observations in network science is that real-world graphs are globally sparse, yet contain numerous "pockets" of high edge density. A fundamental task in graph mining is to discover these dense subgraphs. Most common formulations of the problem involve finding a single (or a few) "optimally" dense subsets. But in most real applications, one does not care about optimality. Instead, one wants to find a large collection of dense subsets that covers a significant fraction of the input graph. We give a mathematical formulation of this problem, using a new definition of regularly triangle-rich (RTR) families. These families capture the notion of dense subgraphs that contain many triangles and have degrees comparable to the subgraph size. We design an algorithm with provable guarantees, RTRExtractor, that can discover RTR families that approximately cover any RTR set. The algorithm is efficient and is inspired by recent results that use triangle counts for community testing and clustering. We show that RTRExtractor has excellent behavior on a large variety of real-world datasets. It is able to process graphs with hundreds of millions of edges within minutes. Across many datasets, RTRExtractor achieves high coverage using subgraphs of high edge density. For example, the output covers a quarter of the vertices with subgraphs of edge density more than (say) $0.5$, for datasets with 10M+ edges. We show an example of how the output of RTRExtractor correlates with meaningful sets of similar vertices in a citation network, demonstrating the utility of RTRExtractor for unsupervised graph discovery tasks.
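To make the quantities named in the abstract concrete, the following is a minimal Python sketch of edge density, triangle count, and vertex coverage for a family of vertex subsets. It is not RTRExtractor and does not implement the RTR definition; the helper names (`build_adjacency`, `edge_density`, `triangle_count`, `coverage`) and the toy graph are hypothetical, and edge density is assumed here to mean the fraction of vertex pairs inside a subset that are joined by an edge.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_density(adj, subset):
    """Fraction of vertex pairs in `subset` joined by an edge (assumed definition)."""
    nodes = list(subset)
    if len(nodes) < 2:
        return 0.0
    pairs = len(nodes) * (len(nodes) - 1) // 2
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return edges / pairs


def triangle_count(adj, subset):
    """Number of triangles whose three vertices all lie in `subset` (brute force)."""
    return sum(
        1
        for u, v, w in combinations(sorted(set(subset)), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def coverage(adj, family, min_density):
    """Fraction of all vertices lying in some subset with density >= min_density."""
    covered = set()
    for subset in family:
        if edge_density(adj, subset) >= min_density:
            covered |= set(subset)
    return len(covered) / len(adj)


# Hypothetical toy graph: a 4-clique {0,1,2,3} attached to a path 3-4-5-6-7.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
             (3, 4), (4, 5), (5, 6), (6, 7)]
toy_adj = build_adjacency(toy_edges)
toy_family = [{0, 1, 2, 3}, {4, 5, 6, 7}]

clique_density = edge_density(toy_adj, {0, 1, 2, 3})
path_density = edge_density(toy_adj, {4, 5, 6, 7})
clique_triangles = triangle_count(toy_adj, {0, 1, 2, 3})
toy_coverage = coverage(toy_adj, toy_family, min_density=0.6)
```

On this toy input the clique is fully dense and triangle-rich while the path contains no triangles, so only the clique counts toward coverage at a threshold of 0.6. The brute-force triangle enumeration is only suitable for small subsets.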