
Covering a Graph with Dense Subgraph Families, via Triangle-Rich Sets

Published 23 Jul 2024 in cs.SI, cs.DS, and cs.IR | (2407.16850v1)

Abstract: Graphs are a fundamental data structure used to represent relationships in domains as diverse as the social sciences, bioinformatics, cybersecurity, the Internet, and more. One of the central observations in network science is that real-world graphs are globally sparse, yet contain numerous "pockets" of high edge density. A fundamental task in graph mining is to discover these dense subgraphs. Most common formulations of the problem involve finding a single (or a few) "optimally" dense subsets. But in most real applications, one does not care for the optimality. Instead, we want to find a large collection of dense subsets that covers a significant fraction of the input graph. We give a mathematical formulation of this problem, using a new definition of regularly triangle-rich (RTR) families. These families capture the notion of dense subgraphs that contain many triangles and have degrees comparable to the subgraph size. We design a provable algorithm, RTRExtractor, that can discover RTR families that approximately cover any RTR set. The algorithm is efficient and is inspired by recent results that use triangle counts for community testing and clustering. We show that RTRExtractor has excellent behavior on a large variety of real-world datasets. It is able to process graphs with hundreds of millions of edges within minutes. Across many datasets, RTRExtractor achieves high coverage using subgraphs of high edge density. For example, the output covers a quarter of the vertices with subgraphs of edge density more than (say) $0.5$, for datasets with 10M+ edges. We show an example of how the output of RTRExtractor correlates with meaningful sets of similar vertices in a citation network, demonstrating the utility of RTRExtractor for unsupervised graph discovery tasks.


Summary

  • The paper introduces RTRExtractor, which leverages triangle-rich sets to efficiently identify dense subgraphs in large graphs.
  • Experimental results reveal that RTRExtractor covers 24.2% of vertices with subgraphs of edge density above 0.5 in large-scale datasets.
  • The method targets unsupervised discovery of dense communities in domains such as social network analysis, bioinformatics, and cybersecurity.

An Expert Analysis of "Covering a Graph with Dense Subgraph Families, via Triangle-Rich Sets"

Introduction

The paper "Covering a Graph with Dense Subgraph Families, via Triangle-Rich Sets" introduces a novel approach to dense subgraph discovery by leveraging triangle-rich sets. The study is inspired by the observation that real-world graphs, although generally sparse, contain numerous dense substructures. The authors present RTRExtractor, an algorithm designed to efficiently identify these dense communities in large-scale graphs, with a focus on regularly triangle-rich (RTR) sets. This approach has significant implications for various domains, including social network analysis, bioinformatics, and cybersecurity.

Key Contributions

RTR Sets and Triangle Density

The paper introduces the concept of regularly triangle-rich (RTR) sets: subsets of vertices that contain many triangles and whose degrees are comparable to the size of the subset. Unlike traditional dense subgraph formulations that rely primarily on edge density, RTR sets also account for triangle density, a measure that aligns more closely with the structure of real-world communities.
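
To make the quantities in this definition concrete, the following Python sketch computes, for a given vertex subset, its edge density, its triangle count, and the ratio of its smallest internal degree to the subset size. The function names (build_adjacency, edge_density, triangle_count, min_internal_degree_ratio) are hypothetical helpers chosen for illustration, and the sketch does not reproduce the paper's formal RTR definition or its thresholds.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}, skipping self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_density(adj, subset):
    """Fraction of the possible vertex pairs inside `subset` that are edges."""
    s = set(subset)
    k = len(s)
    if k < 2:
        return 0.0
    # Each internal edge is counted once from each endpoint.
    internal = sum(len(adj.get(u, set()) & s) for u in s) // 2
    return internal / (k * (k - 1) / 2)


def triangle_count(adj, subset):
    """Number of triangles whose three vertices all lie in `subset` (brute force)."""
    count = 0
    for u, v, w in combinations(sorted(set(subset)), 3):
        nu = adj.get(u, set())
        if v in nu and w in nu and w in adj.get(v, set()):
            count += 1
    return count


def min_internal_degree_ratio(adj, subset):
    """Smallest internal degree divided by (|subset| - 1); near 1 means degrees match the size."""
    s = set(subset)
    if len(s) < 2:
        return 0.0
    return min(len(adj.get(u, set()) & s) for u in s) / (len(s) - 1)


# Toy graph: a 4-clique on {0, 1, 2, 3} plus a pendant vertex 4 attached to 3.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
toy_adj = build_adjacency(toy_edges)
print(edge_density(toy_adj, [0, 1, 2, 3]), triangle_count(toy_adj, [0, 1, 2, 3]))
print(edge_density(toy_adj, [0, 1, 2, 3, 4]), min_internal_degree_ratio(toy_adj, [0, 1, 2, 3, 4]))
```

On the toy graph, adding the pendant vertex lowers both the edge density and the minimum degree ratio, which is the kind of subset a "degrees comparable to the subgraph size" requirement is meant to exclude. The brute-force triangle count is cubic in the subset size and is only suitable for small examples.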

RTRExtractor Algorithm

RTRExtractor is designed to identify RTR sets within a graph efficiently. The algorithm uses triangle participation as a criterion for edge retention, so that the discovered subgraphs maintain a high internal density. It comes with a provable guarantee: the authors show that it discovers RTR families that approximately cover any RTR set in the input graph.
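
The idea of retaining edges based on triangle participation can be illustrated with a deliberately simplified Python sketch. This is not the authors' RTRExtractor: it is a one-pass filter that keeps an edge only if its endpoints share at least min_support common neighbors, then reads off connected components of the surviving edges. The names filter_by_triangle_support and connected_components, and the min_support parameter, are assumptions made for this example.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}, skipping self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def filter_by_triangle_support(edges, min_support):
    """Keep edges (u, v) that lie in at least `min_support` triangles of the input graph.

    The number of triangles through an edge equals the number of common
    neighbors of its endpoints. `min_support` is a hypothetical threshold.
    """
    adj = build_adjacency(edges)
    return [
        (u, v)
        for u, v in edges
        if u != v and len(adj[u] & adj[v]) >= min_support
    ]


def connected_components(edges):
    """Return the connected components (as sets of vertices) of the given edge list."""
    adj = build_adjacency(edges)
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            x = stack.pop()
            component.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        components.append(component)
    return components


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
kept = filter_by_triangle_support(toy_edges, min_support=1)
print(connected_components(kept))
```

In the toy graph the bridge edge lies in no triangle, so the filter removes it and the two triangles separate into their own components. The paper's algorithm and its guarantees are more involved than this single-threshold pass.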

Experimental Results

The authors showcase the efficacy of RTRExtractor across various large-scale datasets, highlighting its superior performance in terms of both speed and output quality compared to other state-of-the-art algorithms. For instance, on the Orkut social network dataset, RTRExtractor covers 24.2% of the vertices with subgraphs of edge density greater than 0.5, significantly outperforming other methods. The results confirm RTRExtractor's ability to handle graphs with hundreds of millions of edges within minutes, making it a practical tool for real-world applications.
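
A coverage figure of this kind can be computed from any family of vertex subsets. The sketch below is one plausible way to do so, assuming coverage means the fraction of all vertices that belong to at least one subset whose edge density exceeds a threshold; the paper's exact measurement procedure may differ, and the function name coverage and its parameters are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}, skipping self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_density(adj, subset):
    """Fraction of the possible vertex pairs inside `subset` that are edges."""
    s = set(subset)
    k = len(s)
    if k < 2:
        return 0.0
    internal = sum(len(adj.get(u, set()) & s) for u in s) // 2
    return internal / (k * (k - 1) / 2)


def coverage(adj, num_vertices, family, min_density):
    """Fraction of vertices lying in some subset of `family` with density above `min_density`."""
    covered = set()
    for subset in family:
        if edge_density(adj, subset) > min_density:
            covered |= set(subset)
    return len(covered) / num_vertices


# Toy graph on 8 vertices: a 4-clique on {0, 1, 2, 3} and a path 4-5-6-7.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5), (5, 6), (6, 7)]
toy_adj = build_adjacency(toy_edges)
toy_family = [{0, 1, 2, 3}, {4, 5, 6, 7}]
print(coverage(toy_adj, 8, toy_family, min_density=0.5))
```

Here only the clique counts toward coverage: the path has edge density exactly 0.5, which does not exceed the threshold, so half of the vertices are covered.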

Implications and Future Directions

Practical Applications

The RTRExtractor algorithm has numerous practical applications, particularly in areas requiring unsupervised graph analysis. By effectively uncovering hidden dense structures within graphs, it serves as a powerful tool for tasks like community detection, motif finding, and even enhancing the performance of graph-based machine learning models.

Theoretical Implications

From a theoretical standpoint, the study enriches the discourse on dense subgraph discovery by introducing an innovative framework based on triangle density. This lays the groundwork for further explorations into alternative density metrics that may capture the nuances of graph structure more effectively.

Future Developments

Potential future advancements could involve refining the algorithm to improve its coverage further or adapting its framework to other types of density calculations. Additionally, exploring hierarchical clustering forms using RTR sets could yield richer insights into the layered structure of large networks.

Conclusion

The paper offers a well-founded algorithmic solution to the challenge of dense subgraph discovery by rethinking the role of triangle involvement in network structures. RTRExtractor's ability to efficiently locate numerous dense subgraphs across diverse datasets advances the field of graph mining and opens up new avenues for both theoretical inquiry and practical application, particularly for unsupervised graph discovery on large networks.
