Uncertainty in GNN Learning Evaluations: The Importance of a Consistent Benchmark for Community Detection
Abstract: Graph Neural Networks (GNNs) have improved unsupervised community detection of clustered nodes due to their ability to encode the dual dimensionality of the connectivity and feature information spaces of graphs. Identifying the latent communities has many practical applications, from social networks to genomics. Current benchmarks of real-world performance are confusing due to the variety of decisions influencing the evaluation of GNNs at this task. To address this, we propose a framework to establish a common evaluation protocol, and we motivate and justify it by demonstrating the differences with and without the protocol. We also propose the W Randomness Coefficient, a metric for assessing the consistency of algorithm rankings and thereby quantifying the reliability of results in the presence of randomness. We find that, when the same evaluation criteria are followed, performance can differ significantly from reported results for methods at this task, while a more complete evaluation and comparison of methods becomes possible.
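The W Randomness Coefficient described above is built for assessing how consistently different random seeds rank the same set of algorithms. The abstract does not give its exact definition, so the sketch below computes plain Kendall's coefficient of concordance (Kendall's W) over per-seed algorithm rankings; the function name and the example rankings are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kendalls_w(rankings):
    """Kendall's coefficient of concordance W for m rankings of n items.

    rankings: (m, n) array where rankings[i, j] is the rank (1..n, no ties
              handled in this sketch) of algorithm j under random seed i.
    Returns W in [0, 1]; W = 1 means every seed produced the same ranking.
    """
    rankings = np.asarray(rankings, dtype=float)
    m, n = rankings.shape
    rank_sums = rankings.sum(axis=0)        # total rank R_j per algorithm
    mean_rank_sum = m * (n + 1) / 2.0       # expected R_j if ranks were uniform
    s = np.sum((rank_sums - mean_rank_sum) ** 2)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Three seeds that agree perfectly on four algorithms -> W = 1.0
agree = [[1, 2, 3, 4]] * 3
# Three seeds that disagree -> W close to 0
disagree = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 1, 4, 3]]
```

A low W across seeds signals that reported rankings of GNN community-detection methods are sensitive to randomness, which is the kind of unreliability the proposed protocol is designed to expose.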