Self-supervised Graph-level Representation Learning with Local and Global Structure
This paper addresses the self-supervised and unsupervised learning of whole-graph representations, a problem with broad implications for fields such as molecular property prediction and biological network analysis. Existing methods predominantly preserve local similarity between graph instances but overlook the global semantic structure of the data set. To address this gap, the paper introduces GraphLoG, a self-supervised framework that models both local-instance and global-semantic structure.
GraphLoG introduces hierarchical prototypes to capture the global semantic clusters underlying a graph data set. The prototypes and graph embeddings are learned jointly with an efficient online expectation-maximization (EM) algorithm, pushing beyond traditional methods toward more holistic graph representations. After pre-training on large unlabeled graph data, GraphLoG is evaluated by fine-tuning on a range of downstream tasks, showing its effectiveness across domains, especially on chemical and biological data sets.
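To make the online EM idea concrete, the following is a minimal sketch (not the authors' implementation) of one mini-batch EM step for prototype learning: the E-step soft-assigns each graph embedding to the prototypes, and the M-step nudges each prototype toward the embeddings assigned to it. The function name, the softmax-over-distances assignment, and the moving-average update rule are illustrative assumptions.

```python
import numpy as np

def online_em_step(embeddings, prototypes, lr=0.1):
    """One illustrative online EM step on a mini-batch.

    embeddings: (batch, dim) graph-level embeddings from a GNN encoder.
    prototypes: (k, dim) current cluster prototypes (one hierarchy level).
    """
    # E-step: soft responsibilities via a softmax over negative squared
    # distances (shifted by the row minimum for numerical stability).
    d2 = ((embeddings[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    resp = np.exp(-(d2 - d2.min(axis=1, keepdims=True)))
    resp /= resp.sum(axis=1, keepdims=True)          # (batch, k)

    # M-step: responsibility-weighted mean of the batch per prototype,
    # blended into the old prototypes as an online (moving-average) update.
    weights = resp.sum(axis=0)                       # (k,)
    target = (resp.T @ embeddings) / np.maximum(weights[:, None], 1e-8)
    return (1.0 - lr) * prototypes + lr * target
```

Applied repeatedly over mini-batches, this update lets the prototypes track the cluster structure of the embedding space without ever materializing the full data set, which is the practical appeal of an online EM formulation.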
Graph-level representation learning is fundamental in applications ranging from drug discovery to circuit design. Because data labeling remains costly and resource-intensive, self-supervised methods like GraphLoG are crucial. Graph Neural Networks (GNNs) have traditionally required large labeled data sets for training; GraphLoG counters this limitation by leveraging self-supervised learning frameworks akin to those that have succeeded in NLP and computer vision.
The GraphLoG framework distinguishes itself by capturing both local and global structures of graph data. Local-instance learning ensures that similar graphs remain proximate in latent space, whereas global-semantic learning organizes graph representations into hierarchically structured semantic clusters. This dual-structure preservation is realized through a GNN-based embedding process combined with a model of hierarchical prototypes, ensuring semantically similar graphs are positioned closely within the embedding space.
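The local-instance objective described above can be sketched as a contrastive loss: a graph embedding should lie closer to the embedding of its correlated counterpart (e.g., a perturbed view of the same graph) than to other graphs in the batch. The InfoNCE-style form below is an illustrative stand-in, not the paper's exact objective; the function name and temperature parameter are assumptions.

```python
import numpy as np

def local_instance_loss(z1, z2, temperature=0.5):
    """Contrastive sketch of local-instance structure preservation.

    z1, z2: (batch, dim) embeddings of paired graph views; row i of z1
    and row i of z2 are the positive pair, all other rows are negatives.
    """
    # Cosine similarities between all pairs of views.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                 # (batch, batch)

    # Log-softmax over each row, shifted for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives sit on the diagonal: maximize their log-probability.
    return -np.mean(np.diag(log_prob))
```

Minimizing this loss pulls each graph toward its own perturbed view and pushes it away from unrelated graphs, which is exactly the "similar graphs remain proximate in latent space" property the local-instance term is meant to enforce.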
GraphLoG's theoretical appeal is empirically validated across several benchmarks. Specifically, it outperforms other recent self-supervised learning techniques on drug discovery data sets and protein function prediction, with a notable gain in average ROC-AUC scores.
The implications of GraphLoG are multifaceted. By integrating local and global semantic structure, the framework not only enhances the immediate applicability of GNNs in scientific domains but also opens exciting prospects for future work. Its approach could catalyze further advances in fusing graph representation learning with semantic clustering, opening avenues for more nuanced AI system designs.
Going forward, potential enhancements to GraphLoG include refining the global structure modeling and integrating pre-training with fine-tuning. Extending the framework to new domains such as sociology and materials science could also open further directions for both theory and practice.
Summary
The paper presents GraphLoG, a unified framework that addresses a shortcoming of existing self-supervised graph representation algorithms by preserving both local-instance and global-semantic structure in graph data sets. In empirical evaluation it outperforms prior methods across several key domains, setting a precedent for future research on multi-scale graph semantics within self-supervised learning.