- The paper presents a novel dual reconstruction strategy that leverages masking of both node features and meta-path-derived adjacency matrices.
- The framework employs an attention-based message passing mechanism to integrate intermediate node information for enhanced graph representations.
- Experiments on the DBLP, Freebase, ACM, and AMiner datasets demonstrate significantly improved node classification performance over existing models.
The paper "IMPA-HGAE: Intra-Meta-Path Augmented Heterogeneous Graph Autoencoder" introduces a self-supervised learning framework for heterogeneous graphs, termed IMPA-HGAE. It targets a limitation of existing self-supervised models that convert heterogeneous graphs to homogeneous ones via meta-paths: such models typically discard the information carried by the intermediate nodes along each meta-path. IMPA-HGAE enhances node embeddings by explicitly exploiting this intra-meta-path information, improving self-supervised performance across diverse datasets.
Main Contributions and Methodology
The central contribution of IMPA-HGAE is a dual reconstruction strategy: both the node feature matrix and the adjacency matrices derived from meta-paths are masked and then reconstructed as the learning objective. The methodology extends beyond the endpoint nodes of a meta-path to integrate feature information from its intermediate nodes. The framework jointly exploits structural and semantic information through three masking strategies (random, degree-based, and attention-score-based), applied not only to node features but also to meta-path-derived adjacency matrices to mitigate information loss.
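As a rough illustration of the dual masking step, the sketch below samples nodes either uniformly at random or in proportion to their degree, then masks both their feature rows and the corresponding rows and columns of a meta-path-derived adjacency matrix. All names and the toy graph are illustrative assumptions, not the paper's code; a real implementation would typically substitute a learnable [MASK] token rather than zeros.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(n_nodes, mask_rate, rng):
    """Uniformly sample a fraction of nodes to mask."""
    n_mask = int(mask_rate * n_nodes)
    return rng.choice(n_nodes, size=n_mask, replace=False)

def degree_based_mask(adj, mask_rate, rng):
    """Sample nodes with probability proportional to their degree
    in the meta-path-derived adjacency matrix."""
    degrees = adj.sum(axis=1)
    probs = degrees / degrees.sum()
    n_mask = int(mask_rate * adj.shape[0])
    return rng.choice(adj.shape[0], size=n_mask, replace=False, p=probs)

# Toy example: 6 nodes, 4-dim features, symmetric meta-path adjacency.
X = rng.normal(size=(6, 4))
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

idx = degree_based_mask(A, mask_rate=0.5, rng=rng)

# Dual masking: zero out (stand-in for a learnable [MASK] token)
# both the feature rows and the adjacency rows/columns of masked nodes.
X_masked = X.copy(); X_masked[idx] = 0.0
A_masked = A.copy(); A_masked[idx, :] = 0.0; A_masked[:, idx] = 0.0
```

The reconstruction loss would then be computed only on the masked entries, so the model must infer them from the surviving structure and features.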
The propagation step relies on an attention mechanism that aggregates intra-meta-path node information for richer representation learning. The approach combines subgraph extraction with attention-based message passing: a Graph Neural Network (GNN) encoder produces node embeddings, which a decoder then reconstructs, and the reconstruction error supplies the training loss.
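The attention-based aggregation of intermediate nodes can be sketched as follows. This uses generic dot-product attention with a residual update, assumed for illustration only; the paper's exact attention form may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_intermediate(h_target, H_intermediate, W):
    """Aggregate intermediate-node features along one meta-path instance
    into the target node's representation via dot-product attention.
    h_target: (d,), H_intermediate: (k, d), W: (d, d) projection.
    (Illustrative sketch; not the paper's exact formulation.)"""
    scores = H_intermediate @ (W @ h_target)   # (k,) attention logits
    alpha = softmax(scores)                    # normalized attention weights
    msg = alpha @ H_intermediate               # weighted sum of intermediates
    return h_target + msg                      # residual-style update

rng = np.random.default_rng(1)
d, k = 8, 3
h = rng.normal(size=d)            # endpoint node, e.g. an author
H_mid = rng.normal(size=(k, d))   # intermediates, e.g. papers on an A-P-A path
W = rng.normal(size=(d, d))
h_new = attend_intermediate(h, H_mid, W)
```

In the full framework, an update of this kind would run per meta-path inside the GNN encoder before the decoder reconstructs the masked features and adjacency entries.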
Experimental Validation
The reported experiments show marked improvements over a spectrum of baseline models, including well-established methods such as HetGNN, HAN, DMGI, GraphMAE, and HeCo. Performance is validated on four datasets (DBLP, Freebase, ACM, and AMiner), where IMPA-HGAE consistently achieves superior node classification metrics, including Micro-F1, Macro-F1, and AUC. Ablation studies dissect the contribution of each masking strategy and model component, highlighting degree-based masking as a robust choice that improves both accuracy and F1-score.
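For reference, the two F1 variants used in the evaluation can be computed directly. This plain-NumPy helper (my own, not from the paper) shows why Micro-F1 equals accuracy for single-label classification while Macro-F1 weights every class equally, making it sensitive to rare classes:

```python
import numpy as np

def f1_scores(y_true, y_pred, n_classes):
    """Micro- and Macro-F1 for multi-class node classification."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.array([np.sum((y_pred == c) & (y_true == c)) for c in range(n_classes)])
    fp = np.array([np.sum((y_pred == c) & (y_true != c)) for c in range(n_classes)])
    fn = np.array([np.sum((y_pred != c) & (y_true == c)) for c in range(n_classes)])
    # Micro: pool counts over all classes (equals accuracy when each
    # node has exactly one label).
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    # Macro: unweighted mean of per-class F1.
    per_class = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    macro = per_class.mean()
    return micro, macro

# Toy predictions over 6 nodes and 3 classes: one node of class 2
# is misclassified as class 0.
micro, macro = f1_scores([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 0], 3)
```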
Theoretical Implications and Future Directions
IMPA-HGAE advances generative self-supervised learning for heterogeneous graphs and demonstrates how complex graph structure can be exploited in representation learning. Its integration of intermediate-node information within meta-paths opens a promising direction for research on the interpretability and robustness of graph embeddings. Future work might optimize the decoder or employ auxiliary tasks to further refine the performance of generative models on intricate graph data.
Conclusion
Overall, IMPA-HGAE presents a refined approach to the challenges of heterogeneous graph learning. Its dual reconstruction philosophy and attention-centric propagation reflect a careful treatment of both graph semantics and structure. As self-supervised learning evolves, frameworks like IMPA-HGAE are likely to shape the trajectory of generative models, offering deeper insight into effective graph representation strategies.