
Recurrent Distance Filtering for Graph Representation Learning

Published 3 Dec 2023 in cs.LG and cs.NE (arXiv:2312.01538v3)

Abstract: Graph neural networks based on iterative one-hop message passing have been shown to struggle in harnessing the information from distant nodes effectively. Conversely, graph transformers allow each node to attend to all other nodes directly, but lack graph inductive bias and have to rely on ad-hoc positional encoding. In this paper, we propose a new architecture to reconcile these challenges. Our approach stems from the recent breakthroughs in long-range modeling provided by deep state-space models: for a given target node, our model aggregates other nodes by their shortest distances to the target and uses a linear RNN to encode the sequence of hop representations. The linear RNN is parameterized in a particular diagonal form for stable long-range signal propagation and is theoretically expressive enough to encode the neighborhood hierarchy. With no need for positional encoding, we empirically show that the performance of our model is comparable to or better than that of state-of-the-art graph transformers on various benchmarks, with a significantly reduced computational cost. Our code is open-source at https://github.com/skeletondyh/GRED.


Summary

  • The paper introduces GRED (Graph Recurrent Encoding by Distance), which aggregates node features by shortest distance to a target node and encodes the resulting hop sequence with a linear recurrent network, overcoming a key limitation of traditional MPNNs.
  • It employs a parallelizable linear recurrent unit that removes the need for positional encoding and reduces computational cost and training time.
  • Empirical results show that GRED matches or outperforms state-of-the-art graph transformers and MPNNs on various benchmarks, with lower training time and GPU memory usage.

Introduction to Graph Learning

Graphs are a common way to model complex relationships and structures, such as those found in social networks or molecular biology. Traditional graph neural networks (GNNs) based on iterative one-hop message passing, known as Message Passing Neural Networks (MPNNs), struggle to incorporate the influence of distant nodes effectively: information must be relayed over many rounds of local exchange before it can reach remote parts of the graph.
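
The one-hop bottleneck can be sketched in a few lines. The snippet below is a hypothetical mean-aggregation MPNN (not the paper's exact model): on a 4-node path graph, node 0's representation only depends on node 3 after three rounds of message passing.

```python
import numpy as np

def message_passing_step(adj: np.ndarray, features: np.ndarray) -> np.ndarray:
    """One round of one-hop message passing: each node averages its neighbors."""
    deg = adj.sum(axis=1, keepdims=True)   # node degrees
    deg[deg == 0] = 1.0                    # avoid division by zero for isolated nodes
    return (adj @ features) / deg          # mean over one-hop neighbors

# A 4-node path graph 0-1-2-3: node 0 is 3 hops away from node 3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
h = np.eye(4)                              # one-hot node features
for _ in range(3):                         # three rounds of local exchange
    h = message_passing_step(adj, h)
# Only now does node 0's representation carry any weight on node 3.
```

With fewer than three rounds, `h[0, 3]` stays exactly zero, which is the long-range limitation the paper targets.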

Advancements in Graph Representation

Graph transformers attempt to solve this with global attention, letting each node attend directly to every other node in the graph. This greatly expands the scope of information sharing, but at the cost of quadratic computation in the number of nodes and a reliance on specialized positional encodings to recover the graph structure.
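
A minimal single-head global attention over node features illustrates both points; this is a generic sketch, not any specific graph transformer (real models add positional encodings, multiple heads, and feed-forward layers):

```python
import numpy as np

def global_attention(x, wq, wk, wv):
    """All-pairs self-attention over node features (single head, no masking)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])              # (n, n) dense matrix: O(n^2)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)        # row-wise softmax
    return weights @ v                                   # every node attends to every node

rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
out = global_attention(x, wq, wk, wv)
# Note: no adjacency matrix appears anywhere above — without positional
# encoding, the output is blind to the graph's edge structure.
```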

Introducing GRED

In response to these challenges, the proposed model, Graph Recurrent Encoding by Distance (GRED), builds on recent successes in long-range sequence modeling. For each target node, GRED aggregates the features of other nodes grouped by their shortest-path distance to the target, then encodes the resulting sequence of hop representations with a parallelizable linear recurrent neural network, parameterized in diagonal form for stable long-range signal propagation. This architecture eliminates the need for positional encoding and markedly reduces computational cost.
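
The core idea can be sketched as follows. This is a simplified NumPy illustration with a fixed real-valued diagonal decay `lam`; the actual GRED model learns its diagonal recurrence and interleaves learned aggregation/readout functions (see the open-source repo for the real implementation):

```python
import numpy as np
from collections import deque

def bfs_distances(adj_list, source):
    """Shortest hop distances from `source` to all reachable nodes via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def gred_node_embedding(adj_list, features, target, lam):
    """Aggregate nodes by hop distance, then run a linear diagonal RNN
    over the hop sequence, scanning from the farthest hop toward the target."""
    dist = bfs_distances(adj_list, target)
    max_hop = max(dist.values())
    d = features.shape[1]
    # Multiset aggregation per hop: sum the features of nodes at each distance.
    hops = np.zeros((max_hop + 1, d))
    for v, k in dist.items():
        hops[k] += features[v]
    # Linear RNN with diagonal transition: h_k = lam * h_{k+1} + x_k,
    # so a node at hop k contributes with weight lam**k (|lam| <= 1 for stability).
    h = np.zeros(d)
    for k in range(max_hop, -1, -1):
        h = lam * h + hops[k]
    return h

adj_list = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # path graph 0-1-2-3
features = np.eye(4)
emb = gred_node_embedding(adj_list, features, target=0, lam=np.full(4, 0.5))
# Node 3 (hop 3) contributes with weight 0.5**3 = 0.125 — distant nodes
# reach the target in one scan, rather than through many message rounds.
```

Because the recurrence is linear and diagonal, it can be evaluated with a parallel scan over hops, which is the source of the efficiency gains reported in the paper.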

Practical Achievements

GRED demonstrates comparable or superior performance to state-of-the-art graph transformers on a variety of benchmarks, with significantly better computational efficiency. It also outperforms MPNNs, owing to its ability to encode more expressive features from larger node neighborhoods. Theoretical analysis shows that GRED is strictly more expressive than one-hop MPNNs, thanks to its linear recurrent unit and injective multiset functions. In practice, GRED matches the best graph transformers while drastically reducing training time and GPU memory consumption, highlighting its potential as a powerful tool for graph representation learning.
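
The role of injective multiset functions can be illustrated with a toy example (not from the paper, but following the standard Deep Sets / GIN line of argument): sum aggregation distinguishes multisets that mean aggregation conflates, which is what makes sum-based aggregation a building block for injectivity proofs.

```python
import numpy as np

# Two different multisets of identical node features: {1, 1} vs {1, 1, 1}.
multiset_a = np.array([1.0, 1.0])
multiset_b = np.array([1.0, 1.0, 1.0])

mean_a, mean_b = multiset_a.mean(), multiset_b.mean()
sum_a, sum_b = multiset_a.sum(), multiset_b.sum()

assert mean_a == mean_b   # mean aggregation cannot tell the multisets apart
assert sum_a != sum_b     # sum aggregation (injective on this domain) can
```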
