Recurrent Distance Filtering for Graph Representation Learning
Abstract: Graph neural networks based on iterative one-hop message passing have been shown to struggle to harness information from distant nodes effectively. Conversely, graph transformers allow each node to attend to all other nodes directly, but lack graph inductive bias and have to rely on ad hoc positional encodings. In this paper, we propose a new architecture that reconciles these two approaches. Our approach stems from recent breakthroughs in long-range modeling provided by deep state-space models: for a given target node, our model aggregates the other nodes by their shortest distances to the target and uses a linear RNN to encode the resulting sequence of hop representations. The linear RNN is parameterized in a particular diagonal form for stable long-range signal propagation and is theoretically expressive enough to encode the neighborhood hierarchy. With no need for positional encodings, we empirically show that the performance of our model is comparable to or better than that of state-of-the-art graph transformers on various benchmarks, at a significantly reduced computational cost. Our code is open-source at https://github.com/skeletondyh/GRED.
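The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes sum aggregation per hop, a hand-supplied diagonal recurrent weight vector `lam` (with entries of magnitude below one for stable long-range propagation), and a single input projection `B`; the actual GRED model learns these parameters and stacks such layers.

```python
from collections import deque
import numpy as np

def hop_distances(adj, target):
    """BFS shortest hop distances from `target` to every node.

    adj: list of adjacency lists. Unreachable nodes get distance -1.
    """
    n = len(adj)
    dist = [-1] * n
    dist[target] = 0
    queue = deque([target])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def gred_target_repr(adj, features, target, K, lam, B):
    """Hypothetical sketch of one GRED-style step for one target node.

    features: (n, d) node features
    K: maximum hop distance to include
    lam: (d,) diagonal recurrent weights, |lam| < 1 assumed
    B: (d, d) input projection
    Returns a (d,) representation of `target`.
    """
    n, d = features.shape
    dist = hop_distances(adj, target)
    # Permutation-invariant aggregation (sum) of node features per hop.
    hops = np.zeros((K + 1, d))
    for v in range(n):
        if 0 <= dist[v] <= K:
            hops[dist[v]] += features[v]
    # Diagonal linear recurrence h_k = lam * h_{k-1} + B @ x_k,
    # scanned from the farthest hop (K) down to hop 0, so closer
    # hops are injected later and decay less.
    h = np.zeros(d)
    for k in range(K, -1, -1):
        h = lam * h + B @ hops[k]
    return h
```

Because the recurrence is linear and diagonal, the scan over hops can also be parallelized (e.g., as a prefix sum), which is where the efficiency advantage over full attention comes from.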