GNN-FiLM: Feature-wise Modulated GNNs
- The paper introduces a novel feature-wise linear modulation mechanism that conditions edge messages on target node states to enhance context-sensitive aggregation.
- It employs per-dimension scaling and translation to modulate incoming messages, enabling selective information flow and improved performance on benchmark tasks.
- Empirical results on datasets like QM9 and PPI demonstrate that GNN-FiLM achieves competitive accuracy and parameter efficiency compared to traditional GNN models.
GNN-FiLM, or Graph Neural Networks with Feature-wise Linear Modulation, is a class of message-passing neural architectures that extend standard GNNs by introducing a feature-wise, target-conditioned affine transformation to each incoming edge message. Rather than computing edge messages solely from the source node representation, GNN-FiLM leverages the target node’s hidden state to modulate incoming messages via per-dimension scaling and translation. This mechanism enables richer, target-conditional information flow, facilitating selective message gating and context-sensitive aggregation. The model has been empirically validated on diverse benchmark tasks, showing competitive or superior results, particularly on graph regression.
1. Feature-wise Linear Modulation Mechanism
GNN-FiLM modifies canonical GNN message passing by conditioning edge messages on the target node's state. For each node $v$ at layer $t$, with hidden state $h_v^{(t)} \in \mathbb{R}^D$, and for each incoming edge $u \to_\ell v$ (where $\ell$ indicates the edge type), the message is constructed as follows:
- The base message from the source: $m_{u \to v}^{(t)} = W_\ell h_u^{(t)}$, with $W_\ell \in \mathbb{R}^{D \times D}$.
- Target-derived modulation: the modulation parameters $\beta_{\ell,v}^{(t)}, \gamma_{\ell,v}^{(t)} \in \mathbb{R}^D$ are computed by a small neural network $g$ applied to the target state, typically a single linear layer $[\beta_{\ell,v}^{(t)}; \gamma_{\ell,v}^{(t)}] = U_\ell h_v^{(t)} + b_\ell$ with $U_\ell \in \mathbb{R}^{2D \times D}$ and $b_\ell \in \mathbb{R}^{2D}$.
- Modulated message: $m'_{u \to v}^{(t)} = \gamma_{\ell, v}^{(t)} \odot m_{u \to v}^{(t)} + \beta_{\ell, v}^{(t)}$, where $\odot$ denotes element-wise multiplication.
A nonlinear activation $\sigma$ (e.g., ReLU) is applied per message before aggregation. The updated node state is then
$$h_v^{(t+1)} = f\Big(\sum_{u \to_\ell v \in \mathcal{E}} \sigma\big(\gamma_{\ell,v}^{(t)} \odot W_\ell h_u^{(t)} + \beta_{\ell,v}^{(t)}\big)\Big),$$
where $f$ is a post-aggregation transformation, often a nonlinearity or small MLP, and may include layer normalization or residual connections. Applying $\sigma$ before summation, as described, yields improved performance, particularly for counting-type tasks (Brockschmidt, 2019).
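A minimal NumPy sketch of a single modulated message makes the mechanism concrete; the dimension, the parameter names, and the ReLU choice are illustrative, not taken from the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # hidden dimensionality (illustrative)

# Parameters for one edge type: message projection and linear FiLM hypernetwork
W = rng.normal(size=(D, D))      # W_l
U = rng.normal(size=(2 * D, D))  # computes [beta; gamma] from the target state
b = np.zeros(2 * D)

h_u = rng.normal(size=D)  # source node state
h_v = rng.normal(size=D)  # target node state

# Target-derived modulation parameters
beta, gamma = np.split(U @ h_v + b, 2)

m = W @ h_u                      # base message from the source
m_mod = gamma * m + beta         # per-dimension scale and shift
m_act = np.maximum(m_mod, 0.0)   # sigma (ReLU) applied before aggregation
```

Note that `gamma` and `beta` depend only on the target node and the edge type, so they are computed once per node and reused for all matching incoming edges.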
2. Architectural Details and Implementation
The core modules of GNN-FiLM consist of edge-type-specific linear projections, target-conditioned modulation hypernetworks, and per-node post-aggregation transformations. Key aspects include:
- Hidden state dimensionality $D$ governs all feature vectors.
- Each edge type $\ell$ maintains its own projection $W_\ell$ and modulation network $g_\ell$ (parameterized by $U_\ell$ and $b_\ell$).
- $g$ is most often implemented as a single-layer linear map; deeper FiLM “hypernetworks” provide no consistent advantages.
- Aggregation involves summing nonlinearly transformed, modulated messages, then applying $f$.
- Per-layer computation introduces an $O(|\mathcal{E}| \cdot D)$ cost due to per-edge modulation, with $\gamma_{\ell,v}^{(t)}$ and $\beta_{\ell,v}^{(t)}$ shared for all incoming edges of a given type $\ell$ to $v$.
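Under the shape conventions above, the per-edge-type parameter count can be checked in a few lines; the values of $D$ and the number of edge types are illustrative:

```python
D = 8  # hidden dimensionality (illustrative)
L = 3  # number of edge types (illustrative)

# Per edge type: W_l (D x D), plus the linear FiLM hypernetwork U_l (2D x D), b_l (2D)
params_per_type = D * D + (2 * D) * D + 2 * D  # 64 + 128 + 16 = 208
total = L * params_per_type                    # 624
```

The FiLM hypernetwork adds roughly $2D^2$ parameters per edge type on top of the $D^2$ message projection, growing linearly in the number of edge types.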
A concise pseudocode formalization of a single GNN-FiLM layer is:

```python
for v in V:                                    # modulation parameters, per edge type l
    for l in edge_types:
        beta[l][v], gamma[l][v] = split(U[l] @ h[v] + b[l])
for (u, l, v) in E:                            # modulated, activated messages
    m = W[l] @ h[u]
    m_mod = gamma[l][v] * m + beta[l][v]
    Agg[v] += sigma(m_mod)                     # aggregate per target node v
for v in V:                                    # post-aggregation transform
    h_new[v] = f(Agg[v])
```
Layer stacking is performed by iterative application of the GNN-FiLM layer, propagating hidden states forward; final outputs are processed via aggregation appropriate for node- or graph-level tasks.
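The pseudocode above can be turned into a small runnable NumPy sketch; the dictionary-based parameterization and the `tanh` choice for the post-aggregation transform $f$ are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def gnn_film_layer(h, edges, W, U, b, f=np.tanh):
    """One GNN-FiLM layer (illustrative shapes).
    h: (N, D) node states; edges: iterable of (u, l, v) with edge type l;
    W[l]: (D, D) message projection; U[l]: (2D, D) and b[l]: (2D,) FiLM hypernetwork;
    f: post-aggregation transform (tanh is an assumed choice)."""
    N, D = h.shape
    agg = np.zeros((N, D))
    # Modulation parameters for every (edge type, target node) pair
    mod = {l: np.split(h @ U[l].T + b[l], 2, axis=1) for l in W}
    for u, l, v in edges:
        beta, gamma = mod[l][0][v], mod[l][1][v]
        msg = W[l] @ h[u]                              # base message from source
        agg[v] += np.maximum(gamma * msg + beta, 0.0)  # ReLU before the sum
    return f(agg)

rng = np.random.default_rng(1)
D, N = 4, 3
W = {0: rng.normal(size=(D, D))}
U = {0: rng.normal(size=(2 * D, D))}
b = {0: np.zeros(2 * D)}
h = rng.normal(size=(N, D))
edges = [(0, 0, 2), (1, 0, 2)]  # two type-0 edges into node 2
h_next = gnn_film_layer(h, edges, W, U, b)
```

Stacking simply feeds `h_next` back in as `h` for the next layer.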
3. Experimental Setup and Benchmarks
GNN-FiLM was evaluated on three benchmark datasets:
- PPI (Protein–Protein Interaction): Node classification across 24 graphs, 2.5K nodes each, with 121 classes and two edge types plus self-loops.
- QM9: Regression on molecular graphs (130K small molecules with edge types and self-loops, 13 targets).
- VarMisuse: Program variable-use ranking (130K training graphs).
Baselines re-implemented for direct comparison include GGNN (Li et al., 2015), R-GCN (Schlichtkrull et al., 2017), R-GAT (edge-typed GAT), R-GIN (Xu et al., 2019), and GNN-MLP0/1 variants. Extensive hyperparameter search ensured fair comparison, revealing that inter-model differences are smaller than previously reported when tuning is controlled. Training used early stopping and results were averaged over 5–10 random seeds.
4. Quantitative Results
Performance across tasks can be summarized as follows, with GNN-FiLM often achieving the best or competitive metrics:
| Task (Metric) | GNN-FiLM | Best Baseline | Comment |
|---|---|---|---|
| PPI (micro-F1) | 0.992 ± 0.000 | GNN-MLP0/1: 0.992 | Fastest convergence |
| QM9, α (polarizability, MAE) | 3.75 ± 0.11 | GNN-MLP0: 4.27 ± 0.36 | GNN-FiLM best |
| QM9, HOMO (MAE) | 1.22 ± 0.07 | GNN-MLP0: 1.25 ± 0.04 | GNN-FiLM best |
| QM9, U0 (MAE) | 5.43 ± 0.96 | GNN-MLP0: 5.55 ± 0.38 | GNN-FiLM best |
| VarMisuse SeenProjTest (Acc) | 87.0 ± 0.2 | R-GCN: 87.2 ± 1.5 | GNN-FiLM competitive |
| VarMisuse UnseenProjTest | 81.3 ± 0.9 | R-GCN: 81.4 ± 2.3 | GNN-FiLM competitive |
GNN-FiLM matches or outperforms baselines on all 13 regression targets in QM9. All re-implementations outperform the originally reported GGNN reference. Simple GNN-MLP(0/1) variants can match or beat previously state-of-the-art GGNN, R-GCN, and R-GAT under uniform tuning (Brockschmidt, 2019).
5. Mechanistic Insights and Ablation Findings
Target-conditioned, per-feature modulation in GNN-FiLM allows each node to selectively amplify or suppress particular input dimensions of its messages. This facilitates context-sensitive neighbor aggregation in a single layer, as evidenced by the ability to selectively count neighbor types without architectural depth or expansion.
Ablation studies identify that:
- Placing the nonlinearity $\sigma$ before aggregation is critical for degree-separability and counting-type problems, though it introduces degree-dependent scaling that must be controlled downstream.
- A single-layer $g$ suffices; deeper modulation networks do not improve results.
- FiLM modulation enables tasks requiring context-sensitive gating (such as “count only one edge-type” scenarios) with one layer, compared to the need for extra feature dimensions or stack depth in vanilla GNNs.
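A toy illustration of the last point, with hand-set modulation parameters (a setting a learned $g$ conditioned on the target state could in principle realize):

```python
import numpy as np

D = 2
h_u = np.ones(D)  # each neighbor contributes a constant unit message

def film_aggregate(neighbor_types, wanted_type):
    """Aggregate with hand-set FiLM parameters: gamma = 1 for the wanted
    edge type, 0 otherwise; beta = 0. One layer then counts exactly the
    neighbors of the wanted type."""
    agg = np.zeros(D)
    for l in neighbor_types:
        gamma = np.ones(D) if l == wanted_type else np.zeros(D)
        agg += np.maximum(gamma * h_u, 0.0)  # ReLU, then sum
    return agg

types = [0, 0, 0, 1, 1]  # three type-0 and two type-1 neighbors
count0 = film_aggregate(types, wanted_type=0)[0]  # 3.0
count1 = film_aggregate(types, wanted_type=1)[0]  # 2.0
```

An unmodulated sum over the same messages cannot separate the two edge types without extra feature dimensions or additional layers.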
6. Strengths, Limitations, and Applicability
Strengths of GNN-FiLM include rapid convergence (fewer training epochs), expressive per-feature gating based on the target node’s context, and parameter efficiency when handling many edge types. Limitations arise from the per-edge modulation computation overhead, though shared computation amortizes cost within nodes.
Optimal use cases are where each node must dynamically control which aspects of its neighbors to aggregate, or tasks involving numerous edge types that would otherwise require an unsustainable explosion of parameters if handled by fixed matrices alone. When FiLM modulation is most effective, it enables richer source–target interactions and achieves state-of-the-art or competitive results with modest model complexity.
7. Summary and Broader Context
GNN-FiLM introduces a lightweight, feature-wise affine modulation mechanism into standard message-passing GNNs. By conditioning incoming edge messages on the target node’s hidden state, this architectural refinement enables selective and context-dependent information propagation. The approach delivers state-of-the-art or competitive performance across node classification, graph regression, and program analysis tasks, with faster convergence and high parameter efficiency, especially under rigorous hyperparameter search and reimplementation of baselines (Brockschmidt, 2019).