TempoKGAT: Temporal Graph Attention Model

Updated 30 January 2026
  • TempoKGAT is a temporal graph attention architecture that integrates time-decay weighted node features with selective top-k neighbor aggregation.
  • The model reduces MSE and RMSE by 20–40% relative to standard GATs on spatio-temporal forecasting tasks, and its attention scores and decay factors make predictions easier to interpret.
  • Restricting attention to the most influential neighbors limits computation; the model has been evaluated on traffic, epidemiology, and energy forecasting datasets.

TempoKGAT is a graph attention network architecture designed to model temporal, dynamic graph data, integrating time-decay weighted node features and a selective top-$k$ neighbor attention protocol. Developed to address limitations of conventional graph neural networks (GNNs) in capturing evolving relationships within spatio-temporal datasets, TempoKGAT enables both improved prediction accuracy and enhanced interpretability in temporal forecasting contexts (Sasal et al., 2024).

1. Architectural Components

TempoKGAT consists of a single-layer graph attention framework that interleaves two principal mechanisms: a Temporal Block and a Spatial Block. The Temporal Block applies element-wise exponential decay to node features based on relative timestamps, allowing recent observations to exert stronger influence. Given node features $\mathbf{X}\in\mathbb{R}^{N\times F}$, timestamps $\mathbf{t}\in\mathbb{R}^N$, and decay rate $\lambda > 0$, the decayed features are computed as:

$$\mathbf{X}_{\mathrm{decay}} = \mathbf{X} \odot \exp\bigl(-\lambda \mathbf{t}\bigr)$$

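where $\odot$ denotes the Hadamard (element-wise) product. Because $\mathbf{t}$ holds one entry per node, the decay factor for node $i$ is applied to every one of that node's $F$ features.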
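The decay step can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `time_decay` and the example timestamps are hypothetical, and the per-node decay factor is assumed to be broadcast across the feature columns.

```python
import numpy as np

def time_decay(X, t, lam):
    """Scale each node's feature row by exp(-lam * t_i)."""
    X = np.asarray(X, dtype=float)      # shape (N, F)
    t = np.asarray(t, dtype=float)      # shape (N,)
    if lam <= 0:
        raise ValueError("decay rate lam must be positive")
    decay = np.exp(-lam * t)            # shape (N,), one factor per node
    return X * decay[:, None]           # broadcast over the F feature columns

X = np.ones((3, 2))
t = np.array([0.0, 1.0, 2.0])           # hypothetical relative timestamps
X_decay = time_decay(X, t, lam=0.5)     # older rows are scaled down more
```

A node with timestamp 0 keeps its features unchanged, while larger timestamps shrink the row toward zero at a rate set by `lam`.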

The Spatial Block restricts attention to the top-$k$ neighbors by edge weight, selecting for each node $i$:

$$\mathcal{N}_k(i) = \arg\underset{S\subseteq N(i),\,|S|=k}{\mathrm{top}\text{-}k}\,\bigl\{w_{ij} \bigm| j\in N(i)\bigr\}$$
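One way to realize this selection is sketched below. The representation is an assumption made for illustration: edge weights live in a dense matrix where a zero entry means "no edge", ties are broken in favor of the lower node index, and nodes with fewer than $k$ neighbors keep all of them. None of these conventions is stated in the source, and `top_k_neighbors` is a hypothetical name.

```python
import numpy as np

def top_k_neighbors(W, k):
    """Return, for each node i, the indices of its k heaviest neighbors.

    W is a dense (N, N) weight matrix; W[i, j] == 0 is treated as "no edge".
    """
    W = np.asarray(W, dtype=float)
    selected = []
    for i in range(W.shape[0]):
        nbrs = np.flatnonzero(W[i])                      # N(i)
        order = np.argsort(-W[i, nbrs], kind="stable")   # descending weight
        selected.append(nbrs[order[:k]])                 # N_k(i)
    return selected

W = np.array([
    [0.0, 0.9, 0.1, 0.5],
    [0.9, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.3],
    [0.5, 0.0, 0.3, 0.0],
])
neighbors = top_k_neighbors(W, k=2)
# node 0 keeps neighbors 1 and 3 and drops the weak edge to node 2
```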

Attention coefficients are computed using a single-head additive mechanism. For projected decayed features $\mathbf{W}\vec{x}_{\mathrm{decay},i}$, the attention for node $i$ toward neighbor $j$ is:

$$e_{ij} = \mathbf{a}^T\left[\mathbf{W}\vec{x}_{\mathrm{decay},i} \,\Vert\, \mathbf{W}\vec{x}_{\mathrm{decay},j}\right],\quad j\in\mathcal{N}_k(i)$$

$$\alpha_{ij} = \frac{\exp(\mathrm{LeakyReLU}(e_{ij}))}{\sum_{m\in\mathcal{N}_k(i)}\exp(\mathrm{LeakyReLU}(e_{im}))}$$

Neighbor contributions are further modulated by edge weights:

$$\beta_{ij} = \alpha_{ij} \cdot w_{ij}$$

$$\vec{x}'_i = \sum_{j\in\mathcal{N}_k(i)} \beta_{ij} \left(\mathbf{W} \vec{x}_{\mathrm{decay},j}\right)$$

This design combines localized temporal weighting and edge-aware attention to represent latent dynamic patterns in temporal graphs.
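The attention and aggregation equations can be combined into a single forward pass, sketched here in NumPy under stated assumptions. The LeakyReLU slope of 0.2 is the common GAT default and is not given in the source; the function names, the dense edge-weight matrix, and the choice to leave isolated nodes at zero are likewise illustrative.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    # slope 0.2 is an assumed default, not a value taken from the source
    return np.where(z > 0, z, slope * z)

def tempokgat_layer(X_decay, W_edge, neighbors, W_proj, a):
    """One single-head attention pass over pre-selected top-k neighbors.

    X_decay   : (N, F)       time-decayed features
    W_edge    : (N, N)       edge weights w_ij
    neighbors : list of index arrays, neighbors[i] = N_k(i)
    W_proj    : (F_out, F)   projection matrix W
    a         : (2 * F_out,) attention vector
    """
    H = X_decay @ W_proj.T                       # row i is W x_decay,i
    out = np.zeros_like(H)
    for i, nbrs in enumerate(neighbors):
        if len(nbrs) == 0:
            continue                             # isolated node: output stays zero
        h_i = np.tile(H[i], (len(nbrs), 1))
        pair = np.concatenate([h_i, H[nbrs]], axis=1)   # [W x_i || W x_j]
        e = leaky_relu(pair @ a)                 # LeakyReLU(e_ij)
        e = e - e.max()                          # stabilize the softmax
        alpha = np.exp(e) / np.exp(e).sum()      # alpha_ij over N_k(i)
        beta = alpha * W_edge[i, nbrs]           # beta_ij = alpha_ij * w_ij
        out[i] = beta @ H[nbrs]                  # x'_i
    return out

X_decay = np.array([[1.0, 2.0], [3.0, 4.0]])
W_edge = np.array([[0.0, 0.5], [0.5, 0.0]])
neighbors = [np.array([1]), np.array([0])]       # k = 1
out = tempokgat_layer(X_decay, W_edge, neighbors, np.eye(2), np.ones(4))
```

With a single neighbor the softmax gives $\alpha_{ij} = 1$, so each output row is simply $w_{ij}$ times the neighbor's projected features. Note that because $\beta_{ij}$ is not renormalized, the output scale depends on the edge weights.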

2. Selective Neighbor Aggregation Protocol

TempoKGAT's neighbor selection restricts each node's receptive field to the $k$ neighbors with the largest edge weights. Only this subset contributes to the spatial aggregation, substantially reducing computational cost relative to full adjacency attention.

The protocol can be summarized as:

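  • Selection: the top-$k$ rule defining $\mathcal{N}_k(i)$ in Section 1 picks each node's $k$ neighbors in descending order of edge weight.
  • Aggregation: the attention coefficients $\alpha_{ij}$, the edge-weighted coefficients $\beta_{ij}$, and the final feature sum $\vec{x}'_i$ are computed over only these selected neighbors.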

This mechanism allows for focused exploitation of salient graph connections corresponding to the strongest temporal and spatial dependencies, effectively filtering out noise from weak or irrelevant relationships.

3. Objective Function and Optimization

TempoKGAT is optimized for point forecasting/regression tasks using Mean Squared Error (MSE) as the sole loss function:

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{n}\sum_{i=1}^n (Y_i - \hat Y_i)^2$$

Performance metrics, reported during evaluation, include:

  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^n (Y_i-\hat Y_i)^2}$$

  • Mean Absolute Error (MAE):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^n |Y_i-\hat Y_i|$$
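The three metrics follow directly from the error vector. A minimal sketch, with the helper name `regression_metrics` and the sample values chosen purely for illustration:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAE for two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),          # RMSE is the square root of MSE
        "mae": float(np.mean(np.abs(err))),
    }

# hypothetical targets and predictions; the only error is 2.0 on the last item
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Here a single error of 2 across three samples gives an MSE of 4/3 and an MAE of 2/3, which shows how squaring makes MSE more sensitive to large individual errors than MAE.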

No explicit regularization or temporal smoothing beyond the time-decay factor is employed. Training uses the Adam optimizer (learning rate 0.001) over 200 epochs with an 80/20 temporal split.
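An 80/20 temporal split is usually taken to mean that the first 80% of time-ordered snapshots form the training set and the final 20% form the test set, with no shuffling. The sketch below assumes that reading; `temporal_split` is a hypothetical helper and the integer list stands in for a sequence of graph snapshots.

```python
def temporal_split(snapshots, train_ratio=0.8):
    """Split a time-ordered sequence into a past (train) and future (test) part."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    cut = int(len(snapshots) * train_ratio)
    return snapshots[:cut], snapshots[cut:]

snapshots = list(range(10))        # stand-in for 10 time-ordered snapshots
train_part, test_part = temporal_split(snapshots)
```

Keeping the order intact ensures that every test snapshot is later in time than every training snapshot, which avoids leaking future information into training.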

4. Computational Complexity Analysis

TempoKGAT's selective neighbor aggregation reduces attention and aggregation cost from $O(\mathrm{deg}(i)F')$ per node in standard GATs to $O(kF' + k)$, with $k$ typically far smaller than the average node degree. Top-$k$ selection per node costs $O(\mathrm{deg}(i)\log k)$ with a size-$k$ heap (a full sort would cost $O(\mathrm{deg}(i)\log \mathrm{deg}(i))$), or $O(\mathrm{deg}(i))$ with a linear-time selection algorithm.
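The two selection strategies mentioned above can be compared with standard tools. This sketch uses Python's `heapq.nlargest` for the bounded-heap approach and NumPy's `argpartition` for the selection-based approach; the function names and sample weights are illustrative, and neither is claimed to be what the authors used.

```python
import heapq
import numpy as np

def top_k_heap(weights, k):
    """Indices of the k largest weights using a bounded heap."""
    return heapq.nlargest(k, range(len(weights)), key=weights.__getitem__)

def top_k_partition(weights, k):
    """Indices of the k largest weights via partial partitioning,
    followed by a sort of only the k winners."""
    w = np.asarray(weights, dtype=float)
    k = min(k, w.size)
    if k == 0:
        return np.array([], dtype=int)
    idx = np.argpartition(-w, k - 1)[:k]     # k largest, in arbitrary order
    return idx[np.argsort(-w[idx])]          # order them by descending weight

w = [0.2, 0.9, 0.1, 0.7, 0.4]                # hypothetical edge weights of one node
heap_idx = top_k_heap(w, 2)
part_idx = top_k_partition(w, 2)
```

Both functions select indices 1 and 3 for this input. The partition variant tends to pay off for high-degree nodes, since it avoids ordering the neighbors that are discarded anyway.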

Space requirements increase only modestly, due to storage for decayed feature masks and top-$k$ neighbor indices. For graphs where $k \ll \max_i \mathrm{deg}(i)$, overall batch runtime is improved compared to full attention mechanisms.
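To give a feel for the size of the saving, consider a hypothetical node with $\mathrm{deg}(i) = 200$, projected feature width $F' = 32$, and $k = 5$. These numbers are chosen for illustration only and do not come from the source; the calculation counts leading terms and ignores constant factors.

```latex
\begin{align*}
\text{full attention:}        \quad & \mathrm{deg}(i)\,F' = 200 \times 32 = 6400 \\
\text{top-}k\text{ attention:} \quad & kF' + k = 5 \times 32 + 5 = 165 \\
\text{ratio:}                 \quad & 6400 / 165 \approx 38.8
\end{align*}
```

The selection step still has to inspect all 200 edge weights, but that work involves scalar comparisons rather than operations on $F'$-dimensional feature vectors.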

5. Experimental Protocol and Quantitative Results

TempoKGAT evaluation spans five open-source spatio-temporal benchmarks across traffic, epidemiology, and energy domains:

| Dataset | Optimal $k$ | MAE | MSE | RMSE |
|---|---|---|---|---|
| PedalMe | 1 | 0.7476 | 1.1717 | 1.0825 |
| ChickenPox | 1 | 0.6489 | 1.0017 | 1.0008 |
| England Covid | 5 | 0.4953 | 0.4192 | 0.6474 |
| Small Windmill | 7 | 0.7949 | 0.9821 | 0.9910 |
| Medium Windmill | 17 | 0.7198 | 0.8890 | 0.9429 |
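Since RMSE is defined as the square root of MSE, the two columns of the table can be cross-checked against each other. The short script below does this with the reported values; the tolerance of 2e-4 is an assumption that allows for both columns being rounded to four decimal places.

```python
import math

# (MSE, RMSE) pairs copied from the results table
reported = {
    "PedalMe": (1.1717, 1.0825),
    "ChickenPox": (1.0017, 1.0008),
    "England Covid": (0.4192, 0.6474),
    "Small Windmill": (0.9821, 0.9910),
    "Medium Windmill": (0.8890, 0.9429),
}

def rmse_matches_mse(mse, rmse, tol=2e-4):
    """True when rmse agrees with sqrt(mse) up to rounding tolerance."""
    return abs(math.sqrt(mse) - rmse) <= tol

consistent = {name: rmse_matches_mse(mse, rmse)
              for name, (mse, rmse) in reported.items()}
```

All five rows pass this check, so the MSE and RMSE columns are mutually consistent.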

Comparative baselines include GRU, LSTM (graph extensions), GCN, GAT, TGCN, DCRNN, EvolveGCNH, and naive edge-weight-injected versions. TempoKGAT consistently outperforms these methods, achieving reductions of 20–40% in MSE and RMSE over standard GATs. Optimal $k$ varies: small values suffice for dense graphs; large values are preferable for sparse or high-variance settings.

6. Interpretative Insights and Limitations

The joint role of learned attention scores $\alpha_{ij}$ and time-decay factors enables granular interpretability: together they highlight the nodes and historical lags that are most influential in predictions, which supports qualitative causal analysis. Empirically, the sufficiency of $k=1$ in several datasets suggests that a single dominant neighbor, modulated by time-weighting, often provides the main predictive signal.

Identified limitations include increased computational overhead with large $k$ and potential under-capture of complex, multi-relational graph dynamics due to single-head attention. Prospective improvements involve faster top-$k$ algorithms, multi-head attention, scaling to orders-of-magnitude larger graphs, and adaptive decay mechanisms.

7. Context, Applications, and Prospects

TempoKGAT is situated at the intersection of temporal GNNs and local attention-based aggregation strategies, addressing challenges in temporal graph analysis across diverse domains such as traffic prediction, epidemiological modeling, and renewable energy forecasting. Its architecture enables both advanced predictive accuracy and model interpretability without explicit regularization. Future work will likely focus on algorithmic efficiency, richer multi-pattern attention, and generalization to extremely large-scale dynamic graphs (Sasal et al., 2024).

References (1)
