A Gravity-informed Spatiotemporal Transformer for Human Activity Intensity Prediction

Published 16 Jun 2025 in cs.LG | (2506.13678v4)

Abstract: Human activity intensity prediction is crucial to many location-based services. Despite tremendous progress in modeling the dynamics of human activity, most existing methods overlook the physical constraints of spatial interaction, leading to uninterpretable spatial correlations and the over-smoothing phenomenon. To address these limitations, this work proposes a physics-informed deep learning framework, namely the Gravity-informed Spatiotemporal Transformer (Gravityformer), which integrates the universal law of gravitation to refine transformer attention. Specifically, it (1) estimates two spatially explicit mass parameters based on spatiotemporal embedding features, (2) models the spatial interaction in an end-to-end neural network using the proposed adaptive gravity model to learn the physical constraint, and (3) uses the learned spatial interaction to guide transformer attention and mitigate its over-smoothing. Moreover, a parallel spatiotemporal graph convolution transformer is proposed to balance coupled spatial and temporal learning. Systematic experiments on six real-world large-scale activity datasets demonstrate the quantitative and qualitative superiority of our model over state-of-the-art benchmarks. Additionally, the learned gravity attention matrix can not only be disentangled and interpreted based on geographical laws, but also improves generalization in zero-shot cross-region inference. This work provides a novel insight into integrating physical laws with deep learning for spatiotemporal prediction.

Summary

The paper's main contribution is integrating gravitational physics into a transformer architecture to improve urban human activity intensity predictions.
It introduces AdaGravity and parallel spatiotemporal modules that mitigate over-smoothing and boost performance measured by RMSE, MAE, and MAPE.
Experiments on six metropolitan datasets validate Gravityformer’s effectiveness, offering actionable insights for urban planning, traffic management, and public safety.

Introduction

The paper "A Gravity-informed Spatiotemporal Transformer for Human Activity Intensity Prediction" (2506.13678) introduces a framework that improves the prediction of human activity intensity by integrating the law of universal gravitation into a deep learning model. Human activity intensity, defined as the dynamic distribution of an active population, plays a critical role in urban planning, traffic management, and public safety. Although existing models such as Spatiotemporal Graph Neural Networks (ST-GNNs) have advanced spatiotemporal prediction, they often ignore the physical constraints of spatial interaction and suffer from over-smoothing when modeling spatial correlations.

Methodology

The proposed framework, the Gravity-informed Spatiotemporal Transformer (Gravityformer), addresses these limitations by building physics-based constraints directly into the deep learning architecture. It refines the spatial transformer attention mechanism with principles from the law of universal gravitation, which constrain how spatial interactions are learned.
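For reference, the physical law and the classical gravity model of spatial interaction that this idea draws on are commonly written as below. This is generic textbook notation, not the paper's own formulation; the symbols T_{ij}, M_i, M_j, d_{ij}, k, and \beta are assumptions introduced here for illustration, and the paper's adaptive variant may differ in form.

```latex
% Newton's law of universal gravitation between two bodies
F = G \, \frac{m_1 m_2}{r^2}

% Classical gravity model of spatial interaction (generic form):
% T_{ij}: interaction strength from region i to region j
% M_i, M_j: "masses" of the two regions (e.g. population or activity)
% d_{ij}: distance between the regions, \beta: distance-decay exponent
T_{ij} = k \, \frac{M_i \, M_j}{d_{ij}^{\beta}}
```

Under this form, interaction grows with the masses of both regions and decays with the distance between them, which is the kind of constraint the attention mechanism is meant to respect.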

Key innovations of the Gravityformer include the estimation of two spatial mass parameters based on inflow and outflow, modeling interaction likelihood with closed-form equations, and guiding the transformer’s attention to mitigate over-smoothing. Additionally, the model employs a Spatiotemporal Graph Convolution Transformer structure (ST-GC2former) to maintain balanced spatial and temporal learning.
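The following is a minimal NumPy sketch of how a gravity-style prior could guide attention, written under stated assumptions rather than taken from the paper. The function name `gravity_attention`, the linear projections `w_out` and `w_in`, the softplus mass estimate, the `dist + 1` offset, and the choice to add the log of the gravity matrix to the attention scores are all hypothetical; the paper's adaptive gravity model and its coupling to attention may be defined differently.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def softplus(x):
    # Smooth, non-negative activation, used here to keep masses non-negative.
    return np.logaddexp(0.0, x)


def gravity_attention(h, coords, w_out, w_in, w_q, w_k, beta=1.0, eps=1e-6):
    """Attention over N regions modulated by a gravity-style prior.

    h:      (N, D) region embeddings
    coords: (N, 2) region locations (assumed planar coordinates)
    w_out, w_in: (D,) hypothetical projections giving two masses per region
    w_q, w_k:    (D, D) query and key projections
    """
    # Two mass parameters per region (assumption: linear map + softplus).
    m_out = softplus(h @ w_out)  # (N,)
    m_in = softplus(h @ w_in)    # (N,)

    # Pairwise Euclidean distances between regions.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # (N, N)

    # Gravity-style interaction: mass product with distance decay.
    # The +1 offset (an assumption) avoids division by zero on the diagonal.
    gravity = np.outer(m_out, m_in) / (dist + 1.0) ** beta
    gravity = gravity / gravity.sum(axis=1, keepdims=True)  # row-normalize

    # Standard scaled dot-product scores.
    q, k = h @ w_q, h @ w_k
    scores = q @ k.T / np.sqrt(q.shape[-1])

    # Use the gravity matrix as a multiplicative prior on attention weights
    # by adding its log to the scores before the softmax.
    attn = softmax(scores + np.log(gravity + eps), axis=-1)
    return attn, gravity


# Small usage example with random inputs.
rng = np.random.default_rng(0)
N, D = 5, 8
h = rng.normal(size=(N, D))
coords = rng.uniform(size=(N, 2))
attn, gravity = gravity_attention(
    h, coords,
    rng.normal(size=D), rng.normal(size=D),
    rng.normal(size=(D, D)), rng.normal(size=(D, D)),
)
print(attn.shape, gravity.shape)
```

Because the prior enters before the softmax, distant or low-mass region pairs receive less attention than the dot-product scores alone would give them, which is one plausible way to keep attention rows from collapsing toward a uniform, over-smoothed pattern.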

Experimental Results

The model's performance was validated through extensive experiments on six large-scale real-world datasets of metropolitan human activity. Gravityformer achieved higher predictive accuracy than state-of-the-art benchmarks, with lower RMSE, MAE, and MAPE across varied urban scales. Interpretation of the gravity attention matrices further indicated that the learned interactions are consistent with geographical laws, supporting both the practical effectiveness and the theoretical soundness of integrating physical principles with machine learning.
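For readers unfamiliar with the three error metrics, here is a short sketch using their standard definitions. The function names are illustrative, and masking near-zero targets in MAPE is an assumption made here to avoid division by zero; the paper's evaluation code may handle such cases differently.

```python
import numpy as np


def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large errors more heavily.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the errors.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred, eps=1e-8):
    # Mean absolute percentage error, in percent.
    # Assumption: targets with magnitude <= eps are excluded.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mask = np.abs(y_true) > eps
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100.0)


# Toy example with made-up values (not data from the paper).
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

Lower values are better for all three. RMSE and MAE are in the units of the target, while MAPE is scale-free, which makes it useful when comparing cities of different sizes.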

Discussion

The paper includes a detailed ablation study illustrating the contributions of Gravityformer’s components such as AdaGravity and the parallel spatiotemporal learning module. The results indicate that physics-informed mechanisms effectively constrain spatial interactions, consequently enhancing predictive accuracy and robustness. The adaptive gravity model also proved essential as it dynamically adjusted mass parameters to reflect genuine spatial correlations.

Furthermore, the proposed framework's interpretable spatial attention, contrasted against conventional attention matrices, showcases its capacity for disentangling complex spatiotemporal dependencies while adhering to empirical geographical patterns.

Implications and Future Work

The Gravityformer presents a significant advancement in spatiotemporal prediction by successfully merging physical laws with deep learning architectures. Its practical implications span urban computing tasks, including improved predictions in areas like urban flow and emergency response planning. Theoretically, the model sets a precedent for physics-informed machine learning, offering a pathway for future exploration in spatially-explicit predictive modeling across diverse applications.

Future work could integrate other physical models, improve the model's adaptability to different spatiotemporal scales, and reduce its computational cost. The promising results suggest that physics-informed principles in deep learning deserve further research, with potential applications beyond urban environments.

Conclusion

The Gravityformer improves the predictive accuracy of spatiotemporal models by integrating gravity-based physical constraints. Its performance and interpretability support the broader adoption of physics-informed learning for modeling complex interactions within urban systems and beyond.
