Chaos Meets Attention: Transformers for Large-Scale Dynamical Prediction
The paper "Chaos Meets Attention: Transformers for Large-Scale Dynamical Prediction" introduces a transformer-based model for long-term prediction of high-dimensional chaotic systems that preserves their statistical properties. Chaotic systems are notoriously difficult to predict because their sensitive dependence on initial conditions causes errors to compound over time. The paper tackles this challenge by leveraging the ergodicity of chaotic systems: long-term predictions are stabilized by aligning the distributions of predicted and actual trajectories rather than tracking individual trajectories exactly.
Model Architecture and Innovations
The proposed framework integrates several innovative design choices. Primarily, it employs a tailored transformer architecture with modifications to the attention mechanism that make it suitable for learning the dynamics of large-scale chaotic systems:
Attention Mechanism with Random Fourier Features (RFF): The paper replaces traditional positional encodings with random Fourier features, enhancing the transformer's ability to model the spatial correlations inherent in chaotic systems. RFFs embed spatial information efficiently and align naturally with the ergodic character of the systems being modeled.
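As a rough illustration of the idea (not the paper's implementation), a random Fourier feature embedding projects grid coordinates through randomly sampled frequencies, approximating a shift-invariant kernel feature map; the function name and parameters below are hypothetical:

```python
import numpy as np

def rff_positional_embedding(coords, num_features, length_scale=1.0, seed=0):
    """Random Fourier feature embedding of spatial coordinates.

    coords: (N, d) array of grid-point coordinates.
    Returns an (N, 2 * num_features) embedding whose inner products
    approximate a Gaussian kernel over positions.
    """
    rng = np.random.default_rng(seed)
    d = coords.shape[1]
    # Frequencies sampled from the Gaussian kernel's spectral density.
    W = rng.normal(0.0, 1.0 / length_scale, size=(d, num_features))
    proj = coords @ W  # (N, num_features)
    # cos/sin pairs; scaling makes each embedding row unit-norm.
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1) / np.sqrt(num_features)
```

Because cos^2 + sin^2 sums to one per frequency, each embedded position has unit norm, a convenient property when the embedding is added to token features.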
Axial Mean-Max-Min (A3M) Attention: The attention mechanism is further refined through the A3M attention block, designed to capture statistical moments and extreme values in physical fields, crucial for understanding and predicting chaotic systems.
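A minimal sketch of the statistics such a block aggregates, assuming it pools the mean, maximum, and minimum of the field along a spatial axis (how these summaries feed into attention is the paper's contribution and is not reproduced here):

```python
import numpy as np

def axis_mean_max_min(field, axis):
    """Summarize a physical field along one axis with its mean, max, and min,
    the three statistics an A3M-style block is designed to capture."""
    return np.stack([field.mean(axis=axis),
                     field.max(axis=axis),
                     field.min(axis=axis)], axis=-1)
```

Summarizing a 2D field row-wise this way exposes extreme values per row that a plain mean-pooled summary would wash out, which is the motivation for tracking max and min alongside the mean.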
Unitary Operator Learning: The model incorporates constraints to maintain the unitarity of the operator dynamics, grounded in the Von Neumann ergodic theorem, to ensure long-term statistical consistency. This is operationalized through a novel loss function that preserves the ergodic properties in $L^2$ space during long-term prediction.
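Since a unitary operator preserves the $L^2$ norm of the state it evolves, one simple surrogate for such a constraint (a hypothetical sketch, not the paper's actual loss) penalizes any change in the discrete $L^2$ norm across a prediction step:

```python
import numpy as np

def norm_preservation_penalty(u_next, u_prev):
    """Penalize deviation from unitarity: a unitary evolution operator keeps
    the (discrete) L2 norm of the state constant, so the squared difference
    of norms before and after a step should be driven to zero."""
    n_next = np.sqrt(np.mean(u_next ** 2))
    n_prev = np.sqrt(np.mean(u_prev ** 2))
    return (n_next - n_prev) ** 2
```

Added to a standard prediction loss, a term like this discourages the slow energy drift that otherwise accumulates over long autoregressive rollouts.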
Scalability: Addressing the curse of dimensionality, the paper implements a computationally efficient factorized attention mechanism: rather than attending over the full flattened spatial grid, attention is computed along each spatial axis separately. This reduces the cost from quadratic in the total number of grid points to a sum of per-axis computations, making high-dimensional data tractable.
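The savings can be seen from a simple token-pair count: full attention over an H x W grid scores (HW)^2 query-key pairs, while axis-factorized attention scores only HW(H+W). A toy accounting (helper name is hypothetical):

```python
def attention_pair_count(h, w, factorized=False):
    """Number of query-key pairs scored for an h x w spatial grid."""
    n = h * w
    if factorized:
        # Each token attends along its row (w keys) and its column (h keys).
        return n * (h + w)
    return n * n

full = attention_pair_count(256, 256)
axial = attention_pair_count(256, 256, factorized=True)
print(full // axial)  # factorization cuts the pair count 128-fold at 256x256
```

At the 256x256 resolution of the Kolmogorov flow benchmark, the factorized form is two orders of magnitude cheaper, which is what makes attention feasible at this scale.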
Empirical Evaluation
The capability of the proposed framework is substantiated through rigorous empirical evaluations on two benchmark chaotic systems: Kolmogorov Flow and Turbulent Channel Flow.
For Kolmogorov Flow (KF256): The model demonstrates superior short-term prediction accuracy, as measured by relative $L^2$ errors, and better long-term statistical properties, with significant improvements over baselines such as UNO, MNO, and MWT.
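The relative $L^2$ error used for short-term accuracy is a standard metric; a minimal reference implementation:

```python
import numpy as np

def relative_l2_error(pred, true):
    """Relative L2 error between a predicted and a reference field:
    || pred - true ||_2 / || true ||_2."""
    return np.linalg.norm(pred - true) / np.linalg.norm(true)
```

Normalizing by the reference norm makes the metric comparable across fields with different magnitudes and across prediction horizons.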
For Turbulent Channel Flow (TCF): The transformer not only excels in short-term prediction but also preserves the energy spectrum and the mixing rates, a key indicator of ergodicity, showing marked improvements over the baselines on both measures.
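Energy-spectrum matching is assessed by comparing spectral energy per wavenumber between predicted and reference fields; a minimal sketch for a 1D periodic signal (the paper works with multidimensional turbulent fields, so this is only illustrative):

```python
import numpy as np

def energy_spectrum_1d(u):
    """Kinetic-energy spectrum of a 1D periodic signal: half the squared
    magnitude of its normalized Fourier coefficients, per wavenumber."""
    u_hat = np.fft.rfft(u) / len(u)
    return 0.5 * np.abs(u_hat) ** 2
```

A model can score well on pointwise error while still distorting this spectrum, which is why it is tracked separately as a long-term statistical diagnostic.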
Implications and Future Directions
The introduction of a benchmark dataset (140k high-resolution snapshots of turbulent channel flow) highlights the practical relevance of the model for machine learning research in chaotic systems. The dataset can accelerate research in large-scale chaotic prediction, offering a common ground for evaluating future models.
From a theoretical standpoint, the paper examines the relationship between operator theory and chaos, offering insight into the use of transformers for nonlinear dynamical systems. This bridges statistical mechanics and machine learning, and suggests that future advances could extend these techniques to non-ergodic or more complex systems that lack spatial uniformity.
An intriguing direction for future exploration is incorporating mesh-free methods into this framework, extending it beyond the uniform grids assumed here to the irregular geometries common in real-world chaotic phenomena.
Overall, the research presents a meticulous analysis of chaotic dynamics through the lens of transformer architectures, offering a promising avenue for robust and scalable predictions in fields where chaotic systems play a pivotal role, such as meteorology and fluid dynamics.