ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution

Published 23 May 2024 in cs.LG and cs.AI | (2405.14527v2)

Abstract: One of the guiding principles for designing AI-based weather forecasting systems is to embed physical constraints as inductive priors in the neural network architecture. A popular prior is locality, where the atmospheric data is processed with local neural interactions, like 3D convolutions or 3D local attention windows as in Pangu-Weather. On the other hand, some works have shown great success in weather forecasting without this locality principle, at the cost of a much higher parameter count. In this paper, we show that the 3D local processing in Pangu-Weather is computationally sub-optimal. We design ArchesWeather, a transformer model that combines 2D attention with a column-wise attention-based feature interaction module, and demonstrate that this design improves forecasting skill. ArchesWeather is trained at 1.5° resolution and 24h lead time, with a training budget of a few GPU-days and a lower inference cost than competing methods. An ensemble of four of our models shows better RMSE scores than the IFS HRES and is competitive with the 1.4° 50-members NeuralGCM ensemble for one to three days ahead forecasting. Our code and models are publicly available at https://github.com/gcouairon/ArchesWeather.

Summary

The paper introduces an AI weather forecasting model that uses cross-level attention to improve vertical atmospheric interactions.
It applies a 3D Swin U-Net transformer to ERA5 data and reports competitive RMSE improvements over the IFS HRES baseline.
The study demonstrates forecasting at 1.5° resolution with reduced computational cost, setting a benchmark for efficient operational meteorology.

ArchesWeather: An Efficient AI Weather Forecasting Model at 1.5° Resolution

Introduction

The paper "ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution" (2405.14527) introduces a transformer model that aims for strong forecasting skill at low training and inference cost. It starts from a common design principle for AI weather models: embedding physical constraints as inductive priors in the network architecture. One such prior is locality, and models like Pangu-Weather process atmospheric data with 3D local attention windows. Other works have succeeded without this locality prior, at the cost of a much higher parameter count. The authors argue that the 3D local processing in Pangu-Weather is computationally sub-optimal and propose ArchesWeather, a 1.5° resolution transformer that combines 2D attention with a column-wise attention-based feature interaction module to improve forecast accuracy.

Figure 1: Relative RMSE improvement over the IFS HRES as a function of training computational budget.

Methodological Advances

The core of ArchesWeather is Cross-Level Attention (CLA), which departs from purely local interaction along the vertical axis. Local attention is motivated by atmospheric physics, where interactions are largely local, but the paper argues that it is computationally sub-optimal. CLA lets features interact across the whole vertical column at each horizontal location, extending the vertical receptive field without a large increase in parameter count. Alternatives such as enlarged attention windows raise computational cost disproportionately, which motivates the choice of CLA for this model.
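To make the cost argument concrete, the following back-of-the-envelope sketch counts how many keys a single query token attends to under three schemes. The window sizes and the number of vertical levels are illustrative assumptions, not values taken from the paper, and the count covers attention scores only (not projections or feed-forward layers).

```python
def keys_per_query_3d(win_z: int, win_h: int, win_w: int) -> int:
    """Keys seen by one token inside a 3D local attention window."""
    return win_z * win_h * win_w


def keys_per_query_2d_plus_column(win_h: int, win_w: int, n_levels: int) -> int:
    """Keys seen by one token with 2D window attention followed by
    a separate attention pass over the full vertical column."""
    return win_h * win_w + n_levels


# Hypothetical sizes, chosen only for illustration.
N_LEVELS = 8
WIN_Z, WIN_H, WIN_W = 2, 6, 12

local_3d = keys_per_query_3d(WIN_Z, WIN_H, WIN_W)                  # 144
full_column_3d = keys_per_query_3d(N_LEVELS, WIN_H, WIN_W)         # 576
factored = keys_per_query_2d_plus_column(WIN_H, WIN_W, N_LEVELS)   # 80
```

Under these assumed sizes, stretching the 3D window to cover the whole column quadruples the per-token attention cost, while the factored scheme reaches every level with fewer keys than the original local window.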

With CLA, ArchesWeather lets every atmospheric level exchange information with every other level in the same column, which improves both computational efficiency and consistency across vertical levels.
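The column-wise interaction can be sketched as ordinary self-attention applied along the vertical axis, with each horizontal location treated as an independent sequence of levels. This is a minimal single-head NumPy sketch under assumed tensor shapes; the function name, the `(Z, H, W, C)` layout, and the plain projection matrices are hypothetical, and the paper's actual module may differ (for example in heads, normalization, or residual connections).

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def column_attention(x, w_q, w_k, w_v):
    """Single-head self-attention along the vertical axis.

    x:    (Z, H, W, C) latent features (assumed layout).
    w_*:  (C, C) query, key, and value projection matrices.
    Each horizontal location (h, w) forms its own sequence of Z tokens,
    so no information is exchanged between different columns here.
    """
    z, h, w, c = x.shape
    cols = x.transpose(1, 2, 0, 3).reshape(h * w, z, c)   # (H*W, Z, C)
    q, k, v = cols @ w_q, cols @ w_k, cols @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)        # (H*W, Z, Z)
    out = softmax(scores, axis=-1) @ v                    # (H*W, Z, C)
    return out.reshape(h, w, z, c).transpose(2, 0, 1, 3)  # back to (Z, H, W, C)
```

In a design like the one described, a step of this kind would sit alongside 2D window attention, which handles horizontal interactions within each level.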

Figure 2: Comparison of attention schemes used in Pangu-Weather (left) versus CLA (right).

Dataset and Training

The model is trained on the ERA5 dataset, regridded to a standard resolution format, using a fixed set of variables at multiple atmospheric levels. Performance is measured with latitude-weighted RMSE and relative RMSE improvement, both evaluated against the Integrated Forecasting System high-resolution (IFS HRES) baseline. The model's architecture, a 3D Swin U-Net transformer, is fine-tuned on recent ERA5 samples (post-2000) to mitigate distribution shift caused by changes in data quality over time.
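The two metrics can be sketched as follows. This assumes the common convention of weighting each grid row by the cosine of its latitude, normalized to mean one, and defines relative improvement as the fractional RMSE reduction against the baseline; the exact weighting and sign convention used in the paper may differ, and the function names are hypothetical.

```python
import numpy as np


def latitude_weights(lat_deg: np.ndarray) -> np.ndarray:
    """cos(latitude) weights, normalized so that they average to 1."""
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()


def lat_weighted_rmse(pred: np.ndarray, target: np.ndarray, lat_deg: np.ndarray) -> float:
    """Latitude-weighted RMSE for fields of shape (lat, lon)."""
    w = latitude_weights(lat_deg)[:, None]  # broadcast over longitude
    return float(np.sqrt(np.mean(w * (pred - target) ** 2)))


def relative_rmse_improvement(rmse_model: float, rmse_baseline: float) -> float:
    """Positive when the model has lower RMSE than the baseline."""
    return (rmse_baseline - rmse_model) / rmse_baseline
```

The weighting compensates for the fact that grid cells on a regular latitude-longitude grid cover less area near the poles, so errors there should count for less than errors near the equator.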

Figure 3: Geopotential (left) and wind speed (right) RMSE comparison of models with and without fine-tuning.

Results and Discussion

The ArchesWeather model, particularly in its ensemble configurations (M and L variants), outperforms previous state-of-the-art models on several key atmospheric variables. According to the abstract, an ensemble of four models achieves better RMSE scores than the IFS HRES and is competitive with the 1.4° 50-member NeuralGCM ensemble for forecasts one to three days ahead. The model reaches these scores with fewer computational resources, requiring a smaller training budget than comparable models such as NeuralGCM and Pangu-Weather. The CLA mechanism in particular improves forecasting skill with a compact parameter count, in contrast to alternative models that rely on far more parameters.
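One way to see why a small ensemble can lower RMSE is to average member forecasts and compare errors. The sketch below assumes the ensemble forecast is the plain mean of its members and uses synthetic data with independent errors; both are illustrative assumptions, not the paper's setup or results.

```python
import numpy as np


def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Unweighted root-mean-square error."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def ensemble_mean(member_forecasts) -> np.ndarray:
    """Average forecasts over the member axis."""
    return np.mean(np.stack(member_forecasts, axis=0), axis=0)


rng = np.random.default_rng(0)
truth = rng.normal(size=(60, 120))  # synthetic stand-in for the verifying field
# Four hypothetical members: truth plus independent unit-variance errors.
members = [truth + rng.normal(size=truth.shape) for _ in range(4)]

member_rmses = [rmse(m, truth) for m in members]
ensemble_rmse = rmse(ensemble_mean(members), truth)
```

With fully independent errors, averaging four members cuts RMSE by roughly a factor of two. Errors of real forecast models are correlated, so the gain in practice is smaller than in this toy case.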

Figure 4: RMSE scores of weather models for lead times up to 10 days.

Conclusion

ArchesWeather sets a benchmark for efficient weather forecast modeling at 1.5° resolution with modest computational requirements. Its central contribution, CLA, marks a shift in atmospheric modeling architectures by balancing computational cost against the modeling of multiscale atmospheric interactions. This opens avenues for future research, in particular refining the model for region-specific forecasting accuracy and integrating it with diffusion models for downscaling to finer resolutions. The results and methods presented in the study indicate potential for broader application in climate modeling and operational meteorology.