- The paper introduces ML methods for agricultural drought detection using the Soil Moisture Index as ground truth.
- It employs high-resolution ERA5-Land data and MODIS satellite observations with a modified k-fold time-series split to ensure robust evaluation.
- Results highlight that models like CNN and LSTM perform well in identifying drought conditions, though performance degrades with lower data resolution.
On the Generalization of Agricultural Drought Classification from Climate Data
Introduction
The study "On the Generalization of Agricultural Drought Classification from Climate Data" investigates how ML approaches can classify agricultural droughts from climate data. With climate change increasing the likelihood of drought events, the research addresses the need for improved drought detection and classification methods. It departs from previous work by using the Soil Moisture Index (SMI), obtained from a hydrological model, as ground truth, with a focus on agricultural droughts and their impact on food security.
Data Preparation
The models take climate variables as input and predict drought labels derived from SMI. The key data inputs are high-resolution ERA5-Land climate data and land use information derived from MODIS satellite observations.

Figure 1: Data examples for one month. Left: example ERA5 input variable ("pressure"). Right: target variable (binarized SMI).
The classification is framed as a binary task, with the SMI data binarized according to established drought monitor thresholds. The dataset spans January 1981 to December 2018 and is restricted to Germany, the region for which SMI data is available. A modified k-fold time-series split with k=5 is applied to mimic a climate-projection setting and to mitigate data leakage between training and test periods. Evaluating on temporally separated folds gives a more reliable estimate of how the models generalize to unseen periods.
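The binarization and the time-aware split described above can be sketched as follows. This is a minimal illustration, not the paper's procedure: the threshold of 0.2, the `gap` parameter, and the function names `binarize_smi` and `blocked_time_folds` are assumptions introduced here, and the exact modification the authors make to the k-fold split is not specified in this summary.

```python
import numpy as np

def binarize_smi(smi, threshold=0.2):
    """Label a cell as drought (1) when SMI is below the threshold.

    The default threshold is an assumed value for illustration only.
    """
    return (np.asarray(smi) < threshold).astype(np.int8)

def blocked_time_folds(n_steps, k=5, gap=0):
    """Yield (train_idx, test_idx) pairs over contiguous, time-ordered blocks.

    Each fold holds out one contiguous block of time steps. `gap` steps on
    either side of the held-out block are dropped from training so that
    overlapping input windows cannot leak test information (an assumption,
    not necessarily what the paper does).
    """
    bounds = np.linspace(0, n_steps, k + 1, dtype=int)
    all_idx = np.arange(n_steps)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        test_idx = all_idx[start:stop]
        keep = (all_idx < start - gap) | (all_idx >= stop + gap)
        yield all_idx[keep], test_idx

# January 1981 to December 2018 covers 38 years of monthly steps.
n_months = 38 * 12
folds = list(blocked_time_folds(n_months, k=5, gap=6))
```

Holding out contiguous blocks rather than randomly sampled months keeps neighboring, strongly correlated time steps on the same side of the split, which is the main source of leakage in time-series evaluation.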
Methodology
The classification models explored include support vector machine (SVM), multi-layer perceptron (MLP), convolutional neural network (CNN), and long short-term memory (LSTM) architectures. This selection makes it possible to compare models with and without explicit sequential inductive biases.
The study employs a six-month window for input features, aligning with temporal dependencies observed in SMI correlations. Seasonal and positional encodings are incorporated to represent location and time-based variations relevant to drought conditions.
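A six-month input window and a seasonal encoding could be built roughly as shown below. This is a sketch under assumptions: the sine/cosine form of the seasonal encoding and the helper names `month_encoding` and `make_windows` are illustrative choices, since the summary does not state how the authors construct their encodings.

```python
import numpy as np

def month_encoding(months):
    """Map month numbers 1..12 onto the unit circle.

    The cyclical form places December next to January, which a plain
    integer month index would not do.
    """
    angle = 2.0 * np.pi * (np.asarray(months) - 1) / 12.0
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)

def make_windows(features, window=6):
    """Stack `window` consecutive months of features.

    features: array of shape (T, F).
    Returns an array of shape (T - window + 1, window, F); window i covers
    months i .. i + window - 1 and would be paired with the label of its
    last month.
    """
    features = np.asarray(features)
    n_windows = features.shape[0] - window + 1
    idx = np.arange(window)[None, :] + np.arange(n_windows)[:, None]
    return features[idx]

# Toy example: 12 months with 2 features per month.
toy_features = np.arange(24).reshape(12, 2)
toy_windows = make_windows(toy_features, window=6)
```

The resulting three-dimensional array can be fed directly to a sequence model, or flattened along the last two axes for models without a sequential inductive bias.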
Results
Measured by the area under the precision-recall curve (PR-AUC), the models outperform a baseline that reflects only the frequency of the minority class.
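To make the comparison concrete: a classifier with no skill has an expected precision equal to the fraction of positive samples, so that fraction serves as the PR-AUC reference level. The sketch below computes PR-AUC as the step-wise average precision; the function names are hypothetical, tied scores are not treated specially, and the numbers are toy values, not results from the paper.

```python
import numpy as np

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Samples are ranked by descending score; precision is accumulated at
    each rank where a positive sample is retrieved.
    """
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")
    hits = y_true[order]
    cum_tp = np.cumsum(hits)
    precision_at_k = cum_tp / np.arange(1, len(hits) + 1)
    return float(np.sum(precision_at_k * hits) / hits.sum())

def no_skill_baseline(y_true):
    """Expected precision of an uninformative classifier: the positive rate."""
    return float(np.mean(y_true))

# Toy labels and scores for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_ap = average_precision(toy_labels, toy_scores)
toy_baseline = no_skill_baseline(toy_labels)
```

Because the reference level depends on class balance, PR-AUC values are only comparable between datasets with a similar share of drought labels.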

Figure 2: Left: PR-AUC of the different models on the test dataset across five random seeds for drought classification using a six-month window. Right: Ablation study: inference with models trained on high-resolution data, given inputs of decreasing resolution.
The ablation study investigating the impact of resolution changes found performance degradation as input data resolution decreased, but the models maintained reasonable efficacy. This suggests the potential for transferring models trained on fine-grained data to coarser-resolution climate model outputs.
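One way to emulate coarser inputs for such an ablation is to block-average each input field and then map the coarse values back onto the original grid, so that a model trained at high resolution can be run unchanged. This is a sketch under assumptions: the summary does not say how the authors reduce resolution, and `degrade_resolution` is a hypothetical helper.

```python
import numpy as np

def degrade_resolution(field, factor):
    """Block-average a 2D field by `factor`, then repeat the coarse values
    back onto the original grid so the array shape is unchanged."""
    h, w = field.shape
    if h % factor or w % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = field.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

# Toy 4x4 field degraded by a factor of 2.
toy_field = np.arange(16.0).reshape(4, 4)
toy_coarse = degrade_resolution(toy_field, 2)
```

Running inference on inputs degraded by increasing factors and recording PR-AUC at each step would then trace out a degradation curve of the kind the ablation reports.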
Summary and Outlook
The research highlights the efficacy of ML models in classifying agricultural drought based on SMI and underscores the importance of high-resolution input data for maintaining classification accuracy. By providing an ablation study focused on resolution effects, the paper suggests that current model capabilities can be adapted for application to climate model projections, paving the way for future climate scenario analyses.
Future research directions include the application of these ML models to climate model outputs for global drought prediction and potential utilization of alternative ground truth labels, such as SMAP data. These efforts aim to extend the research framework to a global scale and improve generalization capabilities.
Overall, the study is a step toward applying ML to agricultural drought detection, in support of proactively managing the impacts of climate change on agriculture.