
AgroFlux: Spatial-Temporal GHG Benchmark Dataset

Updated 9 February 2026
  • AgroFlux is a pioneering benchmark dataset that integrates high-resolution physics-based simulations with empirical observations to quantify GHG fluxes in agroecosystems.
  • It standardizes spatiotemporal data processing with protocols supporting rigorous evaluation and transfer learning for deep learning models.
  • The dataset facilitates detailed analysis of carbon and nitrogen exchanges under diverse management, environmental, and crop conditions across the U.S. Corn Belt.

The Spatial-Temporal Agroecosystem GHG Benchmark Dataset, known as AgroFlux, is a pioneering, AI-ready resource for benchmarking greenhouse-gas (GHG) flux prediction in agricultural ecosystems. AgroFlux combines high-resolution physics-based simulations from mechanistic process models with real-world empirical measurements, providing a standardized framework for developing and evaluating deep learning models that quantify carbon and nitrogen exchanges. The dataset spans diverse management, environmental, and crop scenarios across the U.S. Corn Belt and is designed to address the data sparsity, spatiotemporal heterogeneity, and complex biogeophysical processes inherent in agroecosystem monitoring and modeling (Cheng et al., 2 Feb 2026).

1. Data Sources and Integration Protocols

AgroFlux’s core innovation is the systematic fusion of mechanistic model outputs with empirical observations, providing broad coverage of synthetic scenarios alongside observations reflecting measurement uncertainty and site-specific variability.

Physics-based Model Simulations

  • Ecosys: Simulated at daily resolution from 2000–2018 across 99 counties in Iowa, Illinois, and Indiana. Each county is run under 20 nitrogen fertilization regimes, $\mathrm{FERTZR}_N \in [0, 33.6]$ g N m$^{-2}$ day$^{-1}$, driven by actual weather, soil, and planting-date data.
  • DayCent (v.279): Simulated daily from 2000–2020 at 2,562 randomly selected U.S. Corn Belt locations. Each site is evaluated under 42 distinct agronomic scenarios varying nitrogen rate, application timing, and crop rotation.
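To give a sense of the relative scale of the two simulators, the sketch below multiplies the design factors listed above. It assumes, for illustration only, that each (site, scenario, year) combination yields one annual sequence; the names `SIM_DESIGNS` and `n_annual_sequences` are hypothetical and not part of any AgroFlux tooling.

```python
# Hypothetical scale estimate. ASSUMPTION: one annual sequence per
# (site, scenario, year) combination; the source lists only the factors.
SIM_DESIGNS = {
    "ecosys": {"sites": 99, "scenarios": 20, "years": (2000, 2018)},
    "daycent": {"sites": 2562, "scenarios": 42, "years": (2000, 2020)},
}


def n_annual_sequences(sites: int, scenarios: int, years: tuple) -> int:
    """Sites x scenarios x number of simulated years (inclusive year range)."""
    first, last = years
    return sites * scenarios * (last - first + 1)


for name, design in SIM_DESIGNS.items():
    print(name, n_annual_sequences(**design))
# ecosys 37620
# daycent 2259684
```

Under that assumption, DayCent contributes roughly 60 times as many annual sequences as Ecosys, which matters when choosing which simulator to pretrain on.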

Empirical Measurements

  • Controlled Environment: Hourly N$_2$O flux (aggregated to daily) measured from 2016–2018 in six chambered soil cores subjected to variable irrigation.
  • Eddy Covariance Towers: 11 sites with continuous CO$_2$ flux and gross primary productivity (GPP) measurements from 2000–2020 in Illinois, Iowa, Michigan, Nebraska, and Minnesota, with supporting weather, soil, and crop-identity data.

Integration Protocols

  • Feature-wise normalization (mean 0, std 1, clipping to the 1st–99th percentiles).
  • Masking of missing output targets during training; input drivers retained where available.
  • Annual, non-overlapping time blocks ($T = 365$ days) for uniform sequence modeling.
  • Parallel directory structure for simulated and observed samples, each annotated by meta-information (location, scenario, timestamp).
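The normalization and time-blocking protocols can be sketched in dependency-free Python. Several details here are assumptions rather than documented behavior: clip bounds and statistics are fitted on training values only, clipping is applied before standardization, percentiles use linear interpolation, missing values are represented as `None`, and in leap years day 366 is dropped so that every block has exactly $T = 365$ days. All function names are hypothetical.

```python
import math
from datetime import date, timedelta
from typing import Dict, List, Optional, Sequence, Tuple

Value = Optional[float]  # None marks a missing value


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated q-th percentile (0 <= q <= 100) of sorted data."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def fit_normalizer(train: Sequence[Value]) -> Tuple[float, float, float, float]:
    """Fit 1st/99th-percentile clip bounds, then mean/std of the clipped data."""
    vals = sorted(v for v in train if v is not None)
    if not vals:
        raise ValueError("feature has no observed training values")
    lo, hi = percentile(vals, 1), percentile(vals, 99)
    clipped = [min(max(v, lo), hi) for v in vals]
    mean = sum(clipped) / len(clipped)
    std = math.sqrt(sum((v - mean) ** 2 for v in clipped) / len(clipped))
    return lo, hi, mean, std or 1.0  # avoid division by zero for constant features


def normalize(xs: Sequence[Value], params: Tuple[float, float, float, float]) -> List[Value]:
    """Clip to the fitted bounds, then z-score; missing values stay None."""
    lo, hi, mean, std = params
    return [None if v is None else (min(max(v, lo), hi) - mean) / std for v in xs]


def annual_blocks(start: date, daily: Sequence[Value], t: int = 365) -> Dict[int, List[Value]]:
    """Split a daily series starting at `start` into calendar-year blocks of t days.

    Incomplete years are skipped; the 366th day of a leap year is dropped.
    """
    by_year: Dict[int, List[Value]] = {}
    for i, v in enumerate(daily):
        by_year.setdefault((start + timedelta(days=i)).year, []).append(v)
    return {year: vals[:t] for year, vals in by_year.items() if len(vals) >= t}
```

Fitting the normalizer on the training split only keeps statistics from the validation and test periods from leaking into the model.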

2. Spatial and Temporal Coverage

AgroFlux targets key spatial and temporal axes of agricultural GHG flux monitoring:

  • Simulated Data:
    • Ecosys: 99 counties, IA/IL/IN; 2000–2018; daily; point-based.
    • DayCent: 2,562 sites throughout the Midwest; 2000–2020; daily; point-based.
  • Observational Data:
    • Chamber N$_2$O: Six soil cores from a single central U.S. field (chamber plots of $\sim$0.1 m$^2$); April–July, 2016–2018; aggregated daily.
    • Eddy Covariance: 11 cropland sites with 5–19 annual cycles each, 2000–2020; aggregated daily; flux-tower footprints of $\sim$1 km$^2$.

This coverage provides both depth (high temporal frequency and detailed agronomic variation at specific sites) and breadth (geographic and management diversity).

3. Variables and Fluxes

AgroFlux encompasses a comprehensive suite of input drivers and output fluxes or state variables, partitioned into carbon, nitrogen, water, and thermal domains, with meteorological and soil context.

| Category | Selected Inputs | Selected Outputs |
|---|---|---|
| Meteorology | $T_{max}$, $T_{min}$, PREC, RADN, HUM, WIND | |
| Soil | TBKDS, TSAND, TSILT, TPH, TSOC | SWC, $T_{soil}^{max/min}$ |
| Management | FERTZR_N, PDOY, PLANTT | |
| Carbon | | $F_{NEE}(t)$, $F_{GPP}(t)$, $F_{Re}(t)$, $Y_{yield}$, $\Delta SOC_X$, LAI |
| Nitrogen | | $F_{N_2O}(t)$, $[\mathrm{NH}_4^+]$, $[\mathrm{NO}_3^-]$ |
| Water | | SWC (various depths), ET |
| Thermal | | $T_{soil}^{max/min}$ (various depths) |
  • Notation: All fluxes $F_i(t)$ carry units of g C m$^{-2}$ d$^{-1}$ (carbon) or g N m$^{-2}$ d$^{-1}$ (nitrogen). Concentrations use g N kg$^{-1}$ soil; management, weather, and yield variables use domain-appropriate units.
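Because fluxes are reported per square metre per day, a common next step is aggregating them to seasonal or annual totals and converting to the per-hectare units used in agronomy (1 g m$^{-2}$ = 10 kg ha$^{-1}$). The helpers below are a minimal sketch with hypothetical names. Skipping missing days rather than gap-filling them is a simplifying assumption that will misstate totals when gaps are frequent, so the number of observed days is returned alongside the total.

```python
from typing import Iterable, Optional, Tuple

# 1 ha = 10^4 m^2 and 1 kg = 10^3 g, so 1 g m^-2 = 10 kg ha^-1.
G_PER_M2_TO_KG_PER_HA = 10.0


def period_total(daily_flux: Iterable[Optional[float]]) -> Tuple[float, int]:
    """Sum daily fluxes (g m^-2 d^-1) into a period total (g m^-2).

    Missing days (None) are skipped, not gap-filled; the count of
    observed days is returned so coverage can be checked.
    """
    observed = [v for v in daily_flux if v is not None]
    return sum(observed), len(observed)


def g_m2_to_kg_ha(value_g_m2: float) -> float:
    """Convert an areal total from g m^-2 to kg ha^-1."""
    return value_g_m2 * G_PER_M2_TO_KG_PER_HA


# Hypothetical daily GPP values in g C m^-2 d^-1, with one missing day.
total, n_days = period_total([12.0, None, 8.0, 10.0])
print(n_days, g_m2_to_kg_ha(total))  # prints: 3 300.0
```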

4. Dataset Organization and Access

AgroFlux v1.0 is distributed under a CC-BY 4.0 license with a structure supporting transparent use and reproducibility:

agroflux-v1.0/
    simulated/
        ecosys/
            inputs.csv    # X_{es}
            outputs.csv   # Y_{es}
            meta.json     # site-id, lat/lon, scenario
        daycent/
            inputs.csv
            outputs.csv
            meta.json
    observed/
        n2o/
            inputs.csv    # X_{obs}
            n2o_flux.csv  # Y_{N2O}
            meta.json
        co2_gpp/
            inputs.csv
            co2_flux.csv  # Y_{CO2}
            gpp.csv       # Y_{GPP}
            meta.json

  • All files are comma-delimited UTF-8 text.
  • Metadata sidecars (JSON) provide identifiers, geocoordinates, scenario tags, and temporal coverage.
  • A machine-readable “dataset card” and schema are available on HuggingFace Datasets ("agroflux"). Standardized code for rapid loading is provided:
    from datasets import load_dataset
    ds = load_dataset("agroflux", version="1.0")
  • Detailed statistical characterizations appear in Appendix A of (Cheng et al., 2 Feb 2026).
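For working with a local copy of the directory tree above without the Hugging Face tooling, the standard-library sketch below reads one subdirectory's inputs, one target file, and its metadata sidecar. Column names are taken from each CSV header rather than assumed, but the set of tokens treated as missing values and the helper names (`load_subset`, `read_csv`, `to_float`) are hypothetical; the dataset card is the place to confirm how gaps are actually encoded.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: tokens used for missing cells; verify against the dataset card.
MISSING_TOKENS = {"", "NA", "NaN", "nan"}

Rows = List[Dict[str, str]]


def to_float(cell: str) -> Optional[float]:
    """Parse one CSV cell, mapping missing-value tokens to None."""
    cell = cell.strip()
    return None if cell in MISSING_TOKENS else float(cell)


def read_csv(path: Path) -> Rows:
    """Read a comma-delimited UTF-8 file into a list of header-keyed rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_subset(root: str, domain: str, source: str,
                target_file: str = "outputs.csv") -> Tuple[Rows, Rows, dict]:
    """Load e.g. agroflux-v1.0/simulated/ecosys, or agroflux-v1.0/observed/n2o
    with target_file="n2o_flux.csv"."""
    base = Path(root) / domain / source
    inputs = read_csv(base / "inputs.csv")
    targets = read_csv(base / target_file)
    with open(base / "meta.json", encoding="utf-8") as f:
        meta = json.load(f)
    return inputs, targets, meta
```

Values are kept as strings until `to_float` is applied, so the missing-value convention can be changed in one place.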

5. Benchmarking Tasks and Experimental Protocols

AgroFlux supports rigorous assessments of sequential prediction models under both temporal and spatial generalization, defining a set of canonical splits and tasks:

Simulated Data Tasks

  • Ecosys temporal split: train 2000–2012, validate 2013–2015, test 2016–2018
  • Ecosys spatial: 5-fold cross-validation on 99 counties
  • DayCent temporal: train 2000–2016, validate 2017–2018, test 2019–2020
  • DayCent spatial: 5-fold cross-validation on 2,562 sites
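The temporal splits amount to routing each annual block to a partition by its year. A minimal sketch with hypothetical names, encoding the simulated-data ranges listed above as inclusive year intervals:

```python
from typing import Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
YearRange = Tuple[int, int]  # inclusive (first, last)

# (train, validation, test) ranges for the simulated-data temporal tasks.
TEMPORAL_SPLITS: Dict[str, Tuple[YearRange, YearRange, YearRange]] = {
    "ecosys": ((2000, 2012), (2013, 2015), (2016, 2018)),
    "daycent": ((2000, 2016), (2017, 2018), (2019, 2020)),
}


def split_by_year(samples: Iterable[Tuple[int, T]],
                  ranges: Tuple[YearRange, YearRange, YearRange]) -> Dict[str, List[T]]:
    """Route (year, sample) pairs into train/val/test; years outside all ranges are dropped."""
    names = ("train", "val", "test")
    out: Dict[str, List[T]] = {name: [] for name in names}
    for year, sample in samples:
        for name, (first, last) in zip(names, ranges):
            if first <= year <= last:
                out[name].append(sample)
                break
    return out
```

Because the ranges are contiguous and ordered, every test year lies strictly after every training year, which is what makes these splits a test of temporal extrapolation.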

Observational Data Tasks

  • N$_2$O temporal: train 2016–2017, test 2018
  • CO$_2$/GPP temporal: train 2000–2015, test 2016–2020
  • CO$_2$/GPP spatial: 5-fold cross-validation on eddy-covariance sites
  • N$_2$O spatial: 5-fold cross-validation on chamber cores
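Spatial cross-validation must hold out whole counties, sites, or cores, so that no location contributes samples to both the training and test folds. The sketch below deals unique location IDs into five disjoint folds after a seeded shuffle; the shuffling scheme and the helper names are assumptions, and any official fold assignment distributed with the dataset should take precedence. With only 11 eddy-covariance sites or six chamber cores, the folds are necessarily uneven (for example, sizes 3, 2, 2, 2, 2 for 11 sites).

```python
import random
from typing import Iterable, Iterator, List, Tuple, Union

LocationId = Union[int, str]  # IDs must be mutually sortable for determinism


def location_folds(location_ids: Iterable[LocationId], k: int = 5,
                   seed: int = 0) -> List[List[LocationId]]:
    """Shuffle unique location IDs with a fixed seed and deal them round-robin into k folds."""
    ids = sorted(set(location_ids))
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def cv_splits(location_ids: Iterable[LocationId], k: int = 5,
              seed: int = 0) -> Iterator[Tuple[List[LocationId], List[LocationId]]]:
    """Yield (train_ids, test_ids) for each of the k folds; the two never overlap."""
    folds = location_folds(location_ids, k, seed)
    for i, test in enumerate(folds):
        train = [loc for j, fold in enumerate(folds) if j != i for loc in fold]
        yield train, test


for train, test in cv_splits(range(11)):  # e.g. 11 eddy-covariance sites
    print(len(train), len(test))
```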

This framework enables direct comparison of model architectures and transfer learning strategies across measurement domains and extrapolation regimes.

6. Model Benchmarks, Transfer Learning Protocols, and Recommendations

AgroFlux provides reference evaluations for LSTM-based sequential models, temporal convolutional networks, and Transformer-based architectures in predicting both carbon and nitrogen fluxes. Transfer learning is explicitly evaluated, exploring pretraining on simulated data followed by fine-tuning or adaptation to real-world observations. All protocols exclude missing targets from loss computation but retain input sequences, ensuring realistic treatment of inevitable gaps in environmental data streams.
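The masking rule can be illustrated with a loss that skips time steps whose target is missing while leaving the input sequence untouched. Mean squared error is used here only as an example, since the source does not state which loss the reference models use, and `masked_mse` is a hypothetical name.

```python
import math
from typing import Optional, Sequence


def masked_mse(pred: Sequence[float], target: Sequence[Optional[float]]) -> float:
    """Mean squared error over time steps whose target is observed.

    Steps with a missing target (None or NaN) are excluded from the loss;
    the model has still consumed the full input sequence to produce `pred`.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    sq_err, n = 0.0, 0
    for p, t in zip(pred, target):
        if t is None or math.isnan(t):
            continue  # masked: no contribution to the loss
        sq_err += (p - t) ** 2
        n += 1
    return sq_err / n if n else 0.0  # an all-missing sequence adds no error signal


print(masked_mse([1.0, 2.0, 3.0], [1.0, None, 5.0]))  # 2.0: only steps 0 and 2 count
```

In a tensor framework the same idea is usually written with a boolean mask over the target array, so that masked positions contribute neither to the numerator nor to the count.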

The design and release of AgroFlux aim to catalyze the development of trustworthy, generalizable AI-driven agroecosystem models by pairing the breadth of synthetic, scenario-rich data with the depth of noisy, real-world observations under a unified, reproducible protocol (Cheng et al., 2 Feb 2026). A plausible implication is that AgroFlux will facilitate more robust, interoperable models that provide accurate, transparent GHG flux estimates at the spatiotemporal scales relevant to sustainability and climate science applications.
