
ALOS-2 LULC Benchmark: SAR Land Cover Segmentation

Updated 29 January 2026
  • ALOS-2 LULC Benchmark is a large-scale dataset and evaluation framework designed for SAR-based semantic segmentation, addressing challenges like boundary over-smoothing and rare-class degradation.
  • It utilizes single-polarized ALOS-2 SAR data over Japan with 131 GeoTIFF tiles and meticulously curated labels remapped from 14 to 9 classes, ensuring structured training, validation, and testing splits.
  • A baseline pipeline with a hierarchical Swin Transformer encoder and UPerNet decoder, augmented by high-resolution feature injection, progressive refine-up, and $\alpha$-scale reweighting, achieves a validation mIoU of up to 50.26%.

The ALOS-2 LULC Benchmark is a comprehensive large-scale dataset and evaluation framework for land-use/land-cover (LULC) semantic segmentation based on ALOS-2 single-polarization (HH) synthetic-aperture radar (SAR) data over Japan. Developed to address the challenges inherent in SAR-based dense prediction, including boundary over-smoothing, rare-class degradation under long-tailed distributions, and the limitations of single-polarized input, the benchmark comprises carefully curated and annotated data, standardized splits, and a set of baseline and refined model architectures for reproducible performance evaluation. It provides a foundation for advancing SAR LULC mapping methodologies and enables rigorous comparison of novel segmentation techniques under controlled, domain-relevant scenarios (Caglayan et al., 22 Jan 2026).

1. Dataset Composition and Label Structure

The ALOS-2 LULC Benchmark utilizes data from the ALOS-2 SAR sensor in L-band, acquired in HH single polarization, with a spatial resolution of 10 m ground sampling distance (GSD). The dataset spans all of Japan, partitioned into 131 GeoTIFF tiles of 12,000×12,000 pixels in WGS84 projection. The temporal range covers September–November 2022, with pretraining performed on patches from October–November (328,238 unlabeled patches), and model finetuning and evaluation on September data (123,842 SAR–label pairs).

LULC classes derive from JAXA’s high-resolution LULC map (version 23.12), remapped from 14 original categories to 9 for segmentation tasks:

1. Water
2. Built-up
3. Cropland
4. Grassland
5. Forest (merging five forest subclasses)
6. Bare
7. Solar panels
8. Wetland
9. Greenhouse

The dataset exhibits a long-tailed class distribution, where the 'forest' and 'water' classes each account for approximately 40–50% of the labeled pixels, while 'solar panels' and 'greenhouse' comprise less than 1% each. Subtraction-normalized weights $w_k = 1 - f_k$ yield focal loss reweighting factors $\alpha_k \propto w_k$, directly addressing representation imbalance.
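The subtraction-normalized weighting can be sketched in a few lines of NumPy. The class frequencies below are illustrative placeholders (the benchmark's actual per-class pixel counts are not reproduced here), and applying the global scale directly to $w_k$ is one plausible reading of the proportionality $\alpha_k \propto w_k$:

```python
import numpy as np

# Illustrative per-class pixel frequencies f_k for the 9 classes; the
# benchmark's actual frequencies are not reproduced here. Rare classes
# (e.g. solar panels, greenhouse) have tiny f_k.
f = np.array([0.45, 0.40, 0.06, 0.04, 0.03, 0.01, 0.005, 0.003, 0.002])

# Subtraction normalization: w_k = 1 - f_k, so weights for rare classes
# approach 1 while majority classes are tempered rather than crushed.
w = 1.0 - f

# Focal-loss factors alpha_k ∝ w_k; multiplying by the paper's global
# alpha_scale = 2.25 is an assumed choice of the proportionality constant.
alpha = 2.25 * w
```

Compared with inverse-frequency weighting ($1/f_k$), the subtraction form keeps the spread between rare and common classes bounded, which is what allows rare-class emphasis without destabilizing majority-class gradients.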

For the supervised segmentation protocol, SAR–label patch pairs are divided as follows:

Split        # Patches
Train        80,497
Validation   22,291
Test         21,054

Label-guided sampling during pretraining, utilizing pixel-wise inverse frequency anchors, ensures rare classes are sufficiently represented in the learned representations (Caglayan et al., 22 Jan 2026).
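The label-guided sampling idea can be illustrated as follows. The scoring rule (mean inverse class frequency over a patch's pixels) and the frequency values are assumptions for the sketch; the benchmark's exact anchor construction may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative class frequencies; inverse frequency gives rare classes
# (high index here) large per-pixel scores.
class_freq = np.array([0.45, 0.40, 0.06, 0.04, 0.03, 0.01, 0.005, 0.003, 0.002])
inv_freq = 1.0 / class_freq

def patch_weight(label_patch):
    """Score a label patch by the mean inverse frequency of its pixels."""
    return inv_freq[label_patch].mean()

# Two toy 4x4 label patches: one filled with a majority class (0),
# one containing a rare class (8).
common = np.zeros((4, 4), dtype=int)
rare = np.full((4, 4), 8, dtype=int)

# Normalizing patch weights over the candidate pool yields sampling
# probabilities that strongly favor rare-class patches.
weights = np.array([patch_weight(common), patch_weight(rare)])
probs = weights / weights.sum()
draws = rng.choice(2, size=1000, p=probs)
```

Under this toy scoring, the rare-class patch is sampled over two orders of magnitude more often than the majority-class patch, which is the intended effect: rare classes appear often enough during pretraining to shape the learned representations.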

2. Baseline Modeling Pipeline

The standard pipeline leverages a hierarchical Swin Transformer-Base encoder (channel configuration: [128, 256, 512, 1024]; attention heads: [4, 8, 16, 32]; blocks: [2, 2, 18, 2]) pretrained with SAR-W-MixMAE. This self-supervised routine employs a mixed masked autoencoder (MixMAE-style input mixing and dual reconstruction) and a backscatter-power-weighted pixel reconstruction loss to attenuate speckle sensitivity. The pretraining schedule runs for 600 epochs with AdamW optimizer, 0.05 weight decay, base learning rate $1.5 \times 10^{-4}$ scaled by batch, a 40-epoch linear warmup, and cosine learning rate decay.
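The warmup-plus-cosine schedule described above can be sketched as a plain function of the epoch index. The decay to a minimum learning rate of zero is an assumption; batch-size scaling of the base rate is omitted for brevity:

```python
import math

def lr_at_epoch(epoch, base_lr=1.5e-4, warmup=40, total=600, min_lr=0.0):
    """Linear warmup for `warmup` epochs, then cosine decay to `min_lr`
    over the remaining epochs (base LR 1.5e-4, 40-epoch warmup, 600-epoch
    schedule as in the pretraining recipe; min_lr=0 is an assumption)."""
    if epoch < warmup:
        # Linear ramp from base_lr/warmup up to base_lr.
        return base_lr * (epoch + 1) / warmup
    # Cosine anneal over the post-warmup fraction t in [0, 1].
    t = (epoch - warmup) / (total - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

The warmup phase avoids unstable early updates under the large effective batch, while the cosine tail lets the masked-autoencoder reconstruction loss settle smoothly.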

For decoding, UPerNet is employed (featuring pyramid pooling module and FPN top-down aggregation), consuming multi-scale feature maps at resolutions $1/4$, $1/8$, $1/16$, and $1/32$ of the input image (Caglayan et al., 22 Jan 2026).
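For a 256×256 input patch (the patch size used in this benchmark), the four encoder stages therefore emit the following feature-map shapes, combining the stated channel widths with the $1/4$–$1/32$ strides:

```python
# Feature-map shapes (channels, height, width) consumed by the UPerNet
# decoder for a 256x256 input, given the Swin-Base channel configuration
# and the standard strides 4/8/16/32.
strides = [4, 8, 16, 32]
channels = [128, 256, 512, 1024]
shapes = [(c, 256 // s, 256 // s) for c, s in zip(channels, strides)]
# → [(128, 64, 64), (256, 32, 32), (512, 16, 16), (1024, 8, 8)]
```

The coarsest map (8×8) feeds the pyramid pooling module, while the finer maps enter the FPN top-down pathway.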

3. Model Refinements and Loss Reweighting

Three lightweight refinements were developed to overcome specific SAR-based segmentation failure modes without increasing pipeline complexity:

  • High-Resolution Feature Injection: Early patch-embedding outputs (F0F_0) are injected directly into the multi-scale FPN decoder, aiding the localization of fine spatial details critical for accurate boundary reconstruction. This extends the standard FPN pipeline with an additional lateral connection at the highest spatial resolution.
  • Progressive Refine-Up Segmentation Head: Progressive upsampling interleaves $2\times$ upscaling with $3 \times 3$ convolutional blocks and normalization/activation (e.g., BN + ReLU), introducing lateral skips at each scale (from $1/32$ up to $1/4$) prior to the classification head. This structure encourages gradual boundary sharpening and facilitates recovery of slender or small object classes.
  • $\alpha$-Scale Class Reweighting: The focal loss component is modulated via $\alpha_k$ (subtraction-normalized from class frequencies) and a global scaling parameter $\alpha_{\text{scale}}=2.25$, combined with a soft dice loss for stabilization:

$$L = \lambda_{\text{focal}} L_{\text{focal}} + \lambda_{\text{dice}} L_{\text{dice}}, \quad \lambda_{\text{focal}} = 0.57,\ \lambda_{\text{dice}} = 0.32$$

Focal loss parameters are set to $\gamma=1.1$, with the dice loss supporting multi-class settings. The reweighting scheme is designed to suppress over-correction for rare classes, improving mIoU on under-represented categories without adversely affecting majority classes (Caglayan et al., 22 Jan 2026).
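The combined objective can be sketched in NumPy as follows. This is a minimal illustration of focal-plus-soft-dice weighting with the stated coefficients, not the benchmark's implementation; input shapes and the alpha vector are placeholders, and a real pipeline would use framework (e.g., GPU) ops:

```python
import numpy as np

def focal_dice_loss(probs, labels, alpha, gamma=1.1,
                    lam_focal=0.57, lam_dice=0.32, eps=1e-6):
    """Combined focal + multi-class soft dice loss (illustrative sketch).
    probs: (N, K) softmax probabilities per pixel; labels: (N,) int classes;
    alpha: (K,) class reweighting factors."""
    n = probs.shape[0]
    # Focal term: -alpha_k * (1 - p_t)^gamma * log(p_t), averaged over pixels.
    p_t = probs[np.arange(n), labels]
    focal = -(alpha[labels] * (1.0 - p_t) ** gamma * np.log(p_t + eps)).mean()

    # Soft dice term per class: 1 - 2|P∩Y| / (|P| + |Y|), averaged over classes.
    onehot = np.eye(probs.shape[1])[labels]
    inter = (probs * onehot).sum(axis=0)
    denom = probs.sum(axis=0) + onehot.sum(axis=0)
    dice = 1.0 - (2.0 * inter + eps) / (denom + eps)

    return lam_focal * focal + lam_dice * dice.mean()
```

With $\gamma$ close to 1, the focal term behaves like a mildly hard-example-weighted cross-entropy, while the dice term supplies region-level overlap gradients that stabilize rare-class optimization.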

4. Experimental Procedure and Data Augmentation

Data are processed as single-channel (HH polarization) 256×256 patches, normalized by the training set mean and standard deviation. Finetuning utilizes conventional augmentations—random horizontal and vertical flip, random rotations in $90^\circ$ increments, and randomly applied brightness jitter to simulate speckle variations present in SAR data.
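The augmentation recipe above can be sketched as a joint transform on a SAR patch and its label map. The application probabilities and the jitter range are assumptions for illustration; the key constraint is that geometric transforms are applied identically to both arrays, while brightness jitter touches the SAR channel only:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(sar, label):
    """Random flips, 90-degree rotations, and SAR-only brightness jitter,
    applied jointly so the label map stays geometrically aligned.
    Probabilities and jitter range (0.9-1.1) are illustrative."""
    if rng.random() < 0.5:  # horizontal flip
        sar, label = sar[:, ::-1], label[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        sar, label = sar[::-1, :], label[::-1, :]
    k = int(rng.integers(0, 4))  # rotation by k * 90 degrees
    sar, label = np.rot90(sar, k), np.rot90(label, k)
    if rng.random() < 0.5:  # brightness jitter on the SAR channel only
        sar = sar * rng.uniform(0.9, 1.1)
    return sar, label

sar = np.arange(16.0).reshape(4, 4)
lab = np.arange(16).reshape(4, 4) % 9
sar_aug, lab_aug = augment(sar, lab)
```

Because every geometric operation is a shared flip or rotation, the set of label values (and hence the class distribution of the patch) is invariant under the transform.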

Optimization adopts AdamW with a base learning rate $6 \times 10^{-4}$, layer-wise decay $0.7$, minimum learning rate $1\times 10^{-6}$, 10 warmup epochs, and a 200-epoch schedule (batch size 512 across 8×NVIDIA H200 GPUs). Pretraining on the unlabeled corpus spans ~3 days (600 epochs), while finetuning requires less than 24 hours (200 epochs). Distributed training is conducted using PyTorch DistributedDataParallel over NCCL (Caglayan et al., 22 Jan 2026).

5. Benchmark Results and Performance Analysis

Ablation studies and comparative results validate the efficacy of the proposed refinements:

Validation mIoU Progression

Model Variant                      mIoU (%)   mAcc (%)
Pretrained + UPerNet (baseline)    47.45      58.34
+ High-Res Injection               48.99      –
+ Progressive Refine-Up            49.67      –
+ $\alpha$-Scale Weighting         50.26      –
All Refinements, No Pretraining    41.18      –

On the test set, the refined model achieves 0.50 mIoU overall, with class-wise mIoUs of 0.93 (water), 0.70 (built-up), 0.65 (cropland), 0.38 (grassland), 0.88 (forest), 0.33 (bare), 0.23 (solar), 0.22 (wetland), and 0.17 (greenhouse). For binary water detection, the model with SAR-W-MixMAE pretraining and all refinements yields IoU_water = 93.64%, precision = 96.53%, recall = 96.91%. Incremental improvements are observed for narrow-structure (boundary) F1 (+4–6%) and in rare-category mIoU, notably solar panels (+0.03) and greenhouse (+0.06) over the baseline (Caglayan et al., 22 Jan 2026).
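The binary water-detection metrics quoted above follow the standard confusion-matrix definitions, which can be stated compactly. The pixel counts below are toy values, not the benchmark's:

```python
def binary_metrics(tp, fp, fn):
    """IoU, precision, and recall from binary confusion-matrix counts
    (true positives, false positives, false negatives)."""
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return iou, precision, recall

# Toy counts chosen to resemble a strong water detector.
iou, prec, rec = binary_metrics(tp=930, fp=35, fn=30)
```

Note that IoU is always bounded above by both precision and recall, which is why an IoU_water of 93.64% alongside ~96.5–96.9% precision/recall is internally consistent.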

6. Qualitative Evaluation and Analytical Insights

Qualitative analysis demonstrates that sharper boundaries, reduced spatial over-smoothing, and improved detection of thin urban strips and small inland water bodies are achieved with the refinements. In water detection, coastline and narrow river delineations are accurately recovered; residual errors align with annotation noise at shorelines.

The high-resolution feature injection recovers edge precision otherwise lost in standard patch embedding and downsampling. The progressive refine-up head enforces gradual spatial detail recovery, sharpening contours and yielding better segmentation of small or elongated structures. The $\alpha$-scale class reweighting method stabilizes the optimization, avoiding instability often induced by long-tailed class distributions, while preserving majority-class performance through tempered adjustment (Caglayan et al., 22 Jan 2026).

7. Limitations and Prospects

Current limitations include the restriction to single-polarization (HH) SAR, which limits the discrimination of certain man-made versus natural classes compared to multi-polarization or interferometric SAR modalities. Seasonal coverage is restricted to autumn, limiting phenology-driven class separability (e.g., for cropland versus bare soil). The integration of ancillary geospatial metadata or graph-based context modules is identified as a prospective route to enhance structural coherence in segmentation outputs.

Recommendations for practitioners emphasize the following:

  • Employ self-supervised MIM pretraining (e.g., SAR-W-MixMAE) tailored to SAR archives.
  • Incorporate high-resolution encoder outputs into the decoder to preserve spatial details.
  • Use multi-stage upsampling with interleaved convolutional refinement to recover edges and fine structures.
  • Adjust class weighting using a global $\alpha$-scale parameter for stable training on long-tailed data.

The refinements introduced for the ALOS-2 LULC Benchmark are modular and adaptable, with generalization potential to other SAR sensors and diverse geographic contexts (Caglayan et al., 22 Jan 2026).
