Road Extraction Dataset Overview

Updated 20 January 2026

A road extraction dataset is a curated collection of georeferenced imagery paired with machine-readable road network annotations that support automated road extraction and mapping.
Such datasets cover diverse geographic regions and scales and use various annotation protocols, including polygonal outlines and graph-based representations, for robust benchmarking.
Evaluation metrics focus on both geometric fidelity and topological correctness, driving improvements in model performance for tasks such as map updates and navigation.

A road extraction dataset is a curated collection of georeferenced imagery paired with precise, machine-readable annotations of road networks (typically represented as polygons, graphs, or centerlines), intended for training, validating, and benchmarking computational methods for automated road extraction, network reconstruction, map update, and related tasks. These datasets vary in spatial scale, annotation granularity, and coverage diversity, and are evaluated using metrics that capture both geometric fidelity and topological correctness. The landscape encompasses datasets specialized for urban, rural, off-road, and global contexts, each with documented protocols to support reproducibility, generalizability, and utility across research and application domains.

1. Dataset Taxonomy and Scope

Road extraction datasets span diverse geographic, semantic, and technical axes. Typical categorization includes:

Urban, Rural, Off-Road, and Global-Scale: Datasets such as Map2ImLas target high-resolution urban and suburban contexts with polygonized road surfaces, while Global-Scale and WildRoad introduce graph-based labels for both urban and rural/off-road regions, the latter extending coverage to unpaved, ambiguous, or occluded tracks (Jiao et al., 29 Apr 2025, Yin et al., 2024, Guan et al., 11 Dec 2025).
Imagery Source and Resolution: Inputs range from orthophoto tiles at 7.5 cm GSD (Map2ImLas: 4000×4000 px), 1 m/pixel multispectral satellite scenes (Global-Scale: 2048×2048 px), 0.3 m/pixel high-res off-road images (WildRoad: 8192×4096 px), to compact binary segmentation masks (Toulouse: 64×64 px) (Belli et al., 2019).
Region Diversity: Coverage ranges from regional (Deventer, Enschede, and Giethoorn in Map2ImLas) and US cities (MUNO21) to continent-spanning (WildRoad) and global (Global-Scale) repositories, with corresponding attention to generalization across domains (Bastani et al., 2021).
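
As a rough illustration of how resolution and tile size interact, the ground footprint of a tile is its pixel dimensions multiplied by the GSD. The sketch below applies this to the figures quoted above. The function name `ground_footprint_m` is hypothetical, and the calculation assumes square pixels and ignores projection distortion.

```python
def ground_footprint_m(width_px, height_px, gsd_m):
    """Return (width, height) of a tile on the ground in metres.

    Assumes square pixels and a constant ground sampling distance (GSD).
    """
    return width_px * gsd_m, height_px * gsd_m


# Tile sizes and resolutions as quoted in the text above.
tiles = {
    "Map2ImLas": (4000, 4000, 0.075),   # 7.5 cm GSD
    "Global-Scale": (2048, 2048, 1.0),  # 1 m/pixel
    "WildRoad": (8192, 4096, 0.3),      # 0.3 m/pixel
}

for name, (w, h, gsd) in tiles.items():
    gw, gh = ground_footprint_m(w, h, gsd)
    print(f"{name}: {gw:.1f} m x {gh:.1f} m ({gw * gh / 1e6:.3f} km^2)")
```

Under these assumptions a Map2ImLas orthophoto spans roughly 300 m on each side, while a single Global-Scale scene spans about 2 km, which is one reason the two datasets suit different annotation granularities.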

2. Annotation Protocols and Representations

The structure and semantics of road annotations depend on dataset design and the intended modeling paradigm.

Polygonal Outlines: Map2ImLas annotates full road surfaces per the Dutch BGT standard as polygons (or multipolygons) with strict boundary adherence, including internal holes for non-road islands and explicit junction handling. All vector data are stored in GeoJSON (EPSG:28992) and partitioned by input images (Jiao et al., 29 Apr 2025).
Graph-Based Representations: Global-Scale and WildRoad express road networks as graphs $G=(V,E)$, where $V$ includes junctions, endpoints, and keypoints and $E$ encodes connectivity. Attributes may include degree, edge length, and geocoordinates, often serialized as GeoJSON or binary adjacency matrices plus coordinate arrays (Yin et al., 2024, Guan et al., 11 Dec 2025, Belli et al., 2019).
Change-Centric Annotation: MUNO21 uniquely provides pre- and post-change OSM-derived graphs ($G$, $G^*$) per scenario for the map update task, augmented with fine-grained change tags (e.g., Constructed, Was-Missing, Deconstructed) and time series imagery (Bastani et al., 2021).
Quality Control: Dual-pass or multi-round manual/expert annotation, automated topological validation (removal of self-intersections, dangling edges), and targeted visual review (10–50% of samples) are standard for high-fidelity labeling (Jiao et al., 29 Apr 2025, Guan et al., 11 Dec 2025).
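
To make the adjacency-matrix-plus-coordinates serialization concrete, here is a minimal sketch that turns such a pair into an edge list with lengths and per-node degrees. The function names and the node-labelling rule (degree 1 is an endpoint, degree 3 or more is a junction, anything else is a keypoint) are assumptions for illustration, not the schema of any dataset above. Coordinates are assumed to be in a projected, metre-based system.

```python
import math


def graph_from_adjacency(adj, coords):
    """Build an edge list and per-node degree from a symmetric 0/1 adjacency matrix.

    adj    -- square list of lists; adj[i][j] == 1 means nodes i and j are connected
    coords -- list of (x, y) positions in a planar (projected) coordinate system
    """
    n = len(adj)
    edges = []
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: the graph is undirected
            if adj[i][j]:
                edges.append((i, j, math.dist(coords[i], coords[j])))
                degree[i] += 1
                degree[j] += 1
    return edges, degree


def node_kind(deg):
    """Illustrative labelling rule (an assumption, not a dataset schema)."""
    if deg == 1:
        return "endpoint"
    if deg >= 3:
        return "junction"
    return "keypoint"


# Toy T-junction: node 1 joins three arms, node 3 is an intermediate point.
coords = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (10.0, 5.0), (10.0, 10.0)]
adj = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
]

edges, degree = graph_from_adjacency(adj, coords)
kinds = [node_kind(d) for d in degree]
total_length = sum(length for _, _, length in edges)
print(kinds, total_length)
```

Deriving degree and length from the raw arrays, rather than storing them, keeps the serialized form small and avoids attributes drifting out of sync with the geometry.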

3. Quantitative Characteristics and Splits

Each dataset publishes statistics on area, image counts, splits, and annotation volume, which support benchmarking and comparison across datasets.

Dataset	Area/Size	#Tiles/Images	Split (Train/Val/Test/OOD)	Road Poly/Graph Stats
Map2ImLas	≈ 850 km² (NL)	303 images	3584/—/448 (Deventer), 400, 416	~18k/2.2k/2.3k polygons; mean 48 vertices/poly; width: 22% <3 m, 57% 3–6 m, 21% >6 m
Global-Scale	~13,800 km²	3,468 tiles	2375/339/624/130	Vector graph $G=(V,E)$ from OSM; tens of thousands of km of centerline
WildRoad	~2,100 km²	221 images	6448/1493/1333 patches	4,000+ km road, >11k junctions, 35k endpoints in train
Toulouse	~110 m²/tile	111,034 tiles	80,357/11,679/18,998	4–9 nodes & 5–15 edges per graph; 64×64 px segmentation masks
MUNO21	6,052 km²	39 tiles	726/568 scenarios (train/test)	514 change windows, 780 no-change; 948 km changed roads
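
A quick consistency check on the table: for the rows where a total tile count is given, the split counts should add up to it. The sketch below does this for Global-Scale and Toulouse and prints each split's share. The counts are copied from the table; the helper name `split_fractions` is hypothetical.

```python
def split_fractions(counts):
    """Return each split's share of the total, keyed by split name."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Counts copied from the table above.
global_scale = {"train": 2375, "val": 339, "test": 624, "ood": 130}
toulouse = {"train": 80357, "val": 11679, "test": 18998}

for label, counts, reported_total in [
    ("Global-Scale", global_scale, 3468),
    ("Toulouse", toulouse, 111034),
]:
    # The splits should add up to the reported number of tiles.
    assert sum(counts.values()) == reported_total
    shares = split_fractions(counts)
    print(label, {k: f"{v:.1%}" for k, v in shares.items()})
```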

Annotation splits are typically region- or city-stratified (to avoid spatial overfitting), with explicit OOD sets (e.g., Hong Kong, Shenzhen, Lucerne for Global-Scale). Patching or tiling (e.g., 256×256 px for Map2ImLas, 1,024×1,024 px for WildRoad) enables fine-grained input to models and local performance analysis (Jiao et al., 29 Apr 2025, Guan et al., 11 Dec 2025).
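
The patching step can be sketched as follows. This is one possible tiling scheme, assumed here for illustration: patches do not overlap, except that the last patch on each axis is shifted back to end at the image border. The released datasets may use a different stride or overlap, so the patch counts this produces need not match the published split sizes. The function names are hypothetical.

```python
def tile_origins(size_px, patch_px):
    """Top-left offsets of patches covering one image axis.

    Patches do not overlap except for the last one, which is shifted back
    so that it ends exactly at the image border (an illustrative choice).
    """
    if patch_px > size_px:
        raise ValueError("patch is larger than the image")
    origins = list(range(0, size_px - patch_px + 1, patch_px))
    if origins[-1] + patch_px < size_px:
        origins.append(size_px - patch_px)  # final patch clamped to the border
    return origins


def tile_windows(width_px, height_px, patch_px):
    """Return (x, y, patch_px, patch_px) windows covering the whole image."""
    return [
        (x, y, patch_px, patch_px)
        for y in tile_origins(height_px, patch_px)
        for x in tile_origins(width_px, patch_px)
    ]


# Image and patch sizes quoted in the text.
print(len(tile_windows(4000, 4000, 256)))    # Map2ImLas-sized image
print(len(tile_windows(8192, 4096, 1024)))   # WildRoad-sized image
```

Clamping the last patch avoids padding at the cost of a small overlap when the image size is not a multiple of the patch size, as with 4000 px and 256 px.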

4. Evaluation Metrics and Benchmarks

Road extraction evaluation encompasses geometric, structural, and operational criteria:

Polygon Metrics (Map2ImLas):
- Simplicity $S(m, \hat{m}) = \operatorname{IoU}(m, \hat{m}) \times SF(N_{\hat{m}})$, with $SF(N) = 1 - (N-N_\text{min})/(N_\text{max}-N_\text{min})$, $N_\text{min} = 3$, $N_\text{max} = 200$; balances IoU and vertex minimization.
- Boundary Smoothness $B = 1 - (1/L)\sum_{i=1}^{L}(|\theta_i - \theta_{i+1}|/\pi)$, promoting regular, non-jagged polygons (Jiao et al., 29 Apr 2025).
Graph Metrics:
- TOPO (Biagioni & Eriksson): Precision, recall, F₁ on matched reachable vertex pairs.
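- APLS (SpaceNet): $1-\text{mean}_{i,j}\left(\min\left(1, |L_p(i,j) - L_g(i,j)|/L_g(i,j)\right)\right)$ over sampled vertex pairs, where $L_g$ and $L_p$ are shortest-path lengths in the ground-truth and predicted graphs; sensitive to topological continuity.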
- StreetMover (Toulouse): Entropy-regularized Earth-Mover’s distance between graphs' sampled point clouds; invariant to permutation, translation, rotation (Belli et al., 2019).
- Edge Precision/Recall: Fraction of matched edges within spatial buffers; Edge-F₁ critical for off-road evaluation (WildRoad, Global-Scale).
Change and Map Update Metrics (MUNO21):
- Improvement Score: Normalized gain in core metric (APLS or pixel-F₁) between pre- and post-change graphs.
- Scenario-Level Precision: Fraction of no-change scenarios where the map remains unaltered post-inference (Bastani et al., 2021).
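
The two polygon metrics above can be sketched directly from their formulas. The text does not define $\theta_i$ or $L$, so this sketch assumes $\theta_i$ is the heading of the $i$-th polygon edge, $L$ is the number of edges, indices wrap around the closed ring, and heading differences are wrapped into $[0, \pi]$. Clamping $N$ to $[N_\text{min}, N_\text{max}]$ is a further assumption. The IoU is taken as a precomputed input.

```python
import math

N_MIN, N_MAX = 3, 200  # vertex-count bounds quoted in the text


def simplicity_factor(n_vertices):
    """SF(N) = 1 - (N - N_min) / (N_max - N_min), with N clamped to the bounds."""
    n = min(max(n_vertices, N_MIN), N_MAX)  # clamping is an assumption
    return 1.0 - (n - N_MIN) / (N_MAX - N_MIN)


def simplicity(iou, n_pred_vertices):
    """S = IoU * SF(N) for a predicted polygon with N vertices."""
    return iou * simplicity_factor(n_pred_vertices)


def boundary_smoothness(ring):
    """B = 1 - mean(|theta_i - theta_{i+1}| / pi) over a closed ring.

    Assumes theta_i is the heading of edge i and that differences are
    wrapped into [0, pi]; the source text does not spell this out.
    ring is a list of (x, y) vertices without a repeated closing vertex.
    """
    n = len(ring)
    headings = [
        math.atan2(ring[(i + 1) % n][1] - ring[i][1],
                   ring[(i + 1) % n][0] - ring[i][0])
        for i in range(n)
    ]
    total = 0.0
    for i in range(n):
        diff = abs(headings[i] - headings[(i + 1) % n])
        diff = min(diff, 2 * math.pi - diff)  # wrap into [0, pi]
        total += diff / math.pi
    return 1.0 - total / n


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(simplicity(0.9, 4), boundary_smoothness(square))
```

Under this reading, a polygon with many small zigzags accumulates large heading changes and scores low on $B$, while a predicted polygon that matches the ground truth with few vertices keeps $S$ close to its IoU.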

5. Notable Datasets and Access

Map2ImLas (Jiao et al., 29 Apr 2025)

Domain: Urban and peri-urban Netherlands; 4000×4000 px aerial orthophotos at 7.5 cm GSD.
Annotation: Polygons/multipolygons for contiguous paved and unpaved surfaces (motorways, paths, railways, etc.), strict adherence to Dutch BGT definitions.
Quality: Expert dual-pass, topology-checked, 10% visual review.
Access: University of Twente open portal; CC BY 4.0.

Global-Scale (Yin et al., 2024)

Domain: Truly global; 13,800 km², 3,468 scenes spanning urban, rural, mountainous terrain.
Annotation: OSM-derived centerline graphs, quality-controlled, snapped to roads in satellite imagery.
Access: https://github.com/earth-insights/samroadplus; open-access (see repository).

WildRoad (Guan et al., 11 Dec 2025)

Domain: Off-road, unpaved, and wild environments across six continents; 8,192×4,096 px images at 0.3 m/pixel.
Annotation: Graphs with explicit topology, interactive human-in-the-loop pipeline; >4000 km of road, bootstrapped for efficiency and accuracy.
Access: https://github.com/xiaofei-guan/attorch_copy; CC BY 4.0.

Toulouse Road Network (Belli et al., 2019)

Domain: Dense urban, compact 64×64 binary masks; road graphs in tiles ~110 m².
Annotation: OSM-based segmentation masks and adjacency matrices; filtered for trivial/irregular graphs.
Access: https://github.com/davide-belli/toulouse-road-network-dataset; MIT License.

MUNO21 (Bastani et al., 2021)

Domain: Map update for 21 US cities; multi-temporal NAIP imagery and OSM graphs (2012–2019).
Annotation: Per-scenario pre- and post-change graphs, segment-wise change tags, time-stamped.
Access: https://favyen.com/muno21/; BSD-style open source.

6. Task Definitions and Research Benchmarks

Polygonal Extraction: Precise reconstruction of road footprints as polygons emphasizing vertex economy and smoothness (Map2ImLas), evaluated on coverage, efficiency, and regularity.
Graph Extraction: Direct vectorization of centerline networks, suited for navigation/generalization tasks (Global-Scale, WildRoad, Toulouse). Emphasis on graph connectivity, shortest paths, and robust handling of occlusion and discontinuity (Yin et al., 2024, Guan et al., 11 Dec 2025, Belli et al., 2019).
Map Update: Incorporates historical context for minimum-edit updates to existing maps, with the challenge centering on high-precision insertion, removal, and correction of road segments while keeping unchanged scenarios unaltered (MUNO21).
Model Benchmarks: Comparative studies utilize both pixel/IoU and advanced graph/path-based metrics, with baseline and state-of-the-art methods (e.g., LDPoly, SAM-Road++, MaGRoad, GGT) recorded for each dataset (Jiao et al., 29 Apr 2025, Yin et al., 2024, Guan et al., 11 Dec 2025, Belli et al., 2019).
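
To illustrate why path-based metrics reward connectivity, here is a simplified APLS-style score. It is a sketch under strong assumptions, not the SpaceNet reference implementation: nodes are assumed to share identifiers across the ground-truth and predicted graphs (the real metric snaps control points between graphs), the per-pair penalty is capped at 1, and a pair with no path in the prediction receives the maximum penalty. All function names are hypothetical.

```python
import heapq


def undirected(edges):
    """Build an adjacency dict from (u, v, length) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def shortest_lengths(graph, source):
    """Dijkstra: shortest path length from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def apls_like(gt, pred, pairs):
    """1 - mean(min(1, |L_p - L_g| / L_g)) over the given node pairs.

    Pairs without a ground-truth path are skipped; pairs without a
    predicted path receive the maximum penalty of 1.
    """
    penalties = []
    for a, b in pairs:
        l_g = shortest_lengths(gt, a).get(b)
        if not l_g:  # unreachable or zero-length in the ground truth
            continue
        l_p = shortest_lengths(pred, a).get(b)
        penalties.append(1.0 if l_p is None else min(1.0, abs(l_p - l_g) / l_g))
    if not penalties:
        raise ValueError("no valid node pairs to score")
    return 1.0 - sum(penalties) / len(penalties)


gt = undirected([("A", "B", 10.0), ("B", "C", 10.0)])
broken = undirected([("A", "B", 10.0)])                     # edge B-C missing
detour = undirected([("A", "B", 10.0), ("B", "C", 15.0)])   # B-C is longer
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
print(apls_like(gt, gt, pairs), apls_like(gt, broken, pairs), apls_like(gt, detour, pairs))
```

In this toy case a single missing edge costs more than a geometrically inaccurate but connected one, which is the behaviour that makes such metrics sensitive to topology rather than pixel overlap.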

7. Practical Access, Licensing, and Recommendations

Preprocessing: Standardized input normalization, patching/tiling scripts, and coordinate reprojection guidance are included with major dataset releases (e.g., normalize RGB values to [−1, 1] for diffusion models in Map2ImLas).
Licensing: Public datasets are largely under permissive licenses (CC BY 4.0, MIT, BSD); OSM-based annotations comply with ODbL, and NAIP imagery is U.S. public domain (Bastani et al., 2021, Jiao et al., 29 Apr 2025).
Supplementary Resources: Most datasets provide comprehensive splits, demonstration code (including model conditioning, baseline pipelines), and detailed manifests. Some (e.g., WildRoad) package interactive annotation tools alongside imagery and vector data.
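
The input normalization mentioned above can be sketched as a linear rescaling. This assumes 8-bit RGB input and the common mapping of 0 to −1 and 255 to 1; the exact convention used by a given release should be checked against its own scripts. The function names are hypothetical.

```python
def to_unit_range(pixel):
    """Map one 8-bit RGB pixel (0..255 per channel) to floats in [-1, 1]."""
    return tuple(c / 127.5 - 1.0 for c in pixel)


def from_unit_range(pixel):
    """Inverse mapping back to 8-bit integers, clipping out-of-range values."""
    return tuple(min(255, max(0, round((c + 1.0) * 127.5))) for c in pixel)


def normalize_image(rows):
    """Apply to_unit_range to an image stored as rows of (r, g, b) tuples."""
    return [[to_unit_range(px) for px in row] for row in rows]


image = [[(0, 128, 255), (12, 200, 255)]]
normalized = normalize_image(image)
restored = [[from_unit_range(px) for px in row] for row in normalized]
print(normalized, restored)
```

In practice the same arithmetic would be applied to whole arrays with a numerical library; plain tuples are used here only to keep the sketch dependency-free.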

Carefully labeled, geographically and semantically diverse road extraction datasets are central to advancing computational approaches to mapping, autonomy, and infrastructure analytics: they set rigorous baselines and provide the foundation for algorithmic generalization across domains and regions.

References

Jiao et al. (2025). LDPoly: Latent Diffusion for Polygonal Road Outline Extraction in Large-Scale Topographic Mapping.

Yin et al. (2024). Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method.

Guan et al. (2025). Beyond Endpoints: Path-Centric Reasoning for Vectorized Off-Road Network Extraction.

Belli et al. (2019). Image-Conditioned Graph Generation for Road Network Extraction.

Bastani et al. (2021). Beyond Road Extraction: A Dataset for Map Update using Aerial Images.
