S3LI Vulcano Dataset: Multi-Modal Benchmark
- S3LI Vulcano Dataset is a comprehensive multi-modal dataset integrating hyperspectral, visual, LiDAR, inertial, GNSS, and Sentinel-2 imagery collected in volcanic, lunar-analogue environments.
- The dataset supports advanced research in machine learning and SLAM by providing meticulously calibrated sensor data, high-quality ground-truth, and open-source toolkit integration.
- Key evaluations include mineralogical characterization using hyperspectral analysis, robust SLAM and place recognition under low-texture conditions, and near-real-time volcanic activity detection.
The S3LI Vulcano Dataset is a comprehensive multi-modal suite comprising hyperspectral, visual, LiDAR, inertial, GNSS, and Sentinel-2 orbital imagery, acquired and curated to benchmark and enable advanced machine learning and SLAM research in highly unstructured planetary-analogue environments. Collected on Vulcano Island (Aeolian Archipelago, Italy), the dataset integrates both characterization of mineralogical diversity—focused especially on olivine–pyroxene mixtures as lunar basalt analogues—and robotic navigation challenges encountered in low-texture, volcanic terrains. The dataset is made publicly available with extensive ground-truth, calibration, and an open-source toolkit for task-centric dataset preparation (Hesar et al., 28 Mar 2025, Gonzalez et al., 7 Nov 2025, Priyasad et al., 27 Oct 2025, Giubilato et al., 27 Jan 2026, Giubilato et al., 2022).
1. Sensor Modalities and Dataset Structure
S3LI Vulcano amalgamates ground-based hyperspectral cubes, synchronized RGB stereo pairs, solid-state LiDAR, high-rate IMU, GNSS, and spaceborne Sentinel-2 imagery, each structured per application domain.
- Hyperspectral Cubes: Acquired via Specim FX10 (400–1000 nm, 224 bands, FWHM ≈ 5.5 nm), each cube is 818×1024×224 voxels. Nine manually identified Regions of Interest (ROIs) span the compositional variability between olivine- and pyroxene-rich basalt samples, capturing spatial and spectral endmembers for planetary mineralogy (Hesar et al., 28 Mar 2025).
- SLAM and Place-Recognition Sequences: Seven multimodal traverses use AVT Manta G-319C stereo cameras (2 MP, 10 Hz), a Blickfeld Cube 1 MEMS LiDAR (4.7 Hz, 17,800 pts/scan), an XSens MTi-G IMU (400 Hz), and a UBlox GNSS receiver (5 Hz). Sensor synchronization is via PTP and hardware triggers; ground truth is refined via RTKLIB differential processing against an EUREF base station (Giubilato et al., 27 Jan 2026).
- Sentinel-2 Volcanic Monitoring Dataset: Comprises SWIR-augmented RGB composites and 9-channel MSI cubes at 10, 20, and 75 m GSD, cropped to 224×224 pixels over 35 volcanoes, including Vulcano (Priyasad et al., 27 Oct 2025).
The file hierarchy reflects modality, with per-sequence subdirectories (/cam0, /cam1, /lidar, /imu.csv, /gnss), calibration YAMLs (intrinsics, extrinsics), and per-modality timestamps as Unix nanoseconds (Giubilato et al., 27 Jan 2026).
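The per-sequence layout above can be consumed with a few lines of standard Python. The column names used below (`timestamp_ns`, `wx`…`az`) are illustrative assumptions, not the released schema — the calibration YAMLs and toolkit define the canonical field names:

```python
import csv
import tempfile
from pathlib import Path

def load_imu_csv(path):
    """Parse an imu.csv with Unix-nanosecond timestamps into a list of records.

    Column names are hypothetical; consult the dataset's per-sequence
    files for the actual schema.
    """
    records = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            records.append({
                "t_ns": int(row["timestamp_ns"]),  # Unix nanoseconds, per the dataset convention
                "gyro": tuple(float(row[k]) for k in ("wx", "wy", "wz")),
                "accel": tuple(float(row[k]) for k in ("ax", "ay", "az")),
            })
    return records

# Minimal demonstration with a synthetic file mimicking the layout.
tmp = Path(tempfile.mkdtemp())
(tmp / "imu.csv").write_text(
    "timestamp_ns,wx,wy,wz,ax,ay,az\n"
    "1650000000000000000,0.01,-0.02,0.00,0.1,0.0,9.81\n"
)
imu = load_imu_csv(tmp / "imu.csv")
print(imu[0]["t_ns"], imu[0]["accel"][2])
```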
2. Acquisition, Calibration, and Preprocessing Protocols
Careful calibration and preprocessing pipelines ensure scientific fidelity across modalities:
- Hyperspectral Workflows: Dark current subtraction (shutter closed), white reference normalization using a 99% Spectralon panel, and (Raw − Dark)/(White − Dark) reflectance normalization, followed by σ-clipping to remove outliers and unit-variance band-normalization. Extreme bands (<410 nm, >990 nm) with low SNR are omitted (Hesar et al., 28 Mar 2025).
- SLAM Modalities: Camera intrinsic calibration follows plumb-bob radial–tangential models; stereo and LiDAR–camera extrinsics are resolved via CalDe/CalLab and checkerboard alignment, IMU–camera via Kalibr. Temporal alignment is hardware-triggered with post-hoc adjustment using cross-modal VI-odometry latency estimation. GNSS logs are differentially corrected (RTKLIB) for sub-decimeter ENU pose, and IMU bias/intrinsic parameters are released in YAML (Giubilato et al., 27 Jan 2026, Giubilato et al., 2022).
- Data Organization and Annotation: Place-recognition labeling leverages field-of-view overlap and 3D pointcloud intersections, assigning positive matches by geometric criteria (e.g., FoV overlap ≥0.3, spatial distance ≤10 m) with all label metadata preserved in JSON or pickle (Giubilato et al., 27 Jan 2026).
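The hyperspectral reflectance workflow above can be sketched in NumPy. This is a minimal sketch: the array shapes and the outlier-replacement strategy (substituting clipped values with the band mean) are illustrative assumptions, not the released pipeline:

```python
import numpy as np

def calibrate_reflectance(raw, dark, white, sigma=3.0):
    """(Raw - Dark)/(White - Dark) reflectance normalization with sigma-clipping.

    raw:   hyperspectral cube, shape (rows, cols, bands)
    dark:  dark-current frame (shutter closed), broadcastable to raw
    white: white-reference frame (e.g., 99% Spectralon), broadcastable to raw
    """
    denom = np.clip(white - dark, 1e-6, None)      # guard against division by zero
    refl = (raw - dark) / denom
    # Sigma-clip per band: replace outliers with the band mean (assumed policy).
    mu = refl.mean(axis=(0, 1), keepdims=True)
    sd = refl.std(axis=(0, 1), keepdims=True)
    outliers = np.abs(refl - mu) > sigma * sd
    refl = np.where(outliers, np.broadcast_to(mu, refl.shape), refl)
    # Unit-variance band normalization.
    return refl / np.clip(refl.std(axis=(0, 1), keepdims=True), 1e-12, None)

rng = np.random.default_rng(0)
raw = rng.uniform(0.2, 0.8, size=(8, 8, 4))        # toy cube, not Specim FX10 dimensions
cube = calibrate_reflectance(raw, dark=np.zeros(4), white=np.ones(4))
print(cube.shape)
```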
3. Mineralogical Characterization and Hyperspectral Analysis
S3LI Vulcano’s hyperspectral cube delivers high signal-to-noise spectral profiles representative of lunar-like volcanic basalts:
- ROI Statistics: Within-ROI standard deviation is typically <0.02 reflectance units; mean reflectance spectra distinguish olivine (asymmetric absorption ~1000 nm; high 600–700 nm reflectance) from pyroxene (sharper absorption near 900 nm, secondary dips at 520–550 nm). Convex hull continuum removal isolates band centers for mineral discrimination (Hesar et al., 28 Mar 2025).
- Dimensionality Reduction: PCA on the reflectance matrix (X ∈ ℝ^{N×B}) retains >95% spectral variance in the first three PCs, with PC1 vs. PC2 separating olivine–pyroxene endmembers.
- Clustering Methodology: K-Means (k=4, silhouette 0.47), hierarchical agglomerative clustering (Ward’s, up to 94% olivine-NMF similarity), GMM (k=4, RMSE up to 0.55), and spectral clustering are benchmarked per region, with performance ranked by silhouette, RMSE, and cluster-spectrum similarity to laboratory template spectra (Hesar et al., 28 Mar 2025).
- Geological Implications: NMF abundance estimation indicates 90–100% olivine across most ROIs (Region 1: 98%, Region 6: 75% olivine/25% pyroxene), directly paralleling lunar maria mineralogy and supporting Vulcano’s application as a lunar analogue (Hesar et al., 28 Mar 2025).
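A minimal NumPy sketch of the PCA-plus-clustering step above, run here on synthetic two-endmember spectra (the Gaussian absorption shapes are toy stand-ins for laboratory olivine and pyroxene templates, not dataset values):

```python
import numpy as np

def pca(X, n_components=3):
    """PCA on a reflectance matrix X of shape (N pixels, B bands)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = (S**2 / np.sum(S**2))[:n_components]   # variance fraction per PC
    return scores, explained

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means, an illustrative stand-in for library clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Synthetic mixture: broad "olivine-like" vs sharper "pyroxene-like" absorptions.
rng = np.random.default_rng(1)
bands = np.linspace(400, 1000, 50)
end_a = 0.6 - 0.3 * np.exp(-((bands - 1000) / 120) ** 2)  # asymmetric dip toward 1000 nm
end_b = 0.5 - 0.3 * np.exp(-((bands - 900) / 60) ** 2)    # sharper dip near 900 nm
X = np.vstack([end_a + 0.01 * rng.standard_normal((100, 50)),
               end_b + 0.01 * rng.standard_normal((100, 50))])
scores, explained = pca(X)
labels = kmeans(scores, k=2)
print(explained.sum())  # fraction of total variance captured by the first 3 PCs
```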
4. SLAM, Place Recognition, and Loop Closure Benchmarking
S3LI Vulcano is specifically architected for stress-testing visual–LiDAR–inertial SLAM in planetary-analog volcanics:
- Data Environment: Unstructured terrains (basaltic outcrops, lava tubes, dry vegetation) exhibit severe aliasing (repetitive features), minimal texture, and GNSS denial—mimicking lunar exploration conditions (Gonzalez et al., 7 Nov 2025).
- Benchmark Tasks: Place recognition (loop/non-loop label generation via a field-of-view overlap threshold α), 6-DoF pose estimation, and end-to-end SLAM are supported. Evaluation reports include error metrics (RMSE normalized by path length, recall@K, mAP) and initialization/failure rates per sequence (Giubilato et al., 27 Jan 2026, Gonzalez et al., 7 Nov 2025).
- Representative Baselines: R-VIO2 and VINS-Fusion, tested on the “waterfront” and “moon_lake” traverses, achieve trajectory RMSE of 0.03–0.28% of path length. Challenge factors include drift where LiDAR scans lack vertical structure, visual loop-closure failures under perceptual aliasing, and strong photometric contrast (Giubilato et al., 27 Jan 2026, Giubilato et al., 2022).
- Toolkit Support: The s3li-toolkit (github.com/DLR-RM/s3li-toolkit) provides utilities for data conversion, calibration, GNSS RTK processing, pairwise place-recognition label assignment, and baseline evaluation scripts (Giubilato et al., 27 Jan 2026).
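The geometric labeling criterion and the recall@K metric mentioned above can be sketched as follows; `fov_overlap` is assumed to be precomputed (e.g., from pointcloud intersection), and the data structures are hypothetical, not the toolkit's API:

```python
import math

def is_positive_pair(pose_a, pose_b, fov_overlap, max_dist=10.0, min_overlap=0.3):
    """Geometric positive-match criterion for place-recognition labels.

    pose_a / pose_b: (x, y, z) ENU positions; fov_overlap: precomputed
    field-of-view overlap fraction in [0, 1]. Thresholds follow the
    dataset's stated criteria (FoV overlap >= 0.3, distance <= 10 m).
    """
    return fov_overlap >= min_overlap and math.dist(pose_a, pose_b) <= max_dist

def recall_at_k(retrieved, positives, k):
    """Fraction of queries whose top-k retrieval list contains a true positive.

    retrieved: {query_id: ranked list of candidate ids}
    positives: {query_id: set of ground-truth positive ids}
    """
    hits = sum(1 for q, ranked in retrieved.items()
               if any(r in positives.get(q, set()) for r in ranked[:k]))
    return hits / len(retrieved)

print(is_positive_pair((0, 0, 0), (3, 4, 0), fov_overlap=0.5))    # True (5 m apart)
print(is_positive_pair((0, 0, 0), (30, 40, 0), fov_overlap=0.9))  # False (50 m apart)
```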
5. Volcanic Activity Detection: Satellite Benchmarking and Onboard Inference
Beyond ground-based sensing, the S3LI Vulcano Dataset encompasses orbital monitoring benchmarks for anomalous volcanic events:
- Sentinel-2 Modalities: Crops of 224×224 pixels at 10/20/75 m GSD use SWIR-augmented RGB and full MSI cubes, supporting both thermal anomaly/ash emission annotation (manual + automated SWIR channel fusion) and ML benchmarking for eruption detection (Priyasad et al., 27 Oct 2025).
- Dataset Splits: Standardized partitions (Train: 2,343; Validation: 311; Test: 729), as well as Leave-One-Volcano-Out cross-validation, ensure comparability (Priyasad et al., 27 Oct 2025).
- Benchmarked Models: ResNet18–152, MobileNetV3, Swin-Transformer (Tiny/Base/Large), custom DCNNs (≤5 MB). Top F1 on holdout (20 m GSD) reaches 0.9586 (Swin-Large), with distilled DCNN achieving F1 ≈ 0.9185 and suitability for onboard deployment (Priyasad et al., 27 Oct 2025).
- Onboard Execution: Intel Movidius Myriad X is supported via OpenVINO IR compilation; full inference loop achieves throughput of 40–55 fps (latency ≈18–25 ms), with only ±1% variation from ground benchmarks (Priyasad et al., 27 Oct 2025).
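The Leave-One-Volcano-Out protocol above can be expressed as a simple split generator. The `(sample_id, volcano_name)` pairing is an assumed representation for illustration; the released splits remain canonical:

```python
def lovo_splits(samples):
    """Leave-One-Volcano-Out cross-validation: each fold holds out every
    crop of one volcano for testing and trains on all remaining volcanoes.

    samples: list of (sample_id, volcano_name) pairs — a hypothetical
    representation of the crop index.
    """
    volcanoes = sorted({v for _, v in samples})
    for held_out in volcanoes:
        train = [s for s, v in samples if v != held_out]
        test = [s for s, v in samples if v == held_out]
        yield held_out, train, test

data = [("a", "Vulcano"), ("b", "Vulcano"), ("c", "Etna"), ("d", "Stromboli")]
folds = list(lovo_splits(data))
print(len(folds))  # one fold per volcano
```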
6. Data Access, Licensing, and Toolkit
- Public Release: S3LI Vulcano (ground-based and Sentinel-2 variants) is freely available for non-commercial research; access via https://rmc.dlr.de/s3li_dataset (SLAM, place recognition, mineralogy) and https://github.com/your-lab/S3LI_Vulcano (volcanic activity detection) (Giubilato et al., 27 Jan 2026, Priyasad et al., 27 Oct 2025).
- Licensing: DLR Research Use License and CC-BY-4.0 for broad academic use. All supporting scripts, model definitions, and annotator protocols are released alongside (Priyasad et al., 27 Oct 2025, Giubilato et al., 27 Jan 2026).
- Citation Requirements: Publications employing the dataset are expected to cite the appropriate primary publications, e.g., (Hesar et al., 28 Mar 2025, Gonzalez et al., 7 Nov 2025, Priyasad et al., 27 Oct 2025, Giubilato et al., 27 Jan 2026, Giubilato et al., 2022).
7. Applications, Limitations, and Outlook
The S3LI Vulcano Dataset provides a benchmark for:
- Machine Learning Mineralogy: Hyperspectral ROI analysis for olivine/pyroxene discrimination.
- SLAM and Place Recognition: Robustness assessment in GNSS-denied, unstructured volcanics.
- Volcanic Activity Detection and Onboard Inference: Near-real-time event recognition in orbital imagery, with workflows for satellite-borne compute.
- Semantic Segmentation, Domain Adaptation, and Cross-Modal Learning: Segmentation of geological classes and transfer learning between terrestrial and planetary analogues.
Limitations include absence of nighttime/color imagery in SLAM data, sparse LiDAR without intensity cues, and low vegetation content (models may underperform in forested settings). The dataset’s focus on planetary-analog conditions implies limited generalization to urban or temperate terrestrial domains (Gonzalez et al., 7 Nov 2025, Giubilato et al., 27 Jan 2026).
S3LI Vulcano constitutes a rigorously curated, multi-modal benchmark—linking in situ mineralogical analysis, navigation in extreme topographies, and automated activity detection—enabling methodologically robust advances in robotics, remote sensing, and planetary science.