CMU-GPR Dataset for Subsurface-Aided Localization
- The CMU-GPR Dataset is a multimodal, open-source resource integrating GPR, IMU, wheel encoder, and RGB camera data for indoor, GPS-denied localization research.
- It employs synchronized sensor streams with precise temporal alignment and calibration to ensure accuracy in odometry and mapping evaluation.
- Advanced preprocessing and subsurface feature matrix construction enable robust scan matching, SLAM, and feature extraction for practical research applications.
The CMU-GPR Dataset is a multimodal, open-source dataset designed for research in subsurface-aided robot localization and mapping in GPS-denied indoor environments. It provides synchronized measurements from single-channel ground-penetrating radar (GPR), inertial measurement unit (IMU), wheel encoders, auxiliary RGB camera, and high-accuracy ground truth via a robotic total station. Primarily captured to enable spatio-temporal mapping and odometry based on subsurface features, the dataset is widely referenced in recent works on GPR-assisted robotic navigation and multimodal sensor fusion (Li et al., 24 Mar 2025, Baikovitz et al., 2021).
1. Sensor Modalities, Hardware, and Configurations
The dataset integrates several sensor modalities to enable comprehensive perception of both surface and subsurface geometries:
- Ground-Penetrating Radar (GPR): Single-channel, monostatic configuration producing A-scans at fixed intervals, with collocated transmitter and receiver on a ground platform. The original CMU-GPR release specifies a Sensors & Software Noggin 500 unit (500 MHz nominal center frequency, ~20 Hz A-scan rate, ~0.25 ns vertical resolution). Frequency range, RF bandwidth, and full hardware details are provided in the manufacturer’s datasheets; some publications utilizing CMU-GPR do not restate all specifications (Baikovitz et al., 2021).
- Inertial Measurement Unit (IMU): XSENS MTI-30, providing tri-axial accelerometer, gyroscope, and magnetometer data at 100 Hz. The full noise, bias, and calibration parameters are included as standard in metadata files.
- Wheel Encoders: YUMO quadrature encoders with 1024 ticks/rev, mounted on 0.1 m diameter wheels, giving sub-millimeter spatial increment resolution.
- RGB Camera: Intel RealSense D435, 848×480@30 Hz; used for incidental frame capture, not for odometry in GPR-centric studies.
- Robotic Total Station: Leica TS15, providing 3D position (x, y, z) at 5–10 Hz with <5 mm error for ground truth (Baikovitz et al., 2021).
All sensing modalities are timestamped on a common controller clock for precise temporal alignment; total station timing is wirelessly synchronized.
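The encoder geometry above (1024 ticks/rev on 0.1 m diameter wheels) fixes the distance per tick at π·0.1/1024 ≈ 0.31 mm. A minimal differential-drive dead-reckoning sketch built on that figure is shown below; the wheel separation `WHEEL_BASE_M` is a hypothetical value, not a documented dataset parameter (the actual trolley geometry is in the calibration metadata):

```python
import math

TICKS_PER_REV = 1024
WHEEL_DIAMETER_M = 0.1   # from the encoder spec above
WHEEL_BASE_M = 0.4       # hypothetical; consult metadata.yaml for the real value
M_PER_TICK = math.pi * WHEEL_DIAMETER_M / TICKS_PER_REV  # ~0.307 mm per tick

def integrate_odometry(tick_pairs, x=0.0, y=0.0, theta=0.0):
    """Dead-reckon a planar pose from successive (left, right) tick increments."""
    for dl_ticks, dr_ticks in tick_pairs:
        dl = dl_ticks * M_PER_TICK
        dr = dr_ticks * M_PER_TICK
        ds = (dl + dr) / 2.0                 # forward displacement
        dtheta = (dr - dl) / WHEEL_BASE_M    # heading change
        x += ds * math.cos(theta + dtheta / 2.0)
        y += ds * math.sin(theta + dtheta / 2.0)
        theta += dtheta
    return x, y, theta
```

For instance, one full revolution of both wheels (1024 ticks each) advances the platform by one wheel circumference, π·0.1 ≈ 0.314 m, with no heading change.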
2. Data Collection Protocols and Environments
The data were collected in controlled, non-GPS-accessible indoor environments, enabling repeatable benchmarking under consistent subsurface conditions:
- Environments:
- gates_g: parking garage (3 sequences, ~365 m total)
- nsh_b: university basement (7 sequences, ~264 m)
- nsh_h: factory-floor hall (5 sequences, ~90 m)
- Trajectory Executions: Each sequence ranges from ~2 to ~10 minutes at nominal speeds of ~0.5 m/s, with both forward and backward passes and closed loops to increase feature diversity and revisit potential for relocalization.
- Surface/Subsurface Composition: Predominantly concrete slabs with embedded rebar, utility conduits, and metallic fixtures; minimal soil heterogeneity or layered structure, minimizing dielectric complexity in B-scan data.
- Environmental Conditions: All sequences are indoor; temperature, humidity, and seasonal factors are not varied, so sensor and ground-truth performance can be treated as stable across sequences.
The measurement platform consists of a manually pulled, sensor-rigged trolley with onboard compute (Intel NUC), logging all streams and establishing wireless ground-truth synchronization via the base station (Baikovitz et al., 2021).
3. Dataset Organization, Structure, and File Formats
The dataset is organized by environment and trajectory, each with dedicated folders conforming to the following structure:
| Modality | File Type/Folder | Description |
|---|---|---|
| GPR | raw_gpr.bin/.mat/.csv | Raw A-scan time series, B-scan aggregation, timestamped |
| IMU | imu.csv | Timestamps, accelerations (ax, ay, az), gyros (ωx, ωy, ωz) |
| Wheel Encoder | encoder.csv/enc.csv | Timestamps, left/right tick counts |
| Camera | images/, cam_timestamps.csv | RGB frames and matching timestamps |
| Ground Truth | totalstation.txt/ground_truth.csv | Timestamps, global (x, y, z) positions (total station) |
| Metadata/Calibration | metadata.yaml | Sensor specs, calibration info, IMU noise/bias |
File formats are simple comma-separated values (CSV) for time series; GPR raw data is stored as time-amplitude samples per line (one A-scan per row). Full CAD-based rigid-body transforms for all sensors to the robot base frame are available via factory calibrations and provided on the project site (Baikovitz et al., 2021). Provided utilities generate uniformly spaced 2D radargram images (B-scans) for learning and mapping applications.
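Given the one-A-scan-per-row layout described above, a B-scan matrix can be assembled with plain NumPy. The sketch below assumes the first column of each row is the trace timestamp and the remaining columns are time-amplitude samples; the actual column layout should be checked against the metadata files:

```python
import numpy as np

def load_bscan(csv_path):
    """Load a GPR CSV (one A-scan per row) into timestamps plus a B-scan matrix.

    Assumed layout (verify against metadata.yaml): column 0 is the trace
    timestamp, columns 1..N are the A-scan amplitude samples.
    """
    raw = np.loadtxt(csv_path, delimiter=",")
    timestamps = raw[:, 0]
    bscan = raw[:, 1:]      # shape: (n_traces, n_samples_per_trace)
    return timestamps, bscan
```

The resulting `bscan` array is the 2D radargram that the provided utilities render as uniformly spaced images.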
4. Synchronization, Calibration, and Ground Truth
- Temporal Alignment: All onboard modalities are synchronized via the NUC system clock; total station synchronization is achieved through the wireless link. Consistency of the Δt between modalities (GPR, IMU, encoder) is enforced for factor-graph pre-integration; specific synchronization equations are not provided in the foundational papers (Li et al., 24 Mar 2025).
- Spatial Calibration: Sensor-to-base and body-frame transforms, including the calibration of the GPR antenna, IMU, and camera optical frames, are assumed constant, derived from direct measurement or CAD.
- Ground Truth: The total station tracks a reflective prism on the robot roof, yielding 3D position at ≈10 Hz and manufacturer-rated accuracy of <5 mm. This trajectory is used as “perfect” ground truth for odometry RMSE evaluation—reported combined RMSEs are on the order of 0.4–0.7 m over full traversals (Li et al., 24 Mar 2025).
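Because all streams share a common clock, cross-modal association reduces to nearest-timestamp lookup. The dataset does not prescribe a specific procedure; one common approach, sketched here, uses a binary search over the sorted reference timestamps:

```python
import numpy as np

def nearest_indices(ref_ts, query_ts):
    """For each query timestamp, return the index of the closest reference
    timestamp. ref_ts must be sorted ascending (true for logged streams)."""
    ref_ts = np.asarray(ref_ts, dtype=float)
    query_ts = np.asarray(query_ts, dtype=float)
    right = np.searchsorted(ref_ts, query_ts)        # first ref >= query
    right = np.clip(right, 1, len(ref_ts) - 1)
    left = right - 1
    # pick whichever neighbor is temporally closer
    choose_left = (query_ts - ref_ts[left]) <= (ref_ts[right] - query_ts)
    return np.where(choose_left, left, right)
```

This pairs, for example, each GPR trace with its closest total-station fix before computing trajectory error.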
5. Signal Preprocessing and the Subsurface Feature Matrix
Raw GPR data in the CMU-GPR dataset undergoes standard preprocessing before being used for odometry or mapping:
- Preprocessing Pipeline (as per (Baikovitz et al., 2021)):
- Rubber-band interpolation for uniform spatial trace spacing
- Mean background subtraction, “dewow” filtering, bandpass filtering, zero-time correction, time-varying (SEC) gain, wavelet denoising, Gaussian smoothing
- Subsurface Feature Matrix (SFM) Construction (Li et al., 24 Mar 2025):
- B-scans are decomposed in the frequency domain per Li et al. (2024).
- Damped-sinusoid models are fit to the scan amplitudes using MLE to locate prominent peaks.
- Each peak is quantized and binned into a 2D histogram, with amplitude discretized to an integer bucket.
- The resulting SFM serves as a compact feature map, facilitating robust scan matching and multimodal integration.
Parameter choices and thresholds (e.g., DS, LS, amplitude quantization) inherit from prior Li et al. pipelines; not all are numerically specified.
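A minimal sketch of two of the preprocessing steps named above (mean background subtraction and a simple exponential time-varying gain) together with the 2D-histogram binning used by the SFM. The peak-extraction step itself (the MLE damped-sinusoid fit) is omitted, and the gain constant and bin counts are illustrative assumptions, not published parameters:

```python
import numpy as np

def remove_background(bscan):
    """Mean background subtraction: remove the per-sample mean trace to
    suppress horizontal banding (e.g., the direct wave)."""
    return bscan - bscan.mean(axis=0, keepdims=True)

def sec_gain(bscan, alpha=0.05):
    """Simple exponential time-varying gain boosting late (deep) samples.
    alpha is an illustrative choice, not a dataset-specified value."""
    t = np.arange(bscan.shape[1])
    return bscan * np.exp(alpha * t)

def sfm_histogram(peak_depths, peak_amps, depth_bins=64, amp_bins=16):
    """Bin extracted peaks (depth, quantized amplitude) into a 2D histogram,
    the compact feature-map form of the SFM described above."""
    H, _, _ = np.histogram2d(peak_depths, peak_amps,
                             bins=(depth_bins, amp_bins))
    return H
```

In this form, each B-scan contributes a fixed-size matrix regardless of trace count, which is what makes histogram-based scan matching between revisits straightforward.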
6. Size, Usage, and Benchmarking Practices
- Summary Statistics:
- ~2 GB of GPR data (≈15 runs × ~500 B-scans/run × D samples/B-scan)
- IMU: ≈0.5–1 million samples (100 Hz, 5–10 min/run, 15 runs)
- Encoder: ≈1 million ticks, ~50 Hz logging
- Total recording time: ≈75 minutes
- Benchmark Recommendations:
- Environment-based splits: train/validate on nsh_b + nsh_h, test on gates_g, or vice versa
- Report trajectory RMSE versus ground truth; for GPR-only odometry, compare encoder-derived travel distance against ground-truth travel distance
- In multimodal pipelines, evaluate using standard factor-graph solvers (e.g., GTSAM) employing IMU preintegration, GPR travel-distance, and wheel-encoder factors
Open-source utility scripts in Python are provided for preprocessing, radargram generation, and creating auto-labeled submaps to support SLAM, odometry, or supervised learning pipelines.
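The recommended RMSE metric reduces, after timestamp association, to a root-mean-square positional error. A sketch is shown below; it assumes the estimated and ground-truth trajectories have already been associated sample-by-sample and expressed in a common frame:

```python
import numpy as np

def trajectory_rmse(est_xy, gt_xy):
    """Root-mean-square positional error between an estimated trajectory and
    its time-associated ground truth (both (N, 2) or (N, 3), same frame)."""
    est_xy = np.asarray(est_xy, dtype=float)
    gt_xy = np.asarray(gt_xy, dtype=float)
    err = np.linalg.norm(est_xy - gt_xy, axis=1)  # per-sample Euclidean error
    return float(np.sqrt(np.mean(err ** 2)))
```

Reported CMU-GPR fusion results on the order of 0.4–0.7 m RMSE over full traversals correspond to this quantity computed against the total-station trajectory.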
7. Applications, Availability, and Limitations
The CMU-GPR dataset has become a canonical testbed for:
- Subsurface-aided localization: GPR-based odometry, fusion with IMU and encoder signals to provide robust navigation in environments where visual or LiDAR features are unreliable.
- Mapping/SLAM: Online mapping of subsurface reflectors; association of hyperbolic reflection features for loop closure.
- Feature Extraction: Supervised detection of pipes, rebar, and other subsurface features via 2D radargram analysis or CNN-based approaches.
- Evaluation: Standardized benchmarking of odometry accuracy, fusion techniques, and learning-based mapping.
The dataset is hosted on GitHub [https://github.com/rpl-cmu/CMU-GPR-Dataset] under the MIT license, with each run available as a ≤4 GB archive via Git LFS. Full hardware and calibration details, along with guidance on coordinate systems and recommended usage practices, are available in the technical documentation and metadata files (Baikovitz et al., 2021).
Limitations include limited environmental variability, non-disclosure of certain hardware settings in derivative publications, and the use of rigid, engineered subsurfaces. For precise modeling and interpretation of GPR signals, users are referred to the instrument datasheets and supplementary material accompanying the dataset (Baikovitz et al., 2021).
References
- "Ground Penetrating Radar-Assisted Multimodal Robot Odometry Using Subsurface Feature Matrix" (Li et al., 24 Mar 2025)
- "CMU-GPR Dataset: Ground Penetrating Radar Dataset for Robot Localization and Mapping" (Baikovitz et al., 2021)