Synthetic Underwater Benchmark Dataset
- Synthetic underwater benchmark datasets are rigorously simulated collections that emulate real aquatic imaging conditions using physics-based and generative approaches.
- They are constructed by combining rendering, asset libraries, and parameterized environmental effects like scattering, marine snow, and turbidity.
- These benchmarks facilitate algorithm development in restoration, depth estimation, tracking, and sonar analysis through extensive ground-truth annotations.
A synthetic underwater benchmark dataset is a rigorously constructed collection of underwater imaging data generated by computational or physically-based simulation rather than direct capture in real aquatic environments. These datasets play a foundational role in the development and evaluation of algorithms for underwater perception, restoration, depth estimation, object tracking, event-based sensing, and sonar analysis. Synthetic benchmarks enable large-scale, annotated data generation with exact ground truth under controlled variability, addressing the inherent logistical constraints and incomplete supervision characteristic of real underwater data acquisition.
1. Modeling Underwater Imaging: Physical and Generative Approaches
Synthetic dataset construction for underwater vision and acoustics almost universally begins with explicit modeling of the relevant image or signal formation process. For optical imagery, the dominant frameworks are the Jaffe–McGlamery model and its successors, which incorporate the effects of wavelength-dependent absorption, scattering (both forward and backward), and particulate matter (“marine snow”):
- Classical Image Formation:
The standard attenuation-plus-backscatter formulation is
I_c(x) = J_c(x)\, e^{-\beta_c d(x)} + B_c^{\infty}\bigl(1 - e^{-\beta_c d(x)}\bigr),
with I_c the observed intensity in channel c, J_c the true scene radiance, \beta_c the attenuation coefficient, d(x) the range, and B_c^{\infty} the asymptotic veiling light (Cai et al., 2 Jul 2025, Wen et al., 2023). A minimal synthesis sketch based on this model is shown after this list.
- Enhanced Physics-Inspired Models:
Many modern works extend this to account for non-uniform media, wavelength-specific scattering, learned backscatter fields, and forward scattering terms—critical in simulating realistic degradations, especially in turbid or deep scenes (Ismiroglou et al., 18 Sep 2025, Kaneko et al., 2024).
- Marine Snow Synthesis:
Specialized datasets explicitly model marine snow as analytically parameterized 2D intensity profiles (elliptic frustums), simulating both typical “highland” and “volcanic crater” particle artifacts with spatial blending, randomization, and surface perturbation to match observed oceanic imagery (Kaneko et al., 2021).
- Generative Data Synthesis:
In cases where forward simulation cannot capture the diversity of real underwater scenes, generative frameworks such as GANs, diffusion models, and large-scale image-to-image translation architectures are deployed. These produce paired synthetic degraded and “clean” images for restoration/learning, leveraging style codes or learned mappings sampled from real-world imagery (Tian et al., 18 Nov 2025, Jain et al., 2022).
- Sonar and Acoustic Signal Modeling:
For acoustic datasets, the forward model incorporates propagation loss, target reflectivity, beam pattern, and reverberation. Simulation platforms such as Gazebo or Unreal’s HoloOcean integrate CAD assets, wave physics, and stochastic noise to generate side-scan images, echo profiles, and 3D reconstructions (S et al., 2024, Oliveira et al., 21 May 2025).
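The attenuation-plus-backscatter model above maps directly onto a per-pixel synthesis step. The following Python sketch is a minimal illustration rather than any cited dataset's released pipeline: the per-channel attenuation coefficients, veiling-light values, and the Gaussian-blob particle overlay (a crude stand-in for the elliptic-frustum marine-snow profiles described above) are illustrative assumptions.

```python
import numpy as np

def degrade_underwater(clean, depth, beta=(0.45, 0.15, 0.08), veil=(0.10, 0.35, 0.45),
                       n_particles=30, rng=None):
    """Apply I_c = J_c * exp(-beta_c d) + B_c (1 - exp(-beta_c d)) plus simple particle artifacts.

    clean : HxWx3 float array in [0, 1] (true scene radiance J)
    depth : HxW float array of ranges in metres
    beta  : per-channel attenuation coefficients (illustrative; red attenuates fastest)
    veil  : per-channel asymptotic veiling light B_inf (illustrative)
    """
    rng = rng or np.random.default_rng()
    h, w, _ = clean.shape
    trans = np.exp(-np.asarray(beta)[None, None, :] * depth[..., None])  # transmission e^{-beta_c d}
    img = clean * trans + np.asarray(veil)[None, None, :] * (1.0 - trans)

    # Crude marine-snow stand-in: additive Gaussian intensity blobs at random positions.
    ys, xs = np.mgrid[0:h, 0:w]
    for _ in range(n_particles):
        cy, cx = rng.uniform(0, h), rng.uniform(0, w)
        sigma = rng.uniform(1.0, 4.0)   # particle radius in pixels
        amp = rng.uniform(0.2, 0.8)     # peak particle brightness
        blob = amp * np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        img += blob[..., None]
    return np.clip(img, 0.0, 1.0)
```

Published pipelines replace the Gaussian blobs with parameterized particle profiles and draw the attenuation and veiling-light values from measured water-type statistics rather than fixed defaults.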
2. Dataset Construction Pipelines
Dataset synthesis pipelines combine physical scene assets, probabilistic parameter sampling, and rendering or simulation:
- Scene Generation:
Asset libraries (e.g., ShapeNet, photogrammetry-derived fish models, or scanned corals) are spatially distributed in a simulated 3D volumetric scene. Environmental settings such as lighting, fog density, and turbidity are randomized to model varied water types and visibility regimes (Lv et al., 2024, Mansour et al., 19 May 2025); a minimal sketch of this per-sample randomization follows the list below.
- Rendering and Data Simulation:
For optical data, physically-based renderers (Blender Cycles, Unreal Engine 5) simulate volumetric scattering, absorption, caustics, and sun-glint. For sonar, acoustic ray-tracing and array sampling are invoked to create beamformed intensity images and point clouds, with configurable sensor geometry emulated at the protocol level (S et al., 2024, Oliveira et al., 21 May 2025).
- Particle and Artifact Injection:
Degradations including marine snow, particulate scattering, and specular highlights are injected through parameterized processes, often drawing statistical properties from small-crop measurements on real underwater images (Kaneko et al., 2021, Kaneko et al., 2024).
- Reference and Distorted Pairs:
Most visual benchmarks provide paired clean/degraded or air/underwater examples, with some including pixel-wise depth, disparity, semantic masks, or camera pose annotations. Synthetic–real domain bridging is achieved via multimodal style transfer networks or domain-adversarial adaptation (Jain et al., 2022, Tian et al., 18 Nov 2025).
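The domain-randomization step in such pipelines amounts to drawing a fresh environment configuration for every rendered sample. A minimal sketch is shown below; the parameter names and value ranges are assumptions for illustration, not settings taken from any cited dataset.

```python
import random
from dataclasses import dataclass, asdict

@dataclass
class SceneConfig:
    water_type: str          # nominal Jerlov-style water class
    turbidity: float         # scattering strength, arbitrary units
    sun_elevation_deg: float
    fog_density: float
    veiling_light_rgb: tuple
    n_distractors: int       # e.g. fish or debris assets scattered in the volume

def sample_scene_config(rng: random.Random) -> SceneConfig:
    """Draw one randomized environment; ranges are illustrative placeholders."""
    return SceneConfig(
        water_type=rng.choice(["I", "IB", "II", "III", "1C", "3C"]),
        turbidity=rng.uniform(0.1, 3.0),
        sun_elevation_deg=rng.uniform(10.0, 80.0),
        fog_density=rng.uniform(0.01, 0.2),
        veiling_light_rgb=(rng.uniform(0.0, 0.2), rng.uniform(0.2, 0.5), rng.uniform(0.3, 0.6)),
        n_distractors=rng.randint(0, 25),
    )

rng = random.Random(42)
configs = [asdict(sample_scene_config(rng)) for _ in range(5)]  # one config per rendered sample
```

Each drawn configuration is then handed to the renderer (e.g., as Blender or Unreal scene parameters) and stored alongside the output so every sample carries its exact generating conditions.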
3. Dataset Scope, Structure, and Modalities
Synthetic underwater benchmarks span a wide range of modalities and tasks:
- Single Image and Video Enhancement:
Large-scale paired datasets (e.g., PHISWID, PGSM, UWNature) offer thousands to hundreds of thousands of atmospheric–underwater pairs for color restoration and dehazing (Kaneko et al., 2024, Wen et al., 2023, Tian et al., 18 Nov 2025). Synthetic underwater video benchmarks provide temporally consistent sequences for multi-frame denoising and enhancement (SUVE) (Du et al., 2024).
- Stereo and Depth Estimation:
Datasets such as UWStereo and synthetic Hypersim underwater variants offer stereo pairs and dense metric ground truth for disparity and depth prediction; water type, veiling-light, and geometry are varied for robust domain coverage (Lv et al., 2024, Cai et al., 2 Jul 2025).
- Acoustic Imaging and Sonar:
S3Simulator and Synthetic Enclosed Echoes (SEE) produce thousands of simulated sonar images and 3D reconstructions, including variations in seafloor texture, target class (ships, planes), and noise regimes. Each sample typically includes paired synthetic–real or multi-modal outputs (polar, Cartesian, point cloud) (S et al., 2024, Oliveira et al., 21 May 2025).
- Event-Based Vision:
Datasets such as eStonefish-scenes and UEOF introduce microsecond-resolution event streams, dense flow labels, and paired intensity/depth, allowing benchmarking of event-based, neuromorphic optical flow, and odometry networks (Mansour et al., 19 May 2025, Truong et al., 15 Jan 2026).
- Object Tracking & Semantic Benchmarks:
Synthetic video sets with per-frame object bounding boxes and trajectories augment real multi-object tracking benchmarks by introducing parameterized backgrounds, turbidity, and distractor entities (Pedersen et al., 2023).
- Airborne Through-Water and Bathymetry:
Synthetic benchmarks such as Sea-Undistort simulate glint, wave-induced distortions, and volumetric scattering for aerial–underwater mapping, supporting image restoration and bathymetric model evaluation (Kromer et al., 11 Aug 2025).
4. Benchmarking Protocols, Evaluation Metrics, and Baseline Results
Standardized benchmarks prescribe specific splits, metrics, and protocols to permit fair comparison:
- Image/Restoration Quality:
Quantitative performance is assessed via PSNR, SSIM, and domain-specific underwater image quality measures (UCIQE, UIQM, CLIPIQA, MUSIQ, BRISQUE, CONTRIQUE). Paired setups enable absolute fidelity evaluation; unpaired real sets are assessed with no-reference metrics (Kaneko et al., 2024, Tian et al., 18 Nov 2025, Kromer et al., 11 Aug 2025). A small computational sketch of several of these metrics appears after this list.
- Video Consistency:
Temporal consistency in enhanced video is measured via MABD and color histogram dynamics (CDC) (Du et al., 2024).
- Stereo and Depth:
End-point error (EPE), bad-pixel rate (3 px), threshold accuracy (δ < 1.25), AbsRel, scale-invariant log RMSE (SiLog), and Hausdorff distance (for 3D sonar data) are used for geometric modalities (Lv et al., 2024, Cai et al., 2 Jul 2025, Oliveira et al., 21 May 2025, Truong et al., 15 Jan 2026).
- Detection, Tracking, and Segmentation:
Tasks including multi-object tracking, segmentation, and event-based flow estimation benchmark detection/association (e.g., HOTA, MOTA, IDF1) and segmentation overlaps (IoU, mIoU, pixel classification rates) (Pedersen et al., 2023, Kromer et al., 11 Aug 2025).
- Qualitative and Human Assessment:
Expert user studies rank synthesized image realism and restoration fidelity; e.g., selection rates of 82.5% for pipelines incorporating forward scattering and inhomogeneous media (Ismiroglou et al., 18 Sep 2025).
- Baseline Model Performance:
Classical filtering and transform methods are directly compared to deep neural architectures (U-Net, WaterNet, ResShift, CenterTrack, DenseNet121, ElevateNET variants), with synthetic data routinely supporting neural models exceeding prior baselines in fidelity, robustness, and transfer to real data (Kaneko et al., 2021, Tian et al., 18 Nov 2025, Du et al., 2024, S et al., 2024, Mansour et al., 19 May 2025).
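Several of the fidelity and depth metrics above reduce to simple array operations on paired predictions and ground truth. The numpy sketch below is a generic illustration, not any benchmark's official evaluation code; it computes PSNR for restoration and AbsRel, the δ < 1.25 threshold accuracy, and one common (λ = 1) form of SiLog for depth.

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio for images scaled to [0, max_val]."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def depth_metrics(pred, gt, eps=1e-6):
    """AbsRel, delta < 1.25 threshold accuracy, and scale-invariant log RMSE (SiLog)."""
    pred, gt = np.maximum(pred, eps), np.maximum(gt, eps)
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)
    log_diff = np.log(pred) - np.log(gt)
    silog = np.sqrt(np.mean(log_diff ** 2) - np.mean(log_diff) ** 2)
    return {"AbsRel": abs_rel, "delta1": delta1, "SiLog": silog}
```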
5. Limitations, Pitfalls, and Best Practices
While synthetic underwater benchmarks offer precise supervision and arbitrary scale, domain fidelity and generalization remain ongoing challenges:
- Physics-Realism vs. Domain Gap:
Despite advances in scattering, inhomogeneity, and particle modeling, fully replicating the photometric subtleties of real underwater light transport and sensor characteristics is not yet feasible. Bridging the sim-to-real gap often requires domain adaptation, inclusion of learned style mappings, or post-processing with real data distributions (Jain et al., 2022, Wen et al., 2023).
- Scene and Task Diversity Constraints:
Many datasets focus on specific geometries (indoor scenes, shallow or tank-based sonar) or restricted object classes, potentially limiting cross-domain generalization. Open-water, long-range, and multi-source lighting scenarios are under-represented (Cai et al., 2 Jul 2025, Oliveira et al., 21 May 2025, Lv et al., 2024).
- Parameter Space Coverage:
Model realism is tied to the diversity of water-type parameters (Jerlov classes, turbidity, illumination). Expanding these with well-characterized real-world measurements improves the underlying benchmark's utility and breadth (Wen et al., 2023, Ismiroglou et al., 18 Sep 2025, Kaneko et al., 2024).
- Annotation and Ground Truth Precision:
Synthetic methods provide exact, dense supervision, but the accuracy of that supervision is bounded by the fidelity of the simulation model itself. For matched synthetic–real benchmarks, efforts are made to replicate tank or open-sea geometries, sensor pose, and environmental materials (Oliveira et al., 21 May 2025).
- Best Practices:
Recommendations include: balanced data splits; randomizing water types and degradation parameters per sample; cross-validation on both real and synthetic data; leveraging both reference and no-reference metrics; reporting both quantitative and user-perceptual outcomes; and open-source releases with full metadata logs (Wen et al., 2023, Ismiroglou et al., 18 Sep 2025, Kaneko et al., 2024). A minimal metadata-logging sketch follows below.
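The metadata-logging recommendation is straightforward to honour in practice. The sketch below is one possible convention, not a standard schema; the field names are assumptions chosen for illustration.

```python
import json
from pathlib import Path

def log_sample_metadata(out_dir, sample_id, scene_params, seed, renderer_version):
    """Persist the exact generating conditions of one synthetic sample for reproducibility."""
    record = {
        "sample_id": sample_id,
        "seed": seed,                       # RNG seed so the sample can be regenerated
        "renderer_version": renderer_version,
        "scene_params": scene_params,       # e.g. water type, turbidity, lighting, sensor pose
    }
    path = Path(out_dir) / f"{sample_id}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```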
6. Impact, Application Domains, and Availability
Synthetic underwater benchmarks underpin rapid progress in underwater perception:
- Algorithm Development:
Synthetic benchmarks drive progress across restoration, enhancement, 3D reconstruction, metric depth estimation, event-based SLAM/odometry, object tracking, and semantic segmentation (Truong et al., 15 Jan 2026, Kaneko et al., 2024, Pedersen et al., 2023).
- Robotics and Navigation:
Benchmarks support the development and objective comparison of algorithms for AUV/ROV navigation, obstacle avoidance, and mission planning in optically or acoustically adverse environments (Lv et al., 2024, Mansour et al., 19 May 2025, Oliveira et al., 21 May 2025).
- Cross-Domain Transfer and Generalization:
Recent works explicitly evaluate transfer from terrestrial/synthetic to real underwater domains, with pretraining on synthetic data significantly improving finetuning and downstream real-world performance (Cai et al., 2 Jul 2025, Jain et al., 2022, Ismiroglou et al., 18 Sep 2025).
- Accessible Resources:
Most major datasets release data, generation scripts, and baseline implementations on open repositories with permissive licensing (e.g., https://github.com/ychtanaka/marine-snow, https://github.com/RockWenJJ/SyreaNet.git, https://github.com/yftian2025/SynUIEDatasets.git, https://github.com/osakau-uwvision/PHISWID, https://robotic-vision-lab.github.io/ueof) (Kaneko et al., 2021, Wen et al., 2023, Tian et al., 18 Nov 2025, Kaneko et al., 2024, Truong et al., 15 Jan 2026).
Synthetic underwater benchmark datasets have become integral to validating, comparing, and generalizing perception and reconstruction methods under the complex, varied degradations characteristic of aquatic environments. Their continuing development, driven by increasingly sophisticated modeling and generative frameworks, ensures an evolving foundation for the underwater vision and robotics community.