UIEB: Underwater Image Enhancement Benchmark
- UIEB is a comprehensive benchmark that provides 950 annotated underwater images to support quantitative and perceptual evaluation of enhancement algorithms.
- It employs full-reference metrics (PSNR, SSIM) and no-reference measures (UIQM, UCIQE) along with human assessments to ensure robust performance analysis.
- UIEB drives algorithm development by benchmarking diverse methods—from classical and CNN-based techniques to GANs and diffusion models—highlighting trade-offs in performance and efficiency.
The Underwater Image Enhancement Benchmark (UIEB) is a large-scale, real-world dataset and evaluation platform for underwater image enhancement (UIE), designed to support rigorous algorithmic comparison, facilitate perceptual and quantitative analysis, and guide the development of robust data-driven and physically grounded enhancement methods. UIEB provides 950 natural underwater images with comprehensive annotation: 890 raw images paired with human-selected "reference" enhancements and 60 challenging images intentionally left unpaired due to the failure of all reference-generation candidates. The benchmark defines community-standard evaluation protocols, including full-reference metrics such as PSNR and SSIM, no-reference measures such as UIQM and UCIQE, and subjective human assessments, enabling fine-grained analysis of both algorithmic performance and perceptual quality across diverse underwater scenes (Li et al., 2019).
1. Dataset Construction and Annotation
UIEB comprises 950 real-world underwater images spanning a wide range of aquatic environments, acquired from heterogeneous sources including online repositories, prior underwater vision studies, and authors' fieldwork. Image resolutions range from 300×200 up to 8,000×6,000 pixels, stratified across typical capture conditions (40% at 640×480–1,200×800, 30% below, 30% above). The dataset covers extensive scene variability: coral reefs, marine fauna, wrecks, man-made structures, sediment-laden waters, and diverse viewpoints, with color casts (greenish, bluish, yellowish), scattering, haze, and low-visibility conditions comprehensively represented (Li et al., 2019, Bakht et al., 2023).
Of these, 890 images are paired with reference images following a human-in-the-loop selection protocol. For each raw image, twelve state-of-the-art enhancement algorithms (including fusion-based [Ancuti et al.], UDCP [Drews-Jr et al.], histogram-prior [Li et al.], Dive+, etc.) were used to generate candidate enhancements. A panel of 50 volunteers conducted pairwise visual comparisons in a knockout tournament structure; the most-preferred candidate was deemed the reference unless a majority found it unsatisfactory. Images without a satisfactory enhancement (60/950, commonly exhibiting extreme color casts, high backscatter, or severe visibility loss) were collected in the “challenging” set for no-reference and qualitative evaluation. The reference set is dominated by Dive+ (43.9%) and fusion-based (24.7%) outputs, with the rest contributed by a spectrum of algorithmic approaches (Li et al., 2019).
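The knockout selection protocol can be sketched as a single-elimination tournament. In the sketch below, the `prefer` callback is a hypothetical stand-in for a human pairwise judgment (the benchmark used a panel of 50 volunteer raters, not code), and the candidate list is illustrative:

```python
import random

def knockout_winner(candidates, prefer):
    """Run a single-elimination (knockout) tournament over candidate
    enhancements; prefer(a, b) returns the pairwise-preferred item."""
    pool = list(candidates)
    random.shuffle(pool)  # random initial bracket
    while len(pool) > 1:
        nxt = []
        # compare candidates in pairs; winners advance to the next round
        for i in range(0, len(pool) - 1, 2):
            nxt.append(prefer(pool[i], pool[i + 1]))
        if len(pool) % 2:  # odd candidate out gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

# Toy usage: candidates carry a stand-in numeric "preference" score.
cands = [("udcp", 0.4), ("fusion", 0.7), ("dive+", 0.9), ("hist", 0.5)]
best = knockout_winner(cands, lambda a, b: a if a[1] >= b[1] else b)
```

With a transitive comparator the bracket order does not matter; with real human judgments it can, which is one reason the protocol also required majority satisfaction with the final winner.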
2. Evaluation Protocols and Quality Metrics
UIEB supports both full-reference and no-reference assessment, and provides a detailed evaluation framework.
Paired Evaluation (Full-Reference, 890 Pairs):
- Train/test split: Common practice involves 800 images for training, 90 for testing/validation (Zhou et al., 2023, Bakht et al., 2023).
- Metrics:
  - Mean Squared Error (MSE): $\mathrm{MSE} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(x_{i,j} - y_{i,j}\right)^2$ for an $H \times W$ enhanced image $x$ and reference $y$.
  - Peak Signal-to-Noise Ratio (PSNR): $\mathrm{PSNR} = 10 \log_{10}\!\left(\mathrm{MAX}^2 / \mathrm{MSE}\right)$, with $\mathrm{MAX} = 255$ for 8-bit images.
  - Structural Similarity Index (SSIM): $\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$, averaged over local windows (Bakht et al., 2023, Zhou et al., 2023).
  - Underwater Image Quality Measure (UIQM): $\mathrm{UIQM} = c_1\,\mathrm{UICM} + c_2\,\mathrm{UISM} + c_3\,\mathrm{UIConM}$, with standard weights $c_1 = 0.0282$, $c_2 = 0.2953$, $c_3 = 3.5753$ (Schein et al., 27 Jan 2025, Bakht et al., 2023).
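The MSE/PSNR definitions above translate directly into code. The following is a minimal numpy illustration (SSIM and UIQM are usually computed with library implementations, e.g. scikit-image's `structural_similarity`, rather than by hand):

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two images with values in [0, 255]."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    return np.mean((x - y) ** 2)

def psnr(x, y, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    m = mse(x, y)
    if m == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / m)

# Toy 8-bit "images": a uniform offset of 10 gives MSE = 100,
# hence PSNR = 10 * log10(255^2 / 100) ≈ 28.13 dB.
ref = np.full((4, 4), 100, dtype=np.uint8)
out = np.full((4, 4), 110, dtype=np.uint8)
```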
Unpaired Evaluation (No-Reference, "Challenging60"):
- Metrics:
  - UIQM (see above for formula).
  - UCIQE: $\mathrm{UCIQE} = c_1\,\sigma_c + c_2\,\mathrm{con}_l + c_3\,\mu_s$, where $\sigma_c$ is the chroma standard deviation, $\mathrm{con}_l$ the luminance contrast, and $\mu_s$ the mean saturation, with standard weights $c_1 = 0.4680$, $c_2 = 0.2745$, $c_3 = 0.2576$.
  - NIQE: no-reference naturalness metric; lower is better (Bakht et al., 2023).
Subjective scoring by the original 50 volunteers is included for the “challenging” set, using a five-point Likert scale (1=Bad, 5=Excellent) (Li et al., 2019).
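To illustrate how a no-reference score like UCIQE combines low-level statistics, the sketch below computes its three terms from a CIELab image. Published reimplementations differ in how saturation and contrast are normalized; the conventions used here (1st–99th percentile spread for luminance contrast, chroma-to-luminance ratio for saturation) are one common choice, not the canonical benchmark code:

```python
import numpy as np

# Standard UCIQE weights
C1, C2, C3 = 0.4680, 0.2745, 0.2576

def uciqe_components(lab):
    """UCIQE terms from a CIELab image of shape (H, W, 3), L in [0, 100].
    sigma_c: chroma std; con_l: luminance contrast (99th minus 1st
    percentile); mu_s: mean saturation, here approximated as chroma / L."""
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    chroma = np.sqrt(a ** 2 + b ** 2)
    sigma_c = chroma.std()
    con_l = np.percentile(L, 99) - np.percentile(L, 1)
    mu_s = np.mean(chroma / (L + 1e-6))
    return sigma_c, con_l, mu_s

def uciqe(lab):
    """Weighted combination of the three UCIQE components."""
    sigma_c, con_l, mu_s = uciqe_components(lab)
    return C1 * sigma_c + C2 * con_l + C3 * mu_s
```

The RGB-to-Lab conversion (e.g. via scikit-image's `rgb2lab` or OpenCV's `cvtColor`) is omitted here for brevity.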
3. Algorithmic Benchmarks and Results
UIEB is the established benchmark for comparing UIE algorithms, including classical physical-prior methods, deep CNNs, GANs, diffusion models, and hybrid dual-domain techniques. The transition from classical to deep architectures (Water-Net (Li et al., 2019), MuLA-GAN (Bakht et al., 2023), FUSION (Walia et al., 1 Apr 2025), DGNet (Zhou et al., 2023), Mamba-UIE (Zhang et al., 2024), UDBE (Schein et al., 27 Jan 2025)) coincides with consistent increases in PSNR/SSIM and visually substantiated improvements in color fidelity, contrast, and fine detail restoration.
Representative Quantitative Results on UIEB (Paired Test Split, PSNR/SSIM/UIQM):
| Method | PSNR (dB) | SSIM | UIQM |
|---|---|---|---|
| Water-Net | 19.11 | 0.797 | — |
| FUSION | 23.72 | 0.883 | 3.414 |
| MuLA-GAN | 25.59 | 0.893 | — |
| DGNet-L | 25.62 | 0.929 | — |
| Mamba-UIE | 27.13 | 0.93 | 3.10 |
| UDBE | 23.54 | 0.856 | 0.928 |
DGNet (Zhou et al., 2023) achieves 25.62 dB / 0.929 SSIM with under 1M parameters; MuLA-GAN (Bakht et al., 2023) reaches 25.59 dB / 0.893 SSIM and the highest no-reference UIQM/UCIQE; Mamba-UIE (Zhang et al., 2024) reports the highest PSNR/SSIM (27.13 dB / 0.93) by leveraging physical-model constraints and a linear-complexity hybrid state-space (Mamba)/CNN design. FUSION's dual-domain architecture provides top results on perceptual and UIQM metrics (Walia et al., 1 Apr 2025).
Qualitative findings highlight that recent attention-based and physics-aware models are superior in restoring natural color balance, mitigating turbidity and scattering, enhancing global and local contrast, and adaptively suppressing artifacts, especially in challenging scenes.
4. Baseline Architectures and Methodological Advances
Water-Net (Li et al., 2019) serves as the canonical UIEB-trained baseline, employing a gated fusion CNN that combines white balancing, histogram equalization, and gamma correction in an end-to-end learnable framework. The architecture includes feature transformation units (FTUs) and confidence-map-guided fusion, and is optimized via a VGG19-based perceptual loss, avoiding the oversmoothing associated with traditional $\ell_1$/$\ell_2$ objectives.
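The gated-fusion rule can be sketched in a few lines of numpy. This is a minimal illustration of the idea, not the released Water-Net code: the three preprocessing functions are simplified stand-ins, and in Water-Net the per-pixel confidence maps are predicted by the CNN rather than supplied as arguments:

```python
import numpy as np

def gray_world_wb(img):
    """White balance via the gray-world assumption (img in [0, 1], H x W x 3)."""
    gain = img.mean() / (img.mean(axis=(0, 1)) + 1e-8)
    return np.clip(img * gain, 0.0, 1.0)

def hist_equalize(img):
    """Crude per-channel contrast stretch (stand-in for histogram equalization)."""
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    return (img - lo) / (hi - lo + 1e-8)

def gamma_correct(img, gamma=0.7):
    """Gamma < 1 brightens dark underwater regions."""
    return np.clip(img, 0.0, 1.0) ** gamma

def gated_fusion(img, confidences):
    """Blend the three preprocessed inputs with softmax-normalized
    per-pixel confidence maps of shape (3, H, W, 1)."""
    inputs = np.stack([gray_world_wb(img), hist_equalize(img),
                       gamma_correct(img)])            # (3, H, W, 3)
    w = np.exp(confidences - confidences.max(axis=0))  # stable softmax
    w = w / w.sum(axis=0, keepdims=True)
    return (w * inputs).sum(axis=0)                    # (H, W, 3)
```

Because the weights form a convex combination at every pixel, the fused output stays within the range of its three inputs, which is what makes the gating robust to a single bad preprocessing branch.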
Recent methods introduce critical advances:
- Frequency/spatial dual-domain fusion: (FUSION (Walia et al., 1 Apr 2025)) leverages FFT-based attention to address long-range color dependencies and combines it with multi-scale convolutional and CBAM-style attention in the spatial domain.
- Physically-constrained models: (Mamba-UIE (Zhang et al., 2024)) estimates scene radiance and physical light transmission using a revised image formation model, enforcing reconstruction consistency at both pixel and perceptual levels via composite losses involving $\ell_1$, SSIM, edge, and UIQM regularization terms.
- Self-supervised brightness and SNR conditioning: (UDBE (Schein et al., 27 Jan 2025)) introduces conditional diffusion guided by per-channel color and Gaussian-blur-based SNR maps, producing robust brightness enhancement and uniformity even in low-light or shadowed regions.
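The Gaussian-blur-based SNR map can be illustrated with a short numpy sketch: the blurred image approximates signal, the residual approximates noise, and their ratio flags dark or noisy regions. Kernel size and sigma below are illustrative assumptions, not UDBE's published settings:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.5):
    """Normalized 1-D Gaussian kernel."""
    x = np.arange(size) - size // 2
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(gray, size=5, sigma=1.5):
    """Separable Gaussian blur of a 2-D grayscale image (reflect padding)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(gray, pad, mode="reflect")
    # horizontal then vertical 1-D convolution
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, tmp)

def snr_map(gray, eps=1e-6):
    """SNR map: blurred signal over absolute high-frequency residual.
    High values mark smooth, well-lit regions; low values mark noisy
    or dark regions that need stronger enhancement."""
    blurred = gaussian_blur(gray)
    noise = np.abs(gray - blurred)
    return blurred / (noise + eps)
```

In a conditional diffusion setup, such a map would be concatenated with the input as extra conditioning channels so the denoiser can modulate enhancement strength spatially.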
A plausible implication is that domain-invariant physical priors and spectral attention are key for both generalization and photorealism in underwater enhancement.
5. Limitations, Failure Modes, and Analysis
UIEB, while the dominant benchmark, presents several structural and interpretation challenges:
- Reference Limitations: The “reference” for each paired image is itself an algorithmically produced enhancement, not true ground truth. In challenging underwater scenes, human raters may overlook distant veiling effects or subtle color shifts, introducing selection bias (Li et al., 2019). In UDBE’s evaluation, references are synthetically generated via brightness offsets, which may not fully match real-world lighting (Schein et al., 27 Jan 2025).
- Metric-Grounding Issues: Classical no-reference metrics (UIQM, UCIQE) are only loosely correlated with perceptual quality; over-enhancement or strong color shifts can yield high scores despite limited human acceptability (Bakht et al., 2023, Li et al., 2019).
- Content and Scene Diversity: The benchmark’s single-reference protocol does not grade multiple enhancement levels and may favor more aggressive corrections that are not universally preferable. Highly turbid or extreme lighting scenarios remain underrepresented (Li et al., 2019).
- Compute Bottlenecks: Some diffusion-based methods (UDBE, DDIM) deliver superior fidelity but at significant computational cost relative to CNN- or GAN-based models (Schein et al., 27 Jan 2025).
6. Impact and Usage in UIE Research
UIEB has become the de facto standard for training, validation, and cross-comparison of underwater image enhancement models, cited across all contemporary UIE deep learning literature (Schein et al., 27 Jan 2025, Zhang et al., 2024, Zhou et al., 2023, Walia et al., 1 Apr 2025, Bakht et al., 2023, Li et al., 2019). Prominent studies have established its utility for both supervised (paired) and unsupervised/self-supervised (unpaired) learning protocols, reinforcing robust generalization across domains (SUIM, RUIE, EUVP). The public release of the dataset and code (https://li-chongyi.github.io/proj_benchmark.html) has facilitated reproducibility and rapid benchmarking (Li et al., 2019).
The benchmark enables development and refinement of advanced methodology:
- Multi-branch feature fusion (Water-Net, FUSION).
- Multi-level attention and GAN frameworks (MuLA-GAN).
- Dynamic gradient supervision and frequency domain regularization (DGNet).
- Diffusion-based generative models with task-specific conditioning (UDBE).
- Physical-model constraints and hybrid state-space/conv-transformer architectures (Mamba-UIE).
7. Prospects and Open Challenges
UIEB authors and subsequent analyses highlight future directions:
- Dataset expansion with more extreme degradations, spatial depth maps, and video sequences.
- Inclusion of physically accurate priors based on more recent underwater formation models (e.g., Sea-Thru).
- Development of no-reference metrics that better align with perceptual image quality, jointly considering color, sharpness, naturalness, and veiling light (Li et al., 2019).
- Introduction of multi-level reference annotation (grading image enhancement levels) and more nuanced human evaluation protocols.
- Architecturally, further exploration of efficient AI-accelerated pipelines (e.g., linear transformers, dual-domain diffusion, explicit spectral modeling) is proposed to mitigate computation/accuracy tradeoffs while improving generalization.
UIEB is foundational to current and future research in underwater image enhancement, providing standardized protocols, diverse scenarios, and an evolving corpus of state-of-the-art results. Its impact spans from the objective comparison of algorithms to the shaping of new physical-model-driven and spectrum-aware enhancement methodologies (Schein et al., 27 Jan 2025, Zhang et al., 2024, Bakht et al., 2023, Walia et al., 1 Apr 2025, Zhou et al., 2023, Li et al., 2019).