- The paper introduces AquaDiff, a diffusion-based framework leveraging cross-attention with physics-inspired chromatic priors for underwater image enhancement.
- It employs a U-Net variant enriched with residual dense blocks and multi-resolution attention to effectively suppress artifacts and recover fine structural details.
- Experimental evaluations show superior UCIQE scores and competitive PSNR/SSIM metrics, demonstrating robust mitigation of wavelength-dependent color distortions.
Diffusion-Based Underwater Image Enhancement with AquaDiff
Introduction
AquaDiff introduces a conditional diffusion-based approach for underwater image enhancement, specifically targeting the mitigation of wavelength-dependent color distortion while maintaining perceptual and structural fidelity. Underwater imagery is uniquely challenging due to complex physical phenomena such as selective light attenuation and multi-path scattering, which significantly impair the performance of computer vision systems. Traditional methods, both model-free and physically inspired, and recent data-driven CNN/GAN-based models each have notable limitations regarding generalization, color fidelity, and artifact suppression under severe and diverse degradation. Diffusion models, leveraging strong generative priors and iterative denoising, offer a compelling foundation for robust enhancement but have yet to be fully adapted to underwater-specific degradations.
AquaDiff Framework
AquaDiff employs a DDPM-inspired architecture, integrating mechanisms tailored to underwater scenarios. The overall framework is depicted in Figure 1.
Figure 1: Overview of the AquaDiff framework, illustrating the interplay between forward diffusion, chromatic prior conditioning, and reverse denoising via cross-attention.
The forward diffusion adds Gaussian noise to clean reference images over T steps, culminating in highly noisy latent representations. The reverse process leverages a conditional diffusion model, where at each denoising step, the model receives the current noisy image, a chromatic prior-guided conditioning image, and the timestep index. The core denoising backbone is a U-Net variant with three residual dense blocks, rich skip connectivity, and multi-resolution spatial attention, facilitating hierarchical feature extraction and global-local context merging.
Critically, conditioning is performed via cross-attention rather than direct concatenation: features of the chromatic prior are dynamically fused with the evolving noisy state. This cross-attention enables timestep-dependent, spatially selective conditioning, leveraging the color-compensated prior for detail and context recovery at all noise levels.
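As a sketch of this idea, a cross-attention conditioning layer can be written as follows. The module name, head count, and residual fusion are illustrative assumptions, not the paper's exact design: queries come from the denoiser's current feature map, while keys and values come from features of the chromatic prior.

```python
import torch
import torch.nn as nn

class CrossAttentionCondition(nn.Module):
    """Fuse chromatic-prior features into the denoiser's feature map.

    Queries come from the current (noisy) feature map; keys and values
    come from features of the color-compensated prior. Illustrative
    sketch, not the paper's exact module.
    """

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # x, prior: (B, C, H, W) feature maps at the same resolution
        b, c, h, w = x.shape
        q = self.norm_q(x.flatten(2).transpose(1, 2))        # (B, HW, C)
        kv = self.norm_kv(prior.flatten(2).transpose(1, 2))  # (B, HW, C)
        out, _ = self.attn(q, kv, kv)                        # attend over prior
        return x + out.transpose(1, 2).view(b, c, h, w)      # residual fusion
```

Applying such a layer at multiple feature resolutions is one way to realize the multi-resolution spatial attention described above.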
Chromatic Prior-Guided Conditioning
The chromatic prior is generated using a physics-inspired three-channel color compensation (3C) method, operating in Lab color space to suppress color casts by reconstructing attenuated chromatic channels via spatial masking and Gaussian smoothing. The mask is adaptively generated to avoid overcompensation near highlights and is merged back into the input for cross-attention-based guidance. This preprocessing crucially embeds wavelength attenuation statistics into the model's conditioning signal, aligning the restoration process with the physical characteristics of underwater image formation.
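A minimal sketch of such a Lab-space compensation step is shown below, assuming a gray-world-style correction of the chromatic channels, a smoothed highlight mask, and Gaussian smoothing. The exact 3C compensation rule is not reproduced here, so the formula in this sketch is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import color

def chromatic_prior(rgb: np.ndarray, sigma: float = 15.0,
                    highlight_thresh: float = 0.9) -> np.ndarray:
    """Illustrative 3C-style chromatic compensation in Lab space.

    Sketch of the idea only: push attenuated chromatic channels toward a
    neutral (zero-mean) cast, masked away from highlights to avoid
    overcompensation. The exact rule used by the paper may differ.
    """
    lab = color.rgb2lab(rgb)  # L in [0, 100]; a, b roughly in [-128, 127]
    L = lab[..., 0]

    # Mask out highlights so bright regions are not over-compensated,
    # then soften the mask with Gaussian smoothing.
    mask = (L / 100.0 < highlight_thresh).astype(np.float64)
    mask = gaussian_filter(mask, sigma)

    # Gray-world-style correction: subtract the smoothed channel mean,
    # weighted by the highlight-aware mask.
    for ch in (1, 2):
        smoothed = gaussian_filter(lab[..., ch], sigma)
        lab[..., ch] = lab[..., ch] - mask * smoothed.mean()

    return np.clip(color.lab2rgb(lab), 0.0, 1.0)
```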
Diffusion and Denoising Network
The forward process introduces progressive, schedule-controlled noise. Sampling at arbitrary timesteps is performed analytically using closed-form marginalization. The reverse process iteratively reconstructs clean images by predicting and removing additive noise, proceeding from the latent Gaussian through cross-attention fusion with the color-compensated prior.
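The closed-form marginal q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I) can be sampled directly at any timestep. The linear beta schedule below uses common DDPM defaults, which are assumptions rather than the paper's reported values:

```python
import torch

def make_schedule(T: int = 2000, beta_1: float = 1e-4, beta_T: float = 0.02):
    """Linear beta schedule; returns the cumulative product alpha_bar_t.
    Defaults are common DDPM choices, not necessarily the paper's."""
    betas = torch.linspace(beta_1, beta_T, T)
    return torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor,
             noise=None) -> torch.Tensor:
    """Closed-form forward sampling:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    if noise is None:
        noise = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)  # broadcast per-sample timestep
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise
```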
Key architectural elements include:
- Residual Dense Blocks: Enhanced feature propagation and improved gradient flow, facilitating recovery of decayed structural semantics.
- Dense Skip Connections: Inspired by U-Net++, enabling multi-level information sharing and preventing bottleneck artifacts.
- Multi-Resolution Attention: Explicit attention at 16×16 and 32×32 feature maps augments the network's capacity to reconcile large-scale color drift with local texture attenuation.
The model is conditioned on both the noisy latent and the color-compensated prior at every denoising step, refining the enhancement with spatially and temporally adaptive attention.
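The per-step conditioning can be sketched as standard ancestral DDPM sampling with the prior passed into the denoiser at every step. The `model(x_t, prior, t)` interface (predicting the added noise) and the fixed variance choice are hypothetical, shown only to illustrate the conditioning loop:

```python
import torch

@torch.no_grad()
def ddpm_reverse(model, prior, shape, betas):
    """Conditional ancestral sampling sketch. `model` is assumed to take
    (x_t, prior, t) and predict the added noise; this interface is an
    assumption, not the paper's exact API."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)  # start from the latent Gaussian
    for t in reversed(range(len(betas))):
        ts = torch.full((shape[0],), t, dtype=torch.long)
        eps = model(x, prior, ts)  # prior conditions every denoising step
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        coef = (1.0 - alphas[t]) / (1.0 - alpha_bar[t]).sqrt()
        mean = (x - coef * eps) / alphas[t].sqrt()
        # Add noise (variance beta_t) except at the final step
        x = mean if t == 0 else mean + betas[t].sqrt() * torch.randn(shape)
    return x
```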
Cross-Domain Consistency Loss
AquaDiff introduces a cross-domain consistency loss (CDCL) that enforces fidelity in pixel, perceptual, structural, and frequency domains. The loss comprises:
- ℓ1 Pixel and Multi-Scale Losses: Enforce accurate local- and global-level reconstruction.
- VGG-19 Deep Perceptual Loss: Encourages restoration of high-level semantic consistency.
- SSIM Loss: Preserves luminance, contrast, and structural similarity, mitigating geometric artifacts.
- Frequency-Domain Loss: Enforces recovery of high-frequency components (e.g., edges, textures) typically attenuated under scattering.
The hybrid CDC loss constrains the generative process, suppressing diffusion artifacts and over-smoothing while incentivizing realistic color and detail recovery.
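A simplified sketch of the pixel- and frequency-domain terms is given below. The VGG-19 perceptual and SSIM terms are omitted to keep the example self-contained, and the weights are assumptions, not the paper's values:

```python
import torch
import torch.nn.functional as F

def cdc_loss(pred: torch.Tensor, target: torch.Tensor,
             w_pix: float = 1.0, w_freq: float = 0.1) -> torch.Tensor:
    """Simplified cross-domain consistency loss: pixel + frequency terms.

    The full CDCL also includes VGG-19 perceptual and SSIM terms;
    weights here are illustrative placeholders.
    """
    # Pixel domain: l1 reconstruction
    l_pix = F.l1_loss(pred, target)
    # Frequency domain: l1 on FFT magnitudes, emphasizing recovery of
    # high-frequency components (edges, textures) attenuated by scattering
    l_freq = F.l1_loss(torch.fft.rfft2(pred).abs(),
                       torch.fft.rfft2(target).abs())
    return w_pix * l_pix + w_freq * l_freq
```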
Experimental Evaluation
Datasets and Implementation
Training utilizes the LSUI (5,004 pairs) and UIEB (800 pairs) datasets, each comprising highly varied underwater imagery. Testing spans TEST-U90 (90 images), U45, S16, and C60 datasets, ensuring generalization across unseen environments and degradations. The model is implemented in PyTorch, uses 2000 diffusion steps, and is trained using Adam for 1 million iterations.
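A single training step under the standard DDPM noise-prediction objective might look like the following sketch; in the paper this would be combined with the cross-domain consistency loss, and the interface and hyperparameters here are placeholders:

```python
import torch

def train_step(model, optimizer, x0, prior, alpha_bar):
    """One denoising-score-matching step (standard DDPM objective).
    `model(x_t, prior, t)` predicting the added noise is an assumed
    interface; the paper additionally applies its CDC loss."""
    t = torch.randint(0, len(alpha_bar), (x0.shape[0],))
    noise = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise  # closed-form forward
    pred = model(x_t, prior, t)
    loss = torch.nn.functional.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```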
Quantitative Results
Evaluation considers both full-reference (PSNR, SSIM) and no-reference (UIQM, UCIQE) metrics. Comparative analysis involves state-of-the-art traditional, CNN-, GAN-, and diffusion-based methods.
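For reference, UCIQE (Yang and Sowmya, 2015) is a fixed linear combination of chroma standard deviation, luminance contrast, and mean saturation in CIELab. The sketch below is an independent re-implementation; normalization conventions vary between implementations, so absolute values may not match published tables:

```python
import numpy as np
from skimage import color

def uciqe(rgb: np.ndarray) -> float:
    """UCIQE sketch: 0.4680 * sigma_c + 0.2745 * con_l + 0.2576 * mu_s.

    sigma_c: std of chroma; con_l: gap between top/bottom 1% luminance;
    mu_s: mean saturation. Channel normalization here is one common
    convention, not necessarily the paper's.
    """
    lab = color.rgb2lab(rgb)
    L = lab[..., 0] / 100.0   # normalize luminance to [0, 1]
    a = lab[..., 1] / 128.0
    b = lab[..., 2] / 128.0
    chroma = np.hypot(a, b)
    sigma_c = chroma.std()
    con_l = np.percentile(L, 99) - np.percentile(L, 1)
    # Saturation per pixel, guarded against division by zero
    mu_s = np.mean(chroma / np.maximum(np.hypot(chroma, L), 1e-8))
    return float(0.4680 * sigma_c + 0.2745 * con_l + 0.2576 * mu_s)
```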
(Figure 2)
Figure 2: Quantitative results for UIQM and UCIQE across major benchmarks, showing top ranking for AquaDiff in chromatic fidelity and competitive restoration quality.
AquaDiff's improvements are particularly pronounced in UCIQE, reflecting its impact on color balance and perceived naturalness—critical for underwater visual tasks.
Qualitative Results
Qualitative analysis on U90, U45, S16, and C60 datasets reveals that AquaDiff:
- Consistently restores color balance in blue- and green-dominated scenes.
- Effectively removes haze and veiling light, and recovers structural details even in extreme turbidity.
- Suppresses artifacts common in GAN- and CNN-based outputs (halos, banding, over-enhancement).
(Figure 3)
Figure 3: Visual comparison on U90, demonstrating superior haze removal, artifact suppression, and color recovery by AquaDiff.
Additional results on U45 and S16 corroborate robust performance in scenarios with artificial lighting, strong scattering, and depth-induced chromatic distortion.
Ablation Studies
Systematic ablation reveals:
- Removing the cross-domain consistency loss results in significant degradation in both UIQM and UCIQE.
- Excluding enhanced U-Net blocks or multi-resolution attention consistently lowers structural and color restoration accuracy.
| Model Variant | UIQM | UCIQE |
| --- | --- | --- |
| Baseline Diffusion | 4.12 | 0.486 |
| + CDCL Only | 4.38 | 0.521 |
| + Enhanced U-Net Only | 4.45 | 0.528 |
| AquaDiff (Full) | 4.61 | 0.539 |

Table 1: Enhancement contributions of AquaDiff components.
Implications and Future Directions
AquaDiff’s strong performance on challenging, real-world datasets demonstrates the efficacy of integrating physical priors, cross-attention conditioning, and hybrid loss design into the generative diffusion paradigm for underwater image enhancement. The method establishes the value of explicit architectural and loss-driven biases against hydro-optical distortions and suggests broad potential for furthering physically-informed diffusion models in other low-level vision domains (e.g., dehazing, deblurring).
Practical implications are clear for real-time underwater robotics, object detection, SLAM, and 3D mapping—where enhanced input fidelity directly affects downstream algorithm robustness and reliability. The design of AquaDiff can inform future work in multi-modal diffusion conditioning, color/frequency domain regularization, and architecture adaptation for deployment efficiency (e.g., fast sampling, reduced resolutions).
Potential extensions include joint enhancement-task adaptation (e.g., simultaneous image enhancement and detection), self-supervised adaptation to unseen underwater environments, and multi-sensor fusion for domain transfer.
Conclusion
AquaDiff presents a technically rigorous, physically guided, and empirically validated framework for underwater image enhancement, demonstrating leading color fidelity and competitive overall quality. By leveraging cross-attention conditioning via chromatic priors, residual-attention U-Net backbones, and cross-domain consistency losses, it advances the application of diffusion models to complex, real-world vision enhancement tasks. The results further solidify the role of diffusion-based architectures in mission-critical underwater visual applications and provide a foundation for future research in this direction.