AquaFeat+: Underwater Vision Enhancement
- AquaFeat+ is an underwater vision enhancement pipeline that improves machine-oriented visual perception in aquatic environments.
- It integrates a modular workflow involving deterministic color correction, hierarchical feature enhancement, and adaptive residual outputs tailored for downstream tasks.
- Benchmark results on the FishTrack23 dataset show improved metrics for object detection, classification, and tracking over prior methods.
AquaFeat+ is a learning-based underwater vision enhancement pipeline designed to boost the performance of automated vision tasks such as object detection, classification, and tracking in challenging aquatic environments. It operates upstream of standard computer vision architectures, specifically targeting machine-oriented feature quality rather than human perceptual fidelity. AquaFeat+ is composed of a modular sequence of color correction, hierarchical feature enhancement, and an adaptive residual output, all trained end-to-end against the final application's loss. Benchmark results on the FishTrack23 dataset demonstrate substantial improvements over prior methods and raw backbones, establishing AquaFeat+ as one of the most effective current approaches for underwater robotic visual perception (Silva et al., 14 Jan 2026).
1. System Architecture and Pipeline Integration
AquaFeat+ is architected as a plug-and-play module that preprocesses raw underwater video frames before input to downstream vision networks (e.g., YOLOv8, YOLOv11s-cls, ByteTrack). Its core workflow comprises three main modules:
- Color Correction: A deterministic white-balance adjustment that equalizes channel-wise intensities to reduce color cast in underwater imagery.
- Hierarchical Feature Enhancement: Multi-scale processing via an Underwater-Feature Enhancement Network (U-FEN), supported by a Global-Scale Attention Module (GSAM) for spatial and contextual feature integration.
- Adaptive Residual Output: This network branch produces a residual that is added to the original image, learning corrections and enhancements that complement the downstream perception losses.
The typical dataflow is:
```
Raw Frame
  ↓ Color Correction
White-balanced Image
  ↓ Three-scale U-FEN
  ↓ GSAM
  ↓ Concatenation & 3×3 Conv
  ↓ SpecialConv + tanh (Residual)
  ↓ Add to Original Image
Enhanced Frame → Vision Backbone
```
2. Module Formulations and Mathematical Operations
AquaFeat+ uses specific mathematical operations in each processing stage:
2.1 Color Correction
For an RGB input frame $I$ with channel means $\mu_c$, $c \in \{R, G, B\}$, the target intensity is the median of the channel means:

$$\bar{\mu} = \mathrm{median}(\mu_R, \mu_G, \mu_B)$$

Each channel is rescaled:

$$I'_c = I_c \cdot \frac{\bar{\mu}}{\mu_c}$$

This non-learnable transformation normalizes color bias (Silva et al., 14 Jan 2026).
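The deterministic white-balance step can be sketched in NumPy. The function name `white_balance` and the final clip to $[0,1]$ are our additions for illustration, not the paper's exact implementation:

```python
import numpy as np

def white_balance(frame: np.ndarray) -> np.ndarray:
    """Rescale each RGB channel of an (H, W, 3) frame so its mean matches
    the median of the three channel means (sketch of the deterministic
    color-correction step; values assumed normalized to [0, 1])."""
    means = frame.reshape(-1, 3).mean(axis=0)     # per-channel means mu_c
    target = np.median(means)                     # target intensity mu-bar
    gains = target / np.maximum(means, 1e-8)      # per-channel gains
    return np.clip(frame * gains, 0.0, 1.0)       # keep values in [0, 1]
```

The gains are recomputed per frame, so no parameters are learned or stored for this stage.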
2.2 Hierarchical Feature Enhancement
Three downsampled versions (1×, ¼×, 1/8×) of the white-balanced image $I'$ are processed by U-FEN (six conv layers with SpecialConv, LeakyReLU, and skip connections). At each encoding stage $\ell$:

$$F_{\ell} = F_{\ell-1} + \mathrm{LeakyReLU}\left(\mathrm{SpecialConv}_{\ell}(F_{\ell-1})\right)$$

with contrast-aware SpecialConv generating adaptive channel multipliers.
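One plausible encoding stage, conv plus LeakyReLU wrapped in a skip connection (an assumption consistent with the module description; SpecialConv's contrast-aware channel multipliers are not modeled and the conv is passed in as a callable):

```python
import numpy as np

def leaky_relu(x: np.ndarray, slope: float = 0.1) -> np.ndarray:
    """LeakyReLU activation with a small negative slope."""
    return np.where(x > 0, x, slope * x)

def ufen_stage(feat: np.ndarray, conv) -> np.ndarray:
    """One encoding stage: F_l = F_{l-1} + LeakyReLU(conv(F_{l-1})).
    `conv` stands in for the stage's SpecialConv layer."""
    return feat + leaky_relu(conv(feat))
```

Because of the skip connection, a stage whose conv outputs zero leaves its input unchanged, which keeps early training stable.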
2.2.1 Global-Scale Attention Module (GSAM)
- Global Feature-Aware (GFA): Computes softmax-based spatial attention.
- Scale-Aware Feature Aggregation (SAFA): Cross-attention blends features across scales.
The GSAM output:

$$F_{\mathrm{GSAM}} = \mathrm{SAFA}\left(\mathrm{GFA}(F_{1}),\; \mathrm{GFA}(F_{1/4}),\; \mathrm{GFA}(F_{1/8})\right)$$
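GFA's softmax-based spatial attention is not specified in detail here; one common form, offered purely as an illustrative guess at its structure, scores each spatial position, normalizes the scores with a softmax over all positions, and reweights the features:

```python
import numpy as np

def spatial_softmax_attention(feat: np.ndarray) -> np.ndarray:
    """Softmax spatial attention over an (H, W, C) feature map: per-position
    scores -> softmax over all H*W positions -> reweight. Scaled by H*W so a
    spatially uniform map passes through unchanged."""
    h, w, _ = feat.shape
    scores = feat.mean(axis=-1).ravel()             # (H*W,) saliency scores
    scores = scores - scores.max()                  # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()    # softmax over positions
    return feat * attn.reshape(h, w, 1) * (h * w)   # reweighted features
```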
2.2.2 Feature Aggregation
Concatenate the upsampled $1/8×$ stream with the GSAM output $F_{\mathrm{GSAM}}$ and pass through a $3\times 3$ conv:

$$F_{\mathrm{agg}} = \mathrm{Conv}_{3\times 3}\left(\left[\mathrm{Up}(F_{1/8}),\; F_{\mathrm{GSAM}}\right]\right)$$
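The concatenate-then-convolve step can be sketched with a 1×1 convolution written as a matrix product (a simplification of the 3×3 conv; function and argument names are illustrative, and the second operand is assumed to be the GSAM output):

```python
import numpy as np

def aggregate(up_feat: np.ndarray, gsam_feat: np.ndarray,
              weight: np.ndarray) -> np.ndarray:
    """Channel-concatenate two (H, W, C) feature maps and mix channels with
    a 1x1 conv expressed as a matmul; weight has shape (2C, C_out)."""
    x = np.concatenate([up_feat, gsam_feat], axis=-1)   # (H, W, 2C)
    return x @ weight                                    # (H, W, C_out)
```

A 3×3 kernel would additionally mix each position with its 8 neighbors; the channel-mixing structure is the same.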
2.3 Adaptive Residual Output
A SpecialConv layer followed by $\tanh$ produces the residual:

$$R = \tanh\left(\mathrm{SpecialConv}(F_{\mathrm{agg}})\right)$$

Enhanced output:

$$\hat{I} = I + R$$

The network is trained such that $\hat{I}$ yields maximum downstream task performance (Silva et al., 14 Jan 2026).
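The residual combination reduces to a bounded additive correction. A minimal sketch (the SpecialConv producing `residual_logits` is omitted, and the final clip to $[0,1]$ is our addition):

```python
import numpy as np

def apply_residual(original: np.ndarray, residual_logits: np.ndarray) -> np.ndarray:
    """Enhanced frame I_hat = I + tanh(logits): tanh bounds the learned
    correction to (-1, 1), so the enhanced frame stays close to the input."""
    return np.clip(original + np.tanh(residual_logits), 0.0, 1.0)
```

Bounding the correction means an untrained (near-zero) residual branch degrades gracefully to a pass-through of the color-corrected frame.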
3. End-to-End Supervision and Loss Formulation
When paired with detection, classification, or tracking modules, AquaFeat+ is optimized end-to-end against a weighted sum of the task losses:

$$\mathcal{L} = \lambda_{\mathrm{det}}\,\mathcal{L}_{\mathrm{det}} + \lambda_{\mathrm{cls}}\,\mathcal{L}_{\mathrm{cls}} + \lambda_{\mathrm{trk}}\,\mathcal{L}_{\mathrm{trk}}$$

For single-task setups, only the relevant $\lambda$ is nonzero. This direct supervision targets features salient to downstream perception rather than merely human-pleasing restoration (Silva et al., 14 Jan 2026).
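Since the objective is a weighted sum of active task losses, the single-task setup falls out of the weighting directly. A sketch with hypothetical task names:

```python
def total_loss(task_losses: dict, weights: dict) -> float:
    """Weighted sum over downstream task losses; tasks with a zero (or
    absent) weight contribute nothing, recovering the single-task setup."""
    return sum(weights.get(task, 0.0) * loss
               for task, loss in task_losses.items())
```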
4. Empirical Performance and Benchmark Results
AquaFeat+ achieves consistent improvements across standard underwater vision tasks, as summarized in the tables below (FishTrack23 dataset).
Table 1. Object Detection (YOLOv8m backbone, test split):
| Method | Precision | Recall | F1-Score | mAP50 | mAP50-95 |
|---|---|---|---|---|---|
| YOLOv8m | 0.792 | 0.582 | 0.677 | 0.528 | 0.319 |
| FeatEnHancer (YOLOv8m) | 0.753 | 0.582 | 0.657 | 0.515 | 0.293 |
| AquaFeat (YOLOv8m) | 0.746 | 0.624 | 0.680 | 0.554 | 0.332 |
| AquaFeat+ (YOLOv8m) | 0.767 | 0.624 | 0.688 | 0.556 | 0.332 |
Table 2. Classification (YOLOv11s-cls on cropped boxes):
| Method | Precision | Recall | Accuracy | F1-Score |
|---|---|---|---|---|
| YOLOv11s-cls | 0.723 | 0.764 | 0.764 | 0.737 |
| FeatEnHancer | 0.746 | 0.779 | 0.779 | 0.752 |
| ConvNeXt | 0.716 | 0.619 | 0.862 | 0.646 |
| AquaFeat | 0.798 | 0.765 | 0.765 | 0.766 |
| AquaFeat+ | 0.816 | 0.791 | 0.791 | 0.791 |
Table 3. Tracking (ByteTrack, FishTrack23 test):
| Method | HOTA | MOTA | DetA | AssA | IDF1 |
|---|---|---|---|---|---|
| YOLOv8m | 52.75 | 53.78 | 51.42 | 54.41 | 65.10 |
| FeatEnHancer (8m) | 47.48 | 37.23 | 41.15 | 54.97 | 59.42 |
| AquaFeat (8m) | 54.72 | 55.76 | 50.93 | 59.14 | 68.41 |
| AquaFeat+ (8m) | 54.20 | 54.97 | 50.11 | 58.90 | 67.63 |
| AquaFeat+ (10s) | 55.21 | 55.01 | 50.90 | 60.19 | 68.09 |
AquaFeat+ consistently leads in F1-Score, classification F1, and HOTA, with competitive IDF1 values (Silva et al., 14 Jan 2026).
5. Comparative and Ablation Studies
Comparative analysis shows that AquaFeat+ surpasses both baseline backbones and prior enhancement modules, including the original AquaFeat and FeatEnHancer. Module ablations highlighted the critical role of:
- Color Correction: Its removal led to a recall reduction of approximately 4%.
- GSAM (Global-Scale Attention): Its omission resulted in a mAP50 drop of approximately 0.02 and HOTA decrease of ~1.5.
- Adaptive Residual Output: Outputting the final SpecialConv branch directly, without adding it to the original image, degraded detection F1 by ~0.01.
Each component thus provides a measurable performance benefit (Silva et al., 14 Jan 2026).
6. Limitations, Strengths, and Future Directions
Strengths
- Task-Oriented Learning: The enhancement adaptively focuses on features that benefit automated perception tasks, not human visual preference.
- Modularity: Compatible with a wide range of modern backbones and tracking heads.
- Computational Efficiency: Only a few lightweight convolutional layers are added, supporting deployment in real-time or near real-time systems.
Limitations
- The fixed color correction may be insufficient in extreme color-cast scenes.
- Evaluation focused on fish-tracking; generalization to diverse underwater objects remains to be established.
Future Work
The development roadmap includes integrating a learnable color-correction layer, extending AquaFeat+ to depth estimation and semantic segmentation, and expanding evaluation to new, varied datasets (Silva et al., 14 Jan 2026).
AquaFeat+ represents a carefully integrated solution for machine-oriented enhancement in underwater vision pipelines. Its empirical gains across detection, classification, and tracking tasks result from aligning feature enhancement to the needs of downstream perception modules rather than low-level reconstructions, enabling more robust autonomous robotic operation in aquatic environments (Silva et al., 14 Jan 2026).