AquaFeat+: Underwater Vision Enhancement
- AquaFeat+ is an underwater vision enhancement pipeline that improves machine-oriented visual perception in aquatic environments.
- It integrates a modular workflow involving deterministic color correction, hierarchical feature enhancement, and adaptive residual outputs tailored for downstream tasks.
- Benchmark results on the FishTrack23 dataset show improved metrics for object detection, classification, and tracking over prior methods.
AquaFeat+ is a learning-based underwater vision enhancement pipeline designed to boost the performance of automated vision tasks such as object detection, classification, and tracking in challenging aquatic environments. It operates upstream of standard computer vision architectures, specifically targeting machine-oriented feature quality rather than human perceptual fidelity. AquaFeat+ is composed of a modular sequence of color correction, hierarchical feature enhancement, and an adaptive residual output, all trained end-to-end against the final application's loss. Benchmark results on the FishTrack23 dataset demonstrate substantial improvements over prior methods and raw backbones, establishing AquaFeat+ as one of the most effective current approaches for underwater robotic visual perception (Silva et al., 14 Jan 2026).
1. System Architecture and Pipeline Integration
AquaFeat+ is architected as a plug-and-play module that preprocesses raw underwater video frames before input to downstream vision networks (e.g., YOLOv8, YOLOv11s-cls, ByteTrack). Its core workflow comprises three main modules:
- Color Correction: A deterministic white-balance adjustment that equalizes channel-wise intensities to reduce color cast in underwater imagery.
- Hierarchical Feature Enhancement: Multi-scale processing via an Underwater-Feature Enhancement Network (U-FEN), supported by a Global-Scale Attention Module (GSAM) for spatial and contextual feature integration.
- Adaptive Residual Output: This network branch produces a residual that is added to the original image, learning corrections and enhancements that complement the downstream perception losses.
The typical dataflow is:
```
Raw Frame
  ↓ Color Correction
White-balanced Image
  ↓ Three-scale U-FEN
  ↓ GSAM
  ↓ Concatenation & 3×3 Conv
  ↓ SpecialConv + tanh (Residual)
  ↓ Add to Original Image
Enhanced Frame → Vision Backbone
```
2. Module Formulations and Mathematical Operations
AquaFeat+ uses specific mathematical operations in each processing stage:
2.1 Color Correction
For an RGB input frame $I$ with channel means $\mu_c$, $c \in \{R, G, B\}$, the target intensity is the median of the channel means:

$$\bar{\mu} = \mathrm{median}(\mu_R, \mu_G, \mu_B)$$

Each channel is rescaled:

$$I'_c = I_c \cdot \frac{\bar{\mu}}{\mu_c}$$

This non-learnable transformation normalizes color bias (Silva et al., 14 Jan 2026).
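The deterministic white-balance step can be sketched in NumPy. The function name `white_balance` and the final clip to $[0,1]$ are our additions for illustration, not the paper's exact implementation:

```python
import numpy as np

def white_balance(frame: np.ndarray) -> np.ndarray:
    """Rescale each RGB channel of an (H, W, 3) frame so its mean matches
    the median of the three channel means (sketch of the deterministic
    color-correction step; values assumed normalized to [0, 1])."""
    means = frame.reshape(-1, 3).mean(axis=0)     # per-channel means mu_c
    target = np.median(means)                     # target intensity mu-bar
    gains = target / np.maximum(means, 1e-8)      # per-channel gains
    return np.clip(frame * gains, 0.0, 1.0)       # keep values in [0, 1]
```

The gains are recomputed per frame, so no parameters are learned or stored for this stage.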
2.2 Hierarchical Feature Enhancement
Three downsampled versions (1×, ¼×, 1/8×) of the white-balanced image $I'$ are processed by U-FEN (six conv layers with SpecialConv, LeakyReLU, and skip connections). At each encoding stage $\ell$:

$$F_{\ell} = F_{\ell-1} + \mathrm{LeakyReLU}\left(\mathrm{SpecialConv}_{\ell}(F_{\ell-1})\right)$$

with contrast-aware SpecialConv generating adaptive channel multipliers.
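One plausible encoding stage, conv plus LeakyReLU wrapped in a skip connection (an assumption consistent with the module description; SpecialConv's contrast-aware channel multipliers are not modeled and the conv is passed in as a callable):

```python
import numpy as np

def leaky_relu(x: np.ndarray, slope: float = 0.1) -> np.ndarray:
    """LeakyReLU activation with a small negative slope."""
    return np.where(x > 0, x, slope * x)

def ufen_stage(feat: np.ndarray, conv) -> np.ndarray:
    """One encoding stage: F_l = F_{l-1} + LeakyReLU(conv(F_{l-1})).
    `conv` stands in for the stage's SpecialConv layer."""
    return feat + leaky_relu(conv(feat))
```

Because of the skip connection, a stage whose conv outputs zero leaves its input unchanged, which keeps early training stable.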
2.2.1 Global-Scale Attention Module (GSAM)
- Global Feature-Aware (GFA): Computes softmax-based spatial attention.
- Scale-Aware Feature Aggregation (SAFA): Cross-attention blends features across scales.
The GSAM output:

$$F_{\mathrm{GSAM}} = \mathrm{SAFA}\left(\mathrm{GFA}(F_{1}),\; \mathrm{GFA}(F_{1/4}),\; \mathrm{GFA}(F_{1/8})\right)$$
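GFA's softmax-based spatial attention is not specified in detail here; one common form, offered purely as an illustrative guess at its structure, scores each spatial position, normalizes the scores with a softmax over all positions, and reweights the features:

```python
import numpy as np

def spatial_softmax_attention(feat: np.ndarray) -> np.ndarray:
    """Softmax spatial attention over an (H, W, C) feature map: per-position
    scores -> softmax over all H*W positions -> reweight. Scaled by H*W so a
    spatially uniform map passes through unchanged."""
    h, w, _ = feat.shape
    scores = feat.mean(axis=-1).ravel()             # (H*W,) saliency scores
    scores = scores - scores.max()                  # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()    # softmax over positions
    return feat * attn.reshape(h, w, 1) * (h * w)   # reweighted features
```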
2.2.2 Feature Aggregation
Concatenate the upsampled $1/8×$ stream with the GSAM output $F_{\mathrm{GSAM}}$ and pass through a $3\times 3$ conv:

$$F_{\mathrm{agg}} = \mathrm{Conv}_{3\times 3}\left(\left[\mathrm{Up}(F_{1/8}),\; F_{\mathrm{GSAM}}\right]\right)$$
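The concatenate-then-convolve step can be sketched with a 1×1 convolution written as a matrix product (a simplification of the 3×3 conv; function and argument names are illustrative, and the second operand is assumed to be the GSAM output):

```python
import numpy as np

def aggregate(up_feat: np.ndarray, gsam_feat: np.ndarray,
              weight: np.ndarray) -> np.ndarray:
    """Channel-concatenate two (H, W, C) feature maps and mix channels with
    a 1x1 conv expressed as a matmul; weight has shape (2C, C_out)."""
    x = np.concatenate([up_feat, gsam_feat], axis=-1)   # (H, W, 2C)
    return x @ weight                                    # (H, W, C_out)
```

A 3×3 kernel would additionally mix each position with its 8 neighbors; the channel-mixing structure is the same.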
2.3 Adaptive Residual Output
A SpecialConv layer followed by $\tanh$ produces the residual:

$$R = \tanh\left(\mathrm{SpecialConv}(F_{\mathrm{agg}})\right)$$

Enhanced output:

$$\hat{I} = I + R$$

The network is trained such that $\hat{I}$ yields maximum downstream task performance (Silva et al., 14 Jan 2026).
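The residual combination reduces to a bounded additive correction. A minimal sketch (the SpecialConv producing `residual_logits` is omitted, and the final clip to $[0,1]$ is our addition):

```python
import numpy as np

def apply_residual(original: np.ndarray, residual_logits: np.ndarray) -> np.ndarray:
    """Enhanced frame I_hat = I + tanh(logits): tanh bounds the learned
    correction to (-1, 1), so the enhanced frame stays close to the input."""
    return np.clip(original + np.tanh(residual_logits), 0.0, 1.0)
```

Bounding the correction means an untrained (near-zero) residual branch degrades gracefully to a pass-through of the color-corrected frame.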
3. End-to-End Supervision and Loss Formulation
When paired with detection, classification, or tracking modules, AquaFeat+ is optimized end-to-end against a weighted sum of the task losses:

$$\mathcal{L} = \lambda_{\mathrm{det}}\,\mathcal{L}_{\mathrm{det}} + \lambda_{\mathrm{cls}}\,\mathcal{L}_{\mathrm{cls}} + \lambda_{\mathrm{trk}}\,\mathcal{L}_{\mathrm{trk}}$$

For single-task setups, only the relevant $\lambda$ is nonzero. This direct supervision targets features salient to downstream perception rather than merely human-pleasing restoration (Silva et al., 14 Jan 2026).
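Since the objective is a weighted sum of active task losses, the single-task setup falls out of the weighting directly. A sketch with hypothetical task names:

```python
def total_loss(task_losses: dict, weights: dict) -> float:
    """Weighted sum over downstream task losses; tasks with a zero (or
    absent) weight contribute nothing, recovering the single-task setup."""
    return sum(weights.get(task, 0.0) * loss
               for task, loss in task_losses.items())
```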
4. Empirical Performance and Benchmark Results
AquaFeat+ achieves consistent improvements across standard underwater vision tasks, as summarized in the tables below (FishTrack23 dataset).
Table 1. Object Detection (YOLOv8m backbone, test split):
| Method | Precision | Recall | F1-Score | mAP50 | mAP50-95 |
|---|---|---|---|---|---|
| YOLOv8m | 0.792 | 0.582 | 0.677 | 0.528 | 0.319 |
| FeatEnHancer (YOLOv8m) | 0.753 | 0.582 | 0.657 | 0.515 | 0.293 |
| AquaFeat (YOLOv8m) | 0.746 | 0.624 | 0.680 | 0.554 | 0.332 |
| AquaFeat+ (YOLOv8m) | 0.767 | 0.624 | 0.688 | 0.556 | 0.332 |
Table 2. Classification (YOLOv11s-cls on cropped boxes):
| Method | Precision | Recall | Accuracy | F1-Score |
|---|---|---|---|---|
| YOLOv11s-cls | 0.723 | 0.764 | 0.764 | 0.737 |
| FeatEnHancer | 0.746 | 0.779 | 0.779 | 0.752 |
| ConvNeXt | 0.716 | 0.619 | 0.862 | 0.646 |
| AquaFeat | 0.798 | 0.765 | 0.765 | 0.766 |
| AquaFeat+ | 0.816 | 0.791 | 0.791 | 0.791 |
Table 3. Tracking (ByteTrack, FishTrack23 test):
| Method | HOTA | MOTA | DetA | AssA | IDF1 |
|---|---|---|---|---|---|
| YOLOv8m | 52.75 | 53.78 | 51.42 | 54.41 | 65.10 |
| FeatEnHancer (8m) | 47.48 | 37.23 | 41.15 | 54.97 | 59.42 |
| AquaFeat (8m) | 54.72 | 55.76 | 50.93 | 59.14 | 68.41 |
| AquaFeat+ (8m) | 54.20 | 54.97 | 50.11 | 58.90 | 67.63 |
| AquaFeat+ (10s) | 55.21 | 55.01 | 50.90 | 60.19 | 68.09 |
AquaFeat+ consistently leads in F1-Score, classification F1, and HOTA, with competitive IDF1 values (Silva et al., 14 Jan 2026).
5. Comparative and Ablation Studies
Comparative analysis shows that AquaFeat+ surpasses both baseline backbones and prior enhancement modules, including the original AquaFeat and FeatEnHancer. Module ablations highlighted the critical role of:
- Color Correction: Its removal led to a recall reduction of approximately 4%.
- GSAM (Global-Scale Attention): Its omission resulted in a mAP50 drop of approximately 0.02 and HOTA decrease of ~1.5.
- Adaptive Residual Output: Outputting the final SpecialConv branch directly, without adding it to the original image, degraded detection F1 by ~0.01.
Each component thus provides a measurable performance benefit (Silva et al., 14 Jan 2026).
6. Limitations, Strengths, and Future Directions
Strengths
- Task-Oriented Learning: The enhancement adaptively focuses on features that benefit automated perception tasks, not human visual preference.
- Modularity: Compatible with a wide range of modern backbones and tracking heads.
- Computational Efficiency: Only a few lightweight convolutional layers are added, supporting deployment in real-time or near real-time systems.
Limitations
- The fixed color correction may be insufficient in extreme color-cast scenes.
- Evaluation focused on fish-tracking; generalization to diverse underwater objects remains to be established.
Future Work
The development roadmap includes integrating a learnable color-correction layer, extending AquaFeat+ to depth estimation and semantic segmentation, and expanding evaluation to new, varied datasets (Silva et al., 14 Jan 2026).
AquaFeat+ represents a carefully integrated solution for machine-oriented enhancement in underwater vision pipelines. Its empirical gains across detection, classification, and tracking tasks result from aligning feature enhancement to the needs of downstream perception modules rather than low-level reconstructions, enabling more robust autonomous robotic operation in aquatic environments (Silva et al., 14 Jan 2026).