FishTrack23: Underwater Tracking & Classification Dataset
- FishTrack23 is an underwater multi-object tracking and fish identification dataset featuring real-world videos with challenges like low lighting, color casts, and turbidity.
- It employs detailed annotation protocols with 2D bounding boxes and tracking IDs across diverse fish taxa, facilitating rigorous evaluation for detection, tracking, and classification methods.
- The dataset supports comprehensive metrics such as precision, recall, mAP, and MOTA, establishing a robust standard for aquatic perception in robotics and fisheries research.
FishTrack23 is an underwater multi-object tracking and fish identification dataset constructed from real-world marine video recordings. Designed to benchmark object detection, tracking, and fine-grained classification algorithms in challenging aquatic environments, FishTrack23 provides annotated video samples marked by severe visual degradation due to low lighting, significant color casts, and turbidity. The dataset is used extensively as an evaluation standard for machine learning-based aquatic perception, particularly in the context of robotics and automated fishery research applications (Silva et al., 14 Jan 2026).
1. Dataset Structure and Content
FishTrack23 is organized as an ensemble collection of underwater videos targeting multi-object tracking (MOT) in naturalistic, real-world settings. Video sources are rigorously filtered: sequences containing erroneous or duplicate annotations such as plants or fishing lures, or videos entirely lacking fish, are excluded a priori. From the curated set, annotated frames are subsampled for training and validation via a one-in-twenty sampling strategy, while all annotated frames are retained for testing, supplemented by a 5% subset of unannotated frames to facilitate continuous video reconstruction at inference time.
The dataset is enumerated as follows:
| Split | # Images | Notes |
|---|---|---|
| Training | 5,149 | 1 in 20 annotated frames |
| Validation | 1,098 | 1 in 20 annotated frames |
| Testing | 14,575 | All annotated + 5% unannotated |
Frame resolutions are non-uniform, inherited from original video sources. Annotation density differs by video, with original annotation sampling conducted at either 5 Hz or 10 Hz, yielding test footage durations between 23 and 46 minutes.
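The one-in-twenty subsampling described above can be sketched as follows; this is a minimal illustration, and the annotation rate, frame indices, and helper name are assumptions rather than the official tooling:

```python
# Hypothetical sketch of the one-in-twenty frame subsampling used for the
# training/validation splits; the 5 Hz annotation rate below is illustrative.
def subsample_annotated_frames(annotated_indices, step=20):
    """Keep every 20th annotated frame for training/validation."""
    return annotated_indices[::step]

# e.g. 200 annotated frames from a video annotated every 5th frame
train_pool = list(range(0, 1000, 5))
kept = subsample_annotated_frames(train_pool)
print(len(train_pool), len(kept))  # 200 10
```

For the test split, by contrast, all annotated frames would be retained and a 5% sample of unannotated frames interleaved to allow video reconstruction.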
2. Environmental Parameters and Visual Challenges
FishTrack23 represents a spectrum of underwater visual degradations:
- Lighting: Consistently low, with substantial red-channel attenuation yielding strong color casts.
- Turbidity: High particulate matter induces scattered light and further visual distortion.
- Variability: The dataset spans numerous environmental conditions, which are not quantified in detail (e.g., exact lux values or scattering coefficients are not reported).
This variety is intended to reflect operational challenges in marine robotics and fish monitoring. No quantitative measures of lighting or turbidity are provided in the public description (Silva et al., 14 Jan 2026).
3. Annotation Protocol and Classes
Object Detection & Tracking
Fish detection and tracking annotations consist of two components applied to every frame containing fish:
- 2D Bounding Boxes: Axis-aligned rectangles demarcate the spatial extent of each observed fish.
- Tracking IDs: Each fish receives a unique identifier, stable across subsequent frames, to enable multi-object tracking evaluations.
Annotation storage is inferred to follow the FishTrack23 official release: typically per-frame lists, in text or JSON, of (bounding box, tracking ID) tuples.
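A parser for such per-frame annotations might look like the following; the field names (`frame`, `track_id`, `bbox`) and the `(x, y, w, h)` box convention are assumptions for illustration, not the official schema:

```python
import json

# Illustrative parser for a per-frame JSON annotation layout; the exact
# field names and box convention are assumptions, not the official schema.
def load_frame_annotations(json_text):
    """Return {frame_index: [(track_id, (x, y, w, h)), ...]}."""
    per_frame = {}
    for record in json.loads(json_text):
        box = tuple(record["bbox"])  # axis-aligned (x, y, w, h)
        per_frame.setdefault(record["frame"], []).append((record["track_id"], box))
    return per_frame

sample = '[{"frame": 0, "track_id": 7, "bbox": [10, 20, 50, 30]}]'
print(load_frame_annotations(sample))  # {0: [(7, (10, 20, 50, 30))]}
```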
Species Classification
Original video annotations identify 73 distinct fish taxa, but to address class imbalance, fish are grouped into the following labels:
- Lutjanus campechanus
- Micropterus salmoides
- Pagrus pagrus
- "unspecified fish" (aggregating the remaining 70 taxa)
Each bounding box is cropped per frame to form a single-specimen image, used for standalone species classification tasks. Associated metadata includes a (class label, crop image ID) pair; no instance segmentation masks are available.
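The per-box cropping step can be sketched in a few lines; nested lists stand in for an image array (a real pipeline would use NumPy or OpenCV), and the `(x, y, w, h)` convention is an assumption:

```python
# Minimal sketch of converting a bounding box into a single-specimen crop.
# Images are modelled as nested lists (rows of pixels); the (x, y, w, h)
# box convention is an assumption for illustration.
def crop_box(image, box):
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

# 6x8 toy "image" whose pixels record their own (row, col) coordinates
frame = [[(r, c) for c in range(8)] for r in range(6)]
crop = crop_box(frame, (2, 1, 3, 2))
print(len(crop), len(crop[0]))  # 2 3
```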
4. Split Protocols and Usage Scenarios
Detection and Tracking
- Training: 5,149 images (annotated frames, subsampled 1 in 20)
- Validation: 1,098 images (same protocol as training)
- Testing: 14,575 images (full set of annotated and 5% unannotated frames; inference reconstructs the corresponding test video)
- Benchmark Algorithms: Evaluations are reported on YOLOv8m and YOLOv10s for detection, and ByteTrack for multi-object tracking.
Classification
Cropped fish images from detection bounding boxes are divided:
- Training: 80% of cropped instances from the union of training and validation splits
- Validation: the remaining 20% of the same pool
- Testing: Crops generated from all test frames
- Classifier Backbone: YOLOv11s-cls
No cross-validation methodology is reported. Detection is evaluated using static images; tracking is assessed on reconstructed videos via the TrackEval toolkit.
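The 80/20 crop split can be sketched as follows; since no cross-validation methodology is reported, the shuffling strategy and fixed seed here are assumptions:

```python
import random

# Sketch of the reported 80/20 split of cropped fish instances; the
# shuffling strategy and seed are assumptions (none are reported).
def split_crops(crop_ids, train_frac=0.8, seed=0):
    ids = list(crop_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

train, val = split_crops(range(100))
print(len(train), len(val))  # 80 20
```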
5. Evaluation Metrics
FishTrack23 employs both standard detection and MOT evaluation criteria:
Object Detection
- Precision: $\text{Precision} = \frac{TP}{TP + FP}$
- Recall: $\text{Recall} = \frac{TP}{TP + FN}$
- F1-Score: $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
- mAP@0.5: Mean Average Precision at IoU threshold 0.5
- mAP@0.5:0.95: Mean AP averaged over IoU thresholds in $\{0.5, 0.55, \ldots, 0.95\}$
Tracking
- MOTA: Multiple Object Tracking Accuracy, $\text{MOTA} = 1 - \frac{FN + FP + IDSW}{GT}$
- HOTA: Higher Order Tracking Accuracy (aggregates detection and association performance)
- IDF1: $\text{IDF1} = \frac{2 \cdot IDTP}{2 \cdot IDTP + IDFP + IDFN}$
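The standard definitions of these metrics can be checked with a small worked example; the counts below are illustrative only, not FishTrack23 results:

```python
# Worked example of standard detection and tracking metrics; the
# TP/FP/FN/IDSW counts are illustrative, not FishTrack23 results.
def precision_recall_f1(tp, fp, fn):
    p = tp / (tp + fp)          # fraction of predictions that are correct
    r = tp / (tp + fn)          # fraction of ground truth that is found
    return p, r, 2 * p * r / (p + r)

def mota(fn, fp, idsw, num_gt):
    # MOTA = 1 - (misses + false positives + identity switches) / ground truth
    return 1.0 - (fn + fp + idsw) / num_gt

p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.8 0.8 0.8
print(mota(fn=20, fp=20, idsw=5, num_gt=200))  # 0.775
```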
FishTrack23 benchmarking aligns with established COCO and multi-object tracking conventions.
6. Preprocessing and Data Handling
FishTrack23 emphasizes feature fidelity over superficial image enhancement. Frame preprocessing incorporates the following:
- Color Correction: A non-trainable white-balance operation adjusts each RGB channel's mean toward the median channel mean, standardizing illumination and color balance.
- Frame Sampling: Annotation-driven, as detailed previously.
- Task-specific Cropping: Classification utilizes bounding box crops; detection/tracking models operate on native full-frame images.
- Augmentation: No geometric (flips, rotations) or photometric (jitter, noise) augmentations are reported for FishTrack23. Enhancement is executed in-network by the AquaFeat+ pipeline, comprising color correction and hierarchical feature enhancement modules.
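The white-balance step above can be sketched as follows, under the assumption that each channel is rescaled so its mean matches the median of the three channel means; pure-Python lists stand in for an image array:

```python
import statistics

# Sketch of a non-trainable white balance: scale each channel so its mean
# matches the median of the three channel means. The exact operation used
# by FishTrack23/AquaFeat+ preprocessing is assumed, not confirmed.
def white_balance(channels):
    """channels: {'R': [...], 'G': [...], 'B': [...]} pixel intensities."""
    means = {c: statistics.mean(v) for c, v in channels.items()}
    target = statistics.median(means.values())
    return {c: [p * target / means[c] for p in v] for c, v in channels.items()}

# Toy "image" with a strong blue cast and attenuated red channel
img = {"R": [40, 60], "G": [100, 120], "B": [140, 160]}
balanced = white_balance(img)
print({c: round(statistics.mean(v)) for c, v in balanced.items()})
# {'R': 110, 'G': 110, 'B': 110}
```

This mirrors the gray-world family of corrections, which counteracts the red-channel attenuation typical of underwater footage without any learned parameters.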
A plausible implication is that improvement in perception tasks on FishTrack23 is attributable not to external data augmentation but to deep end-to-end enhancement methods within the perception model itself.
7. Significance and Applications
FishTrack23 is positioned as a rigorous testbed for underwater vision systems operating under non-ideal, real-world visual regimes. Its use in the AquaFeat+ project demonstrates its centrality for evaluating algorithms intended for deployment in robotic inspection, fisheries monitoring, and marine biodiversity assessment scenarios. The dataset’s annotation density, challenging conditions, and structured protocols support comparative benchmarking across object detection, tracking, and classification domains, corresponding to practical requirements of aquatic perception modules in both research and applied contexts (Silva et al., 14 Jan 2026).