- The paper introduces a Gradient-Guided Learning Network (GGL-Net) that uses gradient magnitude images to improve edge detection in infrared small target detection.
- The dual-branch feature extraction and Two-Way Guided Fusion Module combine spatial and channel attention to effectively integrate multi-scale details.
- Experimental results on NUAA-SIRST and NUDT-SIRST datasets demonstrate significant IoU gains and reduced false alarm rates compared to existing state-of-the-art methods.
Gradient-Guided Learning Network for Infrared Small Target Detection
Introduction
Infrared small target detection is a challenging task due to the inherent lack of distinguishable features, low signal-to-noise ratio, complicated background clutter, and limited labeled data. Existing deep and non-deep learning methods suffer from inaccurate localization, edge ambiguity, and susceptibility to background interference. The proposed Gradient-Guided Learning Network (GGL-Net) directly addresses these deficiencies by introducing a unique gradient magnitude image stream and designing a novel architecture that emphasizes edge fidelity and robust multi-scale fusion.
Methodology
Gradient-Guided Learning Paradigm
GGL-Net pioneers the explicit use of gradient magnitude images in deep learning-based infrared small target detection. This approach enhances the perceptibility of edge structures, thereby mitigating the inaccurate boundary localization that is prevalent in CNN-based methods oriented toward texture feature extraction.
The network architecture employs a dual-branch design: the main branch processes the original infrared image, while a supplementary branch processes its gradient magnitude image. The Gradient Supplementary Module (GSM) encodes raw gradient information into progressively deeper network layers, using a combination of G_Blocks and residual connections. The main branch comprises five stages, each consisting of deep convolutional stacks and SE attention modules that recalibrate channel dependencies, with the SE reduction ratio set to 4.
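The gradient magnitude image that feeds the supplementary branch can be sketched as follows; this minimal numpy version assumes a standard 3×3 Sobel operator, which may differ in detail from the paper's exact choice of gradient filter:

```python
import numpy as np

def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    """Compute a gradient magnitude image from a single-channel input.
    Uses 3x3 Sobel filters (an assumption; the paper's exact operator
    is not specified here)."""
    img = img.astype(np.float32)
    p = np.pad(img, 1, mode="edge")
    # Horizontal Sobel response: right column minus left column.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical Sobel response: bottom row minus top row.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.sqrt(gx ** 2 + gy ** 2)
```

On a flat background the response is zero, while target boundaries produce strong responses, which is what makes this stream a useful edge prior for the network.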
Two-Way Guided Fusion Module (TGFM)
TGFM integrates multi-scale features more effectively using spatial and channel attention mechanisms. Low-level features provide spatial guidance and detail information to high-level features via SAM, while high-level features supply semantic enrichment and channel guidance to low-level features via CAM. This bi-directional fusion enables comprehensive feature integration, improving both semantic coherence and detail preservation.
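The two-way guidance above can be illustrated with a deliberately minimal numpy sketch. The function name, the use of a sigmoid gate, and the mean-based attention maps are all assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def tgfm_fuse(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Sketch of two-way guided fusion for (C, H, W) feature maps
    at the same resolution (hypothetical simplification of TGFM)."""
    # SAM direction: low-level features produce a spatial attention
    # map that re-weights the high-level features pixel-wise.
    spatial = sigmoid(low.mean(axis=0, keepdims=True))        # (1, H, W)
    high_guided = high * spatial
    # CAM direction: high-level features produce channel weights
    # that re-weight the low-level features channel-wise.
    channel = sigmoid(high.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    low_guided = low * channel
    # Both guided streams are combined into the fused output.
    return low_guided + high_guided
```

The key design choice is that each level gates the other along the axis where it is most informative: low-level features carry spatial detail, high-level features carry channel-wise semantics.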
Loss Function
SoftIoU loss is used to address the severe class imbalance between background and target pixels, optimizing segmentation accuracy for small target regions.
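A common formulation of the SoftIoU loss operates directly on predicted probabilities, so small targets contribute in proportion to their overlap rather than their pixel count; the smoothing constant below is an assumption:

```python
import numpy as np

def soft_iou_loss(pred: np.ndarray, target: np.ndarray,
                  eps: float = 1e-6) -> float:
    """SoftIoU loss: 1 - soft intersection / soft union.
    pred holds probabilities in [0, 1]; target is a binary mask.
    eps avoids division by zero (value is an assumption)."""
    inter = (pred * target).sum()
    union = (pred + target - pred * target).sum()
    return float(1.0 - (inter + eps) / (union + eps))
```

A perfect prediction drives the loss to zero, while a completely disjoint prediction drives it toward one, regardless of how few pixels the target occupies.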
Experimental Evaluation
Datasets and Metrics
The performance of GGL-Net is benchmarked on two datasets:
- NUAA-SIRST (real, 427 images, 96 for testing, resized to 512×512)
- NUDT-SIRST (synthetic, multiple backgrounds, varying splits, resized to 256×256)
Evaluation metrics include IoU, normalized IoU (nIoU), probability of detection (Pd), false alarm rate (Fa), and 3D ROC analysis.
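The pixel-level metrics can be sketched as follows. The nIoU definition (per-image IoU averaged over the dataset) and the pixel-based Fa are standard conventions in this literature, assumed here to match the paper's usage:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pixel-level IoU between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 1.0

def niou(preds, gts) -> float:
    """nIoU: IoU computed per image, then averaged, so small images
    with tiny targets are not swamped by dataset-level pooling."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))

def false_alarm_rate(pred: np.ndarray, gt: np.ndarray) -> float:
    """Fa: fraction of pixels wrongly predicted as target."""
    fp = np.logical_and(pred, np.logical_not(gt)).sum()
    return float(fp / pred.size)
```

Pd (probability of detection) is instead computed at the target level, counting a ground-truth target as detected when a prediction sufficiently overlaps it, so it is omitted from this pixel-level sketch.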
Ablation Studies
Extensive ablation analyses validate each architectural component:
- The dual-branch feature extraction mechanism shows that the combination of original and gradient magnitude images is superior, with "Original+Gradient" improving IoU by 1.98% over a single branch.
- GSM with residual structure outperforms simple addition, boosting IoU by 1.28%.
- TGFM with two-way attention increases IoU by 0.99% compared to direct addition.
Comparative Results
On NUAA-SIRST, GGL-Net outperforms eight state-of-the-art detectors:
- IoU/nIoU improvements over top methods (DNA-Net, ALCL-Net, MLCL-Net, ALCNet) range from 2.78% to 7.53% in IoU and up to 7.97% in nIoU.
- Inference speed is competitive: GGL-Net runs roughly 3.5x faster than the recent DNA-Net while achieving higher accuracy.
On NUDT-SIRST, GGL-Net yields the highest accuracy in both 1:1 and 7:3 splits:
- For 1:1, IoU increases by 1.54% and nIoU by 1.41% over ALCL-Net.
- For 7:3, both IoU and nIoU improve by 1.73%.
- False alarm rates are substantially reduced.
Implications and Future Directions
The introduction of gradient magnitude imagery within a deep network framework constitutes a distinct architectural innovation for small target detection, directly addressing edge accuracy challenges. The dual-branch design and TGFM establish new baselines for both single-frame segmentation accuracy and efficient semantic-detail integration. Practically, this has ramifications for precision-critical applications such as military tracking, infrared guidance, and airborne early warning, where reliable detection of targets amidst noise and clutter is paramount.
Theoretically, the success of gradient-guided feature propagation suggests broader applicability for multi-modal approaches in low-feature, high-clutter scenarios. Future work may explore:
- Extension to temporal data for multi-frame tracking.
- Adaptive attention mechanisms for dynamic background suppression.
- Generalization to other sensing modalities where edge information is critical but intrinsic features are weak.
Conclusion
GGL-Net advances infrared small target detection by leveraging gradient magnitude image streams, dual-branch feature extraction, and two-way guided multi-scale fusion. Comprehensive experiments confirm state-of-the-art performance in both real and synthetic environments, with significant improvements in edge localization and overall detection rates. This architecture offers a robust foundation for further development in both research and real-world deployment of detection systems (2512.09497).