- The paper introduces a Gradient-Guided Learning Network (GGL-Net) that uses gradient magnitude images to improve edge detection in infrared small target detection.
- The dual-branch feature extraction and Two-Way Guided Fusion Module combine spatial and channel attention to effectively integrate multi-scale details.
- Experimental results on NUAA-SIRST and NUDT-SIRST datasets demonstrate significant IoU gains and reduced false alarm rates compared to existing state-of-the-art methods.
Gradient-Guided Learning Network for Infrared Small Target Detection
Introduction
Infrared small target detection is a challenging task due to the inherent lack of distinguishable features, low signal-to-noise ratio, complicated background clutter, and limited labeled data. Existing deep and non-deep learning methods suffer from inaccurate localization, edge ambiguity, and susceptibility to background interference. The proposed Gradient-Guided Learning Network (GGL-Net) directly addresses these deficiencies by introducing a unique gradient magnitude image stream and designing a novel architecture that emphasizes edge fidelity and robust multi-scale fusion.
Methodology
Gradient-Guided Learning Paradigm
GGL-Net pioneers the explicit use of gradient magnitude images in deep learning-based infrared small target detection. This approach enhances the perceptibility of edge structures, thereby mitigating the inaccurate boundary localization that is prevalent in CNN-based methods oriented toward texture feature extraction.
The network architecture employs a dual-branch design: the main branch processes the original infrared image, while a supplementary branch processes its gradient magnitude image. The Gradient Supplementary Module (GSM) encodes raw gradient information into progressively deeper network layers, using a combination of G_Blocks and residual connections. The main branch comprises five stages, each consisting of deep convolutional stacks and SE attention modules that recalibrate channel dependencies, with the SE reduction ratio set to 4.
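The gradient magnitude image that feeds the supplementary branch can be sketched as follows; this minimal numpy version assumes a standard 3×3 Sobel operator, which may differ in detail from the paper's exact choice of gradient filter:

```python
import numpy as np

def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    """Compute a gradient magnitude image from a single-channel input.
    Uses 3x3 Sobel filters (an assumption; the paper's exact operator
    is not specified here)."""
    img = img.astype(np.float32)
    p = np.pad(img, 1, mode="edge")
    # Horizontal Sobel response: right column minus left column.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical Sobel response: bottom row minus top row.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.sqrt(gx ** 2 + gy ** 2)
```

On a flat background the response is zero, while target boundaries produce strong responses, which is what makes this stream a useful edge prior for the network.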
Two-Way Guided Fusion Module (TGFM)
TGFM integrates multi-scale features more effectively using spatial and channel attention mechanisms. Low-level features provide spatial guidance and detail information to high-level features via SAM, while high-level features supply semantic enrichment and channel guidance to low-level features via CAM. This bi-directional fusion enables comprehensive feature integration, improving both semantic coherence and detail preservation.
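The two-way guidance above can be illustrated with a deliberately minimal numpy sketch. The function name, the use of a sigmoid gate, and the mean-based attention maps are all assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def tgfm_fuse(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Sketch of two-way guided fusion for (C, H, W) feature maps
    at the same resolution (hypothetical simplification of TGFM)."""
    # SAM direction: low-level features produce a spatial attention
    # map that re-weights the high-level features pixel-wise.
    spatial = sigmoid(low.mean(axis=0, keepdims=True))        # (1, H, W)
    high_guided = high * spatial
    # CAM direction: high-level features produce channel weights
    # that re-weight the low-level features channel-wise.
    channel = sigmoid(high.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    low_guided = low * channel
    # Both guided streams are combined into the fused output.
    return low_guided + high_guided
```

The key design choice is that each level gates the other along the axis where it is most informative: low-level features carry spatial detail, high-level features carry channel-wise semantics.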
Loss Function
SoftIoU loss is used to address the severe class imbalance between background and target pixels, optimizing segmentation accuracy for small target regions.
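A common formulation of the SoftIoU loss operates directly on predicted probabilities, so small targets contribute in proportion to their overlap rather than their pixel count; the smoothing constant below is an assumption:

```python
import numpy as np

def soft_iou_loss(pred: np.ndarray, target: np.ndarray,
                  eps: float = 1e-6) -> float:
    """SoftIoU loss: 1 - soft intersection / soft union.
    pred holds probabilities in [0, 1]; target is a binary mask.
    eps avoids division by zero (value is an assumption)."""
    inter = (pred * target).sum()
    union = (pred + target - pred * target).sum()
    return float(1.0 - (inter + eps) / (union + eps))
```

A perfect prediction drives the loss to zero, while a completely disjoint prediction drives it toward one, regardless of how few pixels the target occupies.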
Experimental Evaluation
Datasets and Metrics
The performance of GGL-Net is benchmarked on two datasets:
- NUAA-SIRST (real, 427 images, 96 for testing, resized to 512×512)
- NUDT-SIRST (synthetic, multiple backgrounds, varying splits, resized to 256×256)
Evaluation metrics include IoU, normalized IoU (nIoU), probability of detection (Pd), false alarm rate (Fa), and 3D ROC analysis.
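The pixel-level metrics can be sketched as follows. The nIoU definition (per-image IoU averaged over the dataset) and the pixel-based Fa are standard conventions in this literature, assumed here to match the paper's usage:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pixel-level IoU between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 1.0

def niou(preds, gts) -> float:
    """nIoU: IoU computed per image, then averaged, so small images
    with tiny targets are not swamped by dataset-level pooling."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))

def false_alarm_rate(pred: np.ndarray, gt: np.ndarray) -> float:
    """Fa: fraction of pixels wrongly predicted as target."""
    fp = np.logical_and(pred, np.logical_not(gt)).sum()
    return float(fp / pred.size)
```

Pd (probability of detection) is instead computed at the target level, counting a ground-truth target as detected when a prediction sufficiently overlaps it, so it is omitted from this pixel-level sketch.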
Ablation Studies
Extensive ablation analyses validate each architectural component:
- The dual-branch feature extraction mechanism shows that the combination of original and gradient magnitude images is superior, with "Original+Gradient" improving IoU by 1.98% over a single branch.
- GSM with residual structure outperforms simple addition, boosting IoU by 1.28%.
- TGFM with two-way attention increases IoU by 0.99% compared to direct addition.
Comparative Results
On NUAA-SIRST, GGL-Net outperforms eight state-of-the-art detectors:
- IoU/nIoU improvements over top methods (DNA-Net, ALCL-Net, MLCL-Net, ALCNet) range from 2.78% to 7.53% in IoU and up to 7.97% in nIoU.
- Inference speed is competitive: GGL-Net runs roughly 3.5x faster than the recent DNA-Net while achieving higher accuracy.
On NUDT-SIRST, GGL-Net yields the highest accuracy in both 1:1 and 7:3 splits:
- For 1:1, IoU increases by 1.54% and nIoU by 1.41% over ALCL-Net.
- For 7:3, both IoU and nIoU improve by 1.73%.
- False alarm rates are substantially reduced.
Implications and Future Directions
The introduction of gradient magnitude imagery within a deep network framework constitutes a distinct architectural innovation for small target detection, directly addressing edge accuracy challenges. The dual-branch design and TGFM establish new baselines for both single-frame segmentation accuracy and efficient semantic-detail integration. Practically, this has ramifications for precision-critical applications such as military tracking, infrared guidance, and airborne early warning, where reliable detection of targets amidst noise and clutter is paramount.
Theoretically, the success of gradient-guided feature propagation suggests broader applicability for multi-modal approaches in low-feature, high-clutter scenarios. Future work may explore:
- Extension to temporal data for multi-frame tracking.
- Adaptive attention mechanisms for dynamic background suppression.
- Generalization to other sensing modalities where edge information is critical but intrinsic features are weak.
Conclusion
GGL-Net advances infrared small target detection by leveraging gradient magnitude image streams, dual-branch feature extraction, and two-way guided multi-scale fusion. Comprehensive experiments confirm state-of-the-art performance in both real and synthetic environments, with significant improvements in edge localization and overall detection rates. This architecture offers a robust foundation for further development in both research and real-world deployment of detection systems (2512.09497).