- The paper introduces TINet, a few-shot object detector for remote sensing images that builds on a strong baseline (FPN plus depth-wise convolution) and enforces transformation invariance.
- TINet delivers substantial gains in average precision and improved robustness on the DIOR, NWPU VHR-10.v2, and HRRSD datasets.
- Extensive ablation studies compare alternative transformations and consistency regularizations, validating the choices made in TINet.
Introduction
Few-shot object detection (FSOD) in remote sensing images (RSIs) is challenging because objects vary widely in scale and orientation. Traditional object detectors require large amounts of annotated data, which is often impractical to collect for RSIs given their diverse categories and class imbalance (Figure 1a shows the per-class instance counts in DIOR). The paper proposes a novel FSOD approach that leverages transformation invariance to address these issues.
Strong Baseline Approach
The paper introduces a modified FSOD method, termed the 'Strong Baseline', which integrates a Feature Pyramid Network (FPN) to cope with scale variation. It improves upon existing FSOD methods by using prototype features, computed from the support images, to enhance the query features, which makes it better suited to RSIs.
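The summary does not spell out how the prototypes and the depth-wise convolution listed among the key components interact, so the following is only a minimal NumPy sketch under stated assumptions: each support instance is assumed to be already RoI-pooled to a small C×k×k feature map, the class prototype is assumed to be the mean of the K support maps, and that prototype is used as a depth-wise kernel over the query feature map (each channel is filtered only by its own kernel channel). The function names are hypothetical, not taken from the paper's code.

```python
import numpy as np


def class_prototype(support_feats):
    """Average K RoI-pooled support maps (K, C, k, k) into one prototype (C, k, k).
    Hypothetical helper: plain averaging is an assumption, not the paper's rule."""
    return support_feats.mean(axis=0)


def depthwise_conv2d(query, kernel):
    """Depth-wise 'same' cross-correlation (no kernel flip, as in DL frameworks).
    Channel c of the (C, H, W) query is filtered only by channel c of the kernel."""
    C, H, W = query.shape
    Ck, k, _ = kernel.shape
    assert C == Ck and k % 2 == 1, "channels must match and kernel size must be odd"
    pad = k // 2
    padded = np.pad(query, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    out = np.zeros_like(query, dtype=float)
    for i in range(H):
        for j in range(W):
            window = padded[:, i:i + k, j:j + k]                # (C, k, k)
            out[:, i, j] = (window * kernel).sum(axis=(1, 2))   # per-channel sum
    return out


rng = np.random.default_rng(0)
query = rng.standard_normal((4, 8, 8))        # C=4 channels, 8x8 query feature map
support = rng.standard_normal((3, 4, 3, 3))   # K=3 shots, each 4 x 3 x 3
enhanced = depthwise_conv2d(query, class_prototype(support))
print(enhanced.shape)  # (4, 8, 8)
```

Because channels are filtered independently, the cost grows linearly with C rather than quadratically as in a full convolution. A real implementation would use a grouped convolution (for example `torch.nn.functional.conv2d` with `groups=C`) instead of Python loops.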
Key Components of the Strong Baseline:
- Feature Pyramid Network (FPN): Enables multi-scale feature extraction, crucial for handling scale variations prevalent in RSIs.
- Depth-wise Convolution: Enhances the interaction between query features and support features, promoting better adaptation to diverse object scales.
- Increased IoU Threshold: Raises the non-maximum suppression (NMS) IoU threshold in the region proposal network from 0.7 to 0.9, so that fewer overlapping proposals are suppressed and candidate boxes for novel categories are less likely to be discarded by mistake.
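To see why raising the threshold keeps more candidates, here is a minimal pure-Python sketch of standard greedy NMS on hypothetical boxes and scores. Real detectors typically call a vectorized routine such as `torchvision.ops.nms`; this toy version is only an illustration. A proposal is discarded when its IoU with a higher-scoring kept proposal exceeds the threshold, so moving from 0.7 to 0.9 lets more heavily overlapping proposals through.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its IoU with every already-kept box is at most iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Hypothetical proposals: box 1 overlaps box 0 with IoU = 90/110 ≈ 0.82.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (5, 0, 15, 10)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.7))  # [0, 2]: box 1 is suppressed
print(nms(boxes, scores, 0.9))  # [0, 1, 2]: box 1 survives
```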

Figure 1: (a) Number of object instances per class in the DIOR dataset. (b-e) Comparison of the detection results of the Strong Baseline and TINet.
TINet Approach
TINet is introduced to handle spatial misalignments caused by orientation variations. It applies a geometric transformation to the query image and trains the detector to produce consistent predictions for the original and transformed images, so that bounding-box predictions remain stable across object poses.
TINet Architecture:
- Transformation of Inputs: The network processes both the original query image and a transformed version to enforce consistency in predictions.
- Consistency Loss: Combines classification and regression consistency losses between the predictions for the original and transformed queries, ensuring alignment of spatial features between the query and support branches.
- Depth-wise Convolution: Used for effective feature aggregation, making TINet robust against orientation perturbations.
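The transformation-and-consistency idea can be sketched in NumPy as follows. This is a hedged illustration, not the paper's implementation: it assumes that "diagonal flipping" means flipping both horizontally and vertically (the convention used by, for example, mmdetection's `RandomFlip` with `direction='diagonal'`), that predictions on the original and flipped queries are paired row by row, and that both consistency terms use an L2 penalty (the regularizer the ablation found best). The function names and the loss weights `w_cls` and `w_reg` are hypothetical.

```python
import numpy as np


def diagonal_flip_image(img):
    """Flip a (C, H, W) image both vertically and horizontally."""
    return img[:, ::-1, ::-1]


def diagonal_flip_boxes(boxes, width, height):
    """Map (N, 4) boxes (x1, y1, x2, y2) into the diagonally flipped image.
    The mapping is its own inverse, so it also maps boxes back."""
    x1, y1, x2, y2 = boxes.T
    return np.stack([width - x2, height - y2, width - x1, height - y1], axis=1)


def consistency_loss(cls_orig, cls_flip, boxes_orig, boxes_flip,
                     width, height, w_cls=1.0, w_reg=1.0):
    """L2 consistency between predictions on the original and flipped query.
    Assumes row i of every array refers to the same (paired) proposal."""
    boxes_back = diagonal_flip_boxes(boxes_flip, width, height)  # undo the flip
    l_cls = np.mean((cls_orig - cls_flip) ** 2)      # classification consistency
    l_reg = np.mean((boxes_orig - boxes_back) ** 2)  # regression consistency
    return w_cls * l_cls + w_reg * l_reg


# Toy example on a 3x4 image (H=3, W=4).
img = np.arange(12).reshape(1, 3, 4)
flipped = diagonal_flip_image(img)
boxes = np.array([[0.0, 0.0, 2.0, 1.0]])
probs = np.array([[0.7, 0.3]])
perfect = diagonal_flip_boxes(boxes, 4, 3)  # ideal prediction on the flipped image
print(consistency_loss(probs, probs, boxes, perfect, 4, 3))  # 0.0
```

A perfectly transformation-invariant detector incurs zero loss; any disagreement in class scores or in the back-mapped boxes is penalized. How TINet pairs predictions across the two views in a full detector is not described here.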
Experimental Evaluation
Extensive experiments were conducted on the NWPU VHR-10.v2, DIOR, and HRRSD datasets, and TINet achieved state-of-the-art few-shot object detection performance on all three.
Key Findings:
- Performance Gains: TINet achieved substantial improvements in average precision (AP) over existing FSOD methods, especially under challenging conditions with large orientation variations.
- Robustness: TINet consistently outperformed other methods, showcasing superior adaptability to varying object scales and orientations.
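For readers less familiar with the metric, a minimal sketch of per-class average precision follows. It uses the common all-point interpolated (PASCAL VOC 2010+ style) definition; the summary does not state which IoU threshold or interpolation the paper's evaluation uses, so treat those details as assumptions. Matching detections to ground truth is assumed to be precomputed into the hypothetical `is_tp` flags.

```python
def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class.

    scores: confidence of each detection
    is_tp:  whether each detection matched an unclaimed ground-truth box
            (matching, e.g. at IoU >= 0.5, is assumed to be done already)
    num_gt: number of ground-truth boxes of this class
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the area under the envelope at each step where recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2))  # ≈ 0.833 (= 5/6)
```

The mean AP reported for a dataset is then the average of this quantity over the evaluated classes.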

Figure 2: Comparison of the confusion matrices of the Strong Baseline and TINet. (a) Confusion matrix of the Strong Baseline. (b) Confusion matrix of TINet.
Ablation Studies
Ablation studies validated the effectiveness of the individual components, examining alternative transformations and consistency regularizations to measure their impact on performance.
- Transformation Methods: Diagonal flipping was found to be more effective than horizontal or vertical flips due to less distortion of object appearance.
- Consistency Regularization: The L2 loss outperformed Jensen-Shannon divergence (JSD) and Kullback-Leibler divergence (KLD) regularization in enforcing prediction consistency, with the benefit most pronounced in low-shot settings.
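The three regularizers compared in the ablation can be written down directly. The NumPy sketch below uses their standard definitions on hypothetical class-probability vectors; how the paper normalizes, weights, or applies them to box outputs is not stated here, so those details are left out. One visible difference is that KLD is asymmetric, whereas L2 and JSD treat the two predictions symmetrically.

```python
import numpy as np


def l2_consistency(p, q):
    """Squared L2 distance (sum of squared differences) between two probability vectors."""
    return float(np.sum((p - q) ** 2))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


p = np.array([0.8, 0.1, 0.1])  # hypothetical prediction on the original query
q = np.array([0.6, 0.3, 0.1])  # hypothetical prediction on the transformed query
for name, fn in [("L2", l2_consistency), ("KLD", kl_divergence), ("JSD", js_divergence)]:
    print(name, round(fn(p, q), 4), round(fn(q, p), 4))
```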
Conclusion
The proposed TINet addresses the limitations of conventional FSOD approaches in RSIs by improving feature alignment through transformation invariance. Its architecture delivers robust performance even in complex scenarios with significant scale and orientation variations. Future work may explore additional geometric transformations to further improve detection.
By advancing FSOD in RSIs, TINet paves the way for more efficient and accurate remote sensing data analysis, which is crucial for applications such as environmental monitoring and resource surveys.