Frustratingly Simple Few-Shot Object Detection

Published 16 Mar 2020 in cs.CV | (2003.06957v1)

Abstract: Detecting rare objects from a few examples is an emerging problem. Prior works show meta-learning is a promising approach. But, fine-tuning techniques have drawn scant attention. We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task. Such a simple approach outperforms the meta-learning methods by roughly 2~20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods. However, the high variance in the few samples often leads to the unreliability of existing benchmarks. We revise the evaluation protocols by sampling multiple groups of training examples to obtain stable comparisons and build new benchmarks based on three datasets: PASCAL VOC, COCO and LVIS. Again, our fine-tuning approach establishes a new state of the art on the revised benchmarks. The code as well as the pretrained models are available at https://github.com/ucbdrive/few-shot-object-detection.

Summary

The paper demonstrates that fine-tuning only the final layers can improve accuracy by 2-20 points over complex meta-learning techniques.
The method employs a two-stage process: training a full detector on base classes followed by selective fine-tuning on a balanced set of base and novel classes.
The authors also revise the evaluation protocols, averaging over multiple sampled training sets to reduce the variance caused by limited samples, and the approach achieves state-of-the-art results on the revised PASCAL VOC, COCO, and LVIS benchmarks.

This paper, "Frustratingly Simple Few-Shot Object Detection," examines how well fine-tuning works for few-shot object detection, in contrast to the meta-learning approaches favored in prior work. The authors' central finding is that fine-tuning only the last layer of an existing object detector on rare classes outperforms meta-learning methods by roughly 2 to 20 points on current benchmarks, and sometimes doubles the accuracy of previous methods.

Key Contributions

Fine-Tuning Methodology: The paper introduces a two-stage approach. The first stage trains a complete object detector, such as Faster R-CNN, on the base classes. The second stage fine-tunes only the final layers on a small balanced subset containing both base and novel classes, leaving the rest of the detector unchanged. This improves generalization to novel classes while retaining performance on base classes.
Evaluation Revisions: The authors note that existing evaluation protocols suffer from high variance because so few training samples are used, which makes comparisons unreliable. They propose revised protocols that repeat each experiment over multiple groups of sampled training examples to obtain stable accuracy estimates, and they build new benchmarks on PASCAL VOC, COCO, and LVIS.
Numerical Performance: On the revised benchmarks, the approach establishes new state-of-the-art results. On LVIS, it improves precision on rare classes by ~4 points and on common classes by ~2 points, with negligible loss on frequent classes.
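
The second-stage setup and the repeated-sampling protocol described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the parameter-name prefix `box_predictor.`, the three helper functions, and the `evaluate` callback are hypothetical stand-ins, and a real implementation would freeze tensors inside a detection framework instead of filtering names.

```python
import random
import statistics
from collections import defaultdict


def trainable_parameters(param_names, head_prefixes=("box_predictor.",)):
    """Stage 2: select only the last-layer parameters for fine-tuning.

    The prefix is a hypothetical naming scheme; every parameter that does
    not match it is treated as frozen.
    """
    return [name for name in param_names if name.startswith(head_prefixes)]


def sample_balanced_kshot(annotations, k, seed):
    """Pick at most k instances per class, for base and novel classes alike.

    `annotations` is a list of (instance_id, class_name) pairs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for instance_id, class_name in annotations:
        by_class[class_name].append(instance_id)
    subset = {}
    for class_name in sorted(by_class):
        ids = sorted(by_class[class_name])
        subset[class_name] = rng.sample(ids, min(k, len(ids)))
    return subset


def repeated_evaluation(annotations, k, seeds, evaluate):
    """Score one sampled subset per seed and report mean and std.

    `evaluate` is an assumed callback that fine-tunes on the subset and
    returns a single accuracy number.
    """
    scores = [evaluate(sample_balanced_kshot(annotations, k, s)) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)
```

Reporting the mean together with the spread across seeds is what makes two methods comparable when each individual K-shot sample is noisy.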

Comparative Analysis

The paper compares its approach with previous meta-learning-based methods (e.g., FSRW, Meta R-CNN, MetaDet) and reports superior performance across few-shot detection tasks. Notably, instance-level feature normalization, inspired by existing work in few-shot classification, contributed significantly to the improvement.
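
As a rough illustration of the feature normalization idea, the sketch below computes classification logits as scaled cosine similarities between a region feature and per-class weight vectors, so that the magnitude of the feature does not affect the score. It is plain Python, not a detector implementation; the function name, the `eps` guard, and the default scale `alpha=20.0` are assumptions made for this example.

```python
import math


def cosine_logits(feature, class_weights, alpha=20.0, eps=1e-12):
    """Return alpha * cos(feature, w_j) for each class weight vector w_j."""
    feat_norm = math.sqrt(sum(v * v for v in feature)) + eps
    logits = []
    for w in class_weights:
        w_norm = math.sqrt(sum(v * v for v in w)) + eps
        dot = sum(f * v for f, v in zip(feature, w))
        # Dividing by both norms removes the effect of vector length.
        logits.append(alpha * dot / (feat_norm * w_norm))
    return logits
```

Scaling the input feature by any positive constant leaves the logits unchanged, which is the property that reduces variance between instances of the same class.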

Implications and Speculations

The implications of this work are twofold. Practically, it offers a more computationally efficient and straightforward alternative in few-shot object detection by leveraging existing detectors through selective fine-tuning. Theoretically, it challenges the prevailing notion that sophisticated meta-learning is invariably superior for few-shot tasks, prompting reconsideration of model complexity versus efficacy in novel class detection.

Future Directions

Given these results, future work could explore combining this fine-tuning approach with other techniques to further improve few-shot detection. Investigating its applicability to settings such as real-time detection or deployment on edge devices, and extending the study to diverse domain-specific datasets, would further test the method's robustness and adaptability.

In conclusion, this study provides a compelling argument for re-evaluating the role of simplicity and selective fine-tuning in developing effective few-shot object detection methods, defying the expectation that complexity and novelty necessarily correlate with performance improvements.