Improving Object Detection via Local-global Contrastive Learning
Abstract: Visual domain gaps often degrade object detection performance. Image-to-image translation can mitigate this effect, and contrastive approaches enable the image-to-image mapping to be learned in unsupervised regimes. However, existing methods often fail to handle content-rich scenes with multiple object instances, which manifests in unsatisfactory detection performance. Sensitivity to such instance-level content is typically gained only through object annotations, which can be expensive to obtain. To address this issue, we present a novel image-to-image translation method that specifically targets cross-domain object detection. We formulate our approach as a contrastive learning framework with an inductive prior that optimises the appearance of object instances through spatial attention masks, implicitly delineating the scene into foreground regions associated with the target object instances and background non-object regions. Instead of relying on object annotations to explicitly account for object instances during translation, our approach learns to represent objects by contrasting local-global information. This affords investigation of an under-explored challenge: obtaining performant detection under domain shifts without relying on either object annotations or detector model fine-tuning. We experiment with multiple cross-domain object detection settings across three challenging benchmarks and report state-of-the-art performance. Project page: https://local-global-detection.github.io
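To make the idea of contrasting local and global information through spatial attention masks more concrete, the following is a minimal, generic sketch and not the authors' implementation. The abstract does not specify the loss, so everything here is an assumption for illustration: the function names `attention_pool` and `info_nce`, the temperature value, and the choice to pool local patch features into one foreground descriptor per image and contrast descriptors of source and translated images with a standard InfoNCE objective.

```python
import numpy as np

def attention_pool(features, mask, eps=1e-8):
    """Pool local features into one global descriptor per image.

    features: (B, N, C) local patch features (hypothetical encoder output).
    mask:     (B, N) soft spatial attention in [0, 1]; high = foreground.
    Returns:  (B, C) mask-weighted average of the local features.
    """
    weights = mask / (mask.sum(axis=1, keepdims=True) + eps)
    return np.einsum("bn,bnc->bc", weights, features)

def info_nce(queries, keys, temperature=0.07):
    """Standard InfoNCE loss: queries[i] matches keys[i]; other rows are negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Illustrative usage with random stand-in features (assumed shapes only).
rng = np.random.default_rng(0)
src_feats = rng.normal(size=(4, 16, 8))                       # source-image patches
out_feats = src_feats + 0.01 * rng.normal(size=(4, 16, 8))    # translated-image patches
fg_mask = rng.uniform(size=(4, 16))                           # soft foreground attention

fg_src = attention_pool(src_feats, fg_mask)
fg_out = attention_pool(out_feats, fg_mask)
loss = info_nce(fg_out, fg_src)
```

In this sketch the foreground descriptor of a translated image is pulled towards the descriptor of its own source image and pushed away from those of other images in the batch; a background descriptor could be formed analogously with `1 - fg_mask`.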