Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification

Published 3 May 2024 in cs.CV | (2405.02155v1)

Abstract: This paper introduces a novel framework for zero-shot learning (ZSL), i.e., recognizing new categories that are unseen during training, built on the integration of multiple models and multiple alignment methods. Specifically, we propose three strategies to improve the model's ZSL performance: 1) using the extensive knowledge of ChatGPT and the image generation capabilities of DALL-E to create reference images that precisely describe unseen categories and their classification boundaries, thereby alleviating the information bottleneck; 2) integrating the text-image and image-image alignment results from CLIP with the image-image alignment results from DINO to achieve more accurate predictions; 3) introducing an adaptive, confidence-based weighting mechanism to aggregate the outcomes of the different prediction methods. Experimental results on multiple datasets, including CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate that our model significantly improves classification accuracy over single-model approaches, achieving AUROC scores above 96% on all test datasets and surpassing 99% on CIFAR-10.
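The abstract does not give the formulas behind strategies 2 and 3. What follows is a minimal sketch under explicit assumptions; none of them is the paper's stated method:

- Each method produces a temperature-scaled softmax over per-class cosine similarities. The temperature of 0.01 mirrors the 100x logit scale commonly used with CLIP.
- An image-image score is the mean similarity between the test image and a class's generated reference images.
- "Confidence" is the maximum class probability.

All function names (`text_image_probs`, `image_image_probs`, `fuse`, `predict`) are hypothetical. Plain Python lists stand in for real CLIP and DINO embeddings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: Sequence[float], temperature: float = 0.01) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_image_probs(image_emb: Vector, class_text_embs: Sequence[Vector],
                     temperature: float = 0.01) -> List[float]:
    """Text-image alignment: compare the image with one text embedding per class."""
    return softmax([cosine(image_emb, t) for t in class_text_embs], temperature)


def image_image_probs(image_emb: Vector, class_ref_embs: Sequence[Sequence[Vector]],
                      temperature: float = 0.01) -> List[float]:
    """Image-image alignment (assumed): mean similarity to each class's reference images."""
    scores = [sum(cosine(image_emb, r) for r in refs) / len(refs)
              for refs in class_ref_embs]
    return softmax(scores, temperature)


def confidence(probs: Sequence[float]) -> float:
    """Assumed confidence measure: the maximum class probability."""
    return max(probs)


def fuse(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-method distributions, weighting each by its confidence."""
    weights = [confidence(p) for p in prob_lists]
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [sum(w * p[k] for w, p in zip(weights, prob_lists)) / total
            for k in range(n_classes)]


def predict(clip_img, clip_text, clip_refs, dino_img, dino_refs):
    """Fuse CLIP text-image, CLIP image-image and DINO image-image predictions."""
    fused = fuse([
        text_image_probs(clip_img, clip_text),
        image_image_probs(clip_img, clip_refs),
        image_image_probs(dino_img, dino_refs),
    ])
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Toy 3-d vectors standing in for real encoder outputs (hypothetical values);
# CLIP and DINO embeddings live in separate spaces, so each is compared only
# with references encoded by the same model.
label, probs = predict(
    clip_img=[0.9, 0.1, 0.0],
    clip_text=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    clip_refs=[[[0.95, 0.05, 0.0], [0.8, 0.2, 0.1]], [[0.1, 0.9, 0.0]]],
    dino_img=[0.7, 0.2, 0.1],
    dino_refs=[[[0.8, 0.1, 0.1]], [[0.1, 0.9, 0.0]]],
)
```

Weighting by maximum probability lets a method that is sharply peaked on one class outweigh a method that is uncertain. An entropy-based score would be an equally plausible drop-in replacement for `confidence`.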

Authors (2)