LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
Abstract: Long-tail recognition is challenging because it requires the model to learn good representations from tail categories and to address imbalances across all categories. In this paper, we propose a novel generative and fine-tuning framework, LTGC, to handle long-tail recognition by leveraging generated content. First, inspired by the rich implicit knowledge in large-scale models (e.g., large language models, LLMs), LTGC leverages the power of these models to parse and reason over the original tail data to produce diverse tail-class content. We then propose several novel designs for LTGC to ensure the quality of the generated data and to efficiently fine-tune the model using both the generated and original data. Visualizations demonstrate the effectiveness of the generation module in LTGC, which produces accurate and diverse tail data. Additionally, experimental results show that LTGC outperforms existing state-of-the-art methods on popular long-tailed benchmarks.