CLEAR: Cross-Transformers with Pre-trained Language Model is All you need for Person Attribute Recognition and Retrieval

Published 10 Mar 2024 in cs.CV | (2403.06119v2)

Abstract: Person attribute recognition and attribute-based retrieval are two core human-centric tasks. In the recognition task, the challenge is specifying attributes depending on a person's appearance, while the retrieval task involves searching for matching persons based on attribute queries. There is a significant relationship between recognition and retrieval tasks. In this study, we demonstrate that if there is a sufficiently robust network to solve person attribute recognition, it can be adapted to facilitate better performance for the retrieval task. Another issue that needs addressing in the retrieval task is the modality gap between attribute queries and person images. Therefore, in this paper, we present CLEAR, a unified network designed to address both tasks. We introduce a robust cross-transformers network to handle person attribute recognition. Additionally, leveraging a pre-trained language model, we construct pseudo-descriptions for attribute queries and introduce an effective training strategy to train only a few additional parameters for adapters, facilitating the handling of the retrieval task. Finally, the unified CLEAR model is evaluated on five benchmarks: PETA, PA100K, Market-1501, RAPv2, and UPAR-2024. Without bells and whistles, CLEAR achieves state-of-the-art performance or competitive results for both tasks, significantly outperforming other competitors in terms of person retrieval performance on the widely used Market-1501 dataset.
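
The abstract mentions constructing pseudo-descriptions for attribute queries but does not give the template. As a rough illustration only, the sketch below turns a binary attribute query into a sentence that a text encoder could consume. The attribute names, the sentence template, and the function name `pseudo_description` are all hypothetical and are not taken from the paper.

```python
from typing import Sequence

# Hypothetical attribute vocabulary; each benchmark defines its own label set.
ATTRIBUTES = ["female", "backpack", "hat", "long sleeves", "short hair"]


def pseudo_description(flags: Sequence[int],
                       names: Sequence[str] = ATTRIBUTES) -> str:
    """Render a binary attribute query as a plain-text sentence.

    The template is an assumption made for illustration; the paper's
    actual wording may differ.
    """
    if len(flags) != len(names):
        raise ValueError("flags and names must have the same length")
    # Keep only the attributes whose flag is set in the query.
    present = [name for name, flag in zip(names, flags) if flag]
    if not present:
        return "A person with no annotated attributes."
    return "A person with the following attributes: " + ", ".join(present) + "."


if __name__ == "__main__":
    print(pseudo_description([1, 0, 1, 0, 0]))
```

In a setup like the one the abstract describes, a sentence of this kind would then be embedded by the pre-trained language model and matched against image features; that step is not shown here.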
