
Generalized People Diversity: Learning a Human Perception-Aligned Diversity Representation for People Images

Published 25 Jan 2024 in cs.CV and cs.CY | (2401.14322v1)

Abstract: Capturing the diversity of people in images is challenging: recent literature tends to focus on diversifying one or two attributes, requiring expensive attribute labels or building classifiers. We introduce a diverse people image ranking method which more flexibly aligns with human notions of people diversity in a less prescriptive, label-free manner. The Perception-Aligned Text-derived Human representation Space (PATHS) aims to capture all or many relevant features of people-related diversity, and, when used as the representation space in the standard Maximal Marginal Relevance (MMR) ranking algorithm, is better able to surface a range of types of people-related diversity (e.g. disability, cultural attire). PATHS is created in two stages. First, a text-guided approach is used to extract a person-diversity representation from a pre-trained image-text model. Then this representation is fine-tuned on perception judgments from human annotators so that it captures the aspects of people-related similarity that humans find most salient. Empirical results show that the PATHS method achieves diversity better than baseline methods, according to side-by-side ratings from human annotators.
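
To make the ranking step concrete, here is a minimal sketch of greedy Maximal Marginal Relevance in its standard form (Carbonell and Goldstein, 1998). It is not the authors' implementation. The function name `mmr_rank`, the trade-off weight `lam`, the use of cosine similarity, and the toy vectors are all assumptions made for illustration. In the paper's setting the `embeddings` would come from the PATHS representation space, which is not reproduced here.

```python
import numpy as np

def mmr_rank(relevance, embeddings, k, lam=0.5):
    """Greedy MMR: select k items, trading relevance against redundancy.

    relevance  : sequence of per-item relevance scores (hypothetical input)
    embeddings : (n, d) array; stands in for a diversity representation
    lam        : assumed trade-off weight; 1.0 means relevance only
    """
    # Normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        best, best_score = None, -np.inf
        for i in candidates:
            # Redundancy is the highest similarity to any item already chosen.
            redundancy = max((sim[i, j] for j in selected), default=0.0)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy example: items 0 and 1 have identical embeddings, item 2 is orthogonal.
toy_relevance = [0.9, 0.85, 0.8]
toy_embeddings = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(mmr_rank(toy_relevance, toy_embeddings, k=3, lam=0.5))  # prints [0, 2, 1]
```

In the toy example, item 2 is promoted above the slightly more relevant item 1 because item 1 duplicates the already selected item 0. The quality of the resulting diversity depends entirely on what the embedding space treats as similar, which is the part PATHS is designed to address.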
