Benchmarking the Fairness of Image Upsampling Methods
Abstract: Recent years have witnessed a rapid development of deep generative models for creating synthetic media, such as images and videos. While the practical applications of these models in everyday tasks are enticing, it is crucial to assess the inherent risks regarding their fairness. In this work, we introduce a comprehensive framework for benchmarking the performance and fairness of conditional generative models. We develop a set of metrics, inspired by their supervised fairness counterparts, to evaluate the models on their fairness and diversity. Focusing on the specific application of image upsampling, we create a benchmark covering a wide variety of modern upsampling methods. As part of the benchmark, we introduce UnfairFace, a subset of FairFace that replicates the racial distribution of common large-scale face datasets. Our empirical study highlights the importance of using an unbiased training set and reveals variations in how the algorithms respond to dataset imbalances. Alarmingly, we find that none of the considered methods produces statistically fair and diverse results. All experiments can be reproduced using our provided repository.