From Efficiency to Equity: Measuring Fairness in Preference Learning
Abstract: As AI systems, particularly generative models, increasingly influence decision-making, ensuring that they fairly represent diverse human preferences becomes crucial. This paper introduces a novel framework, inspired by economic theories of inequality and Rawlsian justice, for evaluating epistemic fairness in preference learning models. We propose metrics adapted from the Gini Coefficient, Atkinson Index, and Kuznets Ratio to quantify fairness in these models. We validate our approach on two datasets: a custom visual preference dataset (AI-EDI-Space) and the Jester Jokes dataset. Our analysis reveals variations in model performance across users, highlighting potential epistemic injustices. We explore pre-processing and in-processing techniques to mitigate these inequalities, demonstrating a complex relationship between model efficiency and fairness. This work contributes to AI ethics by providing a framework for evaluating and improving epistemic fairness in preference learning models, offering insights for developing more inclusive AI systems in contexts where diverse human preferences are crucial.
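The three economic inequality measures named in the abstract transfer naturally from income distributions to per-user model performance. The sketch below shows one standard way to compute them over a vector of per-user scores; the accuracy values, the Atkinson aversion parameter, and the top/bottom quantiles for the Kuznets-style ratio are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gini(x):
    """Gini coefficient of a non-negative array (0 = perfect equality)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted ascending
    return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

def atkinson(x, eps=0.5):
    """Atkinson index with inequality-aversion parameter eps (eps > 0, eps != 1)."""
    x = np.asarray(x, dtype=float)
    # Equally-distributed-equivalent value under the chosen aversion
    ede = np.mean(x ** (1 - eps)) ** (1 / (1 - eps))
    return 1 - ede / x.mean()

def kuznets_ratio(x, top=0.2, bottom=0.2):
    """Share of the total held by the top fraction over the bottom fraction's share."""
    x = np.sort(np.asarray(x, dtype=float))
    k_top = max(1, int(round(top * x.size)))
    k_bot = max(1, int(round(bottom * x.size)))
    return x[-k_top:].sum() / x[:k_bot].sum()

# Hypothetical per-user accuracies of a preference model
acc = np.array([0.92, 0.88, 0.75, 0.60, 0.95, 0.70, 0.85, 0.55, 0.90, 0.65])
print(gini(acc), atkinson(acc), kuznets_ratio(acc))
```

All three metrics equal their "perfect equality" value (0, 0, and 1 respectively) when every user is served equally well, and grow as performance concentrates on a subset of users.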
References
- Robust sparse voting. In International Conference on Artificial Intelligence and Statistics, pages 991–999. PMLR, 2024.
- Truth is a lie: Crowd truth and the seven myths of human annotation. AI Magazine, 36(1):15–24, 2015.
- Kenneth J. Arrow. A Difficulty in the Concept of Social Welfare. Journal of Political Economy, 58(4):328–346, August 1950.
- Anthony B. Atkinson. On the measurement of inequality. Journal of Economic Theory, 2(3):244–263, 1970.
- Training a helpful and harmless assistant with reinforcement learning from human feedback, 2022.
- Fine-tuning language models to find agreement among humans with diverse preferences. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.
- Fairness in Recommendation Ranking through Pairwise Comparisons. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2212–2220, Anchorage AK USA, July 2019. ACM.
- Maxmin-RLHF: Towards equitable alignment of large language models with diverse human preferences. In ICML 2024 Workshop on Models of Human Feedback for AI Alignment, 2024.
- Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals. Journal of Machine Learning Research, 20:1–59, 2019.
- Safe RLHF: Safe reinforcement learning from human feedback. In The Twelfth International Conference on Learning Representations, 2024.
- Deep Learning the City: Quantifying Urban Perception at a Global Scale. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision – ECCV 2016, Lecture Notes in Computer Science, pages 196–212, Cham, 2016. Springer International Publishing.
- Generalized Bradley-Terry models for score estimation from paired comparisons. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 20379–20386, 2024.
- Miranda Fricker. Epistemic injustice: Power and the ethics of knowing. Oxford University Press, 2007.
- Corrado Gini. On the measure of concentration with special reference to income and statistics. Colorado College Publication, General Series, 208(1), 1936.
- Eigentaste: A Constant Time Collaborative Filtering Algorithm. Information Retrieval, 4(2):133–151, 2001.
- Equality of Opportunity in Supervised Learning. In Advances in Neural Information Processing Systems, volume 29, 2016.
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360, 2016.
- Epistemic injustice in generative ai. arXiv preprint arXiv:2408.11441, 2024.
- A survey on fairness without demographics. Transactions on Machine Learning Research, 2024.
- Large language models as superpositions of cultural perspectives, 2023.
- Simon Kuznets. Economic growth and income inequality. In The gap between rich and poor, pages 25–37. Routledge, 2019.
- BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online, July 2020. Association for Computational Linguistics.
- John Stuart Mill. On liberty. 1859.
- AVA: A large-scale database for aesthetic visual analysis. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2408–2415, 2012.
- Webgpt: Browser-assisted question-answering with human feedback, 2022.
- Pairwise fairness for ranking and regression. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 5248–5255, 2020.
- DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
- Rewarded soups: towards pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards, 2023.
- John Rawls. A theory of justice. 1971.
- Why don’t you do it right? Analysing annotators’ disagreement in subjective tasks. In Andreas Vlachos and Isabelle Augenstein, editors, Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2428–2441, Dubrovnik, Croatia, May 2023. Association for Computational Linguistics.
- How unfair is private learning? In The 38th Conference on Uncertainty in Artificial Intelligence, May 2022.
- Utilities and the Issue of Fairness in a Decision Theoretic Model for Selection. Journal of Educational Measurement, 13(1):59–76, 1976.
- Anthony F Shorrocks. The class of additively decomposable inequality measures. Econometrica: Journal of the Econometric Society, pages 613–625, 1980.
- Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR, 2019.
- Imagereward: Learning and evaluating human preferences for text-to-image generation, 2023.
- Constructive large language models alignment with diverse feedback, 2023.
- On diversified preferences of large language model alignment, 2024.
- Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593, 2019.