Fairness-Aware Structured Pruning in Transformers
Abstract: The increasing size of LLMs has introduced challenges in their training and inference. Pruning model components is widely viewed as a way to reduce model size; however, existing pruning methods focus solely on performance, neglecting an aspect essential to the responsible use of LLMs: model fairness. As LLMs are deployed and made available to wide audiences, it is crucial to address their fairness towards diverse groups, such as women, Black people, LGBTQ+ people, and Jewish communities, among others. In this work, we first investigate how attention heads affect fairness and performance in pre-trained transformer-based LLMs. We then propose a novel method to prune the attention heads that negatively impact fairness while retaining the heads critical for performance, i.e., language-modeling capability. Our approach is practical in terms of time and resources, as it does not require fine-tuning the final pruned, fairer model. Our findings demonstrate reductions in gender bias of 19%, 19.5%, 39.5%, 34.7%, 23%, and 8% for DistilGPT-2, GPT-2, two sizes of GPT-Neo, GPT-J, and Llama 2, respectively, compared to the biased models, with only a slight decrease in performance.
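The selection step the abstract describes — prune heads that contribute to bias unless they are critical for language modeling — can be sketched as follows. This is a minimal illustration, not the paper's actual procedure: the function name, the quantile thresholds, and the use of precomputed per-head scores are all assumptions for the sake of the example.

```python
import numpy as np

def select_heads_to_prune(bias_score, perf_score,
                          bias_quantile=0.75, perf_quantile=0.75):
    """Pick (layer, head) pairs estimated to increase bias while
    contributing little to language-modeling performance.

    bias_score, perf_score: arrays of shape (n_layers, n_heads);
    higher values mean a head contributes more to bias / performance.
    Returns {layer: [head, ...]}.
    """
    # Heads in the top bias quantile are candidates for removal.
    biased = bias_score > np.quantile(bias_score, bias_quantile)
    # Heads in the top performance quantile are protected.
    critical = perf_score > np.quantile(perf_score, perf_quantile)
    # Prune heads that are biased but not performance-critical.
    prune_mask = biased & ~critical
    to_prune = {}
    for layer, head in zip(*np.nonzero(prune_mask)):
        to_prune.setdefault(int(layer), []).append(int(head))
    return to_prune
```

With Hugging Face `transformers`, the resulting dictionary matches the format accepted by `PreTrainedModel.prune_heads` for architectures that support structured head pruning (e.g. GPT-2), so the selected heads could be removed without any fine-tuning, consistent with the approach described above.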