Fairness-Aware Structured Pruning in Transformers
Abstract: The increasing size of LLMs has introduced challenges in their training and inference. Pruning model components is widely viewed as a way to reduce model size; however, existing pruning methods focus solely on performance, neglecting an aspect essential to the responsible use of LLMs: model fairness. As LLMs are deployed and made available to wide audiences, it is crucial to address their fairness towards diverse groups, such as women, Black people, LGBTQ+ people, and Jewish communities, among others. In this work, we first investigate how attention heads affect fairness and performance in pre-trained transformer-based LLMs. We then propose a novel method to prune the attention heads that negatively impact fairness while retaining the heads critical for performance, i.e., language-modeling capability. Our approach is practical in terms of time and resources, as it does not require fine-tuning the final pruned, fairer model. Our findings demonstrate reductions in gender bias of 19%, 19.5%, 39.5%, 34.7%, 23%, and 8% for DistilGPT-2, GPT-2, two sizes of GPT-Neo, GPT-J, and Llama 2, respectively, compared to the biased models, with only a slight decrease in performance.
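The selection step the abstract describes — prune heads that contribute to bias unless they are critical for language modeling — can be sketched as follows. This is a minimal illustration, not the paper's actual procedure: the function name, the quantile thresholds, and the use of precomputed per-head scores are all assumptions for the sake of the example.

```python
import numpy as np

def select_heads_to_prune(bias_score, perf_score,
                          bias_quantile=0.75, perf_quantile=0.75):
    """Pick (layer, head) pairs estimated to increase bias while
    contributing little to language-modeling performance.

    bias_score, perf_score: arrays of shape (n_layers, n_heads);
    higher values mean a head contributes more to bias / performance.
    Returns {layer: [head, ...]}.
    """
    # Heads in the top bias quantile are candidates for removal.
    biased = bias_score > np.quantile(bias_score, bias_quantile)
    # Heads in the top performance quantile are protected.
    critical = perf_score > np.quantile(perf_score, perf_quantile)
    # Prune heads that are biased but not performance-critical.
    prune_mask = biased & ~critical
    to_prune = {}
    for layer, head in zip(*np.nonzero(prune_mask)):
        to_prune.setdefault(int(layer), []).append(int(head))
    return to_prune
```

With Hugging Face `transformers`, the resulting dictionary matches the format accepted by `PreTrainedModel.prune_heads` for architectures that support structured head pruning (e.g. GPT-2), so the selected heads could be removed without any fine-tuning, consistent with the approach described above.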