Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment
Abstract: Binary decision tasks, such as yes-no questions or answer verification, reflect a significant real-world scenario in which users seek confirmation that their decisions on specific issues are correct. In this work, we observe that LLMs exhibit a negative bias in binary decisions on complex reasoning tasks. Based on our observations and a rationale grounded in attention-based model dynamics, we propose the negative attention score (NAS) to systematically and quantitatively formulate negative bias. Using NAS, we identify attention heads that attend to the negative tokens provided in the instructions as answer candidates for binary decisions, regardless of the question in the prompt, and we validate their association with negative bias. Additionally, we propose negative attention score alignment (NASA), a parameter-efficient fine-tuning technique that corrects the extracted negatively biased attention heads. Experimental results across various reasoning-task domains and a large search space of models demonstrate that NASA significantly reduces the gap between precision and recall caused by negative bias while preserving generalization ability.
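The abstract does not give the exact NAS formula, so the sketch below is only a rough illustration of the underlying idea: score each attention head by how much attention the final prompt position pays to the negative answer candidate ("No") supplied in the instruction. The model checkpoint, prompt template, token lookup, and scoring rule are assumptions made for illustration, not the paper's implementation.

```python
# Illustrative sketch (assumed scoring rule, not the paper's exact NAS):
# rank attention heads by the attention mass the last prompt position
# places on the negative answer candidate ("No") in the instruction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
# "eager" attention is needed so the model can return per-head attention maps.
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

prompt = (
    "Answer the question with Yes or No.\n"
    "Question: Is 17 a prime number?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt")

# Locate the negative candidate token "No" in the instruction (assumed lookup).
no_id = tokenizer.encode(" No", add_special_tokens=False)[-1]
ids = inputs["input_ids"][0].tolist()
neg_pos = ids.index(no_id) if no_id in ids else None

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer.
scores = {}
if neg_pos is not None:
    for layer, attn in enumerate(out.attentions):
        head_scores = attn[0, :, -1, neg_pos]  # last position -> "No" token
        for head, s in enumerate(head_scores.tolist()):
            scores[(layer, head)] = s

# Heads with the highest scores are candidates for "negatively biased" heads.
top_heads = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top_heads)
```

In this sketch, heads that consistently place high attention on the negative candidate across many prompts, independent of the question content, would be the ones a NASA-style parameter-efficient fine-tuning step targets.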