Machine Unlearning of Pre-trained Large Language Models
Abstract: This study investigates the concept of the "right to be forgotten" within the context of large language models (LLMs). We explore machine unlearning as a pivotal solution, focusing on pre-trained models, a notably under-researched area. Our research delineates a comprehensive framework for machine unlearning in pre-trained LLMs, encompassing a critical analysis of seven diverse unlearning methods. Through rigorous evaluation on curated datasets from arXiv, books, and GitHub, we establish a robust benchmark for unlearning performance, demonstrating that these methods are over $10^5$ times more computationally efficient than retraining. Our results show that integrating gradient ascent with gradient descent on in-distribution data improves hyperparameter robustness. We also provide detailed guidelines for efficient hyperparameter tuning in the unlearning process. Our findings advance the discourse on ethical AI practices, offering substantive insights into the mechanics of machine unlearning for pre-trained LLMs and underscoring the potential for responsible AI development.
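The abstract's key technical finding combines gradient ascent on the data to be forgotten with ordinary gradient descent on in-distribution (retain) data. Below is a minimal sketch of one update step under that objective, assuming a Hugging Face-style causal language model whose forward pass accepts `labels` and returns a `.loss`; the names `unlearning_step`, `forget_batch`, `retain_batch`, and the weighting `alpha` are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of a joint ascent/descent unlearning step, assuming batches are
# dicts with "input_ids" and "attention_mask" (no "labels" key), as produced
# by a typical Hugging Face data collator.
import torch


def unlearning_step(model, optimizer, forget_batch, retain_batch, alpha=1.0):
    """One update: gradient ascent on forget data, descent on retain data."""
    optimizer.zero_grad()

    # Standard causal-LM loss on the forget set; it enters the total loss
    # with a negative sign, so the optimizer's descent direction performs
    # gradient *ascent* on this term.
    forget_loss = model(**forget_batch,
                        labels=forget_batch["input_ids"]).loss

    # Ordinary descent term on in-distribution (retain) data, which the
    # paper reports improves robustness to hyperparameter choices.
    retain_loss = model(**retain_batch,
                        labels=retain_batch["input_ids"]).loss

    loss = alpha * retain_loss - forget_loss
    loss.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```

In practice, unbounded ascent can destroy general model utility, so such a loop is typically run for a small number of steps with early stopping on a utility benchmark; `alpha` trades off forgetting strength against retention.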