
Model Editing at Scale leads to Gradual and Catastrophic Forgetting

Published 15 Jan 2024 in cs.CL, cs.AI, and cs.IR | arXiv:2401.07453v4

Abstract: Editing knowledge in LLMs is an attractive capability to have which allows us to correct incorrectly learnt facts during pre-training, as well as update the model with an ever-growing list of new facts. While existing model editing techniques have shown promise, they are usually evaluated using metrics for reliability, specificity and generalization over one or few edits. We argue that for model editing to have practical utility, we must be able to make multiple edits to the same model. With this in mind, we evaluate the current model editing methods at scale, focusing on two state of the art methods: ROME and MEMIT. We find that as the model is edited sequentially with multiple facts, it continually forgets previously edited facts and the ability to perform downstream tasks. This forgetting happens in two phases -- an initial gradual but progressive forgetting phase followed by abrupt or catastrophic forgetting phase. Both gradual and catastrophic forgetting limit the usefulness of model editing methods at scale -- the former making model editing less effective as multiple edits are made to the model while the latter caps the scalability of such model editing methods. Our analysis also highlights other key limitations of ROME and MEMIT at scale. With our work, we push for the development and evaluation of model editing methods keeping scalability in mind.


Summary

  • The paper identifies that scaling model editing leads to both gradual and catastrophic forgetting, with sequential edits degrading earlier updates.
  • It evaluates methods like ROME and MEMIT on models such as GPT2-XL and GPT-J, showing that editing proficiency declines after a threshold.
  • The study underscores the need for refined evaluation metrics to preserve core NLP functionalities during large-scale model edits.

Model Editing at Scale and its Implications

The paper "Model Editing at Scale leads to Gradual and Catastrophic Forgetting" explores model editing in LLMs, with a specific focus on scaling this capability. The research scrutinizes two state-of-the-art model editing methods—Rank-One Model Editing (ROME) and Mass-Editing Memory in a Transformer (MEMIT)—for scalability, editing proficiency, fact forgetting, and downstream performance.

Introduction

Model editing in LLMs allows for the correction of inaccurately learned facts and the updating of models with new information without requiring full retraining, which is costly and time-consuming. Existing techniques like ROME and MEMIT show potential in controlled settings but are usually evaluated on reliability, specificity, and generalization over one or a few edits. This work argues that the practical utility of model editing requires robustness to many sequential edits made to the same model, and it examines the limitations of ROME and MEMIT under exactly this regime, using editing proficiency, fact forgetting, and downstream task performance as evaluation metrics.
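The sequential-editing protocol the paper advocates can be sketched as a simple harness: apply edits one at a time and, after each edit, re-probe every previously inserted fact. Here `apply_edit` and `query` are hypothetical stand-ins for a real editor (such as ROME or MEMIT) and a real model; the toy dict-backed "model" never forgets and only serves to show the shape of the evaluation loop.

```python
def evaluate_sequential_editing(model, edits, apply_edit, query):
    """Apply edits one at a time; after each, measure retention of all edits so far."""
    retention_curve = []
    for i, (prompt, target) in enumerate(edits):
        model = apply_edit(model, prompt, target)   # make edit i
        seen = edits[: i + 1]                       # all edits made so far
        correct = sum(query(model, p) == t for p, t in seen)
        retention_curve.append(correct / len(seen))
    return retention_curve

# Toy stand-ins for demonstration only: a dict "model" that never forgets.
def apply_edit(model, prompt, target):
    model = dict(model)
    model[prompt] = target
    return model

def query(model, prompt):
    return model.get(prompt)

edits = [("capital of X", "Paris"), ("capital of Y", "Rome")]
print(evaluate_sequential_editing({}, edits, apply_edit, query))  # [1.0, 1.0]
```

With a real editor plugged in, the paper's finding is that this curve first declines gradually and then collapses abruptly.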

Model Editing Techniques and Datasets

The study examines ROME and MEMIT while using MEND and constrained fine-tuning (FT-C) as baselines. ROME applies a rank-one update to the weights of a single MLP layer to rewrite a fact, whereas MEMIT distributes the update across the weights of multiple layers, allowing many facts to be edited at once. The analysis evaluates both methods on GPT2-XL (1.5B) and GPT-J (6B). The CounterFact dataset is used for conducting edits due to its challenging counterfactual nature.
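As a concrete illustration of the rank-one mechanism underlying ROME, the following numpy sketch rewrites a single key-value association in a linear layer so that the edited key maps exactly to the desired value. The covariance matrix `C` here is a toy identity; the actual method estimates key statistics from large text corpora, and the vectors below are random placeholders, not real model activations.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))      # toy MLP projection weights
k_star = rng.normal(size=d)      # key vector selecting the edited fact
v_star = rng.normal(size=d)      # value vector encoding the new fact
C = np.eye(d)                    # key covariance (toy: identity)

# Rank-one update: choose Lambda so that W_new @ k_star == v_star,
# spreading the perturbation along the C^{-1} k_star direction.
c_inv_k = np.linalg.solve(C, k_star)
Lambda = (v_star - W @ k_star) / (c_inv_k @ k_star)
W_new = W + np.outer(Lambda, c_inv_k)

assert np.allclose(W_new @ k_star, v_star)       # edited fact now holds
print(np.linalg.matrix_rank(W_new - W))          # 1: the update is rank-one
```

Because each edit adds another rank-one perturbation to the same weights, repeated sequential edits accumulate interference—one intuition for the forgetting the paper measures.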

Evaluation of ROME

Editing Proficiency at Scale

ROME and MEMIT demonstrate effective knowledge editing at scale, outperforming methods like MEND, as evidenced by consistent success in sequentially performed edits (Figure 1). Initially, ROME maintains near-perfect efficacy on new edits until a threshold, after which performance declines gradually, indicating reduced editability.

Figure 1: Editing proficiency across different model editing methods on GPT-J (6B), highlighting efficacy, paraphrase, and specificity scores as a function of the number of edits.

Gradual and Catastrophic Forgetting

Sequential editing leads to two distinct phases of forgetting: gradual and catastrophic (Figure 2). Gradual forgetting is a progressive decline in the model's ability to recall earlier edits and perform downstream tasks, while catastrophic forgetting is a swift, profound collapse of both, triggered by a single destabilizing edit. Together, these phases severely limit the scalability of model editing.

Figure 2: Editing proficiency of FT-C, MEND, and ROME on GPT-J (6B), showing gradual and catastrophic forgetting.
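The two phases can be illustrated on a synthetic retention curve (toy data, not results from the paper): a slow linear decline modeling gradual forgetting, followed by a single abrupt collapse, with a simple detector for the catastrophic step.

```python
def find_catastrophic_step(retention, drop_threshold=0.2):
    """Return the index of the first edit whose one-step drop exceeds the threshold."""
    for i in range(1, len(retention)):
        if retention[i - 1] - retention[i] > drop_threshold:
            return i
    return None

# Synthetic curve: retention declines by 0.01 per edit (gradual phase),
# then collapses at edit 50 (catastrophic phase).
gradual = [1.0 - 0.01 * i for i in range(50)]
collapse = [0.1, 0.05, 0.05]
curve = gradual + collapse

print(find_catastrophic_step(curve))  # 50
```

A threshold on the one-step drop cleanly separates the two regimes here; on real curves the paper identifies the collapse with the single destabilizing edit that triggers it.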

Figure 3: Comparison of the forgetting rate between ROME and MEMIT.

Downstream Evaluation of Edited Models

The paper's downstream evaluation figure illustrates the degradation observed on downstream NLP tasks across the different model editing methods. Since the practical utility of model editing depends critically on its impact on the LLM's core functionalities, preserving downstream performance is crucial when making large-scale knowledge edits.

Conclusion

This paper highlights the limitations of current model editing techniques when applied at scale, revealing two distinct phases of forgetting: gradual and catastrophic. The study calls for refined evaluation measures that extend beyond common knowledge editing metrics to include the impact on the model's innate NLP capabilities. The work emphasizes the need for future research on model editing methods that address issues of scalability, including both gradual and catastrophic forgetting, and maintain the original model's functional capabilities.
