
Exact and Approximate Unlearning

Updated 20 February 2026
  • Exact and approximate unlearning is a set of methods enabling ML models to forget selected training data for privacy compliance.
  • Exact unlearning guarantees complete removal by matching retrained outputs, while approximate unlearning minimizes residual influence within a bounded error.
  • Techniques like full retraining, sharding, influence functions, and gradient rollbacks are used to balance computational cost, auditability, and model integrity.

Machine unlearning comprises a suite of algorithmic techniques that enable a machine learning model to “forget” the influence of specified training data—typically in response to data deletion or privacy requests—such that the resulting model behaves as if the forgotten data had never been used during training. Methods are generally classified into exact unlearning, which guarantees functional or distributional equivalence to retraining the model on the retained data, and approximate unlearning, which aims to efficiently reduce or minimize the residual data influence—typically evaluated under some metric of closeness to retraining but without a provable guarantee of erasure. This distinction has foundational implications for privacy, auditability, computational cost, and the validity of unlearning claims in legal or regulatory regimes.

1. Formal Definitions and Foundational Distinctions

Let $\mathcal{D}$ be a training dataset, $\mathcal{D}_r$ the retained subset, and $\mathcal{D}_f$ the subset to be forgotten. Let $A$ denote the complete machine learning procedure, including model architecture, optimizer, hyperparameters, and random seeds. The model parameters after training on $\mathcal{D}$ are $\theta_{\mathcal{D}} = A(\mathcal{D})$, and after retraining on $\mathcal{D}_r$, denoted $\theta_{\mathcal{D}_r} = A(\mathcal{D}_r)$.

Exact unlearning requires that the unlearning mechanism UU produces a model θ\theta^- such that

$$\theta^- = U\big(A(\mathcal{D}), \mathcal{D}, \mathcal{D}_f\big) = A(\mathcal{D}_r)$$

meaning that the unlearned model is identically distributed (and, for deterministic $A$, identical in parameters and outputs) to what would have been obtained had $\mathcal{D}_f$ never participated in training. This guarantee applies to all test inputs and output statistics (Thudi et al., 2021, Tran et al., 18 Apr 2025, Xu et al., 2023, Yang et al., 2024).

Approximate unlearning relaxes this requirement, demanding only that the unlearning process yields a model $\theta_u$ close to $\theta_{\mathcal{D}_r}$ under a prespecified metric $d$,

$$d(\theta_u, \theta_{\mathcal{D}_r}) \leq \epsilon$$

where $d$ may denote $\ell_2$ parameter distance or an output-space divergence such as the KL divergence,

$$d(\theta, \theta') = \|\theta - \theta'\|_2 \quad \text{or} \quad D_{\mathrm{KL}}\big(p(\cdot \mid \theta)\,\|\,p(\cdot \mid \theta')\big)$$

(Thudi et al., 2021, Tran et al., 18 Apr 2025, Xu et al., 2023). The choice of $\epsilon$ reflects an empirical or analytic upper bound on residual influence.
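Both closeness metrics can be computed directly from parameter vectors or predictive distributions. A minimal sketch (function names are illustrative, not from any of the cited papers):

```python
import numpy as np

def l2_param_distance(theta_u, theta_r):
    """ℓ2 distance between flattened parameter vectors."""
    return float(np.linalg.norm(theta_u - theta_r))

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) between two discrete predictive distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))
```

Either score can then be thresholded against $\epsilon$; identical models score zero under both metrics.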

A key critique is that, particularly for deep learning under stochastic optimization, there may exist disjoint datasets yielding identical models, rendering the standard $\epsilon$-closeness definition vacuous—one could claim to have unlearned without modifying the model at all (Thudi et al., 2021).

2. Algorithmic Paradigms for Exact and Approximate Unlearning

Exact Unlearning

  • Full retraining: Discard the original model and train a new model on $\mathcal{D}_r$ from scratch with the same random seed and setup. Guarantees complete removal of the forgotten data's influence, but is often computationally prohibitive for large models or frequent deletions (Thudi et al., 2021, Tran et al., 18 Apr 2025, Xu et al., 2023).
  • Sharding and isolation (SISA): Partition data into disjoint shards and sequential slices, train independently, and store intermediate model checkpoints. Upon deletion, only the affected shard and subsequent slices are retrained; the rest of the model remains untouched. This approach achieves exact unlearning while amortizing retraining cost, at the expense of increased storage and aggregation overhead (Xu et al., 2023, Yang et al., 2024, Chowdhury et al., 2024).
  • Parameter isolation and modular architectures: Approaches such as LegoNet (fixed-encoder, multiple adapters) or S3T (sequence-aware, sharded, and sliced fine-tuning) enable efficient exact unlearning by retraining or removing only the model components uniquely affected by the deleted data (Yu et al., 2022, Chowdhury et al., 2024).
  • Algorithmic stability and total-variation (TV) stability: Algorithms designed with intrinsic strong stability to deletions, e.g., TV-stable noisy-SGD, enable Las Vegas–style unlearning with rigorous risk bounds and only occasional retraining, proportional to the TV-stability parameter (Ullah et al., 2021).
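The sharding idea can be sketched with a toy learner. This is a deliberate simplification of SISA (it collapses the sequential slices and per-slice checkpoints into a single sub-model per shard; function names are illustrative):

```python
import numpy as np

def train(shard):
    """Stand-in learner: a deterministic sub-model (here, the shard mean)."""
    return float(np.mean(shard)) if shard else 0.0

def sisa_fit(data, num_shards=4):
    """Partition data into disjoint shards; train one sub-model per shard."""
    shards = [data[i::num_shards] for i in range(num_shards)]
    return shards, [train(s) for s in shards]

def sisa_unlearn(shards, models, point):
    """Exact unlearning: retrain only the shard that held the deleted point."""
    for i, shard in enumerate(shards):
        if point in shard:
            shard.remove(point)
            models[i] = train(shard)   # all other sub-models are untouched
            break
    return shards, models

shards, models = sisa_fit([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
shards, models = sisa_unlearn(shards, models, 5.0)
```

Because each sub-model depends only on its own shard, retraining the affected shard reproduces exactly the ensemble that would result from training without the deleted point under the same partition.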

Approximate Unlearning

  • Influence function updates: Use first-order or Newton approximations, typically involving the Hessian or Fisher matrix, to estimate and subtract the forgotten point’s influence on the learned parameter (Xu et al., 2023, Tran et al., 18 Apr 2025, Li et al., 2024). The canonical influence update is:

$$\theta_u = \theta_{\mathcal{D}} - H^{-1}_{\theta_{\mathcal{D}}}\, \nabla_\theta\, \ell(\theta_{\mathcal{D}}; x^*)$$

  • Gradient-based rollbacks and fine-tuning: Modify the trained model by performing a small number of gradient steps (often with modified or random labels) on the retained data, the forgotten data, or their union (e.g., catastrophic forgetting, boundary-shrink, knowledge distillation) (Tran et al., 18 Apr 2025, Xu et al., 2023).
  • Scrubbing and functional reoptimization: Iteratively minimize an objective balancing residual memory of the forgotten data and performance on the retained set, potentially through KL-based scrubbing, Fisher projection, or parameter regularization terms (Xu et al., 2023, Li et al., 2024).
  • Model sparsification: Empirically and theoretically, sparsity (through pruning) improves the effectiveness of approximate unlearning by reducing the approximation gap and localizing data influence to a small parameter subspace (Jia et al., 2023).
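The influence update is easiest to verify on a quadratic loss, where the Newton removal step recovers the retrained solution exactly; for general losses it is only a first-order approximation. A minimal ridge-regression sketch (sign and Hessian conventions vary across papers; this version uses the leave-one-out Hessian):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 3, 1e-3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def fit(X, y):
    """Ridge regression: minimizes ½‖Xθ − y‖² + ½λ‖θ‖²."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

theta = fit(X, y)

# Unlearn the last point (x*, y*) with a single Newton/influence step.
x_star, y_star = X[-1], y[-1]
H_minus = X[:-1].T @ X[:-1] + lam * np.eye(d)     # Hessian without x*
grad_star = (x_star @ theta - y_star) * x_star    # ∇θ ℓ(θ; x*)
theta_u = theta + np.linalg.solve(H_minus, grad_star)

# For quadratic loss the step is exact: it matches retraining on D_r.
theta_retrain = fit(X[:-1], y[:-1])
```

For deep networks the Hessian is intractable to invert exactly, which is why practical methods fall back on Fisher approximations or iterative solvers, at the cost of the error bounds noted above.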

A summary of canonical exact and approximate strategies is provided in the following table:

| Method | Paradigm | Exactness Guarantee |
|---|---|---|
| Full retraining | Exact | Yes, parameter/output equivalence |
| SISA | Exact | Yes, if retrain schedule is followed |
| Influence functions | Approximate | No, error bounded by Taylor expansion |
| Gradient rollbacks | Approximate | No, empirical $\ell_2$/output bound |
| Modular architectures (LegoNet, S3T) | Exact | Yes, within component granularity |
| TV-stable SGD | Exact (w.p. $1-\rho$) | Yes, with bounded risk, randomized |

3. Theoretical Limitations and Auditability

A central finding is that unlearning definitions or claims grounded solely in closeness of model parameters, outputs, or even training trajectories are fundamentally non-auditable due to forgeability (Thudi et al., 2021). Specifically:

  • For SGD and mean-sampler algorithms, it is possible (with high probability as batch size increases) to construct two disjoint datasets yielding the same final weights. Thus,

$$\exists\, \theta,\ \mathcal{D},\ \mathcal{D}' = \mathcal{D} \setminus \{x^*\}:\quad \theta = A(\mathcal{D}) = A(\mathcal{D}')$$

  • Attempts to verify unlearning through "proof-of-unlearning" logs or model distance are defeated by the possibility of forging logs via alternative batch traces that exclude the forgotten data but yield indistinguishable models (Thudi et al., 2021).
  • Only unlearning claims made at the algorithmic level—i.e., assertions and evidence that a specific, externally auditable unlearning procedure was executed without access to $\mathcal{D}_f$—are meaningful. Achieving this may require verifiable computation primitives, cryptographic audit logs, or trusted execution environments (Thudi et al., 2021).

As a consequence, approximate unlearning based solely on $\epsilon$-closeness is mathematically vacuous in the presence of forging; exact unlearning claims cannot be audited from models or logs, but only from process-level attestations.
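The forgeability argument can be illustrated with a toy "mean-sampler" learner (a deliberate simplification of the SGD construction in Thudi et al., 2021; the datasets and forget point are invented for illustration):

```python
def A(dataset):
    """Toy 'mean-sampler' learner: the model is just the dataset mean."""
    return sum(dataset) / len(dataset)

D      = [1.0, 2.0, 3.0, 4.0]   # training set containing forget point 4.0
D_forg = [0.0, 0.5, 4.5, 5.0]   # disjoint forged set with the same mean

# A(D) == A(D_forg): identical models from disjoint data, so parameter or
# output closeness alone cannot certify whether 4.0 was used in training.
```

An auditor inspecting only the final model (or a log of batch means) cannot distinguish training on D from training on the forged set.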

4. Measurement, Diagnostics, and Evaluation Metrics

The evaluation of unlearning effectiveness, particularly for approximate methods, is nontrivial:

  • Verification error ($\ell_2$ gap): The norm difference between the unlearned model and an exactly retrained model on $\mathcal{D}_r$ is a primary metric subsuming others (Thudi et al., 2021). However, this cannot be computed without retraining.
  • Membership inference attack (MIA) resistance: Ability of an attacker to infer the presence or absence of the forgotten data from the model's outputs or gradients. While widely used, MIAs are fundamentally binary and computationally costly, and fail to capture the continuum of unlearning completeness in approximate methods (Wang et al., 6 Jun 2025).
  • Interpolated Approximate Measurement (IAM): A recent framework that quantifies sample-level unlearning completeness by interpolating the generalization–fitting gap between models, yielding continuous scores for under- and over-unlearning. IAM achieves strong performance under both exact and approximate settings and is robust to data/model shifts (Wang et al., 6 Jun 2025).

Advanced diagnostics are critical because empirical studies reveal that approximate unlearners are prone to both under-unlearning (residual influence remains) and over-unlearning (unintentional loss of information about retained data) (Wang et al., 6 Jun 2025, Tran et al., 18 Apr 2025). IAM and other continuous scoring systems are advocated for systematic measurement and safeguard deployment.

5. Fairness, Robustness, and Structural Guarantees

Unlearning algorithms affect not only privacy and data influence, but also fairness (class-wise feature variance) and adversarial robustness:

  • Fairness-gap metric: Defined as the maximal difference in class-wise feature variance at a specified layer,

$$\epsilon^l = \max_c \sigma_c^l - \min_c \sigma_c^l$$

where higher $\epsilon^l$ indicates greater model sensitivity and fairness violations (Tran et al., 18 Apr 2025).

  • Empirical findings: Methods that track or preserve the original fairness-gap are more robust to adversarial attacks; approximate unlearning methods often inflate the fairness-gap, increasing vulnerability (Tran et al., 18 Apr 2025).
  • Layer-wise unlearning: Focusing unlearning updates on intermediate and final layers can efficiently restore original fairness and robustness with sub-linear computational/memory cost (Tran et al., 18 Apr 2025).
  • These findings recommend explicit joint monitoring of accuracy, fairness-gap, and adversarial robustness when evaluating unlearning algorithms for both privacy and trustworthiness.
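The fairness-gap metric is straightforward to compute from layer activations. A minimal sketch (how per-class variances are summarized to a single scalar $\sigma_c^l$ is an assumption here; per-dimension variances are averaged as one convention):

```python
import numpy as np

def fairness_gap(features, labels):
    """ε^l = max_c σ_c^l − min_c σ_c^l over class-wise feature variances
    at a given layer (per-dimension variances averaged per class)."""
    variances = []
    for c in np.unique(labels):
        class_feats = features[labels == c]
        variances.append(float(np.var(class_feats, axis=0).mean()))
    return max(variances) - min(variances)

rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 16))      # stand-in for layer-l activations
labels = rng.integers(0, 3, size=100)
gap = fairness_gap(feats, labels)
```

Tracking this score before and after an unlearning update flags the fairness-gap inflation reported for approximate methods.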

6. Specialized and Hybrid Frameworks

Emerging developments address domain- and resource-specific requirements for unlearning:

  • Model merging for scalable exact unlearning: SIFT-Masks achieves exact unlearning at scale (hundreds of tasks) via sign-constrained fine-tuning and task-local masks, enabling $O(1)$ unlearning per deletion while recovering most task performance (Kuo et al., 6 Apr 2025).
  • Resource-constrained (edge) exact unlearning: Approaches such as CAUSE combine sharding, adaptive pruning, and memory-efficient replacement to implement exact unlearning with orders-of-magnitude improvements in runtime, energy, and storage for devices with limited compute capacity (Xia et al., 2024).
  • Hybrid strategies: Adaptive frameworks dynamically select between exact and approximate unlearning modes based on estimated retraining workload, with lightweight correction of approximate updates to regain accuracy when full retraining is infeasible (Li et al., 2024).
  • Bayesian unlearning: Exact and approximate Bayesian unlearning algorithms use posterior modification (reverse KL or adjusted likelihood bounds) to balance forgetting and retention, with robust performance across likelihood models (Nguyen et al., 2020).

Additionally, several works focus on the overparameterized regime, where classical definitions based on loss minimization or interpolation are insufficient; minimum-complexity interpolation, regularized orthogonal gradient perturbations, and relabeling-based alternating optimization are proposed to achieve exact or near-exact unlearning (Yang et al., 2024, Block et al., 28 May 2025).

7. Open Challenges, Future Directions, and Practical Implications

  • Auditability remains a central challenge: Verifiable, cryptographically-audited, algorithmic-level guarantees are necessary to support legal and regulatory compliance for unlearning claims (Thudi et al., 2021).
  • Closing the exact–approximate gap: Model sparsification, improved diagnostics, and hybrid algorithms help reduce but do not eliminate the trade-off between computational efficiency and strict data removal (Jia et al., 2023, Li et al., 2024, Wang et al., 6 Jun 2025).
  • Generalization beyond convex and linear models: Many principled guarantees rely on convexity or overparameterized linear regimes; robust methods for non-convex deep networks and transformers are an ongoing area of investigation (Yang et al., 2024, Block et al., 28 May 2025, Chowdhury et al., 2024).
  • Robustness and fairness: Monitoring fairness-gap and adversarial accuracy is essential to avoid degraded performance in high-stakes or regulated environments (Tran et al., 18 Apr 2025).
  • Practical systems and operational integration: Parameter-efficient fine-tuning, modular architectures, and partitioned sequence training enable scalable deployments, but at the cost of increased offline storage or permutation selection complexity (Yu et al., 2022, Chowdhury et al., 2024, Xia et al., 2024).

Ongoing research is directed at scalable, robust, and auditable exact unlearning, closing the measurement and diagnostic gap in practical settings, and incorporating unlearning guarantees into standard machine learning pipelines and privacy-preserving deployments.

