
Exact and Approximate Unlearning

Updated 20 February 2026
  • Exact and approximate unlearning is a set of methods enabling ML models to forget selected training data for privacy compliance.
  • Exact unlearning guarantees complete removal by matching retrained outputs, while approximate unlearning minimizes residual influence within a bounded error.
  • Techniques like full retraining, sharding, influence functions, and gradient rollbacks are used to balance computational cost, auditability, and model integrity.

Machine unlearning comprises a suite of algorithmic techniques that enable a machine learning model to “forget” the influence of specified training data—typically in response to data deletion or privacy requests—such that the resulting model behaves as if the forgotten data had never been used during training. Methods are generally classified into exact unlearning, which guarantees functional or distributional equivalence to retraining the model on the retained data, and approximate unlearning, which aims to efficiently reduce or minimize the residual data influence—typically evaluated under some metric of closeness to retraining but without a provable guarantee of erasure. This distinction has foundational implications for privacy, auditability, computational cost, and the validity of unlearning claims in legal or regulatory regimes.

1. Formal Definitions and Foundational Distinctions

Let $\mathcal{D}$ be a training dataset, $\mathcal{D}_r$ the retained subset, and $\mathcal{D}_f$ the subset to be forgotten. Let $A$ denote the complete machine learning procedure, including model architecture, optimizer, hyperparameters, and random seeds. The model parameters after training on $\mathcal{D}$ are $\theta_{\mathcal{D}} = A(\mathcal{D})$, and after retraining on $\mathcal{D}_r$, denoted $\theta_{\mathcal{D}_r} = A(\mathcal{D}_r)$.

Exact unlearning requires that the unlearning mechanism UU produces a model θ\theta^- such that

$$\theta^- = U\big(A(\mathcal{D}), \mathcal{D}, \mathcal{D}_f\big) = A(\mathcal{D}_r)$$

meaning that the unlearned model is identically distributed (and, for deterministic $A$, identical in parameters and outputs) to what would have been obtained had $\mathcal{D}_f$ never participated in training. This guarantee applies to all test inputs and output statistics (Thudi et al., 2021, Tran et al., 18 Apr 2025, Xu et al., 2023, Yang et al., 2024).

Approximate unlearning relaxes this requirement, demanding only that the unlearning process yields a model $\theta_u$ close to $\theta_{\mathcal{D}_r}$ under a prespecified metric $d$,

$$d(\theta_u, \theta_{\mathcal{D}_r}) \leq \epsilon$$

where $d$ may denote $\ell_2$ parameter distance or an output-space divergence such as the KL divergence,

$$d(\theta, \theta') = \|\theta - \theta'\|_2 \quad \text{or} \quad D_{\mathrm{KL}}\big(p(\cdot \mid \theta)\,\|\,p(\cdot \mid \theta')\big)$$

(Thudi et al., 2021, Tran et al., 18 Apr 2025, Xu et al., 2023). The choice of $\epsilon$ reflects an empirical or analytic upper bound on residual influence.
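Both closeness metrics can be computed directly from parameter vectors or predictive distributions. A minimal sketch (function names are illustrative, not from any of the cited papers):

```python
import numpy as np

def l2_param_distance(theta_u, theta_r):
    """ℓ2 distance between flattened parameter vectors."""
    return float(np.linalg.norm(theta_u - theta_r))

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) between two discrete predictive distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))
```

Either score can then be thresholded against $\epsilon$; identical models score zero under both metrics.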

A key critique is that, particularly for deep learning under stochastic optimization, there may exist disjoint datasets yielding identical models, rendering the standard $\epsilon$-closeness definition vacuous—one could claim to have unlearned without modifying the model at all (Thudi et al., 2021).

2. Algorithmic Paradigms for Exact and Approximate Unlearning

Exact Unlearning

  • Full retraining: Discard the original model and train a new model on $\mathcal{D}_r$ from scratch with the same random seed and setup. Guarantees complete removal of the forgotten data's influence, but is often computationally prohibitive for large models or frequent deletions (Thudi et al., 2021, Tran et al., 18 Apr 2025, Xu et al., 2023).
  • Sharding and isolation (SISA): Partition data into disjoint shards and sequential slices, train independently, and store intermediate model checkpoints. Upon deletion, only the affected shard and subsequent slices are retrained; the rest of the model remains untouched. This approach achieves exact unlearning while amortizing retraining cost, at the expense of increased storage and aggregation overhead (Xu et al., 2023, Yang et al., 2024, Chowdhury et al., 2024).
  • Parameter isolation and modular architectures: Approaches such as LegoNet (fixed-encoder, multiple adapters) or S3T (sequence-aware, sharded, and sliced fine-tuning) enable efficient exact unlearning by retraining or removing only the model components uniquely affected by the deleted data (Yu et al., 2022, Chowdhury et al., 2024).
  • Algorithmic stability and total-variation (TV) stability: Algorithms designed with intrinsic strong stability to deletions, e.g., TV-stable noisy-SGD, enable Las Vegas–style unlearning with rigorous risk bounds and only occasional retraining, proportional to the TV-stability parameter (Ullah et al., 2021).
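The sharding idea can be sketched with a toy learner. This is a deliberate simplification of SISA (it collapses the sequential slices and per-slice checkpoints into a single sub-model per shard; function names are illustrative):

```python
import numpy as np

def train(shard):
    """Stand-in learner: a deterministic sub-model (here, the shard mean)."""
    return float(np.mean(shard)) if shard else 0.0

def sisa_fit(data, num_shards=4):
    """Partition data into disjoint shards; train one sub-model per shard."""
    shards = [data[i::num_shards] for i in range(num_shards)]
    return shards, [train(s) for s in shards]

def sisa_unlearn(shards, models, point):
    """Exact unlearning: retrain only the shard that held the deleted point."""
    for i, shard in enumerate(shards):
        if point in shard:
            shard.remove(point)
            models[i] = train(shard)   # all other sub-models are untouched
            break
    return shards, models

shards, models = sisa_fit([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
shards, models = sisa_unlearn(shards, models, 5.0)
```

Because each sub-model depends only on its own shard, retraining the affected shard reproduces exactly the ensemble that would result from training without the deleted point under the same partition.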

Approximate Unlearning

  • Influence function updates: Use first-order or Newton approximations, typically involving the Hessian or Fisher matrix, to estimate and subtract the forgotten point’s influence on the learned parameter (Xu et al., 2023, Tran et al., 18 Apr 2025, Li et al., 2024). The canonical influence update is:

$$\theta_u = \theta_{\mathcal{D}} - H^{-1}_{\theta_{\mathcal{D}}}\, \nabla_\theta\, \ell(\theta_{\mathcal{D}}; x^*)$$

  • Gradient-based rollbacks and fine-tuning: Modify the trained model by performing a small number of gradient steps (often with modified or random labels) on the retained data, the forgotten data, or their union (e.g., catastrophic forgetting, boundary-shrink, knowledge distillation) (Tran et al., 18 Apr 2025, Xu et al., 2023).
  • Scrubbing and functional reoptimization: Iteratively minimize an objective balancing residual memory of the forgotten data and performance on the retained set, potentially through KL-based scrubbing, Fisher projection, or parameter regularization terms (Xu et al., 2023, Li et al., 2024).
  • Model sparsification: Empirically and theoretically, sparsity (through pruning) improves the effectiveness of approximate unlearning by reducing the approximation gap and localizing data influence to a small parameter subspace (Jia et al., 2023).
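The influence update is easiest to verify on a quadratic loss, where the Newton removal step recovers the retrained solution exactly; for general losses it is only a first-order approximation. A minimal ridge-regression sketch (sign and Hessian conventions vary across papers; this version uses the leave-one-out Hessian):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 3, 1e-3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def fit(X, y):
    """Ridge regression: minimizes ½‖Xθ − y‖² + ½λ‖θ‖²."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

theta = fit(X, y)

# Unlearn the last point (x*, y*) with a single Newton/influence step.
x_star, y_star = X[-1], y[-1]
H_minus = X[:-1].T @ X[:-1] + lam * np.eye(d)     # Hessian without x*
grad_star = (x_star @ theta - y_star) * x_star    # ∇θ ℓ(θ; x*)
theta_u = theta + np.linalg.solve(H_minus, grad_star)

# For quadratic loss the step is exact: it matches retraining on D_r.
theta_retrain = fit(X[:-1], y[:-1])
```

For deep networks the Hessian is intractable to invert exactly, which is why practical methods fall back on Fisher approximations or iterative solvers, at the cost of the error bounds noted above.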

A summary of canonical exact and approximate strategies is provided in the following table:

| Method | Paradigm | Exactness Guarantee |
|---|---|---|
| Full retraining | Exact | Yes, parameter/output equivalence |
| SISA | Exact | Yes, if retrain schedule is followed |
| Influence functions | Approximate | No, error bounded by Taylor expansion |
| Gradient rollbacks | Approximate | No, empirical $\ell_2$/output bound |
| Modular architectures (LegoNet, S3T) | Exact | Yes, within component granularity |
| TV-stable SGD | Exact (w.p. $1-\rho$) | Yes, with bounded risk, randomized |

3. Theoretical Limitations and Auditability

A central finding is that unlearning definitions or claims grounded solely in closeness of model parameters, outputs, or even training trajectories are fundamentally non-auditable due to forgeability (Thudi et al., 2021). Specifically:

  • For SGD and mean-sampler algorithms, it is possible (with high probability as batch size increases) to construct two disjoint datasets yielding the same final weights. Thus,

$$\exists\, \theta,\ \mathcal{D},\ \mathcal{D}' = \mathcal{D} \setminus \{x^*\}:\quad \theta = A(\mathcal{D}) = A(\mathcal{D}')$$

  • Attempts to verify unlearning through "proof-of-unlearning" logs or model distance are defeated by the possibility of forging logs via alternative batch traces that exclude the forgotten data but yield indistinguishable models (Thudi et al., 2021).
  • Only unlearning claims made at the algorithmic level—i.e., assertions and evidence that a specific, externally auditable unlearning procedure was executed without access to $\mathcal{D}_f$—are meaningful. Achieving this may require verifiable computation primitives, cryptographic audit logs, or trusted execution environments (Thudi et al., 2021).

As a consequence, approximate unlearning based solely on $\epsilon$-closeness is mathematically vacuous in the presence of forging; exact unlearning claims cannot be audited from models or logs, but only from process-level attestations.
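The forgeability argument can be illustrated with a toy "mean-sampler" learner (a deliberate simplification of the SGD construction in Thudi et al., 2021; the datasets and forget point are invented for illustration):

```python
def A(dataset):
    """Toy 'mean-sampler' learner: the model is just the dataset mean."""
    return sum(dataset) / len(dataset)

D      = [1.0, 2.0, 3.0, 4.0]   # training set containing forget point 4.0
D_forg = [0.0, 0.5, 4.5, 5.0]   # disjoint forged set with the same mean

# A(D) == A(D_forg): identical models from disjoint data, so parameter or
# output closeness alone cannot certify whether 4.0 was used in training.
```

An auditor inspecting only the final model (or a log of batch means) cannot distinguish training on D from training on the forged set.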

4. Measurement, Diagnostics, and Evaluation Metrics

The evaluation of unlearning effectiveness, particularly for approximate methods, is nontrivial:

  • Verification error ($\ell_2$ gap): The norm difference between the unlearned model and an exactly retrained model on $\mathcal{D}_r$ is a primary metric subsuming others (Thudi et al., 2021). However, this cannot be computed without retraining.
  • Membership inference attack (MIA) resistance: Ability of an attacker to infer the presence or absence of the forgotten data from the model's outputs or gradients. While widely used, MIAs are fundamentally binary and computationally costly, and fail to capture the continuum of unlearning completeness in approximate methods (Wang et al., 6 Jun 2025).
  • Interpolated Approximate Measurement (IAM): A recent framework that quantifies sample-level unlearning completeness by interpolating the generalization–fitting gap between models, yielding continuous scores for under- and over-unlearning. IAM achieves strong performance under both exact and approximate settings and is robust to data/model shifts (Wang et al., 6 Jun 2025).

Advanced diagnostics are critical because empirical studies reveal that approximate unlearners are prone to both under-unlearning (residual influence remains) and over-unlearning (unintentional loss of information about retained data) (Wang et al., 6 Jun 2025, Tran et al., 18 Apr 2025). IAM and other continuous scoring systems are advocated for systematic measurement and safeguard deployment.

5. Fairness, Robustness, and Structural Guarantees

Unlearning algorithms affect not only privacy and data influence, but also fairness (class-wise feature variance) and adversarial robustness:

  • Fairness-gap metric: Defined as the maximal difference in class-wise feature variance at a specified layer,

$$\epsilon^l = \max_c \sigma_c^l - \min_c \sigma_c^l$$

where higher $\epsilon^l$ indicates greater model sensitivity and fairness violations (Tran et al., 18 Apr 2025).

  • Empirical findings: Methods that track or preserve the original fairness-gap are more robust to adversarial attacks; approximate unlearning methods often inflate the fairness-gap, increasing vulnerability (Tran et al., 18 Apr 2025).
  • Layer-wise unlearning: Focusing unlearning updates on intermediate and final layers can efficiently restore original fairness and robustness with sub-linear computational/memory cost (Tran et al., 18 Apr 2025).
  • These findings recommend explicit joint monitoring of accuracy, fairness-gap, and adversarial robustness when evaluating unlearning algorithms for both privacy and trustworthiness.
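The fairness-gap metric is straightforward to compute from layer activations. A minimal sketch (how per-class variances are summarized to a single scalar $\sigma_c^l$ is an assumption here; per-dimension variances are averaged as one convention):

```python
import numpy as np

def fairness_gap(features, labels):
    """ε^l = max_c σ_c^l − min_c σ_c^l over class-wise feature variances
    at a given layer (per-dimension variances averaged per class)."""
    variances = []
    for c in np.unique(labels):
        class_feats = features[labels == c]
        variances.append(float(np.var(class_feats, axis=0).mean()))
    return max(variances) - min(variances)

rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 16))      # stand-in for layer-l activations
labels = rng.integers(0, 3, size=100)
gap = fairness_gap(feats, labels)
```

Tracking this score before and after an unlearning update flags the fairness-gap inflation reported for approximate methods.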

6. Specialized and Hybrid Frameworks

Emerging developments address domain- and resource-specific requirements for unlearning:

  • Model merging for scalable exact unlearning: SIFT-Masks achieves exact unlearning at scale (hundreds of tasks) via sign-constrained fine-tuning and task-local masks, enabling $O(1)$ unlearning per deletion while recovering most task performance (Kuo et al., 6 Apr 2025).
  • Resource-constrained (edge) exact unlearning: Approaches such as CAUSE combine sharding, adaptive pruning, and memory-efficient replacement to implement exact unlearning with orders-of-magnitude improvements in runtime, energy, and storage for devices with limited compute capacity (Xia et al., 2024).
  • Hybrid strategies: Adaptive frameworks dynamically select between exact and approximate unlearning modes based on estimated retraining workload, with lightweight correction of approximate updates to regain accuracy when full retraining is infeasible (Li et al., 2024).
  • Bayesian unlearning: Exact and approximate Bayesian unlearning algorithms use posterior modification (reverse KL or adjusted likelihood bounds) to balance forgetting and retention, with robust performance across likelihood models (Nguyen et al., 2020).

Additionally, several works focus on the overparameterized regime, where classical definitions based on loss minimization or interpolation are insufficient; minimum-complexity interpolation, regularized orthogonal gradient perturbations, and relabeling-based alternating optimization are proposed to achieve exact or near-exact unlearning (Yang et al., 2024, Block et al., 28 May 2025).

7. Open Challenges, Future Directions, and Practical Implications

  • Auditability remains a central challenge: Verifiable, cryptographically-audited, algorithmic-level guarantees are necessary to support legal and regulatory compliance for unlearning claims (Thudi et al., 2021).
  • Closing the exact–approximate gap: Model sparsification, improved diagnostics, and hybrid algorithms help reduce but do not eliminate the trade-off between computational efficiency and strict data removal (Jia et al., 2023, Li et al., 2024, Wang et al., 6 Jun 2025).
  • Generalization beyond convex and linear models: Many principled guarantees rely on convexity or overparameterized linear regimes; robust methods for non-convex deep networks and transformers are an ongoing area of investigation (Yang et al., 2024, Block et al., 28 May 2025, Chowdhury et al., 2024).
  • Robustness and fairness: Monitoring fairness-gap and adversarial accuracy is essential to avoid degraded performance in high-stakes or regulated environments (Tran et al., 18 Apr 2025).
  • Practical systems and operational integration: Parameter-efficient fine-tuning, modular architectures, and partitioned sequence training enable scalable deployments, but at the cost of increased offline storage or permutation selection complexity (Yu et al., 2022, Chowdhury et al., 2024, Xia et al., 2024).

Ongoing research is directed at scalable, robust, and auditable exact unlearning, closing the measurement and diagnostic gap in practical settings, and incorporating unlearning guarantees into standard machine learning pipelines and privacy-preserving deployments.

