
ε-Label Differential Privacy

Updated 3 February 2026
  • ε-Label Differential Privacy is a variant that privatizes only dataset labels while keeping features public, ensuring targeted privacy in supervised learning.
  • It employs mechanisms like randomized response, clustering, and additive noise to balance statistical efficiency with strong privacy guarantees.
  • The approach is applied in synthetic data generation, medical inference, and privacy-preserving advertising, offering practical benefits for model training and regularization.

ε-Label Differential Privacy (ε-label DP) is a specialized variant of differential privacy designed for settings where only the labels in a dataset are considered sensitive, while features are assumed public or non-sensitive. This framework offers provable privacy guarantees tailored for supervised learning, synthetic data generation, and privacy-aware analytics, enabling strong label privacy with much improved statistical efficiency compared to full differential privacy. The formalism, mechanisms, and consequences of ε-label DP have been the subject of intensive research in both theoretical and applied machine learning.

1. Formal Definition and Core Principles

The central formalism of ε-label differential privacy is as follows. Let X denote the (public) feature space and Y the set of private labels. For datasets D = {(x_1, y_1), …, (x_n, y_n)} and D′ that differ only in the label of a single record, a randomized algorithm A is said to satisfy ε-label DP if, for all measurable sets S in the algorithm's output space,

Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S].

This is a restriction of standard (ε, 0)-differential privacy to label-neighboring datasets, i.e., only changes to the label of a single instance are considered. In the local model ("ε-label LDP"), the requirement is that, for all x and all y, y′ ∈ Y,

Pr[M(x, y) ∈ S] ≤ e^ε · Pr[M(x, y′) ∈ S].

This strictly relaxes full DP, with the operational distinction that only the label is randomized, not the features.

Key properties include:

  • Post-processing invariance: Any computation on the privatized output retains the ε-label DP guarantee.
  • Composability: Sequential application of two ε-label DP mechanisms with budgets ε₁ and ε₂ yields overall (ε₁ + ε₂)-label DP.

This formulation is adopted and analyzed in (Beimel et al., 2014; Wu et al., 2022; 2405.15150; Malek et al., 2021), among other works.

2. Mechanisms for Achieving ε-Label Differential Privacy

A diverse range of mechanisms has been developed to achieve ε-label DP, each with tradeoffs in utility, computational cost, and privacy semantics.

2.1 Randomized Response and Variants

The classical randomized response (RR) mechanism, when applied to labels, outputs the true label with probability e^ε / (e^ε + K − 1) and any other label uniformly at random. This achieves sharp ε-label DP for categorical labels (Ghazi et al., 2021).
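The mechanism above can be sketched in a few lines; the probabilities follow directly from the text, while the function name and interface are illustrative:

```python
import math
import random

def randomized_response(y: int, num_classes: int, epsilon: float, rng: random.Random) -> int:
    """K-ary randomized response on a label: keep the true label with
    probability e^eps / (e^eps + K - 1); otherwise emit one of the K - 1
    other labels uniformly at random."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < p_keep:
        return y
    # uniform draw over the K - 1 labels other than y
    other = rng.randrange(num_classes - 1)
    return other if other < y else other + 1
```

The likelihood ratio between any two candidate input labels for a fixed output is exactly e^ε, which is why the guarantee is sharp.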

Enhancements include:

  • RR with Prior (Ghazi et al., 2021): Incorporates a prior distribution over labels to maximize the probability of retaining the correct label, subject to maintaining ε-label DP, via a top-k truncation.
  • Two-Stage or Multi-Stage Procedures: Initial learning with privatized labels is used to estimate improved priors, which then feed into a second stage of RR-based privatization for further utility gain. Empirical evidence shows up to 20% accuracy improvement on vision benchmarks for two-stage procedures versus uniform RR (Ghazi et al., 2021).
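The top-k truncation can be sketched as follows, under the simplifying assumption that the prior is given explicitly as a probability vector; k is chosen to maximize the chance of emitting the true label, and labels outside the top-k set map to a uniform draw from it, which preserves the guarantee:

```python
import math
import random

def rr_with_prior(y: int, prior: list, epsilon: float, rng: random.Random) -> int:
    """Sketch of RR-with-prior: randomized response restricted to the top-k
    labels under the prior, with k chosen to maximize the probability that
    the true label survives."""
    e = math.exp(epsilon)
    order = sorted(range(len(prior)), key=lambda i: -prior[i])
    # pick k maximizing (top-k prior mass) * e / (e + k - 1)
    best_k, best_val, mass = 1, -1.0, 0.0
    for k in range(1, len(prior) + 1):
        mass += prior[order[k - 1]]
        val = mass * e / (e + k - 1)
        if val > best_val:
            best_k, best_val = k, val
    top = order[:best_k]
    if y in top and rng.random() < e / (e + best_k - 1):
        return y
    # either the RR coin flipped, or y lies outside the top-k set
    others = [c for c in top if c != y]
    return others[rng.randrange(len(others))] if others else y
```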

2.2 Clustering-Based and Vector Approximation Mechanisms

Label DP clustering methods first partition instance features (which are non-private) and then resample or perturb labels within each cluster (Esfandiari et al., 2021). When clusters are large and homogeneous, privacy-utility tradeoff improves dramatically; small or inhomogeneous clusters limit privacy or degrade accuracy.

Vector approximation methods privatize K-class labels as binary vectors in {0, 1}^K instead of scalar labels, retaining more label signal and improving learning under ε-label DP when the class count is large (2405.15150).
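The exact mechanism of (2405.15150) is more involved; as one simple (hedged) instantiation of the binary-vector idea, each coordinate of the one-hot encoding can be flipped with binary randomized response, splitting the budget over the two coordinates a label change can touch:

```python
import math
import random

def privatize_one_hot(y: int, num_classes: int, epsilon: float, rng: random.Random) -> list:
    """Illustrative only (not the paper's exact scheme): encode the label as
    a one-hot binary vector and flip each bit with binary randomized
    response. A label change alters exactly two coordinates, so spending
    epsilon/2 per coordinate yields epsilon-label DP by composition."""
    keep = math.exp(epsilon / 2) / (1 + math.exp(epsilon / 2))
    bits = [1 if i == y else 0 for i in range(num_classes)]
    return [b if rng.random() < keep else 1 - b for b in bits]
```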

2.3 Additive Noise and Aggregation

  • ALIBI mechanism (Malek et al., 2021): Adds Laplace noise to each coordinate of the one-hot label vector. Bayesian inference is then used iteratively to denoise during model learning, achieving pure ε-label DP.
  • Aggregation mechanisms (Brahmbhatt et al., 2023): Randomly weighted aggregation of features and labels into bag-level statistics, optionally combined with additive noise, ensures (ε, δ)-label DP for regression while avoiding explicit label perturbation for each record.
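The noising step of the ALIBI-style approach can be sketched as below (the Bayesian denoising during training is omitted); sampling Laplace noise as a difference of two exponentials keeps the example standard-library-only:

```python
import random

def noisy_one_hot(y: int, num_classes: int, epsilon: float, rng: random.Random) -> list:
    """Sketch of the Laplace-noising step: add Laplace noise to each
    coordinate of the one-hot label vector. Swapping the label changes two
    coordinates by 1 each (L1 sensitivity 2), so scale 2/epsilon gives pure
    epsilon-label DP."""
    scale = 2.0 / epsilon
    def laplace() -> float:
        # the difference of two Exp(1) variables is Laplace(0, 1)
        return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return [(1.0 if i == y else 0.0) + laplace() for i in range(num_classes)]
```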

2.4 Regression-Oriented: RR on Bins, RPWithPrior, Optimal Unbiased Randomizers

  • RR-On-Bins (Ghazi et al., 2022): Discretizes labels into bins and applies RR over the bins; the number and placement of bins are optimized for the given loss and prior.
  • RPWithPrior (Liu et al., 30 Jan 2026): Introduces the first non-additive, continuous RR mechanism for regression, focusing probability mass around the true label. Outperforms discretization-based RR methods.
  • Optimal Unbiased Randomizers (Badanidiyuru et al., 2023): Via a small linear program with unbiasedness constraints, constructs label randomization mechanisms that minimize variance and allow unbiased training of regression learners with guaranteed privacy.
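The discretize-then-RR idea behind RR-On-Bins can be sketched as follows; the bin-optimization step described above is omitted, and the bin list is a hypothetical input:

```python
import math
import random

def rr_on_bins(value: float, bins: list, epsilon: float, rng: random.Random) -> float:
    """Sketch of RR-on-bins for regression: map the label to its bin, apply
    K-ary randomized response over bin indices, and report the midpoint of
    the (possibly flipped) bin. `bins` is a list of (lo, hi) intervals
    covering the label range."""
    k = len(bins)
    idx = next(i for i, (lo, hi) in enumerate(bins) if lo <= value <= hi)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() >= p_keep:
        # flip to one of the k - 1 other bins uniformly
        other = rng.randrange(k - 1)
        idx = other if other < idx else other + 1
    lo, hi = bins[idx]
    return (lo + hi) / 2.0
```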

Summary Table: Key Label DP Mechanisms

| Mechanism Type | Privacy Guarantee | Output/Noise |
|---|---|---|
| Randomized Response (RR) | Pure ε-label DP | Label flipping (categorical) |
| RR with Prior | Pure ε-label DP | Flip using learned prior |
| Vector Approximation | Pure ε-label (local) DP | Binary vector output |
| ALIBI | Pure ε-label DP | Laplace noise on one-hot |
| Clustering-based resampling | Pure/approx. ε-label DP | Label resampled in cluster |
| RPWithPrior | Pure ε-label DP (regression) | Interval-based continuous randomization |
| RR-On-Bins | Pure ε-label DP (regression) | Discretized bins |
| Weighted Aggregation | (ε, δ)-label DP | Bag aggregates (regression) |

3. Theoretical Properties, Generalization Bounds, and Sample Complexity

ε-label DP yields unique theoretical guarantees, including:

  • Sample Complexity: For PAC learning, label DP retains the optimal O(VC(C)) sample complexity, the same order as non-private learning, unlike full DP, which may require exponentially more samples for complex tasks (Beimel et al., 2014).
  • Risk Bounds: For regression and classification, minimax lower and upper bounds are established (Zhao et al., 20 Feb 2025). Under local ε-label DP, excess risk rates are polynomially faster than under full DP. For classification, the minimax rate is

Ω( [n(ε² ∧ 1)]^(−β(γ+1)/(2β+d)) )

for n samples, Hölder smoothness β, feature dimension d, and margin parameter γ. In the central model, the gain is only a constant factor.

  • Semantic Guarantees: No adversary can gain more than 1 − 2/(1 + e^ε) in additional label-inference advantage over the Bayes classifier (Wu et al., 2022). As ε → 0, the model reveals no more label information than the Bayes-optimal predictor using only features.
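The semantic bound is easy to evaluate numerically; the function name below is illustrative, and the formula is the one stated in the bullet above:

```python
import math

def max_excess_advantage(epsilon: float) -> float:
    """Worst-case excess label-inference advantage over the Bayes classifier
    permitted under epsilon-label DP: 1 - 2 / (1 + e^epsilon)."""
    return 1.0 - 2.0 / (1.0 + math.exp(epsilon))

# At epsilon = 0 the bound is exactly 0: no advantage beyond the Bayes
# classifier; the bound increases monotonically toward 1 as epsilon grows.
```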

4. Practical Implementations and Empirical Results

Mechanisms are empirically validated on standardized ML tasks (CIFAR-10, CIFAR-100, MNIST, tabular data). Key findings include:

  • Regularization Benefits: Label flipping, as in ε-label DP randomized response, regularizes both discriminative and generative models, improving generalization and sometimes yielding utility superior to the non-private baseline due to label noise-induced regularization (Cunningham et al., 2022).
  • Regression Accuracy: RPWithPrior achieves the lowest or tied-lowest MSE at all privacy budgets, outperforming Laplace, Gaussian, and RR-on-bins on real datasets (Liu et al., 30 Jan 2026).
  • Scalability: Aggregation mechanisms (Brahmbhatt et al., 2023) enable privacy-preserving regression without per-record noise, maintaining near-optimal utility.
  • High-Class Regimes: Vector approximation mechanisms achieve much better utility than RR as the number of classes grows large (2405.15150).

5. Semantics, Auditing, and Limits of Protection

  • Auditing Label Privacy: Observational auditing (Kalemaj et al., 18 Nov 2025) provides methodology to empirically certify claims of (ε, δ)-label DP by analyzing outputs of the target model and probing label-inference success rates on synthetic canaries, calibrated against theoretical DP tail bounds.
  • Attacks and Limitations: Label DP only bounds the adversary's excess in label inference above Bayes-optimal performance (see the semantic guarantee in Section 3); absolute attack accuracy may remain high if features are predictive. At ε = 0, all inference power is dictated by the public features alone (Wu et al., 2022).
  • Regime Selection: For strong protection, ε should be calibrated to the application's baseline Bayes risk and the permitted excess leakage, using the explicit semantic bound in Section 3.
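One way to perform this calibration is to invert the semantic bound from Section 3: if the permitted excess advantage is a, then e^ε = (1 + a)/(1 − a). A minimal sketch, with an illustrative function name:

```python
import math

def epsilon_for_advantage(max_adv: float) -> float:
    """Largest epsilon whose worst-case excess label-inference advantage,
    1 - 2/(1 + e^epsilon), stays at or below max_adv (0 <= max_adv < 1).
    Inverting the bound gives epsilon = ln((1 + a) / (1 - a))."""
    if not 0.0 <= max_adv < 1.0:
        raise ValueError("max_adv must lie in [0, 1)")
    return math.log((1.0 + max_adv) / (1.0 - max_adv))
```

For example, permitting at most a 10% excess advantage over the Bayes classifier yields ε ≈ 0.2, which illustrates how demanding small-advantage requirements are.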

6. Advanced Topics: Central vs. Local DP, Aggregation, and Open Problems

  • Central vs Local: Local ε-label DP (LDP) yields dramatic improvements in learning rates versus full LDP; central label DP (CDP) confers only constant-factor gains over full central DP (Zhao et al., 20 Feb 2025).
  • Aggregation for Regression: Weighted LBA achieves (ε, δ)-label DP for regression without per-sample label corruption by using random aggregation, enabling highly efficient privacy-preserving training (Brahmbhatt et al., 2023).
  • Open Problems: Optimizing algorithms for high-dimensional settings, extending to (ε, δ)-approximate label DP or Rényi DP, empirically tightening theory–practice gaps in PATE mechanisms, and improving mechanisms for structured (non-scalar) outputs are ongoing research directions.

7. Applications and Impact

ε-label differential privacy is now a foundational approach in privacy-sensitive supervised learning, private synthetic data generation (notably spatial), privacy-preserving advertising, and secure medical or demographic inference where features are (semi-)public and labels are the primary concern. The flexibility in mechanism design, favorable sample complexity, and regularization effects have facilitated its adoption in both academic and operational privacy-preserving ML. Calibration of ε is a key challenge, informed by the Bayes risk and explicit semantic bounds (Wu et al., 2022), and post-hoc empirical auditing is increasingly adopted for real-world deployments (Kalemaj et al., 18 Nov 2025).
