DP-EBMs: Private, Explainable Boosting
- DP-EBMs are interpretable GAMs that blend boosting methods with Gaussian noise injection to enforce (ε,δ)-differential privacy on sensitive tabular data.
- They integrate additive models with privacy-preserving techniques like residual clipping and budget partitioning across boosting rounds to balance accuracy and privacy.
- DP-EBMs offer both global and local interpretability, allowing experts to visualize and edit feature contributions in real-world applications such as healthcare and finance.
Differentially-Private Explainable Boosting Machines (DP-EBMs) are a class of interpretable statistical learning models that combine the additive transparency of Generalized Additive Models (GAMs) with rigorous (ε,δ)-differential privacy guarantees. DP-EBMs have emerged as state-of-the-art models for learning from sensitive tabular data in settings such as healthcare, finance, and social science, offering both high predictive accuracy and global/local interpretability. These models implement differential privacy by injecting Gaussian noise into carefully chosen algorithmic primitives during boosting-based GAM training. Rigorous analysis demonstrates that DP-EBMs provide strong privacy protection while incurring minimal loss in accuracy and preserving the ability to explain and edit learned models.
1. Model Definition and Architecture
Explainable Boosting Machines are specialized GAMs with the canonical form:

$$g(\mathbb{E}[y \mid x]) = \beta_0 + \sum_{j} f_j(x_j),$$

where $\mathbb{E}[y \mid x]$ denotes the conditional response, $g$ is a link function (identity for regression, log-odds for binary classification), $\beta_0$ is an intercept, and the $f_j$ are univariate shape functions over each feature $x_j$ learned via sequential boosting (Nori et al., 2021). Each boosting round selects a feature, computes residuals, and fits a shallow tree (usually a single split, i.e., a decision stump), partitioning the feature into histogram bins and updating $f_j$ accordingly.
By construction, EBMs maintain strict additivity, allowing exact interpretation of both global effects (by plotting each $f_j$) and local contributions (by evaluating the $f_j(x_j)$ terms per example). Their practical value is underscored in domains requiring both interpretability and robust statistical guarantees on data usage.
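Concretely, prediction in this additive form is just a per-feature bin lookup plus a sum. The following is a minimal sketch with hypothetical names (not the InterpretML API):

```python
import bisect

# Minimal sketch of an EBM-style additive prediction. Each feature j has
# sorted bin edges and a learned per-bin value of its shape function f_j.
def ebm_predict(x, intercept, bin_edges, shape_values):
    """Score = beta_0 + sum_j f_j(x_j), with each f_j stored as a bin lookup."""
    score = intercept
    for j, value in enumerate(x):
        bin_idx = bisect.bisect_right(bin_edges[j], value)  # which bin x_j falls in
        score += shape_values[j][bin_idx]
    return score

# Two features, each with one split point -> two bins per feature.
edges = [[0.5], [10.0]]
shapes = [[-0.2, 0.3], [0.1, -0.1]]
print(ebm_predict([0.7, 4.0], 1.0, edges, shapes))  # ~ 1.0 + 0.3 + 0.1 = 1.4
```

Because the score is a plain sum, the per-feature terms `shape_values[j][bin_idx]` are themselves the exact local explanation for any example.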
2. Incorporating Differential Privacy: Noise Injection Mechanism
DP-EBMs ensure (ε,δ)-differential privacy by perturbing the sufficient statistics used to update each $f_j$ at every boosting iteration with Gaussian noise, following the Gaussian Mechanism (Niu et al., 2022; Nori et al., 2021). At boosting round $t$ and feature $j$, the binned residual sums

$$S_{j,\ell} = \sum_{i \in \text{bin } \ell} r_i$$

are first subject to clipping ($|r_i| \le C$) to tightly bound sensitivity. Then, for each bin $\ell$, Gaussian noise

$$\eta_\ell \sim \mathcal{N}(0, \sigma^2), \qquad \sigma = \frac{C\sqrt{2\ln(1.25/\delta_t)}}{\varepsilon_t},$$

is added. The privacy budget (ε,δ) is partitioned evenly across all $T$ boosting rounds, i.e., $\varepsilon_t = \varepsilon/T$, $\delta_t = \delta/T$. The resulting privatized sufficient statistics $\tilde{S}_{j,\ell} = S_{j,\ell} + \eta_\ell$ are then used to update the $f_j$ functions.
The model does not privatize the choice of splitting points; privacy is spent only on the computed values, not on tree structure or bin assignments. Empirically, this approach gives very tight accuracy bounds and avoids compounding privacy loss over combinatorial choices (Nori et al., 2021).
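The clip-sum-perturb step above can be sketched as follows (an illustrative stdlib sketch assuming per-example sensitivity equal to the clipping bound $C$; function and parameter names are hypothetical, not InterpretML's):

```python
import math
import random

# Sketch of the per-round Gaussian Mechanism on binned residual sums.
def privatize_bin_sums(residuals, bin_ids, n_bins, C, eps_t, delta_t, rng=random):
    # Clip each residual to [-C, C] so one example changes a sum by at most C.
    sums = [0.0] * n_bins
    for r, b in zip(residuals, bin_ids):
        sums[b] += max(-C, min(C, r))
    # Classic Gaussian Mechanism calibration for (eps_t, delta_t)-DP.
    sigma = C * math.sqrt(2.0 * math.log(1.25 / delta_t)) / eps_t
    noisy = [s + rng.gauss(0.0, sigma) for s in sums]
    return noisy, sigma
```

Note that `sigma` depends only on the public parameters $C$, $\varepsilon_t$, $\delta_t$, never on the data, which is what keeps the release differentially private.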
3. DP-EBM Training Algorithm
A high-level version of the differentially-private training procedure is:
- Initialization: Set $f_j \equiv 0$ for all $j$ and $\beta_0 = 0$; select the number of rounds $T$, clipping bound $C$, and learning rate $\lambda$.
- Privacy Budgeting: Allocate $\varepsilon_t = \varepsilon/T$, $\delta_t = \delta/T$ per round.
- Boosting Loop ($t = 1$ to $T$):
- (a) Compute residuals $r_i = y_i - \hat{y}_i$.
- (b) Clip each $r_i$ to $[-C, C]$.
- (c) Select feature $j$ (features are visited in round-robin order, so the choice consumes no privacy budget).
- (d) Partition $x_j$ into histogram bins.
- (e) For each bin $\ell$, compute $S_\ell = \sum_i r_i$ over its members, then add Gaussian noise $\eta_\ell \sim \mathcal{N}(0, \sigma^2)$ to release $\tilde{S}_\ell = S_\ell + \eta_\ell$.
- (f) Update the shape function via $f_j(\ell) \leftarrow f_j(\ell) + \lambda \tilde{S}_\ell / n_\ell$ (where $n_\ell$ is the bin count).
- (g) Update the predictions $\hat{y}_i$ accordingly.
- (h) Retain the other $f_{j'}$ unchanged for $j' \neq j$.
- Output: $F(x) = \beta_0 + \sum_j f_j(x_j)$.
For classification, residuals are replaced with gradients of the log-loss function. Binning of continuous features is performed with differentially-private quantile binning, consuming a small portion of the total budget (Nori et al., 2021).
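The regression case of the loop above can be sketched compactly as follows (illustrative only; the InterpretML implementation differs in detail, and a full treatment would also privatize the per-bin counts used in step (f)):

```python
import bisect
import math
import random

def train_dp_ebm(X, y, bin_edges, T, C, lr, eps, delta, rng=random):
    n, d = len(X), len(X[0])
    eps_t, delta_t = eps / T, delta / T                       # even budget split
    sigma = C * math.sqrt(2.0 * math.log(1.25 / delta_t)) / eps_t
    shapes = [[0.0] * (len(e) + 1) for e in bin_edges]        # f_j, one value per bin
    pred = [0.0] * n                                          # intercept fixed at 0
    bins = [[bisect.bisect_right(bin_edges[j], X[i][j]) for j in range(d)]
            for i in range(n)]
    for t in range(T):
        j = t % d                                             # round-robin feature choice
        sums = [0.0] * len(shapes[j])
        counts = [0] * len(shapes[j])
        for i in range(n):
            r = max(-C, min(C, y[i] - pred[i]))               # clipped residual
            sums[bins[i][j]] += r
            counts[bins[i][j]] += 1
        for b in range(len(sums)):
            noisy = sums[b] + rng.gauss(0.0, sigma)           # Gaussian Mechanism
            shapes[j][b] += lr * noisy / max(counts[b], 1)
        pred = [sum(shapes[k][bins[i][k]] for k in range(d)) for i in range(n)]
    return shapes
```

With the noise disabled, the loop reduces to ordinary cyclic gradient boosting on histogram bins, which makes the privacy cost of each round easy to isolate.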
4. Privacy Guarantees and Analytical Composition
Each boosting round is an application of the Gaussian Mechanism and is $(\varepsilon_t, \delta_t)$-DP. Over all $T$ rounds, DP-EBM composes these guarantees [Dwork–Roth] to achieve overall $(\varepsilon, \delta)$-DP (Niu et al., 2022): allocating

$$\varepsilon_t = \varepsilon / T, \qquad \delta_t = \delta / T$$

is sufficient for all updates, ensuring the desired global guarantee.
More refined analysis employs Gaussian Differential Privacy (GDP), leveraging tighter composition bounds [Dong et al. '19]. Under this framework, each iteration with sensitivity $C$ and noise scale $\sigma$ corresponds to $\mu$-GDP with $\mu = C/\sigma$, and the cumulative effect is expressed as

$$\mu_{\text{total}} = \mu \sqrt{E \cdot d},$$

where $E$ is the number of epochs and $d$ is the number of features (Nori et al., 2021). A GDP-to-$(\varepsilon, \delta)$ translation is then used for final accounting.
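The accounting can be sketched with the standard GDP-to-$(\varepsilon,\delta)$ conversion from Dong et al.; the constants below ($C$, $\sigma$, $E$, $d$) are illustrative, not values from the papers:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gdp_to_delta(mu, eps):
    """delta(eps) for a mu-GDP mechanism (Dong et al.'s duality formula)."""
    return Phi(-eps / mu + mu / 2.0) - math.exp(eps) * Phi(-eps / mu - mu / 2.0)

# Each round is mu-GDP with mu = C / sigma; E * d rounds compose to
# sqrt(E * d) * mu-GDP, which we then translate to (eps, delta).
C, sigma, E, d = 1.0, 20.0, 10, 8
mu_total = (C / sigma) * math.sqrt(E * d)
print(mu_total, gdp_to_delta(mu_total, eps=1.0))
```

Because GDP composition grows only as the square root of the number of releases, this accounting yields substantially less noise per round than naive budget splitting for the same overall guarantee.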
No further data access or post-hoc fitting is performed after the noisy sums are released, so all downstream computation inherits the guarantee by post-processing invariance.
5. Interpretability and Post-Training Model Editing
DP-EBMs provide exact global and local interpretability, as each $f_j$ can be directly plotted and interrogated. This transparency is preserved regardless of the injected noise (Nori et al., 2021). The post-processing property of DP allows for post-training editing—such as monotonicity enforcement via isotonic regression or manual smoothing of noisy shape functions—without any additional privacy cost. This is relevant for applications where expert correction of spurious artifacts (e.g., non-monotonicities induced by noise) is necessary prior to model deployment.
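The monotonicity edit mentioned above can be sketched with the pool-adjacent-violators algorithm (a minimal stdlib sketch, not InterpretML's implementation); since it only touches the already-released shape values, it costs no privacy budget:

```python
def isotonic_increasing(values):
    """Project a noisy shape function onto non-decreasing values (PAV)."""
    # Each block keeps (sum, count); merge blocks while the means decrease.
    blocks = []
    for v in values:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)   # each merged block takes its mean value
    return out

print(isotonic_increasing([0.1, 0.5, 0.3, 0.9]))  # ~ [0.1, 0.4, 0.4, 0.9]
```

The noise-induced dip at the third bin is averaged away with its neighbor, while bins that were already ordered are left untouched.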
6. Empirical Evaluation and Statistical Trade-offs
Empirical results demonstrate that DP-EBMs maintain high accuracy even under strong privacy regimes, significantly outperforming prior art such as DPBoost and private linear/logistic regression on standard tabular datasets (Nori et al., 2021). Illustrative results on the Adult Income dataset (AUROC):
| ε | DPBoost | DP Logistic | DP-EBM (classic) | DP-EBM (GDP) | Non-private EBM |
|---|---|---|---|---|---|
| 0.5 | 0.558 | 0.488 | 0.873 ± 0.007 | 0.875 ± 0.005 | 0.923 ± 0.003 |
| 1.0 | 0.566 | 0.471 | 0.880 ± 0.006 | 0.883 ± 0.005 | 0.923 ± 0.003 |
| 4.0 | 0.734 | 0.549 | 0.889 ± 0.004 | 0.889 ± 0.004 | 0.923 ± 0.003 |
For regression, DP-EBMs remain close to non-private RMSE and outperform alternatives at all privacy levels.
On mean squared error (MSE) decomposition for doubly-robust CATE estimation, variance increases markedly as $\varepsilon$ decreases (≈10×), with bias increasing only modestly (≈2×) (Niu et al., 2022). Representative table on a voting-turnout dataset:
| ε | MSE | Bias² | Var |
|---|---|---|---|
| 16 | 0.018 | 0.012 | 0.006 |
| 8 | 0.022 | 0.014 | 0.008 |
| 4 | 0.035 | 0.017 | 0.018 |
| 2 | 0.064 | 0.020 | 0.044 |
| 1 | 0.120 | 0.025 | 0.095 |
This suggests that most privacy-induced accuracy loss arises from increased output variance rather than estimator bias.
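As a quick sanity check on the table above, the reported rows satisfy the additive decomposition $\text{MSE} = \text{Bias}^2 + \text{Var}$ up to rounding:

```python
# Rows of the table above: eps -> (MSE, Bias^2, Var).
rows = {16: (0.018, 0.012, 0.006), 8: (0.022, 0.014, 0.008),
        4: (0.035, 0.017, 0.018), 2: (0.064, 0.020, 0.044),
        1: (0.120, 0.025, 0.095)}
for eps, (mse, bias2, var) in rows.items():
    assert abs(mse - (bias2 + var)) < 1e-9  # MSE = Bias^2 + Var
```

The variance column grows far faster than the squared-bias column as $\varepsilon$ shrinks, which is the pattern summarized in the sentence above.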
Interpretability degrades with stronger privacy (small ε) as the learned $f_j$ become less smooth and more "jumpy", but the directional, qualitative ordering of feature effects is generally maintained (Niu et al., 2022).
7. Practical Applications and Implementation
DP-EBMs have been deployed in domains where both privacy and interpretability are paramount, including healthcare, criminal justice, and finance (Nori et al., 2021). The methodology is implemented and distributed in the open-source InterpretML Python library, with automated handling of privacy budget allocation, per-round clipping, Gaussian noise addition, and composition accounting.
No additional privacy cost is incurred for post-hoc editing, ensuring that expert-in-the-loop corrections for domain alignment do not degrade privacy guarantees. Models remain fully auditable and explainable after privatized training, with no additional data access required.
References:
- "Differentially Private Estimation of Heterogeneous Causal Effects" (Niu et al., 2022)
- "Accuracy, Interpretability, and Differential Privacy via Explainable Boosting" (Nori et al., 2021)