
Demographic Parity in Machine Learning

Updated 7 February 2026
  • Demographic Parity (DP) is a fairness criterion that requires predicted outcomes to be statistically independent of sensitive attributes, ensuring equal treatment across groups.
  • Methodologies to achieve DP include constraint-based projections, optimal transport post-processing, and regularization-based techniques that adjust predictions to match group distributions.
  • Practical challenges of DP enforcement involve a statistical cost in model accuracy, careful choice of metrics for auditing fairness, and trade-offs in real-world decision-making applications.

Demographic Parity (DP) is a foundational group fairness criterion in machine learning, formalizing the requirement that algorithmic decisions or predictions be statistically independent of protected group membership. DP enforces that each protected group experiences equivalent rates of favorable outcomes, irrespective of underlying differences in feature distributions, label distributions, or historical bias. The criterion has shaped both theoretical research and practical pipelines for bias mitigation across automated decision-making, and is relevant to settings ranging from classic classification to regression, ranking, resource allocation, and influence maximization.

1. Formal Definitions and Mathematical Formulation

DP requires that the distribution of predicted outcomes be identical across all values of the sensitive attribute. In its most general form, for a joint random variable (Y, S, X)—where Y is the predicted label, S is the sensitive attribute (possibly vector-valued or multi-categorical), and X denotes non-sensitive features—demographic parity holds if for all y and all s:

P(Y = y \mid S = s) = P(Y = y)

This is equivalently stated as

Y \perp S

In the case of binary classification and binary sensitive attribute S∈{0,1}, the constraint simplifies to

P(\hat Y = 1 \mid S = 0) = P(\hat Y = 1 \mid S = 1)

For regression or general real-valued outputs, the constraint is full distributional parity:

\mathcal{L}(f(X, S) \mid S = s) = \mathcal{L}(f(X, S) \mid S = s'), \quad \forall s, s'

Marginal equality can also be enforced only in moments, e.g., mean parity (a weaker notion):

\mathbb{E}[f(X, S) \mid S = s] = \mathbb{E}[f(X, S)] \quad \forall s

The multiclass extension requires that, for all classes k,

\max_k \left| P(\hat Y = k \mid S = s) - P(\hat Y = k \mid S = s') \right| \leq \rho

for some fairness tolerance ρ, with ρ = 0 corresponding to strict DP (Say et al., 24 Nov 2025).
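The multiclass criterion above can be audited empirically from predictions. A minimal sketch (the function name and interface are illustrative, not taken from any cited work) that estimates the worst-case DP gap over all classes and group pairs:

```python
import numpy as np

def max_dp_gap(y_pred, s, num_classes):
    """Worst-case multiclass DP violation:
    max_k |P(Y_hat = k | S = s) - P(Y_hat = k | S = s')| over all group pairs."""
    groups = np.unique(s)
    # Per-group class rates: one row per group, one column per class.
    rates = np.array([
        np.bincount(y_pred[s == g], minlength=num_classes) / np.sum(s == g)
        for g in groups
    ])
    # For each class, the largest gap between any two groups is max - min.
    return float(np.max(rates.max(axis=0) - rates.min(axis=0)))

# Two groups, three classes; strict DP would require a gap of exactly 0.
y_pred = np.array([0, 1, 1, 2, 0, 1, 2, 2])
s = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(max_dp_gap(y_pred, s, num_classes=3))  # 0.25
```

The returned value is the smallest tolerance ρ at which the predictor satisfies ρ-approximate DP on the sample.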

2. Methodological Frameworks for Achieving DP

a. Constraint-based Information Projections and Data Synthesis

The PUR methodology (Loukas et al., 2023) enforces DP by deriving minimally perturbed dataset distributions through information projection. Given empirical data f(y, s, x), the goal is to construct a new joint p(y, s, x) satisfying:

  • Parity (P): ∑ₓ p(y, s, x) = f(y)·f(s)
  • Utility (U): ∑_s p(y, s, x) = f(y, x)
  • Realism (R): ∑_y p(y, s, x) = f(s, x)

with p chosen as the unique minimizer of the KL divergence D_{KL}(p \Vert f). Iterative proportional fitting guarantees existence and uniqueness, and synthetic data from this projection can be used for model training, ensuring that downstream classifiers cannot “learn” group-dependent decision rates.
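The three marginal constraints can be imposed by cyclic rescaling. A minimal iterative-proportional-fitting sketch (assuming a small discrete domain and strictly positive empirical frequencies; this is an illustration, not the published PUR implementation):

```python
import numpy as np

def pur_projection(f, n_iter=500):
    """I-projection of the empirical joint f[y, s, x] onto the PUR constraints
    via iterative proportional fitting (sketch for a small discrete domain;
    assumes strictly positive f so no scaling step divides by zero)."""
    f_y = f.sum(axis=(1, 2))   # f(y)
    f_s = f.sum(axis=(0, 2))   # f(s)
    f_yx = f.sum(axis=1)       # f(y, x)
    f_sx = f.sum(axis=0)       # f(s, x)
    p = f.copy()
    for _ in range(n_iter):
        # Parity (P): sum_x p(y, s, x) = f(y) f(s)
        p *= (np.outer(f_y, f_s) / p.sum(axis=2))[:, :, None]
        # Utility (U): sum_s p(y, s, x) = f(y, x)
        p *= (f_yx / p.sum(axis=1))[:, None, :]
        # Realism (R): sum_y p(y, s, x) = f(s, x)
        p *= (f_sx / p.sum(axis=0))[None, :, :]
    return p

rng = np.random.default_rng(0)
f = rng.random((2, 2, 3))   # toy joint over (y, s, x)
f /= f.sum()
p = pur_projection(f)
```

Each pass rescales p to satisfy one constraint family at a time; under the feasibility conditions cited above, the cycle converges to the unique KL projection.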

b. Post-processing and Optimal Transport Barycenters

For regression and continuous outcomes, optimal transport (barycenter) approaches construct post-processing maps that transform group-conditional prediction distributions into a common distribution, achieving DP (Fukuchi, 16 Jun 2025, Feng et al., 15 Jan 2026, Divol et al., 2024). In the linear Gaussian setting, closed-form rescaling and shifting of coefficients align all group distributions at the Wasserstein barycenter (Tierny et al., 14 Nov 2025, Chzhen et al., 2020).
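In one dimension the Wasserstein-2 barycenter has a closed form: its quantile function is the group-weighted average of the per-group quantile functions, so post-processing reduces to pushing each score through its group's empirical CDF and then through the barycenter quantile function. A minimal sketch (empirical quantiles on a fixed grid; names and grid size are illustrative):

```python
import numpy as np

def barycenter_postprocess(scores, s, grid_size=101):
    """Map each group's scores onto the empirical Wasserstein-2 barycenter
    of the group-conditional score distributions (1-D closed form)."""
    groups, counts = np.unique(s, return_counts=True)
    weights = counts / counts.sum()
    qs = np.linspace(0.0, 1.0, grid_size)
    # Barycenter quantile function = weighted average of group quantile functions.
    bary_q = sum(w * np.quantile(scores[s == g], qs)
                 for g, w in zip(groups, weights))
    out = np.empty_like(scores, dtype=float)
    for g in groups:
        grp = scores[s == g]
        # Empirical CDF value of each score within its own group ...
        ranks = np.searchsorted(np.sort(grp), grp, side="right") / len(grp)
        # ... pushed through the barycenter quantile function.
        out[s == g] = np.interp(ranks, qs, bary_q)
    return out
```

After the mapping, all group-conditional output distributions approximate the same barycenter, so DP holds up to discretization error while within-group score orderings are preserved.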

c. In-processing and Regularization-based Techniques

Standard in-processing DP enforcement augments the empirical risk objective with a penalty, e.g., Difference-of-DP (DDP):

\min_{\theta} L(\theta) + \lambda \cdot \left| P_\theta(\hat Y = 1 \mid S = 0) - P_\theta(\hat Y = 1 \mid S = 1) \right|

with λ trading off accuracy and DP violation. Information-theoretic penalties (e.g., mutual information, MMD, maximal correlation) are used in lieu of or in addition to the direct DDP gap (Lei et al., 2024).
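A minimal sketch of this recipe for logistic regression, using the common differentiable surrogate that replaces the indicator-based DDP gap with the gap in mean predicted probabilities (all names and hyperparameters are illustrative, not from the cited works):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg_ddp(X, y, s, lam=1.0, lr=0.1, epochs=500):
    """Gradient descent on logistic loss + lam * |mean_p(S=0) - mean_p(S=1)|,
    a soft (differentiable) relaxation of the DDP penalty."""
    w = np.zeros(X.shape[1])
    m0, m1 = (s == 0), (s == 1)
    for _ in range(epochs):
        p = sigmoid(X @ w)
        # Gradient of the average logistic loss.
        grad = X.T @ (p - y) / len(y)
        # Gradient of the soft DDP penalty: d|g|/dw = sign(g) * dg/dw,
        # with dp_i/dw = p_i (1 - p_i) x_i.
        gap = p[m0].mean() - p[m1].mean()
        dp_dw = (X[m0] * (p[m0] * (1 - p[m0]))[:, None]).mean(axis=0) \
              - (X[m1] * (p[m1] * (1 - p[m1]))[:, None]).mean(axis=0)
        grad += lam * np.sign(gap) * dp_dw
        w -= lr * grad
    return w
```

Setting `lam=0` recovers plain logistic regression; increasing `lam` trades accuracy for a smaller probability gap between groups.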

Distributionally robust optimization (DRO) methods, such as sensitive-attribute-based DRO (SA-DRO), perturb the group marginal within an f-divergence ball to prevent models from exploiting majority-minority skews in sensitive attributes (Lei et al., 2024).

d. Proxy-based Convex Relaxations

In individualized treatment rule settings, non-convex DP constraints (equality in treatment assignment rates) are approximated via linear or nonlinear proxies such as covariance or rank-based statistics, leading to tractable quadratic programs with theoretical guarantees (Cui et al., 28 Apr 2025).
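As one concrete instance, the covariance between a continuous decision score and the sensitive attribute is linear in the score and therefore yields a convex surrogate constraint. A sketch of the proxy (zero covariance is necessary, but not sufficient, for full independence; the function name is illustrative):

```python
import numpy as np

def dp_covariance_proxy(scores, s):
    """Covariance-based DP proxy: Cov(f(X), S). Constraining this to (near)
    zero gives a convex relaxation of equal treatment-assignment rates."""
    return float(np.mean((scores - scores.mean()) * (s - s.mean())))

scores = np.array([0.2, 0.4, 0.6, 0.8])
s = np.array([0, 0, 1, 1])
print(dp_covariance_proxy(scores, s))  # 0.1
```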

e. Differential Privacy and DP

DP can be jointly enforced with differential privacy (e.g., via the DP2DP algorithm (Say et al., 24 Nov 2025)), using private estimates of conditional class probabilities and careful noise calibration. The joint privacy–fairness pipeline achieves statistical rates for DP violation matching non-private methods up to logarithmic factors.
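The basic ingredient can be sketched with the Laplace mechanism (a generic illustration, not the DP2DP algorithm itself; it assumes group sizes are public and calibrates noise to the unit sensitivity of each group's positive count):

```python
import numpy as np

def private_group_rates(y_pred, s, epsilon, rng):
    """epsilon-differentially-private estimates of P(Y_hat = 1 | S = g).
    Each group's positive count has sensitivity 1; groups are disjoint, so
    parallel composition gives an overall epsilon guarantee."""
    rates = {}
    for g in np.unique(s):
        n_g = int(np.sum(s == g))
        noisy_count = float(np.sum(y_pred[s == g])) + rng.laplace(scale=1.0 / epsilon)
        # Clip to [0, 1] since noise can push the estimate out of range.
        rates[int(g)] = float(np.clip(noisy_count / n_g, 0.0, 1.0))
    return rates
```

The noisy rates can then feed any DP-auditing or post-processing step, with the fairness estimate degrading gracefully as ε shrinks.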

3. Statistical and Computational Implications

Enforcing DP, especially under strong distributional constraints, can introduce a nontrivial statistical price. In linear regression with group-specific means and covariances, the minimax excess risk for DP-constrained estimators increases proportionally to the product of feature dimension d and number of groups M: Θ(dM/n) (Fukuchi et al., 2022). Strong group heterogeneity or finer group partitioning significantly escalates the difficulty of achieving low excess risk under DP.

In influence maximization over networks, strict DP constraints may render well-performing solutions infeasible or computationally intractable. However, relaxed (approximate) DP via randomization or bi-criteria approaches restores practical feasibility and allows favorable trade-offs between spread and fairness (Becker et al., 2023).

In federated or distributed settings, naive DP penalization may bias decision boundaries toward the majority sensitive group in system-heterogeneous data; robust strategies (e.g., SA-DRO) mitigate this collapse, especially for local-minority parties (Lei et al., 2024).

4. Metrics and Inference for DP Violation

Scalar metrics such as the “ΔDP gap” (absolute or relative difference in favorable outcome rates between groups) are commonly used in auditing:

\Delta \mathrm{DP} = \left| P(\hat Y = 1 \mid S = 0) - P(\hat Y = 1 \mid S = 1) \right|

However, such summary statistics are necessary but not sufficient for full DP: different score distributions (e.g., multimodal shifts) may yield identical means or rates (Han et al., 2023). Distributional metrics such as the Area Between Probability density function Curves (ABPC) and Area Between Cumulative density function Curves (ABCC) provide sufficient (zero-iff-DP) and threshold-invariant measures, directly quantifying distance between group conditional predictive distributions via total variation or integrated CDF differences.
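A minimal sketch contrasting the two metric families: the bimodal group below matches the uniform group's acceptance rate at threshold 0.5 (ΔDP ≈ 0) while its score distribution is clearly different (ABCC > 0). Implementation details (grid, empirical CDFs) are illustrative:

```python
import numpy as np

def delta_dp(scores, s, thresh=0.5):
    """Rate gap |P(Y_hat=1|S=0) - P(Y_hat=1|S=1)| at a single threshold."""
    return abs((scores[s == 0] > thresh).mean() - (scores[s == 1] > thresh).mean())

def abcc(scores, s, grid_size=2000):
    """Area Between CDF Curves: integral of |F_0 - F_1| over [0, 1];
    zero if and only if the group score distributions coincide."""
    grid = np.linspace(0.0, 1.0, grid_size)
    F0 = np.searchsorted(np.sort(scores[s == 0]), grid, side="right") / np.sum(s == 0)
    F1 = np.searchsorted(np.sort(scores[s == 1]), grid, side="right") / np.sum(s == 1)
    return float(np.mean(np.abs(F0 - F1)))  # Riemann sum on the unit grid

# Group 0: uniform scores; group 1: bimodal with the same rate above 0.5.
g0 = np.linspace(0.001, 0.999, 1000)
g1 = np.concatenate([np.full(500, 0.1), np.full(500, 0.9)])
scores = np.concatenate([g0, g1])
s = np.repeat([0, 1], 1000)
```

Here the rate-based audit passes while the distributional audit correctly flags the gap, illustrating why ABCC-style metrics are threshold-invariant.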

For multiclass settings, the “unfairness” metric is

\mathcal{U}(g) = \max_{k \in [K]} \left| P(\hat Y = k \mid S = 1) - P(\hat Y = k \mid S = -1) \right|

As for inference, min-over-max estimators and their smoothed surrogates support asymptotic distributional theory, enable construction of valid confidence intervals, and permit hypothesis-testing for regulatory DP thresholds (e.g., US EEOC’s 0.8-rule) or between-group comparisons in A/B testing (2207.13797).
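The min-over-max ratio behind the 0.8-rule can be computed directly (a point-estimate sketch; the cited inferential machinery for confidence intervals and smoothing is not reproduced here):

```python
import numpy as np

def four_fifths_ratio(y_pred, s):
    """EEOC 80%-rule point estimate: minimum group selection rate divided by
    the maximum group selection rate; values below 0.8 flag adverse impact."""
    rates = np.array([y_pred[s == g].mean() for g in np.unique(s)])
    return float(rates.min() / rates.max())

# Group 0: 5/10 selected (rate 0.5); group 1: 7/20 selected (rate 0.35).
y_pred = np.concatenate([np.repeat([1, 0], [5, 5]), np.repeat([1, 0], [7, 13])])
s = np.repeat([0, 1], [10, 20])
ratio = four_fifths_ratio(y_pred, s)
print(ratio, ratio >= 0.8)  # 0.7 False
```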

Table: Comparison of DP Metrics

Metric                 Type                   Definition                        DP Sufficiency
ΔDP                    rate/mean difference   |P(ŷ=1|S=0) − P(ŷ=1|S=1)|         necessary, not sufficient
ABPC                   distributional (PDF)   ∫₀¹ |f₀(x) − f₁(x)| dx            sufficient (zero iff DP)
ABCC                   distributional (CDF)   ∫₀¹ |F₀(x) − F₁(x)| dx            sufficient (zero iff DP)
Wasserstein distance   distributional         W₂(P(ŷ|S=0), P(ŷ|S=1))            sufficient (zero iff DP)

5. Limitations and Regulatory Interpretation

DP is recognized as the simplest statistical-independence group fairness definition but has limitations both mathematically and in regulatory interpretation. In the EU legal framework, DP corresponds to “formal equality”—detecting any disparate impact irrespective of lawful justification. More nuanced constructs, such as Conditional Demographic Disparity (CDD), implement contextual parity by conditioning on legitimate risk or merit strata (Koumeri et al., 2023). Strict pursuit of DP may, in certain long-run dynamic settings, even harm “social equality,” reducing all groups to a lower equilibrium by suppressing high-performing groups or overcompensating disadvantaged ones (Mozannar et al., 2018).

DP does not ensure equal treatment “in the process,” only equal “outcome” rates; explanation-based criteria (e.g., invariance in Shapley-value feature attributions) offer strictly stronger, treatment-level notions of fairness (Mougan et al., 2023).

6. Inductive Bias, Limitations, and Future Methodologies

DP-based learning induces a strong inductive bias: in imbalanced datasets, standard DP penalization pulls predictive distributions toward the majority group, leading to “majority-pull” phenomena and potential harm to minorities in federated or distributed settings (Lei et al., 2024). Robust (e.g., DRO) regularization corrects for this effect.

Link prediction, ranking, and other structured prediction contexts expose particular drawbacks of dyadic DP (e.g., masking of intra-group disparities, insensitivity to position/rank). Enhanced metrics such as rank-weighted normalized KL divergence and disaggregated exposure-aware algorithms have been proposed (Mattos et al., 9 Nov 2025).

DP enforcement interacts with model class and estimation assumptions. Post-processing is widely applicable and preserves model-agnosticism, but may incur statistical price proportional to the difficulty of estimating optimal transport or barycenter maps (Fukuchi, 16 Jun 2025). In high-dimensional or rich function classes, the fairness constraint may or may not dominate the achievable error rate, depending on the specifics of distributional shift, group sample sizes, and feature configuration.

Future research directions include extending barycentric and OT-based frameworks to other fairness notions (e.g., equalized odds, calibration), scalable methods for individual and multi-group intersectional fairness, DP-compatible privacy–fairness integration, and interpretability-centric fairness auditing.

7. Empirical Results and Practical Outcomes

Extensive empirical work demonstrates that theoretically grounded DP enforcement methods (information projection, optimal transport mapping, adversarial debiasing) succeed in removing both explicit and indirect group bias in standard datasets (Adult, COMPAS, Law School, census records, etc.) with marginal or controlled cost in predictive utility (Loukas et al., 2023, Han et al., 2023, Fukuchi, 16 Jun 2025, Fukuchi et al., 2022). Purely penalty-based DP enforcement can underperform or introduce subtle artifacts if not paired with distributionally robust estimation and auditing with sufficient (distribution-level) metrics.

DP-based pipelines are now available in open-source toolkits for policy, credit, health, hiring, and beyond; nonetheless, careful matching of fairness metric, model class, and auditing protocol remains essential for reliable and transparent real-world deployment.
