
FairRF: Multi-Objective Bias Mitigation

Updated 19 January 2026
  • FairRF is a multi-objective evolutionary framework that mitigates bias in ML classification by exploring trade-offs between fairness and predictive accuracy.
  • It optimizes Random Forest hyperparameters and employs data mutation strategies within an NSGA-II search to generate a Pareto front of model configurations.
  • Evaluated across diverse datasets and fairness scenarios, FairRF offers transparent insights for selecting models tailored to domain-specific fairness and effectiveness requirements.

FairRF is a multi-objective evolutionary approach for bias mitigation in machine learning classification tasks, designed to enable stakeholders to systematically explore the trade-off between software fairness and predictive effectiveness. FairRF employs a Random Forest base classifier, coupled with evolutionary hyperparameter and data mutation optimization, to generate a Pareto front of model configurations – each representing a distinct balance between fairness and accuracy. This framework supports both single-group and intersectional fairness definitions, addressing critical requirements in sensitive domains such as finance, healthcare, and criminal justice (d'Alosio et al., 12 Jan 2026).

1. Motivation and Problem Setting

The proliferation of AI- and ML-based decision systems in high-stakes applications has intensified the demand for models that are both effective and fair with respect to sensitive attributes (e.g., race, sex, age). Existing bias mitigation methods are frequently black-box solutions offering a single, fixed trade-off between fairness (commonly measured by Statistical Parity Difference, SPD) and standard effectiveness metrics (e.g., accuracy), thus lacking transparency and flexibility for downstream decision-makers. Stakeholders may have context-dependent priorities; some domains require maximizing predictive correctness, while others necessitate minimizing disparate impact. FairRF directly addresses this need by recasting bias mitigation as multi-objective search, returning a Pareto front of optimal trade-offs for downstream selection (d'Alosio et al., 12 Jan 2026).

2. Methodological Core: Model and Search Space

The principal modeling component in FairRF is a Random Forest (RF)—specifically, scikit-learn's implementation—trained on tabular data with one or more binary sensitive features and a binary target. The hyperparameter search space comprises:

  • n_estimators ∈ {10, 20, 50, 80, 100, 150, 200}
  • criterion ∈ {gini, entropy, log_loss}
  • max_depth ∈ {None, 10, 15, 20, 30, 40, 50}
  • min_samples_split ∈ {2, 3, 4}
  • max_features ∈ {sqrt, log2, None}
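
The grid above can be sketched as a plain dictionary from which candidate configurations are drawn. This is a hypothetical illustration of the search space (the key names mirror scikit-learn's RandomForestClassifier parameters); FairRF's actual encoding of individuals is not specified here:

```python
import random

# Sketch of the FairRF hyperparameter grid listed above (illustrative only).
SEARCH_SPACE = {
    "n_estimators": [10, 20, 50, 80, 100, 150, 200],
    "criterion": ["gini", "entropy", "log_loss"],
    "max_depth": [None, 10, 15, 20, 30, 40, 50],
    "min_samples_split": [2, 3, 4],
    "max_features": ["sqrt", "log2", None],
}

def sample_configuration(rng=random):
    """Draw one candidate configuration uniformly from the grid."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

config = sample_configuration()
```

Each such configuration, together with the data mutation value described next, forms one individual in the evolutionary search.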

A distinctive feature of FairRF is the “data mutation value” m ∈ {0.1, 0.2, ..., 1.0}, which dictates the fraction of sensitive-feature bits randomly flipped during training (i.e., x ← 1 − x with probability m). This procedure augments the representation of privileged and unprivileged groups, aiming to improve the robustness of fairness optimization. No mutation is applied to the validation or test data. The search space for each individual in the evolutionary algorithm thus includes both these RF hyperparameters and the data mutation parameter (d'Alosio et al., 12 Jan 2026).
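
A minimal sketch of the bit-flip mutation, assuming the per-value-probability reading above (each binary sensitive value is flipped independently with probability m); the function name is illustrative, not FairRF's API:

```python
import random

def mutate_sensitive(values, m, rng=random):
    """Flip each binary sensitive-feature value x -> 1 - x with probability m.
    Per the text, this is applied only to training data, never to
    validation or test splits."""
    return [1 - x if rng.random() < m else x for x in values]

column = [0, 1, 1, 0, 1]
assert mutate_sensitive(column, 0.0) == column           # m = 0: nothing flips
assert mutate_sensitive(column, 1.0) == [1, 0, 0, 1, 0]  # m = 1: everything flips
```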

3. Multi-Objective Formulation and Fairness Metrics

FairRF operates in the context of bi-objective optimization. For a configuration θ (the RF hyperparameters together with m):

  • Effectiveness: maximize Accuracy(θ) = (TP + TN) / (TP + TN + FP + FN)
  • Fairness: minimize Δ_SPD(θ) = |Pr(ŷ = 1 | A = 0) − Pr(ŷ = 1 | A = 1)|

Secondary, post-hoc fairness metrics include:

  • Equal Opportunity Difference (EOD): Δ_EOD = |TPR_{A=0} − TPR_{A=1}|, where TPR = TP / (TP + FN)
  • Average Odds Difference (AOD): Δ_AOD = (1/2) |(TPR_0 − TPR_1) + (FPR_0 − FPR_1)|, where FPR = FP / (FP + TN)
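
These group fairness metrics can be computed directly from per-group labels and predictions; the helper names below are illustrative, not FairRF's API:

```python
def positive_rate(y_pred):
    """P(yhat = 1) within one group."""
    return sum(y_pred) / len(y_pred)

def spd(pred_g0, pred_g1):
    """Statistical Parity Difference |P(yhat=1 | A=0) - P(yhat=1 | A=1)|."""
    return abs(positive_rate(pred_g0) - positive_rate(pred_g1))

def rates(y_true, y_pred):
    """(TPR, FPR) for one group from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

def eod(true_g0, pred_g0, true_g1, pred_g1):
    """Equal Opportunity Difference |TPR_0 - TPR_1|."""
    return abs(rates(true_g0, pred_g0)[0] - rates(true_g1, pred_g1)[0])

def aod(true_g0, pred_g0, true_g1, pred_g1):
    """Average Odds Difference 0.5 * |(TPR_0 - TPR_1) + (FPR_0 - FPR_1)|."""
    tpr0, fpr0 = rates(true_g0, pred_g0)
    tpr1, fpr1 = rates(true_g1, pred_g1)
    return 0.5 * abs((tpr0 - tpr1) + (fpr0 - fpr1))
```

Note that equal SPD does not imply equal EOD: two groups can receive positive predictions at the same rate while the predictions are accurate for one group and not the other.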

For intersectional fairness (multiple sensitive attributes, subgroups s ∈ S):

  • Worst-Case-Scenario SPD (WCS-SPD): max_s Pr(ŷ = 1 | S = s) − min_s Pr(ŷ = 1 | S = s)
  • Average SPD (AVG-SPD): (1/|S|) Σ_s Pr(ŷ = 1 | S = s)
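
The intersectional variants follow the same pattern, starting from the positive-prediction rate of each subgroup. The sketch below transcribes the formulas exactly as stated above, with illustrative names:

```python
from collections import defaultdict

def subgroup_rates(y_pred, subgroups):
    """Positive-prediction rate P(yhat=1 | S=s) for every subgroup s."""
    pos, total = defaultdict(int), defaultdict(int)
    for pred, s in zip(y_pred, subgroups):
        total[s] += 1
        pos[s] += pred
    return {s: pos[s] / total[s] for s in total}

def wcs_spd(y_pred, subgroups):
    """Worst-Case-Scenario SPD: spread between the best- and
    worst-treated subgroups."""
    r = subgroup_rates(y_pred, subgroups).values()
    return max(r) - min(r)

def avg_rate(y_pred, subgroups):
    """Mean subgroup rate, matching the AVG-SPD expression as written above."""
    r = list(subgroup_rates(y_pred, subgroups).values())
    return sum(r) / len(r)
```

Here a subgroup key s would encode a combination of sensitive attributes, e.g. ("female", "over-40").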

Analogous definitions are used for EOD and AOD at the intersectional level (d'Alosio et al., 12 Jan 2026).

4. Evolutionary Optimization Process

FairRF employs the NSGA-II multi-objective evolutionary algorithm (MOEA), as implemented in DEAP, to optimize over model hyperparameters and data mutation:

  • Population: 50 individuals per generation
  • Generations: 25
  • Crossover: single-point (probability 0.6)
  • Mutation: random parameter (probability 0.2)
  • Selection: non-dominated sorting and crowding distance

Each individual in the population represents a particular combination of RF hyperparameters and m. Fitness is assessed using accuracy and |Δ_SPD| on the validation set. The final output consists of all non-dominated solutions (the Pareto front). Each point on the front provides a distinct balance between effectiveness and fairness; stakeholders may select models according to domain-specific constraints (d'Alosio et al., 12 Jan 2026).
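
The defining property of the output, non-dominance, can be illustrated with a small stand-alone filter over (accuracy, |SPD|) pairs. This is a sketch of the dominance rule only, not the DEAP/NSGA-II machinery FairRF actually uses:

```python
def pareto_front(solutions):
    """Filter (accuracy, spd) pairs to the non-dominated set:
    accuracy is maximized, |SPD| is minimized.  A point is dominated if
    some other point is at least as good on both objectives and
    strictly better on at least one."""
    front = []
    for acc, spd in solutions:
        dominated = any(
            a >= acc and s <= spd and (a > acc or s < spd)
            for a, s in solutions
        )
        if not dominated:
            front.append((acc, spd))
    return front

candidates = [(0.90, 0.12), (0.85, 0.04), (0.88, 0.04), (0.80, 0.02)]
# (0.85, 0.04) is dominated by (0.88, 0.04); the other three trade off
# accuracy against |SPD| and all survive.
```

Every surviving point is a defensible choice: moving along the front trades accuracy for fairness, and no point improves on another in both objectives at once.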

5. Experimental Framework and Benchmarking

FairRF is evaluated on five tabular datasets (Adult Income, COMPAS, German credit, Bank marketing, MEPS medical expenditure) across 11 fairness-sensitive scenarios (each a dataset and sensitive-attribute combination). For each scenario, evaluation employs:

  • Baselines: 26 total, including (a) FairRF variations with alternative base learners (LR, KNN, CART, SVM), (b) standard ML models with grid/random search, (c) state-of-the-art fairness-oriented baselines (RW, ADV, MAAT, DEMV, EOP), (d) FairHOME for intersectional fairness.
  • Intersectional fairness: WCS-SPD, AVG-SPD, WCS-EOD, AVG-EOD, WCS-AOD, AVG-AOD.
  • Evaluation protocol: 50% train, 20% validation (for fitness), 30% test (for final scoring); 20 repetitions per split.
  • Statistical analysis: Wilcoxon signed-rank test (α = 0.05), Vargha-Delaney A₁₂ effect size.
  • Comparison criteria: Hypervolume (Pareto front coverage), Pareto-optimal solution count, standard performance and fairness metrics (d'Alosio et al., 12 Jan 2026).
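
For two objectives, hypervolume reduces to a sum of rectangle areas between the sorted front and a reference point. A minimal sketch follows, assuming a reference point of (0, 1) in (accuracy, |SPD|) space; that reference choice is an assumption for illustration, not the paper's stated setting:

```python
def hypervolume_2d(front, ref=(0.0, 1.0)):
    """Area dominated by a (maximize accuracy, minimize |SPD|) Pareto
    front relative to reference point ref = (acc_ref, spd_ref).
    Assumes the input is non-dominated and every point dominates ref;
    a larger value means the front covers more of the objective space."""
    acc_ref, spd_ref = ref
    pts = sorted(front, reverse=True)  # accuracy descending, |SPD| descending
    hv = 0.0
    for i, (acc, spd) in enumerate(pts):
        # rectangle between this point's accuracy and the next point's
        # (or the reference), with height measured down from spd_ref
        next_acc = pts[i + 1][0] if i + 1 < len(pts) else acc_ref
        hv += (acc - next_acc) * (spd_ref - spd)
    return hv
```

Comparing two fronts by hypervolume rewards both convergence (points close to high accuracy, low |SPD|) and spread along the trade-off curve.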

6. Empirical Results and Analysis

FairRF demonstrates high effectiveness in balancing fairness and predictive accuracy:

  • Against algorithm variations, FairRF with the RF base achieves the largest Pareto hypervolume in 62.5% of scenarios and statistically dominates 65.6% of competitors.
  • FairRF outperforms grid-tuned RF, LR, and SVM for SPD in 7/8 cases, with improvements also in EOD and AOD metrics, and maintains or improves MCC in 56% of cases.
  • Comparing to state-of-the-art fairness methods, FairRF contributes a dominant proportion of Pareto-optimal solutions across all fairness-effectiveness metric pairs.
  • In intersectional fairness evaluations against FairHOME, FairRF produces more Pareto-optimal solutions in all combinations of effectiveness and intersectional fairness definitions (d'Alosio et al., 12 Jan 2026).

A plausible implication is that data mutation, even with basic random bit-flipping, is an effective lever for improving group and intersectional fairness without substantial accuracy loss. Notably, optimizing only SPD and accuracy during the search can also yield improvements on secondary fairness definitions.

7. Interpretability, Practical Usage, and Limitations

FairRF’s Pareto front enables practitioners to visualize and select models along the full spectrum of fairness–effectiveness trade-offs. This directly supports application-specific requirement engineering, permitting selection of “fairness first” or “accuracy first” solutions as needed. The search framework is extendable: different fairness or effectiveness metrics can replace those used in the canonical implementation, and the weights of objectives can be adjusted to emphasize domain priorities (d'Alosio et al., 12 Jan 2026).

Current limitations include the restriction to binary classification and reliance on SPD as the primary fairness search objective. Prospective extensions encompass multi-class tasks, the use of alternative fairness functions (e.g., EOD, AOD) as optimization objectives, and more sophisticated data mutation or reweighting strategies. The development of improved tools for navigating large Pareto sets (such as interactive visual analytics) is identified as a practical need for broad adoption among non-technical stakeholders.

Further details, replication artifacts, and code are available through the associated GitHub repository and Zenodo DOI (d'Alosio et al., 12 Jan 2026).
