Anchor-based Fair Clustering Framework

Updated 20 November 2025

The paper presents AFCF—a scalable algorithm that ensures exact per-cluster fairness by matching demographic proportions via novel anchor selection and constrained optimization.
It employs the FDAS mechanism to select representative anchors that maintain both spatial coverage and group balance, significantly reducing computational overhead.
The framework utilizes an ADMM-based solver to efficiently handle fairness-preserving label propagation, achieving linear scalability on large datasets.

The Anchor-based Fair Clustering Framework (AFCF) enables linear-time scalable fair clustering on large datasets, rigorously preserving demographic group fairness properties while drastically accelerating existing fair clustering algorithms. AFCF integrates novel fair sampling for anchor selection, a fairness-preserving label propagation mechanism grounded in constrained optimization, and an efficient ADMM solver, demonstrating consistent empirical efficacy across large benchmark datasets (Wei et al., 13 Nov 2025).

1. Fair Anchor Selection: FDAS Mechanism

AFCF selects a subset of anchors that is both spatially and demographically representative using the Fair Directly Alternate Sampling (FDAS) algorithm. Given a dataset $\mathbf{X}\in\mathbb{R}^{d\times n}$, a partition of the data into $t$ protected groups $\mathcal{G}=\{G_1, \dots, G_t\}$, and group proportions $\rho_r=|G_r|/n$, FDAS selects $m\ll n$ anchors according to the following two-phase procedure:

(A) Quota Computation. Each group receives a quota $q_r = \lfloor m \cdot \rho_r \rfloor$, with the remainder $\Delta = m - \sum_{r=1}^t q_r$ allocated iteratively to those groups underrepresented relative to $m\rho_r$. This guarantees $\sum_r q_r = m$ and, for all $r$, $|q_r - m\rho_r| < 1$.

(B) Within-Group Spatial Coverage. For each group $r$, points are scored via $s_i = \sum_{p=1}^d X_{p,i}$, normalized to $s\leftarrow s/\max(s)$. Iteratively, the highest-scoring point within group $r$ is selected as an anchor. After each selection, scores are decayed as $s \leftarrow s \odot (1-s)/\max(s)$ to promote spatial dispersion within the group. The process continues until $q_r$ anchors are chosen from every group.

The FDAS approach ensures that the selected anchors reflect both the global group proportions and the spatial distribution of the data, with computational complexity $O(nd)$, where $d$ is the ambient dimensionality and $m\ll n$ (Wei et al., 13 Nov 2025).
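The two phases can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`compute_quotas`, `select_in_group`, `fdas`) are hypothetical, leftover anchors are assigned by largest deficit $m\rho_r - q_r$, features are assumed nonnegative so that scores lie in $[0,1]$, and the decay is read as $s \leftarrow s\odot(1-s)$ followed by renormalization by $\max(s)$.

```python
import numpy as np

def compute_quotas(group_ids, m):
    """Phase A: per-group anchor quotas q_r that sum to m."""
    groups, counts = np.unique(group_ids, return_counts=True)
    ideal = m * counts / counts.sum()            # m * rho_r
    quotas = np.floor(ideal).astype(int)
    remainder = m - quotas.sum()
    # Assumption: leftover anchors go to the groups with the largest deficit.
    order = np.argsort(-(ideal - quotas), kind="stable")
    quotas[order[:remainder]] += 1
    return groups, quotas

def select_in_group(X, idx, q):
    """Phase B: pick q anchors among columns idx of X using decayed scores."""
    s = X[:, idx].sum(axis=0).astype(float)      # s_i = sum_p X[p, i]
    if s.max() > 0:
        s = s / s.max()
    taken = np.zeros(len(idx), dtype=bool)
    chosen = []
    for _ in range(q):
        # Highest-scoring point that has not been selected yet.
        k = int(np.argmax(np.where(taken, -np.inf, s)))
        taken[k] = True
        chosen.append(int(idx[k]))
        # Assumed reading of the decay: s <- s * (1 - s), then renormalize.
        s = s * (1.0 - s)
        if s.max() > 0:
            s = s / s.max()
    return chosen

def fdas(X, group_ids, m):
    """Select m anchor indices with group quotas and within-group decay."""
    groups, quotas = compute_quotas(group_ids, m)
    anchors = []
    for g, q in zip(groups, quotas):
        idx = np.flatnonzero(group_ids == g)
        anchors.extend(select_in_group(X, idx, int(q)))
    return np.array(anchors)

# Toy usage: 100 points in 3 groups with proportions 0.5, 0.3, 0.2.
rng = np.random.default_rng(0)
X = rng.random((3, 100))
group_ids = np.repeat([0, 1, 2], [50, 30, 20])
anchors = fdas(X, group_ids, m=7)
```

With $m=7$ the ideal shares are $3.5$, $2.1$, and $1.4$, so the floors sum to $6$ and the single leftover anchor goes to the first group, giving quotas $(4, 2, 1)$.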

2. Anchor Graph Construction and Fairness-Preserving Label Propagation

After anchor selection, any fair clustering algorithm $\mathcal{F}$ is applied to the $m$-anchor set, yielding cluster labels $\ell\in\{1, \dots, c\}^m$. The challenge is then to transfer these cluster assignments to the full dataset while preserving fairness. This is done by constructing an $m\times n$ nonnegative affinity matrix $\mathbf{Z}$ so that cluster label propagation maintains demographic parity.

The propagation problem is formalized as: $\min_{\mathbf{Z}\in\mathbb{R}^{m\times n}} \|\mathbf{X} - \mathbf{H}\mathbf{Z}\|_F^2 + \alpha\|\mathbf{Z}\|_F^2$ subject to

$\mathbf{Z}_{:,i} \in \Delta^m$ (the $m$-simplex) for each $i$, and,
for each cluster $l$ and group $r$,

$\sum_{j\in \mathcal{C}_l}\sum_{i\in G_r} Z_{j,i} = t_{l,r},\qquad t_{l,r} = |\{j \in \mathcal{C}_l \cap G_r\}|\cdot \frac{n}{m}$

where $\mathbf{H}$ is the anchor feature matrix, $\mathcal{C}_l$ indexes anchors from cluster $l$, and $G_r$ indexes data for group $r$.
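To make the group-label constraint concrete, the sketch below computes the targets $t_{l,r}$ from the anchor clustering and the mass that a given $\mathbf{Z}$ actually sends from each group to each cluster. The helper names and the 0-based integer encoding of clusters and groups are assumptions made for illustration.

```python
import numpy as np

def fairness_targets(anchor_labels, anchor_groups, n, c, t):
    """t[l, r] = (#anchors in cluster l and group r) * n / m."""
    m = len(anchor_labels)
    counts = np.zeros((c, t))
    np.add.at(counts, (anchor_labels, anchor_groups), 1)
    return counts * n / m

def propagated_mass(Z, anchor_labels, data_groups, c, t):
    """Left-hand side of the constraint: sum of Z[j, i] over j in C_l, i in G_r."""
    L = np.eye(c)[anchor_labels]      # m x c one-hot anchor clusters
    G = np.eye(t)[data_groups]        # n x t one-hot data groups
    return L.T @ Z @ G                # c x t

# Toy usage: 4 anchors, 8 points, 2 clusters, 2 groups.
anchor_labels = np.array([0, 0, 1, 1])
anchor_groups = np.array([0, 1, 0, 1])
data_groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Z = np.eye(4)[:, np.arange(8) % 4]    # each point puts all mass on one anchor
targets = fairness_targets(anchor_labels, anchor_groups, n=8, c=2, t=2)
mass = propagated_mass(Z, anchor_labels, data_groups, c=2, t=2)
```

In this toy case every cluster-group cell has target $2$, and the chosen $\mathbf{Z}$ satisfies the constraint exactly. The targets always sum to $n$, which matches the total mass of a column-stochastic $\mathbf{Z}$.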

Fairness Preservation: The joint group-label constraint enforces that the final per-cluster group proportions on all $n$ data points exactly match those observed among the anchor assignments: $\text{balance}(\mathcal{C}_{\text{final}}) = \text{balance}(\mathcal{C}_{\text{anchor}})$, where $\rho_r^{(l)} = \frac{|\mathcal{C}_l\cap G_r|}{|\mathcal{C}_l|}$ and balance is defined as $\min_l\min_{r\neq r'} \frac{\rho_r^{(l)}}{\rho_{r'}^{(l)}}$.
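The balance definition above reduces, within each cluster, to the ratio of the smallest to the largest group count, since $|\mathcal{C}_l|$ cancels in $\rho_r^{(l)}/\rho_{r'}^{(l)}$. A small sketch, assuming hard labels and returning $0$ when some group is absent from a cluster (the function name is illustrative):

```python
import numpy as np

def balance(labels, groups):
    """min over clusters of (smallest group count / largest group count)."""
    group_values = np.unique(groups)
    worst = 1.0
    for l in np.unique(labels):
        members = groups[labels == l]
        counts = np.array([(members == g).sum() for g in group_values])
        if counts.min() == 0:
            return 0.0                # a group is missing from this cluster
        worst = min(worst, counts.min() / counts.max())
    return worst

# Toy usage: cluster 0 holds groups (2, 1), cluster 1 holds groups (2, 2).
labels = np.array([0, 0, 0, 1, 1, 1, 1])
groups = np.array([0, 0, 1, 0, 0, 1, 1])
b = balance(labels, groups)           # 0.5, set by cluster 0
```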

Label propagation computes final soft assignments $\mathbf{Y} = \mathbf{Z}^\top \mathbf{L}$ (with $\mathbf{L}$ the anchor cluster one-hot matrix), and hard cluster labels by $\hat{y}_i = \arg\max_l Y_{i,l}$ (Wei et al., 13 Nov 2025).
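The propagation step is a single matrix product followed by a row-wise argmax. A minimal sketch, assuming 0-based anchor labels and a column-stochastic $\mathbf{Z}$ (the function name is illustrative):

```python
import numpy as np

def propagate_labels(Z, anchor_labels, c):
    """Soft assignments Y = Z^T L and hard labels argmax_l Y[i, l]."""
    L = np.eye(c)[anchor_labels]      # m x c one-hot anchor cluster matrix
    Y = Z.T @ L                       # n x c soft assignments
    return Y, Y.argmax(axis=1)

# Toy usage: 3 anchors (clusters 0, 0, 1) and 2 data points.
Z = np.array([[0.7, 0.1],
              [0.2, 0.3],
              [0.1, 0.6]])
Y, y_hat = propagate_labels(Z, np.array([0, 0, 1]), c=2)
# Y = [[0.9, 0.1], [0.4, 0.6]], so y_hat = [0, 1]
```

Because each column of $\mathbf{Z}$ sums to one, each row of $\mathbf{Y}$ also sums to one and can be read as a distribution over clusters.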

3. ADMM-Based Optimization

To efficiently solve the constrained quadratic problem, AFCF employs an Alternating Direction Method of Multipliers (ADMM) framework. Introducing slack variable $\mathbf{E}$ and dual variable $\mathbf{\Lambda}$, the augmented Lagrangian is

$\mathcal{L}_\rho(\mathbf{Z}, \mathbf{E}, \mathbf{\Lambda}) = \|\mathbf{X} - \mathbf{H}\mathbf{Z}\|_F^2 + \alpha\|\mathbf{Z}\|_F^2 + \langle \mathbf{\Lambda}, \mathbf{Z} - \mathbf{E}\rangle + \frac{\rho}{2} \|\mathbf{Z} - \mathbf{E}\|_F^2$

Iterative updates alternately minimize over $\mathbf{Z}$ (simplex-constrained QPs), update $\mathbf{E}$ (closed form within each block to enforce the fairness constraint), and perform dual ascent on $\mathbf{\Lambda}$. Each ADMM iteration costs $O(nm^2)$, with $m$ (the number of anchors) typically on the order of $10$ to $100$.

Convergence is measured via the primal and dual residuals $r_k = \|\mathbf{Z}^{k+1} - \mathbf{E}^{k+1}\|_F$ and $s_k = \rho\|\mathbf{E}^{k+1} - \mathbf{E}^k\|_F$, which empirically decrease as the algorithm proceeds. Adaptive stepsize schemes (e.g., updating $\rho$ every 10 steps) are used to speed up convergence (Wei et al., 13 Nov 2025).
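The following sketch shows the general shape of such an ADMM loop and how the residuals $r_k$ and $s_k$ are tracked. It is a deliberately simplified variant, not the solver described in the paper: the group-label fairness constraint is omitted, and the splitting is swapped so that the $\mathbf{Z}$-update is an unconstrained ridge-type linear solve while the $\mathbf{E}$-update is a Euclidean projection of each column onto the simplex. The function names, the fixed $\rho$, and the fixed iteration count are assumptions.

```python
import numpy as np

def project_columns_to_simplex(V):
    """Euclidean projection of every column of V onto the probability simplex."""
    m, n = V.shape
    U = -np.sort(-V, axis=0)                       # columns sorted descending
    css = np.cumsum(U, axis=0) - 1.0
    k = np.arange(1, m + 1)[:, None]
    cond = U - css / k > 0                         # True on a prefix of each column
    last = m - 1 - np.argmax(cond[::-1], axis=0)   # index of the last True entry
    theta = css[last, np.arange(n)] / (last + 1)
    return np.maximum(V - theta, 0.0)

def admm_anchor_graph(X, H, alpha=0.1, rho=1.0, iters=200):
    """Simplified ADMM for min ||X - HZ||_F^2 + alpha ||Z||_F^2, columns on simplex."""
    m, n = H.shape[1], X.shape[1]
    E = np.full((m, n), 1.0 / m)
    Lam = np.zeros((m, n))
    # The Z-update solves (2 H^T H + (2 alpha + rho) I) Z = 2 H^T X - Lam + rho E.
    A = 2.0 * H.T @ H + (2.0 * alpha + rho) * np.eye(m)
    HtX2 = 2.0 * H.T @ X
    history = []
    for _ in range(iters):
        Z = np.linalg.solve(A, HtX2 - Lam + rho * E)
        E_new = project_columns_to_simplex(Z + Lam / rho)
        Lam = Lam + rho * (Z - E_new)              # dual ascent
        r = np.linalg.norm(Z - E_new)              # primal residual r_k
        s = rho * np.linalg.norm(E_new - E)        # dual residual s_k
        history.append((r, s))
        E = E_new
    return E, history

# Toy usage: 8 anchors in 5 dimensions, 50 data points.
rng = np.random.default_rng(0)
H = rng.random((5, 8))
Z_true = project_columns_to_simplex(rng.random((8, 50)))
X = H @ Z_true
E, history = admm_anchor_graph(X, H)
```

In this variant each iteration is dominated by a solve against a fixed $m\times m$ matrix with $n$ right-hand sides, which is in line with the $O(nm^2)$ per-iteration cost quoted above. Adding the fairness constraint would require an additional block-wise correction in the $\mathbf{E}$-update, which is not shown here.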

4. Theoretical Guarantees

AFCF provides two formal guarantees:

(a) Fairness Equivalence: Under the formulated group-label joint constraint, the final clustering of all $n$ points recovers the exact per-cluster demographic group proportions present in the anchor clustering. This implies preservation of standard fairness metrics, including balance and disparate impact.

(b) Linear-Time Scalability: For fixed $d$, $m$, and cluster count $c$, the total computational complexity of AFCF is

$O(nd + f(m) + nm^2 + nmc)$

where $f(m)$ is the complexity of the fair clustering subroutine on $m$ anchors. With $m\ll n$, this yields overall linear scaling in the number of samples $n$, a substantial reduction from the quadratic or higher costs of many existing fair clustering frameworks (Wei et al., 13 Nov 2025).

5. Empirical Evaluation

AFCF was benchmarked on five real-world datasets:

Dataset	Size	# Clusters	Sensitive Attribute
Law School	18,000	2	Gender
Credit Card	29,000	5	Gender
Bank	41,000	2	Marital Status
Zafar	100,000	2	Binary Sensitive
Census II	2,460,000	5	Gender

Performance metrics included clustering quality (Accuracy, Normalized Mutual Information) and fairness (Balance, Minimal Normalized Conditional Entropy). Representative state-of-the-art methods—SpFC, VFC, FFC, FMSC, and fairletFC—were integrated into the AFCF pipeline.

Key empirical findings:

Computational Speedup: On Census II, VFC alone required ≈1,500 s; VFC-AF (the AFCF version) executed in ≈918 s. SpFC could not complete within 30 minutes on Bank, whereas SpFC-AF finished in 35 s. In general, AFCF enabled one to two orders of magnitude acceleration.
Clustering Quality and Fairness Preservation: Clustering accuracy and NMI varied by only a few percentage points; fairness metrics such as balance and MNCE were preserved within 1–2% of anchor clustering levels, consistent with the theoretical guarantee.
Ablation Analysis: Replacing FDAS with random or vanilla DAS anchor sampling resulted in degenerate clusters or substantial fairness loss. Excluding the group-label joint constraint in the graph update ("AC" ablation) degraded balance by up to 10% (Wei et al., 13 Nov 2025).

6. Significance and Implications

AFCF decouples scalability from the core fair clustering algorithm: any fair clustering routine applied to the anchor subset inherits AFCF’s linear-time scalability and exact fairness preservation when extended to the whole dataset. This modularity allows rapid experimentation and deployment across large-scale, high-stakes environments requiring fairness guarantees in unsupervised learning. The systematic empirical and theoretical analysis demonstrates AFCF’s ability to bridge the computational gap in fair clustering, establishing it as a plug-and-play, practical framework for scalable fair learning (Wei et al., 13 Nov 2025).

References (1)

A General Anchor-Based Framework for Scalable Fair Clustering (2025)
