FedCCA: Client-Centric Federated Learning

Updated 1 February 2026

Client-Centric Adaptation Federated Learning (FedCCA) is defined by integrating client-specific information to tailor model parameters, aggregation strategies, and optimization routines.
It employs techniques like conditional layers, adapter-based personalization, and attention-driven client selection to address challenges such as non-IID data and device heterogeneity.
Practical implementations demonstrate improved accuracy, reduced communication overhead, and faster convergence in dynamic federated environments.

Client-Centric Adaptation Federated Learning (FedCCA) refers to a broad class of federated learning paradigms characterized by explicit and systematic exploitation of client-specific information to adapt model parameters, participation, optimization, or aggregation strategies. FedCCA subsumes a range of algorithmic designs including conditional activation architectures, personalized model heads, attention-based selection or aggregation, server-driven initialization leveraging client histories, and asynchronous adaptive optimization. These frameworks are unified by prioritizing client-level adaptation mechanisms to address non-IID data, system heterogeneity, variable client availability, streaming distribution shifts, and privacy-centric constraints. The following sections synthesize foundational problem definitions, underlying methodologies, theoretical guarantees, prominent algorithmic instantiations, empirical findings, and open technical challenges as addressed in representative works (Li et al., 2024, &&&1&&&, Wang et al., 25 Jan 2026, Chang et al., 2024, Sun et al., 17 Jan 2025, Zhang et al., 2024).

1. Formal Problem Setting and Motivation

Federated learning aims to find a model $w$ minimizing an aggregate risk $\min_w F(w) = \sum_{i=1}^N p_i F_i(w)$ , where $F_i$ is the expected local loss over client $i$ 's private data. In practical deployments (cross-device, IoT, recommender systems), client data distributions $(\mathcal{D}_i)$ exhibit severe non-IIDness induced by domain shift, label imbalance, and streaming covariate drift. Furthermore, real-world system heterogeneity manifests as asynchronous participation, irregular local computation, and client churn.

Client-centric adaptation in FL targets these core difficulties:

Statistical heterogeneity: Ordinary server-driven aggregation induces gradient bias, poor generalization, and slow convergence under distributional skew.
System heterogeneity: Device availability and computational capacities vary.
Dynamism: Arrival/departure of clients, evolving objectives.
Privacy: Model personalization without raw data exchange.

Formulations in FedCCA protocols extend the federated objective with client-specific embeddings, adapters, encoders, or heads, and instantiate learning as $h(x;\theta, c_i)$ , $f_\theta(x) + f_{\phi_i}(x)$ , or attention-based aggregations weighted by distributional similarity metrics (Wang et al., 25 Jan 2026, Zhang et al., 2024). Some server-driven variants explicitly model time-varying mixtures, $P_m^t = \sum_{k=1}^K \alpha_{mk}^t P_k$ , and accommodate arbitrary client-initiated update regimes (Li et al., 2024, Chang et al., 2024).

2. Algorithmic Principles and Representative Designs

FedCCA encompasses several algorithmic archetypes, including the following:

Conditional Client-Specific Layers: FedCCA models employ conditional gated activation units (CGAU), introducing client-specific weights $V_f, V_g$ that modulate base layers and induce local feature shifts for each client. The effective update at layer $\ell$ is:

$z = \tanh(W_f^\top x + V_f^\top c_k) \odot \sigma(W_g^\top x + V_g^\top c_k)$

where $c_k$ encodes client identity, and only slices corresponding to local index $k$ are updated with private data (Rieger et al., 2020).

Adapter-Based Personalization: In recommendation scenarios, a frozen global backbone $f_\theta$ is wrapped with lightweight adapters $g_{\phi_i}$ (low-rank modules within MLP layers) and group-level adapters $g_{\phi_g}$ , fused via adaptive gates. Only adapter and gate parameters are locally trained and shared, enabling efficient privacy-preserving personalization (Zhang et al., 2024).
Client-Side Encoder and Attention Aggregation: For IoT and cross-domain setups, each client maintains a perpetually trained encoder $\phi_i$ , which produces an embedding (e.g., final FC layer weights) for attention score computation. The server aggregates updates for client $i$ using personalized attention weights:

$\theta^{(i)}_{t+1} = \sum_{j \in S_t^i} u^{(j,i)} \theta^{(j)}_t,$

where $u^{(j,i)} = B(||\phi_{fc}^i - \phi_{fc}^j||^2)$ , with $B(d) = 1 - \exp(-d/\sigma)$ (Wang et al., 25 Jan 2026).

Clustered, Mixture-Based, Client-Driven FL: CDFL maintains $K$ cluster models, each serving as a possible mode of the client data distribution. When a client initiates a model refresh, the server estimates the mixture coefficients $\hat\alpha_{mk}^t$ server-side via fit scores based on proxy set evaluation, distances, and a temperature-scaled softmax. Cluster models are updated by

$w_k^t = (1 - \beta^t_{mk}) w_k^{t-1} + \beta^t_{mk} v_m^t,$

with update fractions $\beta^t_{mk}$ determined by both mixture weights and staleness damping (Li et al., 2024).

Adaptive and Asynchronous Server Optimization: Client-centric adaptive optimization invokes per-client normalization and server-side Adam/AmsGrad/Adagrad update of the aggregated pseudo-gradient from buffered asynchronous client updates, tolerating arbitrary participation and local computation (Sun et al., 17 Jan 2025).

3. Distribution Estimation, Selection, and Aggregation

A central innovation in client-centric adaptation lies in how distribution estimation and peer selection are leveraged for model update and aggregation:

Server-Side Distribution Estimation: CDFL offloads the mixture estimation from client to server. The fit-score formulation

$s_k = c_1 \frac{\sum_{i \neq k} \ell_i}{\sum_i \ell_i} + c_2 \frac{\sum_{i \neq k} d_{1i}}{\sum_i d_{1i}} + (1 - c_1 - c_2) \frac{\sum_{i \neq k} d_{2i}}{\sum_i d_{2i}}$

is converted to weights via softmax, ensuring high-fidelity clustering and adaptation with minimal client compute (Li et al., 2024).

Dynamic Client Selection and Attention: FedCCA dynamically selects update partners per round for each client based on encoder proximity, using a mapping from embedding distance to attention score. This personalized selection ensures adaptation from only the most similar peers, reducing negative transfer and improving convergence under extreme non-IID (Wang et al., 25 Jan 2026).
Model Initialization for Dynamic Client Sets: In scenarios with temporal client churn, FedCCA initializes the current global model as a weighted average of prior saved global models, with weights depending on the similarity of pilot gradients ( $\rho_{k,t}$ ), thus accelerating adaptation to the present client cohort (Chang et al., 2024).

4. Theoretical Properties and Convergence Guarantees

FedCCA frameworks are accompanied by rigorous analytical underpinnings in representative works:

Convexity, Smoothness, and Heterogeneity Bounds: Under $L$ -smoothness, (strong) convexity, bounded gradients, and client-heterogeneity measures, FedCCA demonstrates provable contraction properties, with explicit recursive bounds for optimality-gap under evolving global objectives (Chang et al., 2024). The adaptation term—distance between consecutive optima—quantifies the penalty from client set drift.
Server-Side Adaptive Optimization Rates: For asynchronous, heterogeneous schedules, FedCCA achieves:

$\frac{1}{T} \sum_{t = 0}^{T-1} \mathbb{E}\|\nabla f(x_t)\|^2 = O\left(\sqrt{\frac{1}{mKT}}\right) + O\left(\frac{K^2}{T}\right) + O\left(\frac{\tau^2}{T}\right)$

where $m$ is server buffer, $K$ average local steps, $\tau$ maximum delay. As $T \gtrsim mK^5$ , convergence matches the best-known rates in asynchronous FL (Sun et al., 17 Jan 2025).

Mixture Model Convergence: In cluster-based protocols, time-averaged gradient norms over proxy distributions decrease with the amount of client participation and local work, with explicit bounds involving regularization and separation parameters, guaranteeing convergence for both cluster centers and mixture-distributed clients (Li et al., 2024).
Empirical Validation without Formal Theory: Some attention-aggregation or adapter-based protocols rely exclusively on empirical improvement without providing explicit convergence or personalization error bounds, though they inherit standard FedAvg rates under smoothness/bounded-variance (Wang et al., 25 Jan 2026, Zhang et al., 2024).

5. Practical Implementation, System Design, and Privacy

FedCCA protocols operationalize client-centric adaptation through architectural and procedural choices:

Asynchronous, Arbitrary Participation: Clients independently decide when and how much to compute and upload (arbitrary $K_i$ , delays), and the server aggregates using adaptive methods, mitigating straggler effects and making full use of edge computing (Sun et al., 17 Jan 2025, Li et al., 2024).
Single-Model Download and Lightweight Communication: Protocols such as CDFL ensure only one aggregated model (vs $K$ ) is ever sent to a client per update, drastically reducing bandwidth and compute (Li et al., 2024). Adapter-based recommendation systems similarly restrict communication to adapters/gate parameters, leaving large backbones frozen and unshared (Zhang et al., 2024).
Local Client Encoders and Personalized Aggregation: The perpetually trained client encoder ( $\phi_i$ ) is kept strictly private on-device, with no raw data or detailed features shared. Aggregation and selection occur on encoded statistics only (Wang et al., 25 Jan 2026).
Privacy Guarantees: Adapter-based FedCCA can enforce local differential privacy by injecting Laplace noise into shared updates, with direct tradeoffs observed between privacy and recommendation utility metrics (Zhang et al., 2024).

6. Empirical Results and Benchmarks

Experiments across representative works validate FedCCA's advantages:

Dataset/Setting	Baseline/Comparator	FedCCA Variant	Main Gains (Metric)
FashionMNIST, CIFAR-100	FedSoftAsync, Local	CDFL (Single-model) (Li et al., 2024)	+2–20% client/cluster accuracy, ×1 comm. eff.
KuaiRand, KuaiSAR	FedNCF, FedRecon	Adapter-based (Zhang et al., 2024)	+2.45 AUC, +7.95 Precision; 40% less bandwidth
MNIST, FashionMNIST	FedAvg, FedProx	Gradient-guided init. (Chang et al., 2024)	+3–40% accuracy after client-set shift
Digits, DomainNet, FEMNIST	FedAvg, FedProx, FedSR	Attention-based (Wang et al., 25 Jan 2026)	+1–3% accuracy, faster convergence
CIFAR-100, StackOverflow	FedSGD, FedAdam	Adaptive server opt. (Sun et al., 17 Jan 2025)	+7–20% accuracy, best-known rate

Ablation studies underscore the necessity of dynamic selection and multi-source aggregation in realizing full adaptation benefits. Most methods converge in 20–50 rounds for industrial-scale recommendation; parameter efficiency enables deployment on resource-constrained edge devices.

7. Limitations, Challenges, and Future Directions

Current limitations primarily include scalability of peer selection with large $N$ (embedding distance computations), susceptibility to group definition errors in adapter setups, lack of formal personalization error bounds in some architectures, and potential drift in frozen backbones if underlying distributions evolve. There is a recognized need for:

Derivation of formal convergence rates and regret bounds for attention-based and adapter-centric protocols.
Approximate or hierarchical client selection to maintain scalability in massive federations.
More lightweight, compressed encoder architectures.
Integration of differential privacy and secure aggregation for stronger privacy guarantees.
Extensions to asynchronous or resource-aware client-side adaptation (Wang et al., 25 Jan 2026, Zhang et al., 2024).

A plausible implication is that the explicit inclusion of client-driven adaptation mechanisms improves learning efficiency, generalization, and personalization under real-world non-IIDness and dynamism, and will become foundational in federated learning at scale.