FedPAE: Peer-Adaptive Ensemble Learning
- Peer-Adaptive Ensemble Learning (FedPAE) is a decentralized federated learning approach that builds personalized model ensembles through peer-to-peer sharing and evolutionary optimization.
- It overcomes model and statistical heterogeneity by allowing clients to independently choose architectures and asynchronously update models to form robust, diverse ensembles.
- Empirical evaluations on CIFAR datasets show FedPAE achieves competitive accuracy and scalability, dynamically favoring local models to prevent negative transfer.
Peer-Adaptive Ensemble Learning (FedPAE) is a fully decentralized federated learning (FL) paradigm designed to address the challenges of model heterogeneity, statistical heterogeneity, and asynchrony, without relying on a central server or architectural homogeneity among clients. In FedPAE, each client maintains complete autonomy regarding model architecture and updates, participates in decentralized peer communications, and constructs personalized model ensembles through multi-objective selection grounded in both predictive strength and diversity. The approach leverages a peer-to-peer model-sharing mechanism and evolutionary multi-objective optimization for ensemble selection, enabling robust personalization and scalability in environments characterized by non-IID data, heterogeneous models, and asynchronous operations (Mueller et al., 2024).
1. Decentralized Federated Learning Setting and Challenges
FedPAE operates over a network of $N$ clients, each indexed by $i$, where each client retains a private data distribution $D_i$ and may independently select its model architecture (e.g., CNN, ResNet, DenseNet). The local model is represented as $f_i(\cdot;\theta_i)$. Three principal challenges define the problem setting:
- Model heterogeneity: Clients may use distinct model families ($f_i \neq f_j$ for clients $i \neq j$), precluding classical parameter averaging.
- Statistical heterogeneity: Client data is generally non-IID, and a global model under-performs in such regimes.
- Asynchronous updates: Clients may join, leave, or update at different rates, resulting in staleness and inconsistency in model exchange.
The objective is to ensure each client can (1) share its local models with selected peers, (2) identify an optimal subset of peer models, (3) construct a personalized ensemble suited to its local data, and (4) operate without any centralized server while tolerating asynchrony (Mueller et al., 2024).
2. Algorithmic Structure of FedPAE
The FedPAE protocol unfolds in iterative (or continuous, for asynchronous scenarios) local cycles. The major steps, with their primary operations, are as follows:
- Local Training: Client $i$ trains $m$ distinct local models on its private dataset via empirical risk minimization.
- Peer Model Sharing: Each client shares its locally trained models with a designated set of peers (potentially all others), while simultaneously receiving peer models for consideration.
- Ensemble Selection (NSGA-II): The client constructs a “model bench” comprising both local and received models. It performs multi-objective ensemble optimization (via NSGA-II), targeting (a) ensemble predictive strength (mean local validation accuracy) and (b) ensemble diversity (average pairwise predictive disagreement). The solution yields a binary selection vector $s_i$ and ensemble weights $w_i$.
- Personalized Inference: The client ensemble predictor is $F_i(x) = \sum_{j \in S_i} w_j f_j(x)$, where $S_i$ is the selected subset.
- Optional Model Update: Clients may further refine local models using ensemble soft predictions as targets.
This structure allows asynchronous, peer-to-peer collaboration without architectural or procedural centralization (Mueller et al., 2024).
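The local cycle above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: models are stand-in callables, and a greedy strength-only ranking stands in for the NSGA-II search over the two objectives.

```python
# Sketch of one FedPAE local cycle. All helper names are illustrative;
# "models" here are simple callables x -> predicted label.

def local_accuracy(model, val_set):
    """Fraction of local validation points the model labels correctly."""
    return sum(model(x) == y for x, y in val_set) / len(val_set)

def ensemble_predict(models, weights, x):
    """Weighted majority vote over the selected ensemble."""
    votes = {}
    for model, w in zip(models, weights):
        label = model(x)
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)

def fedpae_cycle(local_models, peer_models, val_set, k=3):
    """One cycle: pool local + peer models into a bench, keep the k
    strongest on local validation data, weight them uniformly, and
    return the personalized ensemble predictor."""
    bench = list(local_models) + list(peer_models)
    # Greedy strength-only selection; the actual method runs NSGA-II
    # over strength AND diversity.
    bench.sort(key=lambda m: local_accuracy(m, val_set), reverse=True)
    selected = bench[:k]
    weights = [1.0 / len(selected)] * len(selected)
    return lambda x: ensemble_predict(selected, weights, x)
```

Because selection is scored on the client's own validation set, a peer model that fits the local distribution poorly simply never enters the ensemble, which is the mechanism behind the negative-transfer protection discussed below.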
3. Peer-Adaptive Ensemble Selection and Weighting
Each client employs an adaptive ensemble mechanism to maximize both predictive accuracy and ensemble diversity within its “model bench.” The process involves:
- Subset Selection: Select a subset $S_i$ of fixed size $k$ maximizing:
- Ensemble strength: Average validation accuracy on a held-out local validation set $V_i$.
- Ensemble diversity: Average pairwise disagreement on predictions.
- Optimization: The NSGA-II evolutionary algorithm is used to identify the Pareto frontier of candidate ensembles, facilitating a principled tradeoff.
- Weight Assignment: Typically $w_j = 1/|S_i|$ for uniform weighting, or weights can be normalized by validation accuracy.
- Final Ensemble Construction: $F_i(x) = \sum_{j \in S_i} w_j f_j(x)$, with $\sum_{j \in S_i} w_j = 1$.
A key property is the mechanism’s ability to default to purely local ensembles when peer contributions would degrade accuracy, thus protecting against negative transfer. The process is repeated at each communication round or opportunistically as new peer models arrive (Mueller et al., 2024).
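The two objectives and the Pareto-front selection can be made concrete as follows. For clarity this sketch enumerates all size-$k$ subsets by brute force rather than running NSGA-II; on a small model bench the two approaches find the same non-dominated set.

```python
# Two ensemble objectives plus a brute-force Pareto filter. The paper
# uses NSGA-II; exhaustive enumeration here is an illustrative stand-in.
from itertools import combinations

def strength(subset, val_set):
    """Objective (a): mean local validation accuracy over the subset."""
    accs = [sum(m(x) == y for x, y in val_set) / len(val_set) for m in subset]
    return sum(accs) / len(accs)

def diversity(subset, val_set):
    """Objective (b): average pairwise prediction disagreement."""
    pairs = list(combinations(subset, 2))
    if not pairs:
        return 0.0
    dis = [sum(a(x) != b(x) for x, _ in val_set) / len(val_set)
           for a, b in pairs]
    return sum(dis) / len(dis)

def pareto_front(bench, val_set, k):
    """All size-k subsets not dominated in (strength, diversity)."""
    cands = [(s, strength(s, val_set), diversity(s, val_set))
             for s in combinations(bench, k)]
    return [c for c in cands
            if not any(o[1] >= c[1] and o[2] >= c[2] and
                       (o[1] > c[1] or o[2] > c[2]) for o in cands)]
```

A subset of weak but mutually disagreeing models can survive on the front alongside a strong homogeneous one; the client then picks among non-dominated ensembles according to its own accuracy/diversity tradeoff.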
4. Asynchrony and Full Decentralization
FedPAE eliminates both the central parameter server and the need for synchronized rounds. Communication occurs directly among peers, typically in a gossip-style exchange. Each client maintains a local clock and transmits model updates as ready. Incoming models at client $i$ from peer $j$ are tagged with the sender’s clock; a model is treated as “stale” if its update lags the receiver’s clock beyond a chosen threshold $\tau$, in which case it is downweighted or removed from ensemble consideration. Because the selection module enforces this staleness criterion during ensemble construction, the approach remains robust even under non-uniform connectivity or computational resources (Mueller et al., 2024).
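A staleness filter of this kind is straightforward; the sketch below uses hypothetical names and logical (integer) clocks, and simply drops stale models rather than downweighting them.

```python
# Hypothetical staleness filter: peer models arrive tagged with the
# sender's logical clock; anything lagging the receiver's clock by
# more than tau is excluded from ensemble consideration.

def filter_stale(incoming, local_clock, tau):
    """Keep only peer models within the staleness threshold.
    `incoming` maps peer id -> (model, sender_clock)."""
    fresh = {}
    for peer, (model, sender_clock) in incoming.items():
        if local_clock - sender_clock <= tau:
            fresh[peer] = model
    return fresh
```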
5. Local Training Objectives and Personalized Regularization
Local models in FedPAE are primarily trained via classical empirical risk minimization, with the potential for personalized regularization. The objective function for client $i$ can be written as:

$$\mathcal{L}_i(\theta_i) = \mathbb{E}_{(x,y) \sim D_i}\big[\ell\big(f_i(x;\theta_i), y\big)\big] + \lambda \Big\|\theta_i - \sum_{j \in S_i} w_j \theta_j\Big\|^2,$$

where $\ell$ is the predictive loss (e.g., cross-entropy), $\lambda$ is a regularization factor, and the second term gently encourages parameter alignment with the weighted average of selected peers’ parameters. This encourages personalization while leveraging information sharing for regularization (Mueller et al., 2024).
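A numeric sketch of this regularized objective follows; it assumes the selected peers share a compatible parameterization (so the proximal term is meaningful), and the weights and regularization factor in the test values are illustrative.

```python
# Regularized local objective: predictive loss plus a proximal pull
# toward the weighted average of selected peers' parameters.
import numpy as np

def regularized_loss(pred_loss, theta, peer_thetas, peer_weights, lambda_):
    """pred_loss: scalar empirical risk (e.g., mean cross-entropy).
    theta: this client's flattened parameter vector.
    peer_thetas: flattened parameter vectors of selected peers.
    peer_weights: ensemble weights w_j, summing to 1."""
    theta_bar = sum(w * t for w, t in zip(peer_weights, peer_thetas))
    prox = lambda_ * np.sum((theta - theta_bar) ** 2)
    return pred_loss + prox
```

Setting `lambda_ = 0` recovers plain empirical risk minimization, which is the appropriate fallback when peers use incompatible architectures.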
6. Theoretical Properties and Complexity
Under standard smoothness and bounded variance assumptions, local empirical risk minimizers converge to stationary points. The peer-adaptive ensemble mechanism does not disrupt local model stability, since ensemble selection operates on pretrained models. The per-client computational and communication complexity is

$$\mathcal{O}\big(m\,T\,n_i + P\,G\,n_{\text{val}} + K\,n_{\text{val}}\big),$$

where $m$ is the local model count, $T$ is the number of local update steps, $n_i$ is the local dataset size, $P$ is the NSGA-II population size, $G$ is the number of generations, $K$ is the number of non-dominated solutions on the Pareto front, and $n_{\text{val}}$ is the number of validation samples (Mueller et al., 2024).
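As a back-of-envelope illustration of how these terms compose, one plausible accounting sums the training cost, the NSGA-II fitness evaluations, and the final evaluation of the Pareto-front ensembles (all parameter values below are invented for illustration):

```python
# Illustrative per-cycle cost in elementary model evaluations; the
# decomposition into three terms is an assumption, not the paper's exact
# accounting.

def cycle_cost(m, T, n_i, P, G, K, n_val):
    """m*T*n_i: local training; P*G*n_val: NSGA-II fitness evaluations;
    K*n_val: evaluating the K Pareto-front ensembles on validation data."""
    return m * T * n_i + P * G * n_val + K * n_val
```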
7. Empirical Evaluation and Practical Implications
FedPAE was evaluated on federated CIFAR-10 and CIFAR-100 (60,000 images per dataset), with client data partitioned by a Dirichlet($\alpha$) distribution (e.g., $\alpha = 0.1$) to test statistical heterogeneity. All five model architectures (4-layer CNN, ResNet-18, DenseNet-121, GoogLeNet, VGG-11) were included for model heterogeneity.
Performance comparison against both homogeneous (FedAvg, FedProx) and heterogeneous (FedKD, FML, FedGH, LG-FedAvg, FedDistill) baselines demonstrates that FedPAE achieves superior or comparable mean test accuracy, notably:
| Method | CIFAR-10 Dir(0.1) | CIFAR-100 Dir(0.1) |
|---|---|---|
| FedAvg | 0.668 ± 0.062 | 0.332 ± 0.017 |
| FedProx | 0.667 ± 0.062 | 0.330 ± 0.018 |
| FedKD | 0.870 ± 0.048 | 0.539 ± 0.028 |
| Local | 0.871 ± 0.046 | 0.556 ± 0.020 |
| FedPAE | 0.873 ± 0.047 | 0.558 ± 0.020 |
Scalability experiments (e.g., on CIFAR-100 Dir(0.1)) show FedPAE maintaining high test accuracy relative to the strongest baseline, FedKD, as the client population grows. Ablation studies reveal that with increasing heterogeneity (smaller $\alpha$), FedPAE’s ensemble construction increasingly favors local models (e.g., 72% local model selection for Dir(0.1)), demonstrating automatic negative transfer avoidance (Mueller et al., 2024).
Key implications include natural balancing of collaboration and personalization, support for lightweight models on resource-constrained clients, and the removal of central server bottlenecks. Notable limitations are increased communication payload (due to exchange of multiple models per client) and non-trivial ensemble selection computation. Proposed future work includes peer clustering to optimize communication and dynamic, sample-wise ensemble selection (Mueller et al., 2024).