
FedPAE: Peer-Adaptive Ensemble Learning

Updated 21 February 2026
  • Peer-Adaptive Ensemble Learning (FedPAE) is a decentralized federated learning approach that builds personalized model ensembles through peer-to-peer sharing and evolutionary optimization.
  • It overcomes model and statistical heterogeneity by allowing clients to independently choose architectures and asynchronously update models to form robust, diverse ensembles.
  • Empirical evaluations on CIFAR datasets show FedPAE achieves competitive accuracy and scalability, dynamically favoring local models to prevent negative transfer.

Peer-Adaptive Ensemble Learning (FedPAE) is a fully decentralized federated learning (FL) paradigm designed to address the challenges of model heterogeneity, statistical heterogeneity, and asynchrony, without relying on a central server or architectural homogeneity among clients. In FedPAE, each client maintains complete autonomy regarding model architecture and updates, participates in decentralized peer communications, and constructs personalized model ensembles through multi-objective selection grounded in both predictive strength and diversity. The approach leverages a peer-to-peer model-sharing mechanism and evolutionary multi-objective optimization for ensemble selection, enabling robust personalization and scalability in environments characterized by non-IID data, heterogeneous models, and asynchronous operations (Mueller et al., 2024).

1. Decentralized Federated Learning Setting and Challenges

FedPAE operates over a network of N clients, indexed by i \in \{1, \dots, N\}, where each client i retains a private data distribution \mathcal{D}_i and may independently select its model architecture \mathcal{M}_i (e.g., CNN, ResNet, DenseNet). The local model is represented as f_i(x; \theta_i). Three principal challenges define the problem setting:

  • Model heterogeneity: Clients may use distinct model families (\mathcal{M}_i \neq \mathcal{M}_j), precluding classical parameter averaging.
  • Statistical heterogeneity: Client data \mathcal{D}_i is generally non-IID, and a single global model under-performs in such regimes.
  • Asynchronous updates: Clients may join, leave, or update at different rates, resulting in staleness and inconsistency in model exchange.

The objective is to ensure each client can (1) share its local models with selected peers, (2) identify an optimal subset of peer models, (3) construct a personalized ensemble suited to its local data, and (4) operate without any centralized server while tolerating asynchrony (Mueller et al., 2024).

2. Algorithmic Structure of FedPAE

The FedPAE protocol unfolds in iterative (or continuous, for asynchronous scenarios) local cycles. The major steps, with their primary operations, are as follows:

  1. Local Training: Client i trains M distinct local models \{ f_i^m(\cdot; \theta_i^m) \}_{m=1}^M on its private dataset \mathcal{D}_i via empirical risk minimization.
  2. Peer Model Sharing: Each client shares its locally trained models with a designated set of peers \mathcal{P}_i (potentially all others), while simultaneously receiving peer models for consideration.
  3. Ensemble Selection (NSGA-II): The client constructs a “model bench” comprising both local and received models. It performs multi-objective ensemble optimization (via NSGA-II), targeting (a) ensemble predictive strength (mean local validation accuracy) and (b) ensemble diversity (average pairwise predictive disagreement). The solution yields a binary selection vector s_i and ensemble weights \{ w_{ij} \}.
  4. Personalized Inference: The client ensemble predictor is \hat{f}_i(x) = \sum_{j \in \mathcal{B}_i} s_{ij} w_{ij} f_j(x), where \mathcal{B}_i is the selected subset.
  5. Optional Model Update: Clients may further refine local models using ensemble soft predictions as targets.

This structure allows asynchronous, peer-to-peer collaboration without architectural or procedural centralization (Mueller et al., 2024).
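The personalized inference step (step 4) reduces to a weighted soft vote over the model bench. The sketch below is illustrative, not the paper's implementation: the toy models, selection vector, and weights are hypothetical stand-ins.

```python
import numpy as np

def ensemble_predict(models, s, w, x):
    """hat{f}_i(x) = sum_j s_ij * w_ij * f_j(x): weighted soft vote over the bench.

    models : list of callables returning class-probability vectors
    s      : binary selection vector from the NSGA-II step
    w      : ensemble weights (normalized over the selected subset)
    """
    probs = np.stack([f(x) for f in models])                       # (|bench|, C)
    return (np.asarray(s)[:, None] * np.asarray(w)[:, None] * probs).sum(axis=0)

# Toy bench: three "models" with fixed predictions over C = 3 classes.
bench = [lambda x: np.array([0.7, 0.2, 0.1]),
         lambda x: np.array([0.1, 0.8, 0.1]),
         lambda x: np.array([0.2, 0.2, 0.6])]
s = [1, 1, 0]        # third model rejected by the selection step
w = [0.5, 0.5, 0.0]  # uniform weights over the selected subset
print(ensemble_predict(bench, s, w, x=None))  # [0.4 0.5 0.1]
```

Because s zeroes out rejected models, the same expression covers both the full bench and the purely local fallback.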

3. Peer-Adaptive Ensemble Selection and Weighting

Each client employs an adaptive ensemble mechanism to maximize both predictive accuracy and ensemble diversity within its “model bench.” The process involves:

  • Subset Selection: Select S_i \subseteq \mathcal{P}_i \cup \{i\} of fixed size k maximizing:
    • Ensemble strength: average validation accuracy on \mathcal{D}_i.
    • Ensemble diversity: average pairwise disagreement on predictions.
  • Optimization: The NSGA-II evolutionary algorithm identifies the Pareto frontier of candidate ensembles, facilitating a principled trade-off between the two objectives.
  • Weight Assignment: Typically w_{ij} = 1/|S_i| for uniform weighting, or weights can be normalized by validation accuracy.
  • Final Ensemble Construction: \hat{f}_i(x) = w_{ii} f_i(x) + \sum_{j \in \mathcal{P}_i} w_{ij} f_j(x), with \sum_j w_{ij} = 1 and w_{ij} \ge 0.

A key property is the mechanism’s ability to default to purely local ensembles when peer contributions would degrade accuracy, thus protecting against negative transfer. The process is repeated at each communication round or opportunistically as new peer models arrive (Mueller et al., 2024).
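The two selection objectives can be scored as follows for a candidate subset, assuming hard class predictions on the local validation set. This is a minimal sketch of the fitness computation NSGA-II would call; the function name and array shapes are assumptions, not the paper's API.

```python
import numpy as np
from itertools import combinations

def ensemble_objectives(preds, y):
    """Score a candidate ensemble on the two objectives (both maximized).

    preds : (k, n) int array, hard predictions of each candidate model
            on the client's n validation samples
    y     : (n,)  int array, validation labels
    Returns (strength, diversity).
    """
    # Strength: mean validation accuracy over the candidate models.
    strength = np.mean([np.mean(p == y) for p in preds])
    # Diversity: mean pairwise disagreement rate between candidate models.
    pairs = list(combinations(range(len(preds)), 2))
    diversity = (np.mean([np.mean(preds[a] != preds[b]) for a, b in pairs])
                 if pairs else 0.0)
    return strength, diversity

preds = np.array([[0, 1, 1, 0],
                  [0, 1, 0, 0],
                  [1, 1, 1, 0]])
y = np.array([0, 1, 1, 0])
print(ensemble_objectives(preds, y))  # (0.8333..., 0.3333...)
```

Accuracy and disagreement pull in opposite directions, which is why a Pareto-front search rather than a single scalarized score is used.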

4. Asynchrony and Full Decentralization

FedPAE eliminates both the central parameter server and the need for synchronized rounds. Communication occurs directly among peers, typically in a gossip-style exchange. Each client maintains a local clock t_i and transmits model updates as they become ready. Incoming models at client j from peer i are tagged with the sender's clock; a model is treated as “stale” if its delay exceeds a chosen threshold T (\tau_{ji} > T), in which case it is downweighted or removed from ensemble consideration. This ensures robust operation even under non-uniform connectivity or computational resources. Mathematically, stale updates are indexed as f_i^{t_i - \tau_{ji}}(x), and the selection module enforces the staleness criterion during ensemble construction (Mueller et al., 2024).
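The staleness rule amounts to a simple filter over the model bench before selection runs. The dictionary fields and hard-drop handling below are illustrative assumptions (the paper also allows downweighting rather than removal):

```python
# Hedged sketch: each bench entry records the sender's clock value at
# transmission; entries whose lag exceeds the threshold T are dropped.

def fresh_models(bench, local_clock, T):
    """Keep peer entries whose staleness tau = local_clock - sent_at <= T."""
    return [entry for entry in bench if local_clock - entry["sent_at"] <= T]

bench = [{"peer": 1, "sent_at": 9},   # tau = 1 -> fresh
         {"peer": 2, "sent_at": 4},   # tau = 6 -> stale
         {"peer": 3, "sent_at": 8}]   # tau = 2 -> fresh
kept = fresh_models(bench, local_clock=10, T=3)
print([e["peer"] for e in kept])  # [1, 3]
```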

5. Local Training Objectives and Personalized Regularization

Local models in FedPAE are primarily trained via classical empirical risk minimization, with the potential for personalized regularization. The objective function for client ii can be written as:

\mathcal{L}_i(\theta_i) = \mathbb{E}_{(x,y) \sim \mathcal{D}_i} [\ell(f_i(x; \theta_i), y)] + \lambda \left\| \theta_i - \sum_{j \in S_i} w_{ij} \theta_j \right\|^2,

where \ell is the predictive loss (e.g., cross-entropy), \lambda is a regularization factor, and the second term gently encourages parameter alignment with the weighted average of the selected peers' parameters. This supports personalization while leveraging information sharing as a regularizer (Mueller et al., 2024).
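A numpy sketch of this objective, assuming flattened parameter vectors and a precomputed predictive loss; the function and argument names are illustrative, and in practice pred_loss would be a mini-batch cross-entropy:

```python
import numpy as np

def regularized_loss(pred_loss, theta_i, peer_thetas, peer_weights, lam):
    """L_i = E[loss] + lambda * || theta_i - sum_j w_ij * theta_j ||^2."""
    # Weighted average of the selected peers' (flattened) parameters.
    anchor = sum(w * th for w, th in zip(peer_weights, peer_thetas))
    return pred_loss + lam * float(np.sum((theta_i - anchor) ** 2))

theta_i = np.array([1.0, 2.0])
peers = [np.array([0.0, 2.0]), np.array([2.0, 2.0])]
# Here theta_i equals the peer average, so the penalty vanishes.
print(regularized_loss(0.5, theta_i, peers, [0.5, 0.5], lam=0.1))  # 0.5
```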

6. Theoretical Properties and Complexity

Under standard smoothness and bounded variance assumptions, local empirical risk minimizers converge to stationary points. The peer-adaptive ensemble mechanism does not disrupt local model stability, since ensemble selection operates on pretrained models. The per-client computational and communication complexity is

\mathcal{O}(M T D + P G + p_f V),

where M is the local model count, T is the number of local update steps, D is the local dataset size, P is the NSGA-II population size, G is the number of generations, p_f is the number of non-dominated solutions on the Pareto front, and V is the number of validation samples (Mueller et al., 2024).
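As a rough numeric illustration, plugging hypothetical values (not from the paper) into the bound shows that local training typically dominates the cost:

```python
# Hypothetical per-client budget: 5 local models, 100 update steps,
# 2,500 local samples, NSGA-II population 50 over 40 generations,
# 20 Pareto-front solutions scored on 500 validation samples.
M, T, D = 5, 100, 2500
P, G = 50, 40
p_f, V = 20, 500

cost = M * T * D + P * G + p_f * V
print(cost)  # 1262000 — the M*T*D training term contributes ~99% of it
```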

7. Empirical Evaluation and Practical Implications

FedPAE was evaluated on federated CIFAR-10 and CIFAR-100 (60,000 images per dataset) with N = 20 clients, partitioned via Dirichlet(\alpha) with \alpha \in \{0.5, 0.3, 0.1\} to test statistical heterogeneity. Five model architectures (a 4-layer CNN, ResNet-18, DenseNet-121, GoogLeNet, and VGG-11) were included to induce model heterogeneity.
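Dirichlet(\alpha) label partitioning is commonly realized as below: each class's samples are split across clients with proportions drawn from a Dirichlet distribution, so smaller \alpha yields more skewed client datasets. This is a generic sketch with a toy label set; the paper's exact splitting code is not reproduced here.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients, per class, via Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Per-class client proportions; smaller alpha -> more concentrated.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(part.tolist())
    return client_idx

labels = np.repeat(np.arange(10), 600)   # toy stand-in: 6,000 labeled samples
parts = dirichlet_partition(labels, n_clients=20, alpha=0.1)
print(sum(len(p) for p in parts))  # 6000 — every sample assigned exactly once
```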

Performance comparison against both homogeneous (FedAvg, FedProx) and heterogeneous (FedKD, FML, FedGH, LG-FedAvg, FedDistill) baselines demonstrates that FedPAE achieves superior or comparable mean test accuracy, notably:

| Method  | CIFAR-10 Dir(0.1) | CIFAR-100 Dir(0.1) |
|---------|-------------------|--------------------|
| FedAvg  | 0.668 ± 0.062     | 0.332 ± 0.017      |
| FedProx | 0.667 ± 0.062     | 0.330 ± 0.018      |
| FedKD   | 0.870 ± 0.048     | 0.539 ± 0.028      |
| Local   | 0.871 ± 0.046     | 0.556 ± 0.020      |
| FedPAE  | 0.873 ± 0.047     | 0.558 ± 0.020      |

Scalability experiments (e.g., N = 50 on CIFAR-100 Dir(0.1)) maintain high test accuracy (FedPAE: 0.552 \pm 0.028 vs. 0.554 \pm 0.029 for the best baseline, FedKD). Ablation studies reveal that with increasing heterogeneity (smaller \alpha), FedPAE's ensemble construction increasingly favors local models (e.g., 72% local model selection for Dir(0.1)), demonstrating automatic avoidance of negative transfer (Mueller et al., 2024).

Key implications include natural balancing of collaboration and personalization, support for lightweight models on resource-constrained clients, and the removal of central server bottlenecks. Notable limitations are increased communication payload (due to exchange of multiple models per client) and non-trivial ensemble selection computation. Proposed future work includes peer clustering to optimize communication and dynamic, sample-wise ensemble selection (Mueller et al., 2024).
