FedPAE: Peer-Adaptive Ensemble Learning
- Peer-Adaptive Ensemble Learning (FedPAE) is a decentralized federated learning approach that builds personalized model ensembles through peer-to-peer sharing and evolutionary optimization.
- It overcomes model and statistical heterogeneity by allowing clients to independently choose architectures and asynchronously update models to form robust, diverse ensembles.
- Empirical evaluations on CIFAR datasets show FedPAE achieves competitive accuracy and scalability, dynamically favoring local models to prevent negative transfer.
Peer-Adaptive Ensemble Learning (FedPAE) is a fully decentralized federated learning (FL) paradigm designed to address the challenges of model heterogeneity, statistical heterogeneity, and asynchrony, without relying on a central server or architectural homogeneity among clients. In FedPAE, each client maintains complete autonomy regarding model architecture and updates, participates in decentralized peer communications, and constructs personalized model ensembles through multi-objective selection grounded in both predictive strength and diversity. The approach leverages a peer-to-peer model-sharing mechanism and evolutionary multi-objective optimization for ensemble selection, enabling robust personalization and scalability in environments characterized by non-IID data, heterogeneous models, and asynchronous operations (Mueller et al., 2024).
1. Decentralized Federated Learning Setting and Challenges
FedPAE operates over a network of $N$ clients, each indexed by $i$, where each client retains a private data distribution $D_i$ and may independently select its model architecture (e.g., CNN, ResNet, DenseNet). The local model is represented as $f_i(\cdot;\theta_i)$. Three principal challenges define the problem setting:
- Model heterogeneity: Clients may use distinct model families ($f_i \neq f_j$ for clients $i \neq j$), precluding classical parameter averaging.
- Statistical heterogeneity: Client data is generally non-IID, and a global model under-performs in such regimes.
- Asynchronous updates: Clients may join, leave, or update at different rates, resulting in staleness and inconsistency in model exchange.
The objective is to ensure each client can (1) share its local models with selected peers, (2) identify an optimal subset of peer models, (3) construct a personalized ensemble suited to its local data, and (4) operate without any centralized server while tolerating asynchrony (Mueller et al., 2024).
2. Algorithmic Structure of FedPAE
The FedPAE protocol unfolds in iterative (or continuous, for asynchronous scenarios) local cycles. The major steps, with their primary operations, are as follows:
- Local Training: Client $i$ trains $m$ distinct local models on its private dataset via empirical risk minimization.
- Peer Model Sharing: Each client shares its locally trained models with a designated set of peers (potentially all others), while simultaneously receiving peer models for consideration.
- Ensemble Selection (NSGA-II): The client constructs a “model bench” comprising both local and received models. It performs multi-objective ensemble optimization (via NSGA-II), targeting (a) ensemble predictive strength (mean local validation accuracy) and (b) ensemble diversity (average pairwise predictive disagreement). The solution yields a binary selection vector $s_i$ and ensemble weights $w_i$.
- Personalized Inference: The client ensemble predictor is $F_i(x) = \sum_{j \in S_i} w_j f_j(x)$, where $S_i$ is the selected subset.
- Optional Model Update: Clients may further refine local models using ensemble soft predictions as targets.
This structure allows asynchronous, peer-to-peer collaboration without architectural or procedural centralization (Mueller et al., 2024).
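The local cycle above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: models are stand-in callables, and a greedy strength-only ranking stands in for the NSGA-II search over the two objectives.

```python
# Sketch of one FedPAE local cycle. All helper names are illustrative;
# "models" here are simple callables x -> predicted label.

def local_accuracy(model, val_set):
    """Fraction of local validation points the model labels correctly."""
    return sum(model(x) == y for x, y in val_set) / len(val_set)

def ensemble_predict(models, weights, x):
    """Weighted majority vote over the selected ensemble."""
    votes = {}
    for model, w in zip(models, weights):
        label = model(x)
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)

def fedpae_cycle(local_models, peer_models, val_set, k=3):
    """One cycle: pool local + peer models into a bench, keep the k
    strongest on local validation data, weight them uniformly, and
    return the personalized ensemble predictor."""
    bench = list(local_models) + list(peer_models)
    # Greedy strength-only selection; the actual method runs NSGA-II
    # over strength AND diversity.
    bench.sort(key=lambda m: local_accuracy(m, val_set), reverse=True)
    selected = bench[:k]
    weights = [1.0 / len(selected)] * len(selected)
    return lambda x: ensemble_predict(selected, weights, x)
```

Because selection is scored on the client's own validation set, a peer model that fits the local distribution poorly simply never enters the ensemble, which is the mechanism behind the negative-transfer protection discussed below.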
3. Peer-Adaptive Ensemble Selection and Weighting
Each client employs an adaptive ensemble mechanism to maximize both predictive accuracy and ensemble diversity within its “model bench.” The process involves:
- Subset Selection: Select a subset $S_i$ of fixed size $k$ maximizing:
- Ensemble strength: Average validation accuracy on a held-out local validation set $V_i$.
- Ensemble diversity: Average pairwise disagreement on predictions.
- Optimization: The NSGA-II evolutionary algorithm is used to identify the Pareto frontier of candidate ensembles, facilitating a principled tradeoff.
- Weight Assignment: Typically $w_j = 1/|S_i|$ for uniform weighting, or weights can be normalized by validation accuracy.
- Final Ensemble Construction: $F_i(x) = \sum_{j \in S_i} w_j f_j(x)$, with $\sum_{j \in S_i} w_j = 1$.
A key property is the mechanism’s ability to default to purely local ensembles when peer contributions would degrade accuracy, thus protecting against negative transfer. The process is repeated at each communication round or opportunistically as new peer models arrive (Mueller et al., 2024).
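The two objectives and the Pareto-front selection can be made concrete as follows. For clarity this sketch enumerates all size-$k$ subsets by brute force rather than running NSGA-II; on a small model bench the two approaches find the same non-dominated set.

```python
# Two ensemble objectives plus a brute-force Pareto filter. The paper
# uses NSGA-II; exhaustive enumeration here is an illustrative stand-in.
from itertools import combinations

def strength(subset, val_set):
    """Objective (a): mean local validation accuracy over the subset."""
    accs = [sum(m(x) == y for x, y in val_set) / len(val_set) for m in subset]
    return sum(accs) / len(accs)

def diversity(subset, val_set):
    """Objective (b): average pairwise prediction disagreement."""
    pairs = list(combinations(subset, 2))
    if not pairs:
        return 0.0
    dis = [sum(a(x) != b(x) for x, _ in val_set) / len(val_set)
           for a, b in pairs]
    return sum(dis) / len(dis)

def pareto_front(bench, val_set, k):
    """All size-k subsets not dominated in (strength, diversity)."""
    cands = [(s, strength(s, val_set), diversity(s, val_set))
             for s in combinations(bench, k)]
    return [c for c in cands
            if not any(o[1] >= c[1] and o[2] >= c[2] and
                       (o[1] > c[1] or o[2] > c[2]) for o in cands)]
```

A subset of weak but mutually disagreeing models can survive on the front alongside a strong homogeneous one; the client then picks among non-dominated ensembles according to its own accuracy/diversity tradeoff.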
4. Asynchrony and Full Decentralization
FedPAE eliminates both the central parameter server and the need for synchronized rounds. Communication occurs directly among peers, typically in a gossip-style exchange. Each client maintains a local clock and transmits model updates as ready. Incoming models at client $i$ from peer $j$ are tagged with the sender’s clock; a model is treated as “stale” if its update lags the receiver’s clock beyond a chosen threshold $\tau$, in which case it is downweighted or removed from ensemble consideration. Because the selection module enforces this staleness criterion during ensemble construction, the approach remains robust even under non-uniform connectivity or computational resources (Mueller et al., 2024).
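A staleness filter of this kind is straightforward; the sketch below uses hypothetical names and logical (integer) clocks, and simply drops stale models rather than downweighting them.

```python
# Hypothetical staleness filter: peer models arrive tagged with the
# sender's logical clock; anything lagging the receiver's clock by
# more than tau is excluded from ensemble consideration.

def filter_stale(incoming, local_clock, tau):
    """Keep only peer models within the staleness threshold.
    `incoming` maps peer id -> (model, sender_clock)."""
    fresh = {}
    for peer, (model, sender_clock) in incoming.items():
        if local_clock - sender_clock <= tau:
            fresh[peer] = model
    return fresh
```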
5. Local Training Objectives and Personalized Regularization
Local models in FedPAE are primarily trained via classical empirical risk minimization, with the potential for personalized regularization. The objective function for client $i$ can be written as:

$$\mathcal{L}_i(\theta_i) = \mathbb{E}_{(x,y) \sim D_i}\big[\ell\big(f_i(x;\theta_i), y\big)\big] + \lambda \Big\|\theta_i - \sum_{j \in S_i} w_j \theta_j\Big\|^2,$$

where $\ell$ is the predictive loss (e.g., cross-entropy), $\lambda$ is a regularization factor, and the second term gently encourages parameter alignment with the weighted average of selected peers’ parameters. This encourages personalization while leveraging information sharing for regularization (Mueller et al., 2024).
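A numeric sketch of this regularized objective follows; it assumes the selected peers share a compatible parameterization (so the proximal term is meaningful), and the weights and regularization factor in the test values are illustrative.

```python
# Regularized local objective: predictive loss plus a proximal pull
# toward the weighted average of selected peers' parameters.
import numpy as np

def regularized_loss(pred_loss, theta, peer_thetas, peer_weights, lambda_):
    """pred_loss: scalar empirical risk (e.g., mean cross-entropy).
    theta: this client's flattened parameter vector.
    peer_thetas: flattened parameter vectors of selected peers.
    peer_weights: ensemble weights w_j, summing to 1."""
    theta_bar = sum(w * t for w, t in zip(peer_weights, peer_thetas))
    prox = lambda_ * np.sum((theta - theta_bar) ** 2)
    return pred_loss + prox
```

Setting `lambda_ = 0` recovers plain empirical risk minimization, which is the appropriate fallback when peers use incompatible architectures.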
6. Theoretical Properties and Complexity
Under standard smoothness and bounded variance assumptions, local empirical risk minimizers converge to stationary points. The peer-adaptive ensemble mechanism does not disrupt local model stability, since ensemble selection operates on pretrained models. The per-client computational and communication complexity is

$$\mathcal{O}\big(m\,T\,n_i + P\,G\,n_{\text{val}} + K\,n_{\text{val}}\big),$$

where $m$ is the local model count, $T$ is the number of local update steps, $n_i$ is the local dataset size, $P$ is the NSGA-II population size, $G$ is the number of generations, $K$ is the number of non-dominated solutions on the Pareto front, and $n_{\text{val}}$ is the number of validation samples (Mueller et al., 2024).
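As a back-of-envelope illustration of how these terms compose, one plausible accounting sums the training cost, the NSGA-II fitness evaluations, and the final evaluation of the Pareto-front ensembles (all parameter values below are invented for illustration):

```python
# Illustrative per-cycle cost in elementary model evaluations; the
# decomposition into three terms is an assumption, not the paper's exact
# accounting.

def cycle_cost(m, T, n_i, P, G, K, n_val):
    """m*T*n_i: local training; P*G*n_val: NSGA-II fitness evaluations;
    K*n_val: evaluating the K Pareto-front ensembles on validation data."""
    return m * T * n_i + P * G * n_val + K * n_val
```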
7. Empirical Evaluation and Practical Implications
FedPAE was evaluated on federated CIFAR-10 and CIFAR-100 (60,000 images per dataset), with client data partitioned by a Dirichlet($\alpha$) distribution (e.g., $\alpha = 0.1$) to test statistical heterogeneity. All five model architectures (4-layer CNN, ResNet-18, DenseNet-121, GoogLeNet, VGG-11) were included for model heterogeneity.
Performance comparison against both homogeneous (FedAvg, FedProx) and heterogeneous (FedKD, FML, FedGH, LG-FedAvg, FedDistill) baselines demonstrates that FedPAE achieves superior or comparable mean test accuracy, notably:
| Method | CIFAR-10 Dir(0.1) | CIFAR-100 Dir(0.1) |
|---|---|---|
| FedAvg | 0.668 ± 0.062 | 0.332 ± 0.017 |
| FedProx | 0.667 ± 0.062 | 0.330 ± 0.018 |
| FedKD | 0.870 ± 0.048 | 0.539 ± 0.028 |
| Local | 0.871 ± 0.046 | 0.556 ± 0.020 |
| FedPAE | 0.873 ± 0.047 | 0.558 ± 0.020 |
Scalability experiments (e.g., on CIFAR-100 Dir(0.1)) show FedPAE maintaining high test accuracy relative to the strongest baseline, FedKD, as the client population grows. Ablation studies reveal that with increasing heterogeneity (smaller $\alpha$), FedPAE’s ensemble construction increasingly favors local models (e.g., 72% local model selection for Dir(0.1)), demonstrating automatic negative transfer avoidance (Mueller et al., 2024).
Key implications include natural balancing of collaboration and personalization, support for lightweight models on resource-constrained clients, and the removal of central server bottlenecks. Notable limitations are increased communication payload (due to exchange of multiple models per client) and non-trivial ensemble selection computation. Proposed future work includes peer clustering to optimize communication and dynamic, sample-wise ensemble selection (Mueller et al., 2024).