
Federated LLMs

Updated 8 February 2026
  • Federated LLMs are decentralized neural networks trained across multiple clients, enabling privacy-preserving processing in regulated domains like healthcare and finance.
  • They utilize parameter-efficient techniques such as LoRA, prompt tuning, and adapters to significantly reduce communication overhead while maintaining performance.
  • Adaptive aggregation and personalization strategies effectively tackle data and system heterogeneity, ensuring robust performance on non-IID decentralized datasets.

Federated LLMs are large-scale neural networks for natural language processing collaboratively trained or adapted across multiple decentralized clients using federated learning (FL) protocols. These models enable privacy-preserving customization and fine-tuning on sensitive, distributed data without centralized aggregation of raw samples. Federated LLMs are central to applications in healthcare, law, finance, and other regulated domains, where confidentiality, system heterogeneity, and communication efficiency are critical. The recent landscape is defined by the adoption of parameter-efficient fine-tuning (PEFT, e.g., LoRA), communication reduction strategies, heterogeneity-aware architectures, and evaluation paradigms specifically suited to federated environments (Yao et al., 2024, Qi et al., 2024, Bai et al., 2024, Yue et al., 2023, Fan et al., 2024, Qin et al., 2024, Chen et al., 2023).

1. Problem Setting and Core Challenges

A federated LLM system consists of a central server (coordinator) and a set of clients (institutions, users, or edge devices), each holding private data sampled from heterogeneous, often non-IID distributions. The objective is to optimize a global or personalized model:

$$F(\theta)=\sum_{i=1}^K p_i F_i(\theta), \qquad F_i(\theta) = \mathbb{E}_{x\sim D^i}[\ell(\theta;x)]$$

where $\theta$ denotes all model parameters, $D^i$ is client $i$'s private data, and $p_i$ are aggregation weights (often $n_i/\sum_j n_j$).
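At the server, this objective corresponds to FedAvg-style weighted averaging of client parameters. A minimal NumPy sketch (the client values and data-set sizes are illustrative, not from any of the cited systems):

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Aggregate client parameter tensors with weights p_i = n_i / sum_j n_j."""
    total = sum(client_sizes)
    weights = [n / total for n in client_sizes]
    return sum(w * p for w, p in zip(weights, client_params))

# Three clients with different data volumes holding same-shaped parameters.
params = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 20, 70]
theta = fedavg(params, sizes)  # weighted toward the largest client -> [4.2, 5.2]
```

The weighting means a client holding 70% of the federation's data contributes 70% of each aggregated parameter, which is exactly what makes naive FedAvg sensitive to non-IID skew across clients.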

Three intertwined challenges define the federated LLM regime (Qi et al., 2024, Yao et al., 2024, Chen et al., 2023, Wu et al., 15 Mar 2025):

  • Model Size: LLMs typically contain $10^8$–$10^{11}$ parameters, making naive full-weight aggregation (FedAvg) prohibitive in both computation and communication.
  • Data Heterogeneity: Private data is non-IID across clients, leading to "client-drift", slow convergence, and degraded global performance if not carefully addressed.
  • System Heterogeneity: Clients have disparate compute, memory, and network capabilities and can join or leave the federation asynchronously.

These constraints necessitate parameter-efficient, communication-minimizing, and heterogeneity-aware training protocols.

2. Parameter-Efficient and Communication-Optimized Fine-Tuning

To mitigate the prohibitive memory and bandwidth demands of full-model synchronization, federated LLM solutions overwhelmingly employ PEFT methods that restrict updates to small subspaces of the architecture. The dominant approaches include (Qi et al., 2024, Yao et al., 2024, Yue et al., 2023, Jiang et al., 2023, Fan et al., 2024, Fan et al., 2023):

  • Low-Rank Adaptation (LoRA): For each learnable weight $W_0 \in \mathbb{R}^{d\times k}$, the update is

$$W = W_0 + BA, \qquad B\in\mathbb{R}^{d\times r},\; A\in\mathbb{R}^{r\times k},\; r\ll \min(d,k)$$

Only $A$ and $B$ are trained and aggregated, amounting to 0.1–1% of the original parameter count per client per round (e.g., LoRA with $r=4$ incurs roughly 0.05%–0.3% of full-model communication cost per round on 7B-parameter models).

  • Prompt Tuning: Train small continuous prefix embeddings (soft prompts) $P\in \mathbb{R}^{l_p\times d}$ prepended to the token embedding sequence. These are exchanged as small tensors (on the order of 0.01% of the model).
  • Adapters: Insert lightweight MLP bottlenecks (dimension $r$) within transformer blocks, updating only the adapter weights ($O(rd)$ parameters per layer).

Empirical studies show that LoRA-based FL achieves 50–200× communication reduction relative to full-model fine-tuning, with negligible or manageable accuracy drops, especially on non-IID splits (Qi et al., 2024, Wu et al., 15 Mar 2025, Fan et al., 2023). Prompt-based FL reduces communication even further, at the cost of somewhat larger accuracy drops.
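The low-rank update $W = W_0 + BA$ and its communication savings can be sketched in a few lines of NumPy. The dimensions and initialization below are illustrative (a common convention initializes $B$ to zero so training starts from the pretrained $W_0$), not tied to any specific model:

```python
import numpy as np

d, k, r = 768, 768, 4  # illustrative transformer dims and LoRA rank
rng = np.random.default_rng(0)

W0 = rng.standard_normal((d, k))        # frozen pretrained weight
B = np.zeros((d, r))                    # LoRA factor B starts at zero,
A = rng.standard_normal((r, k)) * 0.01  # A small-random, so W == W0 initially

def lora_forward(x):
    # Effective weight W = W0 + B @ A, applied without materializing W.
    return x @ W0.T + (x @ A.T) @ B.T

# Only A and B are trained and communicated per round.
full_params = d * k
lora_params = r * (d + k)
fraction = lora_params / full_params  # ~1% of this weight matrix
```

Because $B$ is zero at initialization, `lora_forward` initially reproduces the frozen layer exactly; each round, clients ship only the $r(d+k)$ adapter entries instead of the $dk$ full matrix.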

3. Personalization, Heterogeneity, and Adaptive Aggregation

Federated LLMs must address both data and system heterogeneity. Several complementary approaches have emerged (Zhang et al., 2024, Qi et al., 2024, Yao et al., 2024, Fan et al., 2024, Yue et al., 2023):

  • Dual/Hierarchical Adapter Architectures: FDLoRA (Qi et al., 2024) equips each client with dual adapters — a personalized adapter for client-specific data and a global adapter for collaborative knowledge — with only the global branch communicated. AdaFusion adaptively fuses the two branches at inference, yielding a strong personalization–collaboration trade-off.
  • Mixture-of-Experts (MoE) for Personalization: FedAMoLE (Zhang et al., 2024) dynamically assigns a heterogeneous pool of LoRA experts to each client via a reverse selection strategy (RSEA), allowing the number and type of adapters per client to reflect data complexity and domain drift. This data-driven adaptation yields 1–5% absolute gains in heterogeneous benchmarks and enables strong scalability with modestly increased communication (e.g., 12.5MB per round for 30 experts).
  • Co-Tuning across Heterogeneous Model Sizes: FedCoLLM (Fan et al., 2024) supports bidirectional knowledge transfer between a central LLM and downstream client SLMs. LoRA adapters mediate updates so the LLM is enriched with federated domain knowledge while SLMs are enhanced via knowledge distillation on a public auxiliary set. Communication remains at 0.2–0.3% of full-model size.
  • Split Federated Learning: Frameworks such as SflLLM (Zhao et al., 20 Apr 2025) partition the model so that the shallow, input-side layers run on the client and the remainder on the server. Only LoRA adapter updates from the client-side layers are federated, minimizing client FLOPs and training latency while preserving privacy (raw data never leaves the device).
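The dual-adapter idea can be pictured as a combination of personalized and global LoRA deltas at inference time. The scalar fusion weight `alpha` below is a hypothetical simplification for illustration — FDLoRA's AdaFusion learns the fusion adaptively rather than using a fixed scalar:

```python
import numpy as np

def fuse_adapters(delta_personal, delta_global, alpha):
    """Convexly combine personalized and global adapter deltas for inference."""
    return alpha * delta_personal + (1.0 - alpha) * delta_global

dp = np.array([[0.2, 0.0], [0.0, 0.2]])   # client-specific adapter delta
dg = np.array([[0.1, 0.1], [0.1, 0.1]])   # collaboratively trained delta
fused = fuse_adapters(dp, dg, alpha=0.5)  # balanced personalization/collaboration
```

Setting `alpha` near 1 favors client-specific behavior; near 0, the collaboratively learned knowledge dominates — the trade-off the dual-adapter designs above automate.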

4. Privacy, Security, and Differential Privacy Mechanisms

Federated LLM workflows rigorously maintain user data privacy and can optionally enforce stricter differential privacy (DP) (Yao et al., 2024, Chen et al., 2023, Fan et al., 2023, Wu et al., 2024, Jiang et al., 2024):

  • Data Locality: All raw data remains strictly on device; only PEFT parameter updates, prompts, or aggregated statistics are shared.
  • Secure Aggregation: Secret-sharing-based protocols (Bonawitz et al., 2017) ensure the server learns only the sum of parameter updates across clients, never an individual client's update (Fan et al., 2023, Chen et al., 2023).
  • Differentially Private Noise Injection: Gaussian noise is added to local updates (e.g., LoRA parameters or gradients) before aggregation, controlling the global privacy budget $(\varepsilon, \delta)$; careful noise calibration is required to avoid significant utility loss in high-dimensional LLM settings.
  • Black-Box Prompt-Based FL: LanFL (Wu et al., 2024) introduces an entirely prompt-based FL protocol where clients with only black-box API access to the LLM exchange differentially private synthetic examples instead of weights or activations; this enables FL settings where model weights are not accessible.

Notably, privacy costs associated with PEFT updates are lower than for full weights, but formal analysis remains a research frontier (Wu et al., 15 Mar 2025, Wu et al., 2024).
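The standard clip-and-noise step behind DP update release can be sketched as follows. The clip norm and noise multiplier are illustrative; translating a noise multiplier into a concrete $(\varepsilon, \delta)$ guarantee requires a privacy accountant, which is omitted here:

```python
import numpy as np

def dp_sanitize(update, clip_norm, noise_multiplier, rng):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(42)
local_update = np.array([3.0, 4.0])  # L2 norm = 5, will be clipped to norm 1
private = dp_sanitize(local_update, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
```

Clipping bounds any single client's influence on the aggregate (the sensitivity), which is what lets the calibrated Gaussian noise yield a DP guarantee; the utility cost grows with the dimensionality of the released update, which is why PEFT deltas are cheaper to privatize than full weights.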

5. Federated LLM Pruning and Resource Efficiency

Practical deployment in resource-constrained settings necessitates parameter reduction. FedSpaLLM (Bai et al., 2024) is the first federated framework for pruning LLMs:

  • Layer-Wise Pruning: Clients locally prune their assigned layers (using, e.g., SparseGPT) based on local calibration data and communicate only the pruned weights and binary masks.
  • $\ell_0$-Norm Aggregation: Instead of naive averaging, the server averages each weight only over the clients that retained it as non-zero (avoiding unnecessary decay of surviving weights), then applies adaptive mask expansion to meet the global sparsity target.
  • Layer Sampling: Each client processes only a fraction of the model per round, yielding linear gains in bandwidth and supporting system heterogeneity.

Experiments show 4–10× perplexity improvements versus standalone pruning at 70–80% sparsity, and near-linear communication reduction as the number of clients and sampled layers grows (Bai et al., 2024).
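The $\ell_0$-aware aggregation rule can be sketched as: average each weight only over the clients whose mask kept it, so zeros from clients that pruned an entry do not drag down the weights that survive elsewhere. This is a simplified illustration of the idea, not the full FedSpaLLM procedure (mask expansion is omitted):

```python
import numpy as np

def l0_aggregate(client_weights, client_masks):
    """Average each entry only over clients whose binary mask retained it."""
    W = np.stack(client_weights)              # (num_clients, ...) pruned weights
    M = np.stack(client_masks).astype(float)  # (num_clients, ...) binary masks
    counts = M.sum(axis=0)                    # how many clients kept each entry
    summed = (W * M).sum(axis=0)
    # Entries kept by no client stay zero (globally pruned).
    return np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)

w1 = np.array([2.0, 0.0, 4.0]); m1 = np.array([1, 0, 1])
w2 = np.array([6.0, 0.0, 0.0]); m2 = np.array([1, 0, 0])
g = l0_aggregate([w1, w2], [m1, m2])  # -> [4.0, 0.0, 4.0]
```

Note the third entry: naive FedAvg would yield $(4+0)/2 = 2$, halving a weight only because one client pruned it; the mask-aware average keeps it at 4.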

6. Evaluation Methodologies for Federated LLMs

Traditional test-set evaluation is insufficient for generative LLMs under FL due to the open-endedness of outputs and lack of reliable external judges. FedEval-LLM (He et al., 2024) introduces:

  • Personalized Federated Referee Models: Each client fine-tunes a local evaluator on bootstrapped, task-specific comparisons using only evaluation samples (not test labels).
  • Collective Majority Voting: Multiple referee models aggregate their preferences by majority vote, improving agreement with human judgments and ROUGE-L metrics.
  • Zero Leakage: No reference answers or sensitive content are shared; only question–output pairs and discrete preference votes traverse the federation.

This approach provides accurate downstream evaluation and robust privacy alignment for federated generative models.
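The collective voting step reduces to a simple majority over discrete referee preferences. The labels below are hypothetical stand-ins for referee outputs, not the actual FedEval-LLM protocol messages:

```python
from collections import Counter

def majority_vote(referee_preferences):
    """Return the candidate preferred by the most referee models."""
    return Counter(referee_preferences).most_common(1)[0][0]

# Three clients' referee models compare candidate outputs "A" vs. "B".
votes = ["A", "B", "A"]
winner = majority_vote(votes)  # -> "A"
```

Because only these discrete votes (plus question–output pairs) cross the federation boundary, no reference answers or client-held sensitive content is ever exposed.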

7. Open Research Directions

Federated LLMs remain an active research frontier with several key challenges and opportunities (Wu et al., 15 Mar 2025, Yao et al., 2024, Zhang et al., 2024):

  • Efficient Federated Pre-training: Sharded, communication-optimized protocols for full LLM pre-training on distributed private corpora.
  • Personalization under Extreme Heterogeneity: Online, data-driven assignment of LoRA/adapters, hypernetwork-based adaptive fusion, and cluster-based client aggregation.
  • Advanced Differential Privacy and IP Protections: Strong DP guarantees for high-dimensional model deltas, robust watermarking (FedIPR), and secure enclaves for model inference.
  • Communication Compression: Advanced quantization, sparsification, and zeroth-order (seed-based) updates for sub-KB per-round overhead.
  • Federated Evaluation and Benchmarking: Federated task and metric suites that reflect cross-domain, open-ended objectives, and system heterogeneity.
  • Security: Defenses against poisoning and model inversion attacks on federated LLMs.
  • Green and Edge-Friendly Deployment: Jointly optimizing accuracy, energy, and bandwidth for sustainable mass-scale FL adaptation.

Cutting-edge repositories and frameworks, such as OpenFedLLM (Ye et al., 2024), FATE-LLM (Fan et al., 2023), and recent surveys (Wu et al., 15 Mar 2025, Yao et al., 2024), provide the foundation for future research and industrial deployments in this domain.


