Selective Dual-module Federated LoRA
- SDFLoRA is a federated fine-tuning paradigm that separates global transferable adaptations from local personalized modules through dual-module LoRA adapters.
- It employs selective aggregation of only global modules using weighted stacking and SVD recompression to handle rank heterogeneity and reduce communication costs by up to 50%.
- By injecting differential privacy noise solely into the global module, SDFLoRA achieves superior privacy–utility trade-offs while adapting to non-IID data across heterogeneous clients.
Selective Dual-module Federated LoRA (SDFLoRA) is a federated fine-tuning paradigm that combines parameter-efficient low-rank adaptation with explicit separation of global and local adaptation modules for robust, communication-efficient, and privacy-aware learning across heterogeneous clients. It is designed to address fundamental issues in federated learning of large pre-trained models—especially in scenarios involving non-identically distributed data, client hardware diversity, privacy constraints, and hyperparameter heterogeneity. SDFLoRA further generalizes to cases with rank-mismatched LoRA modules, enabling scalable and stable model aggregation, and integrates with advanced techniques such as selective aggregation, adaptively fused dual-modules, and private-domain expert allocation.
1. Mathematical Foundation and Dual-Module Architecture
SDFLoRA adopts the standard LoRA parameterization, where each trainable weight matrix is augmented by a low-rank update:

$$W = W_0 + \Delta W = W_0 + BA,$$

with frozen pre-trained weight $W_0 \in \mathbb{R}^{d \times k}$, $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$.
Under SDFLoRA, each client $k$ maintains a dual-module LoRA adapter for every adapted layer, with total update $\Delta W_k = B_k^{g} A_k^{g} + B_k^{l} A_k^{l}$:
- The global module $(B_k^{g}, A_k^{g})$ captures knowledge transferable across clients and is the only component aggregated globally.
- The local module $(B_k^{l}, A_k^{l})$ remains on-device, encoding client-specific or highly personalized adaptations.
This modularization is supported by sensitivity analyses, which show (empirically and theoretically) that the $A$ matrices (especially their directional components) encode global, distribution-independent knowledge, while $B$ is more sensitive to local data statistics and thus suited for personalization (Guo et al., 2024, Zhao et al., 13 Oct 2025, Qi et al., 2024).
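As a concrete sketch of the dual-module architecture, the forward pass below adds both low-rank updates to a frozen base weight. All names, shapes, and the plain-NumPy stand-in for a real linear layer are illustrative assumptions, not a reference implementation:

```python
import numpy as np

class DualModuleLoRALayer:
    """Minimal dual-module LoRA layer: frozen base weight W0 plus a
    global low-rank update (B_g @ A_g) and a local one (B_l @ A_l)."""

    def __init__(self, d_out, d_in, r_global, r_local, seed=0):
        rng = np.random.default_rng(seed)
        self.W0 = rng.normal(size=(d_out, d_in))            # frozen pre-trained weight
        # Global module: the only part aggregated across clients.
        self.A_g = rng.normal(scale=0.01, size=(r_global, d_in))
        self.B_g = np.zeros((d_out, r_global))              # zero-init so delta starts at 0
        # Local module: never leaves the device.
        self.A_l = rng.normal(scale=0.01, size=(r_local, d_in))
        self.B_l = np.zeros((d_out, r_local))

    def forward(self, x):
        # y = (W0 + B_g A_g + B_l A_l) x
        delta = self.B_g @ self.A_g + self.B_l @ self.A_l
        return (self.W0 + delta) @ x

layer = DualModuleLoRALayer(d_out=8, d_in=16, r_global=4, r_local=2)
x = np.ones(16)
y = layer.forward(x)   # equals W0 @ x at init, since both B matrices start at zero
```

Zero-initializing the $B$ matrices (standard LoRA practice) keeps the adapted model identical to the pre-trained one at the start of training.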
2. Selective Aggregation and Alignment under Heterogeneity
Classic federated approaches aggregate updates naïvely, often necessitating homogeneous adapter ranks. SDFLoRA addresses this by selectively aggregating only the global modules, thereby:
- Handling rank heterogeneity: Each client $k$ may use a different global rank $r_k$ for its module. At the server, aggregation is performed by stacking the received $(B_k^{g}, A_k^{g})$ across clients, weighted by client-specific terms (e.g., sample sizes). Aggregated update:

$$\Delta W^{g} = \sum_{k=1}^{K} w_k\, B_k^{g} A_k^{g},$$

where $w_k$ are aggregation weights.
- Recompression: To prevent the rank of the global adapter from growing indefinitely, the concatenated matrices are recompressed to a maximum acceptable rank $r_{\max}$ via SVD:

$$\Delta W^{g} \approx U_{r_{\max}} \Sigma_{r_{\max}} V_{r_{\max}}^{\top}, \qquad B^{g} \leftarrow U_{r_{\max}} \Sigma_{r_{\max}}^{1/2}, \quad A^{g} \leftarrow \Sigma_{r_{\max}}^{1/2} V_{r_{\max}}^{\top}.$$
This supports scalability and avoids the over-alignment of personalized subspaces (Shen et al., 16 Jan 2026).
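The weighted stacking and SVD recompression can be sketched server-side as follows; the function name and the rank-truncation tolerance are hypothetical choices, not the paper's code:

```python
import numpy as np

def aggregate_global_modules(pairs, weights, r_max):
    """Selective aggregation sketch.
    pairs  : list of (B_k, A_k) global modules, possibly with different ranks r_k
    weights: per-client aggregation weights w_k (e.g., sample-size proportions)
    r_max  : maximum rank kept after SVD recompression
    Returns recompressed (B_new, A_new) with rank <= r_max."""
    # Weighted stacking: sum_k w_k B_k A_k equals Bs @ As with
    # Bs = [w_1 B_1 | ... | w_K B_K] and As = [A_1; ...; A_K].
    Bs = np.hstack([w * B for (B, _), w in zip(pairs, weights)])
    As = np.vstack([A for (_, A) in pairs])
    delta = Bs @ As
    # SVD recompression caps the rank of the new global module.
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    r = min(r_max, int(np.sum(s > 1e-10)))   # drop numerically zero directions
    sqrt_s = np.sqrt(s[:r])
    B_new = U[:, :r] * sqrt_s                # shape (d_out, r)
    A_new = sqrt_s[:, None] * Vt[:r]         # shape (r, d_in)
    return B_new, A_new
```

When $r_{\max}$ is at least the sum of the client ranks, the recompressed product reproduces the weighted average exactly; below that, the SVD keeps the dominant shared directions.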
3. Privacy-Aware Optimization
SDFLoRA supports privacy constraints by selectively injecting differential privacy (DP) noise only into updates for the global module during a local DP-SGD step:

$$\tilde{g}^{g} = \frac{1}{|\mathcal{B}|}\left(\sum_{i \in \mathcal{B}} \mathrm{clip}\!\left(g_i^{g}, C\right) + \mathcal{N}\!\left(0, \sigma^2 C^2 I\right)\right),$$

while all local-module updates remain noise-free. This mechanism yields superior privacy–utility trade-offs (maintaining 2–4 percentage points higher accuracy under strong DP regimes, i.e., small privacy budgets $\epsilon$, compared to full-noise LoRA) by protecting global knowledge while maximizing the utility of local adaptations (Shen et al., 16 Jan 2026).
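A minimal sketch of the selective DP update, assuming per-update clipping and the Gaussian mechanism; the parameter names and the default `clip`/`sigma` values are placeholders, not the paper's settings:

```python
import numpy as np

def dp_sgd_step(params_g, grads_g, params_l, grads_l,
                lr=0.1, clip=1.0, sigma=0.5, rng=None):
    """One local step of the selective-DP scheme: global-module gradients
    are clipped and perturbed with Gaussian noise; local-module gradients
    are applied as-is. params/grads are dicts of name -> ndarray."""
    rng = rng or np.random.default_rng(0)
    new_g = {}
    for name, g in grads_g.items():
        norm = np.linalg.norm(g)
        g = g * min(1.0, clip / (norm + 1e-12))               # gradient clipping
        g = g + rng.normal(scale=sigma * clip, size=g.shape)  # Gaussian mechanism
        new_g[name] = params_g[name] - lr * g
    new_l = {name: params_l[name] - lr * grads_l[name]        # noise-free local update
             for name in params_l}
    return new_g, new_l
```

Because only the (shared) global module ever leaves the device, the noise budget is spent exactly where disclosure occurs, which is the source of the favorable privacy–utility trade-off.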
4. Extensions, Adaptive Fusion, and Gating
SDFLoRA has been instantiated and extended in several federated and hybrid configurations:
- FedSA-LoRA (Selective Share-A LoRA): Aggregates only the $A$ matrices across clients; each client keeps its $B$ matrices private. This leads to robust improvements under data heterogeneity (e.g., on GLUE, +1.05pp over traditional LoRA; communication reduced by 50%) (Guo et al., 2024).
- FDLoRA (Dual-module, Personalized + Global): Each client maintains both individually personalized and globally aggregated LoRA modules; post-training, an adaptive fusion is performed—combining the dual modules via client-learned weights minimizing a few-shot loss with regularization (Qi et al., 2024). Fusion can be data-driven and asynchronous, enabling selective preference for global or local adaptation at inference.
- Expert Clustering and Routing: SDFLoRA can leverage representation-based clustering (e.g., on $A$ matrices) to allocate a shared "domain" global module and a purely personal one. An adaptive two-way gating network (Editor's term) optimally combines the outputs, balancing domain and individualized knowledge (Wang et al., 18 Sep 2025).
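FDLoRA-style adaptive fusion can be illustrated with a single scalar fusion weight chosen by grid search over a few-shot squared loss plus a regularizer; the actual method learns richer, client-specific weights, and all names and defaults here are hypothetical:

```python
import numpy as np

def fuse_dual_modules(delta_g, delta_l, X, Y, W0,
                      lams=np.linspace(0, 1, 21), reg=0.01):
    """Pick a fusion weight lam for W = W0 + lam*delta_g + (1-lam)*delta_l
    by minimizing a few-shot squared loss on (X, Y), with a small L2
    penalty pulling lam toward an even 0.5 split."""
    best_lam, best_loss = None, np.inf
    for lam in lams:
        W = W0 + lam * delta_g + (1 - lam) * delta_l
        loss = np.mean((X @ W.T - Y) ** 2) + reg * (lam - 0.5) ** 2
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    return best_lam
```

Because the fusion weight is fit on each client's own few-shot data, a client whose distribution matches the federation leans toward the global module, while an outlier client leans toward its personalized one.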
5. Empirical Performance and Communication–Sustainability Analysis
SDFLoRA achieves consistent empirical gains across natural language understanding (GLUE, QA, log anomaly) and generation (GSM8K, FLAN, Dolly-15k) tasks:
- On GLUE (non-IID, Dirichlet-partitioned labels): FedSA-LoRA (SDFLoRA) reaches 90.48% vs. 89.43% for classic LoRA (Guo et al., 2024); on GSM8K: 46.63% vs. 46.24%.
- Communication cost drops by ~50% compared to federated LoRA (by sending only the $A$ matrices or the global module per round, not the $B$ matrices or local modules) (Guo et al., 2024, Shen et al., 16 Jan 2026).
- Under heterogeneous client ranks, SDFLoRA outperforms zero-padding and full stacking schemes by 1–7pp on major GLUE tasks. Key results:
| Task | SDFLoRA (Stack-g) | Zero-pad+Avg |
|---|---|---|
| QNLI | 96.96 | 95.81 |
| RTE | 99.71 | 95.45 |
| QQP | 87.97 | 86.72 |
| MNLI-m | 72.19 | 65.48 |
| SST-2 | 82.30 | 81.10 |
- Energy/sustainability: On federated vision-language models using LoRA, SDFLoRA-style approaches reach accuracy above 90% with lower communication and energy (e.g., ~24 g CO₂ for LoRA vs. ~10 g for CNN3D under personalized dual-module gating) (Thuau et al., 20 Oct 2025).
6. Algorithmic Implementation
A generic SDFLoRA round comprises:
- Broadcast: Server provides the latest global-module parameters to all participating clients.
- Local update:
- Clients initialize adapters with the received global module and their persistent local module.
- For several local steps, clients update the global module with DP noise injected into its gradients (if required); local-module updates remain noise-free.
- Upload: Each client uploads the updated global module only.
- Aggregation: Server constructs the next global module via weighted stacking and SVD compression.
- Broadcast: The compressed module is sent out for the next round; local modules are retained privately at the client.
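The round above can be sketched end-to-end as follows; this is a simplified orchestration, and the `Client.local_update` interface is a hypothetical stand-in for the DP-aware local training step:

```python
import numpy as np

def sdflora_round(server_BA, clients, weights, r_max):
    """One federated round. server_BA is the current global (B, A);
    each client keeps its local module privately and uploads only
    its updated global module."""
    updated = []
    for client in clients:
        B_g, A_g = server_BA                       # 1. broadcast global module
        B_g, A_g = client.local_update(B_g, A_g)   # 2. local (DP) update on-device
        updated.append((B_g, A_g))                 # 3. upload global module only
    # 4. weighted stacking + SVD recompression (as in Section 2)
    Bs = np.hstack([w * B for (B, _), w in zip(updated, weights)])
    As = np.vstack([A for (_, A) in updated])
    U, s, Vt = np.linalg.svd(Bs @ As, full_matrices=False)
    r = min(r_max, int(np.sum(s > 1e-10)))
    sqrt_s = np.sqrt(s[:r])
    return U[:, :r] * sqrt_s, sqrt_s[:, None] * Vt[:r]   # 5. next round's global (B, A)
```

Note that the server never sees local modules or raw data; only the (optionally DP-noised) global factors cross the network each round.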
The dual-module pattern is adapted for expert allocation by clustering clients (e.g., via $A$-matrix cosine similarities), assigning each to a domain-global expert, and learning a client-specific personal adaptation (Wang et al., 18 Sep 2025).
7. Applications, Limitations, and Outlook
SDFLoRA extends to:
- Transformer-based NLP and VLMs, hybrid vision-language systems, and personalized multimodal pipelines (Thuau et al., 20 Oct 2025).
- Scenarios with stringent cross-client privacy (DP), heterogeneous computing environments (rank-mismatched LoRA), and highly non-IID data.
Limitations:
- Maintaining separate local modules increases per-client storage and on-device computation.
- Selection of module split, rank allocation, and compression threshold requires problem-dependent tuning.
- Extensions to more advanced adapter architectures and adaptive domain/local capacity allocation remain areas for future work (Shen et al., 16 Jan 2026).
The SDFLoRA framework thus provides a principled, empirically validated approach for scalable and robust federated adaptation of large foundation models, balancing generalization, personalization, and practical system constraints (Shen et al., 16 Jan 2026, Guo et al., 2024, Qi et al., 2024, Zhao et al., 13 Oct 2025, Wang et al., 18 Sep 2025).