Exploring Gradient Subspaces: Addressing and Overcoming LoRA's Limitations in Federated Fine-Tuning of Large Language Models
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across various domains, particularly in task generalization for both text and vision data. While fine-tuning these models can significantly enhance their performance on specific downstream tasks, it often requires high-quality data that cannot be shared due to privacy concerns. Federated Learning (FL) offers a promising solution for collaborative training without direct data sharing. However, many parameter-efficient fine-tuning strategies for LLMs in FL, particularly those based on Low-Rank Adaptation (LoRA), face limitations. In this paper, we critically analyze the convergence and performance guarantees of popular FL frameworks that use LoRA, and show that LoRA is suboptimal because its low-rank matrices restrict learning to a constrained subspace. This limitation hinders effective fine-tuning of LLMs in federated settings. Through rigorous analytical and empirical evaluations, we demonstrate that direct weight averaging outperforms LoRA-based strategies, yielding better-performing fine-tuned models. Our comprehensive comparison exposes inefficiencies in LoRA approaches and underscores the advantages of direct weight aggregation. We extend our analysis to low-rank gradient-based optimizers, such as GaLore, used during local training steps. Our findings show that GaLore combined with direct weight aggregation is a more effective approach, outperforming federated LoRA methods such as FlexLoRA and FFA-LoRA across both text and image modalities. While privacy remains paramount in FL discourse, our focus is on assessing the performance of federated fine-tuned models and evaluating various FL frameworks from both theoretical and empirical perspectives. Our findings advocate reassessing the reliance on LoRA within FL contexts, paving the way for more efficient training methodologies.
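To make the contrast between LoRA-based aggregation and direct weight averaging concrete, the sketch below illustrates one commonly discussed mismatch: averaging the LoRA factors B and A separately across clients is not the same as averaging the full updates BA. This is an illustration under assumed toy dimensions, not the paper's own analysis or experimental setup; all names (`matmul`, `mean_matrices`, `num_clients`, the matrix sizes) are hypothetical.

```python
import random

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mean_matrices(mats):
    """Element-wise average of equally shaped matrices."""
    n = len(mats)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*mats)]

def max_abs_diff(X, Y):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

def random_matrix(rows, cols, rng):
    """Matrix with i.i.d. standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d, k, r, num_clients = 6, 5, 2, 3  # hypothetical toy sizes (assumption)

# Each client i holds LoRA factors B_i (d x r) and A_i (r x k);
# its weight update is the product B_i A_i.
Bs = [random_matrix(d, r, rng) for _ in range(num_clients)]
As = [random_matrix(r, k, rng) for _ in range(num_clients)]

# Direct weight aggregation: average the full d x k updates.
avg_of_products = mean_matrices([matmul(B, A) for B, A in zip(Bs, As)])

# Factor-wise aggregation: average B and A separately, then multiply.
product_of_avgs = matmul(mean_matrices(Bs), mean_matrices(As))

gap = max_abs_diff(avg_of_products, product_of_avgs)
print(f"max abs gap between the two aggregates: {gap:.4f}")
```

With heterogeneous clients the two aggregates generally differ. The factor-wise product has rank at most r, while the average of the full updates can have rank up to `num_clients * r`, which is one way to see why the aggregated low-rank factors may be confined to a smaller subspace.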