Mutual Enhancement of Large and Small Language Models with Cross-Silo Knowledge Transfer
Abstract: While LLMs possess broad general knowledge, their performance on specific tasks is often suboptimal. This calls for fine-tuning LLMs with task-specific data, but such data may be inaccessible due to privacy concerns. In this paper, we propose a novel approach to enhance LLMs with smaller language models (SLMs) that are trained on clients using their private task-specific data. To enable mutual enhancement between the LLM and the SLMs, we propose CrossLM, in which the SLMs guide the LLM to generate high-quality task-specific data, and both the LLM and the SLMs are then enhanced with the generated data. We evaluate CrossLM with publicly available LLMs across a range of benchmark tasks. The results demonstrate that CrossLM simultaneously improves the task-specific performance of the SLMs on the clients and of the LLM on the cloud server, while preserving the LLM's generalization capability.
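To make the collaboration loop concrete, the following is a minimal, illustrative sketch in Python of the mutual-enhancement cycle the abstract describes: the cloud LLM synthesizes task-specific samples, client-side SLMs provide feedback on them without exposing private data, and both sides are updated with the highly rated samples. All names (`generate`, `feedback_score`, `finetune`), the scoring scale, and the threshold-based filtering are hypothetical stand-ins, not the paper's actual method.

```python
# Hedged sketch of one CrossLM-style round. Assumption: SLM feedback is a
# scalar quality score in [0, 1]; the real paper's feedback mechanism and
# training objectives are not reproduced here.

from dataclasses import dataclass, field
from typing import List
import random


@dataclass
class Model:
    name: str
    data: List[str] = field(default_factory=list)

    def generate(self, task: str, n: int) -> List[str]:
        # Placeholder for LLM text generation conditioned on a task prompt.
        return [f"{task}-sample-{random.randint(0, 9999)}" for _ in range(n)]

    def feedback_score(self, sample: str) -> float:
        # Placeholder for an SLM rating how well a generated sample matches
        # its private task distribution (hypothetical [0, 1] scale).
        return random.random()

    def finetune(self, samples: List[str]) -> None:
        # Placeholder for a gradient update; here we just accumulate data.
        self.data.extend(samples)


def crosslm_round(llm: Model, slms: List[Model], task: str,
                  n_samples: int = 8, threshold: float = 0.5) -> None:
    """One round: the LLM synthesizes task data, client SLMs give feedback,
    and only highly rated samples are used to update both sides."""
    candidates = llm.generate(task, n_samples)
    for slm in slms:
        # Each client keeps only samples its SLM rates above the threshold,
        # so private client data never leaves the client.
        kept = [s for s in candidates if slm.feedback_score(s) > threshold]
        slm.finetune(kept)  # enhance the client-side SLM
    # Aggregate client feedback to reinforce the LLM on accepted samples.
    accepted = [s for s in candidates
                if sum(m.feedback_score(s) for m in slms) / len(slms) > threshold]
    llm.finetune(accepted)  # enhance the cloud-side LLM


if __name__ == "__main__":
    llm = Model("cloud-llm")
    clients = [Model(f"client-slm-{i}") for i in range(3)]
    for _ in range(2):
        crosslm_round(llm, clients, task="sentiment")
    print(len(llm.data), [len(c.data) for c in clients])
```

Note the design point this sketch highlights: only generated (synthetic) samples and scalar feedback cross the client-server boundary, which is how the approach sidesteps sharing private task data.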