Evolving Knowledge Distillation with Large Language Models and Active Learning
Abstract: LLMs have demonstrated remarkable capabilities across various NLP tasks. However, their computational costs are prohibitively high. To address this issue, previous research has attempted to distill the knowledge of LLMs into smaller models by generating annotated data. Nonetheless, these works have mainly focused on the direct use of LLMs for text generation and labeling, without fully exploring their potential to comprehend the target task and acquire valuable knowledge. In this paper, we propose EvoKD: Evolving Knowledge Distillation, which leverages the concept of active learning to interactively enhance data generation with LLMs, simultaneously improving the task capability of a small domain-specific model (the student model). Unlike previous work, we actively analyze the student model's weaknesses and then synthesize labeled samples based on that analysis. In addition, we provide iterative feedback to the LLM on the student model's performance so that it continuously constructs diverse and challenging samples. Experiments and analyses on two NLP tasks, text classification and named entity recognition, show the effectiveness of EvoKD.
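The loop described in the abstract (evaluate the student, report its weaknesses to the LLM, have the LLM synthesize targeted labeled samples, retrain) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the LLM call is mocked by `mock_llm_generate`, and the student is a toy keyword classifier; all function and variable names here are hypothetical.

```python
import random

random.seed(0)

def mock_llm_generate(feedback):
    """Stand-in for an LLM that synthesizes a labeled sample targeting the
    student's reported weakness (here: drawn from a fixed pool, biased
    toward the label the student gets wrong most often)."""
    pool = {
        "pos": ["great product", "really good value", "excellent quality"],
        "neg": ["terrible service", "really bad value", "poor quality"],
    }
    label = feedback.get("weak_label", random.choice(["pos", "neg"]))
    return random.choice(pool[label]), label

class KeywordStudent:
    """Toy student model: counts label-indicative words seen in training."""
    def __init__(self):
        self.word_counts = {"pos": {}, "neg": {}}

    def train(self, text, label):
        for w in text.split():
            self.word_counts[label][w] = self.word_counts[label].get(w, 0) + 1

    def predict(self, text):
        scores = {lbl: sum(cnts.get(w, 0) for w in text.split())
                  for lbl, cnts in self.word_counts.items()}
        return max(scores, key=scores.get)

def evokd_round(student, dev_set, feedback):
    # 1. The LLM synthesizes a labeled sample from the current feedback.
    text, label = mock_llm_generate(feedback)
    # 2. The student model trains on the synthesized sample.
    student.train(text, label)
    # 3. The student is evaluated; its weakest label is fed back to the LLM.
    errors = {"pos": 0, "neg": 0}
    for t, y in dev_set:
        if student.predict(t) != y:
            errors[y] += 1
    return {"weak_label": max(errors, key=errors.get)}

student = KeywordStudent()
dev_set = [("great quality", "pos"), ("bad service", "neg"),
           ("excellent value", "pos"), ("terrible quality", "neg")]
feedback = {}
for _ in range(10):
    feedback = evokd_round(student, dev_set, feedback)

accuracy = sum(student.predict(t) == y for t, y in dev_set) / len(dev_set)
```

The key design point is that generation is conditioned on the student's current failure mode rather than sampled blindly, which is what distinguishes this setup from one-shot LLM data augmentation.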