LoRA Meets Dropout under a Unified Framework

Published 25 Feb 2024 in cs.CL and cs.AI | arXiv:2403.00812v2

Abstract: With their remarkable capabilities, LLMs have emerged as essential elements in numerous NLP applications, while parameter-efficient finetuning, especially LoRA, has gained popularity as a lightweight approach to model customization. Meanwhile, various dropout methods, initially designed for full finetuning with all parameters updated, alleviate the overfitting associated with excessive parameter redundancy. Hence, a possible contradiction arises between the negligible trainable parameters of LoRA and the effectiveness of previous dropout methods, which has been largely overlooked. To fill this gap, we first confirm that parameter-efficient LoRA is also overfitting-prone. We then revisit transformer-specific dropout methods and establish their equivalence and distinctions mathematically and empirically. Building upon this comparative analysis, we introduce a unified framework for a comprehensive investigation, which instantiates these methods along three dimensions: dropping position, structural pattern, and compensation measure. Through this framework, we reveal their new preferences and performance comparisons when only limited trainable parameters are involved. The framework also allows us to amalgamate the most favorable aspects into a novel dropout method named HiddenKey. Extensive experiments verify the remarkable superiority and sufficiency of HiddenKey across multiple models and tasks, highlighting it as the preferred approach for high-performance and parameter-efficient finetuning of LLMs.
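
For readers unfamiliar with the setup the abstract describes, the following is a minimal PyTorch sketch of a LoRA-augmented linear layer with dropout applied to the low-rank path. It only illustrates how few trainable parameters LoRA adds and one possible "dropping position" for dropout to act on; it is not the paper's HiddenKey method, and all names (LoRALinear, rank, alpha, lora_dropout) are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch (assumption): a LoRA-augmented linear layer with
# dropout on the low-rank path. NOT the paper's HiddenKey method; it only
# shows where dropout and LoRA's few trainable parameters can interact.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16, lora_dropout=0.1):
        super().__init__()
        # Frozen pretrained weight (full finetuning would update this).
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        # Trainable low-rank factors: the only parameters LoRA updates.
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        nn.init.normal_(self.lora_A, std=0.02)
        self.scaling = alpha / rank
        # Dropout on the input of the low-rank path: one possible dropping position.
        self.dropout = nn.Dropout(lora_dropout)

    def forward(self, x):
        frozen = x @ self.weight.T
        low_rank = self.dropout(x) @ self.lora_A.T @ self.lora_B.T
        return frozen + self.scaling * low_rank


# Usage sketch: only the low-rank factors receive gradients.
layer = LoRALinear(768, 768)
out = layer(torch.randn(2, 16, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # trainable parameter count
```

In this sketch the trainable parameter count (rank x (in_features + out_features)) is orders of magnitude smaller than the frozen weight, which is the tension the abstract raises: dropout methods were designed to combat redundancy among the full parameter set, yet LoRA trains only this small low-rank residual.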
