LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem

Published 29 Feb 2024 in cs.CR, cs.AI, and cs.CL (arXiv:2403.00108v2)

Abstract: Finetuning LLMs with LoRA has gained significant popularity due to its simplicity and effectiveness. Users can often find pluggable, community-shared LoRAs that enhance their base models for a specific downstream task of interest, enjoying a powerful, efficient, yet customized LLM experience with negligible investment. However, this convenient share-and-play ecosystem also introduces a new attack surface, where attackers can distribute malicious LoRAs to a community eager to try out shared assets. Despite the high-risk potential, no prior art has comprehensively explored LoRA's attack surface under the downstream-enhancing share-and-play context. In this paper, we investigate how backdoors can be injected into task-enhancing LoRAs and examine the mechanisms of such infections. We find that with a simple, efficient, yet specific recipe, a backdoor LoRA can be trained once and then seamlessly merged, in a training-free fashion, with multiple task-enhancing LoRAs, retaining both its malicious backdoor and benign downstream capabilities. This allows attackers to scale the distribution of compromised LoRAs with minimal effort by leveraging the rich pool of existing shared LoRA assets. Such merged LoRAs are particularly infectious, because their malicious intent is cleverly concealed behind improved downstream capabilities, creating a strong incentive for voluntary download; and particularly dangerous, because under local deployment no safety measures exist to intervene when things go wrong. Our work is among the first to study this new threat model of training-free distribution of downstream-capable-yet-backdoor-injected LoRAs, highlighting the urgent need for heightened security awareness in the LoRA ecosystem. Warning: This paper contains offensive content and involves a real-life tragedy.
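The training-free merging the abstract describes follows directly from standard LoRA algebra: each adapter contributes a low-rank update ΔW = BA to a frozen base weight, so two independently trained adapters can be combined by simple addition, with no gradient steps. The sketch below is a minimal illustration of that mechanism only, not the paper's specific recipe; the toy dimensions, the hypothetical "task" and "backdoor" adapter names, and the merge weights alpha_task and alpha_bd are all illustrative assumptions.

```python
# Minimal sketch of training-free LoRA merging (illustrative, not the
# paper's exact recipe). All tensors and merge weights are toy values.
import torch

d_out, d_in, r = 64, 64, 8           # toy shapes; real LoRAs adapt LLM layers
W_base = torch.randn(d_out, d_in)    # frozen base-model weight (never retrained)

# Two independently trained LoRA adapters for the same layer:
# a benign task-enhancing one and a hypothetical backdoor one.
B_task, A_task = torch.randn(d_out, r), torch.randn(r, d_in)
B_bd,   A_bd   = torch.randn(d_out, r), torch.randn(r, d_in)

alpha_task, alpha_bd = 1.0, 1.0      # merge weights: trade off downstream
                                     # utility against backdoor strength

# Training-free merge: low-rank updates are simply added to the base
# weight, so composing "task + backdoor" costs one addition, zero training.
W_merged = (W_base
            + alpha_task * (B_task @ A_task)
            + alpha_bd * (B_bd @ A_bd))

# The combined update is itself low-rank (rank <= 2r), so it can be
# re-shipped as a single LoRA by stacking the factors block-wise.
B_merged = torch.cat([alpha_task * B_task, alpha_bd * B_bd], dim=1)  # (d_out, 2r)
A_merged = torch.cat([A_task, A_bd], dim=0)                          # (2r, d_in)
assert torch.allclose(W_merged, W_base + B_merged @ A_merged, atol=1e-4)
```

In practice, shared adapters are composed with library utilities rather than raw tensor math (e.g., Hugging Face PEFT exposes adapter-composition helpers such as add_weighted_adapter), but the underlying operation is the same addition shown above. This is what makes the threat model scale: one backdoor adapter, trained once, can be folded into many existing task adapters at essentially zero cost.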

