LoQT: Low-Rank Adapters for Quantized Pretraining

Published 26 May 2024 in cs.LG and cs.CL (arXiv:2405.16528v4)

Abstract: Despite advances using low-rank adapters and quantization, pretraining of large models on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose Low-Rank Adapters for Quantized Training (LoQT), a method for efficiently training quantized models. LoQT uses gradient-based tensor factorization to initialize low-rank trainable weight matrices that are periodically merged into quantized full-rank weight matrices. Our approach is suitable for both pretraining and fine-tuning models. We demonstrate this for language modeling and downstream task adaptation, finding that LoQT enables efficient training of models up to 7B parameters on a 24GB GPU. We also demonstrate the feasibility of training a 13B model using per-layer gradient updates on the same hardware.
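The abstract describes the core loop: initialize low-rank trainable factors from a gradient-based factorization, train only those factors, and periodically merge them into the quantized full-rank weights. Below is a minimal NumPy sketch of that idea. The symmetric uniform quantizer, the rank `r`, and the function names are illustrative assumptions for exposition, not the paper's exact quantization scheme or update rule.

```python
import numpy as np

def quantize(w, bits=4):
    """Symmetric uniform quantization (an illustrative stand-in for the
    paper's quantized weight representation)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax + 1e-12
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

def init_low_rank_from_grad(grad, r):
    """Gradient-based initialization of the low-rank factors: take the top-r
    left singular vectors of the gradient as a frozen projection P, and start
    the trainable factor B at zero so the merged weights are unchanged."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :r]                        # frozen projection factor
    B = np.zeros((r, grad.shape[1]))    # trainable low-rank factor
    return P, B

def merge_and_reinit(W_q, P, B, grad, r, bits=4):
    """Periodic merge step: fold the low-rank update P @ B into the quantized
    full-rank weights, then re-initialize the adapter from a fresh gradient."""
    W_q = quantize(W_q + P @ B, bits)
    P, B = init_low_rank_from_grad(grad, r)
    return W_q, P, B
```

Between merges, only `B` (and optionally `P`) receives optimizer state, which is what keeps memory low enough for a 24GB GPU; the full-rank matrix stays quantized and frozen.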
