The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Abstract: Recent research, such as BitNet, is paving the way for a new era of 1-bit LLMs. In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
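Each ternary weight carries log2(3) ≈ 1.585 bits of information, which is where the "1.58-bit" name comes from. Below is a minimal sketch of the absmean quantization function described in the paper, which scales a weight matrix by its mean absolute value and then rounds each entry into {-1, 0, +1}; the function name, epsilon value, and NumPy usage here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    """Absmean quantization sketch (per the BitNet b1.58 paper):
    scale W by its mean absolute value gamma, then round and clip
    every entry to the ternary set {-1, 0, +1}."""
    gamma = np.abs(w).mean()                          # absmean scale
    w_ternary = np.clip(np.rint(w / (gamma + eps)), -1, 1)
    return w_ternary.astype(np.int8), gamma

# Each ternary weight encodes log2(3) ~ 1.585 bits, hence "b1.58".
w = np.random.randn(4, 8).astype(np.float32)
w_q, gamma = absmean_ternary_quantize(w)
assert set(np.unique(w_q)) <= {-1, 0, 1}
print(w_q, gamma, np.log2(3))
```

Because every quantized weight is -1, 0, or +1, matrix multiplication against activations reduces to additions and subtractions with no multiplications, which is the source of the latency and energy savings the abstract claims.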
References:
- PIQA: reasoning about physical commonsense in natural language. CoRR, abs/1911.11641, 2019.
- QuIP: 2-bit quantization of large language models with guarantees. CoRR, abs/2307.13304, 2023.
- BoolQ: Exploring the surprising difficulty of natural yes/no questions. CoRR, abs/1905.10044, 2019.
- Together Computer. RedPajama: an open dataset for training large language models, 2023.
- OPTQ: accurate quantization for generative pre-trained transformers. In The Eleventh International Conference on Learning Representations, 2023.
- GPipe: Efficient training of giant neural networks using pipeline parallelism. In Advances in Neural Information Processing Systems, pages 103–112, 2019.
- Mark Horowitz. 1.1 Computing’s energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference, ISSCC 2014, Digest of Technical Papers, San Francisco, CA, USA, February 9-13, 2014, pages 10–14, 2014.
- Efficient memory management for large language model serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023.
- AWQ: activation-aware weight quantization for LLM compression and acceleration. CoRR, abs/2306.00978, 2023.
- Can a suit of armor conduct electricity? A new dataset for open book question answering. CoRR, abs/1809.02789, 2018.
- Pointer sentinel mixture models. CoRR, abs/1609.07843, 2016.
- The LAMBADA dataset: Word prediction requiring a broad discourse context. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers. The Association for Computer Linguistics, 2016.
- Exploring the limits of transfer learning with a unified text-to-text transformer. CoRR, abs/1910.10683, 2019.
- RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024.
- WinoGrande: an adversarial winograd schema challenge at scale. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, pages 8732–8740, 2020.
- Noam Shazeer. GLU variants improve transformer. CoRR, abs/2002.05202, 2020.
- StableLM 3B 4E1T, 2023.
- QuIP#: Even better LLM quantization with Hadamard incoherence and lattice codebooks. CoRR, abs/2402.04396, 2024.
- LLaMA: open and efficient foundation language models. CoRR, abs/2302.13971, 2023.
- Llama 2: open foundation and fine-tuned chat models. CoRR, abs/2307.09288, 2023.
- Crowdsourcing multiple choice science questions. In Leon Derczynski, Wei Xu, Alan Ritter, and Tim Baldwin, editors, Proceedings of the 3rd Workshop on Noisy User-generated Text, NUT@EMNLP 2017, Copenhagen, Denmark, September 7, 2017, pages 94–106. Association for Computational Linguistics, 2017.
- Ladder: Efficient tensor compilation on customized data format. In OSDI, 2023.
- BitNet: Scaling 1-bit transformers for large language models. CoRR, abs/2310.11453, 2023.
- SmoothQuant: accurate and efficient post-training quantization for large language models. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, 2023.
- Quick and (not so) dirty: Unsupervised selection of justification sentences for multi-hop question answering. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors, EMNLP-IJCNLP, 2019.
- HellaSwag: can a machine really finish your sentence? In Proceedings of the 57th Conference of the Association for Computational Linguistics, pages 4791–4800, 2019.
- Root mean square layer normalization. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems, pages 12360–12371, 2019.
- PokeBNN: A binary pursuit of lightweight accuracy. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12465–12475. IEEE, 2022.