
When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization

Published 8 Nov 2024 in cs.LG and cs.CL | arXiv:2411.05882v1

Abstract: Contemporary machine learning models, such as LLMs, are powerful, but come with immense resource requirements both at training and inference time. It has been shown that decoder-only LLMs can be trained to a competitive state with ternary weights (1.58 bits per weight), facilitating efficient inference. Here, we start our exploration with non-transformer model architectures, investigating 1.58-bit training for multi-layer perceptrons and graph neural networks. Then, we explore 1.58-bit training in other transformer-based LLMs, namely encoder-only and encoder-decoder models. Our results show that in all of these settings, 1.58-bit training is on par with or sometimes even better than the standard 32/16-bit models.
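The "1.58 bits per weight" figure comes from restricting each weight to the ternary set {-1, 0, +1}, since log2(3) ≈ 1.58. The abstract does not spell out the quantization rule, but the BitNet b1.58 line of work this paper builds on uses absmean quantization: scale the weight matrix by its mean absolute value, then round and clip to ternary. A minimal NumPy sketch, assuming that scheme (function name and epsilon are illustrative, not from the paper):

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to ternary values {-1, 0, +1}.

    Sketch of the absmean scheme used in BitNet b1.58:
    divide by the mean absolute weight, then round and clip.
    """
    gamma = np.mean(np.abs(w)) + eps            # absmean scale (eps avoids /0)
    w_q = np.clip(np.round(w / gamma), -1, 1)   # ternary weight matrix
    return w_q.astype(np.int8), gamma           # dequantize as w_q * gamma

w = np.array([[0.4, -1.2], [0.05, 0.8]])
q, gamma = absmean_ternary_quantize(w)
# q holds only values from {-1, 0, +1}; gamma is the per-matrix scale
```

At inference time, ternary weights let matrix multiplication be replaced by additions and subtractions (the zero entries are skipped entirely), which is the efficiency gain the abstract refers to. During training, such schemes typically keep full-precision latent weights and quantize on the forward pass with a straight-through estimator for gradients.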
