1 bit is all we need: binary normalized neural networks

Published 7 Sep 2025 in cs.LG and cs.AI | (2509.07025v1)

Abstract: The increasing size of large neural network models, specifically LLMs and foundational image models, poses deployment challenges, prompting efforts to reduce memory requirements and enhance computational efficiency. These efforts are critical to ensure practical deployment and effective utilization of these models across various applications. In this work, a novel type of neural network layer and model is developed that uses only single-bit parameters. In these models, all parameters of all layers, including kernel weights and biases, take only the values zero or one. The models are built from layers called binary normalized layers. These layers can be of any type, such as fully connected, convolutional, or attention, and they are slight variations of the corresponding conventional layers. To show the effectiveness of the binary normalized layers, two different models are configured: one to solve a multiclass image classification problem and a language decoder to predict the next token of a sequence. The image classification model has convolutional and fully connected layers, and the LLM is composed of transformer blocks with multi-head attention. The results show that models with binary normalized layers achieve almost the same results as equivalent models with real 32-bit parameters. The binary normalized layers make it possible to develop models that use 32 times less memory than current models and have equivalent performance. In addition, the binary normalized layers can be easily implemented on current computers using 1-bit arrays, and do not require the development of dedicated electronic hardware. This novel type of layer opens a new era for large neural network models with reduced memory requirements that can be deployed on simple and cheap hardware, such as mobile devices or CPU-only machines.


Summary

The paper introduces binary normalized layers, which keep 32-bit parameters for gradient updates during training and use 1-bit parameters in the forward pass and at inference.
Experiments on convolutional and transformer models show that the binary networks converge quickly and reach validation accuracies close to those of equivalent 32-bit models.
The approach cuts parameter memory by a factor of 32, enabling deployment on resource-constrained devices with almost no loss in accuracy.


Introduction

The paper presents a novel class of neural network models built from binary normalized layers, in which every parameter is stored as a single bit. The approach targets the difficulty of deploying large neural networks by sharply reducing memory usage and improving computational efficiency. Relative to conventional models with 32-bit floating-point parameters, the binary models are reported to reach comparable performance at a fraction of the resource cost.

Binary Normalized Layers

Binary normalized layers use single-bit parameters throughout, kernel weights and biases included. During training each parameter is kept in two forms: a full-precision 32-bit value that receives the gradient updates, and a 1-bit value used in the forward computation. The 1-bit value is obtained by quantization: each floating-point parameter is compared against a threshold defined by the mean value and mapped to zero or one (Equation 1 in the paper). After training only the binarized parameters are needed, so inference runs entirely on 1-bit parameters with a correspondingly smaller memory footprint.
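The quantization step described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the function names `binarize` and `binary_normalized_dense` are hypothetical, the threshold is assumed to be the mean over the whole parameter tensor with a strict `>` comparison, and the per-example normalization of the layer output is an assumption suggested by the name "binary normalized layer" rather than something this summary spells out.

```python
import numpy as np


def binarize(w: np.ndarray) -> np.ndarray:
    """Map float parameters to {0, 1}: 1 where a value exceeds the tensor mean, else 0.

    Assumption: the mean is taken over the whole tensor and the comparison is strict.
    """
    return (w > w.mean()).astype(np.float32)


def binary_normalized_dense(x, w_float, b_float, eps=1e-5):
    """Forward pass of a hypothetical fully connected binary normalized layer."""
    # Only the 1-bit versions of the parameters take part in the forward computation.
    w_bin = binarize(w_float)
    b_bin = binarize(b_float)
    z = x @ w_bin + b_bin
    # Assumed normalization: zero mean and unit standard deviation per example.
    z = (z - z.mean(axis=-1, keepdims=True)) / (z.std(axis=-1, keepdims=True) + eps)
    return np.maximum(z, 0.0)  # ReLU activation


rng = np.random.default_rng(0)
w = rng.standard_normal((3, 5)).astype(np.float32)  # full-precision training copy
b = rng.standard_normal(5).astype(np.float32)
x = rng.standard_normal((4, 3)).astype(np.float32)
y = binary_normalized_dense(x, w, b)  # shape (4, 5)
```

In a training loop under these assumptions, the float arrays `w` and `b` would be the ones updated by the optimizer, while `binarize` would be re-applied on every forward pass.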

Implementation of Binary Models

Image Classification with Convolutional Layers

Two binary convolutional models were configured for a multiclass image classification task on the Food-101 dataset. They differ only in filter size (3×3 versus 5×5). All layer parameters are binary; the activation functions are ReLU, or softmax in the classification layers. The two models are used to show that binary normalized layers can converge quickly and reach comparable accuracy while relying solely on 1-bit parameters.

Figure 1: Training results of image classification problem with the convolutional models.

Language Decoding with Transformer Models

The paper configures binary transformer models for next-token prediction on the WikiText-103-raw dataset. Two variants, small and large, are tested against standard 32-bit models. The binary models combine transformer blocks with attention mechanisms and MLP heads, showing that more complex architectures can also be built from reduced-precision parameters.

Figure 2: Training results of the language decoders.

Performance Evaluation

The binary models train stably and perform comparably to conventional models on both the image classification and language decoding tasks. The binary convolutional models reached validation accuracies close to their 32-bit floating-point counterparts within a small number of training epochs. In language decoding, binary and standard models performed similarly, and the large binary models exceeded the standard model's metrics. These results suggest that binary networks can match higher-precision models when they are scaled up with more units.

Implications and Future Work

The reduction in memory usage to 1/32 of traditional models presents substantial implications for deploying AI on resource-constrained devices, such as mobile platforms and CPUs. Additionally, this approach could lead to increased complexity in model architectures without the associated increase in computational burden. Future work may focus on refining activation precision and exploring efficient implementations of single-bit operations, which could further amplify these models' computational gains and extend their applicability to larger-scale AI tasks in embedded environments.
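The 1/32 figure follows directly from storing one bit per parameter instead of a 32-bit float. A minimal sketch of that arithmetic, assuming NumPy's `packbits`/`unpackbits` as the 1-bit array storage (the helper names `pack_weights` and `unpack_weights` are hypothetical and not from the paper):

```python
import numpy as np


def pack_weights(w_bin: np.ndarray):
    """Store a {0, 1} array at 8 parameters per byte; return the packed bytes and shape."""
    flat = w_bin.astype(np.uint8).ravel()
    return np.packbits(flat), w_bin.shape


def unpack_weights(packed: np.ndarray, shape) -> np.ndarray:
    """Recover the {0, 1} array (dtype uint8) from its packed form."""
    n = int(np.prod(shape))
    return np.unpackbits(packed, count=n).reshape(shape)


rng = np.random.default_rng(0)
w_float = rng.standard_normal((1024, 1024)).astype(np.float32)
w_bin = w_float > w_float.mean()          # mean-threshold binarization
packed, shape = pack_weights(w_bin)

ratio = w_float.nbytes / packed.nbytes    # 4 194 304 bytes / 131 072 bytes = 32.0
restored = unpack_weights(packed, shape)  # lossless round trip of the 1-bit values
```

Packing only addresses storage. The arithmetic in a forward pass would still need the bits unpacked, or dedicated bitwise kernels, which is the kind of efficient single-bit implementation left to future work.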

Conclusion

Binary normalized neural networks embody a significant step towards optimizing neural networks for practical deployment across diverse hardware environments. By reducing parameters to a single bit without compromising performance, these models promise a substantial leap in deploying AI technologies beyond data centers. Continued exploration into optimizing these models' training algorithms and activation precisions is anticipated to accelerate their adoption in scenarios where computational resources are limited, offering more expansive AI capabilities.
