Codebook-Free BSQ: Geometric Quantization
- Codebook-free BSQ is a quantization method that discretizes high-dimensional data by projecting vectors onto a unit sphere and applying binary sign functions, eliminating explicit codebook storage.
- The approach leverages deterministic geometric constructions and bit-level sparsity with differentiable optimization to achieve dynamic precision adjustment, resulting in significant compression gains and minimal computational overhead.
- Incorporating entropy regularization and drawing parallels to lattice coding, BSQ enhances uniform code utilization and is effectively applied in neural network quantization and visual tokenization for generative models.
Codebook-free Binary Spherical Quantization (BSQ) denotes a class of quantization methods in which high-dimensional data (weights, activations, or embeddings) are discretized without a learned or stored codebook, instead relying on deterministic geometric constructions—most notably the set of binary sign vectors on the unit sphere. BSQ and related schemes yield substantial compression gains and computational efficiencies by exploiting both bit-level parameterization and implicit geometric constraints. These methods are used in neural network quantization, tokenization for image/video representation, and as discrete bottlenecks for generative models, providing strong tradeoffs between fidelity and compression rate without the overhead of storing or searching explicit codebooks (Yang et al., 2021, Zhao et al., 2024, Zhao et al., 16 Dec 2025).
1. Mathematical Formalism of Codebook-Free BSQ
In codebook-free BSQ, quantization is achieved by projecting data to a low-dimensional hypersphere followed by elementwise binary quantization. Consider an $L$-dimensional vector $x \in \mathbb{R}^L$ (e.g., a projected encoder feature). The process comprises:
- Unit Spherical Projection: Normalize $x$ to $u = x / \|x\|_2$, ensuring $\|u\|_2 = 1$.
- Binary Quantization: Each coordinate of $u$ is quantized by its sign, resulting in $\hat{u} = \mathrm{sign}(u) \in \{-1, +1\}^L$. The quantized vector is often rescaled (e.g., $\hat{u} = \tfrac{1}{\sqrt{L}}\,\mathrm{sign}(u)$) to ensure it remains on the unit sphere.
- Reconstruction: A decoder (possibly linear) projects the quantized code back to the original space.
The set of all possible codes, $\mathcal{C} = \{-\tfrac{1}{\sqrt{L}}, +\tfrac{1}{\sqrt{L}}\}^L$ with $|\mathcal{C}| = 2^L$, constitutes an implicit, exponentially large codebook but is never explicitly stored. This construction enables parameter-free, lookup-free operation at both training and inference (Zhao et al., 2024, Zhao et al., 16 Dec 2025).
The quantization process can be summarized as $x \mapsto u = x/\|x\|_2 \mapsto \hat{u} = \tfrac{1}{\sqrt{L}}\,\mathrm{sign}(u) \mapsto \hat{x} = W\hat{u}$, where $W$ is a learned linear map bridging quantized codes to the downstream model.
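The projection–sign–rescale–decode pipeline described above can be sketched in a few lines. This is a minimal NumPy sketch, not the cited implementations; `bsq_quantize` and the decoder matrix `W` are illustrative names.

```python
import numpy as np

def bsq_quantize(x, W):
    """Codebook-free BSQ sketch: project a vector onto the unit sphere,
    binarize each coordinate by its sign, rescale back onto the sphere,
    and reconstruct with a linear map W. Assumes x is nonzero in every
    coordinate (np.sign(0) == 0 would leave the sphere)."""
    u = x / np.linalg.norm(x)            # unit spherical projection
    L = u.shape[0]
    u_hat = np.sign(u) / np.sqrt(L)      # binary code, rescaled so ||u_hat||_2 = 1
    return W @ u_hat                     # linear reconstruction
```

No codebook is stored or searched at any point: the code is computed directly from the signs of `u`.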
2. Bit-Level Sparsity and Differentiable Optimization
BSQ extends beyond vector quantization. For neural network quantization, bit-level sparsity is achieved by exposing each bit of a fixed-point quantization as an independent trainable variable. Consider weights with per-layer scaling $s$, and their $n$-bit quantized representation $W_q = s \sum_{b=0}^{n-1} 2^b\, W^{(b)}$. Here, the $W^{(b)} \in \{0,1\}$ are bit-planes relaxed to $[0,1]$ during training. Differentiable group Lasso regularization over bit-planes, $\mathcal{L}_{\text{reg}} = \sum_{b}\sum_{g} \|W^{(b)}_g\|_2$, induces group-wise sparsity, driving entire bit-planes in each group to zero and enabling dynamic precision reduction per layer (Yang et al., 2021). Optimization is performed with straight-through estimators (STE) to enable gradient-based learning despite discrete quantization.
Periodically, bit-planes pruned to all-zeros are removed (dynamic precision adjustment), and the bitwidths of network layers can be reduced during training, yielding a mixed-precision scheme. Critically, no codebook of centroids or vectors is required at any stage; the "codebook" is simply the space of attainable binary sign vectors or bit-combinations.
3. Entropy Regularization and Implicit Geometric Codebooks
Because $\mathcal{C}$ can be exponentially large, ensuring efficient use of its representational capacity demands entropy-based regularization. The entropy loss in BSQ-based autoencoders encourages both (i) tight cluster assignments (low-variance codes) and (ii) uniform use of codes (equiprobable cells on the hypersphere): $\mathcal{L}_{\text{entropy}} = \mathbb{E}_{u}\!\left[H\big(q(c \mid u)\big)\right] - H\big(\mathbb{E}_{u}\!\left[q(c \mid u)\right]\big)$, where $q(c \mid u)$ is a soft assignment of $u$ to code $c$ and $\tau$ is a temperature parameter controlling its sharpness. This regularization maintains the effectiveness of the implicit codebook and addresses quantizer degeneracies that can arise in lookup-free configurations (Zhao et al., 2024, Zhao et al., 16 Dec 2025).
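The two entropy terms can be computed tractably when the soft assignment factorizes per bit. The sketch below assumes an illustrative per-bit sigmoid assignment $p_i = \sigma(2 u_i / \tau)$ (the exact parameterization in the cited work may differ); the loss minimizes the first returned term and maximizes the second.

```python
import numpy as np

def bsq_entropy_losses(U, tau=1.0):
    """Entropy regularization terms for BSQ codes, assuming a factorized
    per-bit soft assignment p_i = sigmoid(2 * u_i / tau).
    U: (n_samples, L) array of pre-quantization unit-sphere coordinates.
    Returns (mean per-sample entropy, entropy of the mean assignment)."""
    P = 1.0 / (1.0 + np.exp(-2.0 * U / tau))  # P[n, i]: prob that bit i is +1

    def bern_entropy(p):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -(p * np.log(p) + (1 - p) * np.log1p(-p))

    per_sample = bern_entropy(P).sum(axis=1).mean()  # E_u[H(q(c|u))] -> minimize
    aggregate = bern_entropy(P.mean(axis=0)).sum()   # H(E_u[q(c|u)]) -> maximize
    return per_sample, aggregate
```

Low per-sample entropy means each input commits confidently to one code; high aggregate entropy means the batch spreads over the implicit codebook rather than collapsing.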
The geometric interpretation is that BSQ layouts correspond to the corners of a hypercube intersected with a hypersphere. Alternative codebook-free quantizers with more favorable packing properties, such as Spherical Leech Quantization (Λ₂₄-SQ), have been constructed using densest lattice packings to further optimize rate-distortion tradeoffs (Zhao et al., 16 Dec 2025).
4. Applications in Neural Network Quantization and Visual Tokenization
Codebook-free BSQ has been deployed in mixed-precision weight quantization, activation quantization, and transformer-based tokenization pipelines.
- Mixed-Precision Quantization: BSQ enables single-pass, gradient-based quantization of weights and activations, adapting layerwise precision by bit-plane pruning. Empirical results show BSQ achieves 14.2× to 36.6× compression on standard architectures (e.g., ResNet-20 on CIFAR-10) with negligible to moderate accuracy drop (e.g., 92.3% vs. 92.6% top-1 at 2.25 bits/weight) (Yang et al., 2021).
- Visual Tokenization: In transformer tokenizers (BSQ-ViT), BSQ encodes visual feature maps to binary spherical codes (e.g., 18 to 36 bits/token) before decoding or further modeling. This yields up to 100× compression ratios, significant throughput improvements (2.4× over prior art; e.g., 45 fps vs. 18.9 fps on a standard GPU), and enables effective image/video reconstruction and generative modeling (Zhao et al., 2024).
- Entropy Coding: The binary tokens generated by BSQ facilitate downstream arithmetic or masked language modeling for efficient, adaptive compression. Transformer-based AR priors squeeze further redundancy out of BSQ codes, achieving compression rates competitive with state-of-the-art codecs (Zhao et al., 2024).
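Because each BSQ code is a fixed-length binary vector, the token fed to a downstream AR or entropy model is simply the integer obtained by packing the signs, which indexes the implicit codebook without ever materializing it. A minimal sketch (`pack_bits` is an illustrative helper, not an API from the cited work):

```python
def pack_bits(code):
    """Pack a ±1 BSQ code (least-significant bit first) into an integer
    token id. The id indexes the implicit 2^L codebook implicitly:
    no table of codewords is ever stored."""
    return sum((1 << i) for i, c in enumerate(code) if c > 0)
```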
5. Relation to Lattice Coding and Advances Beyond Standard BSQ
BSQ is a special case of lattice coding under geometric constraints. The codebook-free construction admits generalization:
- Lattice Coding Perspective: The set is the intersection of the -dimensional cube with the sphere, corresponding to a binary lattice projected onto (Zhao et al., 16 Dec 2025).
- Comparison to Leech Lattice Quantization: Spherical Leech Quantization (Λ₂₄-SQ) employs the first shell of the Leech lattice in $\mathbb{R}^{24}$, yielding a set of 196,560 vectors with minimum angular separation $\theta_{\min} = 60°$, a considerably more uniform packing than BSQ's cube corners ($\theta_{\min} = \arccos((L-2)/L) \approx 23.6°$ for $L = 24$). Λ₂₄-SQ demonstrably improves rate–distortion performance across reconstruction quality metrics (e.g., rFID, LPIPS, PSNR) and enables more uniform symbol utilization in autoregressive generation frameworks (Zhao et al., 16 Dec 2025).
| Method | Underlying Codebook | Minimum Separation ($\theta_{\min}$) | Typical Use-case |
|---|---|---|---|
| BSQ | $\{\pm 1/\sqrt{L}\}^L$ (cube ∩ sphere) | $\arccos((L-2)/L) \approx 23.6°$ (for $L = 24$) | Neural quantization, tokenization |
| Λ₂₄-SQ | Leech lattice first shell (196,560 vectors) | $60°$ | Tokenization, AR models |
This suggests that codebook-free schemes can leverage denser geometric packings to surpass the basic BSQ construction in both empirical compression and generation quality.
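The separation figures above follow directly from the geometry and can be checked numerically. The sketch below assumes two elementary facts: nearest BSQ cube corners differ in a single sign (normalized inner product $(L-2)/L$), and first-shell Leech vectors of norm 4 have maximum pairwise inner product 2.

```python
import math

def bsq_min_angle_deg(L):
    """Minimum angle between distinct BSQ codes: nearest corners differ
    in exactly one sign, giving normalized inner product (L - 2) / L."""
    return math.degrees(math.acos((L - 2) / L))

# First-shell Leech lattice vectors (squared norm 4) have maximum pairwise
# inner product 2, so cos(theta_min) = 2/4 and theta_min = 60 degrees.
leech_min_angle_deg = math.degrees(math.acos(0.5))
```

At $L = 24$, the BSQ minimum angle is about 23.6°, versus 60° for the Leech shell, illustrating the packing advantage in the table above.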
6. Empirical Performance and Practical Considerations
BSQ-based codebook-free quantization offers notable practical advantages:
- Parameter Efficiency: No explicit storage or learning of codeword arrays is required; cost scales linearly in the code length $L$, avoiding the $O(K \cdot d)$ storage and nearest-neighbor search cost of a $K$-entry codebook in conventional VQ (Zhao et al., 2024).
- Computational Efficiency: Quantization is reduced to sign computations and scaling, avoiding nearest-neighbor search; decoder overhead is minimal.
- Layer-wise Adaptivity: Bit-level sparsity regularization automatically determines precision allocation among layers or groups, tracking known sensitivity without manual intervention.
- Extensibility: The same methods are extensible to activation quantization, and adaptable to visual data beyond static images, such as variable-length videos with blockwise causal masking (Yang et al., 2021, Zhao et al., 2024).
- Compression and Fidelity: On benchmarks such as ImageNet and COCO, BSQ achieves up to 100× compression with minimal perceptual loss. Integration of BSQ codes with AR entropy models or masked LLMs further narrows the gap to highly engineered codecs (e.g., H.264/HEVC) and delivers state-of-the-art synthesis metrics (e.g., FID of 5.44 on ImageNet-128, comparable to leading GAN and diffusion models) (Zhao et al., 2024, Zhao et al., 16 Dec 2025).
A plausible implication is that codebook-free BSQ and its lattice-coded variants are primed for scalable deployment in high-throughput, low-resource, and streaming generative modeling pipelines, particularly where codebook parameter count and nearest-neighbor lookup are major bottlenecks.
7. Limitations and Theoretical Perspectives
While codebook-free BSQ eliminates codebook overhead and training instability found in learned VQ, its reliance on geometric codebook structure imposes limits on achievable rate–distortion frontiers, especially at high code rates or in low dimensions. The simple hypercube intersection (BSQ) may result in less uniform packing compared to optimal lattice packings (e.g., Leech lattice). Empirical studies show significant gains by adopting more sophisticated implicit codebooks, and entropy regularization is essential to avoid code collapse or non-uniform code assignment (Zhao et al., 16 Dec 2025). Ongoing work explores extending these frameworks to higher-dimensional lattices and joint optimization of projection and quantization geometry to further improve representational efficiency.
Codebook-free BSQ represents a parameter-efficient, scalable, and empirically competitive class of quantization methods grounded in geometric constructions, enabling high-fidelity, compressed discrete representations for neural networks and generative models without reliance on learned codebooks (Yang et al., 2021, Zhao et al., 2024, Zhao et al., 16 Dec 2025).