Quantized Fourier Features (QFF)
- Quantized Fourier Features (QFF) are quantized versions of Random Fourier Features that achieve drastic compression while maintaining kernel approximation quality.
- They employ advanced methods like Lloyd–Max, Sigma–Delta, asymmetric, and tensorized quantization to balance bitrate, accuracy, and resource efficiency.
- QFF techniques are applied in neural field models and high-dimensional kernel machines, enabling efficient learning with significant reductions in memory and computation.
Quantized Fourier Features (QFF) extend the widely used Random Fourier Features (RFF) paradigm by imposing quantization schemes on RFF embeddings, enabling drastic compression in storage and computation with theoretically bounded degradation in kernel approximation and empirical performance. QFF methods span low-bit Lloyd–Max quantization and its variants, Sigma–Delta and distributed noise-shaping quantization, asymmetric quantization for client–server and embedded devices, tensorized (quantized) Fourier feature constructions for expressivity, and recent adaptive binning for neural field representations. These frameworks exploit statistical properties of RFFs—particularly the parameter-independence of their marginal distribution—yielding quantizers that achieve near-optimal distortion, empirical kernel reconstruction, and downstream learning performance, with orders-of-magnitude reductions in resource requirements.
1. Mathematical Foundations of Quantized Fourier Features
Quantized Fourier Features are built upon the RFF approximation for shift-invariant kernels, notably the Gaussian (RBF) kernel. For $x, y \in \mathbb{R}^d$, with bandwidth parameter $\gamma > 0$, the Gaussian kernel is given by
$$k(x, y) = \exp\left(-\gamma \|x - y\|^2\right).$$
RFF approximates $k$ through features
$$z(x) = \sqrt{\tfrac{2}{m}}\left[\cos(w_1^\top x + b_1), \ldots, \cos(w_m^\top x + b_m)\right]^\top,$$
with $w_i \sim \mathcal{N}(0, 2\gamma I_d)$ and $b_i \sim \mathrm{Unif}[0, 2\pi)$, so that $\mathbb{E}[z(x)^\top z(y)] = k(x, y)$. Each coordinate $\cos(w_i^\top x + b_i)$ has marginal density
$$p(t) = \frac{1}{\pi\sqrt{1 - t^2}}, \qquad t \in (-1, 1),$$
which is independent of the Gaussian kernel bandwidth parameter $\gamma$ due to the phase randomization inherent in RFF generation, as shown by convolution arguments (Li et al., 2021). This property is foundational for QFF algorithms.
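As a concrete illustration, the RFF construction above can be sketched in a few lines of NumPy (a minimal sketch; the function name and the test points are illustrative, not taken from the cited papers):

```python
import numpy as np

def rff_features(X, m, gamma, rng):
    """Random Fourier features approximating k(x, y) = exp(-gamma * ||x - y||^2).

    Draws w_i ~ N(0, 2*gamma*I) and b_i ~ Unif[0, 2*pi), so that
    z(x)^T z(y) approximates the Gaussian kernel in expectation.
    """
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(m, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    return np.sqrt(2.0 / m) * np.cos(X @ W.T + b)

rng = np.random.default_rng(0)
x = np.array([0.3, -0.1])
y = np.array([0.1, 0.2])
gamma = 0.5
# Both points must share the same draw of (W, b), so featurize them together.
Z = rff_features(np.vstack([x, y]), 20_000, gamma, rng)
approx = Z[0] @ Z[1]                                    # RFF kernel estimate
exact = float(np.exp(-gamma * np.sum((x - y) ** 2)))    # true Gaussian kernel
```

With $m = 20{,}000$ features the estimate agrees with the exact kernel to within sampling noise on the order of $1/\sqrt{m}$.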
2. Quantization Schemes: Lloyd–Max, LM, Sigma–Delta, and Distributed Noise Shaping
QFF methodologies adopt several quantization designs to encode RFF vectors into compact representations.
Lloyd–Max (LM) Quantization:
Applies optimal (in mean-squared error) scalar quantization to $z = \cos(w^\top x + b)$ using the known density $p(t) = 1/(\pi\sqrt{1 - t^2})$ on $(-1, 1)$. With $2^b$ quantization levels $\{\mu_j\}$ and thresholds $\{t_j\}$, the recursive LM equations optimize centroids and thresholds by alternating minimization of the distortion
$$D = \mathbb{E}\left[(z - Q(z))^2\right] = \sum_{j=1}^{2^b} \int_{t_{j-1}}^{t_j} (t - \mu_j)^2\, p(t)\, dt,$$
where $Q(z)$ is the quantized output (Li et al., 2021); the optimality conditions are $\mu_j = \mathbb{E}[z \mid t_{j-1} \le z < t_j]$ and $t_j = (\mu_j + \mu_{j+1})/2$.
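A minimal sample-based Lloyd–Max fit on the bandwidth-free marginal might look as follows (an illustrative sketch: the true LM construction integrates against the closed-form density, whereas this version alternates the two optimality conditions on Monte Carlo samples):

```python
import numpy as np

def lloyd_max(samples, n_levels, n_iter=50):
    """Fit a Lloyd-Max scalar quantizer by alternating minimization.

    Alternates the two optimality conditions: each threshold is the
    midpoint of adjacent centroids, and each centroid is the mean of
    the samples falling in its cell.
    """
    q = np.linspace(0.0, 1.0, n_levels + 2)[1:-1]
    centroids = np.quantile(samples, q)          # quantile initialization
    for _ in range(n_iter):
        thresholds = 0.5 * (centroids[:-1] + centroids[1:])
        cells = np.searchsorted(thresholds, samples)
        for j in range(n_levels):
            in_cell = cells == j
            if in_cell.any():
                centroids[j] = samples[in_cell].mean()
    thresholds = 0.5 * (centroids[:-1] + centroids[1:])
    return centroids, thresholds

rng = np.random.default_rng(1)
# The RFF coordinate cos(w^T x + b) has the arcsine law of cos(Unif[0, 2pi)),
# regardless of the kernel bandwidth -- so one codebook serves all inputs.
z = np.cos(rng.uniform(0.0, 2.0 * np.pi, 200_000))
centroids, thresholds = lloyd_max(z, 4)          # 2-bit (4-level) quantizer
mse = np.mean((z - centroids[np.searchsorted(thresholds, z)]) ** 2)
```

The fitted 2-bit quantizer already drives the mean-squared distortion well below the unit-variance scale of the feature.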
LM–RFF:
Targets a quantizer tuned to squared-cosine error rather than raw feature error, optimizing in the high-similarity regime where $k(x, y) \approx 1$. The procedure performs LM on the squared feature under its induced density, then symmetrizes and maps back to the original feature range, reducing error for applications sensitive to squared-cosine error, notably in vanilla (unnormalized) estimators.
Sigma–Delta and Distributed Noise-Shaping Quantization:
Sequential recursive schemes, such as first-order Sigma–Delta, quantize the feature vector entry-by-entry using feedback to achieve noise shaping:
$$q_i = Q(u_{i-1} + y_i), \qquad u_i = u_{i-1} + y_i - q_i,$$
where $y_i$ is the $i$-th feature entry and $u_i$ the internal state; schemes of order $r$ extend this to higher-order difference matrices (Zhang et al., 2021). Distributed noise-shaping instead employs nonlocal feedback governed by a parameter $\beta > 1$ and constructs a condensed embedding via a linear transform. Both classes admit nonasymptotic uniform kernel approximation error bounds.
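The first-order recursion can be sketched with a 1-bit sign quantizer (illustrative only; Zhang et al. analyze general alphabets and higher orders):

```python
import numpy as np

def sigma_delta_1bit(y):
    """First-order Sigma-Delta quantization with a 1-bit {-1, +1} alphabet.

    Runs the recursion q_i = sign(u_{i-1} + y_i), u_i = u_{i-1} + y_i - q_i,
    which pushes quantization error into first differences (noise shaping).
    For inputs with |y_i| <= 1 the state stays bounded: |u_i| <= 1.
    """
    u = 0.0
    q = np.empty_like(y, dtype=float)
    for i, yi in enumerate(y):
        v = u + yi
        q[i] = 1.0 if v >= 0.0 else -1.0
        u = v - q[i]
    return q

# Noise shaping in action: the state recursion telescopes, so the mean of q
# tracks the mean of y up to |u_N| / N.
q = sigma_delta_1bit(np.full(1000, 0.3))
```

Because $\sum_i q_i = \sum_i y_i - u_N$ and $|u_N| \le 1$, the running mean of `q` is within $1/N$ of the input mean, here within $0.001$ of $0.3$.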
3. Theoretical Error, Bitrate, and Memory–Accuracy Tradeoffs
All QFF schemes provide explicit theoretical characterizations of distortion and memory usage:
- Distortion vs. Bitrate: For quantizers with bit depth $b$, distortion decays exponentially in $b$; for LM this is the optimal scalar-quantization rate $O(2^{-2b})$ as $b \to \infty$. In practice, as few as $2$ or $3$ bits suffice for negligible increases in kernel regression or SVM error rates compared to full-precision RFFs (Li et al., 2021, Zhang et al., 2021).
- Memory Complexity: Each data point requires $mb$ bits for $m$ features at $b$ bits each; 2-bit quantization provides a $16\times$ storage saving over 32-bit float representations.
- Kernel Estimate Error: For a quantized estimator
$$\hat{k}(x, y) = \frac{2}{m} \sum_{i=1}^{m} Q\!\left(\cos(w_i^\top x + b_i)\right) Q\!\left(\cos(w_i^\top y + b_i)\right),$$
mean and variance are controlled analytically by the quantizer distortion $D$; a normalized estimator, which divides by the norms of the quantized feature vectors, further reduces variance, especially for 1-bit quantization.
- Error Bounds: For Sigma–Delta quantizers, the error decays polynomially in the number of features $m$, and exponentially in the compaction parameter and bit rate under combined compression (Zhang et al., 2021).
Empirical results consistently indicate that LM and noise-shaping QFFs outperform stochastic and naive sign quantization, especially at ultra-low bitrates.
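To make the quantized estimator concrete, here is a sketch using a hypothetical symmetric 2-bit codebook (the values below stand in for true Lloyd–Max centroids and are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, gamma = 3, 50_000, 0.5
x = np.zeros(d)
y = np.array([1.0, 0.5, 0.4])               # ||x - y||^2 = 1.41

W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(m, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
zx, zy = np.cos(W @ x + b), np.cos(W @ y + b)

# Hypothetical symmetric 2-bit codebook for values in [-1, 1]
# (a stand-in for the fitted Lloyd-Max centroids).
codebook = np.array([-0.9, -0.4, 0.4, 0.9])
thresholds = 0.5 * (codebook[:-1] + codebook[1:])
Q = lambda z: codebook[np.searchsorted(thresholds, z)]

full = 2.0 / m * (zx @ zy)                  # full-precision RFF estimate
quant = 2.0 / m * (Q(zx) @ Q(zy))           # 2-bit quantized estimate
exact = float(np.exp(-gamma * np.sum((x - y) ** 2)))
```

Each feature now occupies 2 bits instead of 32, while the quantized estimate stays close to both the full-precision estimate and the exact kernel value.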
4. Asymmetric and Adaptive Quantization Strategies
QFF extends to asymmetric random periodic features as shown in (Schellekens et al., 2020), where only one side of a kernel evaluation pipeline employs quantized features. For 1-bit square-wave features $\mathrm{sgn}(\cos(w^\top x + b))$, a semi-quantized scheme with one side quantized and the other using the standard cosine map recovers the original kernel (up to known scaling) without expectation bias:
$$\mathbb{E}\left[\mathrm{sgn}(\cos(w^\top x + b))\,\cos(w^\top y + b)\right] = \frac{2}{\pi}\, k(x, y).$$
This exact recovery does not hold for symmetric quantization (both sides quantized), suggesting particular relevance in client–server and embedded inference scenarios. Uniform error bounds are established in terms of the sample complexity and the mean Lipschitz smoothness of the periodic map.
This approach achieves order-of-magnitude bitrate reductions (e.g., 1-bit per entry) with negligible (<5%) accuracy degradation in SVM classification, especially when only the query or database side is quantized.
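The semi-quantized idea can be sketched numerically: one side keeps the cosine map, the other sends only signs, and a $\pi/2$ rescaling (from the square wave's fundamental Fourier coefficient $4/\pi$, halved by the phase average) removes the bias. The scenario and constants below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d, m, gamma = 2, 100_000, 0.5
x = np.array([0.8, -0.2])                    # "client" query point
y = np.array([0.1, 0.4])                     # "server" database point

W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(m, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)

cos_y = np.cos(W @ y + b)                    # full-precision server side
sq_x = np.sign(np.cos(W @ x + b))            # 1-bit square-wave client side

# Averaging over the uniform phase b keeps only the square wave's
# fundamental harmonic, scaled by (4/pi) * (1/2) = 2/pi; multiplying by
# pi/2 therefore yields an unbiased semi-quantized kernel estimate.
semi = (np.pi / 2.0) * np.mean(sq_x * cos_y)
exact = float(np.exp(-gamma * np.sum((x - y) ** 2)))
```

The client transmits 1 bit per feature, yet the rescaled estimate matches the exact Gaussian kernel up to sampling noise.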
5. Quantized Fourier Features in Neural Field Representations
In neural field models such as Neural Image Representations, Neural Radiance Fields (NeRF), and Signed Distance Functions (SDF), QFF are used as a binning mechanism in the Fourier domain (Lee et al., 2022). Instead of being optimized for signal compression, here the quantization creates localized feature bins in the range of each Fourier feature. Key properties:
- QFF partitions the $[-1, 1]$ range of each sinusoidal feature (each $\sin$ or $\cos$ output) across bins, associating each bin with a small learnable vector.
- Periodic binning is implemented efficiently via interpolation between bin vectors, exploiting the inherent periodicity of sinusoids; this allows smoothness to be controlled, and discontinuities are avoided by adding back the original Fourier features.
- The multiresolution nature is preserved, as high-frequency bins are naturally narrower.
- Empirically, QFF substantially reduces model size, accelerates convergence (requiring an order of magnitude fewer steps for similar PSNR or Chamfer metrics), and maintains or improves quality compared to non-quantized Fourier encodings or hard spatial grids.
Typical parameter choices (for 3D NeRF) use a modest number of bins and feature channels, with frequency counts up to $128$. The QFF approach leads to fast high-frequency fitting and preserves network smoothness with minimal adjustments to standard MLP architectures.
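A minimal sketch of the bin-interpolated encoding for a scalar coordinate follows; the bin count, channel width, and frequency count here are hypothetical, and a real model would treat `table` as a learnable parameter:

```python
import numpy as np

K, C, F = 8, 4, 6                        # bins, channels, frequencies (hypothetical)
rng = np.random.default_rng(4)
table = rng.normal(0.0, 0.1, size=(F, K, C))   # learnable bin vectors

def qff_encode(x):
    """Bin-interpolated Fourier encoding of a scalar coordinate x in [0, 1).

    Each frequency's sinusoid value (in [-1, 1]) selects a fractional bin
    index; the output is a linear interpolation of the two nearest learnable
    bin vectors, so the encoding stays continuous in x.
    """
    feats = []
    for f in range(F):
        s = np.sin(2.0 ** f * np.pi * x)       # sinusoid value in [-1, 1]
        u = (s + 1.0) / 2.0 * (K - 1)          # fractional bin coordinate
        i0 = int(np.floor(u))
        i1 = min(i0 + 1, K - 1)
        w = u - i0
        feats.append((1.0 - w) * table[f, i0] + w * table[f, i1])
    return np.concatenate(feats)               # length F * C

e = qff_encode(0.3)
```

High frequencies sweep the bins faster, so their effective bins are narrower in space, matching the multiresolution behavior described above.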
6. Tensorized and Structured Quantized Fourier Features
Expanding beyond scalar quantization, (Wesel et al., 2023) introduces a tensorized ("quantized") decomposition approach for Fourier features useful in high-dimensional kernel machines. For each dimension, the set of $M$ frequencies is factorized via radix-$Q$ expansion, replacing the expensive tensor-product feature with a higher-order tensor of much smaller per-mode dimension ($Q \ll M$):
- Each standard Vandermonde vector $[1, z, z^2, \ldots, z^{M-1}]$ is decomposed into a Kronecker product of $\log_Q M$ vectors of length $Q$, mapping original $d$-way tensors of side $M$ to $(d \log_Q M)$-way tensors of side $Q$.
- The model weights are themselves tensorized, e.g., in Tensor-Train or CPD structures, reducing memory while improving expressivity—manifested as a higher VC-dimension bound for the same parameter budget.
- In large-scale regression tasks, quantized tensor network models (QTKM/QFF with TT structure) reach lower test error than both non-quantized TNs and kernel ridge regression at drastically lower parameter counts.
- This tensorization regularizes learning by focusing model capacity on the most salient data-driven harmonics, and is practical for large-scale datasets with moderate feature cardinality.
This paradigm requires the per-mode frequency count to be an exact radix power $M = Q^p$, and optimization is nonconvex but manageable via established tensor network solvers.
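The radix-$Q$ factorization of a Vandermonde vector can be verified directly (a small sketch under the assumption $M = Q^p$; variable names are illustrative):

```python
import numpy as np

def vandermonde(z, M):
    """Full Vandermonde vector [1, z, z^2, ..., z^(M-1)]."""
    return z ** np.arange(M)

def radix_q_factors(z, Q, p):
    """Factor the length-Q**p Vandermonde vector into p Kronecker factors.

    Writing an index n in radix Q as n = sum_k n_k * Q**k gives
    z**n = prod_k z**(n_k * Q**k), so factor k holds powers z**(j * Q**k)
    for j = 0, ..., Q-1.
    """
    return [z ** (np.arange(Q) * Q ** k) for k in range(p)]

z = np.exp(1j * 0.37)        # a point on the unit circle (one Fourier feature)
Q, p = 2, 4                  # radix-2 expansion, M = Q**p = 16 frequencies
factors = radix_q_factors(z, Q, p)

# Rebuild the full vector: most-significant digit varies slowest in np.kron.
v = factors[-1]
for f in reversed(factors[:-1]):
    v = np.kron(v, f)
# v equals the length-16 Vandermonde vector, yet the factored form stores
# only p * Q = 8 numbers instead of Q**p = 16.
```

In the tensor network models of (Wesel et al., 2023) the reconstruction is never materialized; the weights are kept in factored (TT/CPD) form and contracted against the small factors directly, which is where the memory savings come from.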
7. Practical Recommendations, Limitations, and Prospects
Implementation Guidance:
- Precompute $b$-bit codebooks or bin-lookup tables, leveraging the universal bandwidth-free density $p(t) = 1/(\pi\sqrt{1 - t^2})$ on $(-1, 1)$ shared by all RFF coordinates (Li et al., 2021).
- For classical kernel machines: 1-bit LM–RFF with a normalized estimator is often within 2–3% of full performance, and 2 bits achieves negligible degradation with a $16\times$ memory saving over 32-bit floats.
- In neural fields: choose bin count empirically, interpolate across periodic bins, and add back the original Fourier features for continuity. Binning at higher frequencies improves fine detail without discontinuities.
Limitations:
- For symmetric (both-side) quantization, kernel recovery is not perfectly unbiased; for client–server or asymmetric architectures, theoretical exactness is attainable, with error bounds governed by feature complexity (Schellekens et al., 2020).
- Base-Q tensorized QFFs require suitable factorization of mode sizes, and optimization over TNs is nonconvex and sensitive to hyperparameter selection (Wesel et al., 2023).
- Some variants, such as LM–RFF, offer improved performance only in certain estimator regimes.
Prospects and Open Directions:
- Adaptive or learned binning, nonuniform quantization, and frequency learning may further enhance QFF performance, especially for neural field and high-dimensional modeling tasks (Lee et al., 2022).
- Extension to quantized polynomial features and hybrid deep architectures is compelling for computational and storage efficiency at scale.
- Theoretical understanding of approximation error as a function of bin count and feature channel dimension remains a subject for future work.
Quantized Fourier Features thus represent a mature, versatile technology for scalable kernel approximation, efficient neural field modeling, and expressive tensor network construction, enabling practical, resource-efficient implementations without substantial loss of fidelity (Li et al., 2021, Zhang et al., 2021, Schellekens et al., 2020, Lee et al., 2022, Wesel et al., 2023).