Frequency-Aware Multi-Parameter Intra-Row Grouping
- The paper demonstrates how frequency decomposition with the Haar transform and adaptive thresholding minimizes reconstruction error and improves model perplexity.
- It employs intra-row grouping to partition weight matrices into four subgroups, thus enhancing the capacity of 1-bit quantization while incurring negligible storage overhead.
- The approach increases quantization fidelity by expanding the discrete inverse quantization set from single-digit limits to up to 1024 levels.
Frequency-aware multi-parameter intra-row grouping is a structure-aware quantization strategy introduced in HBLLM, a wavelet-based high-fidelity 1-bit quantization method for LLMs. This method combines frequency decomposition via the Haar transform with adaptive, band-specific grouping and aggregation to increase the capacity and accuracy of ultra-low-bit quantization while incurring negligible storage overhead (Chen et al., 30 Nov 2025).
1. Formal Definition and Notation
Given a full-precision weight matrix $\mathbf{W} \in \mathbb{R}^{m \times n}$ of a linear layer, the method operates row-wise. For a selected row $\mathbf{w} \in \mathbb{R}^{n}$, a one-dimensional Haar wavelet transform is applied to obtain the spectral coefficients:

$$\mathbf{h} = \mathrm{Haar}(\mathbf{w}) \in \mathbb{R}^{n}.$$
The coefficient vector $\mathbf{h}$ is partitioned into low- and high-frequency components:

$$\mathbf{h} = \big[\mathbf{h}^{(L)};\ \mathbf{h}^{(H)}\big],$$

with $\mathbf{h}^{(L)} \in \mathbb{R}^{n/2}$ and $\mathbf{h}^{(H)} \in \mathbb{R}^{n/2}$. Each frequency band is independently partitioned into two groups (dense/sparse) via a threshold selected from a discrete candidate set, yielding four final subgroups per row. Thresholds are set per row and per band by minimizing local quantization error, reflecting diverse spectral patterns across rows and bands.
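The row-level band split can be sketched with a one-level Haar transform. This is a minimal illustration; the function names and the $1/\sqrt{2}$ normalization are assumptions, not the paper's exact implementation:

```python
import numpy as np

def haar_1d(w):
    """One-level Haar transform of a row: returns (low, high) frequency bands."""
    w = np.asarray(w, dtype=float)
    pairs = w.reshape(-1, 2)                         # adjacent-sample pairs
    low = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # approximation coefficients
    high = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # detail coefficients
    return low, high

def inverse_haar_1d(low, high):
    """Exact inverse of haar_1d."""
    a = (low + high) / np.sqrt(2)
    b = (low - high) / np.sqrt(2)
    return np.stack([a, b], axis=1).ravel()
```

A round trip `inverse_haar_1d(*haar_1d(w))` recovers `w` up to floating-point error, so all quantization error comes from the group-wise 1-bit codes, not the transform itself.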
2. Mathematical Formulation and Quantization Workflow
Within each frequency band $f \in \{L, H\}$, the absolute values of the coefficients are sorted, and a set of $C$ candidate percentiles is chosen. For each candidate $p$, the threshold $t_p$ is defined as the $p$-th percentile of $\{|h_i^{(f)}|\}$, forming two groups:

$$G_1 = \{\, i : |h_i^{(f)}| \le t_p \,\}, \qquad G_2 = \{\, i : |h_i^{(f)}| > t_p \,\}.$$
Group-wise means are computed:

$$\mu_g = \frac{1}{|G_g|} \sum_{i \in G_g} h_i^{(f)}, \qquad g \in \{1, 2\}.$$
A single row-wise scale $\alpha$ (or optionally one per group) is used, and 1-bit quantization is applied:

$$b_i = \alpha \,\operatorname{sign}\!\big(h_i^{(f)} - \mu_g\big), \qquad i \in G_g.$$
The optimal threshold $t^{*}$ is selected to minimize the reconstruction error

$$E(t_p) = \sum_{g \in \{1,2\}} \sum_{i \in G_g} \big(h_i^{(f)} - b_i - \mu_g\big)^{2},$$

aggregating within-group squared errors. The final grouping is $(G_1^{*}, G_2^{*})$ with means $(\mu_1^{*}, \mu_2^{*})$.
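A compact NumPy sketch of this per-band search (the percentile grid, the helper name, and the row-wise choice of $\alpha$ are illustrative assumptions):

```python
import numpy as np

def quantize_band(h, candidates=(25, 50, 75, 100), alpha=None):
    """Pick the percentile threshold minimizing band reconstruction error,
    then 1-bit quantize each group around its own mean."""
    h = np.asarray(h, dtype=float)
    if alpha is None:
        alpha = np.abs(h).mean()          # simple row-wise scale (assumption)
    best_err, best_recon = np.inf, None
    for p in candidates:
        t = np.percentile(np.abs(h), p)
        groups = (np.abs(h) <= t, np.abs(h) > t)   # dense / sparse split
        recon = np.empty_like(h)
        for g in groups:
            if g.any():
                mu = h[g].mean()
                recon[g] = alpha * np.sign(h[g] - mu) + mu  # dequantized value
        err = np.sum((h - recon) ** 2)
        if err < best_err:
            best_err, best_recon = err, recon
    return best_recon
```

Because the candidate set includes $p = 100$ (everything in one group), the two-group split can only match or reduce the single-group reconstruction error.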
To reduce mean-storage overhead, HBLLM can employ mean sharing within each band:

$$\mu_{\mathrm{shared}} = \frac{|G_1^{*}|\, \mu_1^{*} + |G_2^{*}|\, \mu_2^{*}}{|G_1^{*}| + |G_2^{*}|}.$$
This reduces per-weight storage by approximately $0.25$ bits/weight without negatively affecting, and sometimes slightly improving, perplexity (see Table 3c in (Chen et al., 30 Nov 2025)).
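The shared mean is simply the size-weighted average of the two group means; a trivial helper (the name is illustrative) makes the bookkeeping explicit:

```python
def shared_mean(mu1, n1, mu2, n2):
    """Size-weighted mean shared by both groups of a band, so only one
    float per band needs to be stored instead of two."""
    return (n1 * mu1 + n2 * mu2) / (n1 + n2)
```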
3. Algorithm Workflow and Pseudocode
A single-row grouping and quantization pass is implemented via the following routine, omitting salient columns (which are handled separately):
```python
def RowFreqGroup(w, C, share_means=True):
    # 1. Haar transform: h splits into bands h^(L) and h^(H)
    h = Haar1D(w)
    for f in (L, H):                          # two frequency bands
        h_f = h[f]
        c_abs = sorted(abs(h_f))
        best_err = float('inf')
        # 2. enumerate C candidate percentile thresholds
        for p in linspace(0, 100, C):
            t = percentile(c_abs, p)
            G1 = [i for i in range(len(h_f)) if abs(h_f[i]) <= t]
            G2 = [i for i in range(len(h_f)) if abs(h_f[i]) > t]
            mu1 = mean([h_f[i] for i in G1])
            mu2 = mean([h_f[i] for i in G2])
            # alpha can be precomputed (row or group); assume row-wise
            B1 = {i: alpha * sign(h_f[i] - mu1) for i in G1}
            B2 = {i: alpha * sign(h_f[i] - mu2) for i in G2}
            err = sum((h_f[i] - B1[i] - mu1)**2 for i in G1) + \
                  sum((h_f[i] - B2[i] - mu2)**2 for i in G2)
            if err < best_err:
                best_err, best_groups = err, (G1, G2, mu1, mu2)
        G1_star, G2_star, mu1_star, mu2_star = best_groups
        # 3. optional mean sharing within the band
        if share_means:
            mu_shared = (len(G1_star) * mu1_star + len(G2_star) * mu2_star) \
                        / (len(G1_star) + len(G2_star))
            mu1_star = mu2_star = mu_shared
        # 4. store the 1-bit codes for both groups
        for i in G1_star:
            h_f_B[i] = alpha * sign(h_f[i] - mu1_star)
        for i in G2_star:
            h_f_B[i] = alpha * sign(h_f[i] - mu2_star)
    return inverse_Haar([h_L_B, h_H_B])
```
In practice, vectorized implementations batch multiple rows, and salient columns are skipped or assigned via FillAvg prior to the Haar step (see Algorithm 1 and Fig. 2 in (Chen et al., 30 Nov 2025)).
4. Computational and Storage Complexity
The overall complexity per row is $O(n \log n + Cn)$, dominated by the $O(Cn)$ scan over the $C$ candidate thresholds in each frequency band ($C$ is a small constant):
- Haar 1D transform per row: $O(n)$
- Threshold enumeration: $O(Cn)$ per row, i.e., $O(mCn)$ for $m$ rows
The storage overhead per row is minimal, requiring either $4$ floats/row (two means per band) or $2$ floats/row (one shared mean per band, when mean sharing is used). For typical row lengths $n$, the extra per-weight storage is negligible. Table 1 reports an average bit-width of $1.09$ for the HBLLM-row scheme and $1.00$ for HBLLM-col (Chen et al., 30 Nov 2025).
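The amortization of per-row means is simple arithmetic; assuming fp16 storage for the means (an assumption for illustration):

```python
def mean_overhead_bits_per_weight(n_cols, means_per_row, bits_per_mean=16):
    """Per-weight storage overhead from group means amortized over a row."""
    return means_per_row * bits_per_mean / n_cols

# For a 4096-wide row: four means cost 4*16/4096 = 0.015625 bits/weight;
# mean sharing halves this to 0.0078125 bits/weight.
```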
5. Comparative Analysis with Prior Grouping Schemes
Several baselines are contrasted:
- BiLLM (intra-row global): Employs a single split per row, independent of frequency.
- ARB-LLM (channel-wise): Groups along columns with uniform, value-agnostic partitions.
- Mixture-of-Scales, OneBitGPT: Use data-driven groupings on raw (time-domain) weights, lacking frequency selectivity.
HBLLM’s method, by applying localized Haar transforms and optimizing two independent splits per row (one per band), produces four intra-row subgroups and increases the cardinality of the inverse quantization set (CIQ) from $10$ (ARB-LLM) to up to $1024$, directly improving representation fidelity under 1-bit quantization (see Sec. 3.1, Appendix B/C in (Chen et al., 30 Nov 2025)).
| Method | #Subgroups/Row | Frequencies Used | CIQ Upper Bound |
|---|---|---|---|
| BiLLM | 2 | No | 8 |
| ARB-LLM | Block-wise | No | 10 |
| HBLLM (proposed) | 4 | Yes | 1024 |
6. Empirical and Theoretical Impact on Quantization Fidelity
HBLLM’s frequency-aware multi-parameter intra-row grouping achieves substantial practical gains:
- Ablation (Table 3b): Switching to frequency-aware grouping lowers LLaMA2-7B perplexity on both Wiki2 and PTB.
- Final Model Results (Table 1): HBLLM-row reports $7.62/6.68/34.94$ perplexities on C4/Wiki2/PTB for LLaMA1-13B, versus $13.93/14.99/69.75$ for BiLLM.
- Relative Distance to FP16 (Fig. 1): HBLLM narrows the gap to full precision by at least $33\%$ compared to previous 1-bit methods.
The theoretical CIQ bound for HBLLM reaches $1024$, versus $8$ (BiLLM) and $10$ (ARB-LLM) (Chen et al., 30 Nov 2025).
7. Figures, Tables, and Implementation Reference
- Fig. 2: Shows the pipeline including FillAvg handling, Haar transform, and differentiation of salient/non-salient paths.
- Eq. 3–4 (Sec. 3.3): Provide the explicit formulations for the Haar transform and sign-based quantization.
- Table 3(b): Details the ablation comparing grouping granularities.
- Table 1: Summarizes perplexity and accuracy of HBLLM against baselines.
- Sec. 3.1, Appendix B/C: Discuss CIQ analysis and the theoretical expressiveness implications.
Taken collectively, frequency-aware multi-parameter intra-row grouping significantly enriches the discrete quantization set at negligible storage cost and modest runtime. It provides marked improvements in quantization error and downstream perplexity relative to both global intra-row and channel-wise quantization schemes (Chen et al., 30 Nov 2025).