
Frequency-Aware Multi-Parameter Intra-Row Grouping

Updated 7 December 2025
  • The paper demonstrates how frequency decomposition with the Haar transform and adaptive thresholding minimizes reconstruction error and improves model perplexity.
  • It employs intra-row grouping to partition weight matrices into four subgroups, thus enhancing the capacity of 1-bit quantization while incurring negligible storage overhead.
  • The approach increases quantization fidelity by expanding the discrete inverse quantization set from single-digit limits to up to 1024 levels.

Frequency-aware multi-parameter intra-row grouping is a structure-aware quantization strategy introduced in HBLLM, a wavelet-based high-fidelity 1-bit quantization method for LLMs. This method combines frequency decomposition via the Haar transform with adaptive, band-specific grouping and aggregation to increase the capacity and accuracy of ultra-low-bit quantization while incurring negligible storage overhead (Chen et al., 30 Nov 2025).

1. Formal Definition and Notation

Given a full-precision weight matrix $W \in \mathbb{R}^{d \times m}$ of a linear layer, the method operates row-wise. For a selected row $w \in \mathbb{R}^m$, a one-dimensional Haar wavelet transform is applied to obtain the spectral coefficients:

$$\hat{h} = \mathcal{H}(w) \in \mathbb{R}^m$$

$\hat{h}$ is partitioned into low- and high-frequency components:

$$\hat{h} = [\hat{h}^{(L)}, \hat{h}^{(H)}]$$

with $\hat{h}^{(L)} = \mathcal{H}_{\text{low-pass}}(w) \in \mathbb{R}^{m/2}$ and $\hat{h}^{(H)} = \mathcal{H}_{\text{high-pass}}(w) \in \mathbb{R}^{m/2}$. Each frequency band $f \in \{L, H\}$ is independently partitioned into two groups (dense/sparse) via a threshold $t^{(f)}$ selected from a discrete candidate set, yielding four final subgroups per row. Thresholds are set per row and per band by minimizing local quantization error, reflecting the diverse spectral patterns across rows and bands.
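Concretely, the four-subgroup partition can be sketched as follows. This is a minimal illustration with hypothetical coefficient values and fixed thresholds; HBLLM selects each $t^{(f)}$ by error minimization as described in the next section.

```python
# Sketch of the four-subgroup partition (hypothetical values): the
# spectral coefficients are split into halves, and each band is split
# again at a per-band magnitude threshold.
h_hat = [0.1, -0.9, 0.2, 0.05, 1.3, -0.15, 0.4, -0.02]
half = len(h_hat) // 2
bands = {"L": h_hat[:half], "H": h_hat[half:]}
thresholds = {"L": 0.3, "H": 0.3}   # illustrative t^(f); HBLLM tunes these
groups = {}
for f, h_f in bands.items():
    t = thresholds[f]
    groups[(f, 1)] = [i for i, c in enumerate(h_f) if abs(c) <= t]  # dense
    groups[(f, 2)] = [i for i, c in enumerate(h_f) if abs(c) > t]   # sparse
# Four subgroups per row: (L,1), (L,2), (H,1), (H,2)
```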

2. Mathematical Formulation and Quantization Workflow

Within each frequency band $f$, the absolute values of the coefficients $|\hat{h}^{(f)}|$ are sorted, and a set of $C$ candidate percentiles $P = \{p_1, \dots, p_C\}$ is chosen. For each candidate $p_k$, the threshold $t_k$ is defined as the $p_k$-th percentile, forming two groups:

  • $G_k^{(f,1)} = \{i : |\hat{h}_i^{(f)}| \leq t_k\}$
  • $G_k^{(f,2)} = \{i : |\hat{h}_i^{(f)}| > t_k\}$

Group-wise means are computed:

$$\mu^{(f,g)} = \frac{1}{|G^{(f,g)}|}\sum_{i\in G^{(f,g)}} \hat{h}_i^{(f)}, \quad g\in\{1,2\}$$

A single row-wise scale $\alpha$ (or, optionally, one scale per group) is used, and 1-bit quantization is applied:

$$\hat{h}_i^{(f)}(B) = \alpha \cdot \operatorname{sign}(\hat{h}_i^{(f)} - \mu^{(f,g)})$$

The optimal threshold index $k^*$ is selected to minimize the reconstruction error $E^{(f)}_k$, which aggregates the within-group squared errors. The final grouping is $G_{k^*}^{(f,1)}, G_{k^*}^{(f,2)}$ with means $\mu_{k^*}^{(f,1)}, \mu_{k^*}^{(f,2)}$.
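As a toy illustration of the sign-based step, assuming the group mean is added back at dequantization (consistent with the within-group error terms above), with hypothetical values for $\alpha$, $\mu$, and the coefficients:

```python
def sign(x):
    return 1.0 if x >= 0 else -1.0

alpha, mu = 0.5, 0.2            # hypothetical row scale and group mean
coeffs = [0.9, -0.4, 0.25]      # hypothetical coefficients in one group
bits   = [sign(c - mu) for c in coeffs]    # stored 1-bit codes (+1/-1)
recon  = [alpha * b + mu for b in bits]    # dequantized values
# recon is approximately [0.7, -0.3, 0.7]
```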

To reduce mean-storage overhead, HBLLM can employ mean sharing within each band:

$$\mu^{(f)}_{\text{shared}} = \frac{|G^{(f,1)}|\,\mu^{(f,1)} + |G^{(f,2)}|\,\mu^{(f,2)}}{|G^{(f,1)}| + |G^{(f,2)}|}$$

This reduces per-weight storage by approximately 0.25 bits/weight without degrading perplexity, and in some cases slightly improves it (see Table 3c in (Chen et al., 30 Nov 2025)).
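A quick sanity check of the mean-sharing formula (illustrative values only): the size-weighted combination of the two group means reduces to the plain mean of all coefficients in the band.

```python
# Illustrative check (not from the paper's code): the shared mean is the
# size-weighted average of the two group means, which equals the overall
# mean of the band's coefficients.
G1 = [0.1, -0.2, 0.05]          # hypothetical dense-group coefficients
G2 = [1.4, -1.1]                # hypothetical sparse-group coefficients
mu1 = sum(G1) / len(G1)
mu2 = sum(G2) / len(G2)
mu_shared = (len(G1)*mu1 + len(G2)*mu2) / (len(G1) + len(G2))
band_mean = sum(G1 + G2) / (len(G1) + len(G2))
```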

3. Algorithm Workflow and Pseudocode

A single-row grouping and quantization pass is implemented via the following routine, omitting salient columns (which are handled separately):

import math

def Haar1D(w):
    # Single-level orthonormal Haar transform: low-pass / high-pass halves.
    s = math.sqrt(2.0)
    half = len(w) // 2
    low  = [(w[2*i] + w[2*i+1]) / s for i in range(half)]
    high = [(w[2*i] - w[2*i+1]) / s for i in range(half)]
    return low, high

def inverse_Haar(low, high):
    # Invert one Haar level: each (l, h) pair maps back to a weight pair.
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.extend([(l + h) / s, (l - h) / s])
    return out

def sign(x):
    return 1.0 if x >= 0 else -1.0

def RowFreqGroup(w, alpha, C=40, share_means=True):
    # alpha can be precomputed per row or per group; assumed row-wise here.
    bands = Haar1D(w)              # (h^(L), h^(H)): two frequency bands
    quantized = []
    for h_f in bands:
        c_abs = sorted(abs(c) for c in h_f)
        best = None
        for k in range(C):
            # Candidate threshold t_k: the (k/C)-th percentile of |h^(f)|.
            t = c_abs[min(len(c_abs) - 1, k * len(c_abs) // C)]
            G1 = [i for i in range(len(h_f)) if abs(h_f[i]) <= t]
            G2 = [i for i in range(len(h_f)) if abs(h_f[i]) > t]
            mu1 = sum(h_f[i] for i in G1) / len(G1) if G1 else 0.0
            mu2 = sum(h_f[i] for i in G2) / len(G2) if G2 else 0.0
            # Reconstruction error; dequantized value is alpha*sign(.) + mu.
            err = sum((h_f[i] - (alpha * sign(h_f[i] - mu1) + mu1))**2 for i in G1) \
                + sum((h_f[i] - (alpha * sign(h_f[i] - mu2) + mu2))**2 for i in G2)
            if best is None or err < best[0]:
                best = (err, G1, G2, mu1, mu2)
        _, G1, G2, mu1, mu2 = best
        if share_means:
            mu1 = mu2 = (len(G1)*mu1 + len(G2)*mu2) / (len(G1) + len(G2))
        # Store dequantized coefficients (1-bit code plus group mean).
        h_B = [0.0] * len(h_f)
        for i in G1: h_B[i] = alpha * sign(h_f[i] - mu1) + mu1
        for i in G2: h_B[i] = alpha * sign(h_f[i] - mu2) + mu2
        quantized.append(h_B)
    return inverse_Haar(quantized[0], quantized[1])

In practice, vectorized implementations batch multiple rows, and salient columns are skipped or assigned via FillAvg prior to the Haar step (see Algorithm 1 and Fig. 2 in (Chen et al., 30 Nov 2025)).
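In a batched setting, the per-band candidate thresholds can be computed for many rows at once. A minimal NumPy sketch of that step (`band_thresholds` and its shapes are illustrative assumptions, not the paper's code):

```python
import numpy as np

def band_thresholds(H_abs, C=40):
    # Vectorized percentile candidates for a batch of rows:
    # H_abs has shape (rows, m/2); output has one threshold per
    # (row, percentile) pair, shape (rows, C).
    ps = np.linspace(0.0, 100.0, C)
    return np.percentile(H_abs, ps, axis=1).T
```

Each row of the result is that row's candidate set $\{t_1, \dots, t_C\}$; the inner error-minimizing scan then proceeds as in the routine above.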

4. Computational and Storage Complexity

The overall complexity per row is $O(m + Cm)$, dominated by the $O(Cm)$ cost of scanning the $C$ candidate thresholds per frequency band ($C \approx 40$ is constant):

  • Haar 1D transform per row: $O(m)$
  • Threshold enumeration: $O(Cm)$ per row, $O(dCm)$ for $d$ rows

The storage overhead per row is minimal, requiring either 4 floats/row (two means per band) or 2 floats/row (with mean sharing). For typical $m$, the extra per-weight storage is negligible, e.g., $(4 \times 32)/m = O(1/m)$ bits/weight. Table 1 reports an average weight bit-width of 1.09 for the HBLLM-row scheme and 1.00 for HBLLM-col (Chen et al., 30 Nov 2025).
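The overhead arithmetic can be checked directly (m = 4096 is a hypothetical row length; FP32 means are assumed):

```python
# Back-of-envelope storage overhead for the stored group means.
m = 4096                          # hypothetical row length
bits_no_share = 4 * 32 / m        # four FP32 means per row
bits_share    = 2 * 32 / m        # two shared FP32 means per row
# 0.03125 and 0.015625 bits/weight: negligible next to the 1-bit codes
```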

5. Comparative Analysis with Prior Grouping Schemes

Several baselines are contrasted:

  • BiLLM (intra-row global): Employs a single split per row, independent of frequency.
  • ARB-LLM$_X$ (channel-wise): Groups along columns with uniform, value-agnostic partitions.
  • Mixture-of-Scales, OneBitGPT: Use data-driven groupings on raw (time-domain) weights, lacking frequency selectivity.

HBLLM’s method, by applying localized Haar transforms and optimizing two independent splits per row (one per band), produces four intra-row subgroups and increases the cardinality of the inverse quantization set (CIQ) from $\leq 10$ (ARB-LLM$_X$) to up to 1024, directly improving representational fidelity under 1-bit quantization (see Sec. 3.1, Appendix B/C in (Chen et al., 30 Nov 2025)).

| Method | #Subgroups/Row | Frequency-Aware | CIQ Upper Bound |
|---|---|---|---|
| BiLLM | 2 | No | 8 |
| ARB-LLM$_X$ | Block-wise | No | 10 |
| HBLLM (proposed) | 4 | Yes | 1024 |

6. Empirical and Theoretical Impact on Quantization Fidelity

HBLLM’s frequency-aware multi-parameter intra-row grouping achieves substantial practical gains:

  • Ablation (Table 3b): Switching to frequency-aware grouping reduces LLaMA2-7B perplexity from $16.32 \to 11.08$ (Wiki2) and $1990 \to 95.58$ (PTB).
  • Final Model Results (Table 1): HBLLM-row reports 7.62/6.68/34.94 perplexities on C4/Wiki2/PTB for LLaMA1-13B, versus 13.93/14.99/69.75 for BiLLM.
  • Relative Distance to FP16 (Fig. 1): HBLLM reduces the gap to full precision by 33–66% compared to previous 1-bit methods.

The theoretical CIQ bound for HBLLM reaches 1024, versus 8 (BiLLM) and 10 (ARB-LLM$_X$) (Chen et al., 30 Nov 2025).

7. Figures, Tables, and Implementation Reference

  • Fig. 2: Shows the pipeline including FillAvg handling, Haar transform, and differentiation of salient/non-salient paths.
  • Eq. 3–4 (Sec. 3.3): Provide the explicit formulations for the Haar transform and sign-based quantization.
  • Table 3(b): Details the ablation comparing grouping granularities.
  • Table 1: Summarizes perplexity and accuracy of HBLLM against baselines.
  • Sec. 3.1, Appendix B/C: Discuss CIQ analysis and the theoretical expressiveness implications.

Taken together, frequency-aware multi-parameter intra-row grouping significantly enriches the discrete quantization set, achieving $O(N)$ runtime in the number of weights $N$ and negligible storage cost. It provides marked improvements in quantization error and downstream perplexity relative to both global intra-row and channel-wise quantization schemes (Chen et al., 30 Nov 2025).

