Frequency-Aware Multi-Parameter Intra-Row Grouping
- The paper demonstrates how frequency decomposition with the Haar transform and adaptive thresholding minimizes reconstruction error and improves model perplexity.
- It employs intra-row grouping to partition weight matrices into four subgroups, thus enhancing the capacity of 1-bit quantization while incurring negligible storage overhead.
- The approach increases quantization fidelity by expanding the discrete inverse quantization set from single-digit limits to up to 1024 levels.
Frequency-aware multi-parameter intra-row grouping is a structure-aware quantization strategy introduced in HBLLM, a wavelet-based high-fidelity 1-bit quantization method for LLMs. This method combines frequency decomposition via the Haar transform with adaptive, band-specific grouping and aggregation to increase the capacity and accuracy of ultra-low-bit quantization while incurring negligible storage overhead (Chen et al., 30 Nov 2025).
1. Formal Definition and Notation
Given a full-precision weight matrix $\mathbf{W} \in \mathbb{R}^{m \times n}$ of a linear layer, the method operates row-wise. For a selected row $\mathbf{w} \in \mathbb{R}^{n}$, a one-dimensional Haar wavelet transform is applied to obtain the spectral coefficients:

$$\mathbf{h} = \mathrm{Haar}(\mathbf{w}) \in \mathbb{R}^{n}.$$
The coefficient vector $\mathbf{h}$ is partitioned into low- and high-frequency components:

$$\mathbf{h} = \big[\mathbf{h}^{(L)};\ \mathbf{h}^{(H)}\big],$$

with $\mathbf{h}^{(L)} \in \mathbb{R}^{n/2}$ and $\mathbf{h}^{(H)} \in \mathbb{R}^{n/2}$. Each frequency band is independently partitioned into two groups (dense/sparse) via a threshold selected from a discrete candidate set, yielding four final subgroups per row. Thresholds are set per row and per band by minimizing local quantization error, reflecting diverse spectral patterns across rows and bands.
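The row-level band split can be sketched with a one-level Haar transform. This is a minimal illustration; the function names and the $1/\sqrt{2}$ normalization are assumptions, not the paper's exact implementation:

```python
import numpy as np

def haar_1d(w):
    """One-level Haar transform of a row: returns (low, high) frequency bands."""
    w = np.asarray(w, dtype=float)
    pairs = w.reshape(-1, 2)                         # adjacent-sample pairs
    low = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # approximation coefficients
    high = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # detail coefficients
    return low, high

def inverse_haar_1d(low, high):
    """Exact inverse of haar_1d."""
    a = (low + high) / np.sqrt(2)
    b = (low - high) / np.sqrt(2)
    return np.stack([a, b], axis=1).ravel()
```

A round trip `inverse_haar_1d(*haar_1d(w))` recovers `w` up to floating-point error, so all quantization error comes from the group-wise 1-bit codes, not the transform itself.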
2. Mathematical Formulation and Quantization Workflow
Within each frequency band $f \in \{L, H\}$, the absolute values of the coefficients are sorted, and a set of $C$ candidate percentiles is chosen. For each candidate $p$, the threshold $t_p$ is defined as the $p$-th percentile of $\{|h_i^{(f)}|\}$, forming two groups:

$$G_1 = \{\, i : |h_i^{(f)}| \le t_p \,\}, \qquad G_2 = \{\, i : |h_i^{(f)}| > t_p \,\}.$$
Group-wise means are computed:

$$\mu_g = \frac{1}{|G_g|} \sum_{i \in G_g} h_i^{(f)}, \qquad g \in \{1, 2\}.$$
A single row-wise scale $\alpha$ (or optionally one per group) is used, and 1-bit quantization is applied:

$$b_i = \alpha \,\operatorname{sign}\!\big(h_i^{(f)} - \mu_g\big), \qquad i \in G_g.$$
The optimal threshold $t^{*}$ is selected to minimize the reconstruction error

$$E(t_p) = \sum_{g \in \{1,2\}} \sum_{i \in G_g} \big(h_i^{(f)} - b_i - \mu_g\big)^{2},$$

aggregating within-group squared errors. The final grouping is $(G_1^{*}, G_2^{*})$ with means $(\mu_1^{*}, \mu_2^{*})$.
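A compact NumPy sketch of this per-band search (the percentile grid, the helper name, and the row-wise choice of $\alpha$ are illustrative assumptions):

```python
import numpy as np

def quantize_band(h, candidates=(25, 50, 75, 100), alpha=None):
    """Pick the percentile threshold minimizing band reconstruction error,
    then 1-bit quantize each group around its own mean."""
    h = np.asarray(h, dtype=float)
    if alpha is None:
        alpha = np.abs(h).mean()          # simple row-wise scale (assumption)
    best_err, best_recon = np.inf, None
    for p in candidates:
        t = np.percentile(np.abs(h), p)
        groups = (np.abs(h) <= t, np.abs(h) > t)   # dense / sparse split
        recon = np.empty_like(h)
        for g in groups:
            if g.any():
                mu = h[g].mean()
                recon[g] = alpha * np.sign(h[g] - mu) + mu  # dequantized value
        err = np.sum((h - recon) ** 2)
        if err < best_err:
            best_err, best_recon = err, recon
    return best_recon
```

Because the candidate set includes $p = 100$ (everything in one group), the two-group split can only match or reduce the single-group reconstruction error.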
To reduce mean-storage overhead, HBLLM can employ mean sharing within each band:

$$\mu_{\mathrm{shared}} = \frac{|G_1^{*}|\, \mu_1^{*} + |G_2^{*}|\, \mu_2^{*}}{|G_1^{*}| + |G_2^{*}|}.$$
This reduces per-weight storage by approximately $0.25$ bits/weight without negatively affecting, and sometimes slightly improving, perplexity (see Table 3c in (Chen et al., 30 Nov 2025)).
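The shared mean is simply the size-weighted average of the two group means; a trivial helper (the name is illustrative) makes the bookkeeping explicit:

```python
def shared_mean(mu1, n1, mu2, n2):
    """Size-weighted mean shared by both groups of a band, so only one
    float per band needs to be stored instead of two."""
    return (n1 * mu1 + n2 * mu2) / (n1 + n2)
```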
3. Algorithm Workflow and Pseudocode
A single-row grouping and quantization pass is implemented via the following routine, omitting salient columns (which are handled separately):
```python
def RowFreqGroup(w, C, share_means=True):
    # 1. Haar transform: h splits into bands h^(L) and h^(H)
    h = Haar1D(w)
    for f in (L, H):                          # two frequency bands
        h_f = h[f]
        c_abs = sorted(abs(h_f))
        best_err = float('inf')
        # 2. enumerate C candidate percentile thresholds
        for p in linspace(0, 100, C):
            t = percentile(c_abs, p)
            G1 = [i for i in range(len(h_f)) if abs(h_f[i]) <= t]
            G2 = [i for i in range(len(h_f)) if abs(h_f[i]) > t]
            mu1 = mean([h_f[i] for i in G1])
            mu2 = mean([h_f[i] for i in G2])
            # alpha can be precomputed (row or group); assume row-wise
            B1 = {i: alpha * sign(h_f[i] - mu1) for i in G1}
            B2 = {i: alpha * sign(h_f[i] - mu2) for i in G2}
            err = sum((h_f[i] - B1[i] - mu1)**2 for i in G1) + \
                  sum((h_f[i] - B2[i] - mu2)**2 for i in G2)
            if err < best_err:
                best_err, best_groups = err, (G1, G2, mu1, mu2)
        G1_star, G2_star, mu1_star, mu2_star = best_groups
        # 3. optional mean sharing within the band
        if share_means:
            mu_shared = (len(G1_star) * mu1_star + len(G2_star) * mu2_star) \
                        / (len(G1_star) + len(G2_star))
            mu1_star = mu2_star = mu_shared
        # 4. store the 1-bit codes for both groups
        for i in G1_star:
            h_f_B[i] = alpha * sign(h_f[i] - mu1_star)
        for i in G2_star:
            h_f_B[i] = alpha * sign(h_f[i] - mu2_star)
    return inverse_Haar([h_L_B, h_H_B])
```
In practice, vectorized implementations batch multiple rows, and salient columns are skipped or assigned via FillAvg prior to the Haar step (see Algorithm 1 and Fig. 2 in (Chen et al., 30 Nov 2025)).
4. Computational and Storage Complexity
The overall complexity per row is $O(n \log n + Cn)$, dominated by the $O(Cn)$ scan over the $C$ candidate thresholds in each frequency band ($C$ is a small constant):
- Haar 1D transform per row: $O(n)$
- Threshold enumeration: $O(Cn)$ per row, i.e., $O(mCn)$ for $m$ rows
The storage overhead per row is minimal, requiring either $4$ floats/row (two means per band) or $2$ floats/row (one shared mean per band, when mean sharing is used). For typical row lengths $n$, the extra per-weight storage is negligible. Table 1 reports an average bit-width of $1.09$ for the HBLLM-row scheme and $1.00$ for HBLLM-col (Chen et al., 30 Nov 2025).
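The amortization of per-row means is simple arithmetic; assuming fp16 storage for the means (an assumption for illustration):

```python
def mean_overhead_bits_per_weight(n_cols, means_per_row, bits_per_mean=16):
    """Per-weight storage overhead from group means amortized over a row."""
    return means_per_row * bits_per_mean / n_cols

# For a 4096-wide row: four means cost 4*16/4096 = 0.015625 bits/weight;
# mean sharing halves this to 0.0078125 bits/weight.
```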
5. Comparative Analysis with Prior Grouping Schemes
Several baselines are contrasted:
- BiLLM (intra-row global): Employs a single split per row, independent of frequency.
- ARB-LLM (channel-wise): Groups along columns with uniform, value-agnostic partitions.
- Mixture-of-Scales, OneBitGPT: Use data-driven groupings on raw (time-domain) weights, lacking frequency selectivity.
HBLLM’s method, by applying localized Haar transforms and optimizing two independent splits per row (one per band), produces four intra-row subgroups and increases the cardinality of the inverse quantization set (CIQ) from $10$ (ARB-LLM) to up to $1024$, directly improving representation fidelity under 1-bit quantization (see Sec. 3.1, Appendix B/C in (Chen et al., 30 Nov 2025)).
| Method | #Subgroups/Row | Frequencies Used | CIQ Upper Bound |
|---|---|---|---|
| BiLLM | 2 | No | 8 |
| ARB-LLM | Block-wise | No | 10 |
| HBLLM (proposed) | 4 | Yes | 1024 |
6. Empirical and Theoretical Impact on Quantization Fidelity
HBLLM’s frequency-aware multi-parameter intra-row grouping achieves substantial practical gains:
- Ablation (Table 3b): Switching to frequency-aware grouping lowers LLaMA2-7B perplexity on both Wiki2 and PTB.
- Final Model Results (Table 1): HBLLM-row reports $7.62/6.68/34.94$ perplexities on C4/Wiki2/PTB for LLaMA1-13B, versus $13.93/14.99/69.75$ for BiLLM.
- Relative Distance to FP16 (Fig. 1): HBLLM narrows the gap to full precision by at least $33\%$ compared to previous 1-bit methods.
The theoretical CIQ bound for HBLLM reaches $1024$, versus $8$ (BiLLM) and $10$ (ARB-LLM) (Chen et al., 30 Nov 2025).
7. Figures, Tables, and Implementation Reference
- Fig. 2: Shows the pipeline including FillAvg handling, Haar transform, and differentiation of salient/non-salient paths.
- Eq. 3–4 (Sec. 3.3): Provide the explicit formulations for the Haar transform and sign-based quantization.
- Table 3(b): Details the ablation comparing grouping granularities.
- Table 1: Summarizes perplexity and accuracy of HBLLM against baselines.
- Sec. 3.1, Appendix B/C: Discuss CIQ analysis and the theoretical expressiveness implications.
Taken collectively, frequency-aware multi-parameter intra-row grouping significantly enriches the discrete quantization set at negligible storage cost and modest runtime. It provides marked improvements in quantization error and downstream perplexity relative to both global intra-row and channel-wise quantization schemes (Chen et al., 30 Nov 2025).