Querybank Normalisation (QB-Norm)
- Querybank Normalisation (QB-Norm) is a framework that adjusts similarity scores to mitigate hubness in high-dimensional embedding spaces.
- It employs a fixed probe query bank and techniques like Dynamic Inverted Softmax for robust cross-modal retrieval.
- QB-Norm enables real-time similarity adjustments without test query access or retraining, enhancing retrieval performance.
Querybank Normalisation (QB-Norm) is a framework for mitigating hubness in high-dimensional joint embedding spaces, specifically enhancing cross-modal retrieval systems. Hubness refers to the phenomenon where a small subset of gallery embeddings emerges disproportionately often as nearest neighbours to numerous queries, distorting retrieval quality. QB-Norm introduces a robust, training-free similarity normalisation regime that leverages a fixed bank of probe queries, enabling real-time adjustments to query-gallery similarities and reducing hubness without requiring access to test queries or retraining.
1. Background: Cross-Modal Retrieval and Hubness
Cross-modal retrieval involves searching for items in a gallery of one modality (e.g., images, videos, audio) given queries from a different modality (e.g., textual descriptions). Typical approaches employ learned encoders f_q and f_g, embedding both queries and gallery items into a shared d-dimensional space, where similarity is usually measured by cosine similarity.
A persistent challenge in high-dimensional spaces is hubness (the "hubness problem") [Radovanović et al., 2010], where certain gallery embeddings ("hubs") appear excessively often in the k-nearest-neighbour lists across many queries. Formally, the k-occurrence count for a gallery item g_j is

N_k(g_j) = |{ q : g_j ∈ NN_k(q) }|,

where NN_k(q) is the set of k nearest gallery neighbours to query q. The skewed distribution of N_k produces retrieval bias, with hubs dominating rankings and relevant items receiving less exposure.
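To make the definition concrete, the sketch below estimates k-occurrence counts and their skewness with NumPy. The function name, random embeddings, and sizes are illustrative assumptions, not artefacts of the original implementation.

```python
import numpy as np

def k_occurrence(queries, gallery, k=10):
    """Count, for each gallery item, how often it appears in the
    k-nearest-neighbour lists of the queries (cosine similarity)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T                            # (num_queries, |G|)
    topk = np.argsort(-sims, axis=1)[:, :k]   # indices of k nearest gallery items
    return np.bincount(topk.ravel(), minlength=gallery.shape[0])

rng = np.random.default_rng(0)
queries = rng.normal(size=(500, 256))
gallery = rng.normal(size=(200, 256))
N_k = k_occurrence(queries, gallery, k=10)
# Positive skewness of N_k indicates the presence of hubs.
skew = ((N_k - N_k.mean()) ** 3).mean() / (N_k.std() ** 3 + 1e-12)
```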
Conventional mitigations, such as Globally-Corrected Retrieval (GC), Cross-Domain Similarity Local Scaling (CSLS), and Inverted Softmax (IS), re-normalise similarity scores based on a "probe" query bank. However, these approaches typically require access to the whole test set of queries or are sensitive to the domain of probe queries, often failing in online settings or under substantial domain gap.
2. QB-Norm Framework: Construction and Operation
QB-Norm is a non-parametric, inference-time method that adjusts similarities to down-weight hubs. It comprises an offline stage (probe bank construction and precomputation) and an online stage (per-query normalisation):
2.1 Probe Querybank Construction
A set B = {b_1, …, b_N} of probe queries is selected from the query modality, typically sampled from the training set. The size N balances memory constraints against fidelity of hubness estimation; empirical choices range from 1K to 60K.
2.2 Similarity Probe Matrix Formation
For each gallery item g_j, the similarity to each probe query b_i is precomputed:

P_ij = sim(f_q(b_i), f_g(g_j)),

yielding a matrix P ∈ ℝ^(N×|G|).
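A minimal sketch of this precomputation, assuming cosine similarity and stand-in random arrays in place of real encoder outputs f_q(b_i) and f_g(g_j):

```python
import numpy as np

def cosine(a, b):
    """Row-normalise both matrices and return pairwise cosine similarities."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(1)
probe_embeds = rng.normal(size=(1000, 64))    # stand-in for f_q(b_i), i = 1..N
gallery_embeds = rng.normal(size=(300, 64))   # stand-in for f_g(g_j), j = 1..|G|
P = cosine(probe_embeds, gallery_embeds)      # P[i, j] = sim(f_q(b_i), f_g(g_j))
```

This matrix is computed once and reused for every incoming query.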
2.3 Raw Similarity Computation
For any incoming query q, form raw similarities

s_q(j) = sim(f_q(q), f_g(g_j)), for j = 1, …, |G|.
2.4 Normalisation
Define QB-Norm as a function mapping (s_q, P) to adjusted similarities η_q = QB-Norm(s_q, P) ∈ ℝ^(|G|). This function can be instantiated as GC, CSLS, IS, or the novel Dynamic Inverted Softmax (DIS).
The ranking algorithm, using QB-Norm, is summarised as:
```
Input: gallery G, querybank B, precomputed probe matrix P
For each incoming query q:
    s_q ← [sim(f_q(q), f_g(g_j)) for j in 1...|G|]   # O(|G|)
    η_q ← QB-Norm(s_q, P)                            # O(|G|) or O(|G| log N)
    return argsort_desc(η_q)                         # ranked retrievals
```
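The ranking loop above can be sketched as a small NumPy routine with a pluggable normalisation operator; `qb_norm_rank` and `identity` are hypothetical names, and the identity operator simply recovers baseline cosine ranking:

```python
import numpy as np

def qb_norm_rank(query_embed, gallery_embeds, P, normalise):
    """Rank gallery items for one query using a pluggable QB-Norm operator.

    normalise(s_q, P) -> adjusted scores; passing `identity` recovers
    plain cosine-similarity ranking."""
    q = query_embed / np.linalg.norm(query_embed)
    g = gallery_embeds / np.linalg.norm(gallery_embeds, axis=1, keepdims=True)
    s_q = g @ q              # raw similarities, O(|G|)
    eta = normalise(s_q, P)  # QB-Norm adjustment
    return np.argsort(-eta)  # ranked retrieval indices, best first

identity = lambda s, P: s    # baseline: no normalisation
```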
3. Formal Definitions: Methods in QB-Norm
QB-Norm accommodates several similarity normalisation paradigms:
| Method | Normalisation Formula | Domain Sensitivity |
|---|---|---|
| Globally-Corrected | η_j = −rank(s_j among {P_ij}, i = 1…N) | High |
| CSLS | η_j = 2·s_j − μ_k(g_j); μ_k(g_j): average of top-k probe similarities to g_j | High |
| Inverted Softmax | η_j = exp(β·s_j) / Σ_i exp(β·P_ij) | High |
| Dynamic IS | Apply IS only when top retrieval is a hub candidate (see below); else use the raw s_q | Low |
3.1 Globally-Corrected Retrieval
Ranks each raw similarity s_j of a gallery item g_j against that item's probe similarities {P_ij} and adjusts accordingly: items beaten by many probes are demoted. Sensitive to probe bank choice.
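A sketch of the rank-based idea, under the assumption that an item's score is minus the number of probe queries whose similarity to that item exceeds the raw query similarity (`gc_scores` is a hypothetical helper name):

```python
import numpy as np

def gc_scores(s_q, P):
    """Rank-based GC sketch: for each gallery item j, count the probe
    queries with P[i, j] > s_q[j]; fewer such probes means a better
    rank, so the negated count serves as the adjusted score."""
    ranks = (P > s_q[None, :]).sum(axis=0)
    return -ranks.astype(float)
```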
3.2 CSLS
Subtracts local averages of similarities (the mean of each gallery item's top-k probe similarities), intended to address biases arising from density variations in embedding space.
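One common instantiation of this adjustment, sketched in NumPy under the assumption that the local density term is the mean of each gallery item's top-k probe similarities:

```python
import numpy as np

def csls(s_q, P, k=10):
    """CSLS-style adjustment: double the raw similarity, then subtract
    each gallery item's local probe density mu_k(g_j)."""
    topk = np.sort(P, axis=0)[-k:, :]   # k largest probe sims per gallery item
    mu = topk.mean(axis=0)              # mu_k(g_j)
    return 2.0 * s_q - mu
```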
3.3 Inverted Softmax
Applies softmax normalisation per gallery item with respect to probe queries; regulated by an inverse temperature β.
3.4 Dynamic Inverted Softmax (DIS)
Identifies a hub candidate set H, the gallery items that appear among the top-k retrievals of at least one probe query:

H = { g_j : ∃ i such that g_j ∈ NN_k(b_i) }.

DIS then applies:

η_q = IS(s_q, P) if argmax_j s_q(j) ∈ H, and η_q = s_q otherwise.
This selective normalisation reduces risk of performance degradation under poor probe banks.
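The two operators can be sketched together as follows; the function names are illustrative, and β = 20 and k = 1 are the hedged defaults mentioned in Section 4 rather than universally optimal settings:

```python
import numpy as np

def inverted_softmax(s_q, P, beta=20.0):
    """IS: divide exp(beta * s_j) by the item's probe mass Z_j."""
    Z = np.exp(beta * P).sum(axis=0)   # Z_j = sum_i exp(beta * P_ij)
    return np.exp(beta * s_q) / Z

def hub_candidates(P, k=1):
    """Gallery items retrieved in the top-k results of any probe query."""
    topk = np.argsort(-P, axis=1)[:, :k]
    return set(topk.ravel().tolist())

def dynamic_inverted_softmax(s_q, P, beta=20.0, k=1, hubs=None):
    """DIS: apply IS only if the raw top retrieval is a hub candidate."""
    if hubs is None:
        hubs = hub_candidates(P, k)
    if int(np.argmax(s_q)) in hubs:
        return inverted_softmax(s_q, P, beta)
    return s_q                          # otherwise leave scores untouched
```

In practice `hub_candidates` and the normalisers Z are computed once offline, so the per-query branch costs only an argmax and a set lookup.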
4. Algorithmic Implementation Details
Key implementation steps:
- Precompute B (probe queries) and P (similarity matrix).
- For IS/DIS, also precompute the normaliser Z_j = Σ_i exp(β·P_ij) for each gallery item g_j, and the hub candidate set H.
- For each query q, calculate s_q, find j* = argmax_j s_q(j), and apply IS only if g_{j*} ∈ H.
Efficient computation is possible:
- Similarity calculation and normalisation are O(|G|) per query, given the precomputed normalisers.
- Use of approximate -NN libraries (e.g. FAISS) is suggested for large-scale galleries.
Hyperparameters include probe bank size N (often 5,000–20,000), inverse temperature β (e.g., β = 20 for cosine similarity), and hub candidate selection parameter k (typically 1). Both P and the normalisers Z are stored in float32 arrays, requiring O(N·|G|) memory.
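As a quick worked example of the memory footprint, assuming an illustrative probe bank of N = 20,000 and a gallery of |G| = 1,000 (these sizes are assumptions, not figures from the paper):

```python
# Rough memory estimate for the precomputed arrays (float32 = 4 bytes).
N, G = 20_000, 1_000
P_bytes = N * G * 4         # probe similarity matrix P: N x |G| floats
Z_bytes = G * 4             # per-item IS normalisers Z_j: |G| floats
print(P_bytes / 1e6, "MB")  # 80.0 MB
```

The probe matrix dominates; the per-item normalisers are negligible by comparison.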
5. Integration in Cross-Modal Retrieval Systems
QB-Norm is attached as a post-processing wrapper around frozen joint-embedding retrieval models. It substitutes direct ranking (e.g., by cosine similarity) with the adjusted scores from its normalisation operator. The procedure requires only access to the training set to construct the probe bank, avoiding the impractical need for test queries and allowing single-query inference.
This broad compatibility enables rapid deployment and testing across various encoder architectures, as no retraining or additional model fitting is necessary.
6. Empirical Evidence: Hubness Reduction and Performance Gains
Empirical evaluation demonstrates substantial hubness reduction, evidenced by decreased skewness in k-occurrence distributions (e.g., 0.94 to 0.51 on MSR-VTT). Using a probe bank from the training set delivers retrieval accuracy comparable to that attained with a test-set probe bank (e.g., R@1=17.3 vs. 17.5; baseline 14.9). Dynamic IS outperforms alternative normalisation strategies in robustness, maintaining or improving baseline performance even under adverse or out-of-domain probe banks.
Observed gains include:
- On MSR-VTT 1k-A split, TT-CE⁺: R@1 29.6→33.3 (+3.7), R@5 61.6→63.7.
- CLIP2Video: R@1 45.6→47.2 (+1.6).
- Improvements of 1–5 points in R@1 across seven video-text datasets.
- Text-image (MSCOCO): CLIP R@1 increases from 34.8 to 37.8 (5k eval setting), with OSCAR and VinVL models seeing 1–2 point improvements.
- Text-audio retrieval (AudioCaps): R@1 up by ∼1 point.
- Minor, consistent gains on image–image datasets (CUB, SOP).
A plausible implication is that QB-Norm generalises effectively across modalities and architectures, particularly when hubness is pronounced.
7. Practical Implementation Notes
Similarity computations can be batched and efficiently executed on GPU hardware. Approximate nearest neighbour routines are recommended for scaling to large gallery sizes. Hyperparameter tuning for β is performed once on a validation set. Probe bank sizes beyond 20K yield diminishing returns, though larger banks may further reduce hubness at moderate memory cost.
Storing the probe matrix P and normalisers Z imposes an O(N·|G|) memory burden. No retraining or fine-tuning of joint-embedding networks is required for QB-Norm integration.
8. Concluding Remarks
QB-Norm constitutes a simple, computationally efficient, and robust strategy for correcting hubness in joint embedding spaces, offering broad and consistent improvements in cross-modal retrieval tasks. Its dynamic normalisation mechanism—especially Dynamic Inverted Softmax—renders it resilient to probe bank quality and domain variance, facilitating practical deployment without retraining or concurrent query access (Bogolin et al., 2021).