Querybank Normalisation (QB-Norm)

Updated 31 January 2026
  • Querybank Normalisation (QB-Norm) is a framework that adjusts similarity scores to mitigate hubness in high-dimensional embedding spaces.
  • It employs a fixed probe query bank and techniques like Dynamic Inverted Softmax for robust cross-modal retrieval.
  • QB-Norm enables real-time similarity adjustments without test query access or retraining, enhancing retrieval performance.

Querybank Normalisation (QB-Norm) is a framework for mitigating hubness in high-dimensional joint embedding spaces, specifically enhancing cross-modal retrieval systems. Hubness refers to the phenomenon where a small subset of gallery embeddings emerges disproportionately often as nearest neighbours to numerous queries, distorting retrieval quality. QB-Norm introduces a robust, training-free similarity normalisation regime that leverages a fixed bank of probe queries, enabling real-time adjustments to query-gallery similarities and reducing hubness without requiring access to test queries or retraining.

1. Background: Cross-Modal Retrieval and Hubness

Cross-modal retrieval involves searching for items in a gallery $G$ of modality $m_g$ (e.g., images, videos, audio) given queries $q$ from a different modality $m_q$ (e.g., textual descriptions). Typical approaches employ learned encoders $f_q: m_q \rightarrow \mathbb{R}^C$ and $f_g: m_g \rightarrow \mathbb{R}^C$, embedding both queries and gallery items into a shared $C$-dimensional space, where similarity is usually measured by cosine similarity.

A persistent challenge in high-dimensional spaces is hubness ("hubness problem") [Radovanović et al., 2010], where certain gallery embeddings ("hubs") appear excessively often in the $k$-nearest neighbour lists across many queries. Formally, the $k$-occurrence count for a gallery item $x$ is

$$N_k(x) = \sum_{i=1}^{|Q|} \mathbf{1}\left[x \in \mathrm{NN}_k(q_i)\right],$$

where $\mathrm{NN}_k(q_i)$ is the set of $k$ nearest gallery neighbours to query $q_i$. The skewed distribution of $N_k$ produces retrieval bias, with hubs dominating rankings and relevant items receiving less exposure.
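Hubness can be quantified directly from a query–gallery similarity matrix by computing the $k$-occurrence counts and their skewness. A minimal numpy sketch using random similarities as stand-ins (the function names are illustrative, not from the paper):

```python
import numpy as np

def k_occurrence(sim, k=10):
    """N_k(x) for every gallery item: how often it appears among the
    k nearest gallery neighbours of the queries.
    sim: |Q| x |G| matrix of query-gallery similarities."""
    topk = np.argsort(-sim, axis=1)[:, :k]          # k best gallery ids per query
    return np.bincount(topk.ravel(), minlength=sim.shape[1])

def skewness(x):
    """Sample skewness of the N_k distribution; larger values mean
    a few items ("hubs") absorb most nearest-neighbour slots."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (x.std() ** 3 + 1e-12)

rng = np.random.default_rng(0)
sim = rng.standard_normal((1000, 200))              # toy stand-in similarities
Nk = k_occurrence(sim, k=10)                        # Nk.sum() == 1000 * 10
```

In real high-dimensional embedding spaces the empirical $N_k$ distribution is typically right-skewed, which is what QB-Norm aims to correct.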

Conventional mitigations, such as Globally-Corrected Retrieval (GC), Cross-Domain Similarity Local Scaling (CSLS), and Inverted Softmax (IS), re-normalise similarity scores based on a "probe" query bank. However, these approaches typically require access to the whole test set of queries or are sensitive to the domain of probe queries, often failing in online settings or under substantial domain gap.

2. QB-Norm Framework: Construction and Operation

QB-Norm is a non-parametric, inference-time method that adjusts similarities to down-weight hubs. It comprises two main stages:

2.1 Probe Querybank Construction

A set of $N$ probe queries $B = \{b_1, \ldots, b_N\}$ is selected from the query modality, typically sampled from the training set. The size $N$ balances memory constraints against fidelity of hubness estimation; empirical choices range from 1K to 60K.

2.2 Similarity Probe Matrix Formation

For each gallery item $g_j$, the similarity to each probe is precomputed:

$$P(j, i) = \operatorname{sim}(f_q(b_i), f_g(g_j)),$$

yielding a $|G| \times N$ matrix $P$.
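With cosine similarity and L2-normalised embeddings, the probe matrix is a single matrix product. A small numpy sketch in which random embeddings stand in for the encoder outputs $f_q(b_i)$ and $f_g(g_j)$:

```python
import numpy as np

def l2_normalise(x):
    """Scale rows to unit norm so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
probe_emb = l2_normalise(rng.standard_normal((500, 64)))    # N=500 probe queries
gallery_emb = l2_normalise(rng.standard_normal((200, 64)))  # |G|=200 gallery items

# P(j, i) = cosine similarity between gallery item j and probe query i
P = gallery_emb @ probe_emb.T                               # |G| x N matrix
```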

2.3 Raw Similarity Computation

For any incoming query $q$, form the raw similarities

$$s_q(j) = \operatorname{sim}(f_q(q), f_g(g_j)), \quad j = 1, \ldots, |G|.$$

2.4 Normalisation

Define QB-Norm as a function $\text{QB-Norm}: \mathbb{R}^{|G|} \times \mathbb{R}^{|G| \times N} \to \mathbb{R}^{|G|}$, producing adjusted similarities $\eta_q = \text{QB-Norm}(s_q, P)$. This function can be instantiated as GC, CSLS, IS, or the novel Dynamic Inverted Softmax (DIS).

The ranking algorithm, using QB-Norm, is summarised as:

Input: gallery G, querybank B, precomputed probe matrix P
For each incoming query q:
    s_q ← [sim(f_q(q), f_g(g_j)) for j in 1...|G|]      # O(|G|)
    η_q ← QB-Norm(s_q, P)                               # O(|G|) or O(|G| log N)
    return argsort_desc(η_q)                            # ranked retrievals
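The loop above can be fleshed out as a runnable numpy sketch, here instantiating QB-Norm with plain Inverted Softmax; the encoders are stubbed with precomputed unit-norm embeddings, and the names are illustrative:

```python
import numpy as np

def qb_norm_is(s_q, P, beta=20.0):
    """Inverted Softmax: eta_q(j) = exp(beta*s_q(j)) / sum_i exp(beta*P(j,i))."""
    denom = np.exp(beta * P).sum(axis=1)     # precomputable once per gallery
    return np.exp(beta * s_q) / denom

def rank(query_emb, gallery_emb, P, beta=20.0):
    s_q = gallery_emb @ query_emb            # raw cosine similarities, O(|G|)
    eta_q = qb_norm_is(s_q, P, beta)         # adjusted similarities
    return np.argsort(-eta_q)                # ranked gallery indices

rng = np.random.default_rng(1)
gallery = rng.standard_normal((50, 32))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
probes = rng.standard_normal((100, 32))
probes /= np.linalg.norm(probes, axis=1, keepdims=True)
P = gallery @ probes.T                       # probe matrix, built offline

query = gallery[7] + 0.05 * rng.standard_normal(32)   # query near item 7
query /= np.linalg.norm(query)
ranking = rank(query, gallery, P)
```

In production the denominator vector would be computed once offline rather than inside every call, as Section 4 notes.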

3. Formal Definitions: Methods in QB-Norm

QB-Norm accommodates several similarity normalisation paradigms:

| Method | Normalisation formula | Domain sensitivity |
| --- | --- | --- |
| Globally-Corrected | $\eta_q(j) = -(\mathrm{Rank}(s_q(j); p_j) - s_q(j))$, with $p_j = P(j, \cdot)$ | High |
| CSLS | $\eta_q(j) = 2 s_q(j) - r_q - r_{g_j}$; $r_q$, $r_{g_j}$: averages of top-$K$ similarities | High |
| Inverted Softmax | $\eta_q(j) = \exp(\beta s_q(j)) / \sum_{i=1}^N \exp(\beta p_j(i))$, with $\beta > 0$ | High |
| Dynamic IS | Apply IS only when the top retrieval is a hub candidate (see below); else use $s_q(j)$ | Low |

3.1 Globally-Corrected Retrieval

Ranks each raw similarity of a gallery item against its probe similarities and adjusts accordingly. Sensitive to probe bank choice.
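A numpy sketch of GC under one common rank convention (an assumption here): the rank of $s_q(j)$ within the row $p_j$ counts how many probe similarities match or exceed it.

```python
import numpy as np

def qb_norm_gc(s_q, P):
    """Globally-Corrected retrieval: eta_q(j) = -(Rank(s_q(j); p_j) - s_q(j)).
    Hubs have many large probe similarities, so a given raw score ranks
    low within their row p_j and they are penalised."""
    rank = (P >= s_q[:, None]).sum(axis=1) + 1    # rank of s_q(j) in row p_j
    return -(rank - s_q)

# Toy example: item 1 is a hub (high similarity to every probe).
P = np.array([[0.1, 0.2],      # item 0: low probe similarities
              [0.9, 0.95]])    # item 1: hub
s_q = np.array([0.5, 0.5])     # identical raw scores for both items
eta = qb_norm_gc(s_q, P)       # item 0 now outranks the hub
```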

3.2 CSLS

Subtracts local averages of similarities, intended to address biases arising from density variations in embedding space.
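A numpy sketch of CSLS under the convention assumed here ($r_{g_j}$: mean of gallery item $j$'s top-$K$ probe similarities; $r_q$: mean of the query's top-$K$ gallery similarities):

```python
import numpy as np

def qb_norm_csls(s_q, P, K=2):
    """CSLS: eta_q(j) = 2*s_q(j) - r_q - r_{g_j}. Since r_q is identical for
    every gallery item, only the r_{g_j} term changes the ranking."""
    r_g = np.sort(P, axis=1)[:, -K:].mean(axis=1)   # local density around g_j
    r_q = np.sort(s_q)[-K:].mean()                  # local density around q
    return 2.0 * s_q - r_q - r_g

P = np.array([[0.0, 0.1, 0.2],     # item 0: sparse neighbourhood
              [0.8, 0.9, 1.0]])    # item 1: dense neighbourhood (hub-like)
s_q = np.array([0.5, 0.5])
eta = qb_norm_csls(s_q, P, K=2)    # item 0 now outranks item 1
```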

3.3 Inverted Softmax

Applies softmax normalisation per gallery item with respect to probe queries; regulated by inverse temperature $\beta$.

3.4 Dynamic Inverted Softmax (DIS)

Identifies a hub candidate set $\mathcal{A}$, containing the gallery items that appear as a top-$k$ retrieval for at least one probe query:

$$\mathcal{A} = \{\, j : j \in \text{Top}_k(P(\cdot, i)) \text{ for some probe } b_i \,\}, \quad k = 1 \text{ in practice}.$$

DIS then applies:

$$\eta_q(j) = \begin{cases} \dfrac{\exp(\beta s_q(j))}{\sum_{i=1}^N \exp(\beta p_j(i))}, & \text{if } \arg\max_\ell s_q(\ell) \in \mathcal{A} \\ s_q(j), & \text{otherwise} \end{cases}$$

This selective normalisation reduces risk of performance degradation under poor probe banks.
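The two DIS ingredients — the precomputed denominators $D$ and the hub candidate set $\mathcal{A}$ — and the gated normalisation can be sketched as follows (a minimal numpy illustration, not the reference implementation):

```python
import numpy as np

def precompute_dis(P, beta=20.0, k=1):
    """Offline step: softmax denominators D(j) and hub candidate set A.
    A gallery item is a hub candidate if it is a top-k retrieval
    for at least one probe query (i.e. one column of P)."""
    D = np.exp(beta * P).sum(axis=1)
    topk = np.argsort(-P, axis=0)[:k, :]           # top-k gallery ids per probe
    A = {int(j) for j in np.unique(topk)}
    return D, A

def qb_norm_dis(s_q, D, A, beta=20.0):
    """Normalise only when the query's top raw retrieval is a hub candidate;
    otherwise leave the raw similarities untouched."""
    if int(np.argmax(s_q)) in A:
        return np.exp(beta * s_q) / D
    return s_q

# Item 0 dominates every probe query, so A == {0}.
P = np.array([[0.9, 0.9], [0.1, 0.2], [0.0, 0.1]])
D, A = precompute_dis(P)
safe = qb_norm_dis(np.array([0.1, 0.9, 0.2]), D, A)   # top hit not a hub: unchanged
```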

4. Algorithmic Implementation Details

Key implementation steps:

  • Precompute $B$ (probe queries) and $P$ (similarity matrix).
  • For IS/DIS, also precompute $D(j) = \sum_{i=1}^N \exp(\beta P(j,i))$ for each gallery item and the hub candidate set $\mathcal{A}$.
  • For each query $q$, calculate $s_q(j)$, find $j^* = \arg\max_j s_q(j)$, and decide based on $\mathcal{A}$ whether to apply IS or not.

Efficient computation is possible:

  • Similarity calculation and normalisation are $O(|G|)$ per query (given the precomputed denominators).
  • Use of approximate $k$-NN libraries (e.g. FAISS) is suggested for large-scale galleries.

Hyperparameters include probe bank size $N$ (often 5,000–20,000), inverse temperature $\beta$ (e.g., 20 for cosine similarity), and hub candidate selection parameter $k$ (typically 1). Both $D$ and $\mathcal{A}$ are stored in float32 arrays with $O(|G|)$ memory.

5. Integration in Cross-Modal Retrieval Systems

QB-Norm is attached as a post-processing wrapper around frozen joint-embedding retrieval models. It substitutes direct ranking (e.g., by cosine similarity) with the adjusted $\eta_q$ scores from its normalisation operator. The procedure requires only access to the training set to construct the probe bank, avoiding the impractical need for test queries and allowing single-query inference.

This broad compatibility enables rapid deployment and testing across various encoder architectures, as no retraining or additional model fitting is necessary.

6. Empirical Evidence: Hubness Reduction and Performance Gains

Empirical evaluation demonstrates substantial hubness reduction, evidenced by decreased skewness in $k$-occurrence distributions (e.g., 0.94 to 0.51 on MSR-VTT). Using a probe bank from the training set delivers retrieval accuracy comparable to that attained with a test-set probe bank (e.g., R@1=17.3 vs. 17.5; baseline 14.9). Dynamic IS outperforms alternative normalisation strategies in robustness, maintaining or improving baseline performance even under adverse or out-of-domain probe banks.

Observed gains include:

  • On MSR-VTT 1k-A split, TT-CE⁺: R@1 29.6→33.3 (+3.7), R@5 61.6→63.7.
  • CLIP2Video: R@1 45.6→47.2 (+1.6).
  • Improvements of 1–5 points in R@1 across seven video-text datasets.
  • Text-image (MSCOCO): CLIP R@1 increases from 34.8 to 37.8 (5k eval setting), with OSCAR and VinVL models seeing 1–2 point improvements.
  • Text-audio retrieval (AudioCaps): R@1 up by ∼1 point.
  • Minor, consistent gains on image–image datasets (CUB, SOP).

A plausible implication is that QB-Norm generalises effectively across modalities and architectures, particularly when hubness is pronounced.

7. Practical Implementation Notes

Similarity computations can be batched and efficiently executed on GPU hardware. Approximate nearest neighbour routines are recommended for scaling to large gallery sizes. Hyperparameter tuning for $\beta$ is performed once on a validation set, with practical range $[1, 100]$. Probe bank sizes beyond 20K yield diminishing returns, but larger banks may further reduce hubness at moderate memory cost.
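Batching amortises the work: a block of queries becomes a single matrix multiply, and the precomputed IS denominators broadcast across the batch. A small numpy sketch (a GPU version would swap in e.g. CuPy or PyTorch arrays with the same shape logic):

```python
import numpy as np

def rank_batch(query_embs, gallery_embs, D, beta=20.0):
    """Score a batch of queries at once with Inverted Softmax.
    query_embs: (B, C); gallery_embs: (|G|, C); D: (|G|,) denominators."""
    S = query_embs @ gallery_embs.T        # (B, |G|) raw similarities
    eta = np.exp(beta * S) / D             # D broadcasts over the batch axis
    return np.argsort(-eta, axis=1)        # per-query rankings

rng = np.random.default_rng(2)
G = rng.standard_normal((40, 16)); G /= np.linalg.norm(G, axis=1, keepdims=True)
B = rng.standard_normal((80, 16)); B /= np.linalg.norm(B, axis=1, keepdims=True)
Q = rng.standard_normal((8, 16));  Q /= np.linalg.norm(Q, axis=1, keepdims=True)

D = np.exp(20.0 * (G @ B.T)).sum(axis=1)   # denominators, computed once offline
ranks = rank_batch(Q, G, D)
```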

Storing the $D$ and $\mathcal{A}$ vectors imposes an $O(|G|)$ memory burden. No retraining or fine-tuning of joint-embedding networks is required for QB-Norm integration.

8. Concluding Remarks

QB-Norm constitutes a simple, computationally efficient, and robust strategy for correcting hubness in joint embedding spaces, offering broad and consistent improvements in cross-modal retrieval tasks. Its dynamic normalisation mechanism—especially Dynamic Inverted Softmax—renders it resilient to probe bank quality and domain variance, facilitating practical deployment without retraining or concurrent query access (Bogolin et al., 2021).

References

  • Bogolin, S.-V., Croitoru, I., Jin, H., Liu, Y., Albanie, S. (2021). Cross Modal Retrieval with Querybank Normalisation.
