Querybank Normalisation (QB-Norm)
- Querybank Normalisation (QB-Norm) is a framework that adjusts similarity scores to mitigate hubness in high-dimensional embedding spaces.
- It employs a fixed probe query bank and techniques like Dynamic Inverted Softmax for robust cross-modal retrieval.
- QB-Norm enables real-time similarity adjustments without test query access or retraining, enhancing retrieval performance.
Querybank Normalisation (QB-Norm) is a framework for mitigating hubness in high-dimensional joint embedding spaces, specifically enhancing cross-modal retrieval systems. Hubness refers to the phenomenon where a small subset of gallery embeddings emerges disproportionately often as nearest neighbours to numerous queries, distorting retrieval quality. QB-Norm introduces a robust, training-free similarity normalisation regime that leverages a fixed bank of probe queries, enabling real-time adjustments to query-gallery similarities and reducing hubness without requiring access to test queries or retraining.
1. Background: Cross-Modal Retrieval and Hubness
Cross-modal retrieval involves searching for items in a gallery of one modality (e.g., images, videos, audio) given queries from a different modality (e.g., textual descriptions). Typical approaches employ learned encoders f_q and f_g, embedding both queries and gallery items into a shared d-dimensional space, where similarity is usually measured by cosine similarity.
A persistent challenge in high-dimensional spaces is hubness (the "hubness problem") [Radovanović et al., 2010], where certain gallery embeddings ("hubs") appear excessively often in the k-nearest-neighbour lists across many queries. Formally, the k-occurrence count for a gallery item g_j is

N_k(g_j) = |{ q : g_j ∈ NN_k(q) }|,

where NN_k(q) is the set of k nearest gallery neighbours to query q. The skewed distribution of N_k produces retrieval bias, with hubs dominating rankings and relevant items receiving less exposure.
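To make the definition concrete, the sketch below estimates k-occurrence counts and their skewness with NumPy. The function name, random embeddings, and sizes are illustrative assumptions, not artefacts of the original implementation.

```python
import numpy as np

def k_occurrence(queries, gallery, k=10):
    """Count, for each gallery item, how often it appears in the
    k-nearest-neighbour lists of the queries (cosine similarity)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T                            # (num_queries, |G|)
    topk = np.argsort(-sims, axis=1)[:, :k]   # indices of k nearest gallery items
    return np.bincount(topk.ravel(), minlength=gallery.shape[0])

rng = np.random.default_rng(0)
queries = rng.normal(size=(500, 256))
gallery = rng.normal(size=(200, 256))
N_k = k_occurrence(queries, gallery, k=10)
# Positive skewness of N_k indicates the presence of hubs.
skew = ((N_k - N_k.mean()) ** 3).mean() / (N_k.std() ** 3 + 1e-12)
```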
Conventional mitigations, such as Globally-Corrected Retrieval (GC), Cross-Domain Similarity Local Scaling (CSLS), and Inverted Softmax (IS), re-normalise similarity scores based on a "probe" query bank. However, these approaches typically require access to the whole test set of queries or are sensitive to the domain of probe queries, often failing in online settings or under substantial domain gap.
2. QB-Norm Framework: Construction and Operation
QB-Norm is a non-parametric, inference-time method that adjusts similarities to down-weight hubs. It comprises an offline stage (probe bank construction and precomputation) and an online stage (per-query normalisation):
2.1 Probe Querybank Construction
A set B = {b_1, …, b_N} of probe queries is selected from the query modality, typically sampled from the training set. The size N balances memory constraints against fidelity of hubness estimation; empirical choices range from 1K to 60K.
2.2 Similarity Probe Matrix Formation
For each gallery item g_j, the similarity to each probe query b_i is precomputed:

P_ij = sim(f_q(b_i), f_g(g_j)),

yielding a matrix P ∈ ℝ^(N×|G|).
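A minimal sketch of this precomputation, assuming cosine similarity and stand-in random arrays in place of real encoder outputs f_q(b_i) and f_g(g_j):

```python
import numpy as np

def cosine(a, b):
    """Row-normalise both matrices and return pairwise cosine similarities."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(1)
probe_embeds = rng.normal(size=(1000, 64))    # stand-in for f_q(b_i), i = 1..N
gallery_embeds = rng.normal(size=(300, 64))   # stand-in for f_g(g_j), j = 1..|G|
P = cosine(probe_embeds, gallery_embeds)      # P[i, j] = sim(f_q(b_i), f_g(g_j))
```

This matrix is computed once and reused for every incoming query.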
2.3 Raw Similarity Computation
For any incoming query q, form raw similarities

s_q(j) = sim(f_q(q), f_g(g_j)), for j = 1, …, |G|.
2.4 Normalisation
Define QB-Norm as a function mapping (s_q, P) to adjusted similarities η_q = QB-Norm(s_q, P) ∈ ℝ^(|G|). This function can be instantiated as GC, CSLS, IS, or the novel Dynamic Inverted Softmax (DIS).
The ranking algorithm, using QB-Norm, is summarised as:
```
Input: gallery G, querybank B, precomputed probe matrix P
For each incoming query q:
    s_q ← [sim(f_q(q), f_g(g_j)) for j in 1...|G|]   # O(|G|)
    η_q ← QB-Norm(s_q, P)                            # O(|G|) or O(|G| log N)
    return argsort_desc(η_q)                         # ranked retrievals
```
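The ranking loop above can be sketched as a small NumPy routine with a pluggable normalisation operator; `qb_norm_rank` and `identity` are hypothetical names, and the identity operator simply recovers baseline cosine ranking:

```python
import numpy as np

def qb_norm_rank(query_embed, gallery_embeds, P, normalise):
    """Rank gallery items for one query using a pluggable QB-Norm operator.

    normalise(s_q, P) -> adjusted scores; passing `identity` recovers
    plain cosine-similarity ranking."""
    q = query_embed / np.linalg.norm(query_embed)
    g = gallery_embeds / np.linalg.norm(gallery_embeds, axis=1, keepdims=True)
    s_q = g @ q              # raw similarities, O(|G|)
    eta = normalise(s_q, P)  # QB-Norm adjustment
    return np.argsort(-eta)  # ranked retrieval indices, best first

identity = lambda s, P: s    # baseline: no normalisation
```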
3. Formal Definitions: Methods in QB-Norm
QB-Norm accommodates several similarity normalisation paradigms:
| Method | Normalisation Formula | Domain Sensitivity |
|---|---|---|
| Globally-Corrected | η_j = −rank(s_j among {P_ij}, i = 1…N) | High |
| CSLS | η_j = 2·s_j − μ_k(g_j); μ_k(g_j): average of top-k probe similarities to g_j | High |
| Inverted Softmax | η_j = exp(β·s_j) / Σ_i exp(β·P_ij) | High |
| Dynamic IS | Apply IS only when top retrieval is a hub candidate (see below); else use the raw s_q | Low |
3.1 Globally-Corrected Retrieval
Ranks each raw similarity s_j of a gallery item g_j against that item's probe similarities {P_ij} and adjusts accordingly: items beaten by many probes are demoted. Sensitive to probe bank choice.
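A sketch of the rank-based idea, under the assumption that an item's score is minus the number of probe queries whose similarity to that item exceeds the raw query similarity (`gc_scores` is a hypothetical helper name):

```python
import numpy as np

def gc_scores(s_q, P):
    """Rank-based GC sketch: for each gallery item j, count the probe
    queries with P[i, j] > s_q[j]; fewer such probes means a better
    rank, so the negated count serves as the adjusted score."""
    ranks = (P > s_q[None, :]).sum(axis=0)
    return -ranks.astype(float)
```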
3.2 CSLS
Subtracts local averages of similarities (the mean of each gallery item's top-k probe similarities), intended to address biases arising from density variations in embedding space.
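One common instantiation of this adjustment, sketched in NumPy under the assumption that the local density term is the mean of each gallery item's top-k probe similarities:

```python
import numpy as np

def csls(s_q, P, k=10):
    """CSLS-style adjustment: double the raw similarity, then subtract
    each gallery item's local probe density mu_k(g_j)."""
    topk = np.sort(P, axis=0)[-k:, :]   # k largest probe sims per gallery item
    mu = topk.mean(axis=0)              # mu_k(g_j)
    return 2.0 * s_q - mu
```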
3.3 Inverted Softmax
Applies softmax normalisation per gallery item with respect to probe queries; regulated by an inverse temperature β.
3.4 Dynamic Inverted Softmax (DIS)
Identifies a hub candidate set H, the gallery items that appear among the top-k retrievals of at least one probe query:

H = { g_j : ∃ i such that g_j ∈ NN_k(b_i) }.

DIS then applies:

η_q = IS(s_q, P) if argmax_j s_q(j) ∈ H, and η_q = s_q otherwise.
This selective normalisation reduces risk of performance degradation under poor probe banks.
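The two operators can be sketched together as follows; the function names are illustrative, and β = 20 and k = 1 are the hedged defaults mentioned in Section 4 rather than universally optimal settings:

```python
import numpy as np

def inverted_softmax(s_q, P, beta=20.0):
    """IS: divide exp(beta * s_j) by the item's probe mass Z_j."""
    Z = np.exp(beta * P).sum(axis=0)   # Z_j = sum_i exp(beta * P_ij)
    return np.exp(beta * s_q) / Z

def hub_candidates(P, k=1):
    """Gallery items retrieved in the top-k results of any probe query."""
    topk = np.argsort(-P, axis=1)[:, :k]
    return set(topk.ravel().tolist())

def dynamic_inverted_softmax(s_q, P, beta=20.0, k=1, hubs=None):
    """DIS: apply IS only if the raw top retrieval is a hub candidate."""
    if hubs is None:
        hubs = hub_candidates(P, k)
    if int(np.argmax(s_q)) in hubs:
        return inverted_softmax(s_q, P, beta)
    return s_q                          # otherwise leave scores untouched
```

In practice `hub_candidates` and the normalisers Z are computed once offline, so the per-query branch costs only an argmax and a set lookup.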
4. Algorithmic Implementation Details
Key implementation steps:
- Precompute B (probe queries) and P (similarity matrix).
- For IS/DIS, also precompute the normaliser Z_j = Σ_i exp(β·P_ij) for each gallery item g_j, and the hub candidate set H.
- For each query q, calculate s_q, find j* = argmax_j s_q(j), and apply IS only if g_{j*} ∈ H.
Efficient computation is possible:
- Similarity calculation and normalisation are O(|G|) per query, given the precomputed normalisers.
- Use of approximate -NN libraries (e.g. FAISS) is suggested for large-scale galleries.
Hyperparameters include probe bank size N (often 5,000–20,000), inverse temperature β (e.g., β = 20 for cosine similarity), and hub candidate selection parameter k (typically 1). Both P and the normalisers Z are stored in float32 arrays, requiring O(N·|G|) memory.
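As a quick worked example of the memory footprint, assuming an illustrative probe bank of N = 20,000 and a gallery of |G| = 1,000 (these sizes are assumptions, not figures from the paper):

```python
# Rough memory estimate for the precomputed arrays (float32 = 4 bytes).
N, G = 20_000, 1_000
P_bytes = N * G * 4         # probe similarity matrix P: N x |G| floats
Z_bytes = G * 4             # per-item IS normalisers Z_j: |G| floats
print(P_bytes / 1e6, "MB")  # 80.0 MB
```

The probe matrix dominates; the per-item normalisers are negligible by comparison.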
5. Integration in Cross-Modal Retrieval Systems
QB-Norm is attached as a post-processing wrapper around frozen joint-embedding retrieval models. It substitutes direct ranking (e.g., by cosine similarity) with the adjusted scores from its normalisation operator. The procedure requires only access to the training set to construct the probe bank, avoiding the impractical need for test queries and allowing single-query inference.
This broad compatibility enables rapid deployment and testing across various encoder architectures, as no retraining or additional model fitting is necessary.
6. Empirical Evidence: Hubness Reduction and Performance Gains
Empirical evaluation demonstrates substantial hubness reduction, evidenced by decreased skewness in k-occurrence distributions (e.g., 0.94 to 0.51 on MSR-VTT). Using a probe bank from the training set delivers retrieval accuracy comparable to that attained with a test-set probe bank (e.g., R@1=17.3 vs. 17.5; baseline 14.9). Dynamic IS outperforms alternative normalisation strategies in robustness, maintaining or improving baseline performance even under adverse or out-of-domain probe banks.
Observed gains include:
- On MSR-VTT 1k-A split, TT-CE⁺: R@1 29.6→33.3 (+3.7), R@5 61.6→63.7.
- CLIP2Video: R@1 45.6→47.2 (+1.6).
- Improvements of 1–5 points in R@1 across seven video-text datasets.
- Text-image (MSCOCO): CLIP R@1 increases from 34.8 to 37.8 (5k eval setting), with OSCAR and VinVL models seeing 1–2 point improvements.
- Text-audio retrieval (AudioCaps): R@1 up by ∼1 point.
- Minor, consistent gains on image–image datasets (CUB, SOP).
A plausible implication is that QB-Norm generalises effectively across modalities and architectures, particularly when hubness is pronounced.
7. Practical Implementation Notes
Similarity computations can be batched and efficiently executed on GPU hardware. Approximate nearest neighbour routines are recommended for scaling to large gallery sizes. Hyperparameter tuning for β is performed once on a validation set. Probe bank sizes beyond 20K yield diminishing returns, though larger banks may further reduce hubness at moderate memory cost.
Storing the probe matrix P and normalisers Z imposes an O(N·|G|) memory burden. No retraining or fine-tuning of joint-embedding networks is required for QB-Norm integration.
8. Concluding Remarks
QB-Norm constitutes a simple, computationally efficient, and robust strategy for correcting hubness in joint embedding spaces, offering broad and consistent improvements in cross-modal retrieval tasks. Its dynamic normalisation mechanism—especially Dynamic Inverted Softmax—renders it resilient to probe bank quality and domain variance, facilitating practical deployment without retraining or concurrent query access (Bogolin et al., 2021).