BBC: Improving Large-k Approximate Nearest Neighbor Search with a Bucket-based Result Collector

Published 2 Apr 2026 in cs.DB and cs.DS | (2604.01960v1)

Abstract: Although Approximate Nearest Neighbor (ANN) search has been extensively studied, large-k ANN queries that aim to retrieve a large number of nearest neighbors remain underexplored, despite their numerous real-world applications. Existing ANN methods face significant performance degradation for such queries. In this work, we first investigate the reasons for the performance degradation of quantization-based ANN indexes: (1) the inefficiency of existing top-k collectors, which incurs significant overhead in candidate maintenance, and (2) the reduced pruning effectiveness of quantization methods, which leads to a costly re-ranking process. To address this, we propose a novel bucket-based result collector (BBC) to enhance the efficiency of existing quantization-based ANN indexes for large-k ANN queries. BBC introduces two key components: (1) a bucket-based result buffer that organizes candidates into buckets by their distances to the query. This design reduces ranking costs and improves cache efficiency, enabling high performance maintenance of a candidate superset and a lightweight final selection of top-k results. (2) two re-ranking algorithms tailored for different types of quantization methods, which accelerate their re-ranking process by reducing either the number of candidate objects to be re-ranked or cache misses. Extensive experiments on real-world datasets demonstrate that BBC accelerates existing quantization-based ANN methods by up to 3.8x at recall@k = 0.95 for large-k ANN queries.

Authors (5)

Summary

  • The paper introduces BBC, a cache-efficient bucket-based result collector that enhances throughput by 1.4×–3.8× for large-k ANN searches.
  • It addresses heap inefficiencies and degraded quantization pruning by using specialized re-ranking algorithms for bounded and unbounded methods.
  • Empirical and theoretical results confirm minimal quantization error and maintained recall, supporting scalable vector search in large datasets.

BBC: A Formal Analysis of Bucket-based Large-k ANN Result Collection

Problem Statement and Motivation

The paper "BBC: Improving Large-k Approximate Nearest Neighbor Search with a Bucket-based Result Collector" (2604.01960) addresses the efficiency challenges of large-k approximate nearest neighbor (ANN) search over high-dimensional vector datasets. Large-k queries (k ≥ 5,000) arise in critical applications such as candidate recall for recommendation, large-scale rerank pipelines in retrieval-augmented generation, and dataset construction for ML model training. While ANN search is well studied for standard (small) k (k < 1,000), existing systems exhibit severe throughput degradation for large k due to two dominant factors: (1) priority-queue-based top-k collectors that become cache-inefficient as k grows, and (2) declining pruning efficiency in quantization-based ANN, leading to growing re-ranking costs.

Empirical evaluation across several methods (IVF, IVF+PQ, IVF+RaBitQ, HNSW) reveals a 4.8×–5.7× drop in throughput when k increases from 100 to 5,000 at high recall levels (see Figure 1).

Figure 1: Querying performance of IVF, HNSW, IVF+PQ, and IVF+RaBitQ on the C4 dataset at k = 100 and k = 5,000.

The motivation for focusing on quantization-based methods stems from their superior resilience to elevated k compared to graph-based algorithms (e.g., HNSW), whose design is inherently tuned for small-k retrieval via local neighborhood traversal.

Analysis of Bottlenecks

The performance bottlenecks for large-k ANN are twofold:

  1. Top-k Collector Inefficiency: Classic binary heap implementations for collecting the k nearest results incur O(log k) complexity per insertion and, crucially, increasingly frequent L1 cache misses once the heap's (distance, id) pair array exceeds L1 capacity (typically 32KB). This effect is pronounced for large k, where L1 miss rates soar and heap maintenance can dominate runtime (in IVF+RaBitQ, its share grows from 2% to 23% as k scales).
  2. Pruning Degradation: Quantization methods, both bounded (e.g., RaBitQ) and unbounded (e.g., PQ), require expanding candidate pools as k grows. For bounded variants, the overlap between bound intervals and the threshold widens, forcing more frequent expensive re-ranking. Unbounded methods must scale candidate pools linearly with k to achieve target recall.

    Figure 2: Time overhead breakdown at varying k, showing the increasing dominance of heap maintenance and distance computation as k increases.
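To make the first bottleneck concrete, the conventional heap-based top-k collector that BBC replaces can be sketched as follows. This is a minimal Python illustration of the baseline design the paper critiques, not the paper's code; the class and method names are ours:

```python
import heapq

class HeapTopKCollector:
    """Conventional max-heap collector for the k nearest candidates.

    Each insertion beyond the first k costs O(log k) sift operations,
    and for large k the (distance, id) array overflows the L1 cache,
    which is the bottleneck the paper identifies.
    """

    def __init__(self, k):
        self.k = k
        self._heap = []  # stores (-distance, id): max-heap via negation

    def threshold(self):
        """Current k-th smallest distance, used as the pruning threshold."""
        return -self._heap[0][0] if len(self._heap) == self.k else float("inf")

    def insert(self, dist, obj_id):
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-dist, obj_id))
        elif dist < -self._heap[0][0]:
            heapq.heapreplace(self._heap, (-dist, obj_id))

    def results(self):
        """Final top-k as (distance, id) pairs, sorted ascending."""
        return sorted((-d, i) for d, i in self._heap)
```

Every `insert` beyond capacity touches a logarithmic chain of heap slots scattered across the array, which is exactly the random-access pattern that defeats the L1 cache once k is large.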

BBC: The Bucket-based Result Collector

The paper introduces BBC, a cache-efficient, bucket-quantized result collector, designed to replace conventional heap-based top-k collectors in the ANN pipeline (IVF+PQ, IVF+RaBitQ). BBC comprises:

  1. Bucket-based Result Buffer: The candidate distance range is partitioned via one-dimensional quantization into a set of buckets, each storing candidates in linear buffers (IDs/distances). Candidates are ordered only across buckets (a coarse order). Sequential insertion enables hardware prefetching and greatly reduces L1 cache miss rates. Updating the threshold bucket is amortized and localized.
  2. Specialized Re-ranking Algorithms: Distinct mechanisms for bounded (RaBitQ-style) and unbounded (PQ-style) quantizers:
    • For bounded quantization, the algorithm greedily skips re-ranking for candidates that are provably inside or outside the result set (via bound-threshold comparison), re-ranking only objects near the threshold and minimizing redundant exact distance computation.
    • For unbounded quantization, an early re-ranking mechanism computes exact distances for predicted in-boundary candidates right after estimated distances are computed, exploiting the linear buffer layout for optimal cache use.

      Figure 3: Probability density of query-data distances in C4, showing concentration and motivating the effectiveness of bucketization.
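The bucket-based result buffer's core operations can be sketched as follows. This is an illustrative Python sketch under our own naming and simplifications; the paper's actual implementation uses SIMD and cache-aligned layouts:

```python
import bisect

class BucketResultBuffer:
    """Illustrative bucket-based result collector (names are ours).

    Candidates are appended to linear per-bucket buffers keyed by a
    one-dimensional quantization of their distance; only whole buckets
    are ordered, and only the bucket containing the k-th candidate
    needs sorting at the end.
    """

    def __init__(self, k, boundaries):
        # `boundaries` are ascending bucket upper edges, e.g. produced
        # by equal-depth quantization of sampled distances.
        self.k = k
        self.boundaries = boundaries
        self.buckets = [[] for _ in range(len(boundaries) + 1)]
        self._count = 0                   # candidates up to threshold bucket
        self._thresh = len(boundaries)    # index of current threshold bucket

    def insert(self, dist, obj_id):
        b = bisect.bisect_left(self.boundaries, dist)
        if b > self._thresh:
            return                        # provably outside the top-k superset
        self.buckets[b].append((dist, obj_id))
        self._count += 1
        # Shrink the threshold bucket while the buckets strictly below it
        # already hold at least k candidates (amortized, localized update).
        while self._thresh > 0 and \
                self._count - len(self.buckets[self._thresh]) >= self.k:
            self._count -= len(self.buckets[self._thresh])
            self.buckets[self._thresh] = []
            self._thresh -= 1

    def topk(self):
        """Lightweight final selection over at most k surviving candidates."""
        out, need = [], self.k
        for bucket in self.buckets:
            if len(bucket) <= need:
                out.extend(bucket)
                need -= len(bucket)
            else:
                out.extend(sorted(bucket)[:need])
                need = 0
            if need == 0:
                break
        return sorted(out)  # final sort is over at most k items
```

Insertions are plain appends to contiguous buffers, which is the sequential, prefetch-friendly access pattern the paper credits for the reduced L1 miss rate.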

Quantitative Performance and Empirical Validation

BBC’s design is motivated by the distance concentration observed in high-dimensional settings: bucket partitioning using equal-depth quantization tightly bounds the error between the relaxed (bucket) threshold and the true threshold. Theoretical bounds demonstrate that this quantization error becomes negligible as k grows.

Empirical results validate substantial gains:

  • Speedup: BBC-integrated systems (IVF+PQ+BBC and IVF+RaBitQ+BBC) yield 1.4×–3.8× throughput increases at recall@k = 0.95, with the advantage widening as k grows, especially on large datasets (Figure 4).
  • Collector Overhead: BBC reduces top-k collection time by up to an order of magnitude versus heap or sorted-buffer alternatives, halving L1 cache misses (Figure 5, Table 1).
  • Re-ranking Efficiency: BBC’s greedy and early re-ranking algorithms reduce both the number of objects needing exact evaluation and the per-object access cost, accelerating re-ranking by up to 1.8×.
  • Accuracy Guarantees: Theoretical and empirical study shows that bucket thresholds deviate only marginally from the true value (Figure 6), preserving accuracy and strictly controlling recall loss (Figure 7).
  • Small-k Regime: BBC does not degrade performance for small k (e.g., k = 100), matching or slightly outperforming heap-based approaches thanks to batching and prefetch advantages (Figure 8).

Figure 4: Accuracy-efficiency trade-off for varying k. BBC consistently outperforms baselines, especially as k increases.
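The greedy skip for bounded quantizers described earlier can be sketched as follows. This is illustrative Python; the candidate tuples, threshold handling, and function names are our assumptions, not the paper's API:

```python
def greedy_rerank(candidates, threshold, exact_distance):
    """Greedy re-ranking skip for bounded quantizers (illustrative).

    Each candidate carries a distance interval [lo, hi] supplied by a
    bounded quantizer such as RaBitQ. Exact distances are computed only
    for candidates whose interval straddles the threshold; the rest are
    accepted or rejected from the bounds alone.
    """
    accepted, reranked = [], 0
    for obj_id, lo, hi in candidates:
        if lo > threshold:
            continue                       # provably outside the top-k
        if hi <= threshold:
            accepted.append((hi, obj_id))  # provably inside: no exact pass
        else:
            reranked += 1
            d = exact_distance(obj_id)     # only ambiguous candidates
            if d <= threshold:
                accepted.append((d, obj_id))
    return accepted, reranked
```

As k grows, the threshold interval widens and more candidates straddle it, which is precisely the pruning degradation the paper measures; the greedy skip keeps the exact-distance pass confined to that ambiguous band.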

System Design and Implementation Choices

Selecting the number of buckets is analytically tied to L1 cache capacity, quantization code footprint, and lookup table requirements. The paper provides an explicit formula balancing these constraints.

Bucket partitioning leverages equal-depth quantization over sampled candidate distances, adapting bucket boundaries to the local distance distribution of each query, which is essential for high-dimensional performance.
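Equal-depth partitioning from a distance sample can be sketched as follows (a hypothetical helper under our own naming, not the paper's code):

```python
def equal_depth_boundaries(sample_distances, num_buckets):
    """Equal-depth (equal-frequency) bucket edges from a per-query
    sample of candidate distances. Each bucket receives roughly the
    same number of sampled candidates, so the partition adapts to the
    local distance distribution instead of using fixed-width bins.
    """
    s = sorted(sample_distances)
    n = len(s)
    # Upper edge of bucket i is the ((i+1)*n/num_buckets)-th order statistic.
    return [s[min(n - 1, (i + 1) * n // num_buckets)]
            for i in range(num_buckets - 1)]
```

Because query-data distances concentrate in high dimensions (Figure 3), fixed-width bins would leave most buckets empty; equal-depth edges place fine resolution exactly where candidates cluster.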

SIMD acceleration is exploited for code and threshold computation, further enhancing line-rate performance.

Theoretical and Practical Implications

Theoretical Implications

  • BBC establishes the first formal result-collection framework explicitly optimized for large-k ANN, revealing the sharp interplay between the memory hierarchy, quantization error, and algorithmic pipeline design.
  • The results quantify a critical performance transition point in k beyond which cache-aware design is mandatory, influencing the broader vector-search literature on collector design.
  • The separation of re-ranking logic by quantizer-type (bounded vs. unbounded) provides a template for future indexer design (particularly hybrid systems leveraging both).

Practical Implications

  • BBC is compatible with any similarity metric that induces an ordering (Euclidean, inner product, cosine), making it plug-and-play for prevailing quantization-based indexes (e.g., IVF+PQ, IVF+RaBitQ) and modern vector DBMSs.
  • The memory footprint of BBC is negligible compared to dataset scales, making it suitable for deployment in both memory- and disk-bound environments.
  • Key applications include recommendation candidate generation at scale, multi-stage rerank pipelines for LLM-backed RAG, and iterative data mining over billion-scale datasets.

Future Directions

  • Graph-based ANN Integration: Extending BBC’s bucketed collection strategies to graph-based ANN methods, which currently degrade dramatically at large k due to unscalable heap dependence.
  • GPU/Batch Processing: Adapting bucket partitioning and cache-centric logic for high-throughput GPU pipelines with warp-level collectors and multi-query batch collation.
  • Algorithmic Generalization: Exploring variable-sized buckets, dynamic binning, or adaptive quantization codebook construction per query for further improvements.

Conclusion

The BBC framework, comprising a bucketed result collector and quantization-aware re-ranking strategies, resolves fundamental bottlenecks in large-k ANN search. Through a combination of cache-aware data layout, quantization-driven bucket partitioning, and specialized re-ranking algorithms, BBC achieves up to a 3.8× performance increase without a recall tradeoff, maintains or improves performance for small k (k < 1,000), and provides a formal methodology for collector design in scale-adaptive vector search infrastructure.

The approach stands as a new benchmark for scalable ANN design, with strong relevance for high-throughput AI workloads and modern vector-oriented data systems.
