
BM25S Reranker

Updated 9 January 2026
  • BM25S reranking is a hybrid retrieval method that leverages exact BM25 scores to enhance candidate ranking in multi-stage search pipelines.
  • It employs efficient sparse matrix techniques and token injection into neural and LLM models, achieving up to 500× speed improvements.
  • Empirical evaluations on benchmarks like MSMARCO and TREC DL show BM25S bridges lexical and semantic matching gaps to improve ranking metrics.

A BM25S reranker is an approach within information retrieval pipelines that explicitly exploits BM25 lexical scores for enhanced ranking effectiveness in multi-stage retrieval systems. The term “BM25S” is used in two distinct but related contexts: as a practical, highly optimized Python implementation for efficient BM25 reranking (Lù, 2024), and as a general framework for injecting BM25 scores (as either tokens or features) into neural or LLM-based rerankers—sometimes referred to as “BM25-as-text” (Askari et al., 2023). Both variants address the challenge of integrating efficient exact-matching signals into neural re-ranking, boosting performance over conventional neural or interpolated hybrid systems.

1. Principles of BM25S Reranking

BM25S reranking operationalizes a two-stage architecture:

  • Stage 1 (Lexical Retrieval): A fast, recall-oriented BM25 retriever narrows the candidate set to the top-n passages for a query.
  • Stage 2 (Re-Ranking): Candidates are rescored either by applying BM25S directly (as a lightweight, exact reranker) or by fusing their BM25 scores into neural/LLM-based re-rankers so that lexical and semantic signals are exploited jointly (Lù, 2024, Askari et al., 2023).

Distinct from other hybrid or ensemble strategies, BM25S reranking can involve direct input injection of normalized BM25 scores as text tokens within cross-encoder models, or as part of reranker prompts for LLMs—meaning the model uses the actual BM25 value as an explicit feature during inference rather than relying on linear interpolation or post-hoc fusion.
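The distinction can be made concrete with two toy sketches: conventional hybrids fuse scores after inference, whereas BM25S-style injection places the score inside the model input. The token layout follows the BM25-as-text format described below; the helper names are illustrative, not an actual API.

```python
def interpolate(bm25, neural, alpha=0.5):
    # conventional hybrid: linearly fuse the two scores after inference
    return alpha * bm25 + (1 - alpha) * neural

def inject(query, passage, bm25_token):
    # BM25S-style: the normalized BM25 value becomes part of the model input,
    # so attention layers can condition on it during inference
    return f"[CLS] {query} [SEP] {bm25_token} [SEP] {passage} [SEP]"

fused = interpolate(0.8, 0.6)
inp = inject("what is bm25", "BM25 is a ranking function ...", "42")
```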

2. Mathematical Formulation and Implementation Details

BM25 Scoring:

For a query Q = \{q_1, \dots, q_{|Q|}\} and document D:

B(Q,D) = \sum_{i=1}^{|Q|} \mathrm{IDF}(q_i, C) \cdot \frac{\mathrm{TF}(q_i, D)}{\mathrm{TF}(q_i, D) + k_1 (1 - b + b \frac{|D|}{\mathrm{avgdl}})}

where:

  • \mathrm{TF}(q_i, D): term frequency of q_i in document D
  • \mathrm{IDF}(q_i, C): inverse document frequency of q_i in corpus C
  • k_1, b: tunable parameters
  • |D|: document length; \mathrm{avgdl}: average document length in C.
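The scoring formula above can be implemented directly in a few lines of plain Python. This is a didactic sketch using a Lucene-style IDF as one common choice; the bm25s library itself is far more optimized.

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query with the BM25
    formula above (Lucene-style IDF; k1 and b at common defaults)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for q in query:
        df = sum(1 for d in corpus if q in d)              # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)  # Lucene-style IDF
        denom = tf[q] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[q] / denom
    return score

corpus = [["fast", "sparse", "search"],
          ["neural", "reranker"],
          ["sparse", "matrix", "scoring"]]
s = bm25_score(["sparse", "search"], corpus[0], corpus)
```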

BM25S (Eager Sparse Scoring):

BM25S (Lù, 2024) constructs a sparse matrix M where each entry corresponds to a term-document contribution S(t, D) precomputed at index time:

S(t, D) = \mathrm{IDF}(t, C) \cdot \frac{\mathrm{TF}(t, D)}{\mathrm{TF}(t, D) + k_1 (1 - b + b \frac{|D|}{\mathrm{avgdl}})}

At query time, BM25 scores for candidates are efficiently obtained by slicing and aggregating entries of M for the query terms, enabling substantial speedups (up to 500× over baseline Python implementations).
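The eager-scoring idea can be sketched with plain dictionaries standing in for the sparse matrix. This mimics the index-time/query-time split, not the bm25s library's actual implementation.

```python
import math
from collections import Counter, defaultdict

def build_index(corpus, k1=1.5, b=0.75):
    """Index time: precompute S(t, D) for every term-document pair that
    actually occurs, i.e. the nonzero entries of the sparse matrix M."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    df = Counter(t for d in corpus for t in set(d))
    index = defaultdict(dict)  # term -> {doc_id: precomputed contribution}
    for j, doc in enumerate(corpus):
        for t, f in Counter(doc).items():
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            index[t][j] = idf * f / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return index

def rescore(index, query, candidates):
    """Query time: slice the precomputed rows and sum; no recomputation."""
    return sorted(((sum(index[t].get(j, 0.0) for t in query), j)
                   for j in candidates), reverse=True)

corpus = [["sparse", "matrix", "scoring"],
          ["neural", "reranker"],
          ["sparse", "search", "engine"]]
ranked = rescore(build_index(corpus), ["sparse", "search"], [0, 1, 2])
```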

BM25-as-Text Injection in Neural/LLM Rerankers:

BM25S (Askari et al., 2023) also refers to injecting normalized BM25 scores as input tokens to neural rerankers:

  • For a sequence [CLS] query [SEP] BM25_TOKEN [SEP] passage [SEP], the BM25 score (globally min-max normalized, scaled, and integerized) is placed between the query and passage.
  • The neural model’s self- and cross-attention layers allow the BM25 score token to directly influence the learned representation.

3. Score Normalization and Injection Strategies

The raw BM25 scores are unbounded and non-uniformly distributed. The normalization and injection process is critical:

  • Normalization Options:
    • Local min–max: (s - \min S)/(\max S - \min S), where S is the set of top-k candidate BM25 scores for the query.
    • Global min–max: fixed corpus-level bounds (e.g., min = 0, max = 50).
    • Z-score (local/global) and sum-normalized variants were also studied.
    • Best practice: normalize to [0, 1], multiply by 100, and cast to an integer, e.g., “12” for 12% of the maximum observed score (Askari et al., 2023).
  • Injection:

The BM25 token is inserted as an actual text token in the neural reranker input. For LLM-based rerankers (e.g., InsertRank (Seetharaman et al., 17 Jun 2025)), BM25 scores are included explicitly in the prompt for each candidate document, allowing the LLM to reason over both the passage content and its lexical relevance.
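The normalization options above can be sketched as follows; the global bounds of 0 and 50 are the illustrative values mentioned earlier, not universal constants.

```python
def local_minmax(scores):
    # normalize against the top-k candidate scores of this query
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def global_minmax(s, lo=0.0, hi=50.0):
    # normalize against fixed corpus-level bounds, clamped to [0, 1]
    return min(max((s - lo) / (hi - lo), 0.0), 1.0)

def to_token(norm):
    # scale to [0, 100] and integerize, yielding the injected text token
    return str(int(round(norm * 100)))

token = to_token(global_minmax(21.0))  # 21/50 = 0.42 -> "42"
```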

4. Empirical Performance and Analysis

BM25S (Efficient Reranker):

  • On BEIR and MSMARCO, the eager-sparse BM25S framework achieves 2–500× faster reranking than popular alternatives, with rankings mathematically identical to standard BM25 (Lù, 2024).
  • Five major BM25 variants are supported via “score shifting” to remain sparse, including Okapi, Lucene, ATIRE, BM25L, and BM25+.

BM25-as-Text Neural Reranking:

  • Statistically significant gains achieved on MSMARCO and TREC DL:
    • On MSMARCO: MiniLM_CAT nDCG@10 = 0.419 vs. MiniLM_BM25CAT = 0.424, with similar lifts on MAP and MRR@10 (Askari et al., 2023).
    • BERT-Large_BM25CAT yields nDCG@10 = 0.728 vs. 0.695 for the vanilla model (TREC DL-20).
  • Across all query types (Abbreviation, Location, Description, Human, Numeric, Entity), BM25S injection outperforms baseline cross-encoders by 2–4 MRR points.
  • For exact-match queries (e.g., masked-passage overlap), BM25S-enhanced models surpass both BM25 and standard BERT, bridging the exact-match “gap” typically observed in Transformer rerankers (Askari et al., 2023).

LLM Listwise Reranking with BM25 Injection (InsertRank):

  • On complex reasoning benchmarks (BRIGHT, R2MED), inserting BM25 scores into LLM prompts consistently improves NDCG@10 by up to 16.3% relative for Gemini 2.5, and modestly (0.6–3.3%) for other LLMs (Seetharaman et al., 17 Jun 2025).
  • BM25 injection mitigates, but does not eliminate, document ordering bias in listwise LLM rerankers.
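A listwise prompt with injected scores might be assembled as below. The exact InsertRank prompt template is not reproduced here; the wording and layout are illustrative assumptions.

```python
def listwise_prompt(query, passages, bm25_scores):
    """Build a listwise reranking prompt that exposes each candidate's
    BM25 score alongside its text."""
    lines = [f"Query: {query}",
             "Rank the following passages by relevance to the query."]
    for i, (p, s) in enumerate(zip(passages, bm25_scores), start=1):
        lines.append(f"[{i}] (BM25: {s:.1f}) {p}")
    return "\n".join(lines)

prompt = listwise_prompt("treatment for gout",
                         ["Allopurinol lowers uric acid ...",
                          "Gout is a form of arthritis ..."],
                         [12.3, 9.8])
```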

5. Integration in Retrieval Pipelines

BM25S rerankers are adopted as either:

  • Pure BM25 rescoring mechanisms (for high-throughput, latency-critical second-stage reranking) using precomputed sparse matrices (Lù, 2024).
  • Score-injection enhancements for neural/LLM cross-encoders and prompt-based rerankers, requiring only minor pipeline modifications (token or prompt augmentation) and no retraining or extra model ensemble steps (Askari et al., 2023, Seetharaman et al., 17 Jun 2025).
  • In multi-stage QA or RAG (retrieval-augmented generation) systems, a typical pipeline is:

    1. BM25 or hybrid retriever produces top-K candidate passages.
    2. Neural cross-encoder (BM25S version) reranks candidates with BM25 score injection.
    3. Final selection is made based on the new ranking (Moreira et al., 2024).
  • BM25S reranking can be combined with hybrid retrievers trained on sparse+dense rankings for additional robustness (cf. HYRR framework (Lu et al., 2022)).
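The three-step pipeline above can be sketched end to end; `toy_bm25` and `toy_cross_encoder` are illustrative stand-ins, not real retriever or reranker APIs.

```python
def toy_bm25(query, corpus, k):
    # Step 1: recall-oriented lexical retrieval (term overlap stands in for BM25)
    scored = sorted(((sum(doc.count(t) for t in query), j)
                     for j, doc in enumerate(corpus)), reverse=True)
    return [(j, float(s)) for s, j in scored[:k]]

def toy_cross_encoder(query, passage, bm25_score):
    # Step 2: hypothetical BM25S-style reranker; a real system would feed
    # the injected BM25 score token through a trained cross-encoder
    semantic = len(set(query) & set(passage)) / max(len(set(query)), 1)
    return semantic + 0.1 * bm25_score

def rerank_pipeline(query, corpus, k=3):
    candidates = toy_bm25(query, corpus, k)
    # Step 3: final selection under the new ranking
    return sorted(candidates,
                  key=lambda c: toy_cross_encoder(query, corpus[c[0]], c[1]),
                  reverse=True)

corpus = [["bm25", "sparse", "ranking"],
          ["dense", "vectors"],
          ["bm25", "reranker", "pipeline"]]
order = [j for j, _ in rerank_pipeline(["bm25", "reranker"], corpus)]
```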

6. Limitations and Future Directions

  • Order Sensitivity: LLM-based rerankers, even with BM25 injection, remain sensitive to document order; partial mitigation is possible but setwise or rank-aggregation approaches may be needed (Seetharaman et al., 17 Jun 2025).
  • Score Selection: Overreliance on BM25 anchors may induce bias or reduce diversity; adaptive weighting or gating strategies have been identified as plausible avenues for improvement.
  • Expansion to Additional Signals: While current BM25S approaches focus on BM25 exclusively, extension to other lexical (TF-IDF, lexical diversity) or dense retrieval scores is underexplored.
  • Scalability and Memory: For extremely large collections, practitioners should consider chunking, memory-mapping, and multi-threaded processing to maintain BM25S throughput advantages (Lù, 2024).
  • License and Commercial Use: In industry, model and data licensing must be verified, and deployment best practices (model sharding, batching, caching) are critical (Moreira et al., 2024).

7. Comparative Summary of BM25S Approaches

| BM25S Context | Method Summary | Empirical Gains |
|---|---|---|
| Eager Sparse (BM25S) | Precomputed sparse matrix, pure BM25 rerank | 2–500× speedup vs. Python baselines |
| BM25-as-Text (Neural) | Inject normalized BM25 scores as input tokens | +0.02 MRR@10, +0.04 nDCG@10 |
| BM25-in-Prompt (LLM) | BM25 scores in listwise LLM reranker prompts | +0.8–16.3% NDCG@10 |

BM25S reranking presents an efficient, modular, and empirically robust solution at the intersection of sparse and neural ranking, enabling enhanced retrieval accuracy, interpretability, and serving efficiency with minimal pipeline complexity (Lù, 2024, Askari et al., 2023, Seetharaman et al., 17 Jun 2025, Moreira et al., 2024, Lu et al., 2022).
