
BM25S Reranker

Updated 9 January 2026
  • BM25S reranking is a hybrid retrieval method that leverages exact BM25 scores to enhance candidate ranking in multi-stage search pipelines.
  • It employs efficient sparse matrix techniques and token injection into neural and LLM models, achieving up to 500× speed improvements.
  • Empirical evaluations on benchmarks like MSMARCO and TREC DL show BM25S bridges lexical and semantic matching gaps to improve ranking metrics.

A BM25S reranker is an approach within information retrieval pipelines that explicitly exploits BM25 lexical scores for enhanced ranking effectiveness in multi-stage retrieval systems. The term “BM25S” is used in two distinct but related contexts: as a practical, highly optimized Python implementation for efficient BM25 reranking (Lù, 2024), and as a general framework for injecting BM25 scores (as either tokens or features) into neural or LLM-based rerankers—sometimes referred to as “BM25-as-text” (Askari et al., 2023). Both variants address the challenge of integrating efficient exact-matching signals into neural re-ranking, boosting performance over conventional neural or interpolated hybrid systems.

1. Principles of BM25S Reranking

BM25S reranking operationalizes a two-stage architecture:

  • Stage 1 (Lexical Retrieval): A fast, recall-oriented BM25 retriever narrows the candidate set to the top-n passages for a query.
  • Stage 2 (Re-Ranking): Candidates are rescored either by applying BM25S directly (as a lightweight, exact reranker) or by fusing their BM25 scores into neural/LLM-based re-rankers so that lexical and semantic signals are exploited jointly (Lù, 2024, Askari et al., 2023).

Distinct from other hybrid or ensemble strategies, BM25S reranking can involve direct input injection of normalized BM25 scores as text tokens within cross-encoder models, or as part of reranker prompts for LLMs—meaning the model uses the actual BM25 value as an explicit feature during inference rather than relying on linear interpolation or post-hoc fusion.
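The distinction can be made concrete with two toy sketches: conventional hybrids fuse scores after inference, whereas BM25S-style injection places the score inside the model input. The token layout follows the BM25-as-text format described below; the helper names are illustrative, not an actual API.

```python
def interpolate(bm25, neural, alpha=0.5):
    # conventional hybrid: linearly fuse the two scores after inference
    return alpha * bm25 + (1 - alpha) * neural

def inject(query, passage, bm25_token):
    # BM25S-style: the normalized BM25 value becomes part of the model input,
    # so attention layers can condition on it during inference
    return f"[CLS] {query} [SEP] {bm25_token} [SEP] {passage} [SEP]"

fused = interpolate(0.8, 0.6)
inp = inject("what is bm25", "BM25 is a ranking function ...", "42")
```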

2. Mathematical Formulation and Implementation Details

BM25 Scoring:

For a query Q = \{q_1, \dots, q_{|Q|}\} and document D:

B(Q,D) = \sum_{i=1}^{|Q|} \mathrm{IDF}(q_i, C) \cdot \frac{\mathrm{TF}(q_i, D)}{\mathrm{TF}(q_i, D) + k_1 (1 - b + b \frac{|D|}{\mathrm{avgdl}})}

where:

  • \mathrm{TF}(q_i, D): term frequency of q_i in document D
  • \mathrm{IDF}(q_i, C): inverse document frequency of q_i in corpus C
  • k_1, b: tunable parameters
  • |D|: document length; \mathrm{avgdl}: average document length in C.
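The scoring formula above can be implemented directly in a few lines of plain Python. This is a didactic sketch using a Lucene-style IDF as one common choice; the bm25s library itself is far more optimized.

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query with the BM25
    formula above (Lucene-style IDF; k1 and b at common defaults)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for q in query:
        df = sum(1 for d in corpus if q in d)              # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)  # Lucene-style IDF
        denom = tf[q] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[q] / denom
    return score

corpus = [["fast", "sparse", "search"],
          ["neural", "reranker"],
          ["sparse", "matrix", "scoring"]]
s = bm25_score(["sparse", "search"], corpus[0], corpus)
```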

BM25S (Eager Sparse Scoring):

BM25S (Lù, 2024) constructs a sparse matrix M where each entry corresponds to a term-document contribution S(t, D) precomputed at index time:

S(t, D) = \mathrm{IDF}(t, C) \cdot \frac{\mathrm{TF}(t, D)}{\mathrm{TF}(t, D) + k_1 (1 - b + b \frac{|D|}{\mathrm{avgdl}})}

At query time, BM25 scores for candidates are efficiently obtained by slicing and aggregating entries of M for the query terms, enabling substantial speedups (up to 500× over baseline Python implementations).
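The eager-scoring idea can be sketched with plain dictionaries standing in for the sparse matrix. This mimics the index-time/query-time split, not the bm25s library's actual implementation.

```python
import math
from collections import Counter, defaultdict

def build_index(corpus, k1=1.5, b=0.75):
    """Index time: precompute S(t, D) for every term-document pair that
    actually occurs, i.e. the nonzero entries of the sparse matrix M."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    df = Counter(t for d in corpus for t in set(d))
    index = defaultdict(dict)  # term -> {doc_id: precomputed contribution}
    for j, doc in enumerate(corpus):
        for t, f in Counter(doc).items():
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            index[t][j] = idf * f / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return index

def rescore(index, query, candidates):
    """Query time: slice the precomputed rows and sum; no recomputation."""
    return sorted(((sum(index[t].get(j, 0.0) for t in query), j)
                   for j in candidates), reverse=True)

corpus = [["sparse", "matrix", "scoring"],
          ["neural", "reranker"],
          ["sparse", "search", "engine"]]
ranked = rescore(build_index(corpus), ["sparse", "search"], [0, 1, 2])
```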

BM25-as-Text Injection in Neural/LLM Rerankers:

BM25S (Askari et al., 2023) also refers to injecting normalized BM25 scores as input tokens to neural rerankers:

  • For a sequence [CLS] query [SEP] BM25_TOKEN [SEP] passage [SEP], the BM25 score (globally min-max normalized, scaled, and integerized) is placed between the query and passage.
  • The neural model’s self- and cross-attention layers allow the BM25 score token to directly influence the learned representation.

3. Score Normalization and Injection Strategies

The raw BM25 scores are unbounded and non-uniformly distributed. The normalization and injection process is critical:

  • Normalization Options:
    • Local min–max: (s - \min S)/(\max S - \min S), where S is the set of top-k candidate BM25 scores for the query.
    • Global min–max: fixed corpus-level bounds (e.g., min = 0, max = 50).
    • Z-score (local/global) and sum-normalized variants were also studied.
    • Best practice: normalize to [0, 1], multiply by 100, and cast to an integer, e.g., “12” for 12% of the maximum observed score (Askari et al., 2023).
  • Injection:

The BM25 token is inserted as an actual text token in the neural reranker input. For LLM-based rerankers (e.g., InsertRank (Seetharaman et al., 17 Jun 2025)), BM25 scores are included explicitly in the prompt for each candidate document, allowing the LLM to reason over both the passage content and its lexical relevance.
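The normalization options above can be sketched as follows; the global bounds of 0 and 50 are the illustrative values mentioned earlier, not universal constants.

```python
def local_minmax(scores):
    # normalize against the top-k candidate scores of this query
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def global_minmax(s, lo=0.0, hi=50.0):
    # normalize against fixed corpus-level bounds, clamped to [0, 1]
    return min(max((s - lo) / (hi - lo), 0.0), 1.0)

def to_token(norm):
    # scale to [0, 100] and integerize, yielding the injected text token
    return str(int(round(norm * 100)))

token = to_token(global_minmax(21.0))  # 21/50 = 0.42 -> "42"
```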

4. Empirical Performance and Analysis

BM25S (Efficient Reranker):

  • On BEIR and MSMARCO, the eager-sparse BM25S framework achieves 2–500× faster reranking than popular alternatives, with rankings mathematically identical to standard BM25 (Lù, 2024).
  • Five major BM25 variants are supported via “score shifting” to remain sparse, including Okapi, Lucene, ATIRE, BM25L, and BM25+.

BM25-as-Text Neural Reranking:

  • Statistically significant gains achieved on MSMARCO and TREC DL:
    • On MSMARCO: MiniLM_CAT nDCG@10 = 0.419 vs. MiniLM_BM25CAT = 0.424, with similar lifts on MAP and MRR@10 (Askari et al., 2023).
    • BERT-Large_BM25CAT yields nDCG@10 = 0.728 vs. 0.695 for the vanilla model (TREC DL-20).
  • Across all query types (Abbreviation, Location, Description, Human, Numeric, Entity), BM25S injection outperforms baseline cross-encoders by 2–4 MRR points.
  • For exact-match queries (e.g., masked-passage overlap), BM25S-enhanced models surpass both BM25 and standard BERT, bridging the exact-match “gap” typically observed in Transformer rerankers (Askari et al., 2023).

LLM Listwise Reranking with BM25 Injection (InsertRank):

  • On complex reasoning benchmarks (BRIGHT, R2MED), inserting BM25 scores into LLM prompts consistently improves NDCG@10 by up to 16.3% relative for Gemini 2.5, and modestly (0.6–3.3%) for other LLMs (Seetharaman et al., 17 Jun 2025).
  • BM25 injection mitigates, but does not eliminate, document ordering bias in listwise LLM rerankers.
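A listwise prompt with injected scores might be assembled as below. The exact InsertRank prompt template is not reproduced here; the wording and layout are illustrative assumptions.

```python
def listwise_prompt(query, passages, bm25_scores):
    """Build a listwise reranking prompt that exposes each candidate's
    BM25 score alongside its text."""
    lines = [f"Query: {query}",
             "Rank the following passages by relevance to the query."]
    for i, (p, s) in enumerate(zip(passages, bm25_scores), start=1):
        lines.append(f"[{i}] (BM25: {s:.1f}) {p}")
    return "\n".join(lines)

prompt = listwise_prompt("treatment for gout",
                         ["Allopurinol lowers uric acid ...",
                          "Gout is a form of arthritis ..."],
                         [12.3, 9.8])
```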

5. Integration in Retrieval Pipelines

BM25S rerankers are adopted as either:

  • Pure BM25 rescoring mechanisms (for high-throughput, latency-critical second-stage reranking) using precomputed sparse matrices (Lù, 2024).
  • Score-injection enhancements for neural/LLM cross-encoders and prompt-based rerankers, requiring only minor pipeline modifications (token or prompt augmentation) and no retraining or extra model ensemble steps (Askari et al., 2023, Seetharaman et al., 17 Jun 2025).
  • In multi-stage QA or RAG (retrieval-augmented generation) systems, a typical pipeline is:

    1. BM25 or hybrid retriever produces top-K candidate passages.
    2. Neural cross-encoder (BM25S version) reranks candidates with BM25 score injection.
    3. Final selection is made based on the new ranking (Moreira et al., 2024).
  • BM25S reranking can be combined with hybrid retrievers trained on sparse+dense rankings for additional robustness (cf. HYRR framework (Lu et al., 2022)).
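The three-step pipeline above can be sketched end to end; `toy_bm25` and `toy_cross_encoder` are illustrative stand-ins, not real retriever or reranker APIs.

```python
def toy_bm25(query, corpus, k):
    # Step 1: recall-oriented lexical retrieval (term overlap stands in for BM25)
    scored = sorted(((sum(doc.count(t) for t in query), j)
                     for j, doc in enumerate(corpus)), reverse=True)
    return [(j, float(s)) for s, j in scored[:k]]

def toy_cross_encoder(query, passage, bm25_score):
    # Step 2: hypothetical BM25S-style reranker; a real system would feed
    # the injected BM25 score token through a trained cross-encoder
    semantic = len(set(query) & set(passage)) / max(len(set(query)), 1)
    return semantic + 0.1 * bm25_score

def rerank_pipeline(query, corpus, k=3):
    candidates = toy_bm25(query, corpus, k)
    # Step 3: final selection under the new ranking
    return sorted(candidates,
                  key=lambda c: toy_cross_encoder(query, corpus[c[0]], c[1]),
                  reverse=True)

corpus = [["bm25", "sparse", "ranking"],
          ["dense", "vectors"],
          ["bm25", "reranker", "pipeline"]]
order = [j for j, _ in rerank_pipeline(["bm25", "reranker"], corpus)]
```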

6. Limitations and Future Directions

  • Order Sensitivity: LLM-based rerankers, even with BM25 injection, remain sensitive to document order; partial mitigation is possible but setwise or rank-aggregation approaches may be needed (Seetharaman et al., 17 Jun 2025).
  • Score Selection: Overreliance on BM25 anchors may induce bias or reduce diversity; adaptive weighting or gating strategies have been identified as plausible avenues for improvement.
  • Expansion to Additional Signals: While current BM25S approaches focus on BM25 exclusively, extension to other lexical (TF-IDF, lexical diversity) or dense retrieval scores is underexplored.
  • Scalability and Memory: For extremely large collections, practitioners should consider chunking, memory-mapping, and multi-threaded processing to maintain BM25S throughput advantages (Lù, 2024).
  • License and Commercial Use: In industry, model and data licensing must be verified, and deployment best practices (model sharding, batching, caching) are critical (Moreira et al., 2024).

7. Comparative Summary of BM25S Approaches

| BM25S Context | Method Summary | Empirical Gains |
|---|---|---|
| Eager Sparse (BM25S) | Precomputed sparse matrix, pure BM25 rerank | 2–500× speedup vs. Python baselines |
| BM25-as-Text (Neural) | Inject normalized BM25 scores as input tokens | +0.02 MRR@10, +0.04 nDCG@10 |
| BM25-in-Prompt (LLM) | BM25 scores in listwise LLM reranker prompts | +0.8–16.3% NDCG@10 |

BM25S reranking presents an efficient, modular, and empirically robust solution at the intersection of sparse and neural ranking, enabling enhanced retrieval accuracy, interpretability, and serving efficiency with minimal pipeline complexity (Lù, 2024, Askari et al., 2023, Seetharaman et al., 17 Jun 2025, Moreira et al., 2024, Lu et al., 2022).
