BM25S: Accelerated Sparse BM25 Retrieval
- BM25S is a Python-based information retrieval framework that precomputes term-document BM25 scores to enable rapid, vectorized query ranking.
- It leverages sparse matrix storage and BLAS-accelerated operations to achieve up to 500× speedups over conventional BM25 implementations.
- BM25S supports differential scoring for non-occurrence BM25 variants (such as BM25+ and BM25L), ensuring exact retrieval while maintaining memory efficiency during indexing and query-time aggregation.
BM25S (“BM25 via eager sparse scoring”) is a Python-based information retrieval framework that accelerates BM25 and its variants by precomputing all possible term-document BM25 scores at index time and storing them in a sparse matrix, thereby enabling orders-of-magnitude faster query-time ranked retrieval compared to conventional implementations. BM25S relies on NumPy, SciPy, and (optionally) libstemmer, and achieves up to 500× speedups over popular Python toolkits, while outperforming highly optimized Java-based solutions such as Elasticsearch by factors of 2–10× on many standard datasets. BM25S also generalizes to “non-occurrence” BM25 variants (such as BM25+ and BM25L) via a differential-score shifting method that preserves sparse storage and exact accuracy (Lù, 2024).
1. Classical BM25 Scoring
BM25 is a family of lexical ranking functions used in text retrieval. For a document collection of size $N$ and a query $Q$, classical BM25 assigns to each document $D$ a score

$$B(Q, D) = \sum_{t \in Q} S(t, D),$$

where the term-document score $S(t, D)$ is typically given (following Lucene) as:

$$S(t, D) = \mathrm{IDF}(t) \cdot \frac{\mathrm{tf}(t, D)}{\mathrm{tf}(t, D) + k_1\left(1 - b + b\,\frac{|D|}{L_{\mathrm{avg}}}\right)}, \qquad \mathrm{IDF}(t) = \ln\left(1 + \frac{N - \mathrm{df}(t) + 0.5}{\mathrm{df}(t) + 0.5}\right),$$

with the following standard components:
- $\mathrm{tf}(t, D)$: frequency of term $t$ in document $D$
- $|D|$: number of tokens in $D$
- $L_{\mathrm{avg}}$: average document length
- $k_1$, $b$: tunable parameters (e.g., $k_1 = 1.5$, $b = 0.75$)
- $\mathrm{df}(t)$: number of documents containing term $t$ (document frequency)

Conventional BM25 implementations compute (or look up) $S(t, D)$ at query time, evaluating it for every document $D$ with $\mathrm{tf}(t, D) > 0$ (via inverted indexes).
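As a concrete illustration, the Lucene-style scoring above can be sketched in a few lines of Python (function name and the toy statistics are mine; the common defaults $k_1 = 1.5$, $b = 0.75$ are assumed):

```python
import math

def bm25_term_score(tf, df, N, doc_len, avg_len, k1=1.5, b=0.75):
    """Lucene-style BM25 score of one term in one document."""
    idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf / (tf + norm)

# toy statistics: corpus of N=4 docs, term present in df=2 of them
score = bm25_term_score(tf=3, df=2, N=4, doc_len=10, avg_len=8.0)
```

Note that the score is zero when the term is absent (`tf=0`) and grows sublinearly in `tf`, saturating as `tf` dominates the length-normalization term.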
2. Eager Sparse Scoring and Matrix Construction
BM25S departs from the traditional inverted-index paradigm by eagerly evaluating each nonzero $S(t, D)$ during corpus indexing and storing the results as a sparse term-document matrix $M \in \mathbb{R}^{|V| \times N}$, where $V$ is the vocabulary:
- Each unique word token $t$ (possibly after stemming and stopword removal) is mapped to an integer row index $i(t)$.
- For document $D_j$ (column $j$), $M_{i(t), j} = S(t, D_j)$ for every $t \in D_j$.
- Terms not present in a document have $\mathrm{tf}(t, D_j) = 0$ and hence $S(t, D_j) = 0$; such entries are omitted (sparsity).

Matrix $M$ is stored in CSC (Compressed Sparse Column) format, optimizing for fast access to document-wise sums and efficient slicing by multiple term rows. During querying, for a query $Q = (q_1, \dots, q_m)$ with corresponding row indices $i(q_1), \dots, i(q_m)$, BM25S extracts the submatrix $M' = M[\{i(q_k)\}_{k=1}^{m}, :]$. Summing across rows gives the score vector for all $N$ documents:
```python
scores = np.array(M_prime.sum(axis=0)).ravel()  # M_prime: the sliced submatrix M'
```
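A toy end-to-end illustration of this lookup, with a hypothetical 3-term vocabulary and 2 documents (score values are made up for the example):

```python
import numpy as np
from scipy.sparse import csc_matrix

# Precomputed BM25 scores: rows = vocabulary terms, columns = documents.
M = csc_matrix(np.array([
    [0.0, 1.2],   # term 0: occurs only in doc 1
    [0.8, 0.0],   # term 1: occurs only in doc 0
    [0.5, 0.3],   # term 2: occurs in both docs
]))

query_rows = [1, 2]                     # query tokens map to rows 1 and 2
sub = M[query_rows, :]                  # 2 x 2 sparse submatrix
scores = np.asarray(sub.sum(axis=0)).ravel()
# scores[d] is the sum of precomputed S(t, d) over the query terms
```

Here `scores` comes out as `[1.3, 0.3]`: document 0 accumulates 0.8 + 0.5, document 1 accumulates 0.0 + 0.3, with no per-document arithmetic at query time.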
3. Extension to Non-Occurrence Variants: Differential Scoring
Variants of BM25 (such as BM25+, BM25L, and others) may assign a nonzero score even when $\mathrm{tf}(t, D) = 0$, i.e., when a term does not occur in a document. Let $S^0(t)$ denote this "non-occurrence score." For example, BM25+ uses:

$$S(t, D) = \mathrm{IDF}(t)\left(\frac{\mathrm{tf}(t, D)}{\mathrm{tf}(t, D) + k_1\left(1 - b + b\,\frac{|D|}{L_{\mathrm{avg}}}\right)} + \delta\right)$$

with $\delta > 0$ (commonly $\delta = 1$), so that $S^0(t) = \delta \cdot \mathrm{IDF}(t)$.
BM25S defines the "differential score":

$$S^\Delta(t, D) = S(t, D) - S^0(t).$$

For $\mathrm{tf}(t, D) = 0$, $S^\Delta(t, D) = 0$; thus the matrix of differential scores is still sparse. The aggregate BM25 score is exactly recovered as:

$$B(Q, D) = \sum_{t \in Q} S^\Delta(t, D) + \sum_{t \in Q} S^0(t).$$
BM25S stores only $S^\Delta$ in the sparse matrix; a small 1D array of $S^0(t)$ values (one per term) is maintained, and $\sum_{t \in Q} S^0(t)$ is computed once per query and added to the document scores. This approach generalizes to all BM25 variants covered in Kamphuis et al. (2020) without dense storage explosion.
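The differential-score trick can be sketched with toy numbers (IDF values, the shift $\delta$, and the matrix entries below are all hypothetical; the point is that the dense offset collapses to one scalar per query):

```python
import numpy as np
from scipy.sparse import csc_matrix

idf = np.array([0.9, 0.4, 1.1])   # hypothetical per-term IDF values
delta = 1.0                        # BM25+-style shift parameter
s0 = delta * idf                   # non-occurrence score S0(t), one per term

# Sparse matrix of differential scores S(t, d) - S0(t); entries are zero
# (and thus unstored) wherever a term does not occur in a document.
M_delta = csc_matrix(np.array([
    [0.0, 0.7],
    [0.3, 0.0],
    [0.2, 0.2],
]))

query_rows = [0, 2]
diff = np.asarray(M_delta[query_rows, :].sum(axis=0)).ravel()
scores = diff + s0[query_rows].sum()   # constant offset added once per query
```

Every document receives the same offset `s0[0] + s0[2] = 2.0`, so ranking by `scores` is exactly ranking by the full variant scores while the stored matrix stays as sparse as plain BM25's.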
4. Computational Complexity and Memory Analysis
BM25S exhibits the following complexity characteristics:
- Index-time: arithmetic operations linear in the total number of term occurrences to compute all $\mathrm{IDF}(t)$ and $S(t, D_j)$; assembling the CSC sparse matrix stores one entry per nonzero score (each entry: 8 bytes for the float plus two 4-byte indices).
- Query-time: For a query of length $m$ and $N$ documents, extracting $m$ rows and summing the sparse vectors takes $O\!\left(\sum_{k=1}^{m} n_{q_k}\right)$, where $n_{q_k}$ is the posting-list length (number of stored entries) for term $q_k$. Top-$k$ selection is performed by `np.argpartition` in expected $O(N)$ time.

In practice, queries are short and their posting lists touch only a small fraction of the stored entries, and fast C-based kernels dominate performance. For comparison:
- Naive Python implementations (e.g., Rank-BM25) recompute term scores for every query at the interpreter level, incurring Python-level operations per term-document pair.
- Java Lucene computes $S(t, D)$ at query time for each term-document match. BM25S shifts all per-occurrence computations to indexing, enabling high-throughput query-time ranking via vectorized linear algebra.
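As a sanity check that the vectorized sparse path computes exactly the same sums as a per-document loop (the kind of work a lazy implementation does at query time), here is a toy comparison on random data (all names and sizes are mine):

```python
import numpy as np
from scipy.sparse import csc_matrix

rng = np.random.default_rng(0)
# random "score matrix": 50-term vocabulary, 200 docs, ~5% of entries nonzero
dense = np.where(rng.random((50, 200)) < 0.05, rng.random((50, 200)), 0.0)
M = csc_matrix(dense)

query_rows = [1, 4, 9]
# BM25S-style: one sparse slice plus a single vectorized sum
fast = np.asarray(M[query_rows, :].sum(axis=0)).ravel()
# per-document Python loop over the same precomputed scores
slow = np.array([sum(dense[t, d] for t in query_rows) for d in range(200)])
```

Both paths produce identical score vectors; the sparse version replaces the interpreter-level double loop with a handful of C-level kernel calls whose cost tracks only the stored entries in the sliced rows.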
5. Empirical Benchmarks
BM25S was evaluated on 14 BEIR zero-shot benchmark datasets (e.g., ArguAna, Climate-FEVER, CQADupStack, DBPedia, FEVER, FiQA, HotpotQA, MS-MARCO, NFCorpus, NaturalQuestions, Quora, SciDocs, SciFact, TREC-COVID, Touche2020) on a single-threaded Intel Xeon (2.2 GHz, 30 GB RAM). Throughput is measured in queries per second (QPS):
| Dataset | BM25S QPS | Elasticsearch QPS | BM25-PT QPS | Rank-BM25 QPS | Relative Speedup (BM25S/ES) |
|---|---|---|---|---|---|
| ArguAna | 574 | 13.7 | 110.5 | 2.0 | ~42× |
| NFCorpus | 1,196 | 45.8 | 256.7 | 224.7 | ~26× |
On 10 of 14 datasets, BM25S is over an order of magnitude faster than Rank-BM25, peaking at roughly 287× on ArguAna (574 vs. 2.0 QPS). Against Java-based Elasticsearch, BM25S achieves severalfold higher QPS in most cases (e.g., ~26× on NFCorpus). NDCG@10 evaluation shows that adding a Snowball stemmer and English stopword removal can improve average effectiveness from 38.4 to 39.7, confirming parity or slight superiority versus established toolkits (Lù, 2024).
6. Implementation Details and Recipes
- Tokenization & Vocabulary: Default uses Scikit-Learn's regex `r"(?u)\b\w\w+\b"`, with optional stemming via the C-based libstemmer. Each token is mapped to its integer vocabulary index.
- Index Construction:
```python
import numpy as np
from scipy.sparse import csc_matrix

rows, cols, data = [], [], []
idf = np.log((N - df + 0.5) / (df + 0.5) + 1)   # shape (|V|,)
for d, doc_terms in enumerate(corpus):           # doc_terms: {term_index: tf}
    Ld = sum(doc_terms.values())                 # document length in tokens
    norm = k1 * (1 - b + b * (Ld / L_avg))
    for t_idx, tf in doc_terms.items():
        score = idf[t_idx] * tf / (tf + norm)
        rows.append(t_idx); cols.append(d); data.append(score)
M = csc_matrix((data, (rows, cols)), shape=(vocab_size, N))  # |V| x N
```
- Querying & Top-k Selection:
```python
sub = M[query_row_indices, :]                  # m x N sparse submatrix
scores = np.asarray(sub.sum(axis=0)).ravel()
topk_idx = np.argpartition(scores, -k)[-k:]
topk_sorted = topk_idx[np.argsort(scores[topk_idx])[::-1]]
```
- Non-occurrence Variants: Store the differential scores $S^\Delta$ in the matrix, with a small $|V|$-length array of base scores $S^0$; at query time, a single scalar addition applies the global offset.
- Optional Accelerations: Employ JAX's `jax.lax.top_k` for faster selection, or wrap matrix operations in a thread pool for multi-threaded throughput.
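The thread-pool suggestion can be sketched with only the standard library and SciPy (placeholder matrix and queries below are mine; real speedups depend on the underlying kernels releasing the GIL, so this is a structural sketch rather than a guaranteed win):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.sparse import csc_matrix

M = csc_matrix(np.eye(4))            # placeholder 4-term, 4-doc score matrix
queries = [[0, 1], [2], [1, 3]]      # precomputed query row indices

def score_one(rows):
    """Score a single query: slice matching rows and sum over them."""
    return np.asarray(M[rows, :].sum(axis=0)).ravel()

with ThreadPoolExecutor(max_workers=4) as pool:
    all_scores = list(pool.map(score_one, queries))
```

Because the index matrix is read-only at query time, queries are embarrassingly parallel and need no locking.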
7. Limitations and Deployment Considerations
- Index-Time Resource Usage: Each term occurrence is precomputed and stored as a float (vs. an integer count in classic inverted indexes). The matrix for a large corpus (millions of documents, large vocabularies) remains sparse, but RAM requirements can reach tens of GB.
- Parameter Fixity: Parameters $k_1$ and $b$ are baked into the scores at index time. Modifying them requires rebuilding the index, unlike Rank-BM25, which supports query-time parameter adjustment.
- Tokenizer Choice: The provided regex+stemmer combination offers a balance of speed and fidelity. Language-specific analyzers may require customization.
- Index Maintenance: BM25S is suited to mostly-static corpora. Document additions/deletions require partial or full reindexing; incremental updates are nontrivial.
- Non-Occurrence Overhead: BM25+ and related variants incur an extra per-query addition, which remains negligible relative to main computation.
BM25S "rolls up" all expensive term-document scoring into an index-time matrix, transforming query-time retrieval into a small set of vectorized operations plus top-$k$ selection. Its speed, exactness for BM25 and its variants, minimal dependencies, and ease of integration make it suitable for both research and production, from server deployments to browser-based Pyodide execution (Lù, 2024).