Recursive Model Index (RMI)
- Recursive Model Index (RMI) is a learned indexing method that employs a hierarchy of models to approximate key-to-position mappings and narrow search ranges.
- It uses multi-layered predictive models where each level refines the approximation, thereby achieving sub-logarithmic average lookup performance.
- Benchmarks indicate that RMIs can deliver competitive lookup latencies and memory trade-offs compared to both traditional and other learned index structures.
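As a rough, hedged illustration of the RMI idea (a two-level hierarchy of linear models; all names and the routing scheme here are assumptions for the sketch, not the reference implementation):

```python
def fit_linear(xs, ys):
    """Least-squares line y = a*x + b through the given points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx

def _route(root, key, n, num_leaves):
    """Root model maps a key to one of num_leaves second-level models."""
    a, b = root
    return min(num_leaves - 1, max(0, int((a * key + b) * num_leaves / n)))

def build_rmi(keys, num_leaves):
    """Two-level RMI sketch: the root routes each key to a leaf model,
    and each leaf refines the position estimate for its key range."""
    n = len(keys)
    root = fit_linear(keys, range(n))
    buckets = [[] for _ in range(num_leaves)]
    for pos, key in enumerate(keys):
        buckets[_route(root, key, n, num_leaves)].append((key, pos))
    leaves = [fit_linear([k for k, _ in b], [p for _, p in b]) if b else root
              for b in buckets]
    return root, leaves

def rmi_predict(model, key, n, num_leaves):
    """Predict the array position of key (still needs a local search)."""
    root, leaves = model
    a, b = leaves[_route(root, key, n, num_leaves)]
    return a * key + b
```

In a real RMI the prediction is followed by a bounded local search around the predicted position; the sketch only shows the model hierarchy.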
RadixSpline is a learned index structure designed to approximate the mapping from sorted keys to their array positions with high efficiency. It combines a piecewise-linear error-bounded spline with a compact radix table to facilitate single-pass construction, competitive lookup performance, and simple parameter tuning. RadixSpline has been demonstrated to be competitive in size and lookup latency with state-of-the-art learned indexes, such as Recursive Model Indexes (RMIs), while being substantially simpler to implement and build (Kipf et al., 2020).
1. Formal Specification
A RadixSpline index is defined over a sorted list of key/position pairs
and consists of two components:
(a) Error-bounded spline approximation:
The spline is a piecewise-linear function f comprising knots (k_1, p_1), ..., (k_m, p_m), where the k_i are keys, the p_i are positions, and E is a moderate error bound. For each x in [k_i, k_{i+1}], f interpolates linearly between (k_i, p_i) and (k_{i+1}, p_{i+1}). For all data points (k, p), |f(k) - p| ≤ E.
(b) Flat radix table:
A table T maps the r most significant bits (after removing any fixed common leading prefix) of a query key to an interval in the knot array. For a radix value b,
- T[b]: smallest knot index i such that k_i's top-r-bit prefix is ≥ b
- T[b+1]: likewise for prefix b+1
At lookup, the spline segment for a query key with radix value b is bracketed between knots T[b] and T[b+1].
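A toy sketch of this bracketing, using 8-bit keys for readability; the function names and the ≥-fill convention are illustrative assumptions, not the reference code:

```python
NUM_BITS = 8  # toy key width; the paper targets 64-bit keys

def prefix(key: int, r: int) -> int:
    """Top-r bits of an (already prefix-stripped) key."""
    return key >> (NUM_BITS - r)

def build_table(knot_keys, r):
    """T[b] = smallest knot index whose top-r-bit prefix is >= b;
    the sentinel T[2^r] equals the number of knots."""
    m = len(knot_keys)
    table = [m] * (2**r + 1)
    for i in reversed(range(m)):
        table[prefix(knot_keys[i], r)] = i
    for b in reversed(range(2**r)):  # fill unassigned slots right-to-left
        table[b] = min(table[b], table[b + 1])
    return table

def bracket(table, key, r):
    """Return [lo, hi] such that the first knot with key greater than
    `key` has index in [lo, hi]; only this range needs binary search."""
    b = prefix(key, r)
    return table[b], table[b + 1]
```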
2. Single-Pass Construction
RadixSpline supports single-pass construction by employing the Greedy Spline Corridor algorithm. The spline and radix table are built together in a single scan over the sorted keys.
Key steps:
- Initialize the spline with the first data point.
- Track a corridor (range of feasible slopes) that maintains the error bound E.
- When the corridor is violated by a new data point, emit a new knot, and update the radix table for the affected prefix region.
- The process continues, emitting knots and filling the radix table on-the-fly.
- After the last point, fill any unassigned entries in T with the most recent value.
Complexity:
- Time: O(n); each key is examined once.
- Space: O(m + 2^r) for m spline knots and a 2^r-entry table, where m ≪ n for uniform data and r is user-specified.
Pseudocode Overview:
The construction pseudocode, as presented in the source, is as follows (abbreviated here for clarity; exact code and formulas in (Kipf et al., 2020)):
```
def BuildRadixSpline(D, E, r):
    initialize state, knots, T
    for each (k, p) in D[1:]:
        update slope corridor
        if violated:
            emit knot, fill radix table
            reset corridor
    emit final knot, fill radix table
    finalize T
    return (knots, T)
```
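The corridor idea can be made concrete with a runnable sketch, under stated assumptions (distinct sorted integer keys, at least two of them); this is a simplification, not the reference implementation:

```python
import bisect

def build_spline(keys, error):
    """One-pass greedy corridor: emit a knot whenever the exact slope
    from the current base knot to the new point leaves the corridor of
    slopes that keeps every seen point within +/- error."""
    n = len(keys)  # assumes n >= 2 distinct, sorted keys
    knots = [(keys[0], 0)]
    base_k, base_p = keys[0], 0
    lo, hi = float("-inf"), float("inf")
    prev_k, prev_p = keys[0], 0
    for p in range(1, n):
        k = keys[p]
        s = (p - base_p) / (k - base_k)  # exact slope to the new point
        if s < lo or s > hi:
            knots.append((prev_k, prev_p))  # corridor violated: emit knot
            base_k, base_p = prev_k, prev_p
            lo = (p - error - base_p) / (k - base_k)  # restart corridor
            hi = (p + error - base_p) / (k - base_k)
        else:
            lo = max(lo, (p - error - base_p) / (k - base_k))
            hi = min(hi, (p + error - base_p) / (k - base_k))
        prev_k, prev_p = k, p
    knots.append((keys[-1], n - 1))
    return knots

def predict(knots, key):
    """Interpolate the position of `key` between its bracketing knots."""
    ks = [k for k, _ in knots]
    i = bisect.bisect_right(ks, key) - 1
    if i >= len(knots) - 1:
        return knots[-1][1]
    (k0, p0), (k1, p1) = knots[i], knots[i + 1]
    return p0 + (key - k0) * (p1 - p0) / (k1 - k0)
```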
3. Lookup Procedure and Complexity
Index lookup consists of three phases:
- Radix Table Bracketing: Extract the top-r bits of the query key to obtain radix value b, then read T[b] and T[b+1].
- Binary Search on Knots: Within knots T[b] through T[b+1], identify the segment i such that k_i ≤ key < k_{i+1}.
- Final Binary Search in Array: Compute the predicted position p̂ via spline interpolation, then search positions [p̂ - E, p̂ + E] of the key array for the true key.
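A minimal sketch of the lookup phases, assuming for brevity that the radix bracketing (phase 1) is replaced by a binary search over all knots; `lookup`, `data`, and `knots` are illustrative names:

```python
import bisect

def lookup(data, knots, error, key):
    """Find `key` in sorted `data` using spline knots [(key, pos), ...]
    whose interpolation error is at most `error`."""
    # Phase 2: binary-search the knots for key's segment
    ks = [k for k, _ in knots]
    i = bisect.bisect_right(ks, key) - 1
    if i < 0:
        return None  # key below the first knot
    if i >= len(knots) - 1:
        pred = knots[-1][1]
    else:
        (k0, p0), (k1, p1) = knots[i], knots[i + 1]
        pred = p0 + (key - k0) * (p1 - p0) / (k1 - k0)
    # Phase 3: binary-search only the 2E-wide window around the prediction
    lo = max(0, int(pred) - error)
    hi = min(len(data), int(pred) + error + 2)
    j = bisect.bisect_left(data, key, lo, hi)
    return j if j < len(data) and data[j] == key else None
```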
Complexity:
- O(1) for radix table access.
- O(log(T[b+1] - T[b])) for the binary search over the bracketed knot range; typically near-constant with an appropriate r.
- O(log E) for the final binary search over the 2E + 1 candidate array slots.
The worst-case overall complexity is O(log n), but with tuned parameters typical queries achieve sub-logarithmic average performance.
4. Parameter Space and Trade-off Analysis
RadixSpline has two parameters:
- E: Error bound on the spline interpolation.
- r: Number of radix bits used (2^r table entries).
Trade-offs:
- Decreasing E yields more knots (larger m), increasing spline memory but shrinking the ±E refinement window searched during lookup.
- Increasing r enlarges the radix table but reduces the average number of knots per radix slot, thus limiting the binary-search span over knots.
Heuristic:
Select E so that the resulting number of knots m fits the memory available for the spline, then set r to use the remaining memory budget for the radix table.
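The heuristic can be grounded with a back-of-envelope size model; the per-entry byte counts (8-byte keys and positions per knot, 32-bit table entries) are assumptions for the sketch, not measurements from the paper:

```python
def index_size_bytes(num_knots: int, r: int) -> int:
    """Approximate index footprint: spline knots plus flat radix table."""
    spline_bytes = num_knots * (8 + 8)  # assumed 8-byte key + 8-byte position
    table_bytes = (2**r) * 4            # assumed 32-bit table entries
    return spline_bytes + table_bytes
```

Under this model, for example, one million knots with r = 18 cost roughly 15 MiB of spline plus exactly 1 MiB of radix table.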
Concrete Example (face dataset):
- A tight error bound E: many knots, total index size 650 MiB, lowest latency.
- A looser E: far fewer knots, index size 200 MiB, and lookups only 11.5% slower; the space/time trade-off improves significantly for a modest latency inflation.
5. Experimental Performance and Evaluation
Evaluation was performed using the SOSD benchmark with 200M 64-bit keys on an AWS c5.4xlarge (single-threaded) (Kipf et al., 2020).
- Competing methods: Binary Search (BS), STX B+-tree (stride=32), Adaptive Radix Tree (ART), Recursive Model Index (RMI), and RadixSpline (RS).
- Datasets: amzn, face, logn, osmc, wiki, etc.
Summary Table: Performance Metrics
| Index | Build Time (s) | Lookup Latency (ns/op) | Size (MiB) |
|---|---|---|---|
| BS | none | ~850 | ~100 |
| BTree | ~0.7 | ~600 | ~100 |
| ART | ~0.7 | ~300 | ~100 |
| RMI | 3–6 | 120–250 | ~100 |
| RS (tuned) | ~0.9 | 130–280 | 100–650 (dataset-dependent) |
Key findings highlight that RS achieves:
- Single-pass build time (~0.9 s), faster than RMI (3–6 s).
- Competitive lookup latency (130–280 ns).
- Tunable index size, matching or exceeding the compactness of non-learned and learned competitors depending on the chosen E and r.
Trade-off trends (face):
- As E increases: build time drops (1.2 s → 0.4 s), memory shrinks (650 MiB → ≤100 MiB), and latency moderately degrades (180 ns → 300 ns).
- As r increases: table memory grows (4 MiB → 256 MiB), with only a moderate latency reduction from better knot bracketing.
LSM-Tree Case Study (RocksDB, osmc):
- Replacing the per-SSTable B+-tree index with RS on a 400M-operation mixed workload:
- Read latency decreases by 20%
- Write latency increases by 4% (owing to one-pass builds on compaction)
- Total time reduced from 712s to 521s (−27%)
- Index memory reduced by 45% (freeing space for larger Bloom filters)
6. Implementation Recommendations and Optimization
- Minimalistic codebase (~100 lines of idiomatic C++) without any third-party ML dependencies.
- Drop any common most-significant key prefix prior to radix table construction to keep r small and the table compact.
- Store knot arrays contiguously; use 32-bit integers for table entries (sufficient while the number of knots is below 2^32).
- Fill the radix table in a left-to-right pass interleaved with knot emission for improved cache performance.
- In highly skewed data, radix slots may span more knots than average; fallback strategies (e.g., local tree index) may be applied per slot.
- The sole required algorithmic dependency is a fast binary search over integers.
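The prefix-stripping recommendation above can be sketched as follows; the XOR trick and the function name are illustrative, not taken from the reference code:

```python
def common_prefix_bits(min_key: int, max_key: int, num_bits: int = 64) -> int:
    """Leading bits shared by every key in [min_key, max_key]; these
    bits carry no information, so dropping them before extracting the
    r radix bits keeps the table small and its slots discriminative."""
    diff = min_key ^ max_key  # high bits where min and max agree are 0
    return num_bits if diff == 0 else num_bits - diff.bit_length()
```

Because the keys are sorted, the shared prefix of the minimum and maximum key is also shared by every key in between.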
7. Context and Related Work
RadixSpline leverages the GreedySplineCorridor algorithm of Neumann and Michel for efficient, one-pass, error-bounded spline approximation. It differs fundamentally from RMIs by:
- Being trainable in a single pass, versus multi-epoch or multi-model training for RMI
- Using only two interpretable parameters for tuning
- Supporting direct analytical reasoning about space/time trade-offs
The design intent is to enable system programmers to reliably tune and deploy single-pass learned indexes competitive with the best multi-pass techniques, using only standard library operations and algorithmic constructs (Kipf et al., 2020).