Maximum Inner Product Search (MIPS)
- MIPS is an algorithmic task that retrieves the vector(s) with the highest inner product with a query, essential for recommender systems and embedding-based searches.
- Advanced methods like asymmetric LSH, bandit algorithms, and proximity graphs enable sublinear query times by overcoming challenges of high dimensionality and norm bias.
- Theoretical guarantees and empirical results demonstrate significant speedups and high recall in large-scale applications, making MIPS a critical tool for real-world data retrieval.
Maximum Inner Product Search (MIPS) is the algorithmic task of, given a database of vectors $X = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ and a query $q \in \mathbb{R}^d$, efficiently retrieving the vector(s) maximizing the inner product $\langle x, q \rangle$. MIPS is fundamental in recommender systems, large-scale classification, retrieval-augmented generation, and embedding-based search. Algorithmic research has produced a wide array of techniques spanning exact, approximate, dense, and sparse settings, with theoretical and empirical underpinnings informed by advances in randomized, hashing, bandit, graph, and quantization-based indexing.
1. Formal Problem Statement and Complexity
Let $X = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ and $q \in \mathbb{R}^d$. The exact MIPS problem is
$$x^* = \arg\max_{x \in X} \langle x, q \rangle.$$
For top-$k$ variants, return the $k$ vectors with the largest inner products with $q$. The naïve approach computes all $n$ products in $O(nd)$ time per query. High dimensionality and large $n$ make linear scan impractical, motivating the design of sublinear-time algorithms with provable or empirical accuracy-speed tradeoffs (Liu et al., 2018).
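The $O(nd)$ linear-scan baseline that all of the methods below are measured against can be sketched in a few lines (the function name `exact_mips` is illustrative, not from any cited system):

```python
# Exact top-k MIPS by brute force: compute all n inner products in O(nd),
# then keep the k largest. This is the reference baseline for sublinear methods.
import heapq

def exact_mips(X, q, k=1):
    """Return indices of the k database vectors with the largest <x, q>."""
    scores = ((sum(xj * qj for xj, qj in zip(x, q)), i) for i, x in enumerate(X))
    return [i for _, i in heapq.nlargest(k, scores)]

X = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
q = [0.0, 1.0]
print(exact_mips(X, q, k=2))  # scores are 0.0, 0.5, 2.0 -> [2, 1]
```

Everything that follows trades preprocessing, memory, or exactness to beat this scan.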
2. Indexing and Search Methodologies
A. Locality-Sensitive Hashing (LSH) and Asymmetric Transformations
Standard LSH, designed for metric similarities such as $\ell_2$ or cosine, fails for the inner product, which is not a metric: it violates the triangle inequality, and a point need not maximize the inner product with itself. To overcome this, asymmetric transformations reduce MIPS to Euclidean or cosine nearest-neighbor search. L2-ALSH and Sign-ALSH apply such mappings, making standard LSH applicable; the latter achieves state-of-the-art collision gaps and practical performance (Shrivastava et al., 2014). Simple-LSH and its improvements (Norm-Ranging/Norm-Range Partition) address norm imbalance by partitioning the dataset by $\ell_2$ norm and building per-block LSH indices, substantially reducing hash exponents and query costs (Yan et al., 2018, Yan et al., 2018).
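The asymmetric-transform idea can be illustrated with a Simple-LSH-style mapping: scale the database into the unit ball, append a norm-completion coordinate to each item and a zero to the normalized query, after which cosine similarity on the transformed vectors ranks items exactly as the inner product does. The sketch below (helper name `transform` is illustrative) verifies that rank equivalence:

```python
# Simple-LSH-style asymmetric transform: x -> [x/m, sqrt(1 - ||x/m||^2)],
# q -> [q/||q||, 0], where m is the max database norm. All transformed items
# are unit vectors, so sign-random-projection LSH for cosine similarity applies.
import math

def transform(X, q):
    m = max(math.sqrt(sum(v * v for v in x)) for x in X)   # max data norm
    Xt = []
    for x in X:
        xs = [v / m for v in x]                            # scale into unit ball
        n2 = sum(v * v for v in xs)
        Xt.append(xs + [math.sqrt(max(0.0, 1.0 - n2))])    # norm-completion coord
    qn = math.sqrt(sum(v * v for v in q))
    qt = [v / qn for v in q] + [0.0]                       # unit query, extra 0
    return Xt, qt

# <x', q'> = <x, q> / (m * ||q||), so the cosine argmax equals the MIPS argmax.
X = [[3.0, 1.0], [1.0, 2.0], [0.5, 0.5]]
q = [1.0, 1.0]
Xt, qt = transform(X, q)
dots = [sum(a * b for a, b in zip(x, q)) for x in X]
cos = [sum(a * b for a, b in zip(x, qt)) for x in Xt]
assert max(range(3), key=dots.__getitem__) == max(range(3), key=cos.__getitem__)
```

The asymmetry (different maps for data and query) is exactly what lets a symmetric hash family handle the non-metric inner product.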
B. Bandit and Sampling Algorithms
The key insight is to interpret each database vector $x_i$ as a multi-armed bandit arm, with the arm's reward list being the set of per-coordinate products $x_{ij} q_j$. The fixed-confidence best-arm identification objective is to find, with probability at least $1 - \delta$, a vector whose inner product is within $\epsilon$ of the optimum while minimizing coordinate accesses. The BoundedME algorithm adapts median elimination to this "finite-reward, bounded-pull" model, yielding explicit guarantees and eliminating the need for preprocessing (Liu et al., 2018). BanditMIPS advances this further by randomly subsampling coordinates of each candidate, adaptively evaluating more coordinates for promising candidates. Its sample complexity is independent of $d$, with dependence instead on the inter-arm gaps, yielding high accuracy at optimal sample complexity in high dimensions (Tiwari et al., 2022).
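The "arm pull" primitive these methods build on is a cheap unbiased estimate of $\langle x, q \rangle$ from a few random coordinates; a confidence-based elimination schedule (omitted here) would then adaptively drop weak arms. A minimal sketch of the estimator, in the spirit of BanditMIPS but not its actual implementation:

```python
# Coordinate-subsampling estimator: sampling coordinate j uniformly and
# returning d * x_j * q_j is an unbiased estimate of <x, q>; averaging
# n_samples such pulls reduces variance.
import random

def sampled_dot(x, q, n_samples, rng):
    d = len(x)
    idx = [rng.randrange(d) for _ in range(n_samples)]
    return d * sum(x[j] * q[j] for j in idx) / n_samples  # unbiased estimate

rng = random.Random(0)
x = [1.0] * 1000
q = [0.5] * 1000
est = sampled_dot(x, q, 100, rng)   # true <x, q> = 500; here every pull is exact
print(est)
```

A bandit MIPS method wraps this estimator in best-arm identification: arms whose confidence interval falls below the leader's are eliminated, so accurate (expensive) estimates are spent only on near-optimal candidates.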
C. Sparse and Hybrid Schemes
Sparse MIPS, crucial for retrieval-augmented generation and IR, employs inverted indices with SIMD-accelerated batched accumulation (SINDI), cache-optimized windowing, and mass pruning for rapidly discarding irrelevant postings. This enables order-of-magnitude increases in QPS at >99% recall, confirmed across large open and industrial datasets (Li et al., 10 Sep 2025). Hybrid methods treat dense and sparse MIPS uniformly via sketching and IVF-based partitioning, with spherical k-means providing cluster-based dynamic pruning for unified high-recall retrieval (Bruch et al., 2023).
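The core inverted-index accumulation pattern underlying such systems is simple; real engines like SINDI add SIMD batching, windowing, and mass pruning on top of it. A minimal sketch (vectors as `{dimension: weight}` dicts; function names are illustrative):

```python
# Inverted-index sparse MIPS: postings are stored per dimension, so a query
# touches only the dimensions where it is nonzero, accumulating partial scores.
from collections import defaultdict

def build_index(X):
    index = defaultdict(list)            # dimension -> [(doc_id, weight), ...]
    for i, x in enumerate(X):
        for dim, w in x.items():
            index[dim].append((i, w))
    return index

def search(index, q, k=1):
    scores = defaultdict(float)
    for dim, qw in q.items():            # only query-active dimensions scanned
        for i, w in index.get(dim, ()):
            scores[i] += qw * w
    return sorted(scores, key=scores.get, reverse=True)[:k]

X = [{0: 1.0, 5: 2.0}, {5: 3.0}, {1: 4.0}]
q = {5: 1.0, 1: 1.0}
print(search(build_index(X), q, k=2))    # scores: 2.0, 3.0, 4.0 -> [2, 1]
```

Cost scales with the postings of query-active dimensions rather than with $nd$, which is why sparsity makes this regime so fast.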
D. Proximity Graph and Dual-Metric Indexing
Empirically, MIPS solutions exhibit strong norm bias: true maxima almost always occur among high-norm items. Graph-based methods (e.g., ip-NSW, the Möbius graph, SSG) exploit this by constructing proximity graphs under IP or Euclidean metrics; however, naive IP-only walks can stall in local optima. ip-NSW+ mitigates this by introducing an angular (cosine) graph for directionality and running a two-phase search, achieving up to 11× speedup over IP-only walks (Liu et al., 2019). Recent theoretical breakthroughs prove that MIPS is equivalent to Euclidean NN search for a query scaled by a sufficiently large factor, allowing direct application of advanced NNS graph indices and edge-pruning rules (e.g., SSG, MRNG) with no distortion or loss of topological fidelity (Chen et al., 10 Mar 2025, Chen et al., 21 Apr 2025). Metric-amphibious index construction (MAG) further "stitches" IP and Euclidean connectivity, adaptively tuning the edge mix and search path based on dataset-specific indicators (norm variation, Davies–Bouldin indices) to maximize global connectivity and local convergence (Chen et al., 21 Apr 2025).
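The query-scaling equivalence has a short numeric demonstration: since $\|x - cq\|^2 = \|x\|^2 - 2c\langle x, q\rangle + c^2\|q\|^2$ and the last term is constant over $x$, the $2c\langle x, q\rangle$ term dominates the $\|x\|^2$ differences once $c$ is large enough, so the Euclidean nearest neighbor of $cq$ is the MIPS answer. A sketch (the scale $c = 10^6$ is an assumption sufficient for this random instance, not a general prescription):

```python
# Query-scaling reduction: for large enough c, argmin ||x - c*q||^2 over the
# database equals argmax <x, q>, so any Euclidean NN index applies directly.
import random

rng = random.Random(7)
d, n = 16, 50
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
q = [rng.gauss(0, 1) for _ in range(d)]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

mips = max(range(n), key=lambda i: dot(X[i], q))
c = 1e6                                   # large scale makes 2c<x,q> dominate
cq = [c * v for v in q]
nn = min(range(n), key=lambda i: sum((u - v) ** 2 for u, v in zip(X[i], cq)))
assert mips == nn
```

How large $c$ must be depends on the gap between the top two inner products relative to the spread of squared norms, which is exactly what the cited analyses quantify.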
E. Quantization, Sampling, and Clustering
Low-memory (or disk-resident) approximate MIPS solutions include partitioning the coordinate space into subspaces and learning per-block codebooks that minimize quantization error on the inner product (e.g., QUIP). Clustering-based methods (e.g., spherical k-means) achieve high recall and speed for top-$k$ MIPS by reducing the candidate set via cosine-similarity clusters (Guo et al., 2015, Auvolat et al., 2015). Sampling-based approaches (wedge, diamond, deterministic wedge) trade inner-product computations against recall via controlled, budgeted exploration, outperforming prior budgeted algorithms in practical recall-speed regimes (Yu et al., 2016, Lorenzen et al., 2019). A recent entry, CEOs, leverages extreme order statistics and concomitants to select and index only the most informative projected coordinates, yielding near-theoretical lower-bound performance for $(1+\epsilon)$-approximate MIPS (Pham, 2020).
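The quantization pipeline can be sketched with a toy product quantizer: split coordinates into blocks, assign each block to its nearest codeword, and score a query via per-block codeword inner products. Real systems learn the codebooks (k-means or IP-aware objectives as in QUIP); here they are hand-picked, and all function names are illustrative:

```python
# Toy product quantization for inner products: codes replace raw vectors, and
# <x, q> is approximated by summing per-block <codeword, q_block> lookups.
def assign(block, codebook):
    return min(range(len(codebook)),
               key=lambda c: sum((u - v) ** 2 for u, v in zip(block, codebook[c])))

def encode(x, codebooks, bs):
    return [assign(x[i * bs:(i + 1) * bs], cb) for i, cb in enumerate(codebooks)]

def pq_dot(codes, q, codebooks, bs):
    score = 0.0
    for i, (c, cb) in enumerate(zip(codes, codebooks)):
        qb = q[i * bs:(i + 1) * bs]
        score += sum(u * v for u, v in zip(cb[c], qb))  # a table lookup in practice
    return score

bs = 2                                    # block size: 4 dims -> 2 blocks
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
x = [0.9, 1.1, 0.1, 0.9]
q = [1.0, 1.0, 1.0, 1.0]
codes = encode(x, codebooks, bs)          # -> [1, 0]
print(pq_dot(codes, q, codebooks, bs))    # approx of true <x, q> = 3.0
```

At query time the per-block tables are computed once per query, so scoring each of $n$ items costs only one addition per block instead of $d$ multiplications.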
3. Theoretical Bounds and Guarantees
- LSH exponents (determined by collision/decay gaps) govern query complexity, improved by per-block normalization and partitioning (Yan et al., 2018, Yan et al., 2018).
- Bandit-based approaches provide explicit probably approximately correct (PAC) guarantees through finite-population (sampling-without-replacement) concentration inequalities (Liu et al., 2018).
- Graph-theoretic reductions demonstrate that greedy NN walks on appropriate proximity graphs (Euclidean or "stitched" IP+Euclidean graphs) can guarantee convergence to exact MIPS solutions with path length/complexity matching that for metric NNS (Chen et al., 10 Mar 2025, Chen et al., 21 Apr 2025).
- Sampling (wedge-based) and quantization (block-PQ/QUIP) methods provide formal concentration and error bounds under mild distributional assumptions (Guo et al., 2015, Lorenzen et al., 2019).
4. Empirical Performance and Applications
Experimental evaluation on benchmarks ranging from MovieLens and Netflix (MF embeddings, $d$ up to $300$), ImageNet/SIFT (vision, $d$ up to $150$), and Gist/Tiny5M to large IR-style corpora (MS Marco, BEIR, Laion10M, Commerce100M) demonstrates:
- Sublinear and bandit-based methods (e.g., BoundedME, BanditMIPS, CEOs, wedge/dWedge sampling) sustain 90–99% recall at order-of-magnitude speedups over brute force.
- Graph-hybrid indices (MAG/ANMS, SSG+PSP, ip-NSW+) deliver order-of-magnitude speedups and scale gracefully with $n$ and $d$, with stability confirmed across norm- and cosine-distributed datasets (Chen et al., 21 Apr 2025, Chen et al., 10 Mar 2025, Liu et al., 2019).
- Quantization-based and clustering approaches outperform LSH and tree baselines in high-dimensional, moderate-$n$ regimes with low memory overhead (Guo et al., 2015, Auvolat et al., 2015).
- Sparse MIPS indices (SINDI) enable high-throughput retrieval in real-world, multi-lingual production settings while maintaining recall@50 above 99% (Li et al., 10 Sep 2025).
A subset of methods (PSP, SINDI) is validated in large-scale production (Shopee, Ant Group VSAG), supporting fast online updating and multi-threaded querying.
5. Key Phenomena: Norm Bias, Non-Metricity, and Topology
The “norm bias” intrinsic to MIPS implies true maxima are almost always among highest-norm items, as established by theoretical modeling and empirical quantile-statistics (Liu et al., 2019). This undermines direct use of metric-tree or metric-graph pruning. Asymmetric transformations and per-block normalization address imbalance, but topology-reducing transforms (e.g., Möbius, Euclidean-projected) can destroy neighbor relations and limit recall (Chen et al., 21 Apr 2025). Proximity graph methods, especially those blending IP and metric edges, reconcile efficiency with preservation of the data's manifold structure and local-global navigation for robust, scalable search (Chen et al., 21 Apr 2025, Chen et al., 10 Mar 2025).
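One consequence of norm bias is directly exploitable even without an index: by Cauchy-Schwarz, $\langle x, q\rangle \le \|x\|\,\|q\|$, so scanning items in decreasing-norm order allows an exact early stop once the norm bound falls below the best score found. A minimal sketch of this norm-pruned scan:

```python
# Exact MIPS with Cauchy-Schwarz pruning: once ||x_i|| * ||q|| <= best score,
# no remaining (smaller-norm) item can win, so the scan terminates early.
import math, random

rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
q = [rng.gauss(0, 1) for _ in range(8)]
qn = math.sqrt(sum(v * v for v in q))

order = sorted(range(len(X)),
               key=lambda i: -math.sqrt(sum(v * v for v in X[i])))
best, best_i = -float("inf"), -1
for i in order:
    norm_i = math.sqrt(sum(v * v for v in X[i]))
    if norm_i * qn <= best:              # bound beaten: all later items pruned
        break
    s = sum(u * v for u, v in zip(X[i], q))
    if s > best:
        best, best_i = s, i

exact = max(range(len(X)), key=lambda i: sum(u * v for u, v in zip(X[i], q)))
assert best_i == exact                   # pruning never discards the true winner
```

This is the simplest instance of the phenomenon: when norms are long-tailed, most of the database is never touched, which is precisely what norm-range partitioning and dual-metric graphs generalize.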
6. Extensions and Future Directions
- Contextual and hybrid MIPS: Incorporating side information (e.g., cluster structure, auxiliary features) and supporting joint dense-sparse retrieval in hybrid indexes (Liu et al., 2018, Bruch et al., 2023).
- Reverse MIPS: Efficiently finding all users for whom a target item appears in their top-k, with index-based approaches affording 100× speedups (Amagata et al., 2021).
- Adaptive search and learning-based routing: Reinforcement and imitation learning on proximity graphs improve search efficiency by leveraging global and expert-annotated trajectories (Feng et al., 2022).
- Distribution- and topology-aware parameter tuning: Empirically validated indicators (norm coefficient of variation, Davies–Bouldin indices) support automated edge and search parameter selection (Chen et al., 21 Apr 2025).
- Streaming, dynamic, and billion-scale MIPS: Continued work on dynamic graph augmentation, amortized complexity tightening, and distributed/parallel search to handle ever-increasing data sizes.
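Among these directions, reverse MIPS is the easiest to pin down concretely: it inverts the query roles, asking which users rank a target item highly. The brute-force baseline below (names are illustrative) is what the index-based approaches accelerate:

```python
# Reverse MIPS baseline: for each user vector, run top-k MIPS over the items
# and report users whose top-k contains the target item. Cost is O(|U| * n * d),
# which indexed reverse-MIPS methods reduce dramatically.
import heapq

def reverse_mips(U, X, target, k):
    hits = []
    for uid, u in enumerate(U):
        scores = [sum(a * b for a, b in zip(x, u)) for x in X]
        topk = heapq.nlargest(k, range(len(X)), key=scores.__getitem__)
        if target in topk:
            hits.append(uid)
    return hits

U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # user vectors
X = [[2.0, 0.0], [0.0, 2.0], [1.2, 1.2]]   # item vectors
print(reverse_mips(U, X, target=2, k=2))   # item 2 is in every user's top-2
```

Index-based methods prune users via bounds on their achievable scores against the target, avoiding the full per-user MIPS.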
7. Summary Table: MIPS Algorithms and Properties
| Methodology | Preprocessing | Theoretical Guarantee | Typical Speedup | Datasets/Scenarios |
|---|---|---|---|---|
| LSH/ALSH | Heavy (per-dataset) | Provable | – | Medium/high $n$, $d$ |
| Per-block LSH | Moderate (partitioned) | Improved | – | Long-tail norm datasets |
| Bandit/MAB | None | Explicit | – | On-the-fly queries |
| Proximity Graph | Heavy | Path/solution equivalence | – | Extreme-scale/recall* |
| Quantization | Moderate (codebooks) | Error bound on $\langle x,q\rangle$ | – | Low-memory, fixed $d$ |
| Clustering/IVF | Low to moderate | Empirical, distributional | – | Hybrid sparse/dense |
| Sampling/sCEOs | None (or exp. index) | Concentration, gap | – | Top-$k$ for small $k$/$d$ |
| Sparse Inverted | None/moderate | Empirical | – | Production IR, RAG |
*Speedup dependent on recall, top-$k$, and data topology.
Maximum Inner Product Search now occupies a central role in high-dimensional data retrieval. The research trajectory shows systematic improvements in theoretical understanding (e.g., query-scaling duality), algorithmic flexibility (partition, quantization, stochastic, and graph-based frameworks), and real-world performance, especially when methods are tuned to the underlying geometry, norm structure, and hardware or deployment constraints. Continued integration of adaptive, distribution-aware, and learning-based approaches—together with principled hybridization of metric and non-metric search—remains a critical direction for further practical and foundational gains.