On Symmetric and Asymmetric LSHs for Inner Product Search

Published 21 Oct 2014 in stat.ML, cs.DS, cs.IR, and cs.LG | arXiv:1410.5518v3

Abstract: We consider the problem of designing locality sensitive hashes (LSH) for inner product similarity, and of the power of asymmetric hashes in this context. Shrivastava and Li argue that there is no symmetric LSH for the problem and propose an asymmetric LSH based on different mappings for query and database points. However, we show there does exist a simple symmetric LSH that enjoys stronger guarantees and better empirical performance than the asymmetric LSH they suggest. We also show a variant of the settings where asymmetry is in-fact needed, but there a different asymmetric LSH is required.

Citations (165)

Summary

  • The paper demonstrates that a symmetric Locality Sensitive Hashing (LSH) exists for Maximum Inner Product Search (MIPS) under specific data normalization and boundedness conditions, challenging prior claims.
  • The research identifies key parameter restrictions, showing symmetric LSH is sufficient when queries are normalized and data points are bounded, outperforming previous asymmetric methods in these cases.
  • Empirical evaluations support the proposed symmetric LSH's superior performance and applicability, encouraging more efficient and parameter-free hashing models in practice.

The paper "On Symmetric and Asymmetric LSHs for Inner Product Search," by Behnam Neyshabur and Nathan Srebro, studies Locality Sensitive Hashing (LSH) for Maximum Inner Product Search (MIPS). By examining the limits and capabilities of symmetric and asymmetric LSH for inner product similarity, the work yields findings directly relevant to efficient retrieval in high-dimensional spaces.

Traditionally, LSH has been a pivotal technique for approximate nearest neighbor search: it maps points to short hash codes so that similar points collide with high probability, sidestepping exhaustive comparisons in high dimensions. Challenges arise, however, when adapting LSH to inner product similarity, a core operation in many machine learning applications such as recommendation systems, multi-class prediction, and vision tasks.
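As a concrete illustration of an LSH family (a generic sketch, not taken from the paper), random-hyperplane hashing for cosine similarity assigns each vector one bit per random hyperplane. Vectors at a small angle agree on most bits, while near-opposite vectors agree on almost none:

```python
import numpy as np

rng = np.random.default_rng(0)

def simhash(v, planes):
    """Sign-of-projection hash: one bit per random hyperplane."""
    return tuple(bool(b) for b in (planes @ v) >= 0)

# k random hyperplanes in d dimensions
d, k = 8, 16
planes = rng.standard_normal((k, d))

x = rng.standard_normal(d)
near = x + 0.01 * rng.standard_normal(d)  # tiny angular perturbation
far = -x                                  # opposite direction

bits_x = simhash(x, planes)
agree_near = sum(a == b for a, b in zip(bits_x, simhash(near, planes)))
agree_far = sum(a == b for a, b in zip(bits_x, simhash(far, planes)))
# Nearby vectors share most hash bits; opposite vectors share almost none.
```

The per-bit collision probability is 1 - theta/pi for vectors at angle theta, which is what makes this family locality sensitive for angular (cosine) similarity.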

Shrivastava and Li had argued that no symmetric LSH exists for inner product search over all of Euclidean space, which motivated asymmetric LSH variants using different mappings for queries and database points. This paper challenges that conclusion by exhibiting a symmetric LSH under specific normalization conditions.

Research Contributions and Findings

  • The authors revisit the claim of Shrivastava and Li that no symmetric LSH exists for MIPS, which motivated their asymmetric approach. Through careful probabilistic analysis, Neyshabur and Srebro counter this with a simple symmetric LSH that, under the assumptions of normalized queries and bounded data, carries stronger guarantees than the asymmetric constructions.
  • The research identifies parameter restrictions and comparisons between symmetric and asymmetric LSH. It defines scenarios where symmetric hashing suffices, such as when queries are normalized and data points are bounded within the unit sphere.
  • Empirical evaluations show the symmetric LSH outperforming the previously proposed asymmetric schemes L2-ALSH and Sign-ALSH. The symmetric version exhibits better hashing quality and theoretical properties, suggesting improved retrieval precision with a simpler, parameter-free implementation.
  • The paper also identifies a setting in which asymmetry is genuinely needed: when neither queries nor data points are normalized in advance. Even there, however, the authors show that a different asymmetric LSH from the one previously proposed is required.
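The symmetric construction described above can be sketched as follows (a simplified illustration under the stated assumptions, data in the unit ball and unit-norm queries, not a verbatim reproduction of the paper's scheme): append the coordinate sqrt(1 - ||v||^2) to every vector. Augmented vectors lie on the unit sphere, a unit-norm query gets 0 appended so the same map serves both sides, and inner products with queries are preserved, so any cosine-similarity LSH (such as sign random projections) then applies symmetrically:

```python
import numpy as np

def augment(v):
    """Symmetric augmentation: append sqrt(1 - ||v||^2).

    For a unit-norm query the appended coordinate is 0, so one map
    handles both queries and database points, making the hash symmetric."""
    norm_sq = float(v @ v)
    assert norm_sq <= 1.0 + 1e-9, "data must lie in the unit ball"
    return np.append(v, np.sqrt(max(0.0, 1.0 - norm_sq)))

rng = np.random.default_rng(1)
d = 5
x = rng.standard_normal(d)
x = 0.6 * x / np.linalg.norm(x)   # database point, ||x|| <= 1
q = rng.standard_normal(d)
q = q / np.linalg.norm(q)         # query, ||q|| = 1

ax, aq = augment(x), augment(q)
# ax and aq are unit vectors whose inner product equals x . q,
# reducing MIPS to angular similarity search.
```

Because the reduction preserves inner products exactly while moving everything onto the unit sphere, maximizing the inner product becomes equivalent to maximizing cosine similarity in the augmented space.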

Theoretical and Practical Implications

These findings matter both theoretically and practically. Theoretically, the work resolves when symmetric hash functions can serve inner product similarity, tying the choice of hashing scheme to the normalization of queries and data. Practically, it encourages the use of simpler, parameter-free hashing schemes in real-world applications, reducing computational and tuning burdens.

Looking forward, the approach opens avenues for refining LSH frameworks across varied structures, including other types of similarity measures and high-dimensional data beyond static vector representations. Furthermore, it signals that a comprehensive understanding of the interplay between data preprocessing and hashing methods could spur further optimizations in AI systems reliant on rapid similarity checks.

The study provides a firmer foundation for balancing efficiency and precision in data-intensive systems, and raises the question of when symmetry versus asymmetry is the right choice in other domains, potentially inspiring future advances in scalable and adaptable LSH methods.
