
LiNR: Model Based Neural Retrieval on GPUs at LinkedIn

Published 18 Jul 2024 in cs.LG and cs.AI | (2407.13218v3)

Abstract: This paper introduces LiNR, LinkedIn's large-scale, GPU-based retrieval system. LiNR supports a billion-sized index on GPU models. We discuss our experiences and challenges in creating scalable, differentiable search indexes using TensorFlow and PyTorch at production scale. In LiNR, both items and model weights are integrated into the model binary. Viewing index construction as a form of model training, we describe scaling our system for large indexes, incorporating full scans and efficient filtering. A key focus is on enabling attribute-based pre-filtering for exhaustive GPU searches, addressing the common challenge of post-filtering in KNN searches that often reduces system quality. We further provide multi-embedding retrieval algorithms and strategies for tackling cold start issues in retrieval. Our advancements in supporting larger indexes through quantization are also discussed. We believe LiNR represents one of the industry's first Live-updated model-based retrieval indexes. Applied to out-of-network post recommendations on LinkedIn Feed, LiNR has contributed to a 3% relative increase in professional daily active users. We envisage LiNR as a step towards integrating retrieval and ranking into a single GPU model, simplifying complex infrastructures and enabling end-to-end optimization of the entire differentiable infrastructure through gradient descent.


Summary

  • The paper introduces LiNR, a large-scale GPU-based neural retrieval framework at LinkedIn that integrates item indexes and neural models within a unified model binary for efficient differentiable search.
  • LiNR employs exhaustive and quantized KNN search with attribute-based pre-filtering, applying attribute constraints before candidate selection rather than after, as traditional post-filtering does, which improves retrieval quality.
  • A key innovation of LiNR is its support for live updates within a model-based retrieval system in production, which improved user engagement and contributed a 3% relative increase in professional daily active users.

Overview of LiNR: Model Based Neural Retrieval on GPUs at LinkedIn

This paper presents a comprehensive exploration of LiNR (LinkedIn Neural Retrieval), a large-scale GPU-based retrieval framework developed by LinkedIn. The primary focus of LiNR is to efficiently manage a billion-sized index on GPUs by reinventing the ways search indexes are structured and leveraged through neural networks. The paper offers significant insights into the challenges and solutions devised during the deployment of differentiable search indexes using mainstream machine learning frameworks such as TensorFlow and PyTorch at a production scale.

The central theme of the paper is the integration of item indexes and neural model weights within a unified model binary. Index construction is framed as a form of model training, which enables seamless updates and enhancements to the retrieval mechanism while exploiting GPU acceleration throughout.
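
The "index as model" idea described above can be sketched as follows. This is an illustrative toy, not LiNR's actual code: the class name, shapes, and brute-force scoring are assumptions, and NumPy stands in for the TensorFlow/PyTorch implementations the paper describes. The key point is that the item index is stored as a model parameter and retrieval is simply a forward pass: an exhaustive dot-product scan followed by top-k selection.

```python
import numpy as np

class ExhaustiveRetrievalModel:
    """Toy 'differentiable index': item embeddings live inside the model
    artifact, and retrieval is a forward pass over the full index."""

    def __init__(self, item_embeddings: np.ndarray):
        # The item index is stored as a model parameter/buffer,
        # mirroring how LiNR bundles items into the model binary.
        self.items = item_embeddings  # shape: (num_items, dim)

    def forward(self, query: np.ndarray, k: int) -> np.ndarray:
        scores = self.items @ query             # exhaustive scan: one score per item
        topk = np.argpartition(-scores, k)[:k]  # unordered top-k candidates
        return topk[np.argsort(-scores[topk])]  # rank candidates by score

rng = np.random.default_rng(0)
index = ExhaustiveRetrievalModel(rng.normal(size=(1000, 64)))
q = rng.normal(size=64)
ids = index.forward(q, k=10)  # item ids of the 10 highest-scoring items
```

Because the index is just a tensor inside the model, rebuilding or updating it amounts to another round of "training", and the whole scoring path remains differentiable.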

One of the pivotal contributions highlighted is LiNR's support for attribute-based pre-filtering in exhaustive GPU searches. This addresses the common practice of post-filtering in K-Nearest Neighbor (KNN) searches, which often degrades result quality: when attribute constraints are applied only after the top candidates have been selected, many candidates are discarded and fewer than the requested number of valid results survive. By verifying attribute compliance before candidate selection, LiNR markedly improves retrieval quality relative to traditional post-filtering approaches.
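
The contrast can be made concrete with a minimal sketch (illustrative only; function names and the masking strategy are assumptions, not LiNR's API). Post-filtering takes the top-k first and then drops non-matching items, so it can return fewer than k results; pre-filtering masks ineligible items before top-k, so every returned slot is attribute-compliant.

```python
import numpy as np

def post_filter_knn(scores, eligible, k):
    """Take top-k first, then drop ineligible items (may return < k results)."""
    topk = np.argsort(-scores)[:k]
    return topk[eligible[topk]]

def pre_filter_knn(scores, eligible, k):
    """Mask ineligible items before top-k, so all k slots go to eligible items."""
    masked = np.where(eligible, scores, -np.inf)
    return np.argsort(-masked)[:k]

rng = np.random.default_rng(1)
scores = rng.normal(size=100)          # similarity scores for 100 items
eligible = rng.random(100) < 0.3       # ~30 items pass the attribute filter
pre_ids = pre_filter_knn(scores, eligible, 5)    # always 5 eligible items
post_ids = post_filter_knn(scores, eligible, 5)  # often fewer than 5 survive
```

On a GPU the masking step fuses naturally with the exhaustive scan, which is why pre-filtering pairs well with brute-force search rather than with pruned index structures.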

Key Components and Innovations

  1. Exhaustive and Quantized KNN:
    • LiNR employs exhaustive KNN searches that incorporate attribute-based pre-filtering, improving retrieval quality. State-of-the-art quantization techniques keep memory consumption and computational cost low, extending capacity to larger indexes without a commensurate increase in GPU requirements.
  2. Multi-Embedding Retrieval Algorithms:
    • The paper presents a comprehensive strategy for leveraging multiple embeddings per item in retrieval and ranking, notably improving cold-start scenarios in which new items must be indexed without rich historical interaction data.
  3. Model-Based Live Updates:
    • LiNR sets a precedent by supporting live updates within a model-based retrieval system in production, keeping content fresh and significantly improving user engagement metrics in online experiments.
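
To illustrate the quantization idea from point 1, here is a hedged sketch of simple per-item int8 scalar quantization (an assumed scheme for illustration; the paper's actual quantization method may differ). Embeddings are stored as int8 plus a per-item scale, shrinking index memory roughly 4x versus fp32 while scoring remains a dequantized dot product over the full index.

```python
import numpy as np

def quantize(items: np.ndarray):
    """Per-item symmetric int8 quantization: store int8 codes + fp32 scales."""
    scale = np.abs(items).max(axis=1, keepdims=True) / 127.0
    codes = np.clip(np.round(items / scale), -127, 127).astype(np.int8)
    return codes, scale

def quantized_topk(codes, scale, query, k):
    """Exhaustive scan over the quantized index with on-the-fly dequantization."""
    scores = (codes.astype(np.float32) @ query) * scale.ravel()
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(2)
items = rng.normal(size=(500, 32)).astype(np.float32)
codes, scale = quantize(items)                 # int8 index: ~4x smaller
query = rng.normal(size=32).astype(np.float32)
approx = quantized_topk(codes, scale, query, 10)
exact = np.argsort(-(items @ query))[:10]
overlap = len(set(approx) & set(exact))        # quantized vs exact top-10 agreement
```

The memory saved by storing int8 codes is what lets a fixed GPU budget hold a substantially larger index, at the cost of a small, usually tolerable, loss in top-k agreement with exact search.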

Empirical Results

Empirical analyses underscore LiNR's industrial impact: deployed for LinkedIn's out-of-network post recommendations, it contributed a 3% relative increase in professional daily active users. These results underline the practical applicability and robustness of LiNR's approach to neural retrieval at scale.

Implications and Future Directions

Theoretically, the work invigorates ongoing discussions in the field of search and retrieval systems by illustrating a tangible step towards the integration of retrieval and ranking into a cohesive GPU model. The implications for simplifying infrastructure and facilitating end-to-end optimization through gradient descent techniques are vast, suggesting potential for similar methodologies across various recommender systems.

In conclusion, the developments and methodologies adopted and validated through LiNR provide a compelling case for revisiting entrenched retrieval strategies in the era of neural computation. The integration, optimization, and scale at which LiNR operates may potentially set a benchmark for future advancements in AI-driven information retrieval, particularly in industrial applications where high demand and real-time responses are prerequisites.
