Generative Search & Recommendation
- Generative search and recommendation are advanced paradigms that reframe retrieval and ranking as sequence generation over semantic IDs, enabling enhanced personalization.
- They leverage large Transformer architectures and multimodal models to convert user queries or histories into tailored document or product identifiers.
- Practical evaluations show improvements in metrics like CTR, GMV, and NDCG, while challenges remain in scalability, dynamic corpus handling, and ensuring factual accuracy.
Generative search and recommendation refer to information access paradigms in which large generative models—including LLMs, multimodal generators, and sequence models—are tasked with directly producing relevant item or document identifiers (IDs), textual queries, or even entirely novel content tailored to user intent, replacing traditional discriminative retrieval, ranking, and recommendation pipelines. These approaches reframe the matching task in recommender systems and search engines as sequence generation over semantic or numerical IDs, natural-language queries, or multimodal tokens, enhancing flexibility, personalization, and adaptability in handling rich user contexts and dynamic corpora (Li et al., 2024, Rajput et al., 2023, Liu et al., 19 Oct 2025, Shi et al., 8 Apr 2025).
1. Conceptual Foundations and Unified Frameworks
Generative paradigms depart from the classical retrieve-then-rank architectures found in large-scale search and recommendation systems. Instead of embedding queries/items and employing nearest-neighbor search over high-dimensional embedding spaces, generative methods (a) encode users/queries into flexible prompts, and (b) task a generative backbone (e.g., decoder-only Transformer, encoder-decoder LLM, multimodal foundation model) with autoregressively producing a sequence—a document/item ID, keyword, query suggestion, or semantic annotation—that is then mapped to actual items or documents (Li et al., 2024, Shi et al., 8 Apr 2025, Gao et al., 26 Sep 2025, Chen et al., 8 Sep 2025).
This approach enables several unified frameworks:
- Both search and recommendation are cast as conditional sequence generation tasks, i.e., generating y ~ p_θ(y | x):
- Search: x is a textual/natural-language query, and y is a document or item identifier (Penha et al., 2024, Shi et al., 8 Apr 2025, Gao et al., 26 Sep 2025).
- Recommendation: x encodes user history, context, or preferences, and y is the next item (often as a semantic ID or code) (Rajput et al., 2023, Ju et al., 29 Jul 2025, Penha et al., 14 Aug 2025, Gao et al., 16 Nov 2025).
- Unified multi-task paradigms optimize both search and recommendation via a joint cross-entropy loss, possibly with contrastive or reinforcement learning objectives to maximize mutual information and preference alignment (Penha et al., 2024, Ju et al., 29 Jul 2025, Zhao et al., 9 Apr 2025, Wang et al., 29 Apr 2025, Min et al., 14 Apr 2025).
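To make this framing concrete, the following is a minimal, self-contained sketch—the token names and the stubbed "model" are illustrative assumptions, not any cited system's implementation. Search and recommendation share one decoder vocabulary of SID tokens and differ only in how the prompt is built; both contribute to a joint cross-entropy (negative log-likelihood) objective.

```python
import math

# Hypothetical stub standing in for a Transformer decoder: returns a
# probability distribution over the next SID token given a prompt.
def toy_model(prompt_tokens):
    vocab = ["<a_1>", "<a_2>", "<b_1>", "<b_2>"]
    probs = {t: 1.0 / len(vocab) for t in vocab}      # uniform by default
    # Pretend the model has learned to prefer one continuation.
    if prompt_tokens and prompt_tokens[-1] == "<a_1>":
        probs = {"<a_1>": 0.1, "<a_2>": 0.1, "<b_1>": 0.7, "<b_2>": 0.1}
    return probs

def sequence_nll(prompt, target_sid):
    """Negative log-likelihood of a target SID sequence under the model."""
    nll, ctx = 0.0, list(prompt)
    for tok in target_sid:
        probs = toy_model(ctx)
        nll -= math.log(probs[tok])
        ctx.append(tok)
    return nll

# The two tasks differ only in the prompt:
search_prompt = ["[search]", "wireless", "earbuds"]
rec_prompt = ["[rec]", "<a_1>"]            # user history as SID tokens
target = ["<a_1>", "<b_1>"]                # ground-truth item's semantic ID

# Joint multi-task objective: sum (or weight) the per-task losses.
joint_loss = sequence_nll(search_prompt, target) + sequence_nll(rec_prompt, target)
print(round(joint_loss, 3))                # → 4.402
```

In a real system `toy_model` is a shared backbone and the joint loss is minimized over mini-batches mixing both task types, optionally with the contrastive or RL terms mentioned above.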
2. Representation: Semantic Identifiers and Codebooks
A central challenge for generative search and recommendation is representing items and documents in a way that is both efficient for generation and semantically meaningful. Recent approaches introduce “Semantic IDs” (SIDs)—compact, discrete sequences obtained by quantizing content or multimodal embeddings (often via residual K-means, VQ-VAE, or related quantizers) (Penha et al., 14 Aug 2025, Rajput et al., 2023, Zhang et al., 19 Sep 2025, Ju et al., 29 Jul 2025, Shi et al., 8 Apr 2025, Chen et al., 8 Sep 2025).
- Construction: Items are first mapped to continuous embeddings using encoders fine-tuned on semantic (search) and collaborative (recommendation) signals (Penha et al., 14 Aug 2025). These embeddings are quantized into multi-level codebooks, producing tuples such as (c_1, c_2, …, c_L) that serve as the SIDs. Methods include RQ-KMeans, RQ-VAE, and exclusively semantic indexing via conflict-free code assignment (Zhang et al., 19 Sep 2025, Ju et al., 29 Jul 2025).
- Joint S&R: Dual-purpose SIDs incorporate both semantic (query-based) and collaborative-filtering signals by concatenating code indices from separately optimized encoders, balancing the trade-off between relevance in both search and recommendation (Shi et al., 8 Apr 2025, Penha et al., 14 Aug 2025).
- ID uniqueness is guaranteed through methods like exhaustive candidate matching (ECM) or recursive residual searching (RRS), preventing code collisions without resorting to random tie-breaking (Zhang et al., 19 Sep 2025).
- In multimodal systems, such as product generation or fashion try-on, SIDs can represent content across text, image, or structured category trees, enabling flexible conditional generation (Ramisa et al., 2024, Gao et al., 16 Nov 2025).
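The residual-quantization construction above can be sketched as follows. The two-level, two-dimensional codebooks are toy values invented for illustration—production systems learn them with RQ-KMeans or RQ-VAE over large item corpora—and the collision-suffix helper is a simplified stand-in for conflict-free code assignment.

```python
# Sketch of residual-quantization semantic IDs: each item embedding is
# encoded as a sequence of codebook indices, one per level, with the
# residual passed on to the next level.

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def nearest(vec, codebook):
    """Index of the closest centroid (squared L2 distance)."""
    def d2(c):
        return sum((x - y) ** 2 for x, y in zip(vec, c))
    return min(range(len(codebook)), key=lambda i: d2(codebook[i]))

def semantic_id(embedding, codebooks):
    sid, residual = [], list(embedding)
    for cb in codebooks:
        idx = nearest(residual, cb)
        sid.append(idx)
        residual = sub(residual, cb[idx])   # quantize what is left over
    return tuple(sid)

# Two levels of toy centroids over 2-d embeddings (coarse, then fine).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.1], [0.1, 0.0]],
]

item_a = semantic_id([0.9, 1.05], codebooks)   # lands near centroid (1, 1)
item_b = semantic_id([0.05, 0.02], codebooks)  # lands near centroid (0, 0)

# Uniqueness: if two items collide on the same code tuple, append a
# disambiguating suffix (a simplified version of conflict-free assignment).
assigned = {}
def unique_sid(sid):
    n = assigned.get(sid, 0)
    assigned[sid] = n + 1
    return sid + (n,)

print(item_a, item_b)
```

Each code index becomes one token in the decoder's extended vocabulary, so an item is generated as a short, fixed-length token sequence rather than retrieved by nearest-neighbor search.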
3. Model Architectures and Decoding Algorithms
Generative S&R systems predominantly employ large Transformer-based architectures, often with one or more of the following features:
- Decoder-only or encoder-decoder backbones with extended vocabularies to support tokenized SIDs and flexible prompt structures (Gao et al., 26 Sep 2025, Ju et al., 29 Jul 2025, Shi et al., 8 Apr 2025, Rajput et al., 2023, Penha et al., 2024, Acharya et al., 2 Jun 2025).
- Architectural modules such as Q-driven blocks (QDB), multitask bi-encoders, or dual-representation learning for context- and query-aware modeling (Yan et al., 25 Sep 2025, Penha et al., 14 Aug 2025, Zhao et al., 9 Apr 2025).
- Sequence-to-sequence generation for IDs, with beam search or diffusion-style decoding algorithms to sample diverse, locally optimal candidate sequences (Gao et al., 16 Nov 2025, Rajput et al., 2023).
- Hybrid contrastive and ranking losses: Contrastive InfoNCE retrieval objectives, pointwise/pairwise ranking losses (BPR and extensions), and temporal/candidate alignment in training to ensure only available negatives are sampled, mitigating pattern drift under corpus evolution (Penha et al., 2024, Gao et al., 26 Sep 2025, Yan et al., 25 Sep 2025, Shi et al., 8 Apr 2025, Zhao et al., 9 Apr 2025).
- Self-evolving post-training paradigms combining supervised fine-tuning and reinforcement learning for improved reasoning and preference alignment, as in context-aware e-commerce search or conversational recommendation (Liu et al., 19 Oct 2025, Wang et al., 29 Apr 2025, Min et al., 14 Apr 2025).
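Because generated IDs must correspond to real catalog entries, decoding is typically constrained. Below is a minimal sketch of beam search over a prefix tree (Trie) of valid SIDs; the scoring function is a stub standing in for the decoder's next-token log-probabilities, and the catalog values are made up for illustration.

```python
import math

# Prefix tree of valid semantic-ID sequences: decoding can only ever emit
# an ID that exists in the catalog, preventing hallucinated identifiers.
def build_trie(catalog):
    trie = {}
    for sid in catalog:
        node = trie
        for tok in sid:
            node = node.setdefault(tok, {})
    return trie

def beam_search(trie, score_fn, beam_width=2, max_len=3):
    """Beam search where each step is restricted to children of the current trie node."""
    beams = [([], 0.0, trie)]                  # (tokens, log-prob, trie node)
    for _ in range(max_len):
        candidates = []
        for toks, lp, node in beams:
            if not node:                       # complete ID: carry forward
                candidates.append((toks, lp, node))
                continue
            for tok, child in node.items():    # only catalog-valid tokens
                candidates.append((toks + [tok], lp + score_fn(toks, tok), child))
        beams = sorted(candidates, key=lambda b: -b[1])[:beam_width]
    return [(toks, lp) for toks, lp, node in beams if not node]

catalog = [(1, 0, 2), (1, 1, 0), (0, 2, 2)]
trie = build_trie(catalog)

# Stub scorer: pretends the decoder always prefers token 1.
def score_fn(prefix, tok):
    return math.log(0.6) if tok == 1 else math.log(0.2)

top = beam_search(trie, score_fn)
print(top)
```

The same trie walk applies to FM-index-style constraints; only the data structure answering "which tokens may follow this prefix?" changes.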
4. Multimodal Generative Search and Recommendation
Contemporary systems extend beyond textual prompts and IDs to multimodal conditioning:
- Multi-modal architectures are formed by aligning product data from text, images, audio, and structured attributes (e.g., 3D layouts, segmentation masks) into a joint latent code z, modeling both complementary and shared signals with generative backbones such as GANs, VAEs, and diffusion models (Ramisa et al., 2024).
- Generative models synthesize not only product IDs but actual novel items, images, or experiences, facilitating applications such as virtual try-on, "view in my room," or image-guided retrieval (Ramisa et al., 2024, Samaran et al., 2021, Guo et al., 2023).
5. Evaluation, Performance, and Practical Deployments
Evaluation protocols combine classical information retrieval and ranking metrics with novel metrics tailored for generative paradigms:
- Recall@K, NDCG@K, MRR for both S&R tasks, tested over large real-world and synthetic datasets (Amazon, MovieLens, eBook search, industrial e-commerce logs) (Acharya et al., 2 Jun 2025, Li et al., 2023, Shi et al., 8 Apr 2025, Gao et al., 26 Sep 2025, Penha et al., 2024).
- Metrics for diversity, coverage, collision rate, and cold-start generalization, emphasizing robustness to new items or contexts (Ju et al., 29 Jul 2025, Rajput et al., 2023, Zhang et al., 19 Sep 2025, Gao et al., 16 Nov 2025, Acharya et al., 2 Jun 2025).
- Semantic retrieval (dense similarity over embeddings) outperforms lexical matching (BM25), and unified multitask semantic ID tokenization achieves balanced performance for both search and recommendation with improved tail coverage (Penha et al., 14 Aug 2025).
- Online production deployments demonstrate substantial uplift in business KPIs (CTR, GMV, ACC), with single-generative S&R backbones outperforming cascaded pipelines and prior multitask models (Yan et al., 25 Sep 2025, Gao et al., 26 Sep 2025, Chen et al., 8 Sep 2025).
- Human feedback and click-based alignment are incorporated via reward-model alignment (CTR predictors, RL, DPO/PPO-based listwise training) (Min et al., 14 Apr 2025, Wang et al., 29 Apr 2025).
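The two most common offline metrics above can be computed directly. This is a generic sketch with binary relevance—the item names are placeholders, not drawn from any cited benchmark.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k ranking."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)              # rank i is discounted by log2(i+2)
              for i, item in enumerate(ranked_ids[:k])
              if item in relevant_ids)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(k, len(relevant_ids))))
    return dcg / ideal if ideal else 0.0

ranked = ["item_3", "item_7", "item_1", "item_9"]   # model's generated ranking
relevant = {"item_7", "item_1"}                     # ground-truth positives

print(recall_at_k(ranked, relevant, 3))             # → 1.0
print(round(ndcg_at_k(ranked, relevant, 3), 4))     # → 0.6934
```

NDCG is below 1.0 here even though Recall@3 is perfect, because the relevant items sit at ranks 2 and 3 rather than at the top—this position sensitivity is why both metrics are reported together.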
6. Open Problems, Future Directions, and Limitations
Despite rapid progress, several open challenges and research avenues persist:
- Efficient index updates under corpus dynamism: enabling adding/removing documents/items without expensive retraining or index rebuilding (Li et al., 2024, Penha et al., 14 Aug 2025, Zhang et al., 19 Sep 2025).
- Scalability of generative decoding (beam search optimization, sublinear constraint decoding, hybrid recall-then-rerank pipelines) (Li et al., 2024, Gao et al., 16 Nov 2025, Ju et al., 29 Jul 2025).
- Hallucination and factuality: ensuring generated identifiers correspond to actual corpus items; Trie- or FM-index constraint decoding and controlled RL are current partial solutions (Li et al., 2024, Chen et al., 8 Sep 2025).
- Deeper personalization: modeling collaborative signals, user segment adaptation, and interactive multi-turn refinement (Penha et al., 2024, Acharya et al., 2 Jun 2025, Wang et al., 29 Apr 2025).
- Full end-to-end multimodal generative agents (vision-language, video, cross-modal contexts) remain underexplored (Ramisa et al., 2024).
- Information-theoretic optimizers and subspace partitioning provide principled enhancements to prompt-based multitask learning; maximizing mutual information per task promises further model generalization and conflict reduction (Zhao et al., 9 Apr 2025).
Generative search and recommendation articulate a flexible, unified paradigm that bridges classical IR and recommender systems with large foundation models, semantic item representations, and user-centric context modeling, yielding enhanced adaptability, personalization, and performance across text and multimodal domains. Future work focuses on large-scale deployment, multimodal expansion, efficient dynamic corpora handling, interactive feedback integration, and robust evaluation metrics.