
Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations

Published 13 Jun 2023 in cs.IR and cs.LG | (2306.08121v2)

Abstract: Randomly-hashed item ids are used ubiquitously in recommendation models. However, the representations learned from random hashing prevent generalization across similar items, causing problems for learning unseen and long-tail items, especially when the item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for random ids. We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability. To strike a good balance between memorization and generalization, we propose using Semantic IDs -- a compact discrete item representation learned from frozen content embeddings using RQ-VAE that captures the hierarchy of concepts in items -- as a replacement for random item ids. As with content embeddings, the compactness of Semantic IDs makes them difficult to adapt directly in recommendation models. We propose novel methods for adapting Semantic IDs in industry-scale ranking models by hashing sub-pieces of the Semantic-ID sequences. In particular, we find that the SentencePiece model commonly used in LLM tokenization outperforms manually crafted pieces such as N-grams. To this end, we evaluate our approaches on a real-world ranking model for YouTube recommendations. Our experiments demonstrate that Semantic IDs can replace the direct use of video IDs, improving generalization on new and long-tail item slices without sacrificing overall model quality.


Summary

  • The paper presents Semantic IDs generated via RQ-VAE to replace random IDs, improving generalization in recommendation ranking.
  • It compares n-gram and SPM-based adaptations, with SPM notably enhancing cold-start and overall CTR performance.
  • The approach preserves semantic hierarchies in item content, reducing computational overhead and improving user personalization.


Introduction

The paper "Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations" (2306.08121) addresses the prevalent issue in recommendation systems where randomly-hashed item IDs are used, which limits the ability to generalize across similar items, especially in large, evolving item corpora. To tackle this, the authors propose the use of content-derived Semantic IDs to replace random IDs, aiming to strike a balance between memorization and generalization while maintaining model quality.

Methodology

Semantic ID Generation with RQ-VAE

The cornerstone of the proposed methodology is the generation of Semantic IDs using RQ-VAE, a Residual-Quantized Variational AutoEncoder. This process encodes frozen content embeddings into discrete, hierarchically structured IDs that preserve semantic relationships among items. The RQ-VAE applies a multi-level quantization process that converts dense item representations into compact, discrete codes, capturing the hierarchical nature of item concepts.

Figure 1: Illustration of RQ-VAE: the input vector is encoded into a latent representation, which is quantized into the Semantic ID (1, 4, 6, 2), representing hierarchical concepts.
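The quantization step above can be sketched in a few lines. This is a minimal illustration of residual quantization at inference time, not the paper's implementation: the codebooks are random stand-ins for trained RQ-VAE codebooks, and the encoder is omitted (the input is treated as the latent directly).

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Quantize a latent vector into a Semantic ID.

    x: (d,) latent derived from a frozen content embedding.
    codebooks: list of L arrays, each (K, d) -- one codebook per level.
    Returns a tuple of L codeword indices (the Semantic ID).
    """
    residual = x.astype(np.float64)
    semantic_id = []
    for codebook in codebooks:
        # Pick the codeword closest to the current residual.
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))
        semantic_id.append(idx)
        # Subtract the chosen codeword; the next level quantizes what remains,
        # so earlier levels capture coarser concepts than later ones.
        residual = residual - codebook[idx]
    return tuple(semantic_id)

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 64)) for _ in range(4)]  # 4 levels, 256 codes each
x = rng.normal(size=64)
sid = residual_quantize(x, codebooks)  # a 4-tuple of codeword indices
```

Because each level quantizes only the residual left by the previous one, two items with similar content embeddings tend to agree on their leading codewords, which is what gives Semantic IDs their hierarchical structure.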

Adapting Semantic IDs in Ranking Models

The adaptation of Semantic IDs into ranking models is performed through two main strategies: n-gram-based and SentencePiece Model (SPM)-based adaptations. The former groups semantic codes into fixed-length n-grams, while the latter uses variable-length subword units dynamically learned from the item distribution. This flexibility allows SPM to more effectively manage embedding table entries, balancing memorization needs and generalization abilities.
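One plausible reading of the n-gram adaptation can be sketched as follows: consecutive sub-pieces of the Semantic ID are hashed into rows of a shared embedding table. The bigram choice, the hash construction, and the table size here are illustrative assumptions, not the paper's exact scheme.

```python
import hashlib

def ngram_pieces(semantic_id, n=2):
    """Split an L-level Semantic ID into consecutive n-grams."""
    return [tuple(semantic_id[i:i + n]) for i in range(len(semantic_id) - n + 1)]

def piece_to_slot(piece, level, table_size):
    """Hash one (level, piece) pair into a row of a shared embedding table.

    Including the starting level in the key keeps identical code pairs
    occurring at different depths from colliding by construction.
    """
    key = f"{level}:{piece}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % table_size

sid = (1, 4, 6, 2)
slots = [piece_to_slot(p, i, table_size=2**20)
         for i, p in enumerate(ngram_pieces(sid, n=2))]
# Three bigrams -> three embedding-table lookups for this item.
```

An SPM-based adaptation would differ only in how the pieces are produced: instead of fixed-length n-grams, a SentencePiece model learned over the corpus of Semantic-ID sequences emits variable-length pieces, assigning dedicated pieces to frequent code patterns while rare patterns fall back to shorter fragments.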

Experiments and Results

The proposed approach was tested in a real-world YouTube video recommendation scenario, comparing the performance of Semantic IDs against traditional random hashing and direct content embeddings. The experiments demonstrated that Semantic IDs, particularly when adapted using SPM, significantly improved cold-start performance and overall recommendation quality by enhancing generalization capabilities without sacrificing memorization.

Performance Metrics

Several key performance metrics were used to evaluate the models:

  • CTR AUC: AUC of the click-through-rate (CTR) prediction task, measured over the full evaluation set.
  • CTR/1D AUC: the same AUC restricted to items less than one day old, measuring generalization to new, never-before-seen items introduced daily.
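The sliced metric can be made concrete with a small sketch. The data here is entirely hypothetical, and the rank-based AUC below is a minimal tie-free implementation, assuming the one-day slice is selected by item age:

```python
def auc(labels, scores):
    """Rank-based AUC: probability a random positive outranks a random negative."""
    pairs = sorted(zip(scores, labels))
    rank_sum = sum(r for r, (_, y) in enumerate(pairs, start=1) if y == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical eval rows: (click label, model score, item age in days).
rows = [(1, 0.90, 0.5), (0, 0.95, 0.5), (1, 0.70, 3.0),
        (0, 0.60, 3.0), (0, 0.10, 0.2), (1, 0.80, 10.0)]

overall = auc([y for y, _, _ in rows], [s for _, s, _ in rows])   # CTR AUC
fresh = [(y, s) for y, s, age in rows if age < 1.0]               # CTR/1D slice
fresh_auc = auc([y for y, _ in fresh], [s for _, s in fresh])     # CTR/1D AUC
```

A model can score well overall while ranking fresh items poorly, which is exactly the gap the CTR/1D AUC slice is designed to expose.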

These metrics highlighted the superior performance of SPM-based Semantic IDs in improving model generalization, especially evident in the CTR/1D AUC results.

Figure 2: Overall CTR AUC showing improvements with Semantic IDs.

Figure 3: CTR/1D AUC indicating enhanced generalization to cold-start items.

Semantic Hierarchies

Semantic IDs also demonstrated the ability to capture meaningful hierarchical structures in item categories, such as sports or food-vlogging videos, thus providing better contextual recommendations.

Figure 4: A sub-trie capturing the hierarchical structure of sports videos with Semantic IDs.
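The prefix structure behind such a sub-trie can be illustrated with a short sketch. The items and their Semantic IDs below are hypothetical; the point is only that items sharing leading codewords cluster together:

```python
from collections import defaultdict

def group_by_prefix(items, depth):
    """Group items whose Semantic IDs share the first `depth` codewords."""
    groups = defaultdict(list)
    for title, sid in items:
        groups[sid[:depth]].append(title)
    return dict(groups)

# Hypothetical items: shared leading codes stand for a coarse topic.
items = [
    ("nba highlights", (3, 7, 1, 9)),
    ("soccer goals",   (3, 7, 4, 2)),
    ("tennis rally",   (3, 2, 8, 5)),
    ("pasta recipe",   (6, 1, 0, 4)),
]

by_topic = group_by_prefix(items, depth=2)
# Prefix (3, 7) groups the two team-sport videos; the food video lands elsewhere.
```

Extending `depth` level by level yields the trie: coarse categories near the root, finer distinctions toward the leaves, mirroring the coarse-to-fine residual quantization that produced the IDs.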

Discussion and Implications

The introduction of Semantic IDs addresses several limitations of traditional ID-based recommendation systems, including item sparsity and memorization constraints. By leveraging structured semantic representations, the approach not only enhances cold-start performance but also reduces computational overhead compared to dense content embeddings.

The implications for large-scale industrial recommendation systems are notable. Adopting Semantic IDs could lead to improved user personalization and engagement by facilitating dynamic adaptation to shifting content distributions. Moreover, the ability to retain semantic hierarchies opens avenues for more nuanced content discovery and recommendation strategies.

Conclusion

The study successfully demonstrates that Semantic IDs offer a viable solution for improving generalization in recommendation models. By integrating these compact, semantically-aware representations, systems can better manage evolving item corpora and enhance user experience through more accurate and contextually relevant recommendations. Future work could explore optimizing the training of RQ-VAE models and expanding the application of Semantic IDs across other domains beyond video recommendations.
