Discrete Semantic Tokenization for Deep CTR Prediction
Abstract: Incorporating item content information into click-through rate (CTR) prediction models remains a challenge, especially with the time and space constraints of industrial scenarios. The content-encoding paradigm, which integrates user and item encoders directly into CTR models, prioritizes space over time. In contrast, the embedding-based paradigm transforms item and user semantics into latent embeddings, subsequently caching them to optimize processing time at the expense of space. In this paper, we introduce a new semantic-token paradigm and propose a discrete semantic tokenization approach, namely UIST, for user and item representation. UIST facilitates swift training and inference while maintaining a conservative memory footprint. Specifically, UIST quantizes dense embedding vectors into discrete tokens with shorter lengths and employs a hierarchical mixture inference module to weigh the contribution of each user--item token pair. Our experimental results on news recommendation showcase the effectiveness and efficiency (about 200-fold space compression) of UIST for CTR prediction.
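The quantization step mentioned in the abstract, turning a dense embedding into a short sequence of discrete tokens, can be illustrated with a minimal sketch. The abstract does not say which quantizer UIST uses, so this example assumes plain residual quantization against fixed, randomly initialised codebooks. The names `nearest_code`, `residual_quantize` and `reconstruct`, and the sizes `DIM`, `LEVELS` and `CODES`, are hypothetical and chosen only for illustration. A real system would learn the codebooks, and the hierarchical mixture inference module is not shown.

```python
import random


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def residual_quantize(vector, codebooks):
    """Map a dense vector to one discrete token per codebook level.

    Each level quantizes whatever the previous levels failed to capture.
    """
    residual = list(vector)
    tokens = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        tokens.append(idx)
        # Subtract the chosen code so the next level sees the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def reconstruct(tokens, codebooks):
    """Approximate the original vector by summing the selected codes."""
    approx = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        approx = [a + c for a, c in zip(approx, codebook[idx])]
    return approx


# Illustrative sizes only (assumptions, not values from the paper).
random.seed(0)
DIM, LEVELS, CODES = 16, 4, 256
codebooks = [
    [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(CODES)]
    for _ in range(LEVELS)
]
item_embedding = [random.gauss(0.0, 1.0) for _ in range(DIM)]

tokens = residual_quantize(item_embedding, codebooks)
print(tokens)  # LEVELS integers, each in [0, CODES)
```

In this sketch an item is stored as `LEVELS` small integers instead of `DIM` floats, which is where the space saving of a token-based representation comes from. The sizes here are arbitrary and are not meant to reproduce the roughly 200-fold compression reported in the abstract.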