- The paper proposes a one-stage DAS method that integrates quantization and collaborative filtering alignment to preserve semantic integrity in recommendations.
- The methodology uses three modules (UISM, ICDM, and MDAM) to extract, debias, and align multi-modal embeddings, improving token coherence.
- Extensive experiments demonstrate significant gains in AUC, UAUC, and GAUC, with industrial deployment at Kuaishou boosting CTR and eCPM, notably in cold-start scenarios.
DAS: Dual-Aligned Semantic IDs Empowered Industrial Recommender System
Introduction
The paper "DAS: Dual-Aligned Semantic IDs Empowered Industrial Recommender System" (2508.10584) proposes Dual-Aligned Semantic IDs (DAS), a framework that addresses a limitation of semantic IDs in recommender systems. Semantic IDs are discrete identifiers obtained by quantizing embeddings from multi-modal LLMs (MLLMs). They make it easier to integrate multi-modal content into a recommender, but they are often misaligned with collaborative filtering (CF) signals, which hurts recommendation quality. Recent approaches align the two in multiple stages, which often incurs significant information loss and reduces alignment flexibility. DAS instead performs quantization and alignment jointly in one stage, with the aim of improving alignment precision while preserving the semantic information carried by the IDs.
Figure 1: Comparison of Semantic IDs construction. (1) No-Aligned, (2) Two-Stage Aligned and (3) Ours: One-Stage Dual-Aligned.
Methodology
DAS combines quantization and alignment in a single-stage framework composed of three modules: the User and Item Semantic Model (UISM), the ID-based CF Debias Model (ICDM), and the Multi-view Dual-Aligned Mechanism (MDAM). UISM extracts multi-modal semantic embeddings and quantizes them into hierarchical Semantic IDs using RQ-VAE, initializing the codebooks with K-means clustering to reduce the risk of codebook collapse. ICDM removes biases in the CF signals, such as popularity and conformity bias, using disentangling domain adaptation networks, so that the resulting CF representations align better with the semantic model. MDAM maximizes mutual information between Semantic IDs and collaborative representations through three contrastive alignment strategies: dual user-to-item (u2i), dual item-to-item/user-to-user (i2i/u2u), and dual co-occurrence item-to-item/user-to-user (i2i/u2u).
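The residual quantization step with K-means-initialized codebooks can be sketched as follows. This is a minimal illustration of the quantization idea only: it omits the RQ-VAE encoder, decoder, and training losses, and the function names (`kmeans`, `fit_residual_codebooks`, `encode`) and parameters (`levels`, `k`) are hypothetical, not taken from the paper.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[j] = members.mean(0)
    return centroids

def nearest_code(residual, codebook):
    """Index of the closest codebook entry for each row of `residual`."""
    dist = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dist.argmin(1)

def fit_residual_codebooks(x, levels=3, k=8, seed=0):
    """Build one codebook per level, each fit by k-means on the
    residual left over by the previous levels (assumed init scheme)."""
    codebooks, residual = [], x.copy()
    for level in range(levels):
        cb = kmeans(residual, k, seed=seed + level)
        residual = residual - cb[nearest_code(residual, cb)]
        codebooks.append(cb)
    return codebooks

def encode(x, codebooks):
    """Map each embedding to a tuple of code indices (a hierarchical ID)
    and return the reconstruction as the sum of the chosen code vectors."""
    residual, recon, ids = x.copy(), np.zeros_like(x), []
    for cb in codebooks:
        codes = nearest_code(residual, cb)
        ids.append(codes)
        recon = recon + cb[codes]
        residual = residual - cb[codes]
    return np.stack(ids, axis=1), recon
```

Each row of the returned ID array is one hierarchical Semantic ID, coarse to fine. In a full RQ-VAE the codebooks would be further updated during training; here they are fixed after the K-means pass.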
Figure 2: The implementation of DAS. The UISM module uses RQ-VAEs in the quantization process, the ICDM module uses a disentangled debiasing network to obtain unbiased CF representations, and the MDAM module aligns the CF representations with the Semantic IDs while UISM and ICDM are co-trained.
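A contrastive alignment objective of the kind MDAM relies on can be sketched with an InfoNCE loss over in-batch negatives. The sketch assumes that row i of the semantic embeddings and row i of the CF embeddings form a positive pair and that the loss is applied in both directions. The exact pairings, temperature, and weighting used by DAS are not given here, so `info_nce`, `symmetric_alignment_loss`, and `temperature=0.1` are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE with in-batch negatives: row i of `anchors` is paired
    with row i of `positives`; all other rows act as negatives."""
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def symmetric_alignment_loss(semantic, collaborative, temperature=0.1):
    """Average of both directions (semantic -> CF and CF -> semantic).
    The two-direction form is an assumption made for illustration."""
    forward = info_nce(semantic, collaborative, temperature)
    backward = info_nce(collaborative, semantic, temperature)
    return 0.5 * (forward + backward)
```

Minimizing InfoNCE maximizes a lower bound on the mutual information between the paired views, which is why contrastive losses are a common choice for this kind of alignment. The same function could be reused for u2i, i2i/u2u, or co-occurrence pairs by changing which rows are treated as positives.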
Results and Discussion
Offline experiments show that DAS outperforms conventional two-stage alignment methods across the evaluated settings, with gains in AUC, UAUC, and GAUC. These gains are attributed to performing quantization and CF alignment jointly, which yields semantic representations that are better aligned with collaborative signals. DAS also improves the coherence of the semantic tokens, which benefits both cold-start and regular recommendation scenarios.
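For readers unfamiliar with the three metrics, the sketch below computes them under commonly used definitions, which are an assumption here since the summary does not spell them out: AUC over all samples, UAUC as the unweighted mean of per-user AUC, and GAUC as the per-user AUC weighted by each user's sample count. Users whose samples contain only one class are skipped. The function names are hypothetical.

```python
from collections import defaultdict

def auc(labels, scores):
    """Probability that a random positive is scored above a random
    negative (ties count as 0.5). Returns None if a class is missing."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def uauc_and_gauc(user_ids, labels, scores):
    """Per-user AUC aggregated two ways: plain mean (UAUC) and
    sample-count-weighted mean (GAUC)."""
    by_user = defaultdict(lambda: ([], []))
    for u, y, s in zip(user_ids, labels, scores):
        by_user[u][0].append(y)
        by_user[u][1].append(s)
    user_aucs, weights = [], []
    for ys, ss in by_user.values():
        value = auc(ys, ss)
        if value is not None:  # skip single-class users
            user_aucs.append(value)
            weights.append(len(ys))
    uauc = sum(user_aucs) / len(user_aucs)
    gauc = sum(a * w for a, w in zip(user_aucs, weights)) / sum(weights)
    return uauc, gauc
```

The pairwise AUC above is quadratic in the number of samples per user, which is fine for illustration; production evaluation would typically use a rank-based implementation.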



Figure 3: In-depth analysis results of DAS. (a) and (b) present the vector retrieval evaluation results for the MDAM module, while (c) and (d) illustrate the code assignment distribution, revealing the performance of the learned codebook.
Deployment at Kuaishou demonstrates the method's industrial applicability: DAS improves CTR prediction accuracy and yields eCPM gains of up to 8.98%, particularly in cold-start scenarios. The deployment serves tens of millions of users, illustrating DAS's scalability and business value.
Figure 4: The online deployment pipeline of DAS at Kuaishou.
Conclusion
The paper introduces Dual-Aligned Semantic IDs (DAS), a one-stage framework that aligns semantic and collaborative filtering representations in recommender systems. By integrating quantization and alignment, DAS reduces the loss of semantic information and increases the mutual information between the two kinds of embeddings. Offline and online experiments support these claims, showing improved predictive performance and scalability in practice. Deployment across various advertising scenarios at Kuaishou indicates its business value and positions DAS as a practical way to combine multi-modal representation learning with collaborative filtering.