
Masked Diffusion Generative Recommendation

Published 27 Jan 2026 in cs.IR | (2601.19501v1)

Abstract: Generative recommendation (GR) typically first quantizes continuous item embeddings into multi-level semantic IDs (SIDs), and then generates the next item via autoregressive decoding. Although existing methods are already competitive in terms of recommendation performance, directly inheriting the autoregressive decoding paradigm from LLMs still suffers from three key limitations: (1) autoregressive decoding struggles to jointly capture global dependencies among the multi-dimensional features associated with different positions of SID; (2) using a unified, fixed decoding path for the same item implicitly assumes that all users attend to item attributes in the same order; (3) autoregressive decoding is inefficient at inference time and struggles to meet real-time requirements. To tackle these challenges, we propose MDGR, a Masked Diffusion Generative Recommendation framework that reshapes the GR pipeline from three perspectives: codebook, training, and inference. (1) We adopt a parallel codebook to provide a structural foundation for diffusion-based GR. (2) During training, we adaptively construct masking supervision signals along both the temporal and sample dimensions. (3) During inference, we develop a warm-up-based two-stage parallel decoding strategy for efficient generation of SIDs. Extensive experiments on multiple public and industrial-scale datasets show that MDGR outperforms ten state-of-the-art baselines by up to 10.78%. Furthermore, by deploying MDGR on a large-scale online advertising platform, we achieve a 1.20% increase in revenue, demonstrating its practical value. The code will be released upon acceptance.

Summary

  • The paper proposes a masked diffusion framework that redefines semantic ID generation through parallel codebook construction and adaptive noise scheduling.
  • It introduces a two-stage inference strategy with a warm-up phase and parallel denoising, significantly reducing inference complexity and improving ranking metrics.
  • Empirical results demonstrate up to 10.78% NDCG improvement and notable revenue gains, underscoring its practical impact on scalable, AI-driven recommender systems.

Masked Diffusion Generative Recommendation: A Technical Summary

Background and Motivation

Generative recommendation (GR) paradigms have recently shifted towards leveraging semantic ID (SID) quantization, mapping item content into discrete token representations optimized for both scalability and retrieval efficiency. Traditional autoregressive GR models, directly inherited from language modeling architectures, employ hierarchical codebooks and sequential token decoding, resulting in several key limitations: suboptimal modeling of global dependencies among item attributes, inability to adapt the generation path to heterogeneous user preference structures, and significant inefficiencies during inference. Single-step parallel decoding mitigates some global consistency issues, but lacks mechanisms for progressive semantic refinement and order-agnostic generation.

Masked Diffusion GR: Framework Overview

MDGR proposes a fundamentally different approach, framing SID generation as a masked diffusion process over parallel codebooks. This enables bidirectional modeling, flexible order-free generation, and multi-step semantic refinement. The system is decomposed into three principal components:

  • Parallel Codebook Construction: Item embeddings are projected into independent subspaces and quantized via optimized product quantization (OPQ), yielding multi-token SIDs disentangled across distinct semantic dimensions.
  • Training Regime: Training uses curriculum-inspired noise scheduling across time (with gradually increased masking ratio) and history-aware mask allocation across samples (preferring semantically challenging or user-rare tokens), in conjunction with encoder–decoder architectures supporting bidirectional attention.
  • Inference Procedure: A novel two-stage decoding strategy initializes generation via a warm-up phase (stabilizing key semantic anchors one at a time), followed by high-confidence parallel denoising steps. This is further enhanced with beam search over multiple codebooks to jointly propose top-B candidates per user.

    Figure 1: The MDGR framework: OPQ-based parallel codebook, curriculum/historic training, and two-stage inference with beam search for SID generation.
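As a concrete (if simplified) sketch, the parallel codebook can be built with plain product quantization — independent k-means per subspace — standing in for the paper's OPQ, which additionally learns a rotation of the embedding space before splitting:

```python
import numpy as np

def build_parallel_codebooks(embs, n_sub=4, k=8, iters=20, seed=0):
    """Quantize item embeddings into multi-token SIDs by splitting each
    embedding into independent subspaces and running k-means per subspace.
    This is a plain product-quantization stand-in for the paper's OPQ step."""
    rng = np.random.default_rng(seed)
    n, d = embs.shape
    assert d % n_sub == 0, "embedding dim must split evenly into subspaces"
    sub_d = d // n_sub
    codebooks, sids = [], []
    for s in range(n_sub):
        x = embs[:, s * sub_d:(s + 1) * sub_d]
        # init centroids from random items, then run Lloyd iterations
        cent = x[rng.choice(n, size=k, replace=False)]
        for _ in range(iters):
            assign = np.argmin(((x[:, None] - cent[None]) ** 2).sum(-1), axis=1)
            for c in range(k):
                if (assign == c).any():
                    cent[c] = x[assign == c].mean(0)
        codebooks.append(cent)
        sids.append(assign)
    # each item's SID has one token per subspace, semantically independent
    return codebooks, np.stack(sids, axis=1)
```

Because each token indexes its own codebook over a disjoint subspace, no position depends on the residual of another — the structural property the diffusion decoder relies on for order-free generation.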

Codebook and Generation Paradigms

Parallel SID codebooks, central to MDGR, ensure each token reflects independent semantic content rather than residual hierarchies. This architecture supports arbitrary token filling orders and simultaneous denoising, essential for capturing bidirectional dependencies among attributes.

Figure 2: Item multimodal features are quantized into SIDs; masked diffusion GR (d) flexibly fills multiple positions in parallel, unlike autoregressive (b) or single-step (c) decoders.

Masked Diffusion Training Process

MDGR models SID generation as a discrete-time Markov chain, incrementally corrupting SID tokens via adaptive masking rates and learning a conditional denoising distribution. Global curriculum noise scheduling modulates instance difficulty over training steps, while history-aware masking allocates corruption preferentially to semantically infrequent dimensions for the user. Difficulty-aware embeddings are introduced to stabilize optimization with a direct signal about corruption intensity.
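The history-aware mask allocation described above can be sketched as follows. The `dim_freq` input and the inverse-frequency weighting are illustrative assumptions about how "user-rare" dimensions might be prioritized, not the paper's exact formulation:

```python
import numpy as np

def history_aware_mask(sid, dim_freq, n_mask, rng):
    """Sample which SID positions to corrupt, biased toward semantic
    dimensions whose tokens are rare in the user's history.
    dim_freq[i]: how often the item's token at position i appears in the
    user's interaction history; rarer tokens get higher masking weight.
    (Illustrative weighting, assumed rather than taken from the paper.)"""
    w = 1.0 / (1.0 + np.asarray(dim_freq, dtype=float))
    p = w / w.sum()
    return rng.choice(len(sid), size=n_mask, replace=False, p=p)
```

Allocating corruption to rarely seen dimensions forces the denoiser to practice exactly the attribute predictions that are hardest for that user, rather than wasting capacity on tokens the history already makes obvious.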

This regime ensures the model progressively learns to reconstruct increasingly complex partial SIDs, improving generalization and robustness. The masked diffusion objective is a cross-entropy computed on the masked positions, indexed by both temporal noise schedules and sample-wise semantic difficulty.

Figure 3: (a) Influence of γ on global curriculum difficulty; larger γ results in delayed noise ramp-up. (b) Empirical dynamics of mask count distributions as training advances for γ = 2.
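The curriculum schedule and masked objective can be sketched together as below. The power-law form `(t/T)**γ` matches the qualitative behavior in Figure 3 (larger γ delays the noise ramp-up), but both it and the loss details are illustrative assumptions, not the paper's exact formulas:

```python
import numpy as np

def mask_rate(t, T, gamma=2.0):
    """Curriculum noise schedule: the masking ratio grows from 0 to 1 over
    training steps t = 0..T; larger gamma delays the ramp-up.
    (t/T)**gamma is an assumed functional form, not the paper's."""
    return (t / T) ** gamma

def masked_ce_loss(logits, targets, mask):
    """Masked-diffusion objective: cross-entropy on masked positions only.
    logits: (L, V) per-position token scores; targets: (L,) gold SID tokens;
    mask: (L,) 1.0 where the token was corrupted, 0.0 elsewhere."""
    logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    nll = -logp[np.arange(len(targets)), targets]
    return (nll * mask).sum() / max(mask.sum(), 1)
```

Early in training few positions are masked, so reconstruction is easy; as `mask_rate` approaches 1 the model must recover nearly the whole SID from context alone, which is the hard case inference starts from.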

Efficient Inference: Warm-Up and Parallel Decoding

Efficient inference in MDGR starts from a fully masked SID and proceeds in two stages. The warm-up phase generates one token per step, always updating the most confident position as determined by the model. Once key dimensions ("anchors") are filled, subsequent steps jointly fill m_par tokens in parallel, each selected with respect to top confidence scores. Parallel beam search extends candidate paths combinatorially, but maintains manageable computational cost due to reduced step count and bidirectional attention.

MDGR empirically reduces inference complexity to approximately R/L of autoregressive GR, with R as the total number of diffusion steps and L as the SID length, while maintaining or improving ranking metrics.
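A minimal sketch of the two-stage decoder, assuming a model stub `predict(sid)` that returns per-position confidences and token proposals for a partially masked SID (the real MDGR model is a bidirectional transformer, and this omits beam search):

```python
import numpy as np

MASK = -1  # sentinel for a not-yet-generated SID position

def two_stage_decode(predict, L, r_warm=4, m_par=2):
    """Warm-up-then-parallel decoding. For r_warm steps, commit only the
    single most confident masked position (anchor stabilization); after
    that, commit m_par masked positions per step in parallel."""
    sid = np.full(L, MASK)
    steps = 0
    while (sid == MASK).any():
        conf, tok = predict(sid)                      # conf, tok: (L,)
        conf = np.where(sid == MASK, conf, -np.inf)   # only masked slots
        n_fill = 1 if steps < r_warm else m_par
        for pos in np.argsort(conf)[::-1][:n_fill]:
            if sid[pos] == MASK:                      # skip -inf padding
                sid[pos] = tok[pos]
        steps += 1
    return sid, steps
```

With L = 8, r_warm = 4, and m_par = 2, decoding finishes in 4 + 2 = 6 steps rather than 8 autoregressive steps, illustrating how the step count drops below SID length once parallel filling begins.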

Ablation, Efficiency Trade-off, and Hyperparameter Analysis

Extensive ablation studies highlight the criticality of parallel codebooks and adaptive noise schedules, with fully random masking regimes (standard MDM) showing up to 3.2%–3.4% degradation across metrics. Removal of confidence-guided position selection, history-aware mask allocation, or difficulty embeddings each incurs consistent drops.

During inference, increasing the number of parallel positions decoded per step (m_par) raises throughput (queries per second, QPS), but overly aggressive parallelization can degrade top-K recall due to compromised search-space coverage under fixed beam widths. The most effective regime uses R_warm = 4 warm-up steps and m_par = 2 for the industrial dataset.

Figure 4: (a) Recall and (b) NDCG as a function of curriculum exponent γ; moderate γ yields optimal ranking performance.

Empirical Performance and Deployment

MDGR substantially surpasses ten state-of-the-art baselines (both discriminative and generative), delivering up to 10.78% improvement in top-K NDCG and recall on real-world industrial datasets. Online A/B tests in a major e-commerce advertising platform report 1.20% revenue uplift, 3.69% increase in gross merchandise volume (GMV), and 2.36% improved click-through rate (CTR).

Practical and Theoretical Implications

MDGR provides an effective paradigm for scalable, context-sensitive, and high-efficiency recommendation generation. By removing hierarchical generation constraints, adapting corruption to user preference heterogeneity, and exploiting parallel refinement with bidirectional attention, the architecture sets a new technical direction for SIDs in recommendation retrieval. The findings suggest that masked diffusion models, with task-specific noise schedules and codebook designs, generalize more robustly and efficiently than autoregressive alternatives for multi-attribute item generation.

From a theoretical perspective, this work strengthens the link between discrete diffusion models and efficient, globally consistent conditional generation in structured domains. Future avenues include refined noise scheduling, adaptive codebook expansion, and cross-modal alignment strategies to further enhance semantic expressivity and retrieval quality.

Conclusion

Masked Diffusion Generative Recommendation (MDGR) marks a significant advance in the generative recommendation landscape by introducing a fully parallel, curriculum- and history-adaptive masked diffusion approach for SID generation. Comprehensive evaluations demonstrate strong improvements in both recommendation accuracy and operational efficiency, supported by robust ablation analyses and large-scale online deployment. The masked diffusion approach described provides a promising foundation for next-generation generative retrieval strategies in AI-driven recommender systems (2601.19501).
