Determining the contribution weight of residual information during denoising

Ascertain a principled scheme for weighting the contribution of residual information relative to mask embeddings when constructing input embeddings for masked positions during denoising in diffusion large language models.

Background

RCD forms residual vectors by a probability-weighted sum over the embedding codebook and interpolates these residuals with mask embeddings for masked positions. A central design question is how much residual information should contribute to the next-step input embeddings.

The paper frames this as an open design question and proposes normalized Shannon entropy as a principled weighting, further refined via temperature scaling during inference to align the training and inference distributions.
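The mechanism described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the function name, the epsilon constant, and the convention that lower entropy (higher confidence) gives the residual more weight are all assumptions for illustration.

```python
import numpy as np

def entropy_weighted_input(probs, codebook, mask_emb, temperature=1.0):
    """Hypothetical sketch of an entropy-weighted residual interpolation.

    probs:    (V,) predicted token distribution at one masked position
    codebook: (V, d) token embedding table
    mask_emb: (d,) mask token embedding
    """
    # Temperature scaling at inference to reshape the distribution
    # (an assumed form; the paper's exact scaling may differ).
    p = probs ** (1.0 / temperature)
    p = p / p.sum()

    # Residual vector: probability-weighted sum over the embedding codebook.
    residual = p @ codebook

    # Normalized Shannon entropy in [0, 1]; epsilon guards log(0).
    entropy = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))

    # Assumed convention: confident predictions (low entropy) lean on the
    # residual; uncertain ones fall back to the mask embedding.
    w = 1.0 - entropy
    return w * residual + (1.0 - w) * mask_emb
```

With this convention, a one-hot prediction returns (approximately) that token's codebook embedding, while a uniform prediction returns the mask embedding unchanged.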

References

Hence, the only question yet to be solved is how much residual information should contribute.

Residual Context Diffusion Language Models (2601.22954 - Hu et al., 30 Jan 2026) in Section 4.1 (Entropy Weighted Residual)