Why mask diffusion does not work

Published 29 Sep 2025 in cs.LG, cs.AI, and cs.CL | (2510.03289v1)

Abstract: The main advantages of diffusion LLMs over autoregressive (AR) models lie in their ability to support parallel generation and bidirectional attention, enabling a more controllable generation process. In recent years, open-source mask diffusion LLMs have emerged, most of which are based on a variant known as absorbing diffusion. However, this paper demonstrates why mask diffusion faces inherent difficulties in achieving parallel generation and bidirectional attention. We also propose the most effective training and inference strategies for mask diffusion.
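For readers unfamiliar with absorbing diffusion, the sketch below shows the generic absorbing-state forward (noising) process that mask diffusion LLMs are built on. It is an illustration assumed from the broader mask-diffusion literature, not the training or inference strategy proposed in this paper. The names `MASK` and `absorbing_forward` are hypothetical. Each position draws one uniform number, and a token is replaced by the mask token once the noise level `t` exceeds that draw. Because the draws are reused across noise levels, a masked position never reverts to a real token, which is what makes the mask state "absorbing".

```python
import random

# Hypothetical placeholder mask token; not tied to any particular model or tokenizer.
MASK = "[MASK]"


def absorbing_forward(tokens, t, uniforms):
    """Corrupt `tokens` at noise level t in [0, 1] under an absorbing (mask) process.

    A token is masked when its per-position uniform draw falls below t.
    Reusing the same `uniforms` for every t couples the process across time:
    any position masked at level t stays masked at every higher level t' > t.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(tokens) != len(uniforms):
        raise ValueError("need exactly one uniform draw per token")
    return [MASK if u < t else tok for tok, u in zip(tokens, uniforms)]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the cat sat on the mat".split()
    uniforms = [rng.random() for _ in tokens]  # one draw per position, in [0, 1)
    for t in (0.0, 0.3, 0.6, 1.0):
        # t = 0.0 leaves every token intact; t = 1.0 masks every position.
        print(t, absorbing_forward(tokens, t, uniforms))
```

Generation in a mask diffusion model runs this process in reverse, starting from an all-`MASK` sequence and filling in positions over several steps; the paper's argument concerns how hard it is to fill many of those positions in parallel.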
