Supervised framework for where-to-unmask decisions in MDLMs
Develop an efficient supervised training framework that uses ground-truth target sequences directly to learn the where-to-unmask decision policy for Masked Diffusion Language Models (MDLMs), i.e., which positions to unmask at each reverse step. Such a framework would provide a practical alternative to heuristic confidence measures and to reinforcement learning with on-policy rollouts.
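To make the where-to-unmask decision concrete, the following is a minimal sketch of the kind of heuristic confidence rule that a learned policy would replace. It is not the method of the cited paper. The function name `select_positions_by_confidence`, the toy probabilities, and the choice of "maximum token probability" as the confidence score are all assumptions made for illustration.

```python
from typing import List


def select_positions_by_confidence(
    probs: List[List[float]], is_masked: List[bool], k: int
) -> List[int]:
    """Pick up to k masked positions with the highest max token probability.

    Hypothetical helper: `probs[i]` is an assumed per-position distribution
    over the vocabulary, and `is_masked[i]` says whether position i is
    still masked at the current reverse step.
    """
    # Score only positions that are still masked.
    scored = [
        (max(p), i)
        for i, (p, masked) in enumerate(zip(probs, is_masked))
        if masked
    ]
    # Highest confidence first; ties broken by the lower position index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return sorted(i for _, i in scored[:k])


# Toy example with a two-token vocabulary (values are made up).
probs = [
    [0.90, 0.10],  # position 0: already unmasked, ignored
    [0.60, 0.40],  # position 1: confidence 0.60
    [0.55, 0.45],  # position 2: confidence 0.55
    [0.80, 0.20],  # position 3: confidence 0.80
]
is_masked = [False, True, True, True]
chosen = select_positions_by_confidence(probs, is_masked, k=2)
print(chosen)
```

A supervised alternative, as described above, would replace the confidence score with a learned score trained against ground-truth sequences; how those training targets should be constructed is the open question, so it is not sketched here.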
References
Thus, an efficient supervised framework that directly leverages ground-truth sequences for training the where-to-unmask decision remains an open challenge.
— Where-to-Unmask: Ground-Truth-Guided Unmasking Order Learning for Masked Diffusion Language Models
(2602.09501 - Asano et al., 10 Feb 2026) in Introduction (Section 1)