Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes

Published 28 May 2025 in cs.CL, cs.AI, and cs.LG | (2505.22165v1)

Abstract: Diffusion models have emerged as a promising approach for text generation, with recent works falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using categorical distributions, allowing for different diffusion progress across tokens but lacking fine-grained control. Continuous diffusion models map tokens to continuous spaces and apply fine-grained noise, but the diffusion progress is uniform across tokens, limiting their ability to capture semantic nuances. To address these limitations, we propose Non-simultaneous Continuous Diffusion Models (NeoDiff), a novel diffusion model that integrates the strengths of both discrete and continuous approaches. NeoDiff introduces a Poisson diffusion process for the forward process, enabling a flexible and fine-grained noising paradigm, and employs a time predictor for the reverse process to adaptively modulate the denoising progress based on token semantics. Furthermore, NeoDiff utilizes an optimized schedule for inference to ensure more precise noise control and improved performance. Our approach unifies the theories of discrete and continuous diffusion models, offering a more principled and effective framework for text generation. Experimental results on several text generation tasks demonstrate NeoDiff's superior performance compared to baselines of non-autoregressive continuous and discrete diffusion models, iterative-based methods and autoregressive diffusion-based methods. These results highlight NeoDiff's potential as a powerful tool for generating high-quality text and advancing the field of diffusion-based text generation.


Summary

The paper introduces NeoDiff, which bridges discrete and continuous diffusion by using a Poisson-based forward process that gives each token its own fine-grained noise level.
It separates extrinsic (sentence-level) time from intrinsic (token-level) time, and adds a time predictor plus Bayesian optimization of the inference schedule to improve the contextual recovery of text.
Evaluations show NeoDiff outperforms diffusion-based and iterative baselines on NLP tasks such as translation, paraphrasing, and text simplification, trading a moderate increase in complexity for higher quality.


Introduction

The paper introduces a new diffusion model, NeoDiff, designed to improve text generation by combining the strengths of discrete and continuous diffusion models. Discrete models corrupt each token independently through categorical distributions, so diffusion progress can differ across tokens, but they lack fine-grained control over how much noise each token receives. Continuous models operate in embedding space and provide fine-grained noise, but every token advances through the diffusion at the same rate, which limits their ability to capture semantic nuances. NeoDiff bridges these paradigms with a non-simultaneous continuous diffusion process driven by a novel Poisson-based noising strategy.

Figure 1: Comparison of the noising paradigms employed by Non-simultaneous Continuous Diffusion and two other diffusion models.

Unified Diffusion Framework

NeoDiff is built around a bi-temporal framework: extrinsic time $t$ tracks the global progress of the whole sentence, while intrinsic time $\tau$ tracks the progress of each individual token. Separating the two lets every token carry its own fine-grained noise level instead of sharing a single sentence-wide one, which enables context-aware text generation.
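One way to see why a per-token intrinsic time can unify the two families is to look at its limiting cases. The sketch below is an illustrative reading under stated assumptions, not the paper's formulation: continuous diffusion is modeled as every $\tau_i$ equal to $t$, and absorbing-style discrete diffusion as each $\tau_i$ being either 0 (clean) or 1 (fully corrupted). The function name `intrinsic_times` and the mode strings are hypothetical.

```python
import random

def intrinsic_times(num_tokens, t, mode, rng):
    """Return one intrinsic time tau_i in [0, 1] per token at extrinsic time t.

    Hypothetical illustration of the two limiting cases:
      "continuous": all tokens share the sentence-level time (tau_i = t).
      "discrete":   each token is independently clean (0.0) or fully
                    corrupted (1.0), corrupted with probability t.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if mode == "continuous":
        return [t] * num_tokens
    if mode == "discrete":
        # rng.random() is in [0, 1), so t = 0 keeps every token clean
        # and t = 1 corrupts every token.
        return [1.0 if rng.random() < t else 0.0 for _ in range(num_tokens)]
    raise ValueError(f"unknown mode: {mode}")


rng = random.Random(0)
print(intrinsic_times(5, 0.4, "continuous", rng))  # [0.4, 0.4, 0.4, 0.4, 0.4]
print(intrinsic_times(5, 0.4, "discrete", rng))    # a mix of 0.0 and 1.0
```

Under this reading, NeoDiff's non-simultaneous process sits between the two extremes: $\tau_i$ takes fine-grained values, as in continuous diffusion, but can differ from token to token, as in discrete diffusion.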

For the forward process, NeoDiff uses a Poisson process, so tokens advance through the noise schedule at different, fine-grained rates. For the reverse process, a time predictor that takes token semantics into account modulates how quickly each token is denoised, improving the contextual recovery of the text.
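A Poisson-driven forward step can be sketched as follows. This is a minimal sketch under explicit assumptions, not NeoDiff's exact equations: each token's intrinsic time is a Poisson count $N_i \sim \mathrm{Poisson}(\lambda t)$ rescaled by the rate $\lambda$ and clipped to $[0, 1]$, and the embedding is then noised with a hypothetical cosine-style signal level `alpha_bar`. The names `poisson_forward`, `alpha_bar`, and `lam` are illustrative.

```python
import numpy as np

def alpha_bar(tau):
    """Hypothetical cosine-style signal level: 1 at tau = 0, ~0 at tau = 1."""
    return np.cos(0.5 * np.pi * tau) ** 2

def poisson_forward(embeddings, t, lam, rng):
    """Noise each token embedding to its own intrinsic time.

    Assumption for illustration: token i's intrinsic time is a Poisson count
    N_i ~ Poisson(lam * t), rescaled by lam and clipped to [0, 1]. Before
    clipping, E[N_i / lam] = t, so the sentence-level time t still sets the
    average noise level while individual tokens drift apart.
    """
    num_tokens = embeddings.shape[0]
    counts = rng.poisson(lam * t, size=num_tokens)
    tau = np.minimum(counts / lam, 1.0)
    a = alpha_bar(tau)[:, None]                # one signal level per token
    noise = rng.standard_normal(embeddings.shape)
    noisy = np.sqrt(a) * embeddings + np.sqrt(1.0 - a) * noise
    return noisy, tau


rng = np.random.default_rng(0)
emb = rng.standard_normal((6, 4))              # 6 tokens, 4-dim toy embeddings
noisy, tau = poisson_forward(emb, t=0.5, lam=10.0, rng=rng)
print(np.round(tau, 1))                        # per-token times, generally not all equal
```

A larger `lam` concentrates the intrinsic times around $t$ (approaching uniform continuous diffusion), while a smaller one spreads them out; this knob is part of the sketch, not a parameter reported in the summary.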

Figure 2: An overview of NeoDiff.

Implementation and Evaluation

A key component of NeoDiff is its time predictor, which dynamically estimates each token's intrinsic time $\tau$. The predictor is trained on pseudo-labels derived from predicted text quality scores combined with a rank-based transformation. After training, the extrinsic time schedule is tuned with Bayesian optimization, adapting it to each task to improve generation quality.
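The summary does not spell out the pseudo-labeling recipe, so the sketch below assumes one plausible reading of a "rank-based transformation": tokens are ranked by a per-token quality score, the best-recovered token receives $\tau = 0$, the worst receives $\tau = t$, and the rest are spaced evenly in between. The function `rank_pseudo_labels` and the linear spacing are hypothetical; a time predictor would then be trained to regress these targets.

```python
def rank_pseudo_labels(quality, t):
    """Map per-token quality scores to pseudo-label intrinsic times in [0, t].

    Hypothetical rank transform: higher quality -> smaller tau (less noise
    left to remove). Only the ordering of the scores matters, not their
    scale; ties are broken by token position because sorted() is stable.
    """
    n = len(quality)
    if n <= 1:
        return [0.0] * n
    # Token indices ordered from highest to lowest quality.
    order = sorted(range(n), key=lambda i: quality[i], reverse=True)
    labels = [0.0] * n
    for rank, i in enumerate(order):
        labels[i] = t * rank / (n - 1)
    return labels


print(rank_pseudo_labels([0.9, 0.2, 0.5], t=0.8))  # [0.0, 0.8, 0.4]
```

Using ranks rather than raw scores makes the targets insensitive to how the quality scores are calibrated, which is one reason a rank-based transform is a natural choice here.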

NeoDiff performs strongly on multiple NLP tasks, including translation, paraphrasing, and text simplification, outperforming non-autoregressive continuous and discrete diffusion baselines, iterative methods, and autoregressive diffusion-based methods. The authors attribute these gains to its fine-grained, token-level control over noise.

Improved Processes

The Poisson-based forward process noises tokens non-simultaneously, so different tokens in the same sentence can sit at different noise levels at the same extrinsic time, which helps the model capture more complex semantic structure. Through the intrinsic time $\tau$, NeoDiff adjusts each token's noise individually while keeping the sentence coherent as a whole.

The context-aware reverse process uses the predicted token-level noise levels to guide generation, drawing on information from less noisy tokens to refine more heavily corrupted ones.
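The mechanical part of a non-simultaneous reverse step can be sketched with a deterministic DDIM-style update in which every token has its own time. The DDIM-style update is a standard diffusion sampler swapped in for illustration, not NeoDiff's confirmed sampler, and `alpha_bar` is the same kind of hypothetical cosine schedule. The context-aware part lives in the denoiser that produces `x0_hat`, which is not modeled here; `tau` stands in for the time predictor's per-token estimates, and `reverse_step` is a hypothetical name.

```python
import numpy as np

def alpha_bar(tau):
    """Hypothetical cosine-style signal level: 1 at tau = 0, ~0 at tau = 1."""
    return np.cos(0.5 * np.pi * tau) ** 2

def reverse_step(x_tau, x0_hat, tau, step):
    """One deterministic DDIM-style update with a separate time per token.

    x_tau:  (num_tokens, dim) current noisy embeddings
    x0_hat: (num_tokens, dim) denoiser's estimate of the clean embeddings
    tau:    (num_tokens,) current intrinsic times, e.g. from a time predictor
    step:   amount by which each token's intrinsic time moves toward 0
    """
    tau = np.asarray(tau, dtype=float)
    tau_next = np.maximum(tau - step, 0.0)    # less noisy tokens reach 0 first
    a_now = alpha_bar(tau)[:, None]
    a_next = alpha_bar(tau_next)[:, None]
    # Noise implied by the current state. The floor avoids dividing by zero
    # for tokens that are already clean; their noise term is multiplied by
    # sqrt(1 - a_next) = 0 below, so they simply stay at x0_hat.
    denom = np.sqrt(np.maximum(1.0 - a_now, 1e-12))
    eps_hat = (x_tau - np.sqrt(a_now) * x0_hat) / denom
    x_next = np.sqrt(a_next) * x0_hat + np.sqrt(1.0 - a_next) * eps_hat
    return x_next, tau_next


rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 4))              # stand-in "clean" embeddings
tau = np.array([0.1, 0.5, 0.9])               # per-token intrinsic times
a = alpha_bar(tau)[:, None]
x_tau = np.sqrt(a) * x0 + np.sqrt(1 - a) * rng.standard_normal((3, 4))
x_next, tau_next = reverse_step(x_tau, x0, tau, step=0.2)
print(tau_next)                               # approximately [0, 0.3, 0.7]
```

Because each token keeps its own clock, a token that starts less corrupted finishes earlier and can then serve as clean context for the denoiser on later steps.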

Trade-offs and Considerations

NeoDiff's integration of discrete and continuous diffusion improves text generation quality but adds complexity, most visibly in the extra parameters required by the time predictor. This overhead is balanced by the task-specific calibration that Bayesian optimization of the time schedule provides.

NeoDiff's inference speed and memory usage remain competitive with similar models, thanks to its efficient parallel decoding and task-specific optimizations.

Conclusion

NeoDiff represents a significant advance in diffusion-based text generation by unifying discrete and continuous approaches. Its token-level, fine-grained control and its Poisson-based forward process give it superior performance across diverse text generation tasks, setting a new benchmark in the field.
