
Error Propagation and Model Collapse in Diffusion Models: A Theoretical Study

Published 18 Feb 2026 in stat.ML and cs.LG | (2602.16601v1)

Abstract: Machine learning models are increasingly trained or fine-tuned on synthetic data. Recursively training on such data has been observed to significantly degrade performance in a wide range of tasks, often characterized by a progressive drift away from the target distribution. In this work, we theoretically analyze this phenomenon in the setting of score-based diffusion models. For a realistic pipeline where each training round uses a combination of synthetic data and fresh samples from the target distribution, we obtain upper and lower bounds on the accumulated divergence between the generated and target distributions. This allows us to characterize different regimes of drift, depending on the score estimation error and the proportion of fresh data used in each generation. We also provide empirical results on synthetic data and images to illustrate the theory.

Summary

  • The paper demonstrates that error accumulation in recursive training can lead to model collapse if fresh data is insufficiently incorporated.
  • It introduces theoretical upper and lower divergence bounds using Girsanov’s theorem and observability coefficients to quantify score estimation errors.
  • Empirical validation on Gaussian mixtures and CIFAR-10 highlights the critical role of the fresh data fraction (α) in maintaining model stability.

Error Propagation and Model Collapse in Diffusion Models: A Theoretical Study

Introduction and Problem Setting

Machine learning systems increasingly leverage synthetic data, especially in generative model pipelines. A prominent failure mode—termed model collapse—is observed when a generative model is recursively trained on its own outputs: distributional mass drifts toward high-density cores while diversity and fidelity erode. While prior theoretical work on recursive training focused on regression or maximum-likelihood estimators, a comprehensive quantitative analysis for score-based diffusion models remained unavailable.

This work presents a rigorous analysis of error propagation and model collapse in recursively trained score-based diffusion models, where each round of training incorporates both synthetic data and a fraction $\alpha$ of fresh samples drawn from the true data distribution. The central quantities tracked are:

  • Accumulated divergence: $D_i = \chi^2(\hat{p}^i \,\|\, \mathrm{data})$, measuring model drift from the target distribution at generation $i$,
  • Intra-generation divergence: $I_i = \chi^2(\hat{p}^{i+1} \,\|\, q_i)$, quantifying the divergence induced by one training round, where $q_i = \alpha\,\mathrm{data} + (1-\alpha)\,\hat{p}^i$.

The propagation of score estimation errors and their impact on model collapse is characterized using pathwise statistics induced by the diffusion processes. This analysis clarifies how model collapse is mitigated by fresh data and exacerbated by error accumulation.
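The stabilizing role of the mixture is easy to see in a discrete toy setting: since $q_i - \mathrm{data} = (1-\alpha)(\hat{p}^i - \mathrm{data})$, mixing in fresh data contracts the $\chi^2$ divergence to the target by exactly $(1-\alpha)^2$, the contraction factor that drives the accumulation law below. A minimal numerical check (the distributions here are arbitrary illustrative choices, not from the paper):

```python
import numpy as np

# Discrete stand-ins for the target distribution and a drifted model.
p_data = np.array([0.5, 0.3, 0.2])
p_hat  = np.array([0.2, 0.3, 0.5])

def chi2(p, q):
    """chi^2(p || q) = sum (p - q)^2 / q for discrete distributions."""
    return np.sum((p - q) ** 2 / q)

alpha = 0.3
q_mix = alpha * p_data + (1 - alpha) * p_hat

# Mixing with fresh data contracts chi^2 divergence by exactly (1 - alpha)^2.
lhs = chi2(q_mix, p_data)
rhs = (1 - alpha) ** 2 * chi2(p_hat, p_data)
assert np.isclose(lhs, rhs)
```

The identity holds for any $\alpha \in [0,1]$ because the numerator of the $\chi^2$ integrand scales quadratically in the mixture residual.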

Theoretical Framework

Recursive Training Dynamics

Each generation begins with a mixture of fresh and synthetic samples. Training a score-based diffusion model on $q_i$ yields a new model $\hat{p}^{i+1}$, which in turn serves as synthetic data in subsequent rounds. The recursion is

$$\hat{p}^i \;\rightarrow\; q_i = \alpha\,\mathrm{data} + (1-\alpha)\,\hat{p}^i \;\rightarrow\; \hat{p}^{i+1}.$$

The effective training target becomes the score of $q_i$, not that of the true data distribution, introducing a structural misalignment that is exacerbated by imperfect score estimation.
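The recursion can be caricatured with a one-dimensional moment-matching toy: the "model" is just a mean that is refit to the mixture each round, with a fixed additive bias standing in for score estimation error. The bias and starting mean below are illustrative assumptions, not values from the paper:

```python
# Toy caricature of the recursion: the "model" is a single 1D mean refit to
# the mixture q_i each round; a fixed additive bias stands in for per-round
# score estimation error (both constants are illustrative).
mu_data = 0.0        # mean of the true data distribution
bias = 0.05          # hypothetical per-generation estimation error

def run(alpha, generations=50, mu0=2.0):
    """Iterate the refit; returns the model mean after `generations` rounds."""
    mu = mu0
    for _ in range(generations):
        mu_mix = alpha * mu_data + (1 - alpha) * mu   # mean of the mixture q_i
        mu = mu_mix + bias                            # imperfect refit
    return mu

# More fresh data (larger alpha) leaves a smaller residual drift |mu - mu_data|;
# the fixed point of this toy iteration is bias / alpha.
assert abs(run(0.9)) < abs(run(0.1))
```

Even this caricature exhibits the paper's qualitative picture: the drift settles at a level set by the ratio of per-round error to fresh-data fraction.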

Divergence Bounds: Upper and Lower

The intra-generational divergence $I_i$ is tightly bounded by the pathwise energy of the score error. Two critical results arise:

  • Upper Bound (via Girsanov's theorem):

$$\mathrm{KL}(\hat{p}^{i+1} \,\|\, q_i) \;\leq\; \tfrac{1}{2}\,\hat{\varepsilon}_i^2$$

where $\hat{\varepsilon}_i^2$ is the pathwise $L^2$ energy of the score error along the learned process.

  • Lower Bound (with observability):

$$\chi^2(\hat{p}^{i+1} \,\|\, q_i) \;\geq\; \tfrac{1}{8}\,\eta_i\, \varepsilon_{\star,i}^2$$

Critically, the observability coefficient $\eta_i \in [0,1]$ measures the fraction of the pathwise error that leaves a statistical imprint at the endpoint. In practical parametric models with state-dependent error, $\eta_i$ is typically nonzero.

The two-sided control is formalized, in the perturbative regime where the score error is small, as

$$c_1\,\eta_i\,\varepsilon_{\star,i}^2 \;\leq\; \chi^2(\hat{p}^{i+1} \,\|\, q_i) \;\leq\; c_2\,\varepsilon_{\star,i}^2.$$
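The two divergences appearing in these bounds are themselves ordered: by Jensen's inequality, $\mathrm{KL}(p\,\|\,q) \leq \log\!\big(1 + \chi^2(p\,\|\,q)\big)$ for any pair of distributions, so $\chi^2$ control is the stronger statement. A quick randomized check on discrete distributions (this is a generic fact about $f$-divergences, not a result specific to the paper):

```python
import numpy as np

# Generic relation linking the two divergences used in the bounds:
# KL(p || q) <= log(1 + chi^2(p || q)), by Jensen's inequality.
rng = np.random.default_rng(0)

def normalize(v):
    return v / v.sum()

for _ in range(100):
    p = normalize(rng.random(8))
    q = normalize(rng.random(8))
    kl = float(np.sum(p * np.log(p / q)))
    chi2 = float(np.sum((p - q) ** 2 / q))
    assert kl <= np.log1p(chi2) + 1e-12
```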

Intergenerational Error Accumulation

At each generation, the fresh data fraction $\alpha$ contracts the model divergence by a factor $(1-\alpha)^2$, while the newly introduced score error adds to it:

$$D_{i+1} = (1-\alpha)^2 D_i + \text{(innovation due to score error)}.$$

Closed-form analysis reveals:

  • If $\sum_i \varepsilon_{\star,i}^2 = \infty$, then the accumulated divergence never vanishes: model collapse is inevitable.
  • If $\sum_i \varepsilon_{\star,i}^2 < \infty$, the accumulated divergence remains uniformly bounded.

The long-term divergence after $N$ generations admits a discounted-sum structure:

$$D_{N+1} \;\asymp\; \sum_{k=0}^N (1-\alpha)^{2(N-k)}\, \varepsilon_{\star,k}^2.$$

Errors from past generations are exponentially forgotten, at a rate determined by $\alpha$.
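Unrolling the recursion confirms the discounted-sum form directly: with $D_0 = 0$, the recursion $D_{i+1} = (1-\alpha)^2 D_i + \varepsilon_{\star,i}^2$ telescopes to exactly $\sum_{k=0}^N (1-\alpha)^{2(N-k)} \varepsilon_{\star,k}^2$. A sketch with arbitrary illustrative error values:

```python
import numpy as np

# Unroll D_{i+1} = (1 - alpha)^2 * D_i + eps_i^2 with D_0 = 0 and check that
# it matches the discounted sum sum_k (1 - alpha)^(2(N-k)) * eps_k^2 exactly.
alpha = 0.3
eps2 = np.array([0.4, 0.1, 0.2, 0.05, 0.3])   # illustrative per-round error energies

D = 0.0
for e2 in eps2:
    D = (1 - alpha) ** 2 * D + e2

N = len(eps2) - 1
discounted = sum((1 - alpha) ** (2 * (N - k)) * eps2[k] for k in range(N + 1))
assert np.isclose(D, discounted)
```

The geometric weights $(1-\alpha)^{2(N-k)}$ make the "exponential forgetting" explicit: older errors enter with exponentially smaller coefficients.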

Numerical Experiments and Empirical Validation

Synthetic Data: Gaussian Mixture

Experiments with 10-dimensional Gaussian mixtures validate the theory. Low $\alpha$ (little fresh data) leads to rapid divergence.

Figure 1: Samples from recursively trained models shown via PCA on a 10D Gaussian mixture; columns increase in $\alpha$ from left to right, rows progress through generations. Low $\alpha$ exhibits fast dispersal/collapse, while high $\alpha$ maintains stability.

Correspondingly, the intra-generational error bounds and the intergenerational accumulation law are empirically tight.

Figure 2: Empirical validation of intra-generational error upper/lower bounds, supporting the tightness of the theoretical predictions.

Figure 3: Two-sided control of intra-generation divergence, showing close agreement between theoretical and observed $\chi^2$ and KL divergences as functions of score error energy.

Figure 4: Memory heatmap visualizing the geometrically discounted influence of errors from previous generations; a sharp diagonal for high $\alpha$ (short memory), a wide band for low $\alpha$ (long memory and more persistent collapse).

Key finding: For high $\alpha$ (e.g., $\alpha = 0.9$), distributional drift is nearly eliminated, and divergence remains stable across many recursive generations.

Observability of Score Error

Controlled experiments on CIFAR-10 show that the observability coefficient $\eta_i$ is consistently nonzero for state-dependent perturbations.

Figure 5: Observability coefficients for several classes of perturbations in CIFAR-10, confirming that state-dependent errors are more ‘visible’ and thus lead to statistically significant divergence.

Figure 6: The observability coefficient $\hat{\eta}_i$ remains nonzero across generations (10D Gaussian mixture), confirming persistent error visibility and hence the relevance of the lower divergence bounds.

Visual Effects of Model Collapse

The visual impact of collapse under recursive training is apparent in sample quality and diversity.

Figure 7: Random samples over generations for three values of $\alpha$ in a recursive pipeline; low $\alpha$ leads to rapid mode collapse, while high $\alpha$ maintains diversity.

Implications and Outlook

The theoretical construction and empirical results make several strong contributions:

  • Provable divergence lower bounds for diffusion models via score error observability, demonstrating that error is not hidden but statistically manifest.
  • Identification of a discounted-memory principle: geometric forgetting of past errors at a rate set by the fresh data fraction $\alpha$.
  • A refutation of the naive hypothesis that bounded per-round error always suffices for stability: accumulation can overwhelm contraction if errors are not summable.

Practical implications include the principled selection of $\alpha$ to prevent collapse and the direct estimation of safe training horizons given per-generation error statistics. The observability framework generalizes to more realistic models and high-dimensional settings, as confirmed with image datasets.
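As a concrete instance of such principled selection: if the per-generation error energy were a constant $\varepsilon^2$, the recursion's steady state would be $D_\infty = \varepsilon^2 / \big(1 - (1-\alpha)^2\big)$, which can be inverted for the smallest $\alpha$ meeting a divergence budget. A sketch under that constant-error assumption (the helper names and budget values are illustrative, not from the paper):

```python
import math

# Steady state of D_{i+1} = (1 - alpha)^2 * D_i + eps2, with constant
# per-round error energy eps2, is D_inf = eps2 / (1 - (1 - alpha)^2).
def steady_state(eps2, alpha):
    return eps2 / (1.0 - (1.0 - alpha) ** 2)

def min_alpha(eps2, d_max):
    """Smallest fresh-data fraction keeping D_inf <= d_max (needs eps2 <= d_max)."""
    if eps2 > d_max:
        raise ValueError("budget unreachable even with alpha = 1")
    # Solve (1 - alpha)^2 <= 1 - eps2 / d_max for alpha.
    return 1.0 - math.sqrt(1.0 - eps2 / d_max)

alpha = min_alpha(eps2=0.01, d_max=0.04)
assert steady_state(0.01, alpha) <= 0.04 + 1e-12
```

The budget is met with equality at the returned $\alpha$; any larger fresh-data fraction gives additional margin.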

Theoretical implications include insight into structural sources of collapse, the role of conditional independence and state dependence in error propagation, and the importance of pathwise statistics.

Conclusion

This study establishes rigorous, quantitative links between pathwise score estimation error, error visibility (observability), and model collapse in recursive diffusion model training. The results precisely characterize the interplay between fresh data injection and unavoidable error accumulation, with empirical validation across both synthetic and real data domains. Open questions include analyzing large-error regimes and discrete-time implementations, and characterizing the ultimate model fixed points under the recursive process. This framework provides a robust foundation for future developments in the reliable self-improvement of generative models and recursive pipelines.

Figure 8: Observability coefficients on CIFAR-10 show stability across generations and further corroborate nonzero projection of error energy onto the output distribution.

Figure 9: Additional visualizations of recursive training at $\alpha=0.1$ emphasize rapid collapse without sufficient data refresh.
