Inferential frameworks that propagate synthesis uncertainty

Develop general statistical frameworks for downstream estimation and inference that explicitly model and propagate uncertainty introduced by synthetic data generation, ensuring valid uncertainty quantification when synthetic observations are used alongside or in place of real data.

Background

Synthetic data are produced by estimated generative models, so they carry sources of variability and bias beyond classical sampling noise: the generator is itself fitted to finite real data, and it may be misspecified. Existing approaches that treat synthetic observations as equivalent to real data ignore these sources and systematically misestimate uncertainty.

Abdel-Azim et al. (2026) call for general inferential frameworks that explicitly account for this synthesis-induced uncertainty, noting that such methods remain largely undeveloped despite growing reliance on synthetic data in downstream analysis.
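
The gap between naive and actual uncertainty can be seen in a small simulation. The sketch below is an illustrative toy setup, not a method from the paper: the Gaussian "generator", the function name simulate, and the sample sizes are all assumptions chosen for the example. It fits a Gaussian to a small real sample, draws a large synthetic sample from the fit, and compares the naive standard error of the synthetic mean (computed as if the synthetic draws were real data) with the spread of that mean across repeated real samples.

```python
import numpy as np


def simulate(n_real=50, n_synth=5000, n_reps=1000, seed=0):
    """Toy comparison (hypothetical setup): naive standard error of a
    synthetic-sample mean vs. its actual spread across real samples."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_reps)
    naive_ses = np.empty(n_reps)
    for r in range(n_reps):
        # Real data: a small sample from the true N(0, 1) population.
        real = rng.normal(0.0, 1.0, size=n_real)
        # Stand-in "generative model": a Gaussian fitted to the real sample.
        mu_hat = real.mean()
        sigma_hat = real.std(ddof=1)
        # Synthetic data: many draws from the fitted generator.
        synth = rng.normal(mu_hat, sigma_hat, size=n_synth)
        estimates[r] = synth.mean()
        # Naive standard error: treats synthetic draws as real observations.
        naive_ses[r] = synth.std(ddof=1) / np.sqrt(n_synth)
    # Actual variability of the estimator over repeated real samples.
    empirical_sd = estimates.std(ddof=1)
    return empirical_sd, naive_ses.mean()


empirical_sd, mean_naive_se = simulate()
print(f"empirical SD of synthetic-mean estimates: {empirical_sd:.4f}")
print(f"average naive standard error:             {mean_naive_se:.4f}")
```

In this toy model the variance of the synthetic mean is sigma^2 (1/n_real + 1/n_synth), so generating more synthetic observations cannot push the true uncertainty below the sigma^2 / n_real floor set by the real sample, while the naive standard error keeps shrinking with n_synth. A framework that propagates synthesis uncertainty would need to account for that first term, and for generator bias, which this sketch does not model.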

References

Developing general inferential frameworks that explicitly account for synthesis uncertainty in downstream estimation and inference, therefore, remains an important open problem.

Harnessing Synthetic Data from Generative AI for Statistical Inference (2603.05396 - Abdel-Azim et al., 5 Mar 2026) in Section 4, Uncertainty Propagation from Data Synthesis