Valid statistical inference under synthetic data augmentation
Develop statistical inference procedures for models trained with synthetic data augmentation that yield valid uncertainty quantification. Such procedures must characterize and propagate both synthesis-induced randomness and generative-model error, rather than treating augmented synthetic samples as fixed or as equivalent to real observations.
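As a rough illustration of why synthetic samples cannot simply be treated as real observations, the toy simulation below compares confidence-interval coverage for a mean using real data only against a naive interval computed on pooled real and synthetic data. It is a minimal sketch under stated assumptions and is not taken from the cited paper: the "generative model" is assumed to be a Gaussian centered at the real-sample mean, and the function name `coverage_real_vs_naive` and the sizes `n`, `m`, and `reps` are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def coverage_real_vs_naive(n=50, m=200, reps=2000, mu=0.0, seed=0):
    """Estimate coverage of nominal 95% intervals for the mean `mu`.

    Returns (coverage using real data only,
             coverage when pooled real + synthetic data are treated as real).
    """
    rng = random.Random(seed)
    z = 1.96  # normal quantile for a nominal 95% interval
    hits_real = 0
    hits_naive = 0
    for _ in range(reps):
        # Real observations drawn from the true distribution N(mu, 1).
        real = [rng.gauss(mu, 1.0) for _ in range(n)]
        xbar = statistics.fmean(real)

        # Assumed toy generative model: N(xbar, 1), fitted to the real data.
        # Synthetic draws inherit the estimation error in xbar and add
        # their own sampling noise.
        synth = [rng.gauss(xbar, 1.0) for _ in range(m)]
        pooled = real + synth

        # Interval from real data only.
        se_real = statistics.stdev(real) / math.sqrt(n)
        if abs(xbar - mu) <= z * se_real:
            hits_real += 1

        # Naive interval: pooled data treated as n + m independent real draws.
        est_naive = statistics.fmean(pooled)
        se_naive = statistics.stdev(pooled) / math.sqrt(n + m)
        if abs(est_naive - mu) <= z * se_naive:
            hits_naive += 1

    return hits_real / reps, hits_naive / reps


if __name__ == "__main__":
    real_cov, naive_cov = coverage_real_vs_naive()
    print(f"real-only coverage: {real_cov:.3f}")
    print(f"naive pooled coverage: {naive_cov:.3f}")
```

In this setup the naive standard error shrinks as if there were n + m independent observations, while the variance of the pooled mean can never fall below that of the real-sample mean, so the naive interval is expected to undercover. The sketch captures only synthesis randomness and the generator's estimation error; it does not model generator misspecification, which is the other error source named in the problem statement.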
References
However, conducting valid statistical inference under data-augmented approaches remains challenging and largely open, due to the difficulty of characterizing both the randomness introduced by synthetic data and the errors arising from the generative model.
— Harnessing Synthetic Data from Generative AI for Statistical Inference
(2603.05396 - Abdel-Azim et al., 5 Mar 2026) in Section 3.3, Synthetic data-augmented approaches