Training deep generative models of galaxy images without reliable segmentation

Develop a viable training methodology for deep generative models of galaxy images (e.g., Regier et al. 2015; Lanusse et al. 2021; Smith et al. 2022) that does not require segmented galaxy images. In practice such images are unavailable, or unreliable, unless an accurate galaxy model is already in hand.

Background

In discussing future directions for reducing model misspecification, the paper reviews prior work on deep generative models for simulating galaxy images. It notes that such models may be more flexible, and hence less misspecified, than simulators like GalSim. However, training these models typically requires segmented galaxy images, which are difficult to obtain reliably without already having an accurate galaxy model.

This creates a chicken-and-egg problem for adopting deep generative galaxy models in practice: reliable segmentation requires an accurate galaxy model, yet an accurate galaxy model is exactly what these generative approaches are meant to provide. The authors state explicitly that it is unclear how to train such models under this constraint, leaving it as a concrete open problem for the community.
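The dependence of segmentation on a galaxy model can be made concrete with a toy blend. The sketch below is a minimal illustration under stated assumptions, not the paper's method: two overlapping galaxies with circular exponential profiles (the helper `exp_profile` and all parameter values are hypothetical), no noise and no PSF. A model-free segmentation that hard-assigns each pixel to the nearer centroid misattributes flux. Re-apportioning each pixel's flux in proportion to per-galaxy models recovers the true image of galaxy A, but only because the true profiles are plugged in.

```python
import numpy as np


def exp_profile(shape, center, flux, r_e):
    """Hypothetical helper: circular exponential (Sersic n=1) galaxy image.

    The profile is normalized so that its pixel values sum to `flux` on the grid.
    """
    yy, xx = np.indices(shape)
    r = np.hypot(xx - center[0], yy - center[1])
    profile = np.exp(-r / r_e)
    return flux * profile / profile.sum()


# Toy blend (illustrative parameters, not from the paper): a faint galaxy A
# 12 pixels from a brighter galaxy B, with no noise and no PSF.
shape = (64, 64)
center_a, center_b = (26.0, 32.0), (38.0, 32.0)
gal_a = exp_profile(shape, center_a, flux=1000.0, r_e=4.0)
gal_b = exp_profile(shape, center_b, flux=3000.0, r_e=4.0)
scene = gal_a + gal_b  # only the blended scene is observed

# Model-free "segmentation": hard-assign each pixel to the nearer centroid.
yy, xx = np.indices(shape)
dist_a = np.hypot(xx - center_a[0], yy - center_a[1])
dist_b = np.hypot(xx - center_b[0], yy - center_b[1])
mask_a = dist_a < dist_b
flux_a_seg = scene[mask_a].sum()
# Share of the light inside A's segment that actually comes from B.
contamination = gal_b[mask_a].sum() / flux_a_seg

# Model-based deblending: split every pixel's flux in proportion to the
# per-galaxy models. This is exact here only because the true profiles are used.
weight_a = gal_a / (gal_a + gal_b)
image_a_model = scene * weight_a
flux_a_model = image_a_model.sum()

print(f"true flux of A:          {gal_a.sum():.1f}")
print(f"segmentation-based flux: {flux_a_seg:.1f} ({contamination:.1%} from B)")
print(f"model-based flux:        {flux_a_model:.1f}")
```

The proportional split is what a pipeline would need in order to extract clean per-galaxy training images from blended data, but computing it requires `gal_a` and `gal_b`, i.e., the very galaxy model the generative network is supposed to learn.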

References

There has been work developing deep generative models of galaxy images, and indeed, the models may be less misspecified \citep{regier2015deep, lanusse2021deep, smith2022realistic}. However, it is unclear how to train these models as training requires access to segmented galaxy images, which cannot be reliably found without an accurate galaxy model.

Neural Posterior Estimation for Cataloging Astronomical Images from the Legacy Survey of Space and Time (2510.15315 - Duan et al., 17 Oct 2025) in Section 6.4, A path forward: nonparametric modeling