Galaxy Phase-Space and Field-Level Cosmology: The Strength of Semi-Analytic Models

Published 11 Dec 2025 in astro-ph.CO, astro-ph.GA, and cs.LG | (2512.10222v1)

Abstract: Semi-analytic models are a widely used approach to simulate galaxy properties within a cosmological framework, relying on simplified yet physically motivated prescriptions. They have also proven to be an efficient alternative for generating accurate galaxy catalogs, offering a faster and less computationally expensive option compared to full hydrodynamical simulations. In this paper, we demonstrate that using only galaxy 3D positions and radial velocities, we can train a graph neural network coupled to a moment neural network to obtain a robust machine learning based model capable of estimating the matter density parameter, $\Omega_{\rm m}$, with a precision of approximately 10%. The network is trained on $(25\ h^{-1}\,{\rm Mpc})^3$ volumes of galaxy catalogs from L-Galaxies and can successfully extrapolate its predictions to other semi-analytic models (GAEA, SC-SAM, and Shark) and, more remarkably, to hydrodynamical simulations (Astrid, SIMBA, IllustrisTNG, and SWIFT-EAGLE). Our results show that the network is robust to variations in astrophysical and subgrid physics, cosmological and astrophysical parameters, and the different halo-profile treatments used across simulations. This suggests that the physical relationships encoded in the phase-space of semi-analytic models are largely independent of their specific physical prescriptions, reinforcing their potential as tools for the generation of realistic mock catalogs for cosmological parameter inference.

Summary

The paper demonstrates that a GNN-based method using galaxy phase-space data from SAMs accurately infers the matter density parameter (Ωm).
The model, trained exclusively on L-Galaxies, reached a mean relative error of 8.6% on held-out L-Galaxies test volumes and generally stayed below 12% across the other SAM and hydrodynamical catalogs.
The study indicates that diverse training sets enable robust simulation-based inference, providing a scalable alternative to computationally intensive hydrodynamical simulations.

Field-Level Cosmology with Galaxy Phase-Space: Robustness of Semi-Analytic Model-Based Simulation-Based Inference

Introduction

Galaxy formation modeling remains a bottleneck in precision cosmological inference from large-scale surveys. High-fidelity modeling approaches such as hydrodynamical simulations offer accuracy but are computationally infeasible at the full survey scale and across varying cosmologies or subgrid models. This work addresses whether semi-analytic models (SAMs), which are intermediate, physically motivated, and far more tractable prescriptions, provide a robust foundation for simulation-based inference (SBI) of cosmological parameters, specifically $\Omega_{\rm m}$, when paired with field-level ML tools. The methodology uses only galaxy 3D positions and line-of-sight velocities, quantities routinely accessible in modern surveys.

Methodology

The primary inference task is prediction of the matter density parameter $\Omega_{\rm m}$ from individual $(25\ h^{-1}\,{\rm Mpc})^3$ volumes of galaxies, using only their phase-space coordinates. Graph neural networks (GNNs) with a moment neural network (MNN) head are trained on catalogs generated by the L-Galaxies SAM. The graph representation uses galaxies as nodes, position- and distance-based features for edges, and graph-level features encoding galaxy number. Message-passing neural networks refine this representation through multiple layers, with ML hyperparameters tuned via Bayesian optimization. The field-level approach infers parameters directly from the entire distribution of galaxies, sidestepping summary statistics.
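The graph construction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `build_galaxy_graph`, the linking radius `r_link`, the periodic minimum-image wrapping, and the choice of normalized separation as the single edge feature are all hypothetical and are not taken from the paper.

```python
import numpy as np

def build_galaxy_graph(pos, vel_los, box_size=25.0, r_link=1.25):
    """Build a simple galaxy graph from positions and line-of-sight velocities.

    pos      : (N, 3) positions in h^-1 Mpc inside a periodic box of side box_size
    vel_los  : (N,) line-of-sight velocities, used as the only node feature
    r_link   : hypothetical linking radius; pairs closer than this get an edge
    """
    pos = np.asarray(pos, dtype=float)
    vel_los = np.asarray(vel_los, dtype=float)

    # Pairwise separation vectors with periodic (minimum-image) wrapping.
    # Brute force O(N^2): fine for a sketch, too slow for very large catalogs.
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= box_size * np.round(diff / box_size)
    dist = np.linalg.norm(diff, axis=-1)

    # Connect every distinct pair closer than r_link (edges in both directions).
    mask = (dist < r_link) & ~np.eye(len(pos), dtype=bool)
    src, dst = np.nonzero(mask)

    nodes = vel_los[:, None]                       # (N, 1) node features
    edge_index = np.stack([src, dst])              # (2, E) connectivity
    edge_attr = (dist[src, dst] / r_link)[:, None] # (E, 1) normalized separation
    global_feat = np.log10(len(pos))               # graph-level: log galaxy number
    return nodes, edge_index, edge_attr, global_feat
```

The returned arrays map directly onto the node, edge, and graph-level inputs that a message-passing library expects; a tree-based neighbor search would replace the brute-force distance matrix for large catalogs.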

Critically, robustness is tested by evaluating the trained GNN on physically and algorithmically diverse galaxy catalogs: other SAMs (GAEA, SC-SAM, Shark) differing in parameterization, merger tree building, and halo identification, and hydrodynamical simulations, namely four distinct suites (Astrid, SIMBA, IllustrisTNG, SWIFT-EAGLE) plus Magneticum and a 28-parameter Sobol suite of TNG (SB28).

Robustness and Cosmological Parameter Inference

The central result is that the GNN-MNN model, trained on L-Galaxies, infers $\Omega_{\rm m}$ from unseen catalogs with a mean relative error of 8.6% on L-Galaxies test volumes, and generally below 12% across all other SAM and hydrodynamical runs.
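The summary does not spell out the metric. Assuming the conventional definition, the mean relative error over $N$ test volumes would be

$$\epsilon = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|\Omega_{{\rm m},i}^{\rm true}-\Omega_{{\rm m},i}^{\rm inf}\right|}{\Omega_{{\rm m},i}^{\rm true}},$$

where $\Omega_{{\rm m},i}^{\rm inf}$ is the posterior mean predicted for volume $i$. Under this assumption, an error of 8.6% at $\Omega_{\rm m}=0.3$ corresponds to a typical absolute deviation of about 0.026.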

Figure 1: Truth-minus-inferred values for $\Omega_{\rm m}$ across all test simulation types; error bars show predicted uncertainty. Robustness is evident across both SAM and hydrodynamical tests.

The $\chi^2$ statistic for calibration of predicted uncertainties confirms that the error model is well-calibrated except for SB28 (TNG) and Magneticum, where larger outlier fractions are observed (up to 8.3%). Removing the high-residual catalogs (typically far fewer than 5% of all cases) brings the performance metrics of these suites in line with the others, confirming broad extrapolation capability.
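A calibration check of this kind can be sketched as follows. The reduced $\chi^2$ used here (mean squared residual in units of the predicted variance, close to 1 when uncertainties are well calibrated) is a common convention and is assumed rather than taken from the paper; the function names and the 3-sigma outlier threshold are likewise hypothetical.

```python
import numpy as np

def reduced_chi2(truth, mean, sigma):
    """Mean of squared standardized residuals; ~1 for well-calibrated errors."""
    truth, mean, sigma = (np.asarray(a, dtype=float) for a in (truth, mean, sigma))
    return float(np.mean(((truth - mean) / sigma) ** 2))

def outlier_fraction(truth, mean, sigma, n_sigma=3.0):
    """Fraction of volumes whose residual exceeds n_sigma predicted errors.

    The 3-sigma default is an illustrative threshold, not the paper's criterion.
    """
    truth, mean, sigma = (np.asarray(a, dtype=float) for a in (truth, mean, sigma))
    return float(np.mean(np.abs(truth - mean) > n_sigma * sigma))

# Toy example: three consistent predictions and one 5-sigma outlier.
truth = [0.30, 0.30, 0.30, 0.30]
mean = [0.31, 0.29, 0.30, 0.40]
sigma = [0.01, 0.01, 0.01, 0.02]
chi2 = reduced_chi2(truth, mean, sigma)
frac = outlier_fraction(truth, mean, sigma)
```

In the toy example a single badly predicted volume inflates the reduced $\chi^2$ well above 1, which is why removing a small fraction of high-residual catalogs can restore apparent calibration.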

The strong generalization is underpinned by the highly diverse training set from L-Galaxies, both in the dynamic range of galaxy number per volume and in the range of galaxy properties realized. Performance is strongest when the domain of the training set fully envelops the diversity of the test data, specifically in terms of galaxy number as a function of $\Omega_{\rm m}$.

Figure 2: Distribution of galaxy number per volume as a function of $\Omega_{\rm m}$ across all models. The broad range offered by L-Galaxies is crucial for generalization.
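One rough way to quantify whether a training set "envelops" a test suite is to bin the training catalogs in $\Omega_{\rm m}$ and ask how many test catalogs have a galaxy count inside the training range of their bin. The sketch below assumes exactly that; the function name, the binning scheme, and the min/max envelope are illustrative choices and not the paper's procedure.

```python
import numpy as np

def fraction_inside_envelope(om_train, n_train, om_test, n_test, n_bins=10):
    """Fraction of test catalogs whose galaxy number lies within the
    [min, max] range of training catalogs in the same Omega_m bin."""
    om_train = np.asarray(om_train, dtype=float)
    n_train = np.asarray(n_train, dtype=float)
    om_test = np.asarray(om_test, dtype=float)
    n_test = np.asarray(n_test, dtype=float)

    # Equal-width Omega_m bins spanning the training range.
    edges = np.linspace(om_train.min(), om_train.max(), n_bins + 1)
    b_train = np.clip(np.digitize(om_train, edges) - 1, 0, n_bins - 1)
    b_test = np.clip(np.digitize(om_test, edges) - 1, 0, n_bins - 1)

    # Per-bin envelope; bins with no training data stay (inf, -inf)
    # so that no test catalog can count as covered there.
    lo = np.full(n_bins, np.inf)
    hi = np.full(n_bins, -np.inf)
    np.minimum.at(lo, b_train, n_train)
    np.maximum.at(hi, b_train, n_train)

    in_range = (om_test >= edges[0]) & (om_test <= edges[-1])
    inside = in_range & (n_test >= lo[b_test]) & (n_test <= hi[b_test])
    return float(inside.mean())
```

A value near 1 for a test suite would indicate that its galaxy-number distribution is covered by the training data; lower values flag suites for which extrapolation is being asked of the network.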

Models trained on less diverse SAMs (such as GAEA) exhibit significantly weaker transfer to other models and simulations, confirming the importance of the training set’s cosmology–galaxy distribution coverage.

Full Posterior Inference with Normalizing Flows

A secondary analysis replaces the MNN with a conditional normalizing flow (NF), providing not only the mean and variance but the full posterior for $\Omega_{\rm m}$ from each volume. The GNN+NF model achieves similar, but slightly degraded, calibration and coverage compared to GNN+MNN. While the predicted posteriors pass Tests of Accuracy with Random Points (TARP) on most datasets, coverage degradation on some models and a significant performance drop for the Magneticum and GAEA catalogs highlight the challenge of generative posterior estimation when sample diversity or training set size is limited.

Figure 3: Inferred vs. true $\Omega_{\rm m}$ for the GNN+NF model. Coverage and accuracy degrade in the lower-diversity SAM and some hydrodynamical runs.

Figure 4: TARP coverage test for the GNN+NF; most simulations fall close to the ideal one-to-one credibility-coverage relation.
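For a one-dimensional parameter such as $\Omega_{\rm m}$, the idea behind a TARP coverage curve can be sketched in NumPy. This is a simplified illustration, not the paper's pipeline or any package's API: the function name `tarp_coverage`, the uniform reference distribution, and the Gaussian toy problem are all assumptions made for the example.

```python
import numpy as np

def tarp_coverage(samples, truth, n_alpha=21, seed=0):
    """Expected coverage vs. credibility level for 1D posteriors.

    samples : (n_sims, n_samples) posterior draws, one row per simulation
    truth   : (n_sims,) true parameter values
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    truth = np.asarray(truth, dtype=float)

    # One random reference point per simulation, drawn independently of the truth.
    ref = rng.uniform(samples.min(), samples.max(), size=truth.shape)

    # Fraction of posterior samples closer to the reference than the truth is.
    d_samples = np.abs(samples - ref[:, None])
    d_truth = np.abs(truth - ref)
    f = np.mean(d_samples < d_truth[:, None], axis=1)

    # Expected coverage: share of simulations with f below each credibility level.
    alpha = np.linspace(0.0, 1.0, n_alpha)
    ecp = np.array([np.mean(f < a) for a in alpha])
    return alpha, ecp

# Toy check with an analytically calibrated posterior:
# theta ~ N(0, 1), x = theta + N(0, 1)  =>  posterior is N(x / 2, 1 / 2).
rng = np.random.default_rng(1)
theta = rng.normal(size=2000)
x = theta + rng.normal(size=2000)
post = rng.normal(loc=x[:, None] / 2, scale=np.sqrt(0.5), size=(2000, 500))
alpha, ecp = tarp_coverage(post, theta)
```

For calibrated posteriors the curve `ecp` should track `alpha` closely, which is the one-to-one relation referred to in the figure; systematic departures above or below the diagonal indicate under- or over-confident posteriors.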

Theoretical and Practical Implications

A robust, field-level GNN trained solely on a single SAM can extrapolate to predict cosmological parameters across independent SAMs and full hydrodynamical simulations, as long as the training set is sufficiently diverse. The level of robustness obtained here is notable, given the wide variety in subgrid recipes, halo-finder algorithms, and merger tree procedures among test datasets. The GNN is shown to be agnostic to the analytic/numerical treatment of satellites vs. centrals and merger tree implementation.

Practically, the finding matters because SAMs require orders of magnitude less wall-time and computational overhead than hydrodynamical simulations. They thus offer a concrete path to scalable SBI for current and upcoming massive galaxy surveys (e.g., DESI, Euclid, LSST, SKA). The requirement is that the SAM suite explore a parameter space (cosmological and astrophysical) broad enough to encompass all plausible realizations of the observed data, especially for key graph-level summary statistics such as galaxy number density and velocity structure.

Theoretically, this supports the argument that the galaxy phase-space distribution encodes cosmological information robustly—even when the details of baryonic feedback, satellite treatment, or merger histories differ between models. The phase-space relationships captured by the GNN are not superficial correlations but reflect invariant features of cosmic structure formation under a broad range of physical implementations.

The transferability does degrade for extreme out-of-sample conditions (e.g., the more exotic Magneticum or SB28 suite), indicating that robust inference on real data still requires careful validation and expansion of the training set domain, especially when new or unmodeled physics is possible.

Future Directions

This work motivates further development of SAM-based ML inference pipelines, incorporating more varied galaxy catalog features (including additional kinematic and environmental parameters), and explicit inclusion of observational effects (survey geometry, redshift-space distortions, selection biases). Larger sample sizes and marginalization over merger-trees, subgrid physics, and N-body code variations may be necessary for next-generation survey robustness.

Extensions toward fast generative posterior estimation via conditional NFs are promising, but require a denser and more varied simulation training suite for full posterior coverage. Integration with observational systematics through HOD or light-cone mocks remains to be systematically examined.

Conclusion

The results of this work establish that field-level inference of cosmological parameters from galaxy phase-space can be performed with high accuracy using ML models trained on fast, diverse SAMs, and that these models extrapolate robustly across disparate SAMs and hydrodynamical simulations. This makes it computationally tractable to build SBI pipelines for high-dimensional cosmological parameter inference from survey data, as long as the simulation domain matches or exceeds the statistical diversity of the real Universe. Future studies should focus on further expansion of parameter space coverage, rigorous integration of observational effects, and real-data validation.

Reference: "Galaxy Phase-Space and Field-Level Cosmology: The Strength of Semi-Analytic Models" (2512.10222).