
Introducing sapphire: Towards Hybrid Physics-Informed, Data-Driven Modeling of Galaxy Formation

Published 7 Apr 2026 in astro-ph.GA | (2604.06318v1)

Abstract: Semi-analytic models (SAMs) have been treating galaxy populations as dynamical systems for $\gtrsim50$ years, but their evolution equations remain poorly constrained. We introduce sapphire, a modular, automatically differentiable, GPU-accelerated SAM written from scratch in JAX. For the first time, we compute exact Jacobian matrices of our nonlinear differential equations and show that they have interpretable, non-random structures, using the Pandya et al. (2023) physical model as an initial example. Both local and global sensitivity analyses reveal that supernova energy loading is a key astrophysical parameter for galaxy evolution. We use gradient descent and Hamiltonian Monte Carlo (HMC) to perform comprehensive mock parameter recovery tests. These indicate that the z=0 stellar-to-halo-mass relation alone does not contain enough information to infer many astrophysical parameters. Using observations of star-forming galaxies from the MaNGA survey and the Behroozi et al. (2019) empirical model as one baseline, we derive multiple posteriors assuming different combinations of data, including z=0 interstellar medium gas fractions and metallicities. The inferred physical parameters suggest that galaxies self-regulate their star formation primarily through preventative rather than ejective feedback. Both Fisher and HMC forecasts demonstrate the potential of sapphire to enable precision inference for galaxy formation, but more work is needed to expand its library of models. We discuss how our unique blend of differentiability, massive GPU parallelization, numerical robustness and principled Bayesian methods sets the stage for hybrid physics-informed, data-driven discovery of galaxy formation astrophysics and cosmology. We make sapphire publicly available at https://github.com/virajpandya/sapphire.

Summary

  • The paper introduces sapphire, a modular GPU-accelerated framework that integrates physics-informed and data-driven modeling to improve galaxy formation simulations.
  • The paper employs automatic differentiation with robust ODE solvers in JAX to perform high-resolution sensitivity analyses and systematically explore high-dimensional parameter spaces.
  • The paper demonstrates that incorporating multiple observables breaks degeneracies in key feedback parameters, enabling precise estimation of energy and mass loading factors.

Hybrid Physics-Informed, Data-Driven Modeling of Galaxy Formation with Sapphire

Introduction and Motivation

The study introduces sapphire, a modular, fully differentiable, GPU-accelerated semi-analytic model (SAM) framework designed for galaxy formation and evolution, implemented in JAX. The motivation is rooted in longstanding challenges facing both empirical and fully physical models, including the limited interpretability and causal identifiability of hydrodynamical simulations and the incomplete connection to underlying astrophysics in empirical models. Sapphire addresses these by enabling interpretable, gradient-accessible, and scalable population modeling in a Bayesian setting, with the ultimate goal of unifying SAM and simulation-based approaches to galaxy formation within a hybrid, physics-informed, data-driven paradigm.

Figure 1: Schematic overview of sapphire, emphasizing its identification of universal dynamical ingredients, modular architecture, and pathways for hybrid data-driven corrections to governing equations.

Framework Architecture and Numerical Robustness

Sapphire is architected for modern multi-GPU infrastructure and employs automatic differentiation throughout the numerical solution of galaxy evolution ODEs. Core dynamical elements include selection and interpolation of dark matter halo merger trees, parallel ODE evolution for ensemble galaxies, differentiable kernel regression for producing scaling relations, and explicit computation of likelihoods and their parameter gradients.

Figure 2: Flowchart of the sapphire population evolution pipeline, detailing the interface between numerics, cosmology, astrophysical parameter inference, and Bayesian optimization.
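The role of differentiable kernel regression in this pipeline can be illustrated with a minimal sketch. It assumes a Gaussian (Nadaraya-Watson) kernel and a hypothetical one-parameter linear "model"; the function names, bandwidth, and toy data are illustrative choices, not sapphire's actual API. The point is only that a smoothed scaling relation, and a chi-square built from it, remain differentiable with respect to model parameters.

```python
import jax
import jax.numpy as jnp

def kernel_regression(x_query, x, y, bandwidth=0.2):
    """Gaussian-kernel (Nadaraya-Watson) estimate of the mean relation y(x)."""
    # Scaled distances between query points and samples: (n_query, n_samples)
    d = (x_query[:, None] - x[None, :]) / bandwidth
    w = jnp.exp(-0.5 * d**2)
    # Weighted average of y at each query point
    return (w @ y) / w.sum(axis=1)

def chi2(slope, x, x_query, y_obs, sigma=0.1):
    # Hypothetical toy model: the predicted quantity is linear in x.
    y_model = slope * x
    y_smooth = kernel_regression(x_query, x, y_model)
    return jnp.sum(((y_smooth - y_obs) / sigma) ** 2)

x = jnp.linspace(-1.0, 1.0, 201)        # toy "log halo mass" values (arbitrary units)
x_query = jnp.linspace(-0.5, 0.5, 5)    # points where model and data are compared
y_obs = kernel_regression(x_query, x, 2.0 * x)   # mock data generated with slope = 2

grad_chi2 = jax.grad(chi2)              # derivative with respect to `slope`
g_true = grad_chi2(2.0, x, x_query, y_obs)   # gradient at the true slope
g_off = grad_chi2(2.5, x, x_query, y_obs)    # gradient at an offset slope
```

At the true slope the gradient vanishes, while at the offset slope it points back toward the truth, which is the property a gradient-based optimizer or HMC sampler relies on.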

Robustness of the autodiff pipeline is maintained with validated ODE solvers (Tsit5/Bosh3), rigorous convergence and stability diagnostics, and batch vectorization across large halo ensembles. This enables efficient global exploration of high-dimensional parameter spaces and sensitivity landscapes.
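As a rough illustration of batched, differentiable ODE evolution, the sketch below integrates a deliberately simplified two-variable "bathtub" system with a fixed-step RK4 loop, vectorizes it over halos with `jax.vmap`, and differentiates a population summary with `jax.grad`. The equations, the parameter names `eta` and `t_dep`, and the constant inflow rates are assumptions made for this example; they are not the Pandya et al. (2023) model, and the fixed-step integrator merely stands in for adaptive solvers such as Tsit5 or Bosh3.

```python
import jax
import jax.numpy as jnp

def rhs(state, params, mdot_in):
    """Toy evolution equations for (ISM mass, stellar mass)."""
    m_ism, m_star = state
    eta, t_dep = params              # mass loading and depletion time (toy values)
    sfr = m_ism / t_dep
    # Gas is gained by inflow and lost to star formation plus outflow.
    return jnp.array([mdot_in - (1.0 + eta) * sfr, sfr])

def evolve(params, mdot_in, t_end=10.0, n_steps=200):
    """Integrate one halo from t=0 to t_end with fixed-step RK4."""
    dt = t_end / n_steps

    def rk4_step(state, _):
        k1 = rhs(state, params, mdot_in)
        k2 = rhs(state + 0.5 * dt * k1, params, mdot_in)
        k3 = rhs(state + 0.5 * dt * k2, params, mdot_in)
        k4 = rhs(state + dt * k3, params, mdot_in)
        return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0, None

    final, _ = jax.lax.scan(rk4_step, jnp.zeros(2), None, length=n_steps)
    return final

# Shared parameters, one inflow rate per halo: vectorize over the second argument.
evolve_population = jax.jit(jax.vmap(evolve, in_axes=(None, 0)))

def mean_log_mstar(params, mdot_in):
    """Population summary: mean log10 stellar mass at the final time."""
    return jnp.mean(jnp.log10(evolve_population(params, mdot_in)[:, 1]))

params = jnp.array([1.0, 2.0])                 # eta = 1, t_dep = 2 (time units)
mdot_in = jnp.array([1.0, 3.0, 10.0, 30.0])    # toy inflow rates for four halos

final_states = evolve_population(params, mdot_in)   # shape (4, 2)
grads = jax.grad(mean_log_mstar)(params, mdot_in)   # d(summary)/d(eta, t_dep)
```

In this toy system both gradient components are negative: stronger outflows or slower star formation lower the final stellar mass. A production pipeline would swap the fixed-step loop for an adaptive solver with error control.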

Sensitivity and Identifiability Analysis

Leveraging exact Jacobians and Hessians obtained through autodiff, sapphire enables, for the first time, interpretable, high-resolution analyses of how sensitive galaxy evolutionary outcomes are to model parameters (wind mass, energy, and metal loading; ISM depletion time). Individual halo Jacobians reveal non-random, astrophysically interpretable structures:

  • Energy loading amplitude ($A_E$) consistently dominates the sensitivity of all $z=0$ state variables.
  • Mass loading has weaker, often sign-opposite influence compared to energy loading.
  • Sensitivities are robust across parameter space and wide halo mass ranges, disrupted only by extrema or imposed parameter clipping.

    Figure 3: Example Jacobian showing the dominance of energy loading parameter gradients in determining $z=0$ state variables for a MW-scale halo.

    Figure 4: Fractional parameter sensitivity across halo mass and parameter space, with $A_E$ consistently dominating except where physically clipped.
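A per-halo Jacobian of this kind can be sketched as follows, assuming a toy three-parameter system in which a hypothetical energy-loading amplitude `a_e` suppresses inflow by a factor $1/(1+A_E)$. That functional form, the midpoint integrator, and all numerical values are assumptions for illustration only, so the resulting numbers carry no physical meaning. The sketch shows how `jax.jacfwd` returns exact derivatives of the final state with respect to every parameter, and how they convert to logarithmic sensitivities.

```python
import jax
import jax.numpy as jnp

def rhs(state, params):
    """Toy equations for (ISM mass, stellar mass) with three parameters."""
    m_ism, m_star = state
    eta_m, a_e, t_dep = params
    inflow = 1.0 / (1.0 + a_e)      # assumed "preventative" suppression of accretion
    sfr = m_ism / t_dep
    return jnp.array([inflow - (1.0 + eta_m) * sfr, sfr])

def final_state(params, t_end=10.0, n_steps=400):
    """Integrate to t_end with a fixed-step midpoint (RK2) scheme."""
    dt = t_end / n_steps

    def step(state, _):
        mid = state + 0.5 * dt * rhs(state, params)
        return state + dt * rhs(mid, params), None

    final, _ = jax.lax.scan(step, jnp.array([0.1, 0.0]), None, length=n_steps)
    return final

params = jnp.array([1.0, 1.0, 2.0])     # (eta_m, a_e, t_dep), toy values
y = final_state(params)                 # final (m_ism, m_star)

# Exact Jacobian d y_i / d p_j via forward-mode autodiff: shape (2, 3).
J = jax.jacfwd(final_state)(params)

# Dimensionless sensitivities d ln y_i / d ln p_j.
S = J * params[None, :] / y[:, None]
```

Forward mode suits this case because there are few parameters and several outputs; for a scalar loss with many parameters, `jax.grad` (reverse mode) is usually the better choice.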

Parameter Inference and Information Content of Observables

Mock recovery experiments and Bayesian inference on three observational data sets, the stellar-mass–halo-mass (SMHM) relation, ISM-to-stellar-mass ratios ($f_{\rm ISM}$), and the ISM mass–metallicity relation (MZR), show that:

  • The SMHM relation alone underdetermines key feedback and star formation parameters due to degeneracies and saddle points.
  • Addition of $f_{\rm ISM}$ and MZR progressively breaks degeneracies, localizes the posteriors, and enables recovery of meaningful, tight parameter constraints (e.g., $A_E$, $t_{\rm dep}$).
  • In mock tests, parameter precision scales nearly linearly with observational uncertainty, permitting percent-level constraints with high-quality data.

    Figure 5: Mock parameter recovery comparisons show the improvement in identifiability with the number and combination of observational constraints.

    Figure 6: Posterior predictive checks demonstrate constraint tightening and model–data compatibility as more observables are included in the inference.

    Figure 7: Joint posteriors of astrophysical parameters, highlighting tight constraints on $A_E$ and improved localization when all constraints are employed.
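The degeneracy-breaking argument above can be made concrete with a small Fisher-matrix sketch. It assumes two hypothetical observables of two parameters with independent Gaussian errors: the first depends only on the ratio of the parameters (so it cannot constrain them separately), while the second depends on a different combination. The observables and values are invented for illustration and are not the SMHM, $f_{\rm ISM}$, or MZR predictions of sapphire.

```python
import jax
import jax.numpy as jnp

def observables(theta):
    """Two toy observables of two parameters (a, b)."""
    a, b = theta
    first = jnp.log(a) - jnp.log(b)          # depends only on a/b: degenerate
    second = jnp.log(a) + 2.0 * jnp.log(b)   # a different parameter combination
    return jnp.array([first, second])

def fisher(theta, sigma, use):
    """Fisher matrix J^T J / sigma^2 for the selected observables."""
    J = jax.jacfwd(observables)(theta)[jnp.array(use)]
    return J.T @ J / sigma**2

theta = jnp.array([2.0, 0.5])

# One observable only: the Fisher matrix is singular (one zero eigenvalue).
F_one = fisher(theta, 0.1, [0])
eig_one = jnp.linalg.eigvalsh(F_one)

# Both observables: the matrix is invertible and yields finite forecasts.
F_both = fisher(theta, 0.1, [0, 1])
sd_both = jnp.sqrt(jnp.diag(jnp.linalg.inv(F_both)))

# Halving the observational error halves the forecast parameter errors.
sd_half = jnp.sqrt(jnp.diag(jnp.linalg.inv(fisher(theta, 0.05, [0, 1]))))
```

The zero eigenvalue in the single-observable case marks the unconstrained direction in parameter space. In this linearized setting the forecast errors scale exactly linearly with the observational error, whereas the mock tests above report nearly linear scaling for the full nonlinear model.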

Astrophysical Implications and Robustness

Posterior parameter estimates indicate:

  • Low mass loading ($\eta_M\lesssim1$ at MW scale and […] in dwarfs) and high energy loading ([…]–[…]) are favored, implying that feedback in galaxies is predominantly preventative rather than ejective. This suggests that suppression of gas accretion and CGM over-pressurization are critical to self-regulation of star formation.
  • Systematically shifting or perturbing the input scaling relations maps onto interpretable changes in the inferred feedback (e.g., increasing the SMHM normalization leads to lower […] and higher […]; Figure 8).

Figure 8: Direct illustration of how systematic shifts in the SMHM relation are traced by shifts in the inferred feedback parameters.

Forecasts with further reduced observational uncertainties show the framework's potential for achieving sub-percent precision in key astrophysical parameters (Figure 9).

Predictive and Out-of-Sample Performance

Posterior predictive checks for observables not used in fitting ([…] and […] scaling relations, SFMS, CGM thermodynamic properties) indicate:

  • The framework predicts the normalization and scatter of the SFMS at […] with high fidelity, reproduces variations in feedback loading factors across mass and halo history, and captures the diversity of CGM states.
  • At […], the model produces a higher SMHM relation and lower SFRs/MZRs than empirical data, signaling the need for further extensions (e.g., explicit redshift dependence, more complex functional forms, or incorporation of black hole feedback and environmental processes).

    Figure 10: Posterior predictive distributions for additional quantities, showcasing model generalization and areas where further model development is required.

Theoretical and Practical Implications

Sapphire's architecture and methodology carry significant implications:

  • Automatic differentiation through dynamical population models unlocks systematic, interpretable local/global sensitivity analysis previously inaccessible in the field.
  • The physically motivated, modular framework is extensible to additional processes (satellite and black hole feedback, environmental effects, non-thermal CGM physics), and can serve as a testbed for hybrid approaches (e.g., neural ODE-based data-driven model correction).
  • Sapphire's differentiable infrastructure lays the groundwork for implementing automated evidence calculations and hierarchical Bayesian inference across library models, a necessity for future population-level astrophysics in the era of massive cosmological data sets.

Conclusion

Sapphire defines a new quantitative standard for interpretable, physics-informed inference on the evolution of galaxy populations. The demonstrated ability to compute exact gradients through population-scale dynamical models, access sensitivity and uncertainty landscapes, and robustly infer feedback and star formation parameters from multi-observable constraints marks a substantial advancement for semi-analytic modeling. Its modularity and extensibility provide a scalable path forward for integrating simulation-based priors, further physical process complexity, and data-driven discovery for both astrophysical and cosmological inference (2604.06318).
