Optical-HI Joint Mock Catalogs

Updated 18 January 2026

Optical-HI joint mock catalogs are synthetic datasets that merge optical properties and HI parameters using empirical scaling relations and simulations.
They employ methods like light-cone painting and conditional Schechter sampling to create self-consistent, cross-matched galaxy samples across large survey volumes.
These catalogs support survey optimization, Bayesian inference, and validation of scaling relations for upcoming HI and optical surveys, although completeness at low HI mass remains a challenge.

Optical-HI joint mock catalogs are synthetic datasets that assign both optical properties (such as luminosities, colors, and morphologies) and neutral hydrogen (HI) parameters (21 cm fluxes, HI masses, line widths) to the same galaxy population over cosmological volumes or light cones. These catalogs support inference and optimization for next-generation surveys, including those planned for the SKA, its precursors MeerKAT and ASKAP, and other pathfinders. Their construction methodologies, conditional distributions, selection functions, and validation metrics rest on empirical scaling relations, large-volume simulations, and observed galaxy statistics, enabling robust predictions of optical-HI cross-matched samples, HI mass functions, and environmental dependencies (Obreschkow et al., 2014, Bharti et al., 11 Jan 2026, Cunnington et al., 2018, Paranjape et al., 2021, Li et al., 2022).

1. Foundations and Construction Methodologies

Optical-HI joint mocks are generated using various approaches:

Light-cone painting and HI Schechter sampling: Galaxies are assigned comoving positions across a mock survey volume, with HI masses drawn from a Schechter function fit to empirical HI mass functions (e.g., ALFALFA: $\alpha\approx-1.01$, $\log_{10}(M^*_{\rm HI}/M_\odot)\approx9.8$) (Obreschkow et al., 2014, Bharti et al., 11 Jan 2026).
N-body + semi-analytic galaxy assignment: Underlying halo catalogs from cosmological simulations inform Halo Occupation Distribution (HOD)-based placements of centrals and satellites, upon which optical properties and HI masses are assigned using scaling relations conditional on galaxy and halo properties (Paranjape et al., 2021).
Direct optical-to-HI mapping: Empirical estimators relate $\log_{10}(M_{\rm HI}/M_*)$ linearly to optical observables such as stellar surface density, color ($u-r$), stellar mass, and concentration index, with posterior parameters fit to xGASS/ALFALFA data (Li et al., 2022). Scatter around the mean estimate is modeled as a Gaussian whose width depends on HI mass.
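The HI-mass draw in the light-cone painting approach can be sketched as inverse-CDF sampling from a single Schechter function in per-dex form. This is a minimal illustration, assuming the ALFALFA-like parameters quoted above and illustrative mass limits of $10^8$ to $10^{11}\,M_\odot$; the function name and the tabulated-CDF sampler are our choices, not necessarily those of the cited works.

```python
import numpy as np

def sample_schechter_log_mass(n, log_mstar, alpha, log_mmin, log_mmax,
                              rng=None, n_grid=4001):
    """Draw n values of log10(M_HI/Msun) from a Schechter HI mass function.

    The per-dex density dn/dlog10(M) is proportional to x**(alpha + 1) * exp(-x)
    with x = M / M*. It is tabulated on a grid, integrated into a CDF with the
    trapezoid rule, and inverted by linear interpolation.
    """
    rng = np.random.default_rng() if rng is None else rng
    grid = np.linspace(log_mmin, log_mmax, n_grid)
    x = 10.0 ** (grid - log_mstar)
    pdf = x ** (alpha + 1.0) * np.exp(-x)                  # unnormalised density
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)     # trapezoid areas
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]                                         # normalise to [0, 1]
    return np.interp(rng.random(n), cdf, grid)             # inverse CDF

# Usage: 100,000 HI masses with ALFALFA-like parameters, 1e8 to 1e11 Msun.
rng = np.random.default_rng(42)
log_mhi = sample_schechter_log_mass(100_000, log_mstar=9.8, alpha=-1.01,
                                    log_mmin=8.0, log_mmax=11.0, rng=rng)
```

In a full light cone, the number of draws per redshift shell would follow from $\phi^*$ times the shell's comoving volume within the survey area; that step, and the assignment of positions, are omitted here.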

Table: Example column structure in optical-HI joint catalogs (after Obreschkow et al., 2014)

| Col | Parameter | Description |
|-----|-----------|-------------|
| 7 | $M_*$ | Stellar mass [$M_\odot$] |
| 8 | $M_{\rm HI}$ | HI mass [$M_\odot$] |
| 18 | $M_R$ | Absolute Vega R-band magnitude |
| 19 | $m_R$ | Apparent Vega R-band magnitude |
| 20 | $r_e$ | Optical effective radius [arcsec] |

Cross-matching is exact by construction: the same galaxy ID, position, and redshift index both the HI and the optical entries of each object, so no post-hoc matching is required (Obreschkow et al., 2014).
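Because each row describes one galaxy, optical and HI selections reduce to boolean masks on the same rows. A minimal sketch with a NumPy structured array; the field names, including a peak HI flux column `s_hi_peak`, are hypothetical stand-ins for the catalog columns listed in the table, not the published column names.

```python
import numpy as np

# Hypothetical field names loosely mirroring the catalog columns in the table.
JOINT_DTYPE = np.dtype([
    ("id", "i8"), ("ra", "f8"), ("dec", "f8"), ("z", "f8"),
    ("mstar", "f8"), ("mhi", "f8"),       # stellar and HI mass [Msun]
    ("M_R", "f8"), ("m_R", "f8"),         # Vega R-band absolute / apparent mag
    ("r_e", "f8"),                        # optical effective radius [arcsec]
    ("s_hi_peak", "f8"),                  # HI peak flux density [Jy] (hypothetical)
])

def optical_fraction_of_hi_sample(cat, s_peak_min, m_r_max):
    """Fraction of an HI-selected sample that also passes an optical cut.

    Both selections are boolean masks over the same rows, so no positional
    cross-matching step is needed.
    """
    hi_sel = cat["s_hi_peak"] >= s_peak_min
    opt_sel = cat["m_R"] <= m_r_max
    return np.count_nonzero(hi_sel & opt_sel) / max(np.count_nonzero(hi_sel), 1)

# Usage: four toy galaxies, HI cut at 1 microJy peak, optical cut at m_R <= 20.
cat = np.zeros(4, dtype=JOINT_DTYPE)
cat["s_hi_peak"] = [2e-6, 2e-6, 5e-7, 3e-6]
cat["m_R"] = [19.0, 21.0, 18.0, 22.0]
frac = optical_fraction_of_hi_sample(cat, s_peak_min=1e-6, m_r_max=20.0)
```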

2. HI Property Assignment and Conditional Mass Functions

HI masses in joint catalogs are assigned by several schemes:

Global HI Mass Function (HIMF): Schechter form, $\phi(M_{\rm HI}) = \ln(10)\,\phi^* \left(M_{\rm HI}/M^*_{\rm HI}\right)^{\alpha+1} e^{-M_{\rm HI}/M^*_{\rm HI}}$, where $\phi$ is the number density per dex in $M_{\rm HI}$, with parameters calibrated on HI source surveys (Bharti et al., 11 Jan 2026, Li et al., 2022). Completeness in $M_{\rm HI}$ is set by the survey flux density limit; e.g., the S$^3$-SAX 100 deg$^2$ mock, cut at $S_{\rm HI}^{\rm peak}\geq1\,\mu$Jy, is complete for $M_{\rm HI}\gtrsim10^8\,M_\odot$ (Obreschkow et al., 2014).
Conditional HIMFs: Conditional Schechter fits $\phi_{\rm cell}(M_{\rm HI}|M_r,u-r)$ encode joint distributions with optical magnitude and color, providing fine-grained assignment of HI content as a function of galaxy type (Bharti et al., 11 Jan 2026). Mock catalogs interpolate these fits over the color-magnitude plane for synthetic samples.
Parametric HI-stellar relations: Estimators such as

$\langle \log_{10}(M_{\rm HI}/M_*) \rangle = a\log_{10}\mu_* + b(u-r) + c\log_{10}M_* + d\log_{10}\frac{R_{90}}{R_{50}} + q$

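where $\mu_*$ is the stellar surface density and $R_{90}/R_{50}$ the concentration index. The posterior coefficients $(a,b,c,d,q)$ are calibrated on samples containing both HI detections and non-detections and are then applied to optical data to assign HI masses (Li et al., 2022). In mock realizations, Gaussian scatter of width $\sigma(m_0)=|c_a m_0 + c_b|$, where $m_0$ is the logarithmic HI mass, is applied for $m_0\geq8.5$.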
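The estimator and its scatter can be sketched as follows. This assumes $m_0$ is the predicted $\log_{10}M_{\rm HI}$ (the mean estimate added to $\log_{10}M_*$), uses placeholder coefficients rather than the published posteriors, and, as a further assumption, freezes $\sigma$ at its $m_0=8.5$ value below that threshold, since the scatter law is quoted only for $m_0\geq8.5$.

```python
import numpy as np

def mean_log_gas_fraction(log_mu_star, u_r, log_mstar, log_conc, coeffs):
    """<log10(M_HI/M_*)> = a*log mu_* + b*(u-r) + c*log M_* + d*log(R90/R50) + q."""
    a, b, c, d, q = coeffs
    return a * log_mu_star + b * u_r + c * log_mstar + d * log_conc + q

def draw_log_mhi(log_mu_star, u_r, log_mstar, log_conc, coeffs, c_a, c_b, rng):
    """Realise log10 M_HI = m0 + Gaussian scatter with sigma(m0) = |c_a*m0 + c_b|."""
    m0 = log_mstar + mean_log_gas_fraction(log_mu_star, u_r, log_mstar,
                                           log_conc, coeffs)
    # Assumption: below m0 = 8.5 the scatter is frozen at sigma(8.5).
    sigma = np.abs(c_a * np.maximum(m0, 8.5) + c_b)
    return rng.normal(m0, sigma)

# Usage with placeholder (hypothetical) coefficients and scatter parameters.
rng = np.random.default_rng(1)
coeffs = (-0.3, -0.4, -0.2, -0.5, 5.0)   # (a, b, c, d, q), illustrative only
log_mhi = draw_log_mhi(np.array([8.5, 9.0]), np.array([1.8, 2.6]),
                       np.array([9.8, 10.6]), np.array([0.35, 0.45]),
                       coeffs, c_a=-0.05, c_b=0.8, rng=rng)
```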

Conditional HI mass functions (CHIMFs) in group environments or halo-mass bins are derived by applying the HI estimator to members of an optical group catalog. This yields a global HIMF and group-wise CHIMFs that are well described by Schechter forms (for all, red, blue, and satellite galaxies) or by double Gaussians (for central galaxies) (Li et al., 2022).
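Operationally, a CHIMF is the mean number of galaxies per group, per dex of HI mass, in a bin of host halo mass; the result can then be compared with, or fit by, a Schechter form. A minimal sketch, assuming the group catalog supplies a group label and a host halo mass for each galaxy; it omits the completeness weights discussed in Section 4 and does not separate centrals from satellites. Function names are hypothetical.

```python
import numpy as np

def schechter_per_dex(log_m, phi_star, log_mstar, alpha):
    """Schechter form per dex: ln(10) * phi* * x**(alpha+1) * exp(-x), x = M/M*."""
    x = 10.0 ** (np.asarray(log_m, dtype=float) - log_mstar)
    return np.log(10.0) * phi_star * x ** (alpha + 1.0) * np.exp(-x)

def conditional_himf(log_mhi, group_id, log_mhalo, halo_edges, mass_edges):
    """Mean number of galaxies per group per dex of M_HI, in host-halo-mass bins.

    log_mhi   : log10 M_HI of each galaxy (e.g. from the optical estimator)
    group_id  : integer label of each galaxy's host group
    log_mhalo : log10 halo mass of each galaxy's host group
    Returns an array of shape (n_halo_bins, n_mass_bins).
    """
    dlogm = np.diff(mass_edges)
    out = np.zeros((len(halo_edges) - 1, len(mass_edges) - 1))
    for k in range(len(halo_edges) - 1):
        sel = (log_mhalo >= halo_edges[k]) & (log_mhalo < halo_edges[k + 1])
        n_groups = np.unique(group_id[sel]).size     # groups in this halo bin
        if n_groups:
            counts, _ = np.histogram(log_mhi[sel], bins=mass_edges)
            out[k] = counts / (n_groups * dlogm)
    return out

# Usage: three galaxies in two groups that share one halo-mass bin.
chimf = conditional_himf(np.array([9.2, 9.5, 9.7]), np.array([0, 0, 1]),
                         np.array([13.2, 13.2, 13.4]),
                         halo_edges=np.array([13.0, 14.0]),
                         mass_edges=np.array([9.0, 10.0]))
```

Fitting `schechter_per_dex` (or a double Gaussian for centrals) to each row of the output would then give per-bin CHIMF parameters; the fitting step itself is not shown.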

3. Optical Property Modeling and Cross-matching Schemes

Optical properties in joint mocks are assigned as follows:

Luminosity Function Sampling: $M_r$ is drawn from the observed SDSS luminosity function, estimated with the $1/V_{\max}$ method; joint distributions in color-magnitude space are then used for further property assignment (Bharti et al., 11 Jan 2026, Paranjape et al., 2021).
Color Assignment: Color distributions ( $u-r, g-r$ ) are modeled as Gaussian mixtures (red/blue) at fixed $M_r$ , with mixture fractions and shape parameters polynomially dependent on magnitude (Paranjape et al., 2021).
Stellar Mass-to-Light Prescription: Mass-to-light ratios are assigned as polynomial or tanh functions of color, with empirically calibrated scatter (Paranjape et al., 2021).
Morphology/Size: Morphological type, effective radius, and inclination are cataloged for each galaxy (Obreschkow et al., 2014).
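The color step can be sketched as a two-component Gaussian mixture at fixed $M_r$. The magnitude dependences below (linear, with made-up coefficients) are hypothetical placeholders for the polynomial fits of Paranjape et al. (2021); only the structure, a red fraction plus red and blue Gaussians whose parameters vary with $M_r$, follows the text.

```python
import numpy as np

# Hypothetical, illustrative magnitude dependences (not the published fits).
def red_fraction(m_r):
    return np.clip(0.5 - 0.15 * (m_r + 20.5), 0.0, 1.0)   # brighter -> redder

def red_mean(m_r):
    return 2.6 - 0.05 * (m_r + 20.5)

def blue_mean(m_r):
    return 1.6 - 0.10 * (m_r + 20.5)

def draw_u_minus_r(m_r, rng, sigma_red=0.12, sigma_blue=0.25):
    """Sample u-r at fixed M_r from a red/blue two-Gaussian mixture."""
    m_r = np.asarray(m_r, dtype=float)
    is_red = rng.random(m_r.shape) < red_fraction(m_r)     # pick the component
    mean = np.where(is_red, red_mean(m_r), blue_mean(m_r))
    sigma = np.where(is_red, sigma_red, sigma_blue)
    return rng.normal(mean, sigma), is_red

# Usage: 20,000 galaxies at M_r = -22 (red fraction 0.725 with these placeholders).
rng = np.random.default_rng(7)
u_r, is_red = draw_u_minus_r(np.full(20_000, -22.0), rng)
```

Stellar masses would then follow from a color-dependent $\log(M/L)$ prescription, which this sketch omits.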

Optical-HI cross-matching is inherent: catalog rows are self-consistent across modalities (RA, Dec, $z$ , ID), so each galaxy is simultaneously an optical and HI object, with environmental tagging available via external friends-of-friends algorithms or group catalogs (Obreschkow et al., 2014, Li et al., 2022).
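Environmental tags can come from a friends-of-friends (FoF) grouping of the mock galaxies. A minimal sketch of FoF as connected components under a single isotropic linking length, using brute-force distances and a union-find; survey group finders typically use separate projected and line-of-sight linking lengths in redshift space, which this sketch does not attempt. The function name is hypothetical.

```python
import numpy as np

def friends_of_friends(pos, linking_length):
    """Label galaxies into FoF groups.

    pos: (N, 3) comoving positions. Two galaxies closer than linking_length are
    'friends'; groups are the connected components of that graph, found with
    a union-find. Brute-force O(N^2) distances: small illustrative samples only.
    """
    n = len(pos)
    parent = np.arange(n)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i

    d2 = np.sum((pos[:, None, :] - pos[None, :, :]) ** 2, axis=-1)
    ii, jj = np.nonzero(np.triu(d2 < linking_length ** 2, k=1))  # unique pairs
    for i, j in zip(ii, jj):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri                    # merge the two groups
    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels

# Usage: three chained galaxies and one isolated galaxy.
pos = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
labels = friends_of_friends(pos, linking_length=0.6)
```

Note that galaxies 0 and 2 are farther apart than the linking length but still share a group through galaxy 1, which is the defining "friend of a friend" property.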

4. Completeness, Selection Functions, and Survey Simulations

Selection in joint mocks reflects both HI and optical survey limits:

HI Completeness: For the 100 deg$^2$ mock cone, a cut of $S_{\rm HI}^{\rm peak}\geq1\,\mu$Jy yields completeness for $M_{\rm HI}\gtrsim10^8\,M_\odot$ at $z\leq1.2$; shallower subcatalogs with higher $S_{\rm HI}^{\rm peak}$ limits are complete only above correspondingly higher HI masses (Obreschkow et al., 2014).
Optical Limits: R-band magnitude cuts (e.g., $m_R\leq20$ for GAMA-like surveys) recover only part of the HI sample, and the overlap shrinks with redshift (e.g., only $\lesssim10\%$ of HI-selected galaxies at $z\gtrsim0.6$) (Obreschkow et al., 2014).
SKA Precursor Simulation: Mock survey parameters include field layout, integration time, noise floor, and channelization (e.g., MIGHTEE-HI: 20 fields, $25$ hr each, $\sim20$ deg$^2$, $\sigma_{\rm rms}\sim10\,\mu$Jy per $10$ km s$^{-1}$) (Bharti et al., 11 Jan 2026).
HI Line Sensitivity: The integrated 21 cm flux is

$S_{21}\,[\mathrm{Jy\,km\,s^{-1}}] = \frac{(1+z)\,M_{\rm HI}}{2.356\times10^5\,D_L^2}$

where $M_{\rm HI}$ is in $M_\odot$ and $D_L$ is the luminosity distance in Mpc; the detection threshold is set by $\mathrm{SNR}>5$ relative to the thermal noise (Bharti et al., 11 Jan 2026, Obreschkow et al., 2014).

Selection for Mock Realizations: For mock construction, apply completeness cuts on derived $S_{21}$ and $W_{50}$ , and completeness weighting ( $V_{\max}$ , spectroscopic and density corrections) for statistical reconstructions (Li et al., 2022).
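The flux, detection, and weighting steps above can be sketched as follows. Assumptions, all ours: a flat $\Lambda$CDM cosmology with $H_0=70$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m=0.3$; the standard conversion $M_{\rm HI}=2.356\times10^5\,D_L^2\,S_{21}/(1+z)$ ($M_\odot$, Mpc, Jy km s$^{-1}$) inverted for the flux; an SNR for a top-hat line of width $W_{50}$ spread over independent channels; and a low-redshift Euclidean $V_{\max}$ in which the SNR scales as $D^{-2}$. All helper names are hypothetical.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light [km/s]

def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, n=2001):
    """D_L [Mpc] in flat LCDM (assumed cosmology), trapezoid integral of 1/E(z)."""
    zs = np.linspace(0.0, z, n)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + 1.0 - omega_m)
    d_c = C_KM_S / h0 * np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zs))
    return (1.0 + z) * d_c

def integrated_flux(m_hi, z, d_l_mpc):
    """S_21 [Jy km/s] = (1+z) M_HI / (2.356e5 D_L^2), M_HI in Msun, D_L in Mpc."""
    return (1.0 + z) * m_hi / (2.356e5 * d_l_mpc ** 2)

def line_snr(s_int, w50, sigma_rms_jy, dv_chan):
    """SNR of a top-hat line: noise on the integral is sigma * dv * sqrt(W50/dv)."""
    n_chan = np.maximum(w50 / dv_chan, 1.0)
    return s_int / (sigma_rms_jy * dv_chan * np.sqrt(n_chan))

def vmax_himf(log_mhi, snr, d_mpc, area_sr, d_survey_max, mass_edges, snr_min=5.0):
    """1/V_max HIMF [Mpc^-3 dex^-1] for detections, Euclidean low-z approximation.

    A source detected at SNR s and distance D stays above snr_min out to
    D_max = D * sqrt(s / snr_min), capped at the survey depth.
    """
    d_max = np.minimum(d_mpc * np.sqrt(snr / snr_min), d_survey_max)
    v_max = area_sr / 3.0 * d_max ** 3
    phi, _ = np.histogram(log_mhi, bins=mass_edges, weights=1.0 / v_max)
    return phi / np.diff(mass_edges)

# Usage: a 1e9 Msun galaxy at z = 0.05 with W50 = 150 km/s and a
# MIGHTEE-HI-like noise of 10 microJy per 10 km/s.
z = 0.05
d_l = luminosity_distance_mpc(z)
s21 = integrated_flux(1e9, z, d_l)
snr = line_snr(s21, w50=150.0, sigma_rms_jy=10e-6, dv_chan=10.0)
detected = snr > 5.0
```

Real HI surveys often make the completeness limit depend on $W_{50}$ as well as flux; the flux-only SNR scaling here is a simplification.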

5. Stacking Analysis, Validation, and Statistical Products

Stacking the HI spectra of non-detections is a key application of joint mocks:

Stacking Protocols: Spectra of galaxies below the direct detection threshold are aligned in velocity space and co-added; for $N$ spectra with comparable noise, the SNR improves by $\sqrt{N}$. The mean stacked spectrum yields the mean HI mass

$\langle M_{\rm HI}\rangle_{\rm stack} = \frac{2.356\times10^5}{N}\sum_{i=1}^{N}\frac{D_{L,i}^2}{1+z_i}\int S_i(v)\,dv$

(Bharti et al., 11 Jan 2026).

Statistical Products: Mock catalogs are used to estimate the global HIMF via $1/V_{\max}$, conditional HIMFs in groups, HI–halo mass relations, and clustering statistics:
- HIMF: $\Phi^* = 2.61\times10^{-3}\,{\rm Mpc^{-3}\,dex^{-1}}$, $\log_{10}(M^*_{\rm HI}/M_\odot) = 10.07$, $\alpha = -1.51$ (Li et al., 2022).
- HI–halo relation: $\langle M_{\rm HI}\rangle$ increases monotonically with $M_h$ ; satellites dominate in rich clusters (Li et al., 2022).
- Clustering: Two-point and projected correlation functions, cross-correlations between optically selected and HI-selected samples, and velocity-width functions (e.g., $\phi(W_{50})$) are tabulated (Paranjape et al., 2021).
Validation: Mock outputs are validated by reproducing the underlying survey distributions in $u-r$, $M_*$, and $M_{\rm HI}$; the adopted estimator fits keep the bias in reconstructed HI masses and optical-HI scaling relations below $0.02$ dex (Li et al., 2022).
Clustering-based Redshift Estimation: Joint optical-HI mocks support cross-correlation analyses (e.g., $w_{gT}(z_i)$) for estimating the redshift distributions of photometric samples, with calibration of the bias ratio $b_{\rm HI}/b_g$ (Cunnington et al., 2018).
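The mean-mass stacking step can be sketched as follows, assuming the spectra have already been shifted to a common rest-frame velocity grid and trimmed to the integration window. Each spectrum is converted to an HI "mass spectrum" with the standard conversion $M_{\rm HI}=2.356\times10^5\,D_L^2\int S\,dv/(1+z)$, and the mass spectra are averaged, so the result is a mean over galaxies rather than a sum. The function name and toy inputs are hypothetical.

```python
import numpy as np

def stacked_mean_hi_mass(spectra_jy, dv_kms, z, d_l_mpc):
    """Mean HI mass [Msun] of N aligned spectra with shape (N, n_chan), in Jy."""
    scale = 2.356e5 * np.asarray(d_l_mpc, dtype=float) ** 2 \
        / (1.0 + np.asarray(z, dtype=float))
    mass_spectra = spectra_jy * scale[:, None]    # per-galaxy mass spectra
    mean_spectrum = mass_spectra.mean(axis=0)     # average over galaxies
    return mean_spectrum.sum() * dv_kms           # integrate over velocity

# Usage 1 (toy values): two noiseless 100 km/s top-hats of 0.01 Jy in
# 10 km/s channels, i.e. 1 Jy km/s each, with z = 0 and D_L = 10 Mpc.
flat = np.full((2, 10), 0.01)
m_mean = stacked_mean_hi_mass(flat, 10.0, z=np.zeros(2), d_l_mpc=np.full(2, 10.0))

# Usage 2: the channel noise of the mean spectrum falls roughly as 1/sqrt(N).
rng = np.random.default_rng(3)
noise = rng.normal(0.0, 1.0, size=(400, 256))
rms_single = noise[0].std()
rms_stack = noise.mean(axis=0).std()   # expected near 1/sqrt(400) = 0.05
```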

6. Impact, Limitations, and Future Applications

Optical-HI joint mocks form the basis for end-to-end survey simulations, instrument design, and population synthesis for planned 21 cm and multiwavelength surveys. Key impacts include:

Survey Optimization: Area-depth trade studies, exposure budgeting, and stacking field selection benefit from mock-based forecast metrics (Bharti et al., 11 Jan 2026).
Bayesian Inference: Stacking and direct detection are combined to constrain HIMF parameters via hierarchical Bayesian methods (Bharti et al., 11 Jan 2026).
Model Testing: Joint mocks provide null tests of assembly bias, HI-environment relations, and the universality of scaling laws within $\Lambda$CDM (Paranjape et al., 2021, Li et al., 2022).
Limitations: Noted limitations include incompleteness at low HI mass ($M_{\rm HI}<10^8\,M_\odot$) and at low $z$, no modeling of disk and gas stripping in satellites, and a simplistic disk-bulge decomposition. The HI–halo relations inferred from the mocks are not fully reproduced by current hydrodynamical simulations, indicating gaps in theoretical understanding (Li et al., 2022).

A plausible implication is that further refinement of HI–optical scaling relations and environmental physics in mock construction is essential for upcoming SKA-era survey science. Extension to clustering, full environment tags, and multi-phase gas will enhance predictive power and enable more robust cross-survey analyses.