AbacusSummit Mock Catalogs

Updated 16 January 2026

AbacusSummit Mock Catalogs are high-fidelity synthetic datasets derived from large-volume, high-resolution N-body simulations to support precision cosmology.
They integrate robust halo merger tree construction, rigorous catalog cleaning, advanced HOD modeling, and light-cone generation to mimic galaxy, halo, and Lyα forest observables.
These catalogs are validated against theoretical predictions and observational data, enabling reliable analysis for surveys like DESI, KiDS, and HSC.

AbacusSummit Mock Catalogs are a comprehensive suite of synthetic galaxy, halo, and Lyα forest catalogs generated from the AbacusSummit cosmological $N$-body simulations. Designed to support clustering, weak lensing, and Lyα forest analyses for major surveys such as DESI, KiDS, and HSC, these mocks provide high-fidelity synthetic data for precision cosmology, survey pipeline validation, and systematics studies. The pipeline combines accurate merger-tree construction, catalog cleaning, flexible halo occupation and bias assignment prescriptions, light-cone and weak lensing map generation, and empirical validation against both theoretical predictions and real survey data.

1. AbacusSummit Simulations and Data Products

AbacusSummit consists of high-resolution, large-volume $N$-body runs tailored to the requirements of modern spectroscopic and imaging surveys. The base simulations are periodic boxes with $L_{\rm box} = 2000\,h^{-1}\,{\rm Mpc}$ and $N_{\rm part} = 6912^3$, giving a particle mass $m_{\rm part} \approx 2.1 \times 10^9\,h^{-1}\,M_\odot$ and a force softening of $\approx 20\,h^{-1}\,{\rm kpc}$; "huge" boxes with $L_{\rm box} = 7500$–$7600\,h^{-1}\,{\rm Mpc}$ cover the full sky at moderate mass resolution (Bose et al., 2021, Hadzhiyska et al., 2021, Hadzhiyska et al., 2023).
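As a quick consistency check, the quoted particle mass follows from dividing the total matter mass in the box by the particle count, $m_{\rm part} = \Omega_m \rho_{\rm crit} L_{\rm box}^3 / N_{\rm part}$. The sketch below assumes a Planck-like $\Omega_m = 0.315$ and $\rho_{\rm crit} \approx 2.775\times 10^{11}\,h^2\,M_\odot\,{\rm Mpc}^{-3}$; neither value is taken from the catalog documentation, and the helper names are illustrative.

```python
# Back-of-the-envelope check of the base-box particle mass and spacing.
# Assumptions: Omega_m = 0.315 (Planck-like), rho_crit = 2.775e11 h^2 M_sun / Mpc^3.

RHO_CRIT = 2.775e11  # critical density today, h^2 M_sun Mpc^-3


def particle_mass(box_mpc_h: float, n_side: int, omega_m: float) -> float:
    """Mass per particle in h^-1 M_sun for n_side^3 equal-mass particles that
    together carry the mean matter density of a box of side box_mpc_h (h^-1 Mpc)."""
    total_mass = omega_m * RHO_CRIT * box_mpc_h**3  # total matter mass, h^-1 M_sun
    return total_mass / n_side**3


def interparticle_spacing(box_mpc_h: float, n_side: int) -> float:
    """Mean interparticle spacing in h^-1 Mpc."""
    return box_mpc_h / n_side


m_part = particle_mass(2000.0, 6912, 0.315)
print(f"m_part  ~ {m_part:.2e} h^-1 M_sun")
print(f"spacing ~ {1e3 * interparticle_spacing(2000.0, 6912):.0f} h^-1 kpc")
```

This gives about $2.1\times10^9\,h^{-1}\,M_\odot$ per particle and a spacing of roughly $289\,h^{-1}\,{\rm kpc}$; a 100-particle halo then has a mass of about $2.1\times10^{11}\,h^{-1}\,M_\odot$.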

Key public data products include:

Halo catalogs (CompaSO finder)
Halo merger trees and cleaned catalogs
Light-cone outputs and derived halo light-cone catalogs
Mock galaxy and QSO catalogs, including SEDs, colors, and velocity information
Wide-area weak lensing maps (shear, convergence, deflection)
Lyα forest “skewer” mocks on 3D grids, with correlated QSO catalogs

These products are regularly validated against clustering, lensing, and cross-correlation statistics.

2. Merger Trees, Cleaning, and Halo Catalog Fidelity

AbacusSummit employs a particle-based core-tracking algorithm for halo merger tree construction. Associations between halos at successive outputs are quantified by two statistics: the donor fraction $f_{\rm donate}$ (the fraction of subsampled particles transferred from progenitor to descendant) and the match fraction $f_{\rm match}$ (the kernel-density overlap of core particles). Progenitors with $f_{\rm donate}\geq 0.5$ are retained, and main progenitor branches are defined by recursively selecting the progenitor that maximizes $f_{\rm match}$. Halos with non-monotonic mass growth, where the main progenitor's mass exceeds the present-day value by a factor $\kappa>2$, are flagged and merged back into the more massive system, correcting over-splitting and deblending artifacts (Bose et al., 2021).
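The selection logic described above can be sketched in a few lines. This is a minimal illustration, not the AbacusSummit implementation: the `Candidate` record, the function names, and the assumption that halo IDs are unique across outputs are all hypothetical; only the thresholds ($f_{\rm donate}\geq 0.5$, $\kappa = 2$) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate progenitor of a descendant halo."""
    halo_id: int
    mass: float      # progenitor mass, h^-1 M_sun
    f_donate: float  # fraction of subsampled particles donated to the descendant
    f_match: float   # core-particle overlap with the descendant


def select_progenitors(candidates, f_donate_min=0.5):
    """Apply the donor cut, then pick the main progenitor as the survivor with
    the largest f_match (None if no candidate survives)."""
    kept = [c for c in candidates if c.f_donate >= f_donate_min]
    main = max(kept, key=lambda c: c.f_match, default=None)
    return kept, main


def main_branch(halo_id, candidates_of):
    """Follow main progenitors back in time. candidates_of maps a halo ID to
    its list of Candidate progenitors at the previous output."""
    branch = [halo_id]
    while True:
        _, main = select_progenitors(candidates_of.get(branch[-1], []))
        if main is None:
            return branch
        branch.append(main.halo_id)


def flag_non_monotonic(main_branch_masses, present_mass, kappa=2.0):
    """Flag a halo whose main-branch mass ever exceeds kappa times its present mass."""
    return any(m > kappa * present_mass for m in main_branch_masses)


cands = [Candidate(1, 5e12, 0.8, 0.6), Candidate(2, 1e12, 0.55, 0.9), Candidate(3, 3e11, 0.2, 0.95)]
kept, main = select_progenitors(cands)
print([c.halo_id for c in kept], main.halo_id)
```

Here candidate 3 fails the donor cut and candidate 2 becomes the main progenitor because it has the largest $f_{\rm match}$, even though it is less massive than candidate 1. Merging flagged halos into their more massive neighbors depends on catalog internals and is omitted.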

Cleaning significantly improves the physicality of the final halo catalogs by:

Bringing the halo mass function into closer agreement with configuration-space finders
Suppressing unphysical “floater” populations
Eliminating spurious small-scale cross-clustering features between low- and high-mass halos
Sharpening the radial velocity profiles at cluster outskirts

The cleaned catalogs form the basis for downstream galaxy mock generation.

3. Mock Galaxy Catalog Construction: HOD and Extensions

Galaxies are assigned to halos using halo occupation distribution (HOD) models. The standard Zheng et al. (2007) HOD is used for most applications, while “decorated” models are applied to emission-line galaxies, flux-limited samples, and specific survey targets (Hadzhiyska et al., 2021, Smith et al., 2023).

Baseline HOD: the mean central occupation $\langle N_{\rm cen}(M)\rangle$ is an error-function step in $\log M$, and the mean satellite occupation $\langle N_{\rm sat}(M)\rangle$ is a power law in halo mass $M$; the model extends to simultaneous multi-threshold fits for flux-limited samples.
ELG Mocks: Use a High-Mass-Quenched (HMQ) prescription, parameterized to interpolate between multiple redshift slices and tuned to match IllustrisTNG predictions for ELG occupation.
Flux-Limited (BGS) Mocks: A tabulated-pair-count method accelerates HOD parameter fitting and prevents unphysical “HOD crossing” between magnitude thresholds. Colors and $r$-band magnitudes are assigned using color–magnitude relations fitted to SDSS/GAMA data. Cut-sky mocks are built by replicating the periodic box to cover the survey volume on the sphere, modeling apparent magnitudes, k-corrections, and redshift evolution of the luminosity function (Smith et al., 2023).
Velocity Bias: Central and satellite galaxy velocities are adjusted with empirical velocity-bias parameters, which are critical for modeling redshift-space distortions.
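A minimal sketch of the baseline occupation functions and of one common way to apply velocity bias follows. It assumes the Zheng et al. (2007) forms $\langle N_{\rm cen}\rangle = \tfrac12[1+{\rm erf}((\log_{10}M-\log_{10}M_{\rm min})/\sigma_{\log M})]$ and $\langle N_{\rm sat}\rangle = ((M-M_0)/M_1)^\alpha$ for $M>M_0$; some implementations additionally multiply $\langle N_{\rm sat}\rangle$ by $\langle N_{\rm cen}\rangle$. The velocity-bias parameters `alpha_c` and `alpha_s`, the flat ΛCDM $E(z)$ with $\Omega_m=0.315$, and all function names are illustrative assumptions, not the parameterization of any specific AbacusSummit mock.

```python
import math
import random

# Illustrative Zheng et al. (2007)-style occupations; parameter names are hypothetical.


def mean_n_cen(mass, log_m_min, sigma_log_m):
    """Mean central occupation: an error-function step in log10(M)."""
    return 0.5 * (1.0 + math.erf((math.log10(mass) - log_m_min) / sigma_log_m))


def mean_n_sat(mass, log_m0, log_m1, alpha):
    """Mean satellite occupation: ((M - M0) / M1)^alpha above M0, zero below."""
    m0, m1 = 10.0**log_m0, 10.0**log_m1
    return ((mass - m0) / m1) ** alpha if mass > m0 else 0.0


def central_los_velocity(v_halo, sigma_halo, alpha_c, rng):
    """Assumed form: halo LOS velocity plus a Gaussian kick of width alpha_c * sigma_halo."""
    return v_halo + alpha_c * sigma_halo * rng.gauss(0.0, 1.0)


def satellite_los_velocity(v_halo, v_particle, alpha_s):
    """Assumed form: rescale the tracer particle's velocity relative to the halo by alpha_s."""
    return v_halo + alpha_s * (v_particle - v_halo)


def to_redshift_space(x_los, v_los, z, omega_m=0.315):
    """Shift a comoving LOS coordinate (h^-1 Mpc) by v (1 + z) / H(z), with v in km/s
    and H(z) = 100 E(z) h km/s/Mpc in a flat LCDM background."""
    e_z = math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)
    return x_los + v_los * (1.0 + z) / (100.0 * e_z)


rng = random.Random(42)
mass = 1e13  # example halo mass, h^-1 M_sun
print(mean_n_cen(mass, 12.0, 0.3), mean_n_sat(mass, 12.5, 13.5, 1.0))
v_cen = central_los_velocity(150.0, 400.0, 0.2, rng)
print(to_redshift_space(500.0, v_cen, 0.5))
```

In a full mock, the central would be drawn as a Bernoulli variable with probability $\langle N_{\rm cen}\rangle$ and the satellite count from a Poisson distribution with mean $\langle N_{\rm sat}\rangle$; the velocity-biased line-of-sight velocities then feed the redshift-space shift.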

Bayesian inference (e.g., nested sampling) is employed to fit HOD parameters to observed number densities and clustering statistics. Agreement between mocks and data is typically at the sub-percent to few-percent level, especially after catalog cleaning.
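At each point in HOD parameter space, the sampler evaluates a likelihood comparing mock and data. A minimal sketch, assuming a Gaussian likelihood with a full inverse covariance for a clustering data vector (for example, a projected correlation function) plus an independent Gaussian term for the number density; the argument names are illustrative.

```python
def log_likelihood(model_vec, data_vec, inv_cov, model_nbar, data_nbar, sigma_nbar):
    """Gaussian log-likelihood up to a constant: a correlated chi^2 for the
    clustering vector plus an independent chi^2 term for the number density."""
    resid = [m - d for m, d in zip(model_vec, data_vec)]
    n = len(resid)
    # chi^2 = r^T C^{-1} r for the clustering measurements
    chi2_clustering = sum(
        resid[i] * inv_cov[i][j] * resid[j] for i in range(n) for j in range(n)
    )
    # number density treated as one extra, uncorrelated Gaussian measurement
    chi2_density = ((model_nbar - data_nbar) / sigma_nbar) ** 2
    return -0.5 * (chi2_clustering + chi2_density)


identity = [[1.0, 0.0], [0.0, 1.0]]
print(log_likelihood([2.0, 3.0], [1.0, 2.0], identity, 3e-4, 3e-4, 1e-5))
```

A nested sampler only needs this function and a prior transform; treating the number density as an uncorrelated term is a simplifying choice, and a joint covariance could be used instead.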

4. Light-Cone, Weak Lensing, and Cosmological Probes

AbacusSummit supports high-fidelity mock catalogs on the observer's past light cone, enabling realistic survey footprints and redshift evolution (Hadzhiyska et al., 2021):

Halo Light-Cone Construction: Halo positions and properties are interpolated to the exact light-cone crossing times using linear motion between snapshots at discrete redshifts, with a mass threshold set by the particle resolution ($M_{\rm halo}\gtrsim 2.1\times 10^{11}\,h^{-1}\,M_\odot$, about 100 particles).
Galaxy Light-Cone Catalogs: Populated via HOD models, with clustering multipoles and lensing cross-correlations validated against both theory and cut-sphere full-box tests.
Lensing Maps: Weak lensing fields (shear, convergence $\kappa$, deflection) are computed in the Born approximation, discretized into fine redshift shells, and pixelized with HEALPix at $N_{\rm side}=16384$ (pixel resolution 0.21 arcmin). Maps are generated for source redshifts $z=0.15$–$2.45$ and at the CMB last-scattering surface ($z\sim 1090$).
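Two items in the list above can be made concrete. The first function finds where a halo meets the light cone between two snapshots, assuming that both the halo position and the light-cone radius vary linearly in a fraction $f\in[0,1]$ of the interval (a simplification of the linear-motion model). The second reproduces the quoted HEALPix resolution as the square root of the pixel area, $\sqrt{4\pi/(12N_{\rm side}^2)}$. Function names are illustrative.

```python
import math


def crossing_fraction(x_a, x_b, chi_a, chi_b, tol=1e-10):
    """Fraction f in [0, 1] from snapshot A (earlier, higher z) to snapshot B at
    which the halo meets the light cone, or None if it does not cross.

    x_a, x_b: halo comoving positions (h^-1 Mpc, observer at origin) at A and B.
    chi_a, chi_b: comoving radius of the light cone at the times of A and B.
    """
    def gap(f):
        pos = [a + f * (b - a) for a, b in zip(x_a, x_b)]
        return math.sqrt(sum(p * p for p in pos)) - (chi_a + f * (chi_b - chi_a))

    lo, hi = 0.0, 1.0
    gap_lo = gap(lo)
    if gap_lo * gap(hi) > 0:
        return None  # no sign change, so no crossing in this interval
    while hi - lo > tol:  # bisection on the sign change
        mid = 0.5 * (lo + hi)
        gap_mid = gap(mid)
        if gap_lo * gap_mid <= 0:
            hi = mid
        else:
            lo, gap_lo = mid, gap_mid
    return 0.5 * (lo + hi)


def healpix_resolution_arcmin(nside):
    """Square root of the HEALPix pixel area (12 * nside^2 equal-area pixels), in arcmin."""
    n_pix = 12 * nside**2
    return math.degrees(math.sqrt(4.0 * math.pi / n_pix)) * 60.0


print(crossing_fraction((100.0, 0.0, 0.0), (100.0, 0.0, 0.0), 110.0, 90.0))
print(healpix_resolution_arcmin(16384))
```

Halo properties can then be interpolated to the same fraction $f$. For $N_{\rm side}=16384$ the second function gives about 0.215 arcmin, consistent with the quoted resolution.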

Validation includes power spectrum measurements ($C_\ell^{\kappa\kappa}$, $C_\ell^{gg}$, $C_\ell^{\kappa g}$), which agree with theoretical predictions and external software (e.g., CCL) at the $\lesssim 0.1\%$ level for auto-spectra and $\lesssim 1\%$ for cross-spectra.
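The Born-approximation projection behind the convergence maps, whose lensing kernel also enters theory predictions for $C_\ell^{\kappa\kappa}$, can be sketched for a single ray through discrete shells: $\kappa \approx \frac{3H_0^2\Omega_m}{2c^2}\sum_i \Delta\chi_i\,\frac{\chi_i(\chi_s-\chi_i)}{\chi_s}\,\frac{\delta_i}{a_i}$. This sketch assumes a flat ΛCDM background with $\Omega_m=0.315$, evaluates each shell at its midpoint, and uses a trapezoid rule for comoving distances; it is an illustration, not the production map-making code.

```python
import math

C_OVER_H0 = 2997.92458  # c / H0 in h^-1 Mpc


def comoving_distance(z: float, omega_m: float = 0.315, n_steps: int = 1000) -> float:
    """Flat LCDM comoving distance to redshift z in h^-1 Mpc (trapezoid rule)."""
    def inv_e(zz: float) -> float:
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + 1.0 - omega_m)

    dz = z / n_steps
    interior = sum(inv_e(i * dz) for i in range(1, n_steps))
    return C_OVER_H0 * dz * (0.5 * inv_e(0.0) + interior + 0.5 * inv_e(z))


def born_convergence(delta_shells, z_edges, z_source, omega_m=0.315):
    """Born-approximation convergence along one ray.

    delta_shells[i] is the matter overdensity in the shell between z_edges[i]
    and z_edges[i + 1]; shells beyond z_source are ignored.
    """
    chi_s = comoving_distance(z_source, omega_m)
    prefactor = 1.5 * omega_m / C_OVER_H0**2  # 3 H0^2 Omega_m / (2 c^2), h^2 Mpc^-2
    kappa = 0.0
    for delta, z_lo, z_hi in zip(delta_shells, z_edges[:-1], z_edges[1:]):
        if z_lo >= z_source:
            break
        z_hi = min(z_hi, z_source)
        chi_lo = comoving_distance(z_lo, omega_m)
        chi_hi = comoving_distance(z_hi, omega_m)
        chi = 0.5 * (chi_lo + chi_hi)  # shell midpoint in comoving distance
        z_mid = 0.5 * (z_lo + z_hi)    # shell midpoint in redshift; 1/a = 1 + z
        lensing_weight = chi * (chi_s - chi) / chi_s
        kappa += prefactor * (chi_hi - chi_lo) * lensing_weight * delta * (1.0 + z_mid)
    return kappa


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
print(born_convergence([0.01] * 4, edges, z_source=1.0))
```

The weight $\chi(\chi_s-\chi)/\chi_s$ vanishes at the observer and at the source, so shells roughly halfway to the source contribute most; this is the same kernel a Limber-approximation theory $C_\ell^{\kappa\kappa}$ integrates over.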

5. Lyα Forest Mocks and QSO Catalogs

The Lyα forest suite leverages AbacusSummit's high mass and force resolution to construct mock Lyα skewers at $z=2.5$ via the Fluctuating Gunn–Peterson Approximation (FGPA):

Optical Depth Assignment: $\tau(\vec{x}) = A\,[\rho(\vec{x})/\bar\rho]^\beta$, with $(A,\beta)$ fitted to match the mean transmitted flux and 1D flux power spectrum of a reference hydrodynamical simulation (IllustrisTNG).
Redshift-Space Distortions: Skewers are shifted using interpolated particle velocities and thermally broadened with a Gaussian kernel ($\sigma_v\sim 5\,{\rm km\,s^{-1}}$).
QSO Mocks: Halo-based QSO catalogs are sampled to match DESI redshift and bias distributions ($b_{\rm QSO}(z=2.5)\simeq 3.6$).
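The FGPA mapping and the tuning of its amplitude can be sketched as follows. This is a simplified illustration: it holds $\beta$ fixed and tunes only $A$ so that the mean transmitted flux $\langle e^{-\tau}\rangle$ matches a target (the fit to the 1D flux power spectrum is omitted), and it applies the Gaussian broadening as a periodic convolution in velocity space. Function names and the target value in the test are hypothetical.

```python
import math


def optical_depth(density_ratio, amplitude, beta):
    """FGPA: tau = A * (rho / rho_bar)^beta for each skewer pixel."""
    return [amplitude * r**beta for r in density_ratio]


def mean_flux(density_ratio, amplitude, beta):
    """Mean transmitted flux <exp(-tau)> over all pixels."""
    taus = optical_depth(density_ratio, amplitude, beta)
    return sum(math.exp(-t) for t in taus) / len(taus)


def tune_amplitude(density_ratio, beta, target_mean_flux, lo=1e-4, hi=1e3, n_iter=200):
    """Bisect in log A for the amplitude reproducing a target mean flux
    (mean flux decreases monotonically as A grows)."""
    for _ in range(n_iter):
        mid = math.sqrt(lo * hi)
        if mean_flux(density_ratio, mid, beta) > target_mean_flux:
            lo = mid  # too transparent: increase A
        else:
            hi = mid
    return math.sqrt(lo * hi)


def gaussian_smooth_periodic(values, dv, sigma_v):
    """Convolve a periodic skewer sampled every dv (km/s) with a normalized
    Gaussian of width sigma_v (km/s), truncated at 4 sigma."""
    n = len(values)
    half = int(math.ceil(4.0 * sigma_v / dv))
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k * dv / sigma_v) ** 2) for k in offsets]
    norm = sum(weights)
    return [
        sum(w * values[(i + k) % n] for k, w in zip(offsets, weights)) / norm
        for i in range(n)
    ]
```

For a uniform density field, `tune_amplitude([1.0] * 64, 1.6, 0.8)` returns $A=-\ln 0.8\approx 0.223$, because every pixel then has $F=e^{-A}$; with a realistic density field the fluctuating pixels shift this value.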

Scientific validations include Lyα BAO peak broadening, agreement of the Lyα×QSO cross-correlation with the Kaiser model (to $\sim$10% at $s>50\,h^{-1}\,{\rm Mpc}$, with nonlinear deviations on smaller scales), and the capability to forecast the Lyα–CMB lensing cross-power and the 3-point Lyα correlation function (Hadzhiyska et al., 2023).
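In linear theory, the Kaiser model for the cross-power of two tracers is $P(k,\mu)=b_1(1+\beta_1\mu^2)\,b_2(1+\beta_2\mu^2)\,P_{\rm lin}(k)$; averaging over $\mu$ gives monopole and quadrupole prefactors $b_1b_2[1+(\beta_1+\beta_2)/3+\beta_1\beta_2/5]$ and $b_1b_2[2(\beta_1+\beta_2)/3+4\beta_1\beta_2/7]$. The sketch below computes them. For QSOs it sets $\beta_q = f/b_q$, approximating the growth rate as $f\approx\Omega_m(z)^{0.55}$ in flat ΛCDM with an assumed $\Omega_m=0.315$; the Lyα bias and RSD parameter are left as free inputs because $\beta_\alpha$ is not tied to $f/b_\alpha$.

```python
def growth_rate(z: float, omega_m0: float = 0.315) -> float:
    """Approximate linear growth rate f ~ Omega_m(z)^0.55 in flat LCDM."""
    a3 = (1.0 + z) ** 3
    omega_m_z = omega_m0 * a3 / (omega_m0 * a3 + 1.0 - omega_m0)
    return omega_m_z ** 0.55


def kaiser_cross_multipoles(b1, beta1, b2, beta2):
    """Monopole and quadrupole prefactors multiplying P_lin(k) for the
    linear-theory (Kaiser) cross-power of two tracers."""
    s = beta1 + beta2
    p = beta1 * beta2
    monopole = b1 * b2 * (1.0 + s / 3.0 + p / 5.0)
    quadrupole = b1 * b2 * (2.0 * s / 3.0 + 4.0 * p / 7.0)
    return monopole, quadrupole


b_qso = 3.6  # QSO bias at z = 2.5 quoted in the text
beta_qso = growth_rate(2.5) / b_qso
print(f"f(z=2.5) ~ {growth_rate(2.5):.3f}, beta_QSO ~ {beta_qso:.3f}")
```

This gives $f\approx0.97$ and $\beta_{\rm QSO}\approx0.27$; setting both tracers equal recovers the familiar auto-power factors $1+2\beta/3+\beta^2/5$ and $4\beta/3+4\beta^2/7$.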

6. Bias Assignment and Alternative Mocking Algorithms

AbacusSummit enables rapid, large-$N$ mock halo catalog generation via parameter-free bias assignment, specifically the BAM algorithm (Balaguera-Antolínez et al., 2018):

Bias Model: Multi-dimensional empirical distributions $P(N_H\,|\,x,\lambda,M_K)$ relate the halo count per grid cell to the local (log-transformed) matter overdensity $x$, the cosmic-web type $\lambda$, and the knot mass $M_K$.
Sampling and Iterative Kernel Correction: The initial field is populated by sampling from $P$; a “bias transfer kernel” iteratively corrects the power spectrum of the sampled field toward the $N$-body reference through Fourier-space filtering and re-sampling.
Achievable Accuracy: $P(k)$ is matched to $\sim 1\%$ up to $k\sim 1\,h\,{\rm Mpc}^{-1}$, with the reduced bispectrum $Q$ agreeing at the $\sim 10\%$ level.
Scalability: BAM scales to AbacusSummit's multi-cosmology grid and is well suited to rapid covariance matrix estimation and emulator construction.
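The sampling and kernel-correction steps listed above can be sketched in a heavily simplified form. Assumptions: the conditional distribution depends only on binned log-density (the text also conditions on cosmic-web type $\lambda$ and knot mass $M_K$); sampling draws a count from reference cells in the same bin; and the kernel is updated per $k$-bin as $K \to K\sqrt{P_{\rm ref}/P_{\rm mock}}$, which follows from $P\propto |K|^2$ but is not necessarily the exact rule of Balaguera-Antolínez et al. (2018). The FFT-based filtering is omitted, and all names are hypothetical.

```python
import bisect
import random
from collections import defaultdict


def bin_index(value, edges):
    """Index i of the bin [edges[i], edges[i+1]) containing value, clipped to the end bins."""
    i = bisect.bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def learn_conditional(log_density, halo_counts, edges):
    """Empirical P(N_H | density bin): store the halo counts of reference cells per bin."""
    table = defaultdict(list)
    for x, n in zip(log_density, halo_counts):
        table[bin_index(x, edges)].append(n)
    return table


def sample_counts(log_density, table, edges, rng):
    """Draw a halo count for each target cell from reference cells in the same
    bin, i.e. sample the empirical conditional distribution (0 for empty bins)."""
    counts = []
    for x in log_density:
        pool = table.get(bin_index(x, edges))
        counts.append(rng.choice(pool) if pool else 0)
    return counts


def update_kernel(kernel, p_ref, p_mock):
    """One correction step per k-bin: since P scales as |K|^2, rescale K by sqrt(P_ref / P_mock)."""
    return [k * (pr / pm) ** 0.5 for k, pr, pm in zip(kernel, p_ref, p_mock)]


edges = [-1.0, 0.0, 1.0]
table = learn_conditional([-0.5, -0.4, 0.5, 0.6], [0, 0, 3, 4], edges)
print(sample_counts([-0.7, 0.8, 0.2], table, edges, random.Random(1)))
```

In the full method, the target density field is first filtered with the current kernel in Fourier space, counts are re-sampled, the mock power spectrum is re-measured, and the kernel update is repeated until $P(k)$ converges toward the reference.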

7. Data Access, Applications, and Survey Validation

AbacusSummit mock catalogs are publicly available through multiple DOIs and endpoints (e.g., OLCF Constellation, Globus, and https://icc.dur.ac.uk/data/), with documented file formats (ASDF for halos, HDF5 for Lyα skewers, FITS for QSO catalogs) and Python-based access tools (abacusutils) (Hadzhiyska et al., 2021, Hadzhiyska et al., 2023, Smith et al., 2023).

Key use cases include:

Validation and pipeline testing for DESI (including BGS), KiDS, HSC, and CMB lensing analyses
Systematics quantification: fiber collisions, completeness, redshift-space distortions
Multi-tracer and cross-correlation analyses (galaxy–CMB, Lyα–QSO, clustering–lensing)
Large-scale covariance matrix estimation across cosmology parameter grids
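For covariance estimation, a data vector measured in $N_m$ independent mocks gives the sample covariance $C_{ij} = \frac{1}{N_m-1}\sum_r (d^r_i-\bar d_i)(d^r_j-\bar d_j)$. Its inverse is biased, and a common remedy (assumed here, not stated in the text) is to rescale the inverse by the Hartlap factor $(N_m-N_d-2)/(N_m-1)$ for an $N_d$-element data vector. A minimal pure-Python sketch with illustrative names:

```python
def sample_covariance(realizations):
    """Unbiased sample covariance of a data vector measured in several mocks.
    realizations: list of equal-length lists, one per mock."""
    n = len(realizations)
    dim = len(realizations[0])
    mean = [sum(r[i] for r in realizations) / n for i in range(dim)]
    return [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in realizations) / (n - 1)
         for j in range(dim)]
        for i in range(dim)
    ]


def hartlap_factor(n_mocks, n_data):
    """Factor multiplying the inverse sample covariance to debias it (Hartlap correction)."""
    if n_mocks <= n_data + 2:
        raise ValueError("need more mocks than data points + 2")
    return (n_mocks - n_data - 2) / (n_mocks - 1)


print(sample_covariance([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
print(hartlap_factor(1000, 20))
```

With 1000 mocks and a 20-element data vector the factor is $978/999\approx0.979$; it approaches 1 as the number of mocks grows relative to the data-vector length, which is why large mock suites are valuable here.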

The mock catalogs reproduce theoretical and simulation-based clustering and lensing statistics over a broad range of redshifts and scales, at the sub-percent level for the most precise tests (e.g., lensing auto-spectra) and at the percent to few-percent level for most others, meeting the accuracy requirements of Stage-IV dark energy survey cosmology and multi-probe analyses.

References

(Bose et al., 2021) Bose et al., "Constructing high-fidelity halo merger trees in AbacusSummit"
(Hadzhiyska et al., 2021) Hadzhiyska et al., "The halo light cone catalogues of AbacusSummit"
(Hadzhiyska et al., 2023) Hadzhiyska et al., "Planting a Lyman alpha forest on AbacusSummit"
(Smith et al., 2023) Smith et al., "Generating mock galaxy catalogues for flux-limited samples like the DESI Bright Galaxy Survey"
(Balaguera-Antolínez et al., 2018) Balaguera-Antolínez et al., "BAM: Bias Assignment Method to generate mock catalogs"