- The paper demonstrates that current gravitational waveform models fail to accurately recover parameters in high-mass-ratio BBH mergers, showing mismatches up to 0.4.
- It details extensive NR simulations with high resolutions and varied spin misalignments, rigorously quantifying errors relative to model predictions.
- The findings reveal significant mass and spin biases in Bayesian inference, emphasizing the need for improved NR simulations and new modeling strategies.
Introduction and Context
Binary black hole (BBH) mergers are a core source of gravitational wave (GW) signals, critically informing the fields of astrophysics, cosmology, and fundamental physics. Precise inference of source parameters from GW data depends fundamentally on efficient and accurate waveform models, which encode the predictions of general relativity (GR) for binary evolution through inspiral, merger, and ringdown. Although waveform modeling has achieved considerable success for moderate mass ratios and spin configurations, substantial gaps exist for high-mass-ratio, highly precessing systems—regimes that will increasingly populate catalogs as detector sensitivity improves. Addressing this, the authors present long-duration numerical relativity (NR) simulations of precessing BBHs with mass ratio q=18 and dimensionless spin χ1​=0.8 (secondary non-spinning), sampling five spin misalignment angles. These are leveraged to rigorously quantify the deficiencies of state-of-the-art waveform models in this challenging but astrophysically relevant sector.
Numerical Relativity Simulations
The simulations use the BAM NR code on Cartesian meshes, employing sixth-order finite differencing and fourth-order Runge-Kutta time-stepping. Each configuration is run at three resolutions up to N=1723 for convergence studies. The initial data are constructed via Bowen-York puncture methods, with spin misalignments θ_LS1 ∈ {30°, 60°, 90°, 120°, 150°}. The simulations cover ∼3000M of evolution (comparable to tens of orbits for these parameters), providing crucial coverage of an under-explored area of BBH parameter space.
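To make "sixth-order finite differencing" concrete, here is a minimal sketch of the standard sixth-order central stencil for a first derivative, with a two-resolution check of the observed convergence order. It is not taken from BAM: the periodic grid, the test function sin(x), and the names `ddx_sixth_order` and `max_error` are assumptions chosen for illustration.

```python
import numpy as np

# Standard sixth-order central-difference weights for a first derivative
# on a uniform grid (offsets -3..3), to be divided by the grid spacing h.
WEIGHTS = np.array([-1 / 60, 3 / 20, -3 / 4, 0.0, 3 / 4, -3 / 20, 1 / 60])


def ddx_sixth_order(f, h):
    """First derivative of periodic samples f with uniform spacing h."""
    df = np.zeros_like(f)
    for offset, w in zip(range(-3, 4), WEIGHTS):
        # np.roll(f, -offset)[i] equals f[i + offset] on a periodic grid
        df += w * np.roll(f, -offset)
    return df / h


def max_error(n):
    """Maximum error of d/dx sin(x) on [0, 2*pi) sampled with n points."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    h = x[1] - x[0]
    return np.max(np.abs(ddx_sixth_order(np.sin(x), h) - np.cos(x)))


# Halving the grid spacing should reduce the error by about 2**6.
e_coarse, e_fine = max_error(32), max_error(64)
observed_order = np.log2(e_coarse / e_fine)
print(f"observed convergence order: {observed_order:.2f}")
```

The same idea underlies a multi-resolution convergence study: errors measured at successive resolutions are compared against the scaling expected from the order of the scheme.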




Figure 1: The plus GW polarization of each NR simulation with total redshifted mass 150 M⊙, θ_LN = 30°, D_L = 300 Mpc, and varying spin misalignments. Waveforms demonstrate strong precession-driven modulations and higher-mode structure.
A critical finding is the computational toll of such simulations: the highest-resolution run requires over 13 million CPU hours, highlighting a key bottleneck in NR-based modeling at high q.
NR Accuracy Assessment
Finite-difference errors and finite-extraction radius errors are evaluated by direct waveform comparisons (mismatch analysis) at multiple resolutions and extraction radii.
Figure 2: Mismatch between NR waveforms (configuration CF_81) at different resolutions, as a function of inclination, reveals convergence but non-negligible errors at feasible resolutions.
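For readers unfamiliar with the quantity, the mismatch is one minus the normalized noise-weighted overlap of two waveforms, maximized over relative time and phase shifts. The sketch below is a simplified illustration, not the authors' pipeline. It assumes a single real polarization, circular time shifts, and a flat noise spectrum unless a PSD is supplied, and it does not perform the SNR weighting over orientations mentioned for the model comparisons. The function name `mismatch` and the toy signals are hypothetical.

```python
import numpy as np


def mismatch(h1, h2, psd=None):
    """1 - overlap, maximized over circular time shifts and a constant phase.

    h1, h2 : real time series with identical length and sampling.
    psd    : one-sided noise PSD on the np.fft.rfftfreq bins (flat if None).
    """
    n = len(h1)
    h1f, h2f = np.fft.rfft(h1), np.fft.rfft(h2)
    if psd is None:
        psd = np.ones(len(h1f))

    def norm(hf):
        return np.sqrt(np.sum(np.abs(hf) ** 2 / psd))

    # Keep only non-negative frequencies so the inverse FFT is complex:
    # its modulus maximizes over a constant phase offset, and its index
    # scans over all circular time shifts.
    integrand = np.zeros(n, dtype=complex)
    integrand[: len(h1f)] = h1f * np.conj(h2f) / psd
    overlap_vs_shift = np.abs(np.fft.ifft(integrand)) * n
    return 1.0 - overlap_vs_shift.max() / (norm(h1f) * norm(h2f))


# Toy signals (not NR data): Gaussian-windowed sinusoids.
t = np.arange(2048) / 2048.0
window = np.exp(-((t - 0.5) / 0.1) ** 2)
a = np.sin(2 * np.pi * 40 * t) * window
b = np.cos(2 * np.pi * 40 * t) * window   # same signal, phase-shifted
c = np.sin(2 * np.pi * 60 * t) * window   # different frequency

print(mismatch(a, np.roll(a, 100)), mismatch(a, b), mismatch(a, c))
```

A pure time or phase shift leaves the mismatch near zero, while a genuinely different signal drives it toward one. For precessing signals with higher modes, a realistic calculation would also have to handle both polarizations and the source orientation.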
The analysis indicates that the highest-resolution waveforms are convergent, but reaching the mismatch errors typical of simulations at lower mass ratios is currently infeasible. Despite these errors, the impact on Bayesian parameter estimation is markedly smaller than naively suggested by mismatch values, underscoring that the leading NR errors are largely orthogonal to the physical signal manifold.
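The "naive" reading of a mismatch value is often expressed through a widely used rule of thumb: two waveforms are treated as indistinguishable at a given SNR when the mismatch is below n_params / (2 SNR²), where n_params is the number of measured parameters. The helper below inverts that rule. It is a sketch of the rule of thumb only; this summary does not state which criterion the authors apply, and the name `distinguishability_snr` is hypothetical.

```python
import math


def distinguishability_snr(mismatch, n_params=1):
    """SNR above which an error with this mismatch may start to matter,
    from the rule of thumb  mismatch < n_params / (2 * SNR**2)."""
    if not 0.0 < mismatch < 1.0:
        raise ValueError("mismatch must lie strictly between 0 and 1")
    return math.sqrt(n_params / (2.0 * mismatch))


# Illustrative values only, not numbers from the paper.
for m in (1e-2, 1e-3):
    print(m, distinguishability_snr(m), distinguishability_snr(m, n_params=8))
```

Because the criterion is sufficient rather than necessary, exceeding this SNR does not guarantee a bias. That fits the observation above that NR errors affect the posteriors less than the mismatch alone would suggest.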
The study benchmarks three current models: IMRPhenomXPNR (XPNR), IMRPhenomTPHM (TPHM), and SEOBNRv5PHM (v5PHM), selected for their use in contemporary LVK analyses and their coverage of precession and higher modes for quasi-circular, non-eccentric BBHs.

Figure 3: Plus polarization reconstruction by contemporary models (XPNR, TPHM, v5PHM) versus the NR waveform for mild and extreme spin misalignment. All models display substantial amplitude and phase errors relative to NR data.
Mismatch Analysis
For all five parameter-space points, even after parameter optimization, the minimal mismatches between NR and every model remain large, reaching up to 0.4 in some cases.

Figure 4: SNR-weighted mismatch range for all models and spin orientations. The lower panel shows enhanced mismatches once higher-order modes are included—indicative of model failures primarily in merger/post-merger configurations and higher multipoles.
Discrepancies persist even for the dominant quadrupolar mode, signifying fundamental modeling errors, not merely poor higher-mode calibration.
Parameter Estimation and Bias
To directly relate model deficiencies to astrophysical inference, the authors inject NR waveforms (as "true" signals) into synthetic data for the Advanced LIGO–Virgo network and perform Bayesian parameter estimation using aligned and precessing models at a fixed SNR.



Figure 5: Posterior distributions for component masses and primary spin (magnitude and orientation) at two inclination angles. True values (black marks) are frequently missed by all models.
Key results:
- Errors in the recovered mass ratio (q) can exceed 50%, so the true q=18 systems are recovered with substantially different mass ratios.
- Primary spin magnitude and direction are systematically biased; the true value is sometimes outside the 90% credible region.
- Biases persist or worsen in edge-on or high-precession systems.
- Restriction to quadrupolar (ℓ = 2) modes does not eliminate biases, confirming that errors propagate from the fundamental waveform structure.
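The statements in this list ("errors exceed 50%", "true value outside the 90% credible region") correspond to simple summaries of posterior samples. The sketch below shows one way to compute them. The function name `bias_summary` is hypothetical, and the samples are synthetic draws invented for illustration, not posteriors from the paper.

```python
import numpy as np


def bias_summary(samples, true_value, level=0.90):
    """Median, percent bias of the median, and equal-tailed credible interval."""
    samples = np.asarray(samples, dtype=float)
    tail = 100.0 * (1.0 - level) / 2.0
    lo, med, hi = np.percentile(samples, [tail, 50.0, 100.0 - tail])
    return {
        "median": med,
        "percent_bias": 100.0 * (med - true_value) / true_value,
        "interval": (lo, hi),
        "truth_inside": bool(lo <= true_value <= hi),
    }


# Synthetic example: a made-up mass-ratio posterior centred far from q = 18.
rng = np.random.default_rng(0)
fake_q_samples = rng.normal(loc=9.0, scale=1.0, size=20000)
summary = bias_summary(fake_q_samples, true_value=18.0)
print(summary)
```

The equal-tailed interval used here is one common convention. A highest-density interval could give a slightly different verdict for skewed posteriors.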

Figure 6: Restricting injection and recovery to quadrupolar modes degrades posteriors, confirming that inaccuracies are not solely tied to poor higher-mode modeling.


Figure 7: Severe mass and spin recovery biases persist and can reach the 100% level.
NR errors (resolution/extraction) are empirically shown to have minimal effect on the recovered posteriors, provided the simulations reach sufficient, though not extreme, accuracy.

Figure 8: Posterior mass and spin estimates for three NR resolutions; highest and next-highest are almost indistinguishable.
Implications, Open Problems, and Future Directions
The analysis demonstrates that no current waveform model (Phenom, SEOBNR, or surrogates) provides reliable mass, spin, or orientation recovery outside its calibration regime (here, high mass ratio with strong misalignment/precession). Serious modeling errors are present even for the dominant multipoles, and biases can readily exceed 100% for well-detected sources. The inability to trust present models in this regime has concrete consequences for intermediate-mass-ratio binary science, for both ground-based and next-generation detectors (Einstein Telescope, LISA).
Although computational costs are formidable, the finding that NR errors contribute little to the measurement biases suggests that moderately accurate, long-duration NR simulations may be sufficient for building usable models in the near term, at least for moderate-SNR events. Determining the exact NR accuracy thresholds for next-generation model calibration remains an open problem and merits systematic study.
Simulation and model development in the high-mass-ratio regime is a pressing and currently open frontier for BBH waveform modeling. Substantially more efficient NR codes (e.g., using mesh refinement, multipatch, or advanced gauge choices) are needed to make routine high-q, high-precession simulations and subsequent surrogate-based model construction tractable.
Conclusion
Through an extensive suite of high-mass-ratio precessing NR simulations, the authors diagnose and elucidate the critical failure of all current waveform models outside their calibration regime—even for key system parameters like mass ratio and spin. These results clarify the need for substantial advances in both modeling methodology and NR simulation capabilities. The findings directly inform the priorities for BBH waveform development for upcoming observing runs and future detectors and provide high-precision data to the modeling community for improved calibration and validation. Only with such advances will gravitational-wave astronomy be able to extract unbiased science from the full population of BBH mergers anticipated in the next decade.