Generalized tension metrics for multiple cosmological datasets

Published 5 Dec 2025 in astro-ph.CO, astro-ph.IM, hep-ex, hep-ph, and physics.data-an | (2512.06086v1)

Abstract: We introduce a novel estimator to quantify statistical tensions among multiple cosmological datasets simultaneously. This estimator generalizes the Difference-in-Means statistic, $Q_{\rm DM}$, to the multi-dataset regime. Our framework enables the detection of dominant tension directions in the shared parameter space. It further provides a geometric interpretation of the tension for the two- and three-dataset cases in two dimensions. According to this approach, the previously reported increase in tension between DESI and Planck from $1.9σ$ (DR1) to $2.3σ$ (DR2) is reinterpreted as a more modest shift from $1.18σ^{\rm eff}$ (DR1) to $1.45σ^{\rm eff}$ (DR2). These new tools may also prove valuable across research fields where dataset discrepancies arise.


Summary

The paper introduces a novel global tension metric that generalizes traditional methods for comparing multiple high-dimensional cosmological datasets.
It employs tension vectors and a symmetric dispersion tensor to quantify dataset anisotropies and define effective significance measures.
The proposed estimator reveals that conventional one-dimensional approaches often overstate tensions, providing a scalable tool for future cosmological analyses.


Introduction and Motivation

Persistent discrepancies among cosmological datasets, such as the disagreement between CMB-inferred and local measurements of the Hubble constant, call for rigorous, multi-dimensional statistical tools to assess the mutual consistency of parameter inferences. Traditional estimators based on one-dimensional marginalized posteriors (e.g., the rule-of-thumb $N_\sigma$ distance between means) are poorly suited to tensions that involve several parameters at once and can significantly misrepresent the true level of disagreement, particularly as the number and quality of datasets increase. Existing global tension metrics do not adequately handle the simultaneous analysis of more than two datasets; the Difference-in-Means statistic $Q_{\rm DM}$, for instance, compares two datasets at a time. The work presented in "Generalized tension metrics for multiple cosmological datasets" (2512.06086) introduces an estimator for quantifying statistical tension among multiple, potentially highly correlated, high-dimensional posterior distributions; it generalizes $Q_{\rm DM}$ and provides a geometric interpretation of multi-dataset tension.

Methodological Framework

Tension Vectors and Dispersion Tensor

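Let $N$ datasets provide posterior distributions in a shared $D$-dimensional parameter space. For every ordered pair $(i, j)$ with $i \neq j$, labeled by the index $k$, define the tension vector

$\vec{r}_k = \left(\hat{C}_i + \hat{C}_j\right)^{-1/2} \left(\vec{\bar{\theta}}_i - \vec{\bar{\theta}}_j\right)$

where $\vec{\bar{\theta}}_i$ and $\hat{C}_i$ are the mean and covariance of the $i$th dataset’s posterior, and $(\cdot)^{-1/2}$ denotes the inverse matrix square root. With this whitening, $|\vec{r}_k|^2 = (\vec{\bar{\theta}}_i - \vec{\bar{\theta}}_j)^{T}(\hat{C}_i + \hat{C}_j)^{-1}(\vec{\bar{\theta}}_i - \vec{\bar{\theta}}_j)$ is the pairwise $Q_{\rm DM}$. The vectors $\{\vec{r}_k\}$ populate a parameter-difference space, and their dispersion encodes the mutual inconsistencies between datasets.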
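A minimal NumPy sketch of the tension-vector construction follows. It assumes independent datasets (so each mean difference is whitened by $\hat{C}_i + \hat{C}_j$) and uses the symmetric inverse square root; the summary does not say which matrix square root the paper adopts, and while $|\vec{r}_k|^2$ is the same for any choice, the individual components are not. The helper names `inv_sqrtm` and `tension_vectors` and the toy means and covariances are hypothetical.

```python
import itertools

import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^{-1/2} of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def tension_vectors(means, covs):
    """Whitened difference vectors r_k, one per ordered pair (i, j) with i != j.

    means: (N, D) posterior means; covs: (N, D, D) posterior covariances.
    Assumes independent datasets, so each difference is whitened by C_i + C_j.
    Returns an array of shape (N * (N - 1), D).
    """
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    r = []
    for i, j in itertools.permutations(range(len(means)), 2):
        W = inv_sqrtm(covs[i] + covs[j])  # (C_i + C_j)^{-1/2}
        r.append(W @ (means[i] - means[j]))
    return np.array(r)


# Toy example: three synthetic 2D posteriors (illustrative numbers only).
means = [[0.0, 0.0], [1.0, 0.5], [-0.5, 1.0]]
covs = [np.diag([0.5, 0.5]), np.diag([0.3, 0.8]), [[0.6, 0.1], [0.1, 0.4]]]
r = tension_vectors(means, covs)
print(r.shape)  # (6, 2): N(N-1) = 6 ordered pairs in D = 2
```

Swapping $i$ and $j$ only flips the sign of $\vec{r}_k$, and each squared norm reproduces the pairwise $Q_{\rm DM}$ of the corresponding two datasets.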

The proposed global tension estimator is

$\mathcal{Q} = \frac{1}{N_p} \sum_k |\vec{r}_k|^2$

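where $N_p = N(N-1)$ is the number of tension vectors (one per ordered pair, so each $\vec{r}_k$ appears together with its opposite). The symmetric dispersion tensor

$\mathcal{C}_{ab} = \frac{1}{N_p} \sum_k r^a_k r^b_k$

quantifies the geometric distribution of tensions in parameter space: its eigenvalues $\{\lambda_\alpha\}$ measure the tension along the corresponding eigendirections, and its trace recovers the global estimator, $\mathrm{tr}\,\mathcal{C} = \sum_\alpha \lambda_\alpha = \mathcal{Q}$.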
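Given the tension vectors stacked as rows of an array, $\mathcal{Q}$ and $\mathcal{C}$ take one line each. The sketch below (the function name `global_tension` is hypothetical) also returns the eigen-decomposition of $\mathcal{C}$; since $\mathrm{tr}\,\mathcal{C} = \mathcal{Q}$ follows directly from the two definitions, the eigenvalues split the global tension among orthogonal directions. The six illustrative vectors correspond to three posteriors with $\hat{C}_i + \hat{C}_j = I$ and means $(0,0)$, $(-2,0)$, $(-1,-1)$.

```python
import numpy as np


def global_tension(r):
    """Global estimator Q, dispersion tensor C, and the eigen-decomposition of C.

    r: (N_p, D) array with one tension vector r_k per row.
    """
    r = np.asarray(r, dtype=float)
    n_p = r.shape[0]
    Q = float(np.sum(r**2)) / n_p   # Q = (1/N_p) sum_k |r_k|^2
    C = r.T @ r / n_p               # C_ab = (1/N_p) sum_k r_k^a r_k^b
    lam, vecs = np.linalg.eigh(C)   # ascending real eigenvalues, orthonormal eigendirections
    return Q, C, lam, vecs


# r_12 = (2, 0), r_13 = (1, 1), r_23 = (-1, 1), plus their opposites (ordered pairs).
r = np.array([[2.0, 0.0], [1.0, 1.0], [-1.0, 1.0],
              [-2.0, 0.0], [-1.0, -1.0], [1.0, -1.0]])
Q, C, lam, vecs = global_tension(r)
# Here Q = 8/3 and C = diag(2, 2/3), so tr C = Q.
```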

Figure 1: Posterior distributions for three synthetic datasets (upper panels), and the associated tension vectors in parameter-difference space (lower panels), with eigenvalues of $\mathcal{C}$ reflecting the strength and geometry of the multi-dataset tension.

Hypothesis Testing and Null Model

Under the null hypothesis $H_0$ (“datasets are statistically consistent”), the tension vectors should be centered at zero in the appropriately whitened parameter space. Analytically, for three datasets with identical covariances, $\mathcal{Q}|_{H_0} \sim \Gamma(D,1)$. In more general cases the null distribution is constructed from the joint distribution of the tension vectors, taking their correlations into account: vectors that share a dataset (e.g., those built from the pairs $(1,2)$ and $(1,3)$) are not independent.
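For identical covariances the null distribution can be written down explicitly. Assuming independent datasets whose means scatter as $\vec{\bar{\theta}}_i \sim \mathcal{N}(\vec{\mu}, \hat{C})$, whitening gives $|\vec{r}_k|^2 = |\vec{z}_i - \vec{z}_j|^2/2$ with $\vec{z}_i \sim \mathcal{N}(0, I_D)$, and the identity $\sum_{i<j}|\vec{z}_i - \vec{z}_j|^2 = N\sum_i|\vec{z}_i - \bar{\vec{z}}|^2$ yields $\mathcal{Q}|_{H_0} \sim \chi^2_{D(N-1)}/(N-1) = \Gamma\!\left(\tfrac{D(N-1)}{2}, \tfrac{2}{N-1}\right)$ (shape, scale). This is $\Gamma(D,1)$ for $N=3$ and the familiar $\chi^2_D$ of $Q_{\rm DM}$ for $N=2$. For unequal covariances, a Monte Carlo sketch under the same independence assumption is given below; `null_q_samples` and `pte` are hypothetical names, and the paper's own construction may differ.

```python
import itertools

import numpy as np


def null_q_samples(covs, n_sims=20000, seed=0):
    """Monte Carlo draws of Q under H0 for fixed posterior covariances.

    Modelling assumption: under H0 each dataset's mean scatters independently
    around a common true value, theta_i ~ N(mu, C_i); Q does not depend on mu,
    so mu = 0 is used.
    """
    rng = np.random.default_rng(seed)
    covs = np.asarray(covs, dtype=float)
    n, d, _ = covs.shape
    chol = np.linalg.cholesky(covs)                    # stacked factors, C_i = L_i L_i^T
    z = rng.standard_normal((n_sims, n, d))
    theta = np.einsum("iab,sib->sia", chol, z)         # theta[s, i] ~ N(0, C_i)
    pairs = list(itertools.permutations(range(n), 2))  # the N(N-1) ordered pairs
    q = np.zeros(n_sims)
    for i, j in pairs:
        s_inv = np.linalg.inv(covs[i] + covs[j])
        diff = theta[:, i] - theta[:, j]               # shape (n_sims, D)
        q += np.einsum("sa,ab,sb->s", diff, s_inv, diff)  # |r_k|^2 for every draw
    return q / len(pairs)


def pte(q_obs, samples):
    """Fraction of null draws with Q at least as large as the observed value."""
    return float(np.mean(np.asarray(samples) >= q_obs))


# Three datasets with identical (identity) covariances in D = 2.
samples = null_q_samples(np.array([np.eye(2)] * 3))
```

With three identity covariances in $D=2$, the sample mean and variance of the draws approach 2 and 2, as expected for $\Gamma(2,1)$; an observed $\mathcal{Q}$ can then be turned into a PTE with `pte(q_obs, samples)`.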

Geometric Interpretation and Effective Significance

For three 2D posteriors with identical covariances and means at the vertices of an equilateral triangle of side $L$, the observed tension can be mapped to a geometric separation $L$ in parameter space. The authors introduce an effective significance $N_\sigma^{\rm eff}$, defined such that at $L(N_\sigma^{\rm eff}=1)=2.143$ the $68\%$ confidence regions are tangent; higher values of $N_\sigma^{\rm eff}$ are defined in the same way from the $95\%$, $99.7\%$, ... regions.
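The quoted value $2.143$ can be reproduced under one reading of the canonical setup, which we state as an assumption: each posterior is $\mathcal{N}(\vec{\bar{\theta}}_i, \sigma^2 I)$ in 2D with $\sigma^2 = 1/2$, so that $\hat{C}_i + \hat{C}_j = I$ and each tension vector is just the mean difference. The $n$-sigma region then has radius $\sigma\sqrt{q}$, with $q = -2\ln[\mathrm{erfc}(n/\sqrt{2})]$ the $\chi^2_2$ quantile, and two such regions are tangent at $L = 2\sigma\sqrt{q}$. The function `side_length` and its `sigma2` parameter are hypothetical.

```python
import math


def side_length(n_sigma, sigma2=0.5):
    """Separation L at which the 2D n_sigma regions of two posteriors are tangent.

    Hypothetical reconstruction: each posterior is N(mean, sigma2 * I) in 2D,
    with sigma2 = 1/2 assumed so that C_i + C_j = I.
    """
    tail = math.erfc(n_sigma / math.sqrt(2.0))  # two-sided Gaussian tail, e.g. 0.3173 for 1 sigma
    q = -2.0 * math.log(tail)                   # chi^2_2 quantile: P(chi^2_2 <= q) = 1 - tail
    radius = math.sqrt(sigma2 * q)              # radius of the region {x : |x|^2 / sigma2 <= q}
    return 2.0 * radius                         # two equal circles touch when L = 2 * radius


L1 = side_length(1)  # rounds to 2.143
```

With $\sigma^2 = 1/2$, the $n = 1$ case evaluates to $2.1428\ldots$, matching the quoted $L(N_\sigma^{\rm eff}=1)=2.143$; if the paper normalizes the covariances differently, $L$ simply scales with $\sigma$.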

Figure 2: Various configurations of three posteriors mapped onto a canonical setup of three identical covariances, with means at equilateral triangle vertices.

Figure 3: Dependence of PTE and $N_\sigma$ on the geometric side length $L$ for the reference configuration, clearly delineating effective $N_\sigma^{\rm eff}$ scales in multidimensional space.
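Under the same assumption ($\hat{C}_i + \hat{C}_j = I$), every tension vector of the equilateral configuration has $|\vec{r}_k|^2 = L^2$, so $\mathcal{Q} = L^2$, and the $\Gamma(D,1)$ null for three identical covariances with $D=2$ gives the closed form $\mathrm{PTE}(L) = (1 + L^2)\,e^{-L^2}$. The sketch below evaluates this and converts it to a two-sided Gaussian-equivalent $N_\sigma$; the function names are hypothetical and the curve is a reconstruction, not the paper's figure data.

```python
import math
from statistics import NormalDist


def pte_equilateral(L):
    """PTE for three 2D posteriors at the vertices of an equilateral triangle of side L.

    Assumes C_i + C_j = I, so every |r_k|^2 = L^2 and Q = L^2, and uses the
    Gamma(D, 1) null for three identical covariances with D = 2.
    """
    q = L**2
    return (1.0 + q) * math.exp(-q)  # survival function of Gamma(shape=2, scale=1)


def n_sigma_from_pte(p):
    """Two-sided Gaussian-equivalent significance of a PTE (requires 0 < p <= 1)."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)


for L in (1.0, 2.143, 3.0):
    p = pte_equilateral(L)
    print(f"L = {L:5.3f}  PTE = {p:.4f}  N_sigma = {n_sigma_from_pte(p):.2f}")
```

Evaluated at $L = 2.143$ (where $N_\sigma^{\rm eff} = 1$), this gives a PTE of about $0.057$, i.e. a PTE-based $N_\sigma$ of roughly $1.9$, which illustrates how far apart the two scales can sit under these assumptions.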

For more than three datasets, the construction generalizes to means at the $N$ vertices of a regular polygon, with $N_\sigma^{\rm eff}$ set by the minimum pairwise separation, i.e. the side length.
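A small sketch of the polygon construction for $N$ datasets in 2D, again assuming $\hat{C}_i + \hat{C}_j = I$ so that $|\vec{r}_k|$ equals the mean separation. For $N \ge 4$ the pairwise separations are no longer all equal (diagonals exceed the side), which is why the minimum separation sets the scale. Since $\sum_{i<j}|\vec{\bar{\theta}}_i - \vec{\bar{\theta}}_j|^2 = N\sum_i|\vec{\bar{\theta}}_i - \bar{\vec{\theta}}|^2$, a centered polygon of circumradius $R = L/[2\sin(\pi/N)]$ gives $\mathcal{Q} = 2NR^2/(N-1)$, which reduces to $\mathcal{Q} = L^2$ for the triangle. The function names are hypothetical.

```python
import itertools
import math

import numpy as np


def polygon_means(n, L):
    """Means of n datasets at the vertices of a regular n-gon of side L, centered on the origin."""
    R = L / (2.0 * math.sin(math.pi / n))  # circumradius that gives side length L
    angles = 2.0 * math.pi * np.arange(n) / n
    return np.column_stack([R * np.cos(angles), R * np.sin(angles)])


def q_unit_whitening(means):
    """Q assuming every C_i + C_j = I, so |r_k|^2 is the squared mean separation."""
    means = np.asarray(means, dtype=float)
    pairs = list(itertools.permutations(range(len(means)), 2))
    return sum(float(np.sum((means[i] - means[j]) ** 2)) for i, j in pairs) / len(pairs)


m = polygon_means(5, 1.0)
seps = [np.linalg.norm(m[i] - m[j]) for i, j in itertools.combinations(range(5), 2)]
print(min(seps), max(seps))  # about 1.0 (side) and 1.618 (diagonal)
```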

Application to Synthetic and Real Cosmological Datasets

The authors apply the estimator to both synthetic and real cosmological posterior distributions (Pantheon+SH0ES, Planck 2018 CMB, DESI DR2 BAO, cosmic chronometers).

Figure 4: Posterior distributions from real cosmological datasets, approximated as Gaussians in the shared parameter space.

A salient result is a systematic reduction in the tension significance when using the geometric $N_\sigma^{\rm eff}$ in higher-dimensional spaces, relative to conventional $N_\sigma$ values based on 1D analogies. For instance:

The naïve tension between Planck and PPS is $5.68\sigma$, while the geometric method yields only $3.86\sigma^{\rm eff}$.
The DESI-Planck tension increase is modest: $1.18\sigma^{\rm eff}$ (DR1) $\to$ $1.45\sigma^{\rm eff}$ (DR2), compared with $1.9\sigma \to 2.3\sigma$ in the standard reporting.

This demonstrates that conventional methods overstate the significance of multidimensional dataset inconsistencies. The method also quantifies tension anisotropy via the eccentricity of $\mathcal{C}$ :

$\mathrm{Ecc} = \sqrt{1 - \frac{\lambda_{\min}}{\lambda_{\max}}}$

with $\mathrm{Ecc}\simeq 0$ indicating isotropic tension contributions and $\mathrm{Ecc}\simeq 1$ indicating dominance by a single direction, information that a scalar metric such as $\mathcal{Q}$ cannot convey.
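The eccentricity and the dominant tension direction both come from the eigen-decomposition of $\mathcal{C}$. A minimal sketch (the function name is hypothetical) follows, with a guard for the round-off that can make $\lambda_{\min}$ slightly negative when $\mathcal{C}$ is rank-deficient; $\mathrm{Ecc}$ is undefined when $\mathcal{C}$ vanishes.

```python
import numpy as np


def tension_anisotropy(C):
    """Eccentricity Ecc = sqrt(1 - lambda_min / lambda_max) and dominant direction of C."""
    lam, vecs = np.linalg.eigh(np.asarray(C, dtype=float))  # ascending eigenvalues
    lam_max = float(lam[-1])
    if lam_max <= 0.0:
        raise ValueError("Ecc is undefined for a vanishing dispersion tensor")
    lam_min = max(float(lam[0]), 0.0)  # clip tiny negative round-off
    ecc = float(np.sqrt(1.0 - lam_min / lam_max))
    return ecc, vecs[:, -1]            # eigendirection of the largest eigenvalue


ecc_iso, _ = tension_anisotropy(np.eye(2))               # ~0: isotropic
ecc_one, _ = tension_anisotropy(np.diag([4.0, 0.0]))     # ~1: a single dominant direction
ecc, v = tension_anisotropy(np.diag([2.0, 2.0 / 3.0]))
```

For example, $\mathcal{C} = \mathrm{diag}(2, 2/3)$ gives $\mathrm{Ecc} = \sqrt{2/3} \approx 0.82$, with the dominant tension along the first parameter axis.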

Figure 5: Tension vectors for real datasets, annotated with eigenvalues and eccentricity, elucidating anisotropic contributions to total disagreement.

Theoretical and Practical Implications

The methodology provides a tension estimator that is agnostic to the number of datasets and to the dimensionality of the parameter space. It shows that conventional 1D analogies can overestimate tension in high-dimensional settings, and its geometric interpretation maps significance levels directly onto multidimensional overlaps of confidence regions.

Practically, the approach enables robust, scalable tension assessments as the number of cosmological probes grows (e.g., DESI, Euclid, LSST). Independent, multi-probe inference efforts can use these metrics to flag model inadequacy, systematic errors, or new physics that manifests only in the joint parameter space. The framework can also be carried over to other domains (e.g., particle physics, astrophysics) where inconsistencies among multiple experiments or surveys must be quantified jointly, such as the discrepant W boson mass measurements [CDF Collaboration, 2022Sci...376..170C].

Conclusion

This work introduces a statistically rigorous, geometrically interpretable global tension estimator for the simultaneous comparison of multiple high-dimensional posteriors. By exposing the limitations of standard 1D tools and providing effective multidimensional significance scales, it lays a foundation for the quantitative consistency checks that modern cosmology requires. Its generality suggests it could be adopted in other scientific domains where multi-experiment inference is central. Further work may extend the formalism to non-Gaussian posteriors and explore its implications for experimental design and model selection in data-intensive regimes.