- The paper introduces e-values and e-processes as robust measures that integrate advantages of both frequentist error control and Bayesian evidential support.
- It demonstrates that e-values provide nonasymptotic error control, composability, and validity under optional stopping and adaptive sampling.
- The comparative analysis shows that e-statistics outperform traditional metrics like p-values and Bayes factors, especially in sequential and nonparametric settings.
E-Values as Statistical Evidence: Integration, Properties, and Comparative Assessment
Introduction and Motivation
Statistical evidence, especially in hypothesis testing, has traditionally hinged on a handful of summary statistics—namely, p-values, likelihood ratios, and Bayes factors. These measures, however, have generated considerable debate regarding their efficacy for evidential assessment due to inconsistencies and theoretical limitations within varied philosophical frameworks. The paper "E-values as statistical evidence: A comparison to Bayes factors, likelihoods, and p-values" (2603.24421) offers a comprehensive comparative analysis and advocates for e-values (and their sequential analogue, e-processes) as robust alternatives, equipped with desirable properties that address longstanding concerns in both frequentist and Bayesian traditions.
Core Constructs: E-Values and E-Processes
E-values generalize key properties of likelihood ratios and Bayes factors, but fundamentally shift the interpretation toward a game-theoretic, betting perspective. An e-variable for a hypothesis H is a nonnegative random variable E satisfying E_P[E] ≤ 1 for all P ∈ H, and the realized value is termed the e-value. E-processes extend this concept sequentially, accommodating adaptive data collection policies, optional stopping, and optional continuation—areas where traditional evidential measures are notoriously fragile.
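The defining constraint E_P[E] ≤ 1 is easy to check numerically. Below is a minimal sketch, using an assumed toy setup (not the paper's example): a simple null N(0, 1) against a simple alternative N(1, 1), for which the likelihood ratio is a valid e-variable.

```python
import math
import random

random.seed(0)

# Assumed toy setup: simple null P0 = N(0, 1), alternative P1 = N(1, 1).
# The likelihood ratio L1(x) / L0(x) = exp(x - 1/2) is a valid e-variable:
# its expectation under P0 is exactly 1.
def e_variable(x):
    return math.exp(x - 0.5)

n = 200_000
mean_under_null = sum(e_variable(random.gauss(0, 1)) for _ in range(n)) / n
print(mean_under_null)  # close to 1, consistent with E_P0[E] <= 1
```

Under the alternative the same statistic has expectation exp(1/2) > 1, which is why a large realized e-value counts as evidence against the null.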
This framework seamlessly subsumes likelihood ratios for simple hypotheses: the likelihood ratio is both an e-value and the log-optimal (GRO/numeraire) choice under simple-vs-simple testing. E-processes are commonly constructed as running products of sequential e-values, forming nonnegative supermartingales under the null; by Ville's inequality they retain anytime-valid error guarantees under data-dependent stopping rules, contrasting sharply with p-values, which lose validity under unplanned analyses.
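The product construction can be sketched as follows, again in an assumed Gaussian toy model (null N(0, 1), alternative N(0.5, 1), not the paper's example). The running product of per-observation likelihood ratios has expectation 1 under the null at every time, so by Ville's inequality the probability that it ever crosses 1/α is at most α.

```python
import math
import random

random.seed(1)

# Assumed setup: null P0 = N(0, 1), alternative P1 = N(0.5, 1).  The running
# product of per-observation likelihood ratios is a test supermartingale, so
# by Ville's inequality P0(sup_t M_t >= 1/alpha) <= alpha.
def e_process(xs, mu1=0.5):
    m, path = 1.0, []
    for x in xs:
        m *= math.exp(mu1 * x - mu1 ** 2 / 2)  # N(mu1, 1) vs N(0, 1) ratio
        path.append(m)
    return path

alpha = 0.05
trials = 2000
crossings = 0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(100)]  # data drawn from the null
    if max(e_process(xs)) >= 1 / alpha:
        crossings += 1
print(crossings / trials)  # empirical crossing rate, bounded by ~alpha
```

The bound holds uniformly over time, which is exactly what licenses stopping the experiment whenever the e-process looks decisive.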
Synopsis of Evidential Desiderata
The authors systematically enumerate 21 evidential desiderata emanating from likelihoodist, Bayesian, and frequentist/statistical error traditions. These are partitioned into static (single-batch, non-sequential) and dynamic (sequential, meta-analytic, or counterfactual) criteria. Salient desiderata include sample size invariance, scale/reparametrization invariance, single-hypothesis validity, long-run frequentist error control, composability, and strict/loose likelihood principles. Many of these desiderata are violated by p-values, GLRs, and standard Bayes factors in complex practical regimes such as sequential trials, optional stopping, and composite or nonparametric hypothesis settings.
Comparative Analysis
Static Properties:
E-statistics satisfy cardinal criteria—scalarity, continuity, and (for well-constructed instances) consistency. They enable direct evidence statements against single hypotheses, handling composite/null models without necessitating explicit alternatives. Through Ville’s and Markov’s inequalities, e-values encode nonasymptotic frequentist error rates, supporting post hoc adaptivity—an area where p-values and most Bayes factors provide limited or no guarantees. Unlike p-values, e-statistics naturally support sample size invariance under their betting/game-theoretic interpretation.
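The nonasymptotic error control via Markov's inequality can be illustrated directly: rejecting the null when E ≥ 1/α has type-I error at most α, with no asymptotic approximation. A minimal check in the same assumed Gaussian toy model as above:

```python
import math
import random

random.seed(2)

# By Markov's inequality, P0(E >= 1/alpha) <= alpha * E_P0[E] <= alpha, so
# rejecting when E >= 1/alpha is a level-alpha test.  Assumed toy e-variable
# E(x) = exp(x - 1/2) under the null N(0, 1) (not the paper's example).
alpha = 0.05
trials = 100_000
rejections = sum(
    math.exp(random.gauss(0, 1) - 0.5) >= 1 / alpha for _ in range(trials)
)
print(rejections / trials)  # empirical type-I error, at most alpha = 0.05
```

The guarantee is conservative here (the true rejection rate is far below α), which reflects the general trade-off between the universality of the Markov bound and its tightness.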
Dynamic/Sequential Properties:
E-processes uniquely address dynamic evidential desiderata. They facilitate valid optional stopping and flexible accumulation (composable across independent/batched or adaptively designed studies), and are robust to inter- and intra-experiment counterfactual choices about sampling plans and stopping rules. In scenarios where Bayes factors or likelihood ratios may nominally appear robust (e.g., subjective Bayes factors with fixed priors), e-values offer the operational nonasymptotic guarantees that are essential in open-ended data collection environments.
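The optional-stopping robustness is concrete enough to simulate. The sketch below, under an assumed Gaussian setup (not from the paper), peeks after every observation: repeatedly applying a fixed-level z-test inflates type-I error far above 5%, while thresholding the e-process at 1/α remains controlled by Ville's inequality.

```python
import math
import random

random.seed(3)

# Optional-stopping sketch (assumed Gaussian toy model): peek after every
# observation and "reject" the first time the test fires.  Repeated z-tests
# at level 0.05 inflate type-I error; stopping the e-process at 1/alpha
# stays valid under continuous monitoring.
alpha, n_max, trials = 0.05, 200, 1000
z_crit = 1.96
e_reject = z_reject = 0
for _ in range(trials):
    s, m = 0.0, 1.0
    e_hit = z_hit = False
    for t in range(1, n_max + 1):
        x = random.gauss(0, 1)          # data drawn from the null
        s += x
        m *= math.exp(0.3 * x - 0.045)  # LR for N(0.3, 1) vs N(0, 1)
        if m >= 1 / alpha:
            e_hit = True
        if abs(s) / math.sqrt(t) >= z_crit:
            z_hit = True
    e_reject += e_hit
    z_reject += z_hit
print(e_reject / trials, z_reject / trials)  # e-process stays near/below alpha
```

The z-test's inflated rate under continuous peeking is the textbook failure mode that e-processes are designed to eliminate.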
Composability and Invariance:
An essential strength of e-statistics is that they combine validly under arbitrary dependence structures: any convex combination of dependent e-statistics is itself a valid e-statistic, a property that does not generally hold for likelihood ratios or Bayes factors outside strong independence assumptions.
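The averaging property follows from linearity of expectation and needs no independence. A minimal sketch, assuming two toy e-variables computed from the same data point (hence fully dependent):

```python
import math
import random

random.seed(4)

# Two e-variables computed from the *same* observation, hence dependent.
# By linearity of expectation, any convex combination still has null
# expectation <= 1.  Assumed Gaussian toy model (not the paper's example).
def e1(x):  # LR for N(1, 1) vs N(0, 1)
    return math.exp(x - 0.5)

def e2(x):  # LR for N(-1, 1) vs N(0, 1), evaluated on the same data
    return math.exp(-x - 0.5)

n = 200_000
avg = sum(0.5 * e1(x) + 0.5 * e2(x)
          for x in (random.gauss(0, 1) for _ in range(n))) / n
print(avg)  # close to 1: the average is still a valid e-value
```

The analogous statement for p-values (averaging dependent p-values) fails without correction factors, which is one operational reason the paper favors combining evidence on the e-scale.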
Relationship to Other Measures:
E-values encompass likelihood ratios (for simple hypotheses, the likelihood ratio is the optimal e-value) and, in certain settings, optimal e-values coincide with Bayes factors under a specific (sometimes unusual) prior. Interconversion rules exist (e.g., converting an e-value to a p-value via Markov's inequality, or a p-value to an e-value via a calibrator), but, as shown, direct construction of e-statistics—rather than conversion from p-values or Bayes factors—yields higher frequentist power and better operational properties, especially in sequential and nonparametric regimes.
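One standard family of p-to-e calibrators (an assumption here; the literature contains several) is f(p) = κ · p^(κ−1) for κ in (0, 1); since f integrates to 1 over [0, 1], f(p) is a valid e-value whenever p is a valid p-value. A quick numerical check:

```python
import random

random.seed(5)

# Assumed p-to-e calibrator family: f(p) = kappa * p**(kappa - 1) for
# kappa in (0, 1).  Its integral over [0, 1] is 1, so applying it to a
# uniformly distributed (i.e., exactly valid) p-value yields an e-value.
def calibrate(p, kappa=0.5):
    return kappa * p ** (kappa - 1)

n = 200_000
mean_e = sum(calibrate(random.random()) for _ in range(n)) / n
print(mean_e)  # approximately 1 under a uniform p-value
```

The conversion is lossy in both directions (e.g., e-to-p via p = min(1, 1/e) discards information), which is the quantitative content of the claim that directly constructed e-statistics dominate converted ones.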
Caveats and Limitations:
Non-uniqueness is intrinsic: many e-statistics can be constructed for a given hypothesis, and choice of construction impacts power/optimality (mirroring the subjectivity in choice of test statistic, prior, or loss function in other paradigms). Full coherence and strict adherence to the likelihood principle may not be universal for all e-statistics, though log-optimal constructions achieve these desiderata in well-behaved cases.
Implications and Theoretical Advances
This work recontextualizes the longstanding debates on statistical evidence, showing that e-values and e-processes form a bridge between frequentist and Bayesian paradigms, merging the former's concern for error control and the latter's focus on evidential support, while jettisoning reliance on rigid design and sampling constraints. The betting-game interpretation imparts operational clarity, direct error bounds, and sequential adaptability with far-reaching implications:
- Sequential Analyses and Online Testing: E-processes support valid inference in always-valid and open-ended experimentation, such as continuous A/B testing, adaptive trials, and meta-analyses.
- Composite and Nonparametric Models: The machinery handles composite/null or nonparametric nulls without the inefficiency or ambiguity that impairs GLRs and Bayes factors.
- Robust Meta-Analysis: Accumulation rules for e-values facilitate transparent and unified evidence synthesis across independent and adaptively designed studies.
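For independent studies, the simplest accumulation rule is multiplication: the product of independent e-values is again an e-value, so evidence compounds across studies while the global null expectation stays at most 1. A sketch under an assumed Gaussian per-study model (not the paper's example):

```python
import math
import random

random.seed(6)

# Meta-analysis sketch (assumed Gaussian setup): each independent study
# reports a likelihood-ratio e-value; by independence the product of the
# study e-values is again an e-value under the global null.
def study_e_value(data, mu1=0.3):
    # LR e-value for N(mu1, 1) vs N(0, 1) on one study's data
    return math.exp(sum(mu1 * x - mu1 ** 2 / 2 for x in data))

trials, n_studies, n_obs = 50_000, 3, 5
mean_product = sum(
    math.prod(study_e_value([random.gauss(0, 1) for _ in range(n_obs)])
              for _ in range(n_studies))
    for _ in range(trials)
) / trials
print(mean_product)  # close to 1 under the global null
```

Multiplication requires independence; for dependent studies the averaging rule from the composability discussion applies instead, which is what makes the e-scale a flexible currency for evidence synthesis.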
Future Directions
The paper identifies several open issues: extending e-process theory to broader event-based filtrations for full coherence with the likelihood principle, incorporating more nuanced optimality criteria for composite alternatives, and further strengthening the link between e-statistics and information-theoretic/model selection approaches (e.g., MDL). Additionally, unifying randomized and adaptive test frameworks via e-values could refine operational decision-making in sequential testing regimes.
Conclusion
E-values and e-processes represent a rigorously justified, adaptable, and frequentist-rooted framework for measuring statistical evidence, preserving major advantages (error rate control, composability, handling of complex hypotheses, robust optional stopping) across both classical and modern statistical paradigms. Their integration of likelihood, Bayesian, and error-probability desiderata recommends their adoption for future statistical theory and practice, especially in sequential, adaptive, and high-dimensional settings. The analysis positions e-statistics as the premier candidate for unifying diverse inferential philosophies while delivering operational guarantees necessary for modern evidence-based decision making.
Reference:
"E-values as statistical evidence: A comparison to Bayes factors, likelihoods, and p-values" (2603.24421)