
Compound E-Variables in Multiple Testing

Updated 19 December 2025
  • Compound e-variables are multidimensional generalizations of e-variables that relax per-hypothesis constraints to control the aggregate expected error across multiple tests.
  • They leverage convex optimization and mixture likelihood ratios to connect with empirical Bayes and optimal discovery procedures for flexible FDR control.
  • Their properties—such as closure under convex combinations and Rao–Blackwellization—enhance performance in sequential analysis and multi-hypothesis testing.

Compound e-variables, alternatively termed compound e-values, are multidimensional generalizations of the e-variable methodology for hypothesis testing and evidence accumulation. Whereas a single e-variable provides a nonnegative data-dependent measure controlling the average type I error for a simple null, compound e-variables relax this requirement across multiple, possibly dependent, hypotheses by constraining the sum of expected e-values over the set of true nulls. This formal relaxation enables cross-hypothesis strength sharing, data-driven weighting, and explicit connection to multiple-testing error control procedures, notably false discovery rate (FDR) control, empirical Bayes, and the optimal discovery procedure.

1. Foundational Definitions and Distinctions

The prototypical e-variable for a null hypothesis family $\Theta_0$ is a nonnegative measurable function $E:\Omega\to[0,\infty]$ such that

$$\forall P\in\Theta_0:\quad \mathbb{E}_P[E] \le 1.$$

Extending to the "compound" setting, for $K$ (possibly overlapping) nulls $\Theta_0^1,\ldots,\Theta_0^K$, a compound e-variable is a $K$-tuple $(E_1,\ldots,E_K)$ of nonnegative random variables satisfying

$$\forall \theta\in\Theta_0 = \bigcup_k \Theta_0^k:\quad \sum_{k:\,\theta\in\Theta_0^k} \mathbb{E}_\theta[E_k] \le K.$$

This requirement imposes that, averaged over the true nulls, the expected e-value is at most one. Compound e-values can be classified as:

  • Bona fide (exact) compound e-variables: Sum of expectations over true nulls bounded by $K$ everywhere.
  • Approximate compound e-variables: The bound holds up to $(1+\varepsilon)K$ on high-probability events.
  • Asymptotic compound e-variables: Approximation error vanishes as $K\to\infty$ or sample size grows.

Simple (separable) e-values require $\mathbb{E}_Q[E_k]\le 1$ for each $k$ and each null $Q\in H_k$, whereas compound e-values replace this with control over the aggregate. This relaxation drives their strength in contexts like multiple testing, weighted inference, and empirical Bayes (Ignatiadis et al., 2024).
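The distinction can be made concrete with a small simulation (our sketch; the weights and Gaussian likelihood-ratio e-variables below are illustrative assumptions, not from the source). Setting $E_k = K w_k L_k$ with fixed weights $w_k$ summing to one makes some per-hypothesis expectations exceed 1 while the aggregate constraint still holds:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 4, 200_000
w = np.array([0.7, 0.1, 0.1, 0.1])   # data-independent weights summing to 1

# L_k = exp(X_k - 1/2) is an exact e-variable under the null N(0,1).
X = rng.normal(0.0, 1.0, (n, K))
L = np.exp(X - 0.5)

# Compound e-values E_k = K * w_k * L_k: individually E[E_1] = K*w_1 = 2.8 > 1,
# but the per-hypothesis expectations sum to K, so the compound bound holds.
E = K * w * L
per_hypothesis_means = E.mean(axis=0)
total = per_hypothesis_means.sum()
```

Any data-independent reweighting of exact e-values is compound-valid by linearity; the benefit comes from weights that concentrate evidence on hypotheses likely to be non-null.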

2. Compound e-Variables and Decision Theory

The underlying optimization for compound e-values is rooted in compound decision theory. Robbins's theorem establishes that the expected performance of separable estimators is equivalent to Bayes risk with the empirical prior, permitting the derivation of log-optimal compound e-values through convex optimization. For testing $K$ pairs of distributions $\{q_k^0, q_k^1\}$, the log-optimal separable compound e-value takes the form

$$E_k = \frac{\sum_{j=1}^K q_j^1(X_k)}{\sum_{j=1}^K q_j^0(X_k)}.$$

This mixture likelihood ratio can alternatively be viewed, in empirical Bayes language, as the ratio of mixture densities over the empirical alternative and the empirical null (Ignatiadis et al., 2024).
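As a concrete sketch (the Gaussian alternatives are chosen for illustration; not from the source), the following computes the mixture likelihood ratio for $q_k^0 = N(0,1)$ versus $q_k^1 = N(\mu_k, 1)$. Since every null here is $N(0,1)$, each $E_k$ has null expectation exactly 1:

```python
import numpy as np

def norm_pdf(x, mu):
    """Density of N(mu, 1)."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

mus = np.array([0.5, 1.0, 1.5])       # hypothetical alternative means
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (200_000, len(mus)))   # all nulls true: X_k ~ N(0,1)

# E_k = sum_j q_j^1(X_k) / sum_j q_j^0(X_k): the same mixture ratio at each X_k.
num = sum(norm_pdf(X, m) for m in mus)          # empirical-alternative mixture
den = len(mus) * norm_pdf(X, 0.0)               # all nulls are N(0,1) here
E = num / den

null_means = E.mean(axis=0)           # each approximately 1 under the null
```

Under the alternatives, the same statistic borrows strength across hypotheses: evidence at any $X_k$ is judged against the pooled alternative mixture rather than a single alternative.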

Compound e-variables are closed under convex combinations and randomization, enabling combination (ensemble) or derandomization of multiple testing methods. This property underpins flexible FDR procedures derived from varied sources (e.g., knockoffs, weighted p-values) and enables stable inference by averaging over random seeds or methodologies.

3. Universality and Procedures in FDR Control

Compound e-variables are intrinsically linked to FDR control through the universal e-BH procedure, an analogue of the Benjamini–Hochberg approach. For compound e-values $E_1, \ldots, E_K$ with order statistics $E_{[1]} \ge \cdots \ge E_{[K]}$, the e-BH rejection rule is

$$k^* = \max\left\{k:\ \frac{k\,E_{[k]}}{K} \ge \frac{1}{\alpha}\right\},$$

rejecting the hypotheses with the top $k^*$ e-values.
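A minimal implementation of this rejection rule, as a sketch rather than library code:

```python
import numpy as np

def e_bh(e_values, alpha):
    """e-BH: reject the hypotheses with the top k* e-values, where
    k* = max{k : k * E_[k] / K >= 1/alpha}."""
    K = len(e_values)
    order = np.argsort(e_values)[::-1]            # indices by descending e-value
    sorted_e = np.asarray(e_values)[order]
    thresholds = np.arange(1, K + 1) * sorted_e / K   # k * E_[k] / K
    passing = np.nonzero(thresholds >= 1 / alpha)[0]
    if len(passing) == 0:
        return np.array([], dtype=int)            # no rejections
    k_star = passing.max() + 1
    return np.sort(order[:k_star])

# Example with K = 5 e-values at level alpha = 0.1:
rejected = e_bh([50.0, 30.0, 4.0, 1.0, 0.2], alpha=0.1)
```

Here the rule rejects the hypotheses with e-values 50 and 30: $k\,E_{[k]}/K$ equals 10 and 12 at $k = 1, 2$, both at least $1/\alpha = 10$, while the remaining thresholds fall below.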

Remarkably, every FDR-controlling rule at level $\alpha$ can be instantiated as an e-BH procedure with appropriate compound e-values:

$$E_k = \frac{K}{\alpha}\,\frac{V_k}{R \vee 1},$$

where $V_k$ indicates rejection of hypothesis $k$ by the rule and $R = \sum_k V_k$. This demonstrates the universality of the compound e-value/e-BH formalism: admissible FDR procedures can be reconstructed via e-BH applied to their induced compound e-values (Ignatiadis et al., 2024).
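The reconstruction can be checked directly. Starting from an arbitrary (hypothetical) rejection pattern $V$, the induced compound e-values make e-BH at the same level recover exactly the original rejections:

```python
import numpy as np

# Hypothetical rejection pattern of some FDR-controlling rule at level alpha:
alpha = 0.2
V = np.array([1, 0, 0, 1, 0, 0, 0, 0])   # V_k = 1 if the rule rejects H_k
K, R = len(V), V.sum()

# Induced compound e-values: E_k = (K / alpha) * V_k / max(R, 1).
E = (K / alpha) * V / max(R, 1)

# Minimal e-BH at the same level: at k = R the threshold k*E_[k]/K equals
# exactly 1/alpha, so exactly the original R hypotheses are rejected.
order = np.argsort(E)[::-1]
thresholds = np.arange(1, K + 1) * E[order] / K
k_star = (np.nonzero(thresholds >= 1 / alpha)[0].max() + 1
          if np.any(thresholds >= 1 / alpha) else 0)
ebh_rejections = np.sort(order[:k_star])
```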

4. Rao–Blackwellization and Utility Optimization

Rao–Blackwellization of compound e-variables mirrors the classic Rao–Blackwell theorem for estimators. For a compound e-variable $(E_1,\ldots,E_K)$ and statistics $S_k$ sufficient for each null and alternative, the conditional expectations $G_k = \mathbb{E}[E_k \mid S_k]$ form a Rao–Blackwellized compound e-variable. This operation is utility-dominating for all concave utilities $U$: for every $\theta\in\Theta_0^k\cup\Theta_1$,

$$\mathbb{E}_\theta[U(G_k)] \ge \mathbb{E}_\theta[U(E_k)].$$

Thus, conditioning on sufficient statistics weakly improves (and often strictly improves) any risk-averse objective, including log-wealth growth in the Kelly betting context, and the construction extends to e-processes and asymptotic e-variables (Roos et al., 18 Dec 2025).

This property motivates an operational principle: after constructing e-variables (or compound e-variables), Rao–Blackwellize by conditioning on a minimal sufficient statistic; validity is maintained and every concave utility metric is weakly improved.
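A worked Gaussian example (ours, not from the source) makes the dominance visible. With $X_1, X_2 \sim N(\theta, 1)$ and the e-variable $E = \exp(X_1 - 1/2)$ for $H_0: \theta = 0$ versus $H_1: \theta = 1$, the sufficient statistic is $S = X_1 + X_2$; since $X_1 \mid S \sim N(S/2, 1/2)$ regardless of $\theta$, the conditional expectation has the closed form $G = \mathbb{E}[E \mid S] = \exp(S/2 - 1/4)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
theta = 1.0                          # simulate under the alternative N(1, 1)
X1 = rng.normal(theta, 1.0, n)
X2 = rng.normal(theta, 1.0, n)

# E uses only X1: likelihood ratio for H0: N(0,1) vs H1: N(1,1).
E = np.exp(X1 - 0.5)

# Rao-Blackwellization: condition on the sufficient statistic S = X1 + X2.
# X1 | S ~ N(S/2, 1/2), so G = E[E | S] = exp(S/2 - 1/4) in closed form.
S = X1 + X2
G = np.exp(S / 2 - 0.25)

# Log utility: E[log G] = 3/4 versus E[log E] = 1/2 under theta = 1.
gain = np.log(G).mean() - np.log(E).mean()   # approximately 0.25
```

Under $\theta = 1$ the expected log-wealth rises from $1/2$ to $3/4$, while validity is untouched: under the null, $\mathbb{E}[G] = 1$ as well.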

5. Applications: Multiple Testing, Empirical Bayes, and Sequential Analysis

Compound e-variables efficiently address multiple testing, particularly for FDR control in dependent or heterogeneous settings:

  • Weighted and Non-separable Testing: Compound e-values facilitate data-driven weights and non-separable constructions, borrowing strength across hypotheses (Ignatiadis et al., 2024).
  • Empirical Bayes and Optimal Discovery Procedure (ODP): The mixture-likelihood ratio form of log-optimal compound e-values aligns with the ODP of Storey et al., and empirical Bayes estimation plugs in mixture densities, supporting data-driven compound e-values with type I error guarantees (Ignatiadis et al., 2024).
  • Asymptotic Regimes: Plug-in or estimated statistics (e.g., variances in multiple $t$-tests with unequal variances) yield asymptotic compound e-values, preserving asymptotic FDR control. The ep-BH method combines these e-values with $p$-values, defining data-adaptive BH thresholds (Ignatiadis et al., 2024).
  • Sequential and Process Extensions: Compound e-variables extend directly to e-processes and sequential analysis via sufficient filtrations, upholding validity at all stopping times and again dominating in concave utility (Roos et al., 18 Dec 2025).
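A toy simulation (our construction, not from the cited papers) illustrates why e-process validity survives optional stopping: the running product of per-step e-values is a nonnegative martingale with mean 1 under the null, so its expectation at any bounded stopping time remains 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, T = 50_000, 20
X = rng.normal(0.0, 1.0, (n_paths, T))       # data under the null N(0,1)

# E-process: running product of per-step e-values exp(X_t - 1/2),
# a nonnegative martingale with mean 1 under the null.
increments = np.exp(X - 0.5)
E_process = np.cumprod(increments, axis=1)

# Data-dependent stopping: stop at the first t with E_t >= 2, else at t = T.
crossed = E_process >= 2
tau = np.where(crossed.any(axis=1), crossed.argmax(axis=1), T - 1)
E_at_tau = E_process[np.arange(n_paths), tau]

# Optional stopping: the mean at the stopping time stays near 1,
# so the e-process remains valid however the analyst decides to stop.
mean_at_tau = E_at_tau.mean()
```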

Table: Illustrations of Compound e-Variable Applications

| Area | Construction/Usage | Source |
| --- | --- | --- |
| Multiple $t$-tests | Plug-in estimated variances in $E_k = K S_k^2 / \sum_j \hat\sigma_j^2$ | (Ignatiadis et al., 2024) |
| ODP/Empirical Bayes | Mixture likelihood ratios, possibly data-driven | (Ignatiadis et al., 2024) |
| Sequential e-processes | Rao–Blackwellization on sufficient filtrations | (Roos et al., 18 Dec 2025) |

6. Connections to Growth-Optimality and Tail Probability Bounds

In the context of multivariate alternatives and exponential families, growth-optimal e-variables (those maximizing worst-case expected log-evidence) are achieved by likelihood ratios at the information projection onto the alternative set:

$$S_{\text{GROW}}(Y) = \frac{p_{\theta(\mu^*)}(Y)}{p_0(Y)}, \quad \mu^* = \arg\min_{\mu \in \mathcal{M}_1} D(P_\mu \| P_0).$$

This construction yields sharp Csiszár–Sanov–Chernoff bounds for the null probability of rare alternative-region events, both in convex and “surrounding” nonconvex settings using normalized maximum likelihood (NML) mixtures and regret minimization (Grünwald et al., 2024).
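A one-dimensional sketch (our example, not from the source): testing $N(0,1)$ against the convex alternative set $\{N(\mu,1): \mu \ge \delta\}$, the divergence $D(N(\mu,1)\,\|\,N(0,1)) = \mu^2/2$ is minimized over the set at $\mu^* = \delta$, and the GROW e-variable is the likelihood ratio at that information projection:

```python
import numpy as np

delta = 0.5
mu_grid = np.linspace(delta, 3.0, 2001)      # candidate alternative means
kl = mu_grid ** 2 / 2                        # D(N(mu,1) || N(0,1)) = mu^2 / 2
mu_star = mu_grid[kl.argmin()]               # information projection: mu* = delta

# GROW e-variable: likelihood ratio at mu*, S(Y) = exp(mu* Y - mu*^2 / 2).
rng = np.random.default_rng(2)
Y = rng.normal(0.0, 1.0, 200_000)            # samples under the null
S = np.exp(mu_star * Y - mu_star ** 2 / 2)   # null expectation equals 1
```

The grid search stands in for the convex program that, in higher dimensions, computes the projection over the mean-value set $\mathcal{M}_1$.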

These results illustrate the deep connection between growth-optimal e-variable construction, information projections in exponential families, and minimax regret frameworks—central to nonasymptotic valid inference and anytime-valid bounds.

7. Summary and Significance

Compound e-variables unify and generalize the e-variable methodology for modern multiple testing, sequential analysis, and evidence accumulation. Their core rationale is the relaxation of per-null constraints to aggregate constraints, facilitating strength sharing, flexible FDR procedures, empirical Bayes modeling, and log-optimal decision strategies. Universality, closure under averaging, and Rao–Blackwellization undergird their dominance in concave utility metrics, with direct operational consequences for ensembling, derandomization, and adaptive inference. Compound e-values thus operationalize a mathematically rigorous and practically flexible approach to controlling average type I error—encompassing and extending numerous classical and contemporary statistical techniques (Ignatiadis et al., 2024, Roos et al., 18 Dec 2025, Grünwald et al., 2024).
