Anytime-Validity & Type I Error Control
- Anytime-validity is a framework that ensures statistical tests maintain Type I error control under arbitrary, adaptive stopping rules using test martingales and e-processes.
- Key methods include the construction of e-processes and adaptive spending mechanisms that provide uniform error bounds and confidence sequences across sequential observations.
- Applications span online multiple testing, A/B testing, regression, and survival analysis, offering robust inference with sustained error control in dynamic data environments.
Anytime-validity and Type I error control are central concepts in sequential and online statistical inference, where inferential validity must persist regardless of when analysis stops or resumes. Contemporary frameworks employ e-values, test martingales, and adaptive spending mechanisms to guarantee global error bounds—such as familywise error rate (FWER) or marginal Type I error—under arbitrary stopping rules and dependence structures. This article presents the mathematical foundations, construction principles, representative methodologies, and practical regimes of anytime-valid inference and robust Type I error control in modern research.
1. Mathematical Foundations of Anytime Validity
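Let $X_1, X_2, \dots$ denote a possibly infinite data sequence with associated filtration $(\mathcal{F}_t)_{t \ge 0}$ (for example, the natural filtration $\mathcal{F}_t = \sigma(X_1, \dots, X_t)$). An inferential procedure (test or confidence interval) is said to be anytime valid at level $\alpha$ for a parameter $\theta$ if
$$P_\theta\left(\exists\, t \ge 1 : \theta \notin C_t\right) \le \alpha,$$
where $(C_t)_{t \ge 1}$ is a sequence of random sets (confidence intervals, bands, or sequences), each $C_t$ computed from the data available at time $t$. This ensures that Type I error control persists across all possible interim and final analyses, including those triggered by arbitrary, data-adaptive stopping rules (Maharaj et al., 2023, Lindon et al., 2020, Turner et al., 2022).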
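The theoretical pillar is the test martingale (or, more generally, the e-process): a nonnegative $(\mathcal{F}_t)$-adapted process $(E_t)_{t \ge 0}$, typically with $E_0 = 1$, constructed such that for all $P \in H_0$,
$$\mathbb{E}_P[E_\tau] \le 1$$
for any (possibly random) stopping time $\tau$. Ville's maximal inequality then yields
$$\sup_{P \in H_0} P\left(\exists\, t \ge 0 : E_t \ge 1/\alpha\right) \le \alpha,$$
so rejecting $H_0$ the first time $E_t \ge 1/\alpha$ controls the Type I error at level $\alpha$ under any stopping rule. This guarantee is exact (non-asymptotic) for genuine e-processes; procedures built on asymptotic approximations, such as CLT-based confidence sequences, inherit it only asymptotically.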
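As a minimal sketch, assuming i.i.d. $N(0, 1)$ data under $H_0$ and a simple Gaussian alternative $N(\mu_1, 1)$ (both illustrative choices, as are the function names), the following code builds the likelihood-ratio test martingale and estimates by simulation how often it ever reaches $1/\alpha$ under the null. Ville's inequality says this frequency should not exceed $\alpha$ beyond Monte Carlo error.

```python
import math
import random


def lr_martingale_path(xs, mu1, sigma=1.0):
    """Running product E_t of Gaussian likelihood ratios N(mu1, sigma^2) : N(0, sigma^2)."""
    log_e = 0.0
    path = [1.0]  # E_0 = 1
    for x in xs:
        # log of q(x) / p0(x) for two Gaussians with a common variance
        log_e += (mu1 * x - 0.5 * mu1 ** 2) / sigma ** 2
        path.append(math.exp(log_e))
    return path


def null_crossing_rate(alpha=0.05, mu1=1.0, horizon=100, reps=4000, seed=1):
    """Fraction of simulated null paths (X_i ~ N(0, 1)) on which E_t ever reaches 1/alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        if max(lr_martingale_path(xs, mu1)) >= 1.0 / alpha:
            hits += 1
    return hits / reps


print(null_crossing_rate())  # at most about alpha = 0.05, up to simulation noise
```

Because the path is inspected after every observation, the simulated rate already accounts for the most aggressive possible monitoring schedule.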
2. Construction of E-Processes and E-Variables
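The construction of e-processes and test martingales underlies almost all modern anytime-valid methods. An e-variable is a nonnegative statistic $E$ satisfying $\mathbb{E}_P[E] \le 1$ for all $P \in H_0$. A sequence $(E_t)_{t \ge 0}$ is a test martingale (respectively, test supermartingale) if it is a nonnegative $(\mathcal{F}_t)$-martingale (supermartingale) with $E_0 = 1$ under each $P \in H_0$, and an e-process if $E_\tau$ is an e-variable for every stopping time $\tau$; every test supermartingale is an e-process (Turner et al., 2022, Koning, 2023).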
Key properties:
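- Optional Stopping: For any (possibly data-adaptive) stopping time $\tau$, $\mathbb{E}_P[E_\tau] \le 1$ for all $P \in H_0$.
- Uniformity: Markov's inequality applied at the stopped value (equivalently, Ville's inequality) yields time-uniform guarantees: $P(\exists\, t : E_t \ge 1/\alpha) \le \alpha$ for all $P \in H_0$.
- Confidence Sequences: Given a family of e-processes $(E_t(\theta))$, one for each candidate null value $\theta$, inverting the acceptance region $\{E_t(\theta) < 1/\alpha\}$ gives confidence sets $C_t = \{\theta : E_t(\theta) < 1/\alpha\}$ satisfying $P_\theta(\exists\, t : \theta \notin C_t) \le \alpha$ (Turner et al., 2022, Koning, 2023, Lindon et al., 2020).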
This framework generalizes classical likelihood-ratio-based tests, accommodates composite nulls (via e-mixtures and reverse information projections), and enables both parametric and nonparametric robustification (Saha et al., 2024).
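The confidence-sequence inversion can be made concrete in the Gaussian case. The sketch below assumes i.i.d. $N(\mu, \sigma^2)$ observations with known $\sigma$ and uses the classical normal-mixture martingale (a Gaussian mixing distribution with scale $\tau$ over the mean shift), for which $C_t = \{\mu : M_t(\mu) < 1/\alpha\}$ is an explicit interval. The choice of mixture, the parameter names, and the function name are illustrative assumptions, not taken from the cited works.

```python
import math


def normal_mixture_cs(xs, sigma=1.0, tau=1.0, alpha=0.05):
    """Running (1 - alpha) confidence sequence for the mean of i.i.d. N(mu, sigma^2) data.

    Inverts the normal-mixture martingale
        M_t(mu) = sqrt(sigma^2 / v_t) * exp(tau^2 S_t(mu)^2 / (2 sigma^2 v_t)),
    with v_t = sigma^2 + t tau^2 and S_t(mu) = sum_{i<=t} (X_i - mu),
    so that C_t = {mu : M_t(mu) < 1/alpha}.
    """
    total = 0.0
    intervals = []
    for t, x in enumerate(xs, start=1):
        total += x
        v = sigma ** 2 + t * tau ** 2
        # mu stays in C_t exactly when |S_t(mu)| is below this radius
        radius = math.sqrt(2 * sigma ** 2 * v / tau ** 2
                           * (math.log(1 / alpha) + 0.5 * math.log(v / sigma ** 2)))
        mean = total / t
        intervals.append((mean - radius / t, mean + radius / t))
    return intervals


cs = normal_mixture_cs([0.2, -0.1, 0.4, 0.0, 0.3])
print(cs[-1])  # interval for mu after five observations
```

Each interval may be reported at any time, and the whole sequence covers $\mu$ simultaneously with probability at least $1 - \alpha$; the mixing scale $\tau$ only shifts where the boundary is tightest.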
3. Representative Methodologies
a. Online FWER Control under Dependence
For online multiple testing with dependent test statistics, a framework based on consistent weights replaces discrete candidate sets (Jankovic et al., 2024). For a sequence of hypotheses $H_1, H_2, \dots$, let $V(T)$ denote the number of false rejections among the first $T$ tests. Familywise error control,
$$\mathrm{FWER}(T) = P\left(V(T) \ge 1\right) \le \alpha \quad \text{for all } T,$$
is guaranteed by enforcing an adaptive wealth-spending constraint on the individual test levels $\alpha_1, \alpha_2, \dots$. This holds asymptotically even under arbitrary dependence, and remains valid if testing is stopped or extended at any point.
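The consistent-weights procedure itself is more involved. As a deliberately simpler stand-in, the sketch below implements online Bonferroni (alpha-spending), the simplest online FWER procedure: each hypothesis $H_i$ is tested at level $\alpha_i = \alpha\gamma_i$ with $\sum_i \gamma_i \le 1$, so the union bound gives $\mathrm{FWER} \le \alpha$ under arbitrary dependence and at every stopping point. The spending sequence $\gamma_i = 6/(\pi^2 i^2)$ and the function name are assumptions chosen for illustration.

```python
import math


def online_bonferroni(p_values, alpha=0.05):
    """Online Bonferroni: test H_i at level alpha * gamma_i with gamma_i = 6 / (pi^2 i^2).

    Since sum_i gamma_i = 1, the total level spent never exceeds alpha, so the
    union bound gives FWER <= alpha under any dependence and any stopping point.
    """
    decisions = []
    spent = 0.0
    for i, p in enumerate(p_values, start=1):
        alpha_i = alpha * 6.0 / (math.pi ** 2 * i ** 2)
        spent += alpha_i
        decisions.append(p <= alpha_i)
    return decisions, spent


decisions, spent = online_bonferroni([0.001, 0.02, 0.0001])
print(decisions, spent)  # [True, False, True] and a total spent level below 0.05
```

Adaptive spending schemes improve on this baseline by reclaiming level from hypotheses that look null, rather than spending a fixed schedule.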
b. Likelihood Ratio and Mixture Martingales
For fixed or composite hypotheses in parametric or nonparametric settings:
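- Likelihood-ratio martingales form classic e-processes; for i.i.d. $X_i$ with null density $p_0$ and alternative density $q$, $M_t = \prod_{i=1}^{t} q(X_i)/p_0(X_i)$ is a nonnegative $(\mathcal{F}_t)$-martingale under $P_0$.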
- Mixture martingales (e.g., Dirichlet-multinomial and Bayesian alternatives) extend this to composite nulls or alternatives (Lindon et al., 2020, Lindon et al., 2024).
- Bayesian mixtures allow robust power control and adaptive inference via prior tuning; all maintain anytime-valid Type I error by Ville's bound.
- In contaminated or model-uncertain regimes, robust (Huber-style) truncation of likelihood ratios yields supermartingales that preserve validity against adversarial or $\varepsilon$-contaminated distributions (Saha et al., 2024).
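For a concrete mixture martingale, the sketch below takes the two-category special case of the Dirichlet-multinomial construction: Bernoulli data, the point null $p = p_0$, and a $\mathrm{Beta}(a, b)$ mixture over the alternative. With $S_t$ successes in $t$ trials, the mixture of likelihood ratios has the closed form $M_t = \frac{B(a + S_t,\, b + t - S_t)/B(a, b)}{p_0^{S_t}(1 - p_0)^{t - S_t}}$; it is a nonnegative martingale under the null, so Ville's bound applies. The prior parameters and function names are illustrative assumptions.

```python
import math


def log_beta(a, b):
    """log B(a, b) computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_e_process(xs, p0=0.5, a=1.0, b=1.0):
    """Mixture martingale for H0: X_i ~ Bernoulli(p0) with a Beta(a, b) mixture alternative.

    M_t = [B(a + S_t, b + t - S_t) / B(a, b)] / [p0^S_t (1 - p0)^(t - S_t)].
    """
    s = 0
    path = [1.0]  # M_0 = 1
    for t, x in enumerate(xs, start=1):
        s += x
        log_m = (log_beta(a + s, b + t - s) - log_beta(a, b)
                 - s * math.log(p0) - (t - s) * math.log(1 - p0))
        path.append(math.exp(log_m))
    return path


path = beta_binomial_e_process([1] * 30)
print(path[-1] >= 1 / 0.05)  # True: 30 straight successes reject p0 = 0.5 at alpha = 0.05
```

The mixture removes the need to pick a single alternative $p$, at the cost of some power against any particular one; the prior $(a, b)$ controls where that power is concentrated.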
c. Regression, A/B Testing, and Survival Analysis
Anytime-valid confidence sequences and tests have been constructed for:
- Mean-difference or lift statistics in A/B testing (Maharaj et al., 2023), where a nonnegative self-normalized martingale establishes time-uniform bounds.
- Linear regression and regression-adjusted causal inference, where closed-form e-processes for $t$- and $F$-tests (under Gaussian linear models), robustified quadratic forms, and confidence sequences for model-free ATE estimands provide coverage that holds uniformly over all sample sizes (Lindon et al., 2022).
- Logrank tests and Cox regression in survival analysis, where each event's likelihood-ratio increment forms a step of the test martingale; the product yields an AV (anytime-valid) logrank test and associated confidence sequences (Schure et al., 2020).
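The event-by-event construction for the logrank test can be sketched as follows, assuming no tied event times and a fixed alternative hazard ratio $\theta$ for group 1 versus group 0 (both assumptions for illustration, as is the function name). Given risk-set sizes $n_0$ and $n_1$ just before an event, the null assigns the event to group 1 with probability $n_1/(n_0 + n_1)$ and the alternative with probability $\theta n_1/(n_0 + \theta n_1)$. Each event multiplies the running e-value by the ratio of these probabilities, so each factor has conditional expectation 1 under the null.

```python
def logrank_e_process(events, theta=2.0):
    """Logrank-type test martingale over a sequence of untied events.

    events: iterable of (n0, n1, group), where n0 and n1 are the risk-set sizes
    just before the event and group (0 or 1) identifies the subject who failed.
    """
    e = 1.0
    path = [e]
    for n0, n1, group in events:
        p_null = n1 / (n0 + n1)                  # P(event in group 1) under hazard ratio 1
        p_alt = theta * n1 / (n0 + theta * n1)   # same probability under hazard ratio theta
        e *= p_alt / p_null if group == 1 else (1 - p_alt) / (1 - p_null)
        path.append(e)
    return path


print(logrank_e_process([(10, 10, 1), (9, 10, 1), (9, 9, 0)]))
```

Because the factors only need the risk sets at each event time, the process can be updated as events arrive and monitored continuously.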
4. Error Control: Theoretical Guarantees and Simulation Evidence
The core theoretical guarantee is that, under the null, the probability of ever crossing the rejection threshold is bounded by the nominal level uniformly over all (possibly adaptive) times and under arbitrary stopping:
$$\sup_{P \in H_0} P\left(\exists\, t : E_t \ge 1/\alpha\right) \le \alpha,$$
or, for FWER, $\sup_{T} P\left(V(T) \ge 1\right) \le \alpha$, holding exactly or asymptotically depending on the method (Jankovic et al., 2024, Turner et al., 2022, Lindon et al., 2020, Koning, 2023).
Extensive simulation studies demonstrate:
- In online multiple testing, continuous-spending and graph-based procedures maintain empirical FWER at the nominal level across a wide range of scenarios, including dependent tests (AR(1) correlation), platform trial designs, and various proportions of true alternatives (Jankovic et al., 2024).
- In A/B tests, empirical Type I error remains at or below the nominal level under continuous monitoring, whereas standard fixed-horizon protocols are severely anti-conservative under peeking or optional stopping (Maharaj et al., 2023).
- Under adaptive or contaminated regimes, robust e-processes keep the Type I error controlled, whereas classical likelihood-based tests can suffer severe Type I error inflation (Saha et al., 2024).
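The contrast between peeking at a fixed-horizon test and monitoring an anytime-valid boundary can be reproduced in a small simulation. The sketch below assumes a null stream of i.i.d. $N(0, 1)$ observations (for example, per-user differences in an A/B test with known unit variance), checks the two-sided fixed-horizon $z$-test after every observation, and compares it with the normal-mixture boundary. All settings and names are illustrative assumptions, not the setup of the cited studies.

```python
import math
import random
from statistics import NormalDist


def peeking_false_positive_rates(alpha=0.05, horizon=300, reps=1000, tau=1.0, seed=7):
    """Monte Carlo Type I error when a null N(0, 1) stream is checked after every observation.

    naive: reject the first time |S_t| / sqrt(t) >= z_{1 - alpha/2} (fixed-horizon z-test).
    anytime-valid: reject the first time the normal-mixture martingale (sigma = 1,
    mixing scale tau) reaches 1/alpha, i.e. |S_t| >= radius(t).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    radii = []
    for t in range(1, horizon + 1):
        v = 1.0 + t * tau ** 2
        radii.append(math.sqrt(2 * v / tau ** 2 * (math.log(1 / alpha) + 0.5 * math.log(v))))
    rng = random.Random(seed)
    naive_hits = av_hits = 0
    for _ in range(reps):
        s = 0.0
        naive = av = False
        for t in range(1, horizon + 1):
            s += rng.gauss(0.0, 1.0)
            naive = naive or abs(s) >= z * math.sqrt(t)
            av = av or abs(s) >= radii[t - 1]
        naive_hits += naive
        av_hits += av
    return naive_hits / reps, av_hits / reps


naive_rate, av_rate = peeking_false_positive_rates()
print(naive_rate, av_rate)  # the naive rate is far above alpha; the anytime-valid rate is not
```

Increasing `horizon` pushes the naive rate further up, while the anytime-valid rate stays bounded by $\alpha$ for every horizon.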
5. Adaptivity, Optional Stopping, and Optional Continuation
A distinctive strength of anytime-valid methodology is its robustness to adaptive design. For any e-process, test levels and inferential boundaries remain valid:
- Under arbitrary, possibly data-dependent stopping times.
- After resuming/continuing a test beyond the original planned sample size.
- When combining evidence from independent sequential tests, either by multiplying their e-values or, equivalently, by using the evidence accumulated so far to adjust the rejection threshold for subsequent data (Koning et al., 7 Jan 2025, Schure et al., 2020).
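Optional continuation can be illustrated with two batches of data. In the sketch below (assuming Gaussian batches, a simple alternative $N(\mu_1, 1)$, and hypothetical function names), the e-value from the first batch is carried forward: rejecting when $E^{(1)}E^{(2)} \ge 1/\alpha$ is the same as requiring the second batch to clear the adjusted threshold $E^{(2)} \ge 1/(\alpha E^{(1)})$. The product remains an e-value because the second batch is fresh data whose e-value has null expectation at most 1, whatever was observed in the first.

```python
import math


def gaussian_lr_e_value(xs, mu1=0.5, sigma=1.0):
    """Likelihood-ratio e-value for H0: N(0, sigma^2) against N(mu1, sigma^2) on one batch."""
    return math.exp(sum((mu1 * x - 0.5 * mu1 ** 2) / sigma ** 2 for x in xs))


def continue_study(batch1, batch2, alpha=0.05):
    """Combine an initial study with a follow-up whose collection may depend on the first."""
    e1 = gaussian_lr_e_value(batch1)
    # Threshold the follow-up must clear, given the evidence already carried forward
    threshold2 = 1.0 / (alpha * e1)
    e2 = gaussian_lr_e_value(batch2)
    return e1 * e2, e2 >= threshold2


combined, reject = continue_study([1.0] * 5, [1.0] * 5)
print(combined, reject)
```

The decision to run the second batch, and how large to make it, may depend on $E^{(1)}$ without affecting validity.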
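Anytime validity can also be obtained from classical procedures: any fixed-$n$ level-$\alpha$ test $\phi$ can be "sequentialized" by constructing the Doob martingale of its test function, $E_t = \mathbb{E}_{P_0}[\phi \mid \mathcal{F}_t]/\alpha$. Rejecting as soon as $E_t \ge 1/\alpha$ then gives exactly the fixed-horizon power at $t = n$, where $E_n = \phi/\alpha$ and the procedure coincides with the original test (Koning et al., 7 Jan 2025).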
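The sequentialization idea can be checked on a one-sided binomial (sign) test of $p = 1/2$ with $n = 20$ Bernoulli observations. Under this simple null, $\mathbb{E}_{P_0}[\phi \mid \mathcal{F}_t]$ is the probability that the remaining $n - t$ fair coin flips push the success count to the critical value, so the Doob martingale has a closed form. The test, the sample size, and the function names are illustrative assumptions; the sketch shows the construction, not the cited authors' code.

```python
from math import comb


def binom_sf(k, n):
    """P(Bin(n, 1/2) >= k), with P = 1 for k <= 0 and P = 0 for k > n."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def sequentialized_sign_test(xs, n=20, alpha=0.05):
    """Doob martingale E_t = P0(phi = 1 | X_1..X_t) / alpha of the fixed-n test phi = 1{S_n >= k}."""
    # Smallest critical value k whose null rejection probability is at most alpha
    k = next(c for c in range(n + 2) if binom_sf(c, n) <= alpha)
    s = 0
    path = [binom_sf(k, n) / alpha]  # E_0 = P0(phi = 1) / alpha <= 1
    for t, x in enumerate(xs[:n], start=1):
        s += x
        path.append(binom_sf(k - s, n - t) / alpha)
    # First time the martingale reaches 1/alpha (None if it never does)
    stop = next((t for t, e in enumerate(path) if e >= 1 / alpha), None)
    return k, path, stop


k, path, stop = sequentialized_sign_test([1] * 20)
print(k, stop)  # 15 15: rejection is certain as soon as S_t reaches 15
```

Here the sequential version stops at $t = 15$ instead of $t = 20$, yet it rejects on exactly the same outcomes as the original test.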
6. Classes of Procedures and Their Domains of Application
| Domain | Test statistic / confidence-sequence construction | Reference |
|---|---|---|
| Online multiple testing (FWER) | Consistent weights, adaptive spending | (Jankovic et al., 2024) |
| A/B testing, mean/lift | Self-normalized martingale, CLT | (Maharaj et al., 2023) |
| Linear regression, causal inference | Sequential $t$- and $F$-tests, robust e-process | (Lindon et al., 2022) |
| Categorical/multinomial testing | Mixture martingale (Dirichlet-Bayes) | (Lindon et al., 2020) |
| Contamination-robust testing | Huber-truncated LR e-process | (Saha et al., 2024) |
| Rank-based nonparametric independence testing | Sequentially binned Bayes-factor martingale | (Henzi et al., 2023) |
| Survival analysis, Cox regression | Likelihood-ratio event martingales | (Schure et al., 2020) |
Each entry reflects an instantiation of the abstract e-process/e-variable paradigm to distinct statistical modalities.
7. Limitations and Ongoing Research
Contemporary anytime-valid approaches achieve robustness under general stopping and dependence, but several open problems remain:
- Not all dependency structures or adaptive strategies permit exact power characterization; some procedures guarantee only asymptotic FWER or power (Jankovic et al., 2024).
- Most frameworks implement "adaptivity" (adaptive spending) but not full "discarding" of conservative null p-values, as in ADDIS-Spending (Jankovic et al., 2024).
- Power loss relative to fixed-horizon procedures at a pre-specified sample size $n$ can be modest but is nonzero, especially for extremely conservative procedures or when the minimal detectable effect is underestimated (Maharaj et al., 2023).
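The cost of time-uniformity can be quantified in the simplest Gaussian setting. The sketch below (assuming known $\sigma$ and the normal-mixture confidence sequence with mixing scale $\tau$, both illustrative choices, and a hypothetical function name) divides the confidence-sequence half-width at $t = n$ by the half-width of the fixed-$n$ $z$-interval. The ratio always exceeds 1: a boundary crossed with probability at most $\alpha$ at any time is also crossed with probability at most $\alpha$ at the fixed time $n$. How large the ratio is depends on $n$ and on how well $\tau$ is tuned to that $n$.

```python
import math
from statistics import NormalDist


def width_ratio(n, alpha=0.05, sigma=1.0, tau=1.0):
    """Normal-mixture CS half-width at t = n divided by the fixed-n z-interval half-width."""
    v = sigma ** 2 + n * tau ** 2
    cs_half = math.sqrt(2 * sigma ** 2 * v / tau ** 2
                        * (math.log(1 / alpha) + 0.5 * math.log(v / sigma ** 2))) / n
    z_half = sigma * NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n)
    return cs_half / z_half


for n in (10, 100, 1000):
    print(n, round(width_ratio(n), 2), round(width_ratio(n, tau=0.3), 2))
```

Wider intervals at the planned $n$ translate directly into the power gap described above, which is why tuning the mixture to the expected sample size matters.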
Ongoing research directions address integrating discarding, optimal e-variable design, and minimax power under more challenging data corruption and dependency regimes.
References:
(Lindon et al., 2020, Turner et al., 2022, Lindon et al., 2022, Maharaj et al., 2023, Henzi et al., 2023, Koning, 2023, Jankovic et al., 2024, Saha et al., 2024, Lindon et al., 2024, Koning et al., 7 Jan 2025, Schure et al., 2020).