Optional Stopping Theory

Updated 23 January 2026

Optional Stopping Theory is a framework that characterizes when data collection can be stopped on the basis of past observations without biasing statistical inference.
It builds on the Doob optional stopping theorem and martingale properties to ensure valid Bayesian and frequentist hypothesis testing.
The theory supports robust adaptive testing through safe testing and e-value methodologies to control Type I error under flexible sampling.

Optional stopping theory addresses the statistical and probabilistic consequences of data-dependent or adaptive decisions about when to terminate data collection in stochastic processes, sequential experiments, and statistical inference. It establishes mathematical guarantees for procedures that monitor data and stop at random times determined by the observed data history, subject to formal constraints on the stopping rule. The framework encompasses both foundational results, such as the Doob optional stopping theorem for martingales, and contemporary methodologies, particularly in sequential hypothesis testing, Bayesian inference, and the analysis of probabilistic programs. It distinguishes rigorously between types of stopping rules and levels of robustness to flexible sampling, and it clarifies how these interact with error control and decision-theoretic validity.

1. Formal Definitions: Stopping Rules, Martingales, and the Classical OST

A stopping time $\tau$ with respect to a filtration $\{\mathcal{F}_t\}$ is a random variable taking values in $\mathbb{N}\cup\{\infty\}$ (or, more generally, in the index set of the process) such that $\{\tau\leq t\}\in\mathcal{F}_t$ for all $t$. This formalizes the requirement that the decision to stop at time $t$ can depend only on information available up to $t$, prohibiting "peeking" into the future.
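The measurability condition can be made concrete with a small Python sketch (the function names and example paths are hypothetical illustrations). A first-passage rule decides at index $t$ using only the values observed so far, whereas "stop at the path's overall maximum" does not: two paths that agree up to index 2 receive the same first-passage decision but different argmax times, so $\{\tau\leq 2\}$ for the argmax rule depends on the future.

```python
# A stopping rule defines a stopping time only if the event {tau <= t}
# can be decided from path[0..t] alone.

def first_passage(path, level):
    """Stopping time: first index t with path[t] >= level (None if never).
    The check at index t reads only path[0..t]."""
    for t, x in enumerate(path):
        if x >= level:
            return t
    return None


def argmax_time(path):
    """NOT a stopping time: index of the overall maximum of the path.
    Deciding whether this index equals t requires values after t."""
    return max(range(len(path)), key=lambda t: path[t])


# Two paths that agree up to index 2 and differ afterwards.
a = [0, 1, 2, 1, 0]
b = [0, 1, 2, 3, 4]

print(first_passage(a, 2), first_passage(b, 2))  # same prefix, same decision: 2 2
print(argmax_time(a), argmax_time(b))            # same prefix, different times: 2 4
```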

The classical optional stopping theorem (OST), as established by Doob, states that for a (sub/super)martingale $\{M_t\}$ and a stopping time $T$ satisfying technical conditions (e.g., bounded $T$, uniformly integrable $\{M_t\}$, or bounded increments with $\mathbb{E}T<\infty$), the process stopped at $T$ retains the (sub/super)martingale property, so that $\mathbb{E}[M_T]\leq \mathbb{E}[M_0]$ for supermartingales, $\mathbb{E}[M_T]\geq \mathbb{E}[M_0]$ for submartingales, and $\mathbb{E}[M_T]=\mathbb{E}[M_0]$ for martingales (Wang et al., 2021). These results extend, via variants, to processes with polynomial growth under moment constraints on $T$, to bounded continuation regions (Chen, 2012), and to vector-lattice (Riesz space) settings via spectral measures and Daniell integration (Grobler et al., 2020).
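As a numerical illustration (a sketch, not taken from the cited works), the simple symmetric random walk is a martingale with $M_0=0$. Capping its first-passage time to $+1$ at a fixed horizon gives a bounded stopping time, so the theorem predicts $\mathbb{E}[M_T]=0$. The horizon, level, path count, and function name below are arbitrary illustrative choices.

```python
import random


def stopped_walk_stats(n_paths=20000, horizon=100, upper=1, seed=0):
    """Monte Carlo check of the OST for a simple symmetric random walk.

    M_0 = 0 and M_t = X_1 + ... + X_t with X_i = +1 or -1, each w.p. 1/2
    (a martingale). T = min(first t with M_t >= upper, horizon) is a
    bounded stopping time, so Doob's theorem predicts E[M_T] = E[M_0] = 0.
    Returns (estimated E[M_T], fraction of paths stopped at the upper level).
    """
    rng = random.Random(seed)
    total, hits = 0, 0
    for _ in range(n_paths):
        m = 0
        for _ in range(horizon):
            m += 1 if rng.random() < 0.5 else -1
            if m >= upper:  # stopping decision uses only the path so far
                hits += 1
                break
        total += m
    return total / n_paths, hits / n_paths


mean_mt, frac_hit = stopped_walk_stats()
print(f"E[M_T] ~ {mean_mt:.3f}, P(stopped at upper level) ~ {frac_hit:.3f}")
```

Most paths stop at $+1$, yet the average stays near 0 because the few paths that never reach $+1$ within the horizon end well below zero. Dropping the horizon gives a first-passage time that is finite almost surely but has $\mathbb{E}T=\infty$; then $M_T=1$ on every path and $\mathbb{E}[M_T]=1\neq 0$, which shows why the theorem's technical conditions cannot simply be omitted.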

The core function of the theory is to certify the validity of expectation calculations, probabilistic assertions, and inference outcomes at random, data-adaptive termination times, provided the stopping rule is properly defined as a stopping time.

2. Optional Stopping in Bayesian and Frequentist Hypothesis Testing

Bayesian Framework: Validity and Limitations

Bayesian hypothesis testing frequently employs the Bayes factor

$\mathrm{BF}_t = \frac{P(X_{1:t}|H_1)}{P(X_{1:t}|H_0)}$

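and posterior odds $\mathrm{PostOdds}_t=(\pi_1/\pi_0)\mathrm{BF}_t$. Under the conditions that the prior is fixed in advance and the stopping rule is a proper stopping time (no data snooping or retroactive selection), a formal optional stopping result holds: when $H_0$ is simple (or under its prior-predictive marginal when it is composite), the process $\{\mathrm{BF}_t\}$ is a nonnegative martingale under $H_0$ with $\mathrm{BF}_0=1$, so for any bounded stopping time $\tau$ (Deng et al., 2016, Hendriksen et al., 2018),

$\mathbb{E}_{H_0}[\mathrm{BF}_\tau]=1$

while for an unbounded stopping time (with $\mathrm{BF}_\tau$ defined as the almost-sure limit of $\mathrm{BF}_t$ on $\{\tau=\infty\}$) Fatou's lemma still gives $\mathbb{E}_{H_0}[\mathrm{BF}_\tau]\le 1$. In this sense the posterior odds remain "unbiased" by the stopping rule. Consequently, decision rules such as "reject $H_0$ if $\mathrm{PostOdds}_\tau>K_0$" keep the false discovery rate (FDR), computed over experiments whose hypotheses are drawn from the prior, at most $1/(1+K_0)$, as in fixed-sample settings, because at the moment of rejection the posterior probability of $H_0$ is below $1/(1+K_0)$ (Deng et al., 2016).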
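The identity $\mathbb{E}_{H_0}[\mathrm{BF}_\tau]=1$ can be checked by simulation in the simplest setting, a simple-vs-simple test of normal means with unit variance. The sketch below assumes $H_0:\mu=0$ versus $H_1:\mu=\delta$, for which $\mathrm{BF}_t=\exp(\delta S_t - t\delta^2/2)$ with $S_t=\sum_{i\le t}X_i$; the values of $\delta$, the thresholds, the horizon, and the function name are illustrative assumptions.

```python
import math
import random


def stopped_bayes_factor(n_paths=20000, horizon=100, delta=0.5,
                         threshold=20.0, seed=1):
    """Monte Carlo estimate of E_{H0}[BF_tau] for a bounded stopping time.

    H0: X_i ~ N(0, 1) versus H1: X_i ~ N(delta, 1), both simple, so
    log BF_t = delta * S_t - t * delta**2 / 2. Data are drawn under H0 and
    tau stops when BF_t > threshold, BF_t < 1 / threshold, or t = horizon.
    Returns (estimated E[BF_tau], fraction of paths crossing the threshold).
    """
    rng = random.Random(seed)
    upper, lower = math.log(threshold), -math.log(threshold)
    total_bf, crossed = 0.0, 0
    for _ in range(n_paths):
        log_bf = 0.0
        for _ in range(horizon):
            x = rng.gauss(0.0, 1.0)               # one observation under H0
            log_bf += delta * x - delta ** 2 / 2  # log-likelihood-ratio increment
            if log_bf > upper:                    # strong evidence for H1: stop
                crossed += 1
                break
            if log_bf < lower:                    # strong evidence for H0: stop
                break
        total_bf += math.exp(log_bf)
    return total_bf / n_paths, crossed / n_paths


mean_bf, frac_crossed = stopped_bayes_factor()
print(f"E[BF_tau] ~ {mean_bf:.3f}; crossed threshold: {frac_crossed:.4f}")
```

Because $\tau$ is capped at the horizon it is bounded, so the estimated mean should be close to 1, and the fraction of paths crossing the upper threshold $K=20$ should stay below $1/K=0.05$.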

However, this immunity to optional stopping holds only under specific conditions. For Bayes factors constructed with so-called Type 0 (right-Haar/invariant) priors, such as the Jeffreys prior on nuisance scale parameters in location-scale problems, full calibration and stopping-rule independence hold (Heide et al., 2017, Hendriksen et al., 2018). In contrast, default or pragmatic priors on parameters of interest (Type I, e.g., Cauchy priors on effect size) and priors that depend on the design or stopping rule (Type II, e.g., $g$-priors in regression with an evolving design) break strong calibration and may distort inference under optional stopping. Strong calibration under sampling from a fixed true parameter is preserved only for subjective priors or group-invariant (Type 0) cases; with Type I/II priors, post-hoc interpretation and error control may fail, especially for interpretations grounded in frequentist performance (Heide et al., 2017).

Frequentist Methods and Correction Mechanisms

In classical frequentist hypothesis testing, naïve optional stopping (repeatedly peeking at unadjusted $p$-values and stopping once one is significant) inflates the Type I error well above the nominal level. To counteract this, the sequential probability ratio test (SPRT), group-sequential designs, $\alpha$-spending functions, and "always-valid" $p$-values were developed. Each controls Type I error under sequential monitoring (the SPRT, group-sequential designs, and $\alpha$-spending functions for their prespecified boundaries or monitoring plans; always-valid $p$-values under arbitrary stopping), at the cost of more complex design and possibly reduced power or flexibility (Deng et al., 2016, Yang et al., 2 Mar 2025).
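The inflation from naïve peeking is easy to reproduce. The following sketch (hypothetical function name and parameters; known unit variance is assumed) simulates a two-sided $z$-test under $H_0$ and compares "reject if $|z|>1.96$ at any of 100 looks" with a single test at the final look.

```python
import math
import random


def peeking_type1_error(n_sims=4000, max_n=100, z_crit=1.96, seed=2):
    """Type I error of a two-sided z-test of H0: mu = 0 (sigma = 1 known).

    Returns (rate when rejecting at ANY look n = 1..max_n without adjustment,
             rate when testing only once at n = max_n).
    """
    rng = random.Random(seed)
    peek_rejections, fixed_rejections = 0, 0
    for _ in range(n_sims):
        total = 0.0
        ever_significant = False
        z = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)   # one observation under H0
            z = total / math.sqrt(n)       # z-statistic after n observations
            if abs(z) > z_crit:
                ever_significant = True    # a peeking analyst would stop here
        peek_rejections += int(ever_significant)
        fixed_rejections += int(abs(z) > z_crit)  # z is the statistic at n = max_n
    return peek_rejections / n_sims, fixed_rejections / n_sims


peek_rate, fixed_rate = peeking_type1_error()
print(f"reject at any of 100 looks: {peek_rate:.3f}")
print(f"reject at the final look only: {fixed_rate:.3f}")
```

With 100 unadjusted looks the rejection rate is several times the nominal 5%, while the single final test stays near 5%.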

Bayesian tests with Type 0 priors offer automatic FDR control, whereas frequentist sequential methods target Type I error control, often neglecting prior information and requiring symmetric decision boundaries that may be suboptimal for applications with one-sided prior assumptions (Deng et al., 2016). The switch criterion (Pas et al., 2014) and $e$-variable (test-martingale) methods (Turner et al., 2021) go further, providing Type I error control under any stopping rule and thereby combining sequential flexibility with precise error bounds.

3. Safe Testing, E-Variables, and Optional Stopping Robustness

"Safe testing" constructs (e.g., $e$-values and $e$-processes) offer a class of sequential tests whose design directly guarantees Type I error control under arbitrary stopping or continuation. An $e$-variable $S$ is a nonnegative statistic such that $\mathbb{E}_P[S]\le 1$ for all $P\in H_0$. If each factor $S_i$ is an $e$-variable conditionally on the past, i.e. $\mathbb{E}_P[S_i\mid\mathcal{F}_{i-1}]\le 1$, then the process $(E_t)$ with $E_t=\prod_{i=1}^t S_i$ is a nonnegative supermartingale starting at one. Ville's inequality then gives, for any $\alpha>0$,

$P\left(\sup_t E_t \geq 1/\alpha\right)\leq \alpha$

so rejecting $H_0$ as soon as $E_t\geq 1/\alpha$ at any $t$ keeps the overall Type I error at most $\alpha$ under all possible data-dependent stopping rules (Turner et al., 2021). This guarantee does not depend on the block structure or the adaptation history, and it continues to hold if data collection is extended or the evidence is combined with new or earlier datasets. Composite alternatives can be accommodated via Bayesian predictive mixtures, which maintain validity at some cost in optimality ("growth-rate optimality"). The framework is therefore robust to the forms of peeking and stopping common in practice, filling gaps left by classical $p$-value approaches, especially in post-hoc or streaming-data settings.
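A coin-betting $e$-process gives a concrete instance. For $H_0: P(X=1)=1/2$, each factor $S_i=1+\lambda(2X_i-1)$ with a fixed $0\le\lambda\le 1$ is nonnegative and has conditional mean exactly 1 under $H_0$, so $E_t$ is a test martingale. The fixed bet $\lambda$ is a simplification assumed here (it targets a single alternative rather than using a predictive mixture), and the function name and all parameter values are illustrative.

```python
import random


def betting_e_process(p_true, n_sims=4000, horizon=200, bet=0.4,
                      alpha=0.05, seed=3):
    """Continuously monitored test of H0: P(X = 1) = 1/2.

    E_t = prod_i S_i with S_i = 1 + bet * (2 * X_i - 1). Under H0,
    E[S_i | past] = 1 and S_i >= 0 for 0 <= bet <= 1, so E_t is a
    nonnegative martingale with E_0 = 1 and, by Ville's inequality,
    P(E_t >= 1/alpha for some t) <= alpha. Data are drawn with
    P(X = 1) = p_true; returns the fraction of runs that reject.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        e = 1.0
        for _ in range(horizon):
            x = 1 if rng.random() < p_true else 0
            e *= 1 + bet * (2 * x - 1)
            if e >= 1 / alpha:  # reject as soon as the evidence suffices
                rejections += 1
                break
    return rejections / n_sims


h0_rate = betting_e_process(p_true=0.5)
alt_rate = betting_e_process(p_true=0.7)
print(f"rejection rate under H0: {h0_rate:.3f} (alpha = 0.05)")
print(f"rejection rate when P(X=1) = 0.7: {alt_rate:.3f}")
```

Under $H_0$ the rejection rate stays below $\alpha$ even though the evidence is checked after every observation; under the alternative $P(X=1)=0.7$ most runs reject well before the horizon.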

Bayesian tests for simple-vs-simple hypotheses with fixed priors also instantiate the $e$-variable machinery, hence share this "optional stopping" safety for Type I error (Pas et al., 2014, Turner et al., 2021). For models with group-invariant structure, placing right-Haar priors on the nuisance group yields stopping-rule-insensitive calibration, but only for stopping rules that are themselves invariant (i.e., measurable w.r.t. the maximal invariant induced by the group action) (Hendriksen et al., 2018).

4. Extensions, Boundary Cases, and Optional Stopping in Complex Processes

Optional stopping theorems extend beyond classical martingale processes. For instance, maximal inequalities and optional-stopping variants have been developed for supermartingales constrained to bounded continuation regions, and these conditions can replace the uniform-integrability or finite-expectation requirements of the classical Doob theorem (Chen, 2012). Stopped processes in vector-lattice or Riesz-space frameworks generalize the theory beyond $L^1$, relying on unbounded order convergence and Daniell integration with spectral measures, and subsume both classical and more abstract stochastic-process settings (Grobler et al., 2020).

Birth-death chains, which need not be martingales, may still satisfy optional-stopping-like theorems for certain large stopping times, with martingale-like limits of terminal expectations governed by tail products of the transition-parameter sequences (Markowsky, 2011). This links drift, recurrence, and optional stopping behavior in non-martingale Markov systems.

In program analysis, variants of the optional stopping theorem enable sound expected-cost analysis of probabilistic programs. Here, the accumulated cost plus a potential function of the program state forms a supermartingale, and the OST converts this into an upper bound on the total expected cost or runtime at the random termination time, provided uniform integrability, bounded increments, or polynomial moment conditions hold (Wang et al., 2021, Schreuder et al., 2019).
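A toy instance shows the mechanics. The sketch below assumes a hypothetical loop that decrements a counter $x$ with probability $p>1/2$ and increments it otherwise, charging cost 1 per iteration. With the potential $\phi(x)=x/(2p-1)$, each iteration lowers $\phi$ by exactly 1 in expectation, so (iterations so far) $+\,\phi(x_t)$ is a martingale with bounded increments; since $\mathbb{E}T<\infty$ for $p>1/2$, the OST gives $\mathbb{E}T=\phi(x_0)$, which a Monte Carlo run can confirm. The function names are illustrative.

```python
import random


def ost_expected_runtime(x0, p):
    """Expected iterations of the loop: while x > 0: x -= 1 w.p. p, else x += 1.

    phi(x) = x / (2p - 1) drops by exactly 1 per iteration in expectation,
    so iterations_t + phi(x_t) is a martingale with bounded increments.
    For p > 1/2 the runtime T has finite mean, and the OST gives
    E[T] + phi(0) = phi(x0), i.e. E[T] = x0 / (2p - 1).
    """
    if p <= 0.5:
        raise ValueError("expected runtime is finite only for p > 1/2")
    return x0 / (2 * p - 1)


def simulated_runtime(x0, p, n_runs=20000, seed=4):
    """Monte Carlo estimate of the expected number of loop iterations."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        x, steps = x0, 0
        while x > 0:
            x += -1 if rng.random() < p else 1
            steps += 1
        total += steps
    return total / n_runs


print(ost_expected_runtime(10, 0.75))  # 20.0
print(simulated_runtime(10, 0.75))     # Monte Carlo estimate, should be near 20
```

Here the potential makes the process an exact martingale, so the OST yields the expected runtime itself; when an analysis can only certify a supermartingale, the same argument gives an upper bound instead.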

5. Simulation, Empirical Studies, and Practical Implementation

Empirical studies and simulation-based evaluations substantiate theoretical results and reveal the operational implications of optional stopping rules. Simulation in the context of one-sample normal-mean testing shows that, for properly calibrated Bayesian testing with simple priors and well-defined stopping times (e.g., threshold-based Bayes factor crossing), the FDR remains at the nominal level and early stopping is possible with increased power (Deng et al., 2016).
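A sketch of such a simulation, simplified to a simple-vs-simple normal-mean test rather than the priors of the cited study, is shown below. It assumes prior probability 0.5 for $H_1:\mu=\delta$ (else $H_0:\mu=0$), unit variance, and the rule "reject once $\mathrm{PostOdds}_t>K_0$, accept once it falls below $1/K_0$"; the function name and every parameter value are illustrative. Because the rule is a stopping time and the simulated truth is drawn from the same prior used in the posterior odds, the fraction of rejections that are false should stay below $1/(1+K_0)$.

```python
import math
import random


def sequential_bayes_fdr(n_experiments=20000, prior_h1=0.5, delta=0.5,
                         k0=9.0, horizon=200, seed=5):
    """Simulated FDR of 'reject H0 once PostOdds_t > k0'.

    Each experiment draws H1 with probability prior_h1 (else H0), then
    observes X_i ~ N(delta, 1) under H1 or N(0, 1) under H0. Posterior odds
    are (prior_h1 / (1 - prior_h1)) * BF_t with the simple-vs-simple
    Bayes factor log BF_t = delta * S_t - t * delta**2 / 2. Monitoring stops
    when PostOdds_t > k0 (reject), PostOdds_t < 1 / k0 (accept), or t = horizon.
    Returns (false rejections / rejections, number of rejections).
    """
    rng = random.Random(seed)
    log_prior_odds = math.log(prior_h1 / (1 - prior_h1))
    upper, lower = math.log(k0), -math.log(k0)
    false_rej, total_rej = 0, 0
    for _ in range(n_experiments):
        h1_true = rng.random() < prior_h1
        mu = delta if h1_true else 0.0
        log_post_odds = log_prior_odds
        for _ in range(horizon):
            x = rng.gauss(mu, 1.0)
            log_post_odds += delta * x - delta ** 2 / 2
            if log_post_odds > upper or log_post_odds < lower:
                break
        if log_post_odds > upper:
            total_rej += 1
            if not h1_true:
                false_rej += 1
    return false_rej / max(total_rej, 1), total_rej


fdr, n_rejections = sequential_bayes_fdr()
print(f"simulated FDR: {fdr:.3f} over {n_rejections} rejections (bound {1 / (1 + 9):.2f})")
```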

In practical Bayesian experimental design, predictive Bayesian optional stopping (pBOS) combines traditional Bayesian stopping with forward rehearsal simulations to forecast the likely attainment of statistical targets (e.g., desired credible interval width). Stopping rules are adapted to anticipated futility, leading to substantial cost-benefit improvements (e.g., +118% under hard targets) when properly calibrated by regression adjustments to mitigate optimistic bias due to posterior variance inflation in simulated data (Yang et al., 2 Mar 2025).
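The rehearsal idea can be illustrated with a binomial experiment, but the following is a hypothetical sketch and not the pBOS procedure of Yang et al.: it assumes a uniform Beta prior and a normal approximation to the 95% credible-interval width, and it omits the regression adjustment used to correct the optimistic bias. At an interim look it draws plausible success rates from the current posterior, simulates the remaining observations, and reports how often the final interval would meet the width target. All names are illustrative.

```python
import math
import random


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def prob_target_met(successes, failures, remaining, target_width,
                    n_rehearsals=500, seed=6):
    """Forward rehearsal of the rest of a binomial experiment.

    Starting from a Beta(1 + successes, 1 + failures) posterior, each
    rehearsal draws a success rate from the posterior, simulates the
    `remaining` observations, and checks whether the final 95% credible
    interval (normal approximation: 2 * 1.96 posterior SDs) would be no
    wider than `target_width`. Returns the fraction that meet the target.
    """
    rng = random.Random(seed)
    met = 0
    for _ in range(n_rehearsals):
        theta = rng.betavariate(1 + successes, 1 + failures)
        extra = sum(rng.random() < theta for _ in range(remaining))
        a = 1 + successes + extra
        b = 1 + failures + (remaining - extra)
        if 2 * 1.96 * beta_sd(a, b) <= target_width:
            met += 1
    return met / n_rehearsals


print(prob_target_met(10, 10, remaining=1000, target_width=0.1))
print(prob_target_met(10, 10, remaining=0, target_width=0.1))
```

A futility rule could then stop the study when this probability falls below a preset level; that level, like the other settings here, is an assumption chosen for illustration.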

Key practical guidelines include: predefining stopping rules as proper stopping times, avoiding data snooping or post-hoc rule selection, computing the likelihood cumulatively over all data collected so far, choosing priors with care (subjective, invariant, or weakly informative as appropriate), and reporting posterior odds or $e$-values for inference statements, regardless of the stopping mechanism (Deng et al., 2016, Yang et al., 2 Mar 2025, Pas et al., 2014).

6. Controversies, Cautions, and Limitations

While "Bayesian methods handle optional stopping" is mathematically correct in several senses, its universal applicability is restricted by prior choice, model structure, and the precise definition of the stopping rule. For default or pragmatic priors not induced by group invariance, or in designs where the stopping rule depends on features not invariant under the statistical model's symmetry group, calibration and error control can fail, potentially producing anti-conservative results (Heide et al., 2017, Hendriksen et al., 2018). Strong guarantees hold for subjective priors or for right-Haar priors in group-invariant models and invariant stopping rules; otherwise, only weaker, prior-averaged, or semi-frequentist calibration may be achieved.

For high-stakes applications demanding robust Type I error control under arbitrary data-dependent stopping, safe testing ($e$-variable) frameworks or invariant Bayes factor tests with suitably invariant stopping rules should be preferred. In other settings, prior-based Bayesian optional stopping and pBOS can be operationally powerful and resource-efficient, provided the methodology is reported transparently and the prior is carefully calibrated.

A summary table of optional stopping validity under various priors and models is presented below:

Prior/Model Type	Strong Calibration Under Optional Stopping	Robust Type I Error	Applies to	References
Type 0 (right-Haar)	Yes	Yes	Group-invariant, nuisance-only	(Heide et al., 2017, Hendriksen et al., 2018)
Subjective Proper Prior	Yes (in personalist sense)	No (unless simple null)	Any	(Heide et al., 2017)
Type I/II (default, design-dependent)	No	No	Most default Bayes tests	(Heide et al., 2017)
Safe Testing ($e$-values)	Yes	Yes	Any sequential test	(Turner et al., 2021)

In conclusion, optional stopping theory rigorously characterizes the permissible boundaries of data-adaptive experiment termination and underpins the statistical safeguards required for maintaining validity, calibration, and interpretability of inferential statements in both classical and modern sequential inference settings. Its proper application depends critically on explicit formalization of the stopping rule and assumptions underlying the statistical model and prior structure.