
Variance-Adaptive Confidence Sequences

Updated 25 December 2025
  • Variance-adaptive CSs are sequences of confidence intervals that adapt their width using the cumulative empirical variance, ensuring nonasymptotic and time-uniform coverage.
  • They leverage self-normalization and martingale exponential techniques to achieve optimal shrinking rates, matching limits dictated by the law of the iterated logarithm.
  • These methods extend to heavy-tailed data, matrix mean estimation, and adaptive online inference in settings such as bandit algorithms and reinforcement learning.

A variance-adaptive confidence sequence (CS) is a sequence of confidence intervals for an online, possibly non-i.i.d., stochastic process, whose width adapts at each time t to the empirical variance accumulated so far. Such sequences provide nonasymptotic, nonparametric, and time-uniform coverage guarantees, meaning the probability of ever excluding the true quantity of interest across all times is controlled at a prescribed level. Variance-adaptive CSs generalize classical fixed-variance (sub-Gaussian) boundaries, achieve optimal shrinking rates (including the iterated logarithm law), and have been extended to settings such as matrix mean estimation, heavy-tailed data, and sampling without replacement.

1. Foundations and Nonparametric Setting

Variance-adaptive CSs, particularly those of the “empirical-Bernstein” type, are grounded in minimal assumptions. The prototypical setup involves a sequence of real-valued random variables X_t \in [a,b], a predictable sequence of “predictions” \hat X_t, and the observed filtration \{\mathcal F_t\}. The only technical condition required is that the martingale difference sequence Y_t = X_t - \mathbb E[X_t \mid \mathcal F_{t-1}] is almost surely bounded by c = b-a (Howard et al., 2018).

The primary estimands are:

  • The mean process \mu_t = t^{-1} \sum_{i=1}^t \mathbb E[X_i \mid \mathcal F_{i-1}]
  • The variance process (empirical proxy) V_t = \sum_{i=1}^t (X_i - \hat X_i)^2

These sequences remain valid without independence, identical distribution, or strong tail assumptions.
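As a concrete illustration of these estimands, the variance proxy V_t can be computed with any predictable prediction sequence. The sketch below (not from the cited papers) uses the running mean of the previous observations as \hat X_t, one common predictable choice:

```python
import numpy as np

def variance_proxy(x, first_guess=0.5):
    """Cumulative variance proxy V_t = sum_{i<=t} (X_i - Xhat_i)^2,
    where Xhat_i is the running mean of the PREVIOUS observations
    (a predictable sequence). first_guess is the arbitrary prediction
    at i=1, when no past data exists."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    xhat = np.concatenate(([first_guess], np.cumsum(x)[:-1] / t[:-1]))
    return np.cumsum((x - xhat) ** 2)

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=1000)
V = variance_proxy(x)
# V[t]/t approaches the true variance p(1-p) = 0.21 as t grows
```

Because \hat X_i depends only on X_1, ..., X_{i-1}, the proxy is predictable, which is all the exponential construction in the next section requires.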

2. Empirical-Bernstein (Variance-Adaptive) CS Construction

The empirical-Bernstein confidence sequence is built upon a self-normalization/martingale-exponential construction:

  • For all \lambda \in [0, 1/c),

\mathbb E \left[ \exp \left\{ \lambda \sum_{i=1}^t Y_i - \psi_{E,c}(\lambda) V_t \right\} \right] \leq 1

where \psi_{E,c}(\lambda) = c^{-2}(-\ln(1-c\lambda) - c\lambda) (Howard et al., 2018).

For any “subexponential” uniform boundary u(v),

\mathbb P\left( \sup_{t \geq 1} |\widehat{\mu}_t - \mu_t| > u(V_t)/t \right) \leq 2\alpha

where \widehat{\mu}_t = t^{-1} \sum_{i=1}^t X_i, and w_t = u(V_t)/t is the data-driven, variance-adaptive width.

A widely used closed-form instantiation is the “polynomial-stitched” boundary for X_t \in [0,1] and coverage 1-2\alpha:

C_t = \widehat{\mu}_t \pm \frac{1}{t}\left(1.7\sqrt{V_t\left[\log\log(2V_t)+3.8\right]} + 3.4\left[\log\log(2V_t)+3.8\right]\right)

as given in Eq. (27) of (Howard et al., 2018).
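The stitched boundary translates directly into code. The sketch below assumes [0,1]-valued data and uses the running mean of past observations as the predictions \hat X_t; the guard flooring 2V_t at e before taking log log is an implementation detail, not part of Eq. (27):

```python
import numpy as np

def stitched_cs(x):
    """Empirical-Bernstein CS with the polynomial-stitched boundary of
    Eq. (27) (Howard et al., 2018) for X_t in [0,1]. The constants
    1.7 / 3.4 / 3.8 encode the fixed coverage level used in the text."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    mu_hat = np.cumsum(x) / t
    xhat = np.concatenate(([0.5], mu_hat[:-1]))   # predictable predictions
    V = np.cumsum((x - xhat) ** 2)
    ll = np.log(np.log(np.maximum(2 * V, np.e)))  # guard log log for tiny V
    w = (1.7 * np.sqrt(V * (ll + 3.8)) + 3.4 * (ll + 3.8)) / t
    return mu_hat - w, mu_hat + w

rng = np.random.default_rng(1)
lo, hi = stitched_cs(rng.binomial(1, 0.3, size=5000))
# the running interval [lo[t], hi[t]] should contain the true mean 0.3
```

Every index of the returned arrays is a valid confidence interval simultaneously, which is what permits continuous monitoring without alpha-spending corrections.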

3. Time-Uniform Coverage and LIL-Optimal Shrinkage

Variance-adaptive CSs provide time-uniform nonasymptotic coverage:

\mathbb P\left(\forall t \ge 1: \mu_t \in [\widehat{\mu}_t \mp u(V_t)/t]\right) \ge 1-2\alpha

The width w_t adapts to the observed variance and, for (sub-)i.i.d. data with variance \sigma^2, V_t \asymp \sigma^2 t, so:

  • w_t \asymp \sqrt{\sigma^2 \log\log t / t}
  • This matches the lower bound dictated by the law of the iterated logarithm (LIL) for uniform-in-time confidence intervals (Howard et al., 2018).
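To see how mild the iterated-logarithm penalty is in practice, the toy computation below evaluates the rate formula at a few horizons (with \sigma^2 = 1/4, the worst case for [0,1]-valued data; this illustrates only the rate, not an actual confidence bound):

```python
import math

# Evaluate w_t ~ sqrt(sigma^2 * loglog(t) / t) at several horizons:
# the loglog factor grows only from ~1.9 to ~2.6 between t = 1e3 and
# t = 1e6, so the width shrinks almost exactly like 1/sqrt(t).
sigma2 = 0.25
widths = [math.sqrt(sigma2 * math.log(math.log(t)) / t)
          for t in (10**3, 10**4, 10**5, 10**6)]
```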

4. Comparison to Fixed-Variance and Other Adaptive CSs

A sub-Gaussian (fixed-variance) CS with worst-case variance (b-a)^2/4 produces

|\widehat{\mu}_t - \mu| \le \frac{b-a}{2} \sqrt{\frac{2\log(1/\alpha)}{t}}

which can be extremely conservative if the actual variance is small.

Empirical-Bernstein (variance-adaptive) CSs instead use the empirical V_t, sharply tightening intervals when the process is low-variance. For Bernoulli(0.01) data, sub-Gaussian CSs can be 5\times wider than the empirical-Bernstein CS (Howard et al., 2018).
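The gap can be checked numerically. The sketch below compares the fixed-variance sub-Gaussian width (worst-case \sigma = 1/2, with \alpha = 0.05 as an illustrative level) against the stitched empirical-Bernstein width on Bernoulli(0.01) data; the exact ratio depends on t, \alpha, and the prediction sequence:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.binomial(1, 0.01, size=100_000).astype(float)
t = len(x)

# Fixed-variance sub-Gaussian width with worst-case variance (b-a)^2/4 = 1/4
w_subg = 0.5 * np.sqrt(2 * np.log(1 / 0.05) / t)

# Stitched empirical-Bernstein width with running-mean predictions
xhat = np.concatenate(([0.5], np.cumsum(x)[:-1] / np.arange(1, t)))
V = ((x - xhat) ** 2).sum()
ll = np.log(np.log(max(2 * V, np.e)))
w_eb = (1.7 * np.sqrt(V * (ll + 3.8)) + 3.4 * (ll + 3.8)) / t

# w_eb comes out substantially smaller than w_subg for this
# low-variance process, since V/t is near 0.01 rather than 0.25
```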

Variance-adaptive CSs have extensions for heavy-tailed and infinite-variance settings, such as Catoni-style CSs for known-variance or p-th-moment bounds (Wang et al., 2022), and CSs integrating heavier-tailed nonnegativity constraints (Mineiro, 2022).

5. Methodological Extensions and Matrix Generalizations

Recent developments yield closed-form, mixture-based empirical-Bernstein CSs for both scalar and matrix means:

V_t = \sum_{i=1}^t \psi_E(|X_i - \hat X_i|), \quad \psi_E(\lambda) = -\ln(1-\lambda) - \lambda

and the width is

W_t = \frac{2}{t} \sqrt{U_t\left(\ell_\alpha + \frac{1}{2}\ln(2U_t)\right)}

with U_t = 1/(2\kappa^2) + V_t, where \ell_\alpha is an explicit log factor.
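A direct transcription of this width in the scalar case (a sketch only: \kappa = 1 and \ell_\alpha = \log(1/\alpha) with \alpha = 0.05 are placeholder choices here, since the text does not spell out the paper's explicit log factor):

```python
import numpy as np

def mixture_eb_width(x, kappa=1.0, ell_alpha=np.log(1 / 0.05)):
    """Mixture-based empirical-Bernstein width W_t (scalar case).
    Requires |X_i - Xhat_i| < 1, e.g. data scaled into (0, 1).
    kappa and ell_alpha are placeholder choices, not the tuned
    values from Chugg et al."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    xhat = np.concatenate(([0.5], np.cumsum(x)[:-1] / t[:-1]))
    d = np.abs(x - xhat)
    V = np.cumsum(-np.log1p(-d) - d)     # psi_E(d) = -log(1-d) - d
    U = 1.0 / (2.0 * kappa**2) + V
    return (2.0 / t) * np.sqrt(U * (ell_alpha + 0.5 * np.log(2.0 * U)))

rng = np.random.default_rng(3)
W = mixture_eb_width(rng.beta(2, 5, size=2000))  # data strictly in (0,1)
```

Note that the whole sequence of widths is available in closed form, with no per-time root-finding, which is the practical appeal of this construction.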

  • For a sequence of symmetric matrices X_t with bounded eigenvalues, the same polynomial structure yields a CS for the maximal eigenvalue deviation:

|\gamma_{\max}(\bar X_t - M_t)| \le W_t

where V_t = \sum_i \psi_E(|X_i - \hat X_i|) (matrix norm), and M_t = t^{-1} \sum_{i=1}^t \mathbb E_{i-1}[X_i] (Chugg et al., 24 Dec 2025).

A key property of these new CSs is that, in the constant-mean, i.i.d. regime, the limiting width scaled by \sqrt{t/\log t} is independent of the confidence level \alpha, a provable improvement over previous closed-form solutions.

6. Applications and Empirical Performance

Variance-adaptive CSs are widely applicable:

  • Covariance matrix estimation
  • Sample average treatment effect inference under the Neyman-Rubin potential outcomes model
  • Bandit algorithms and A/B testing with continuous monitoring
  • Adaptive and safe inference in reinforcement learning and online learning
  • Sampling without replacement, yielding substantial improvements when the sample variance is much less than the worst-case variance of the population (Waudby-Smith et al., 2020)
  • Linear bandits, where variance-adaptive CSs are used to build ellipsoidal confidence sets for \theta^* with widths scaling with the sum of observed conditional variances (Jun et al., 2024)

Empirical studies (Chugg et al., 24 Dec 2025) show these CSs match or outperform previous variance-adaptive CSs and maintain coverage over time horizons of up to 10^6 samples. Performance is especially strong in low-variance, nonstationary, or time-varying-mean settings.

7. Theoretical and Practical Implications

Variance-adaptive CSs represent a sharp advance in anytime valid inference, combining:

  • Time-uniform coverage with LIL-optimal shrinking
  • Fully nonparametric applicability, using data-driven variance proxies
  • The ability to handle non-i.i.d., martingale-dependent, and heavy-tailed settings (with appropriate extensions)
  • Closed-form, practically implementable expressions (e.g., the latest mixture-Bernstein CS (Chugg et al., 24 Dec 2025))
  • A robust foundation in mixture-based or self-normalized martingale concentration, often using the methods of mixture martingales, Ville's inequality, and polynomial “stitching”

Their flexibility and optimality have positioned variance-adaptive CSs as standard primitives in modern sequential estimation, especially as uncertainty quantification tools in high-frequency, online, or nonstationary environments (Howard et al., 2018, Chugg et al., 24 Dec 2025, Wang et al., 2022, Mineiro, 2022, Waudby-Smith et al., 2020, Jun et al., 2024).
