
Variance-Adaptive Confidence Sequences

Updated 25 December 2025
  • Variance-adaptive CSs are sequences of confidence intervals that adapt their width using the cumulative empirical variance, ensuring nonasymptotic and time-uniform coverage.
  • They leverage self-normalization and martingale exponential techniques to achieve optimal shrinking rates, matching limits dictated by the law of the iterated logarithm.
  • These methods extend to heavy-tailed data, matrix mean estimation, and adaptive online inference in settings such as bandit algorithms and reinforcement learning.

A variance-adaptive confidence sequence (CS) is a sequence of confidence intervals for an online, possibly non-i.i.d., stochastic process, whose width adapts at each time t to the empirical variance accumulated so far. Such sequences provide nonasymptotic, nonparametric, and time-uniform coverage guarantees, meaning the probability of ever excluding the true quantity of interest across all times is controlled at a prescribed level. Variance-adaptive CSs generalize classical fixed-variance (sub-Gaussian) boundaries, achieve optimal shrinking rates (including the iterated logarithm law), and have been extended to settings such as matrix mean estimation, heavy-tailed data, and sampling without replacement.

1. Foundations and Nonparametric Setting

Variance-adaptive CSs, particularly those of the “empirical-Bernstein” type, are grounded in minimal assumptions. The prototypical setup involves a sequence of real-valued random variables X_t \in [a,b], a predictable sequence of “predictions” \hat X_t, and the observed filtration \{\mathcal F_t\}. The only technical condition required is that the martingale difference sequence Y_t = X_t - \mathbb E[X_t \mid \mathcal F_{t-1}] is almost surely bounded by c = b-a (Howard et al., 2018).

The primary estimands are:

  • The mean process \mu_t = t^{-1} \sum_{i=1}^t \mathbb E[X_i \mid \mathcal F_{i-1}]
  • The variance process (empirical proxy) V_t = \sum_{i=1}^t (X_i - \hat X_i)^2

These sequences remain valid without independence, identical distribution, or strong tail assumptions.
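As a concrete illustration of these estimands, the variance proxy V_t can be computed with any predictable prediction sequence. The sketch below (not from the cited papers) uses the running mean of the previous observations as \hat X_t, one common predictable choice:

```python
import numpy as np

def variance_proxy(x, first_guess=0.5):
    """Cumulative variance proxy V_t = sum_{i<=t} (X_i - Xhat_i)^2,
    where Xhat_i is the running mean of the PREVIOUS observations
    (a predictable sequence). first_guess is the arbitrary prediction
    at i=1, when no past data exists."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    xhat = np.concatenate(([first_guess], np.cumsum(x)[:-1] / t[:-1]))
    return np.cumsum((x - xhat) ** 2)

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=1000)
V = variance_proxy(x)
# V[t]/t approaches the true variance p(1-p) = 0.21 as t grows
```

Because \hat X_i depends only on X_1, ..., X_{i-1}, the proxy is predictable, which is all the exponential construction in the next section requires.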

2. Empirical-Bernstein (Variance-Adaptive) CS Construction

The empirical-Bernstein confidence sequence is built upon a self-normalization/martingale-exponential construction:

  • For all \lambda \in [0, 1/c),

\mathbb E \left[ \exp \left\{ \lambda \sum_{i=1}^t Y_i - \psi_{E,c}(\lambda) V_t \right\} \right] \leq 1

where \psi_{E,c}(\lambda) = c^{-2}(-\ln(1-c\lambda) - c\lambda) (Howard et al., 2018).

For any “subexponential” uniform boundary u(v),

\mathbb P\left( \sup_{t \geq 1} |\widehat{\mu}_t - \mu_t| > u(V_t)/t \right) \leq 2\alpha

where \widehat{\mu}_t = t^{-1} \sum_{i=1}^t X_i, and w_t = u(V_t)/t is the data-driven, variance-adaptive width.

A widely used closed-form instantiation is the “polynomial-stitched” boundary for X_t \in [0,1] and coverage 1-2\alpha:

C_t = \widehat{\mu}_t \pm \frac{1}{t}\left(1.7\sqrt{V_t\left[\log\log(2V_t)+3.8\right]} + 3.4\left[\log\log(2V_t)+3.8\right]\right)

as given in Eq. (27) of (Howard et al., 2018).
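The stitched boundary translates directly into code. The sketch below assumes [0,1]-valued data and uses the running mean of past observations as the predictions \hat X_t; the guard flooring 2V_t at e before taking log log is an implementation detail, not part of Eq. (27):

```python
import numpy as np

def stitched_cs(x):
    """Empirical-Bernstein CS with the polynomial-stitched boundary of
    Eq. (27) (Howard et al., 2018) for X_t in [0,1]. The constants
    1.7 / 3.4 / 3.8 encode the fixed coverage level used in the text."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    mu_hat = np.cumsum(x) / t
    xhat = np.concatenate(([0.5], mu_hat[:-1]))   # predictable predictions
    V = np.cumsum((x - xhat) ** 2)
    ll = np.log(np.log(np.maximum(2 * V, np.e)))  # guard log log for tiny V
    w = (1.7 * np.sqrt(V * (ll + 3.8)) + 3.4 * (ll + 3.8)) / t
    return mu_hat - w, mu_hat + w

rng = np.random.default_rng(1)
lo, hi = stitched_cs(rng.binomial(1, 0.3, size=5000))
# the running interval [lo[t], hi[t]] should contain the true mean 0.3
```

Every index of the returned arrays is a valid confidence interval simultaneously, which is what permits continuous monitoring without alpha-spending corrections.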

3. Time-Uniform Coverage and LIL-Optimal Shrinkage

Variance-adaptive CSs provide time-uniform nonasymptotic coverage:

\mathbb P\left(\forall t \ge 1: \mu_t \in [\widehat{\mu}_t \mp u(V_t)/t]\right) \ge 1-2\alpha

The width w_t adapts to the observed variance and, for (sub-)i.i.d. data with variance \sigma^2, V_t \asymp \sigma^2 t, so:

  • w_t \asymp \sqrt{\sigma^2 \log\log t / t}
  • This matches the lower bound dictated by the law of the iterated logarithm (LIL) for uniform-in-time confidence intervals (Howard et al., 2018).
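To see how mild the iterated-logarithm penalty is in practice, the toy computation below evaluates the rate formula at a few horizons (with \sigma^2 = 1/4, the worst case for [0,1]-valued data; this illustrates only the rate, not an actual confidence bound):

```python
import math

# Evaluate w_t ~ sqrt(sigma^2 * loglog(t) / t) at several horizons:
# the loglog factor grows only from ~1.9 to ~2.6 between t = 1e3 and
# t = 1e6, so the width shrinks almost exactly like 1/sqrt(t).
sigma2 = 0.25
widths = [math.sqrt(sigma2 * math.log(math.log(t)) / t)
          for t in (10**3, 10**4, 10**5, 10**6)]
```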

4. Comparison to Fixed-Variance and Other Adaptive CSs

A sub-Gaussian (fixed-variance) CS with worst-case variance (b-a)^2/4 produces

|\widehat{\mu}_t - \mu| \le \frac{b-a}{2} \sqrt{\frac{2\log(1/\alpha)}{t}}

which can be extremely conservative if the actual variance is small.

Empirical-Bernstein (variance-adaptive) CSs instead use the empirical V_t, sharply tightening intervals when the process is low-variance. For Bernoulli(0.01) data, sub-Gaussian CSs can be 5\times wider than the empirical-Bernstein CS (Howard et al., 2018).
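The gap can be checked numerically. The sketch below compares the fixed-variance sub-Gaussian width (worst-case \sigma = 1/2, with \alpha = 0.05 as an illustrative level) against the stitched empirical-Bernstein width on Bernoulli(0.01) data; the exact ratio depends on t, \alpha, and the prediction sequence:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.binomial(1, 0.01, size=100_000).astype(float)
t = len(x)

# Fixed-variance sub-Gaussian width with worst-case variance (b-a)^2/4 = 1/4
w_subg = 0.5 * np.sqrt(2 * np.log(1 / 0.05) / t)

# Stitched empirical-Bernstein width with running-mean predictions
xhat = np.concatenate(([0.5], np.cumsum(x)[:-1] / np.arange(1, t)))
V = ((x - xhat) ** 2).sum()
ll = np.log(np.log(max(2 * V, np.e)))
w_eb = (1.7 * np.sqrt(V * (ll + 3.8)) + 3.4 * (ll + 3.8)) / t

# w_eb comes out substantially smaller than w_subg for this
# low-variance process, since V/t is near 0.01 rather than 0.25
```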

Variance-adaptive CSs have extensions for heavy-tailed and infinite-variance settings, such as Catoni-style CSs for known-variance or p-th-moment bounds (Wang et al., 2022), and CSs integrating heavier-tailed nonnegativity constraints (Mineiro, 2022).

5. Methodological Extensions and Matrix Generalizations

Recent developments yield closed-form, mixture-based empirical-Bernstein CSs for both scalar and matrix means:

V_t = \sum_{i=1}^t \psi_E(|X_i - \hat X_i|), \quad \psi_E(\lambda) = -\ln(1-\lambda) - \lambda

and the width is

W_t = \frac{2}{t} \sqrt{U_t\left(\ell_\alpha + \frac{1}{2}\ln(2U_t)\right)}

with U_t = 1/(2\kappa^2) + V_t, where \ell_\alpha is an explicit log factor.
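A direct transcription of this width in the scalar case (a sketch only: \kappa = 1 and \ell_\alpha = \log(1/\alpha) with \alpha = 0.05 are placeholder choices here, since the text does not spell out the paper's explicit log factor):

```python
import numpy as np

def mixture_eb_width(x, kappa=1.0, ell_alpha=np.log(1 / 0.05)):
    """Mixture-based empirical-Bernstein width W_t (scalar case).
    Requires |X_i - Xhat_i| < 1, e.g. data scaled into (0, 1).
    kappa and ell_alpha are placeholder choices, not the tuned
    values from Chugg et al."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    xhat = np.concatenate(([0.5], np.cumsum(x)[:-1] / t[:-1]))
    d = np.abs(x - xhat)
    V = np.cumsum(-np.log1p(-d) - d)     # psi_E(d) = -log(1-d) - d
    U = 1.0 / (2.0 * kappa**2) + V
    return (2.0 / t) * np.sqrt(U * (ell_alpha + 0.5 * np.log(2.0 * U)))

rng = np.random.default_rng(3)
W = mixture_eb_width(rng.beta(2, 5, size=2000))  # data strictly in (0,1)
```

Note that the whole sequence of widths is available in closed form, with no per-time root-finding, which is the practical appeal of this construction.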

  • For a sequence of symmetric matrices X_t with bounded eigenvalues, the same polynomial structure yields a CS for the maximal eigenvalue deviation:

|\gamma_{\max}(\bar X_t - M_t)| \le W_t

where V_t = \sum_i \psi_E(|X_i - \hat X_i|) (matrix norm), and M_t = t^{-1} \sum_{i=1}^t \mathbb E_{i-1}[X_i] (Chugg et al., 24 Dec 2025).

A key property of these new CSs is that, in the constant-mean, i.i.d. regime, the limiting width scaled by \sqrt{t/\log t} is independent of the confidence level \alpha, a provable improvement over previous closed-form solutions.

6. Applications and Empirical Performance

Variance-adaptive CSs are widely applicable:

  • Covariance matrix estimation
  • Sample average treatment effect inference under the Neyman-Rubin potential outcomes model
  • Bandit algorithms and A/B testing with continuous monitoring
  • Adaptive and safe inference in reinforcement learning and online learning
  • Sampling without replacement, yielding substantial improvements when the sample variance is much less than the worst-case variance of the population (Waudby-Smith et al., 2020)
  • Linear bandits, where variance-adaptive CSs are used to build ellipsoidal confidence sets for \theta^* with widths scaling with the sum of observed conditional variances (Jun et al., 2024)

Empirical studies (Chugg et al., 24 Dec 2025) show these CSs match or outperform previous variance-adaptive CSs and maintain coverage over time horizons of up to 10^6 samples. Performance is especially strong in low-variance, nonstationary, or time-varying-mean settings.

7. Theoretical and Practical Implications

Variance-adaptive CSs represent a sharp advance in anytime valid inference, combining:

  • Time-uniform coverage with LIL-optimal shrinking
  • Fully nonparametric applicability, using data-driven variance proxies
  • The ability to handle non-i.i.d., martingale-dependent, and heavy-tailed settings (with appropriate extensions)
  • Closed-form, practically implementable expressions (e.g., the latest mixture-Bernstein CS (Chugg et al., 24 Dec 2025))
  • A robust foundation in mixture-based or self-normalized martingale concentration, often using the methods of mixture martingales, Ville's inequality, and polynomial “stitching”

Their flexibility and optimality have positioned variance-adaptive CSs as standard primitives in modern sequential estimation, especially as uncertainty quantification tools in high-frequency, online, or nonstationary environments (Howard et al., 2018, Chugg et al., 24 Dec 2025, Wang et al., 2022, Mineiro, 2022, Waudby-Smith et al., 2020, Jun et al., 2024).
