Blinded Sample Size Re-estimation
- Blinded sample size re-estimation is a design adaptation method that recalculates study size using pooled variance estimates without revealing treatment effects.
- The method relies on interim nuisance parameter estimates to optimize sample size, thereby reducing the risk of underpowered or overpowered trials.
- It is applied in various trial settings—including superiority, non-inferiority, and multi-arm designs—with pre-specified protocols and simulation to ensure type I error control.
Blinded sample size re-estimation (BSSR) is a mid-trial design adaptation mechanism that allows the sample size of a clinical study to be reassessed based on updated estimates of nuisance parameters, usually the variance, while keeping group labels and treatment effects concealed. BSSR mitigates the risk of underpowered or overpowered studies caused by misspecification of nuisance parameters at the planning stage by leveraging pooled or otherwise blinded interim data, without conducting any formal treatment comparison. This practice is distinct from unblinded SSR, in which interim treatment effects or group labels are revealed, often resulting in operational bias or severe inflation of the type I error rate if not properly controlled.
1. Key Principles and Statistical Framework
The central tenet of BSSR is that sample size adaptations are made solely on interim estimates of nuisance parameters (e.g., outcome variance or event rates), which are obtained from pooled data, with the randomization code or group allocation concealed from those performing the SSR. The decision to increase or decrease the future sample size (e.g., from the planned $n$ to a re-estimated $\tilde{n}$) is therefore independent of any observed treatment effect and, under ideal implementation, preserves the integrity of type I error rate control in subsequent hypothesis testing (Glimm et al., 2013).
Typical scenarios where BSSR is essential:
- The primary endpoint’s variance (or risk parameters for binary/event outcomes) is unknown or highly uncertain at the study design stage.
- Reliable external estimates are unavailable.
- Covariate adjustment or multi-arm/multi-population designs further complicate variance estimation.
BSSR is carried out by calculating an interim variance estimate (often “one-sample”/pooled), re-computing the required total sample size with this value, and completing recruitment accordingly. At no point during the BSSR is any group label or interim treatment effect unmasked.
2. Formal Procedures and Core Methodologies
Canonical Two-Arm or One-Sample Setting
For a superiority trial testing $H_0\colon \mu_T - \mu_C \le 0$ vs. $H_1\colon \mu_T - \mu_C = \delta > 0$ at one-sided level $\alpha$ and power $1-\beta$, the re-estimated total sample size at interim (after $n_1$ subjects), using the blinded sample variance $\hat\sigma^2_{\mathrm{pool}}$, is

$$\tilde{n} = \frac{4\,(z_{1-\alpha} + z_{1-\beta})^2\,\hat\sigma^2_{\mathrm{pool}}}{\delta^2}.$$

The variance estimate is computed from all interim data, without reference to group assignment,

$$\hat\sigma^2_{\mathrm{pool}} = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}\bigl(x_i - \bar{x}\bigr)^2,$$

where the mean $\bar{x}$ pools both/all arms (Glimm et al., 2013, Maeda et al., 3 Feb 2026).
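As a concrete illustration, the blinded re-estimation step can be sketched in a few lines of Python (function and variable names are illustrative, not taken from the cited papers):

```python
import math
from statistics import NormalDist

def blinded_ssr_total_n(interim_data, delta, alpha=0.025, power=0.9):
    """Re-estimate the total two-arm sample size from blinded interim data.

    interim_data: pooled outcomes from all arms, with no group labels.
    delta: the clinically relevant difference assumed at the design stage.
    """
    n1 = len(interim_data)
    xbar = sum(interim_data) / n1
    # One-sample ("blinded") variance estimate pooling all arms
    s2 = sum((x - xbar) ** 2 for x in interim_data) / (n1 - 1)
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    # Total n over both arms for a one-sided level-alpha test
    n_total = 4 * (z_a + z_b) ** 2 * s2 / delta ** 2
    return math.ceil(n_total)
```

No treatment indicator enters the computation, so the adaptation rule is independent of the observed effect.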
Refinements include using an upper confidence limit (UCL) for $\sigma^2$ to guard against underpowered designs, especially when $n_1$ is small:

$$\hat\sigma^2_{\mathrm{UCL}} = \frac{(n_1 - 1)\,\hat\sigma^2_{\mathrm{pool}}}{\chi^2_{n_1-1,\gamma}},$$

with $\chi^2_{n_1-1,\gamma}$ the $\gamma$-quantile of the $\chi^2$ distribution with $n_1 - 1$ degrees of freedom (Maeda et al., 3 Feb 2026).
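A minimal sketch of the UCL refinement, assuming SciPy is available for the chi-square quantile (names are illustrative):

```python
from scipy.stats import chi2

def ucl_variance(s2, n1, gamma=0.25):
    """Upper confidence limit for sigma^2 from a blinded interim variance.

    Replaces s2 in the sample-size formula to guard against underpowering:
    UCL = (n1 - 1) * s2 / chi2-quantile(gamma, n1 - 1 df).
    A smaller gamma yields a more conservative (larger) variance and hence
    a larger re-estimated sample size.
    """
    return (n1 - 1) * s2 / chi2.ppf(gamma, df=n1 - 1)
```

The UCL value is then substituted for the pooled variance in the sample-size formula above.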
Beyond the Two-Arm Design
BSSR methodologies have been extended to:
- Three-arm "gold standard" trials (with non-inferiority and superiority margins), using specialized unbiased variance estimators such as the Xing–Ganju method and precomputed inflation factors for conservative power protection (Mütze et al., 2016).
- ANCOVA settings with multiple covariates—re-estimating the pooled residual variance of blinded regressions for adjusted sample size calculation without distributional assumptions (Zimmermann et al., 2018, Kanata et al., 26 Aug 2025).
- Crossover trials, using within- and between-patient variance components computed from blinded, period-balanced block randomization (Grayling et al., 2018).
- Multi-composite or subpopulation analyses, where blinded residuals from stratified models inform re-estimation in complex closed-testing procedures (Gera et al., 2020).
- Cluster-randomized stepped wedge trials, where cluster and individual variance components are derived from pooled data to update the per-cluster-period sample size (Grayling et al., 2017).
- Hybrid trials leveraging external controls via blinded inverse-probability weighting to re-estimate sample size as a function of measured distributional discrepancy (Kojima et al., 18 Jun 2025).
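To make the ANCOVA case concrete, a blinded residual variance can be sketched as follows: the pooled outcomes are regressed on baseline covariates only, with the treatment indicator deliberately omitted, so no group labels are required (a simplified sketch; the cited methods add small-sample corrections):

```python
import numpy as np

def blinded_ancova_residual_variance(y, X):
    """Blinded residual variance for ANCOVA-based re-estimation.

    y: pooled outcomes (all arms, unlabelled); X: baseline covariates.
    The treatment indicator is deliberately excluded from the design
    matrix, so the fit uses only blinded data. The residual variance
    then replaces sigma^2 in the usual sample-size formula.
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])      # intercept + covariates
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return float(resid @ resid / (n - p - 1))
```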
3. Error Control, Bias, and Small-Sample Properties
BSSR is designed to preserve the nominal type I error rate because the adaptation rule does not depend on the treatment effect or interim comparisons. In large samples, this is generally achieved, but non-negligible inflation occurs in specific settings:
- Small-sample "borderline" cases: in very small one-sample trials, even a purely variance-based BSSR can push the type I error slightly above nominal, due to incomplete "probability mass subtraction" for overlapping rejection regions (Glimm et al., 2013).
- Non-inferiority and equivalence testing: here, BSSR (using the stage-1 pooled variance) can conditionally decrease the stage-2 sample size when interim data favor equivalence/non-inferiority, leading to notable type I error inflation, up to several percentage points in unfavorable configurations (Glimm et al., 2013, Glimm et al., 2019).
- Correction methods: permutation/rotation testing, combination $p$-value methods, and simulation-based calibration of critical values yield exact type I error control in small-sample settings. Pre-specification of the BSSR protocol and simulation of the interim scenarios are uniformly recommended.
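A permutation test of the kind recommended above can be sketched as follows (a generic two-sample difference-in-means permutation test, not the exact routine of any cited paper):

```python
import random

def permutation_pvalue(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    After a blinded SSR, referencing the test statistic to its permutation
    distribution (rather than the t distribution) gives exact
    finite-sample type I error control regardless of the re-estimation
    rule, because group labels are exchangeable under H0.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    nx, ny = len(x), len(y)
    obs = abs(sum(x) / nx - sum(y) / ny)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = abs(sum(pooled[:nx]) / nx - sum(pooled[nx:]) / ny)
        if d >= obs:
            hits += 1
    # Add-one correction keeps the p-value strictly positive and valid
    return (hits + 1) / (n_perm + 1)
```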
4. Best Practices, Extensions, and Practical Implementation
Superiority Trials
- For moderate to large interim sample sizes, naive variance-based BSSR followed by the classical $t$-test preserves the type I error to within 0.1% of nominal (Glimm et al., 2013, Maeda et al., 3 Feb 2026).
- For very small samples, or when even small-sample error inflation is intolerable, employ exact routines (permutation tests, combination tests, or numerically calibrated thresholds).
- For highly adaptive internal pilots or efficient protection against underpower, UCL-based BSSR is preferred, especially for small interim sample sizes (Maeda et al., 3 Feb 2026).
Non-inferiority and Equivalence Settings
- Avoid "blind" variance re-estimation when the sample size formula depends on the observed difference—this structurally unblinds the effect direction and inflates type I error (Glimm et al., 2013, Glimm et al., 2019).
- Instead, rely on fixed-sample-size designs or on fully pre-specified, alpha-controlled adaptive procedures with conditional error functions or combination testing.
Multigroup/Composite/Cluster/Crossover/Hybrid Control Designs
- Use tailored blinded variance estimators:
- Block/rank-based methods for balanced crossover (Grayling et al., 2018).
- ANCOVA-residual estimation with appropriate small-sample corrections for multiple covariates (Kanata et al., 26 Aug 2025, Zimmermann et al., 2018).
- For clusters, solve two-moment equations from blinded pooled and cluster-wise within means (Grayling et al., 2017).
- In hybrid-control designs, re-estimate using IPW-adjusted variances, or using the weights alone without outcome data (strict blinding), depending on operational and statistical requirements (Kojima et al., 18 Jun 2025).
- Inflation factors (Zucker-type) or simulations may be required for extreme pilot sizes or high multiplicity to ensure adequate power (Mütze et al., 2016, Gera et al., 2020).
- For composite binary/continuous endpoints, endpoint selection and variance/correlation estimation can be performed based purely on pooled data, followed by BSSR (Roig et al., 2022, Gera et al., 2020).
Continuous Information Monitoring
- BSSR can be embedded in continuously monitored designs for Gaussian or recurrent outcome settings, using blinded updates of variance, event rate, or Fisher information. Proper calibration ensures maintenance of type I error and minimal loss of power. Mixture and lumping likelihoods enable robust estimation, especially under time trends for event counts (Xu et al., 29 Jul 2025, Mütze et al., 2019).
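A blinded information-monitoring rule of this kind can be sketched for the Gaussian case as follows (a simplified illustration; the target information and stopping rule follow the standard fixed-design formula, and all names are illustrative):

```python
from statistics import NormalDist, variance

def monitor_information(stream, delta, alpha=0.025, power=0.9, min_n=10):
    """Continuous blinded information monitoring (a sketch).

    Recruitment stops once the blinded estimate of the Fisher information
    for the treatment difference, n / (4 * s2) under 1:1 allocation,
    reaches the fixed-design target ((z_alpha + z_beta) / delta)^2.
    `stream` yields pooled outcomes in accrual order, without group
    labels; returns the total sample size at which the target is met.
    """
    q = NormalDist().inv_cdf
    target = ((q(1 - alpha) + q(power)) / delta) ** 2
    data = []
    for obs in stream:
        data.append(obs)
        n = len(data)
        if n >= min_n and n / (4 * variance(data)) >= target:
            return n
    return len(data)  # stream exhausted before target information
```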
5. Simulation Evidence and Empirical Performance
Simulation studies across numerous trial structures demonstrate:
- For moderate to large interim sample sizes, type I error and power are maintained at the design values.
- A very small interim sample imparts higher variance on the re-estimated sample size; UCL- or inflation-factor-based BSSR yields power close to target, albeit at the cost of an increased average sample size (Maeda et al., 3 Feb 2026, Mütze et al., 2016).
- In complex designs (subgroup, composite, or cluster), BSSR preserves strong familywise error control and power if the correct multiple testing correction and pilot-variance estimation scheme are applied (Gera et al., 2020, Grayling et al., 2017).
- Over- or under-powering due to misspecification of nuisance parameters (e.g., the variance) at the design stage is directly corrected by BSSR when properly implemented.
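Such operating characteristics are straightforward to check by simulation. The following sketch estimates the type I error of a naive blinded SSR with a final z-test under H0 (an illustrative Monte Carlo, not any cited paper's simulation code):

```python
import math
import random
from statistics import NormalDist

def simulate_type1(n1=20, delta=1.0, alpha=0.025, power=0.9,
                   n_sim=2000, seed=1):
    """Monte Carlo check of type I error for a naive blinded SSR design.

    Under H0 both arms are N(0, 1). Stage 1 enrols n1 per arm; the
    blinded pooled variance drives the re-estimated per-arm size; the
    final one-sided z-test is applied to all data. The rejection rate
    should land near alpha.
    """
    rng = random.Random(seed)
    q = NormalDist().inv_cdf
    z_a, z_b = q(1 - alpha), q(power)
    rejections = 0
    for _ in range(n_sim):
        a = [rng.gauss(0, 1) for _ in range(n1)]
        b = [rng.gauss(0, 1) for _ in range(n1)]
        pooled = a + b
        m = sum(pooled) / len(pooled)
        s2 = sum((x - m) ** 2 for x in pooled) / (len(pooled) - 1)
        # Blinded re-estimation of the per-arm sample size
        n_new = max(n1, math.ceil(2 * (z_a + z_b) ** 2 * s2 / delta ** 2))
        a += [rng.gauss(0, 1) for _ in range(n_new - n1)]
        b += [rng.gauss(0, 1) for _ in range(n_new - n1)]
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        va = sum((x - ma) ** 2 for x in a) / (n - 1)
        vb = sum((x - mb) ** 2 for x in b) / (n - 1)
        if (ma - mb) / math.sqrt((va + vb) / n) > z_a:
            rejections += 1
    return rejections / n_sim
```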
6. Recommendations and Limitations
| Setting | BSSR Rule/Method | Error Control/Remarks |
|---|---|---|
| Superiority, moderate/large $n_1$ | Naive pooled variance or UCL | Inflation within 0.1% |
| Superiority, small $n_1$ | Exact permutation/combination | Exact, simulation-backed |
| Non-inferiority/equivalence | Do not use simple blinded SSR | Severe type I error inflation |
| ANCOVA, covariate-rich | Pooled residual variance, ARE-based | Robust, distribution-free |
| Cluster/SW, crossover | Study-specific, block/contrast methods | Needs careful randomization |
| Hybrid-control, external | IPW-adjusted variance or weights-only | Protection against bias |
| Composite/multipopulation | Blinded ANCOVA residuals by subset | Strong FWER, power by design |
Caveats:
- Small interim sizes increase final variability.
- All SSR rules and correction factors must be fully pre-specified in the protocol.
- Non-inferiority/equivalence BSSR is generally contraindicated except with complex, alpha-protected methods.
- Regulatory guidance uniformly favors blinding of all interim calculations.
7. Recent Advances and Outlook
Recent developments include UCL-based SSR for guaranteed power at small interim sizes (Maeda et al., 3 Feb 2026); composite and endpoint-adaptive procedures with blinded SSR and strong error control for binary and continuous endpoints (Roig et al., 2022, Gera et al., 2020); robust, nonparametric BSSR for ANCOVA under arbitrary distributional misspecification (Kanata et al., 26 Aug 2025); and continuous information monitoring via blinded estimation in Gaussian and count processes (Xu et al., 29 Jul 2025, Mütze et al., 2019). Across these frameworks, the commonality is that all parameter updates and decisions at interim rely exclusively on blinded, pooled, or stochastic summary data, ensuring that operational bias is minimized and nominal statistical properties are preserved. As regulatory adoption continues, nuanced protocol pre-specification and rigorous simulation-based assessment of operating characteristics for each adaptation are expected to become standard practice.