
Blinded Sample Size Re-estimation

Updated 4 February 2026
  • Blinded sample size re-estimation is a design adaptation method that recalculates study size using pooled variance estimates without revealing treatment effects.
  • The method relies on interim nuisance parameter estimates to optimize sample size, thereby reducing the risk of underpowered or overpowered trials.
  • It is applied in various trial settings—including superiority, non-inferiority, and multi-arm designs—with pre-specified protocols and simulation to ensure type I error control.

Blinded sample size re-estimation (BSSR) is a mid-trial design adaptation mechanism that allows the sample size of a clinical study to be reassessed based on updated estimates of nuisance parameters, usually the variance, while keeping group labels and treatment effects concealed. BSSR mitigates the risk of underpowered or overpowered studies caused by misspecification of nuisance parameters at the planning stage by leveraging pooled or otherwise blinded interim data, without conducting any formal treatment comparison. This distinguishes it from unblinded SSR, in which interim treatment effects or group labels are revealed, often resulting in operational bias or severe inflation of the type I error rate if not properly controlled.

1. Key Principles and Statistical Framework

The central tenet of BSSR is that sample size adaptations are made solely on interim estimates of nuisance parameters (e.g., outcome variance or event rates), obtained from pooled data with the randomization code or group allocation concealed from those performing the SSR. The decision to increase or decrease the future sample size (e.g., from $n_1$ to $n_1 + n_2$) is therefore independent of any observed treatment effect and, under ideal implementation, preserves the integrity of type I error rate control in subsequent hypothesis testing (Glimm et al., 2013).

Typical scenarios where BSSR is essential:

  • The primary endpoint’s variance (or risk parameters for binary/event outcomes) is unknown or highly uncertain at the study design stage.
  • Reliable external estimates are unavailable.
  • Covariate adjustment or multi-arm/multi-population designs further complicate variance estimation.

BSSR is carried out by calculating an interim variance estimate (often “one-sample”/pooled), re-computing the required total sample size with this value, and completing recruitment accordingly. At no point during the BSSR is any group label or interim treatment effect unmasked.

2. Formal Procedures and Core Methodologies

Canonical Two-Arm or One-Sample Setting

For a superiority trial testing $H_0: \mu_1 - \mu_0 = 0$ versus $H_1: \mu_1 - \mu_0 = \delta$ at one-sided level $\alpha$ and power $1-\beta$, the re-estimated total sample size at interim (after $n_{\text{int}}$ subjects), using the blinded sample variance $\widehat{\sigma}^2$, is

$$\widehat{n}_{\text{fin}} = \frac{(z_\alpha + z_{1-\beta})^2}{\delta^2}\,\widehat{\sigma}^2.$$

This $\widehat{\sigma}^2$ is computed from all interim data, without reference to group assignment:

$$\widehat{\sigma}_{OS}^2 = \frac{1}{n_{\text{int}} - 1} \sum_{i=1}^{n_{\text{int}}} \left(Y_i - \overline{Y}_{\text{int}}\right)^2,$$

where $Y_i$ pools both/all arms (Glimm et al., 2013, Maeda et al., 3 Feb 2026).
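As a concrete sketch of this computation (illustrative Python, not from the cited papers; the function names are mine, the one-sample variance and ceiling rounding follow the formulas above, and $z_\alpha$ is taken as the upper-$\alpha$ normal quantile):

```python
import math
from statistics import NormalDist  # standard library; no SciPy needed

def blinded_variance(y):
    """One-sample ("blinded") variance of the pooled interim outcomes;
    treatment labels are never consulted."""
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 for yi in y) / (n - 1)

def reestimated_total_n(sigma2_hat, delta, alpha=0.025, beta=0.10):
    """n_fin = (z_alpha + z_{1-beta})^2 / delta^2 * sigma2_hat,
    rounded up to the next integer."""
    inv = NormalDist().inv_cdf
    z_a, z_b = inv(1 - alpha), inv(1 - beta)
    return math.ceil((z_a + z_b) ** 2 / delta ** 2 * sigma2_hat)
```

For example, with $\widehat{\sigma}^2 = 1$, $\delta = 0.5$, one-sided $\alpha = 0.025$, and 90% power, `reestimated_total_n(1.0, 0.5)` gives 43 (the ceiling of roughly 42.03).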

Refinements include using an upper confidence limit (UCL) for $\sigma^2$ to guard against underpowered designs, especially when $n_{\text{int}}$ is small:

$$\overline{\sigma}^2_{U,1-\gamma} = \widehat{\sigma}_{OS}^2 \cdot \frac{n_{\text{int}}-1}{d_{1-\gamma}},$$

with $d_{1-\gamma}$ the $(1-\gamma)$ quantile of the $\chi^2_{n_{\text{int}}-1}$ distribution (Maeda et al., 3 Feb 2026).
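A minimal stdlib-only sketch of the UCL adjustment (illustrative; it uses the Wilson–Hilferty approximation to the chi-square quantile to stay dependency-free, and indexes the quantile from the lower tail so that the limit exceeds the point estimate; the same point is written $d_{1-\gamma}$ under an upper-tail convention):

```python
import math
from statistics import NormalDist

def chi2_quantile(p, k):
    """Wilson-Hilferty approximation to the p-quantile of the chi-square
    distribution with k degrees of freedom (adequate for this sketch;
    swap in scipy.stats.chi2.ppf for exact values)."""
    z = NormalDist().inv_cdf(p)
    return k * (1.0 - 2.0 / (9.0 * k) + z * math.sqrt(2.0 / (9.0 * k))) ** 3

def ucl_variance(sigma2_os, n_int, gamma=0.25):
    """(1 - gamma) upper confidence limit for sigma^2: the blinded
    estimate is scaled by (n_int - 1) / d, where d is the lower
    gamma-quantile of chi-square with n_int - 1 degrees of freedom,
    so the limit exceeds the point estimate."""
    d = chi2_quantile(gamma, n_int - 1)
    return sigma2_os * (n_int - 1) / d
```

Plugging the UCL rather than $\widehat{\sigma}_{OS}^2$ into the sample-size formula buys conservative power protection at the cost of a larger expected sample size.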

Beyond the Two-Arm Design

BSSR methodologies have been extended to:

  • Three-arm "gold standard" trials (with non-inferiority and superiority margins), using specialized unbiased variance estimators such as the Xing–Ganju method and precomputed inflation factors for conservative power protection (Mütze et al., 2016).
  • ANCOVA settings with multiple covariates—re-estimating the pooled residual variance of blinded regressions for adjusted sample size calculation without distributional assumptions (Zimmermann et al., 2018, Kanata et al., 26 Aug 2025).
  • Crossover trials, using within- and between-patient variance components computed from blinded, period-balanced block randomization (Grayling et al., 2018).
  • Multi-composite or subpopulation analyses, where blinded residuals from stratified models inform re-estimation in complex closed-testing procedures (Gera et al., 2020).
  • Cluster-randomized stepped wedge trials, where cluster and individual variance components are derived from pooled data to update the per-cluster-period sample size (Grayling et al., 2017).
  • Hybrid trials leveraging external controls via blinded inverse-probability weighting to re-estimate sample size as a function of measured distributional discrepancy (Kojima et al., 18 Jun 2025).
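To make the ANCOVA case concrete, here is a hypothetical single-covariate sketch of the blinded pooled-residual idea: outcomes are regressed on a baseline covariate only (never on treatment), and the residual variance feeds the sample-size formula in place of $\widehat{\sigma}_{OS}^2$. This is my illustration, not the cited authors' implementation:

```python
def blinded_residual_variance(y, x):
    """Residual variance from a blinded simple regression of pooled
    outcomes y on one baseline covariate x; treatment indicators are
    excluded by construction, so no effect estimate is ever formed."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                       # slope of the blinded regression
    a = my - b * mx                     # intercept
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return rss / (n - 2)                # n - 2 d.o.f.: intercept + slope
```

Because the covariate explains part of the outcome variability, this residual variance is typically smaller than the raw pooled variance, yielding a smaller re-estimated sample size.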

3. Error Control, Bias, and Small-Sample Properties

BSSR is designed to preserve the nominal type I error rate because the adaptation rule does not depend on the treatment effect or interim comparisons. In large samples, this is generally achieved, but non-negligible inflation occurs in specific settings:

  • Small-sample "borderline" cases: For $n_1 = 2$ (one-sample), even a purely variance-based BSSR can result in type I error slightly above nominal due to incomplete "probability mass subtraction" for overlapping rejection regions (Glimm et al., 2013).
  • Non-inferiority and equivalence testing: Here, BSSR (using the stage-1 pooled variance) can conditionally decrease stage-2 sample size when interim data favor equivalence/non-inferiority, leading to notable type I error inflation (up to several percent for $n_1 < 20$) (Glimm et al., 2013, Glimm et al., 2019).
  • Correction methods: Permutation/rotation testing, combination $p$-value methods, and simulation-based calibration of critical values yield exact type I error in small-sample settings. Pre-specification of the BSSR protocol and interim-case simulation is uniformly recommended.
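The behavior discussed above can be probed directly by simulation. The sketch below (illustrative Python under assumed normal outcomes; the final test uses a normal approximation rather than the exact $t$ calibration of the cited work, and all parameter choices are mine) estimates the type I error of a two-arm superiority trial with variance-only BSSR under $H_0$:

```python
import math
import random
from statistics import NormalDist

def simulate_type1(n1=20, delta=0.5, sigma=1.0, alpha=0.025, beta=0.10,
                   n_sims=2000, seed=1):
    """Monte Carlo estimate of the type I error of a two-arm trial with
    blinded (variance-only) SSR; the final test is a one-sided z-test."""
    rng = random.Random(seed)
    inv = NormalDist().inv_cdf
    z_a, z_b = inv(1 - alpha), inv(1 - beta)
    rejections = 0
    for _ in range(n_sims):
        # Stage 1: n1 subjects per arm, both arms N(0, sigma^2) under H0.
        stage1 = [rng.gauss(0.0, sigma) for _ in range(2 * n1)]
        # Blinded one-sample variance: group labels are never consulted.
        m = sum(stage1) / len(stage1)
        s2 = sum((y - m) ** 2 for y in stage1) / (len(stage1) - 1)
        # Re-estimated per-arm size for a two-sample comparison.
        n = max(n1, math.ceil(2 * (z_a + z_b) ** 2 * s2 / delta ** 2))
        extra = n - n1
        arm0 = stage1[:n1] + [rng.gauss(0.0, sigma) for _ in range(extra)]
        arm1 = stage1[n1:] + [rng.gauss(0.0, sigma) for _ in range(extra)]
        m0, m1 = sum(arm0) / n, sum(arm1) / n
        s2f = (sum((y - m0) ** 2 for y in arm0)
               + sum((y - m1) ** 2 for y in arm1)) / (2 * n - 2)
        if (m1 - m0) / math.sqrt(2 * s2f / n) > z_a:
            rejections += 1
    return rejections / n_sims
```

With a moderate stage-1 size, the empirical rejection rate lands near the nominal one-sided $\alpha$, consistent with the claim that variance-only adaptation leaves error control essentially intact.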

4. Best Practices, Extensions, and Practical Implementation

Superiority Trials

  • For $n_1 \gtrsim 10$, naive variance-based BSSR followed by the classical $t$-test preserves type I error to within 0.1% of nominal (Glimm et al., 2013, Maeda et al., 3 Feb 2026).
  • For $n_1 < 10$ or when small-sample error is intolerable, employ exact-$\alpha$ routines (permutation, combination, or numerically calibrated thresholds).
  • For highly adaptive internal pilots or efficient protection against underpower, UCL-based BSSR is preferred, especially for small interim sample sizes (Maeda et al., 3 Feb 2026).
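As one ingredient of such exact-$\alpha$ routines, a generic one-sided Monte Carlo permutation test looks as follows (a simplified sketch; the calibrated small-sample procedures in the cited papers additionally account for the re-estimation rule itself):

```python
import random

def permutation_pvalue(arm0, arm1, n_perm=5000, seed=0):
    """One-sided Monte Carlo permutation p-value for the observed
    difference mean(arm1) - mean(arm0)."""
    rng = random.Random(seed)
    obs = sum(arm1) / len(arm1) - sum(arm0) / len(arm0)
    pooled = list(arm0) + list(arm1)
    n0 = len(arm0)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-assign group labels at random
        diff = (sum(pooled[n0:]) / (len(pooled) - n0)
                - sum(pooled[:n0]) / n0)
        if diff >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps p in (0, 1]
```

Because the reference distribution is built by re-labeling the pooled data, the test's level is exact by construction, which is what makes permutation approaches attractive when $n_1$ is too small for asymptotic guarantees.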

Non-inferiority and Equivalence Settings

  • Avoid "blind" variance re-estimation when the sample size formula depends on the observed difference—this structurally unblinds the effect direction and inflates type I error (Glimm et al., 2013, Glimm et al., 2019).
  • Instead, rely on fixed-$N$ designs or fully pre-specified, alpha-controlled adaptive procedures with conditional error functions or combination testing.

Multigroup/Composite/Cluster/Crossover/Hybrid Control Designs

  • For these settings, apply the design-specific blinded estimators summarized in Section 2 (Xing–Ganju-type variance estimators, blinded pooled residuals, or variance components from blinded block structure), subject to the error-control caveats of Section 3.

Continuous Information Monitoring

  • BSSR can be embedded in continuously monitored designs for Gaussian or recurrent outcome settings, using blinded updates of variance, event rate, or Fisher information. Proper calibration ensures maintenance of type I error and minimal loss of power. Mixture and lumping likelihoods enable robust estimation, especially under time trends for event counts (Xu et al., 29 Jul 2025, Mütze et al., 2019).
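A toy version of blinded information monitoring for a normal endpoint (illustrative only; real implementations use the likelihood-based blinded estimators of the cited work): recruitment proceeds in pairs, the blinded variance is updated after each pair, and accrual stops once the blinded Fisher information $n/(2\widehat{\sigma}^2)$ for the mean difference reaches its target $(z_\alpha + z_{1-\beta})^2/\delta^2$:

```python
from statistics import NormalDist

def blinded_info_monitor(stream, delta, alpha=0.025, beta=0.10, min_pairs=5):
    """Consume pooled outcomes from `stream` (an iterator) two at a time,
    one per arm, and stop when the blinded information reaches target.
    Returns the final per-arm sample size."""
    inv = NormalDist().inv_cdf
    target = (inv(1 - alpha) + inv(1 - beta)) ** 2 / delta ** 2
    data = []
    while True:
        data.append(next(stream))
        data.append(next(stream))      # one new pair recruited
        k = len(data) // 2             # current per-arm size
        if k < max(min_pairs, 2):
            continue                   # too little data to estimate
        m = sum(data) / len(data)
        s2 = sum((y - m) ** 2 for y in data) / (len(data) - 1)
        if k / (2.0 * s2) >= target:   # blinded Fisher information
            return k
```

The monitor never sees group labels: only the pooled variance enters the stopping rule, so the stopping time carries no information about the treatment effect.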

5. Simulation Evidence and Empirical Performance

Simulation studies across numerous trial structures demonstrate:

  • For moderate to large $n_1$, type I error and power are maintained at the design values.
  • Very small $n_1$ imparts higher variance on the re-estimated $N$; UCL- or inflation factor-based BSSR yields power close to target, albeit at the cost of increased average sample size (Maeda et al., 3 Feb 2026, Mütze et al., 2016).
  • In complex designs (subgroup, composite, or cluster), BSSR preserves strong familywise error control and power if the correct multiple testing correction and pilot-variance estimation scheme are applied (Gera et al., 2020, Grayling et al., 2017).
  • Over- or under-powering due to variance- or effect-misspecification at design is directly corrected by BSSR when properly implemented.

6. Recommendations and Limitations

| Setting | BSSR Rule/Method | Error Control/Remarks |
|---|---|---|
| Superiority, $n_1 \geq 10$ | Naive pooled variance or UCL | Inflation < 0.1% |
| Superiority, $n_1 < 10$ | Exact-$\alpha$ permutation/combination | Exact, simulation-backed |
| Non-inferiority/equivalence | Do not use simple blinded SSR | Severe type I error inflation |
| ANCOVA, covariate-rich | Pooled residual variance, ARE-based | Robust, distribution-free |
| Cluster/SW, crossover | Study-specific, block/contrast methods | Needs careful randomization |
| Hybrid-control, external | IPW-adjusted variance or weights-only | Protection against bias |
| Composite/multipopulation | Blinded ANCOVA residuals by subset | Strong FWER, power by design |

Caveats:

  • Small interim sizes increase final $N$ variability.
  • All SSR rules and correction factors must be fully pre-specified in the protocol.
  • Non-inferiority/equivalence BSSR is generally contraindicated except with complex, alpha-protected methods.
  • Regulatory guidance uniformly favors blinding of all interim calculations.

7. Recent Advances and Outlook

Recent developments include UCL-based SSR for guaranteed power at small interim sizes (Maeda et al., 3 Feb 2026); composite and endpoint-adaptive procedures with blinded SSR and strong error control for binary and continuous endpoints (Roig et al., 2022, Gera et al., 2020); robust, nonparametric BSSR for ANCOVA under arbitrary distributional misspecification (Kanata et al., 26 Aug 2025); and continuous information monitoring via blinded estimation in Gaussian and count processes (Xu et al., 29 Jul 2025, Mütze et al., 2019). Across these frameworks, the common feature is that all interim parameter updates and decisions rely exclusively on blinded, pooled, or stochastic summary data, minimizing operational bias and preserving nominal statistical properties. As regulatory adoption continues, careful protocol pre-specification and rigorous simulation-based assessment of operating characteristics for each adaptation are expected to become standard practice.
