Finite Sample Minimax Regret Rules

Updated 8 January 2026
  • The paper presents a framework that minimizes worst-case regret in treatment selection by comparing observed outcomes to an oracle benchmark.
  • It employs a two-stage Neyman allocation method to allocate samples based on variance estimates, ensuring robust treatment recommendations under finite data.
  • Key extensions include adaptations for partial identification, nonlinear regret criteria, and nonparametric settings, broadening practical applicability.

Finite sample minimax regret treatment rules provide a principled and rigorous foundation for treatment selection under uncertainty, focusing on procedures that minimize the worst-case regret, that is, the maximal expected welfare loss relative to an oracle assignment, after observing finite experimental data. This framework is central to decision theory in statistics and economics, where regret serves as a robust metric for evaluating performance, especially when exact identification is unattainable or probabilistic models imperfectly describe experimental noise and treatment heterogeneity.

1. Formal Problem Definition and Regret Criteria

A finite sample minimax regret treatment rule is defined for a fixed, finite experiment, after which a policymaker must select among competing interventions using observed data. Consider binary treatments $D \in \{0, 1\}$. Potential outcomes $Y_0, Y_1$ have unknown means $\mu_0, \mu_1$ (possibly belonging to a parameter space $\Theta \subset \mathbb{R}$) and bounded variances. The experiment consists of $N$ subjects, each assigned treatments according to a pre-specified or adaptive rule, with outcomes observed sequentially. Upon completion, the policymaker selects an assignment rule $\widehat{d}$ (possibly randomized) informed by the data.

Regret for parameter values $(\mu_0, \mu_1)$ is

$$\mathrm{Regret}^\delta(N; \mu_0, \mu_1) = \mathbb{E}\left[\mu_{d^*} - \mu_{\widehat{d}}\right],$$

where $d^* = \arg\max_{d} \mu_d$. The minimax regret $R_N(\delta)$ is

$$R_N(\delta) = \sup_{(\mu_0, \mu_1) \in \Theta^2} \mathrm{Regret}^\delta(N; \mu_0, \mu_1).$$

The objective is to design $\delta$ such that $R_N(\delta)$ is minimized, thus ensuring optimal performance even against worst-case parameter configurations (Kato, 9 Dec 2025).
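The regret of a candidate rule at fixed parameter values can be estimated by Monte Carlo simulation. The sketch below (stdlib Python, Gaussian outcomes, an empirical-success rule; function names and the specific setup are illustrative, not from the paper) evaluates $\mathbb{E}[\mu_{d^*} - \mu_{\widehat{d}}]$ at a given $(\mu_0, \mu_1)$:

```python
import random
import statistics

def regret_empirical_success(mu0, mu1, sigma, n_per_arm, reps=20_000, seed=0):
    """Monte Carlo estimate of Regret^delta(N; mu0, mu1) for the
    empirical-success rule (pick the arm with the larger sample mean).
    Toy Gaussian setup with a common sigma, balanced allocation."""
    rng = random.Random(seed)
    oracle = max(mu0, mu1)            # welfare of the oracle assignment d*
    loss = 0.0
    for _ in range(reps):
        m0 = statistics.fmean(rng.gauss(mu0, sigma) for _ in range(n_per_arm))
        m1 = statistics.fmean(rng.gauss(mu1, sigma) for _ in range(n_per_arm))
        chosen = mu1 if m1 >= m0 else mu0   # welfare of the selected arm
        loss += oracle - chosen
    return loss / reps

# Regret peaks at local alternatives of order 1/sqrt(N); at large gaps
# misselection is rare and regret is near zero:
print(regret_empirical_success(0.0, 0.2, 1.0, n_per_arm=50))
print(regret_empirical_success(0.0, 5.0, 1.0, n_per_arm=50))
```

Sweeping $(\mu_0, \mu_1)$ over a grid and taking the maximum of such estimates approximates the sup in the definition of $R_N(\delta)$.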

2. Minimax Lower Bounds: Information-Theoretic Limits

Tight lower bounds for minimax regret are derived via change-of-measure arguments. For regular outcomes (sub-Gaussian or exponential-family tails, compact parameter space), no adaptive rule asymptotically achieves regret below order $O(N^{-1/2})$, with exact constant $(\overline{\sigma}_1 + \overline{\sigma}_0)\Phi(-1)$, where $\overline{\sigma}_d$ denotes the standard deviation of outcomes under treatment $d$ and $\Phi$ is the standard normal CDF:

$$\inf_\delta \, \liminf_{N \to \infty} \sqrt{N} \, R_N(\delta) \geq (\overline{\sigma}_1 + \overline{\sigma}_0)\Phi(-1).$$

The bound is constructed from local alternatives and carefully crafted likelihood bounds (Kaufmann et al.), optimizing over assignment fractions proportional to the outcome standard deviations (Kato, 9 Dec 2025).
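Numerically, $\Phi(-1) \approx 0.1587$, so the leading term of the bound is easy to tabulate. A minimal sketch (stdlib only; the function names are illustrative):

```python
import math

def phi_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def asymptotic_regret_bound(sigma0, sigma1, n):
    """Leading term (sigma1 + sigma0) * Phi(-1) / sqrt(N) of the minimax
    lower bound; a numerical illustration of the constant, nothing more."""
    return (sigma1 + sigma0) * phi_cdf(-1.0) / math.sqrt(n)

print(round(phi_cdf(-1.0), 4))                    # Phi(-1) ~ 0.1587
print(asymptotic_regret_bound(1.0, 1.0, 10_000))  # ~ 0.0032 at N = 10,000
```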

3. Minimax Regret Rule Construction: Two-Stage Neyman Allocation and Extensions

The canonical finite-sample minimax regret-optimal design is the two-stage Neyman allocation (TSNA):

  • Stage 1 (Exploration): Allocate $rN/2$ samples to each treatment for variance estimation.
  • Stage 2 (Adaptive Targeting): Assign the remaining $N - rN$ samples in proportion to the estimated standard deviations:

$$\widehat{w} = \frac{\widehat{\sigma}_1}{\widehat{\sigma}_1 + \widehat{\sigma}_0},$$

with each unit assigned treatment $D_t \sim \mathrm{Bernoulli}(\widehat{w})$.

  • Decision Phase: Select the treatment with the larger sample mean as the final recommendation.

The regret under TSNA is tightly upper bounded through martingale central limit and large deviation arguments:

$$\mathrm{Regret}^{\mathrm{TSNA}}(N) \leq \frac{(\sigma_1 + \sigma_0)\Phi(-1)}{\sqrt{N}} + o(N^{-1/2}),$$

matching the proven lower bound exactly. The design is robust to both sub-Gaussian noise and moderate deviations. Extensions to $K > 2$ treatments involve allocation fractions $w_d \propto \sigma_d$ with analogous bounds, though matching the constant is more intricate (Kato, 9 Dec 2025).
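The three phases above can be sketched in a few lines of stdlib Python. This is a simplified illustration of the TSNA template (exploration fraction, samplers, and all names are assumptions for the sketch), not the paper's implementation:

```python
import random
import statistics

def tsna(n, r, draw0, draw1, rng):
    """Two-stage Neyman allocation sketch: explore rN/2 per arm, estimate
    sds, allocate the rest Bernoulli(w_hat), then recommend the arm with
    the larger sample mean. draw0/draw1 sample one outcome per call."""
    y0 = [draw0(rng) for _ in range(int(r * n / 2))]   # Stage 1: exploration
    y1 = [draw1(rng) for _ in range(int(r * n / 2))]
    s0, s1 = statistics.stdev(y0), statistics.stdev(y1)
    w_hat = s1 / (s1 + s0)                             # Neyman fraction for arm 1
    for _ in range(n - len(y0) - len(y1)):             # Stage 2: adaptive targeting
        if rng.random() < w_hat:
            y1.append(draw1(rng))
        else:
            y0.append(draw0(rng))
    # Decision phase: larger sample mean wins.
    return 1 if statistics.fmean(y1) >= statistics.fmean(y0) else 0

# Hypothetical outcome distributions: arm 1 is better but noisier.
draw0 = lambda r: r.gauss(0.0, 1.0)
draw1 = lambda r: r.gauss(0.3, 2.0)
picks = [tsna(400, 0.2, draw0, draw1, random.Random(s)) for s in range(200)]
print(sum(picks) / len(picks))  # share of runs recommending the better arm
```

Because arm 1 has twice the standard deviation, Stage 2 sends roughly two thirds of the remaining budget there, which is what drives the variance-matched constant in the bound.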

4. Regret Under Partial Identification and Restricted Objective Functionals

Partial identification complicates regret assessment: when the state space $\Theta$ and corresponding outcome map $m(\theta)$ cannot fully identify welfare contrasts $U(\theta)$, minimax regret rules may become highly non-unique or randomizing.

  • In severely unidentified settings, all measurable rules are welfare-admissible; maximin welfare rules ignore the data entirely (Olea et al., 2023).
  • For Gaussian models with centrosymmetric-convex Θ\Theta, the minimax-regret rule is a data-dependent randomizing function, uniquely characterized (among minimax-optimal rules) by a least-randomizing piecewise-linear cdf (Olea et al., 2023).
  • Alternative objectives, such as quantiles, can trivialize minimax regret: in key designs, any rule is minimax-optimal and the criterion cannot discriminate among them (Guggenberger et al., 6 Jan 2026).

Table: Minimax Regret Rule Characterizations Under Partial Identification

| Identification | Minimax Rule  | Randomization | Admissibility   |
|----------------|---------------|---------------|-----------------|
| Full           | Threshold     | None          | Unique          |
| Partial        | Piecewise cdf | Sometimes     | Infinite family |
| Quantile Obj.  | Any rule      | Possible      | All optimal     |

5. Nonlinear Regret Criteria and Fractional Treatment Rules

Generalizing beyond linear expected regret, minimax treatment rules can be formulated for nonlinear transformations, such as mean-square regret:

$$R_{sq}(\hat{\delta}, P) = \tau^2 \, \mathbb{E}_{P^n}\left[\left(\mathbf{1}\{\tau \geq 0\} - \hat{\delta}\right)^2\right],$$

where $\tau = \mu_1 - \mu_0$. The minimax rule in the normal experiment is a smooth logistic function of the sample mean or $t$-statistic, quantifying strength of evidence and controlling both bias and welfare variance. Explicitly,

$$\hat{\delta}^*(\bar{y}) = \frac{\exp\left(2\tau^*(\sqrt{n}/\sigma)\bar{y}\right)}{1 + \exp\left(2\tau^*(\sqrt{n}/\sigma)\bar{y}\right)}, \quad \tau^* \approx 1.23.$$

The empirical success rule is inadmissible under mean-square regret, being superseded by fractional (continuous) treatment assignment probabilities that attenuate sensitivity to uncertainty (Kitagawa et al., 2022).
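The closed-form rule above is straightforward to evaluate. A sketch (stdlib Python; $\tau^*$ is taken at its approximate value 1.23 from the text, and the function name is illustrative):

```python
import math

TAU_STAR = 1.23  # approximate constant reported for the normal experiment

def fractional_rule(y_bar, n, sigma):
    """Logistic minimax mean-square-regret rule: returns the treatment
    assignment probability as a function of the scaled sample mean."""
    z = 2.0 * TAU_STAR * (math.sqrt(n) / sigma) * y_bar
    return 1.0 / (1.0 + math.exp(-z))   # equals exp(z) / (1 + exp(z))

print(fractional_rule(0.0, 100, 1.0))   # 0.5: no evidence, split assignment
print(fractional_rule(0.3, 100, 1.0))   # near 1: strong evidence for treatment
```

The smooth transition between 0 and 1, rather than the empirical-success rule's hard threshold at $\bar{y} = 0$, is what attenuates sensitivity to sampling uncertainty near the decision boundary.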

6. Numerical Methods and Nonparametric Extensions

For large, high-dimensional or nonparametric parameter spaces, direct minimax regret rule construction is computationally infeasible. Numerical methods based on fictitious play provide robust approximations:

  • Discretize the state space for nature’s action, enforcing uniform convergence.
  • Iteratively apply best-response updates (policymaker vs. nature), adaptively refining the rule.
  • “Coarsening” procedures adapt binary-outcome algorithms to continuous support via independent Bernoulli resampling (Guggenberger et al., 13 Mar 2025).

For kernel regression with binary outcomes and Lipschitz constraints, minimax regret bandwidth selection collapses to a two-dimensional optimization over (treated, control) bandwidths:

$$(h_1^*, h_0^*) = \arg \min_{h_1, h_0 > 0} R_{\max}(h_1, h_0),$$

where worst-case regret is maximized over extremal outcome profiles, reducing computational complexity and allowing tractable grid or stochastic search (Ishihara, 2023).
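Once the worst-case regret surface $R_{\max}(h_1, h_0)$ is available, the outer minimization is a routine two-dimensional search. A grid-search sketch (the surface here is a hypothetical stand-in for the extremal-profile maximization, not Ishihara's actual objective):

```python
import itertools

def bandwidth_grid_search(r_max, grid):
    """Minimize a worst-case regret surface r_max(h1, h0) over a
    two-dimensional bandwidth grid; returns the minimizing pair."""
    return min(itertools.product(grid, grid), key=lambda h: r_max(*h))

# Hypothetical smooth regret surface with its minimum at h1=0.4, h0=0.2:
toy_r_max = lambda h1, h0: (h1 - 0.4) ** 2 + (h0 - 0.2) ** 2 + 0.05
grid = [round(0.1 * k, 1) for k in range(1, 11)]
print(bandwidth_grid_search(toy_r_max, grid))  # (0.4, 0.2)
```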

7. Practical Considerations, Extensions, and Limitations

Assumptions typically include bounded variances, compact parameter spaces, and sub-Gaussian or exponential-family tail behavior. TSNA and related rules are agnostic to parametric form except for variance estimation, and can be nonparametric under tail assumptions. Extensions encompass:

  • $K > 2$ treatments with multivariate Neyman allocations.
  • Partial identification and model misspecification, where randomization may be required.
  • Bandwidth optimization under nonparametric regression for binary or continuous outcomes.
  • Quantile-based rules, where the regret criterion is non-discriminatory absent additional identification.
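For the multi-treatment extension, the allocation fractions $w_d \propto \sigma_d$ normalize directly; a one-line sketch (illustrative only):

```python
def neyman_weights(sigmas):
    """Allocation fractions w_d proportional to sigma_d for K treatments,
    normalized to sum to one; generalizes the two-arm Neyman fraction."""
    total = sum(sigmas)
    return [s / total for s in sigmas]

print(neyman_weights([1.0, 2.0, 1.0]))  # [0.25, 0.5, 0.25]
```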

Limitations include non-uniqueness and lack of informativeness for quantile objectives in finite samples, computational burden for high-dimensional minimax regret rule construction, and difficulties in consistent variance estimation under severe partial identification. While Bayesian optimality can be established under regular priors, worst-case (minimax) rules provide universal guarantees at the expense of potentially conservative recommendations.

In all settings, finite sample minimax regret treatment rules constitute the sharpest boundaries for decision-making under ambiguity, balancing exploration and exploitation and precisely quantifying the irreducible risk in assigning treatments based on finite data. They remain the benchmark for optimal treatment choice in statistical decision theory and experimental design (Kato, 9 Dec 2025, Olea et al., 2023, Kitagawa et al., 2022, Ishihara, 2023, Guggenberger et al., 13 Mar 2025, Guggenberger et al., 6 Jan 2026, Yata, 2021).
