Fixed-Confidence Setting (FC) in Bandits
- The fixed-confidence (FC) setting is a framework that guarantees (ε,δ)-PAC optimality by adaptively sampling until a confidence criterion is met.
- It uses change-of-measure techniques and likelihood ratio stopping rules to establish tight information-theoretic lower bounds on sample complexity.
- FC algorithms optimize adaptive allocation strategies in diverse bandit models, including infinite-armed, linear, and multi-objective settings.
The fixed-confidence setting (FC) in pure exploration and best-arm identification problems refers to sequential strategies that guarantee, with probability at least 1 − δ, the identification of an arm or set of arms that is (approximately) optimal according to a prescribed criterion, while seeking to minimize expected or high-probability sample complexity. These strategies adaptively allocate samples and stop as soon as the confidence criterion is satisfied, differentiating the FC setting from the fixed-budget regime where the sample size is fixed in advance. The FC framework is foundational in bandit theory and has been studied across a variety of structured and unstructured bandit models, including infinite-armed bandits, linear bandits, Markovian models, structured payoff maps, and modern risk-averse and multi-objective contexts.
1. Formal Definition, PAC Criterion, and General Model
The core of the fixed-confidence setting is the (ε,δ)-PAC guarantee: for given confidence parameter δ ∈ (0,1) and error tolerance ε ≥ 0, a sequential pure-exploration policy must halt almost surely and, with probability at least 1 − δ, output an arm (or set) meeting the desired performance criterion.
In the classical K-armed bandit, each arm a ∈ {1, …, K} is equipped with an unknown mean reward μ_a. The goal in best-arm identification is to select arm â so that

P(μ_â > μ* − ε) ≥ 1 − δ,

where μ* = max_a μ_a and â is the arm recommended at the stopping time τ_δ. The expected sample complexity E[τ_δ] (or high-probability counterparts) is the central performance guarantee. In generalized settings, this is extended to:
- top- arms,
- arms in structured or infinite sets,
- systems defined via bi-level or multi-objective optimization,
- Pareto-optimality or mean–variance tradeoffs,
- identification under combinatorial constraints.
The FC requirement is often termed (ε,δ)-PAC (Probably Approximately Correct), and all sample complexity results are fundamentally lower-bounded by a scaling in log(1/δ) in the high-confidence regime (Garivier et al., 2016, Aziz et al., 2018).
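As a concrete illustration, the (ε,δ)-PAC stopping condition can be checked with per-arm confidence intervals. The sketch below uses Hoeffding bounds for rewards in [0,1] and a crude union bound over arms and sample counts; the helper name and the exact threshold are illustrative, not taken from any specific paper.

```python
import numpy as np

def eps_delta_pac_stop(counts, means, eps, delta):
    """Check an (eps, delta)-PAC stopping condition via Hoeffding bounds.

    Hypothetical helper: stop when the lower confidence bound of the
    empirical best arm exceeds every other arm's upper bound minus eps.
    Rewards are assumed to lie in [0, 1]; the radius uses a crude
    union bound over arms and sample counts.
    """
    counts = np.maximum(np.asarray(counts, float), 1.0)
    means = np.asarray(means, float)
    K = len(means)
    # Confidence radius: P(|mean error| > rad) <= delta / (K * n^2), roughly.
    rad = np.sqrt(np.log(2.0 * K * counts ** 2 / delta) / (2.0 * counts))
    best = int(np.argmax(means))
    lcb_best = means[best] - rad[best]
    ucb_rest = np.delete(means + rad, best)
    return bool(np.all(lcb_best >= ucb_rest - eps))
```

The rule certifies ε-optimality of the returned arm: once the empirical best arm's lower bound beats every competitor's upper bound minus ε, no other arm can be more than ε better with probability at least 1 − δ.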
2. Information-Theoretic Lower Bounds
Lower bounds on E[τ_δ] in the FC setting are obtained by change-of-measure arguments that relate the identification error probability to the empirical divergence between the true model and carefully constructed alternatives. The general form is

E[τ_δ] ≥ T*(μ) · kl(δ, 1 − δ),

where kl denotes the binary relative entropy (so kl(δ, 1 − δ) ≈ log(1/δ) for small δ) and the "complexity" T*(μ) is the solution to a max–min optimization involving optimal sampling proportions and Kullback–Leibler (KL) divergences between distributions. For the K-armed case with unique best arm (Garivier et al., 2016, Truong, 21 May 2025):

T*(μ)⁻¹ = sup_{w ∈ Δ_K} inf_{λ ∈ Alt(μ)} Σ_a w_a · KL(μ_a, λ_a),

where Δ_K is the simplex of sampling proportions and Alt(μ) is the set of alternative instances whose best arm differs from that of μ. For extensions:
- Infinite-armed/reservoir models: multiplicative integrals over unknown arm distributions (Aziz et al., 2018).
- Structured, Markovian, linear, and bi-level settings: complexities embed linear/convex constraints or model misspecification (Réda et al., 2021, Wang et al., 17 Jan 2025, Moulos, 2019).
- Risk-averse (mean–variance): the lower bound scales with gap terms involving both means and variances in Pareto identification (Nonaga et al., 27 Jun 2025).
- Multi-objective: max–min allocations across objectives (Chen et al., 23 Jan 2025).
- Nontrivial cases such as multiple optima require refined lower bounds that distinguish between identification of all optimal arms or just any one (Truong, 21 May 2025).
In all settings, the leading order is at least proportional to log(1/δ); the constant reflects intrinsic statistical hardness (reward gaps, dimension, model structure).
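The max–min complexity above can be evaluated numerically. For Gaussian arms the inner infimum over alternatives admits a known closed form, which the following sketch exploits; the softmax parametrization of the simplex and the use of a generic optimizer are implementation choices, not prescribed by the literature.

```python
import numpy as np
from scipy.optimize import minimize

def characteristic_time(mu, sigma=1.0):
    """Numerically evaluate T*(mu) for Gaussian arms with common variance.

    For Gaussian rewards the inner infimum over alternatives has the
    closed form (w_b * w_a) / (w_b + w_a) * (mu_b - mu_a)^2 / (2 sigma^2),
    where b is the best arm and a a challenger; T*(mu)^{-1} is the max
    over sampling proportions w of the min over challengers.
    """
    mu = np.asarray(mu, dtype=float)
    best = int(np.argmax(mu))
    others = [a for a in range(len(mu)) if a != best]
    gaps2 = (mu[best] - mu[others]) ** 2 / (2.0 * sigma ** 2)

    def neg_min_transport(theta):
        w = np.exp(theta - theta.max())  # softmax keeps w on the simplex
        w /= w.sum()
        vals = w[best] * w[others] / (w[best] + w[others]) * gaps2
        return -vals.min()

    res = minimize(neg_min_transport, np.zeros(len(mu)), method="Nelder-Mead")
    return 1.0 / (-res.fun)
```

For two unit-variance Gaussian arms with gap Δ the characteristic time is 8/Δ², so the routine returns approximately 8 for μ = (1, 0); adding a second challenger with the same gap makes the instance strictly harder.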
3. Algorithmic Paradigms and Sample Complexity Upper Bounds
Optimal algorithms in the FC setting combine adaptive allocation with explicit or implicit estimators of sampling proportions, and invoke stopping rules based on GLR-type or likelihood ratio statistics. Key paradigms:
- Track-and-Stop methods: Track the optimal allocation weights (solution to the lower-bound optimizer) by plug-in estimates, with forced exploration for robustness. Stop when min-max likelihood ratios exceed threshold (Garivier et al., 2016, Truong, 21 May 2025, Moulos, 2019, Lazzaro et al., 11 Jul 2025).
- Top-Two and Successive Elimination: Top-Two sampling rules maintain allocations between empirical leader and challenger, with confidence intervals dictating elimination or early stopping (Jourdan et al., 2023, Azizi et al., 2021, Jang et al., 2024).
- LUCB-type: Maintain lower/upper confidence bounds, eliminate arms whose upper confidence is too low; adapted for structured settings (Huang et al., 2017, Aziz et al., 2018, Zhong et al., 2020).
- Multi-phase and Pruning-Optimization: Multi-resolution elimination with confidence and tolerance splitting for efficient bi-level optimization (Wang et al., 17 Jan 2025).
- Specialized algorithms: For misspecified linear bandits, surrogate proportion tracking (e.g., MO-BAI) avoids real-time max-min computation (Réda et al., 2021, Chen et al., 23 Jan 2025).
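The GLR-type stopping rules shared by several of these paradigms can be made concrete for Gaussian rewards. The sketch below uses the pairwise transportation-cost statistic and a heuristic threshold β(t, δ); the threshold choice is an assumption for illustration rather than the calibrated constant of any particular analysis.

```python
import numpy as np

def glr_stop(counts, means, delta, sigma=1.0):
    """Chernoff / GLR stopping rule for Gaussian best-arm identification.

    Stop when the empirical best arm beats every challenger by a
    generalized-likelihood-ratio margin exceeding a threshold
    beta(t, delta). The threshold below is a common heuristic shape,
    not the exact calibration from any one paper.
    """
    counts = np.asarray(counts, float)
    means = np.asarray(means, float)
    t = counts.sum()
    best = int(np.argmax(means))
    beta = np.log((1.0 + np.log(t)) / delta)  # heuristic threshold
    z_min = np.inf
    for b in range(len(means)):
        if b == best:
            continue
        # Transportation cost for making arm b overtake the empirical best.
        z = (counts[best] * counts[b] / (counts[best] + counts[b])
             * (means[best] - means[b]) ** 2 / (2.0 * sigma ** 2))
        z_min = min(z_min, z)
    return bool(z_min > beta)
```

A Track-and-Stop loop alternates this check with a sampling rule that tracks the plug-in optimal proportions, stopping at the first time the statistic clears the threshold.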
Explicit upper bounds match lower bounds up to subpolynomial (typically logarithmic) factors. For example, two-phase procedures in infinite bandit models achieve an expected sample complexity whose problem-dependent leading constant is explicit, and only a logarithmic factor separates it from the corresponding lower bound (Aziz et al., 2018).
Sample complexity scaling:

| Model class | Lower bound scaling | Upper bound scaling |
|---|---|---|
| Classical K-armed | T*(μ) · log(1/δ) | T*(μ) · log(1/δ), asymptotically matching |
| Infinite-armed | ∝ log(1/δ) | matching up to a logarithmic factor |
| Mean–variance Pareto | ∝ log(1/δ) | matching up to constant factors |
| Markovian | ∝ log(1/δ), up to constants | matching up to constant factors |
| Multi-objective | ∝ log(1/δ) | matching up to constant factors |

Here, the problem-dependent constants multiplying log(1/δ) capture the reward gaps, divergences, and model structure.
4. Generalizations and Structural Variations
The fixed-confidence framework extends robustly to numerous nonstandard settings:
- Infinite-armed bandits and unknown reservoirs: Two-phase KL-LUCB-style policies achieve near-optimal sample complexity for identifying arms near the top of the reservoir distribution even when the arm distribution is unknown, with minimal structural assumptions (Aziz et al., 2018).
- Misspecified and Linear bandits: Under controlled deviation from a linear structure, algorithms interpolate optimality between linear and full (unstructured) settings, using instance-aware confidence penalties (Réda et al., 2021).
- Markovian arms: Identification in Markov chain models with exponential families requires KL-rate divergences over stochastic matrices, with Track-and-Stop algorithms adapted via concentration for Markov processes (Moulos, 2019).
- Piecewise-constant bandits/change-point detection: The FC paradigm naturally extends to nonparametric model selection: algorithms sample adjacent to candidate change points, and identifiability is governed by jump magnitudes (Lazzaro et al., 11 Jul 2025).
- Multi-objective and risk-averse exploration: Simultaneous identification for multiple objectives or Pareto-optimal sets is enabled by generalizing the complexity measure and incorporating mean–variance or vector-valued reward tradeoffs (Chen et al., 23 Jan 2025, Nonaga et al., 27 Jun 2025).
- Community mode estimation and combinatorial settings: Mode identification in populations or under combinatorial feedback constraints adheres to the same FC machinery, with lower bounds reflecting information constraints in sampling (Pai et al., 2023, Zhong et al., 2020).
- Bayesian best-arm identification: In the Bayesian FC setting, the error is averaged over a prior and the sample complexity depends crucially on the prior volume of indistinguishable models; frequentist-optimal algorithms may be arbitrarily suboptimal when the prior mass near ties is significant (Jang et al., 2024, Azizi et al., 2021).
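For the multi-objective setting, the object being identified is an empirical Pareto set rather than a single best arm. A minimal sketch of the underlying dominance check, assuming higher is better in every objective:

```python
import numpy as np

def empirical_pareto_set(means):
    """Return indices of arms whose empirical mean vectors are Pareto-optimal.

    Arm i is dominated if some arm j is at least as good in every
    objective and strictly better in at least one. `means` has shape
    (n_arms, n_objectives); a sketch for the multi-objective FC setting.
    """
    means = np.asarray(means, float)
    pareto = []
    for i in range(len(means)):
        dominated = any(
            np.all(means[j] >= means[i]) and np.any(means[j] > means[i])
            for j in range(len(means)) if j != i
        )
        if not dominated:
            pareto.append(i)
    return pareto
```

An FC algorithm for Pareto identification wraps such a check in confidence intervals per objective, stopping once the empirical Pareto set is certified against all plausible perturbations of the means.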
5. Algorithmic and Statistical Challenges
The design of FC algorithms faces several persistent technical and conceptual challenges:
- Tracking Optimal Sampling Proportions: Attaining the lower bound necessitates close tracking of the optimal proportions w*(μ), which can be computationally intensive, especially in structured or high-dimensional models (Huang et al., 2017, Chen et al., 23 Jan 2025).
- Stopping Rules and Threshold Choice: Uniform control of error probabilities requires likelihood-ratio or confidence-bound thresholds with careful adjustment for multiple testing (arms, objectives, or candidate sets), often resulting in extra log-factors in the sample complexity (Garivier et al., 2016, Truong, 21 May 2025, Aziz et al., 2018).
- Multiple Optima and Indistinguishable Arms: Standard policies may incur inefficiency when more than one optimal arm exists; recent work provides instance-optimal algorithms that avoid oversampling optimal arms (Truong, 21 May 2025).
- Unknown Structure and Adaptivity: Exploiting linear, structured, or combinatorial information improves sample efficiency, but only if model misspecification is effectively controlled (Réda et al., 2021, Huang et al., 2017).
- Computation versus Statistical Efficiency: Algorithms such as Track-and-Stop and MO-BAI optimize expected samples but may involve nontrivial computational overhead at each round due to required convex optimization (Chen et al., 23 Jan 2025, Réda et al., 2021).
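The proportion-tracking challenge in the first bullet is typically handled with D-tracking-style sampling rules. A sketch, assuming the target proportions w*(μ) are supplied by a plug-in solve of the lower-bound program; the √t − K/2 forced-exploration threshold follows a commonly used form:

```python
import numpy as np

def d_tracking_pull(counts, w_star):
    """D-tracking sampling rule (Garivier-Kaufmann style sketch).

    Force exploration of any arm pulled fewer than sqrt(t) - K/2 times;
    otherwise pull the arm whose count lags furthest behind its target
    allocation t * w_star[a]. `w_star` would come from a plug-in solve
    of the lower-bound optimization at the current empirical means.
    """
    counts = np.asarray(counts, float)
    t = counts.sum()
    K = len(counts)
    under = np.where(counts < max(np.sqrt(t) - K / 2.0, 0.0))[0]
    if len(under) > 0:
        return int(under[np.argmin(counts[under])])  # forced exploration
    return int(np.argmax(t * np.asarray(w_star, float) - counts))
```

Forced exploration guarantees every arm is sampled infinitely often, which makes the plug-in estimates of w*(μ) consistent and is what underpins the asymptotic optimality arguments.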
6. Comparisons with Fixed-Budget and Other Settings
The fixed-confidence regime is distinct from fixed-budget strategies, but the two settings are fundamentally linked:
- FC-to-FB Reductions: Recent results show that the sample complexity of the FC setting upper-bounds (up to logarithmic factors) the optimal complexity for fixed-budget best-arm identification (Balagopalan et al., 3 Feb 2026). Constructive reductions (e.g., FC2FB) combine strong FC subroutines to yield approximately optimal FB policies.
- Simultaneous Algorithms: Some designs (e.g., EB-TC) are anytime and can operate without prior knowledge of the budget or confidence, efficiently interpolating between regimes (Jourdan et al., 2023).
- Anytime and Prior-averaged Settings: Bayesian versions of the FC problem require further adaptation; frequentist FC algorithms may be highly inefficient in the Bayesian regime due to the need to control for prior mass near ties or indistinguishable arms (Jang et al., 2024, Azizi et al., 2021).
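The FC-to-FB reduction idea can be sketched at a high level: run a fixed-confidence subroutine under a budget cap and fall back to its current recommendation if the budget expires. Everything below (the names, the confidence-halving schedule, the subroutine interface) is hypothetical scaffolding for illustration, not the actual FC2FB construction of Balagopalan et al.

```python
def fc_budgeted_run(fc_algorithm, env, budget, delta0=0.1):
    """Illustrative FC-to-fixed-budget wrapper (hypothetical interface).

    `fc_algorithm(env, delta, cap)` is assumed to run a fixed-confidence
    routine at level delta for at most `cap` samples and return a tuple
    (stopped, current_guess, samples_used). The wrapper tightens delta
    while budget remains, and returns the last guess either way.
    """
    samples_used = 0
    delta = delta0
    guess = None
    while samples_used < budget:
        stopped, guess, used = fc_algorithm(env, delta, budget - samples_used)
        samples_used += used
        if stopped:
            break
        delta /= 2.0  # spend the remaining budget at a tighter confidence
    return guess
```

The point of the real reduction is quantitative: the FC sample complexity controls, up to logarithmic factors, how large a budget suffices for a comparable fixed-budget error.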
7. Open Problems and Future Research
Open theoretical questions remain:
- Closing log-factors: Whether the extra logarithmic scaling in infinite-armed, two-phase, or multi-objective models is inescapable remains unresolved for general reservoirs (Aziz et al., 2018).
- Full computational-optimality: Achieving both optimal statistical rate and low computational overhead in high-dimensional or structured bandit models is an ongoing challenge (Réda et al., 2021, Chen et al., 23 Jan 2025).
- Universality in Structured/Combinatorial Models: Systematic theory relating structured identification (game trees, cascading/partial feedback, Markovian state) to the general FC framework is still developing (Huang et al., 2017, Zhong et al., 2020, Moulos, 2019).
- Optimal Algorithms for Multiple Optima: Characterizing general instance-optimal policies when the reward structure admits multiple optimal solutions is under active exploration (Truong, 21 May 2025).
- Bayesian vs. Frequentist Discrepancy: The sharp divergence between Bayesian and frequentist sample complexity in the presence of heavy prior mass near ties motivates new algorithmic designs and unified analysis frameworks (Jang et al., 2024).
The fixed-confidence setting forms the theoretical backbone of modern pure exploration in bandit models, quantitatively characterizing the sample cost of scientific discovery under explicit probabilistic guarantees and informing practical algorithm design from classical settings to contemporary high-dimensional and structured learning problems.