
Interactive Fano Framework for Sequential Decision Making

Updated 24 January 2026
  • The paper introduces the Interactive Fano Framework as a generalization of classical lower bound methods, extending Fano’s, Le Cam’s, and Assouad’s lemmas to adaptive decision making.
  • It develops quantile-based minimax lower bounds that explicitly control tail probabilities, delivering sharper risk and sample complexity characterizations for bandits and reinforcement learning.
  • The method employs interactive protocols and f-divergence measures to unify classical and modern approaches, providing actionable insights for safety-critical and online learning problems.

The Interactive Fano Framework is a generalization of classical information-theoretic lower-bound methods, extending Fano's, Le Cam's, and Assouad's lemmas to fully adaptive, interactive statistical decision making. Its core contribution is providing risk level–explicit, quantile-based minimax lower bounds for losses incurred in sequential decision processes, capturing rare failures critical in safety- and robustness-sensitive environments, such as bandits and reinforcement learning. This approach yields a unified methodology for lower bounds on both expected risk and distributional tails, enabling sharper characterizations of sample complexity and algorithmic learnability in interactive settings.

1. Interactive Protocol and Minimax Quantile Formulation

An interactive statistical decision-making protocol is specified by a model class $\Theta$, where each model $M\in\Theta$ prescribes, for every action $a\in\mathcal{A}$, a conditional observation law $P^M(\cdot\mid a)$ on an outcome space $\mathcal{X}$. An algorithm (ALG), possibly randomized, selects an action $a_t$ at each round $t$ based on the previously observed history $H^{t-1}$. The protocol proceeds for $T$ rounds, yielding a transcript $H^T = (a_1, x_1, \ldots, a_T, x_T)$. After $T$ rounds, the algorithm incurs a nonnegative loss $L(M, H^T)$. The law induced jointly by $M$ and ALG over $H^T$ is denoted $P^{M,\mathrm{ALG}}$.

The minimax risk is defined by

$$\mathcal{M} := \inf_{\mathrm{ALG}\in\mathcal{D}} \sup_{M\in\Theta} \mathbb{E}^{M,\mathrm{ALG}}\left[L(M, H^T)\right].$$

Crucially, quantile-based risk is formalized as:

  • The $(1-\delta)$-quantile for the pair $(M, \mathrm{ALG})$:

$$\mathrm{Quantile}(1-\delta;\, P^{M,\mathrm{ALG}}) = \inf\{\, r \geq 0 : P^{M,\mathrm{ALG}}[L(M, H^T) > r] \leq \delta \,\}.$$

  • Strict minimax quantile:

$$\mathcal{M}(\delta) = \inf_{\mathrm{ALG}} \sup_{M} \mathrm{Quantile}(1-\delta;\, P^{M,\mathrm{ALG}}).$$

  • Lower minimax quantile (tail-probability version):

$$\mathcal{M}_-(\delta) = \inf\{\, r : \inf_{\mathrm{ALG}} \sup_{M} P^{M,\mathrm{ALG}}[L(M, H^T) > r] \leq \delta \,\}.$$

The framework aims to provide $\delta$-explicit lower bounds on the minimax quantile $\mathcal{M}_-(\delta)$ as a function of the risk level.
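The quantile definition above can be checked against samples. The following is a minimal sketch (the sample-based estimator, function name, and seed are illustrative assumptions, not from the paper): it computes the empirical version of $\mathrm{Quantile}(1-\delta; P) = \inf\{r \geq 0 : P[L > r] \leq \delta\}$ from draws of the loss.

```python
import numpy as np

def quantile_1_minus_delta(losses, delta):
    """Empirical Quantile(1-delta; P) = inf{r >= 0 : P[L > r] <= delta},
    estimated from a sample of loss values L(M, H^T) drawn under P^{M,ALG}."""
    losses = np.sort(np.asarray(losses, dtype=float))
    n = len(losses)
    # Empirically, P[L > r] <= delta iff at most floor(delta * n) samples exceed r.
    k = int(np.floor(delta * n))  # number of samples allowed above the threshold
    return max(losses[n - 1 - k], 0.0) if k < n else 0.0

# Example: losses concentrated on [0, 1] with a heavy 5% tail at 10.
rng = np.random.default_rng(0)
losses = np.concatenate([rng.uniform(0.0, 1.0, 950), np.full(50, 10.0)])
print(quantile_1_minus_delta(losses, delta=0.10))  # just below the tail
print(quantile_1_minus_delta(losses, delta=0.01))  # inside the tail: 10.0
```

Note how the quantile jumps once $\delta$ falls below the tail mass: expectation-based risk averages over this tail, while the quantile bound resolves it explicitly.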

2. High-Probability Interactive Fano Lemma

The core technical tool is the interactive high-probability Fano lemma, which bounds the minimax quantile by relating the attainable tail probabilities to the average $f$-divergence between the distributions induced by any algorithm and a reference. For any $f$-divergence $D_f(\cdot\,\|\,\cdot)$, prior $\mu$ on $\Theta$, reference law $Q$ on transcripts, and candidate threshold $\Delta > 0$:

  • Define

$$\bar\rho_{\Delta,Q} := P_{M\sim\mu,\, X\sim Q}\left[L(M, X) \leq \Delta\right],$$

$$d_{f,\epsilon}(p) = \begin{cases} D_f(\mathrm{Bern}(1-\epsilon)\,\|\,\mathrm{Bern}(p)), & p \leq 1-\epsilon,\\ 0, & p > 1-\epsilon, \end{cases}$$

$$\epsilon^* := \sup\left\{ \epsilon \in [0,1] : \sup_{\mathrm{ALG}} \mathbb{E}_{M\sim\mu}\left[D_f(P^{M,\mathrm{ALG}} \,\|\, Q)\right] < d_{f,\epsilon}(\bar\rho_{\Delta,Q}) \right\}.$$

  • Then for all $\delta < \epsilon^*$, $\mathcal{M}_-(\delta) \geq \Delta$.
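For $f = \mathrm{KL}$, the threshold $\epsilon^*$ can be computed numerically by inverting the Bernoulli divergence. Below is a sketch (the function names and the grid search are my own; the worst-case divergence is treated as a given scalar budget):

```python
import numpy as np

def kl_bern(a, b):
    """KL(Bern(a) || Bern(b)) with the usual 0 * log 0 = 0 convention."""
    tiny = 1e-300
    t1 = 0.0 if a == 0 else a * np.log(a / max(b, tiny))
    t2 = 0.0 if a == 1 else (1 - a) * np.log((1 - a) / max(1 - b, tiny))
    return t1 + t2

def eps_star(divergence_budget, p_bar, grid=10_000):
    """Largest eps with budget < d_{KL,eps}(p_bar).
    d_{KL,eps}(p_bar) = KL(Bern(1-eps) || Bern(p_bar)) for eps <= 1 - p_bar,
    and 0 afterwards, so it is nonincreasing in eps."""
    best = 0.0
    for e in np.linspace(0.0, 1.0, grid):
        d = kl_bern(1 - e, p_bar) if 1 - e >= p_bar else 0.0
        if divergence_budget < d:
            best = e
    return best

# Small divergence budget, reference success probability p_bar = 0.1:
print(eps_star(0.5, 0.1))  # every delta below this value certifies M_-(delta) >= Delta
```

Because $d_{f,\epsilon}$ is nonincreasing in $\epsilon$, a smaller divergence budget (a harder identification problem for the algorithm) yields a larger $\epsilon^*$, i.e., the lower bound holds at more permissive risk levels.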

For $f = \mathrm{KL}$ and $Q$ the mixture $Q = \mathbb{E}_{M\sim\mu} P^{M,\mathrm{ALG}}$, the result admits a mutual information–based variant:

  • Let $p_{\max} := \sup_x \mu\{M : L(M, x) \leq \Delta\} < 1$ and let $I_{\mu,\mathrm{ALG}}(M; X)$ denote the mutual information between model and transcript. If for all algorithms

$$1 - \frac{I_{\mu,\mathrm{ALG}}(M; X) + \log 2}{\log(1/p_{\max})} \geq \epsilon,$$

then $\mathcal{M}_-(\delta) \geq \Delta$ for all $\delta < \epsilon$ (Bongole et al., 7 Oct 2025).
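Reading the mutual-information condition as $\epsilon \leq 1 - (I_{\mu,\mathrm{ALG}}(M;X) + \log 2)/\log(1/p_{\max})$, in the form familiar from the classical high-probability Fano inequality, the largest admissible $\epsilon$ has a closed form. A sketch (function name and clipping convention are mine):

```python
import math

def mi_fano_epsilon(mutual_info, p_max):
    """Largest eps satisfying eps <= 1 - (I + log 2) / log(1/p_max),
    clipped to [0, 1]. For every delta < eps, the lower minimax
    quantile satisfies M_-(delta) >= Delta."""
    assert 0.0 < p_max < 1.0
    eps = 1.0 - (mutual_info + math.log(2.0)) / math.log(1.0 / p_max)
    return min(max(eps, 0.0), 1.0)

# Small information budget, many well-separated models (tiny p_max):
print(mi_fano_epsilon(mutual_info=1.0, p_max=1e-6))  # close to 1
# Large budget, only two roughly distinguishable models:
print(mi_fano_epsilon(mutual_info=50.0, p_max=0.5))  # clipped to 0: no bound
```

The two regimes illustrate the trade-off: when the transcript carries little information relative to $\log(1/p_{\max})$, failure below threshold $\Delta$ is forced with probability close to one; when information is plentiful, the condition becomes vacuous.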

3. Proof Ideas, Quantile–Expectation Connections, and Conversions

The proof is grounded in the data-processing inequality and the chain rule for $f$-divergences along the interactive trajectory, ensuring that any adaptive querying strategy is captured. By introducing the indicator $\mathbf{1}\{L(M, X) \leq \Delta\}$, it relates loss-level tail probabilities to a Bernoulli $f$-divergence which, upon inversion, lower-bounds the risk at a given quantile level. This approach generalizes classical Fano, which is inapplicable to adaptive or interactive scenarios.

Structural connections:

  • Quantile-to-expectation conversion: for all $\delta \in (0, 1]$,

$$\mathcal{M} \geq \delta \cdot \mathcal{M}(\delta),$$

so any strict quantile lower bound immediately implies an expectation lower bound.

  • Strict–lower quantile equivalence: $\mathcal{M}_-(\delta) \leq \mathcal{M}(\delta) \leq \mathcal{M}_-(\delta - \xi)$ for any $0 < \xi < \delta$, so the strict and lower quantiles coincide except at countably many risk levels.
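The quantile-to-expectation conversion is a Markov-type inequality for nonnegative losses: since $P[L \geq \mathrm{Quantile}(1-\delta)] \geq \delta$, we get $\mathbb{E}[L] \geq \delta \cdot \mathrm{Quantile}(1-\delta)$. It is easy to verify numerically (the distribution below is an illustrative choice, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
losses = rng.exponential(scale=2.0, size=100_000)  # nonnegative losses L >= 0

for delta in (0.5, 0.1, 0.01):
    q = np.quantile(losses, 1 - delta)  # empirical Quantile(1-delta)
    # E[L] >= q * P[L >= q] >= delta * q, since at least a delta-fraction
    # of the mass sits at or above the (1-delta)-quantile.
    print(f"delta={delta}: E[L]={losses.mean():.3f} >= delta*q={delta * q:.3f}")
```

This is the direction used in the framework: any strict quantile lower bound at level $\delta$ translates into an expectation lower bound with a factor-$\delta$ loss.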

4. Applications: Bandits and Sample-Complexity Lower Bounds

A canonical instantiation is the two-armed Gaussian bandit:

  • $M_1$: mean vector $(+g/2,\ -g/2)$; key quantity $\mathrm{KL}(P_1\,\|\,P_2) = (g^2/2)\,T$; gap choice $g = \sqrt{(2/T)\log(1/(4\delta(1-\delta)))}$.
  • $M_2$: mean vector $(-g/2,\ +g/2)$; key quantity $L(M_1, x) + L(M_2, x) = gT$; threshold $\Delta = gT/2$.

The quantile bound recovers, for all $\delta \in (0, 1/2)$,

$$\mathcal{M}_-(\delta) \geq \sqrt{(T/2)\log(1/(4\delta(1-\delta)))}.$$

This matches the minimax lower bounds for high-probability regret scaling as $\sqrt{T\log(1/\delta)}$ (Bongole et al., 7 Oct 2025).

The framework is directly applicable to other bandit and RL problems, yielding tight uniform-in-algorithm, risk-level-explicit lower bounds.
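The two-armed construction above has a closed form, so the bound is straightforward to evaluate; this sketch (function name is mine) also checks that the gap choice $g$ and threshold $\Delta = gT/2$ reproduce the displayed quantile bound:

```python
import math

def two_armed_gaussian_bound(T, delta):
    """Quantile regret lower bound sqrt((T/2) * log(1/(4*delta*(1-delta))))
    from the two-armed Gaussian construction; valid for delta in (0, 1/2)."""
    assert 0 < delta < 0.5
    g = math.sqrt((2.0 / T) * math.log(1.0 / (4 * delta * (1 - delta))))  # gap
    bound = g * T / 2  # Delta = g * T / 2
    # Consistency with the closed form M_-(delta) >= sqrt((T/2) log(...)):
    closed = math.sqrt((T / 2) * math.log(1.0 / (4 * delta * (1 - delta))))
    assert abs(bound - closed) < 1e-9
    return bound

print(two_armed_gaussian_bound(T=10_000, delta=0.01))   # sqrt(T log(1/delta)) scaling
print(two_armed_gaussian_bound(T=10_000, delta=0.001))  # tighter delta, larger bound
```

As $\delta$ shrinks, the bound grows like $\sqrt{T \log(1/\delta)}$, making the price of high-probability guarantees explicit.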

5. Extensions and Generalizations

Recent work generalizes the interactive Fano approach in two directions:

  • Replacement of the hard-threshold event $\{L < \Delta\}$ by arbitrary bounded transforms of the loss. By analyzing a randomized one-bit statistic $Y = \mathbf{1}\{U \leq \phi(L(M, X))\}$, one obtains Bernoulli $f$-divergence inequalities for $\mathbb{E}[\phi(L)]$, yielding two-sided confidence intervals for expected transforms, including Bayesian CVaR (Bongole et al., 17 Jan 2026). Pinsker's inequality further quantifies the attainable bounds in terms of mutual information for bounded losses.
  • Functional extensions link the Fano-type lower bounds to broader risk functionals and allow explicit calibration of tail and expectation-based controls.
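The one-bit reduction in the first bullet works because, conditional on the loss, $Y$ is Bernoulli with mean $\phi(L)$, so $\mathbb{E}[Y] = \mathbb{E}[\phi(L)]$ and Bernoulli $f$-divergence inequalities transfer to the expected transform. A numerical illustration (the transform and loss distribution are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# A bounded transform phi of the loss, mapping into [0, 1] (illustrative choice).
phi = lambda loss: np.clip(loss / 10.0, 0.0, 1.0)

losses = rng.gamma(shape=2.0, scale=2.0, size=200_000)  # stand-in for L(M, X)
U = rng.uniform(size=losses.shape)                      # independent Uniform(0, 1)
Y = (U <= phi(losses)).astype(float)                    # randomized one-bit statistic

# E[Y] = E[phi(L)]: divergence bounds on Bern(E[Y]) become bounds on E[phi(L)].
print(Y.mean(), phi(losses).mean())  # the two means agree up to Monte Carlo error
```

Choosing $\phi$ as a threshold indicator recovers the original lemma; smoother choices of $\phi$ give the functional extensions in the second bullet.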

6. Relationship to Classical Lower Bound Methods and DEC

The interactive Fano method unifies and subsumes classical tools for minimax lower bounds:

  • Specializes to classical Fano, Le Cam two-point, and Assouad’s lemma in non-interactive problems.
  • Recovers decision–estimation coefficient (DEC)–based lower bounds developed by Foster et al., characterizing the fundamental complexity of interactive learning.
  • Introduces the "fractional covering number" $N_{\mathrm{frac}}(\mathcal{M}, A) := \inf_{p\in\Delta(\mathcal{A})} \sup_{M\in\mathcal{M}} 1/p\{a : L(M, a) \leq A\}$ as a tight, unified complexity measure for bandit and general interactive problems (Chen et al., 2024).
  • Enables minimax lower bounds with polynomial slack between lower and upper sample complexity in convex model classes.
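For a finite model class and action set, the fractional covering number is a small linear program: maximize $t$ subject to $p(S_M) \geq t$ for every model $M$, where $S_M$ is the set of good actions, with $p$ a distribution; then $N_{\mathrm{frac}} = 1/t$. A sketch using scipy (the matrix encoding and function name are my own):

```python
import numpy as np
from scipy.optimize import linprog

def fractional_covering_number(good_sets):
    """N_frac = inf_{p in Delta(A)} sup_M 1 / p(S_M), where
    good_sets[m, a] = 1 iff action a satisfies L(M_m, a) <= A.
    Solved as an LP over variables (p_1, ..., p_n, t), maximizing t."""
    n_models, n_actions = good_sets.shape
    c = np.zeros(n_actions + 1)
    c[-1] = -1.0                                           # minimize -t
    A_ub = np.hstack([-good_sets, np.ones((n_models, 1))]) # t - p(S_m) <= 0
    b_ub = np.zeros(n_models)
    A_eq = np.hstack([np.ones((1, n_actions)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                                 # p is a distribution
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n_actions + [(0, 1)])
    return 1.0 / res.x[-1]

# Three models whose good sets are three disjoint single actions:
print(fractional_covering_number(np.eye(3)))  # approximately 3.0 (uniform p optimal)
```

Disjoint good sets force $p$ to spread evenly, so $N_{\mathrm{frac}}$ equals the number of models; overlapping good sets shrink it, matching the intuition that shared good actions make the interactive problem easier.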

7. Summary and Impact

The Interactive Fano Framework provides a risk level–explicit, quantile-calibrated lower bounding methodology for interactive statistical decision making. By directly controlling the tail probabilities and linking quantile- and expectation-based minimax risk, it illuminates the sample complexity thresholds for interactive bandit and reinforcement learning protocols, unifies classical and modern lower-bound techniques, and enables rigorous quantile-centric risk analysis. Its generality and structural properties support the derivation of tight, algorithm-independent lower bounds for high-probability and distributional performance, with direct implications for safety-critical machine learning systems and the foundational theory of online learning (Bongole et al., 7 Oct 2025, Chen et al., 2024, Bongole et al., 17 Jan 2026).
