
Interactive Fano Framework for Sequential Decision Making

Updated 24 January 2026
  • The paper introduces the Interactive Fano Framework as a generalization of classical lower bound methods, extending Fano’s, Le Cam’s, and Assouad’s lemmas to adaptive decision making.
  • It develops quantile-based minimax lower bounds that explicitly control tail probabilities, delivering sharper risk and sample complexity characterizations for bandits and reinforcement learning.
  • The method employs interactive protocols and f-divergence measures to unify classical and modern approaches, providing actionable insights for safety-critical and online learning problems.

The Interactive Fano Framework is a generalization of classical information-theoretic lower-bound methods, extending Fano's, Le Cam's, and Assouad's lemmas to fully adaptive, interactive statistical decision making. Its core contribution is providing risk level–explicit, quantile-based minimax lower bounds for losses incurred in sequential decision processes, capturing rare failures critical in safety- and robustness-sensitive environments, such as bandits and reinforcement learning. This approach yields a unified methodology for lower bounds on both expected risk and distributional tails, enabling sharper characterizations of sample complexity and algorithmic learnability in interactive settings.

1. Interactive Protocol and Minimax Quantile Formulation

An interactive statistical decision-making protocol is specified by a model class $\Theta$, where each model $M\in\Theta$ prescribes, for every action $a\in\mathcal{A}$, a conditional observation law $P^M(\cdot\mid a)$ on an outcome space $\mathcal{X}$. An algorithm (ALG), possibly randomized, selects an action $a_t$ at each round $t$ based on the previously observed history $H^{t-1}$. The protocol proceeds for $T$ rounds, yielding a transcript $H^T = (a_1, x_1, \ldots, a_T, x_T)$. After $T$ rounds, the algorithm incurs a nonnegative loss $L(M, H^T)$. The law induced jointly by $M$ and ALG over $H^T$ is denoted $P^{M,\mathrm{ALG}}$.

The minimax risk is defined by

$$\mathcal{M} := \inf_{\mathrm{ALG}\in\mathcal{D}} \sup_{M\in\Theta} \mathbb{E}^{M,\mathrm{ALG}}\left[L(M, H^T)\right].$$

Crucially, quantile-based risk is formalized as:

  • The $(1-\delta)$-quantile for the pair $(M, \mathrm{ALG})$:

$$\mathrm{Quantile}(1-\delta;\, P^{M,\mathrm{ALG}}) = \inf\{\, r \geq 0 : P^{M,\mathrm{ALG}}[L(M, H^T) > r] \leq \delta \,\}.$$

  • Strict minimax quantile:

$$\mathcal{M}(\delta) = \inf_{\mathrm{ALG}} \sup_{M} \mathrm{Quantile}(1-\delta;\, P^{M,\mathrm{ALG}}).$$

  • Lower minimax quantile (tail-probability version):

$$\mathcal{M}_-(\delta) = \inf\{\, r : \inf_{\mathrm{ALG}} \sup_{M} P^{M,\mathrm{ALG}}[L(M, H^T) > r] \leq \delta \,\}.$$

The framework aims to provide $\delta$-explicit lower bounds on the minimax quantile $\mathcal{M}_-(\delta)$ as a function of the risk level.
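The quantile definition above can be checked against samples. The following is a minimal sketch (the sample-based estimator, function name, and seed are illustrative assumptions, not from the paper): it computes the empirical version of $\mathrm{Quantile}(1-\delta; P) = \inf\{r \geq 0 : P[L > r] \leq \delta\}$ from draws of the loss.

```python
import numpy as np

def quantile_1_minus_delta(losses, delta):
    """Empirical Quantile(1-delta; P) = inf{r >= 0 : P[L > r] <= delta},
    estimated from a sample of loss values L(M, H^T) drawn under P^{M,ALG}."""
    losses = np.sort(np.asarray(losses, dtype=float))
    n = len(losses)
    # Empirically, P[L > r] <= delta iff at most floor(delta * n) samples exceed r.
    k = int(np.floor(delta * n))  # number of samples allowed above the threshold
    return max(losses[n - 1 - k], 0.0) if k < n else 0.0

# Example: losses concentrated on [0, 1] with a heavy 5% tail at 10.
rng = np.random.default_rng(0)
losses = np.concatenate([rng.uniform(0.0, 1.0, 950), np.full(50, 10.0)])
print(quantile_1_minus_delta(losses, delta=0.10))  # just below the tail
print(quantile_1_minus_delta(losses, delta=0.01))  # inside the tail: 10.0
```

Note how the quantile jumps once $\delta$ falls below the tail mass: expectation-based risk averages over this tail, while the quantile bound resolves it explicitly.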

2. High-Probability Interactive Fano Lemma

The core technical tool is the interactive high-probability Fano lemma, which bounds the minimax quantile by relating the attainable tail probabilities to the average $f$-divergence between the distributions induced by any algorithm and a reference. For any $f$-divergence $D_f(\cdot\,\|\,\cdot)$, prior $\mu$ on $\Theta$, reference law $Q$ on transcripts, and candidate threshold $\Delta > 0$:

  • Define

$$\bar\rho_{\Delta,Q} := P_{M\sim\mu,\, X\sim Q}\left[L(M, X) \leq \Delta\right],$$

$$d_{f,\epsilon}(p) = \begin{cases} D_f(\mathrm{Bern}(1-\epsilon)\,\|\,\mathrm{Bern}(p)), & p \leq 1-\epsilon,\\ 0, & p > 1-\epsilon, \end{cases}$$

$$\epsilon^* := \sup\left\{ \epsilon \in [0,1] : \sup_{\mathrm{ALG}} \mathbb{E}_{M\sim\mu}\left[D_f(P^{M,\mathrm{ALG}} \,\|\, Q)\right] < d_{f,\epsilon}(\bar\rho_{\Delta,Q}) \right\}.$$

  • Then for all $\delta < \epsilon^*$, $\mathcal{M}_-(\delta) \geq \Delta$.
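For $f = \mathrm{KL}$, the threshold $\epsilon^*$ can be computed numerically by inverting the Bernoulli divergence. Below is a sketch (the function names and the grid search are my own; the worst-case divergence is treated as a given scalar budget):

```python
import numpy as np

def kl_bern(a, b):
    """KL(Bern(a) || Bern(b)) with the usual 0 * log 0 = 0 convention."""
    tiny = 1e-300
    t1 = 0.0 if a == 0 else a * np.log(a / max(b, tiny))
    t2 = 0.0 if a == 1 else (1 - a) * np.log((1 - a) / max(1 - b, tiny))
    return t1 + t2

def eps_star(divergence_budget, p_bar, grid=10_000):
    """Largest eps with budget < d_{KL,eps}(p_bar).
    d_{KL,eps}(p_bar) = KL(Bern(1-eps) || Bern(p_bar)) for eps <= 1 - p_bar,
    and 0 afterwards, so it is nonincreasing in eps."""
    best = 0.0
    for e in np.linspace(0.0, 1.0, grid):
        d = kl_bern(1 - e, p_bar) if 1 - e >= p_bar else 0.0
        if divergence_budget < d:
            best = e
    return best

# Small divergence budget, reference success probability p_bar = 0.1:
print(eps_star(0.5, 0.1))  # every delta below this value certifies M_-(delta) >= Delta
```

Because $d_{f,\epsilon}$ is nonincreasing in $\epsilon$, a smaller divergence budget (a harder identification problem for the algorithm) yields a larger $\epsilon^*$, i.e., the lower bound holds at more permissive risk levels.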

For $f = \mathrm{KL}$ and $Q$ the mixture $Q = \mathbb{E}_{M\sim\mu} P^{M,\mathrm{ALG}}$, the result admits a mutual information–based variant:

  • Let $p_{\max} := \sup_x \mu\{M : L(M, x) \leq \Delta\} < 1$ and let $I_{\mu,\mathrm{ALG}}(M; X)$ denote the mutual information between model and transcript. If for all algorithms

$$1 - \frac{I_{\mu,\mathrm{ALG}}(M; X) + \log 2}{\log(1/p_{\max})} \geq \epsilon,$$

then $\mathcal{M}_-(\delta) \geq \Delta$ for all $\delta < \epsilon$ (Bongole et al., 7 Oct 2025).
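Reading the mutual-information condition as $\epsilon \leq 1 - (I_{\mu,\mathrm{ALG}}(M;X) + \log 2)/\log(1/p_{\max})$, in the form familiar from the classical high-probability Fano inequality, the largest admissible $\epsilon$ has a closed form. A sketch (function name and clipping convention are mine):

```python
import math

def mi_fano_epsilon(mutual_info, p_max):
    """Largest eps satisfying eps <= 1 - (I + log 2) / log(1/p_max),
    clipped to [0, 1]. For every delta < eps, the lower minimax
    quantile satisfies M_-(delta) >= Delta."""
    assert 0.0 < p_max < 1.0
    eps = 1.0 - (mutual_info + math.log(2.0)) / math.log(1.0 / p_max)
    return min(max(eps, 0.0), 1.0)

# Small information budget, many well-separated models (tiny p_max):
print(mi_fano_epsilon(mutual_info=1.0, p_max=1e-6))  # close to 1
# Large budget, only two roughly distinguishable models:
print(mi_fano_epsilon(mutual_info=50.0, p_max=0.5))  # clipped to 0: no bound
```

The two regimes illustrate the trade-off: when the transcript carries little information relative to $\log(1/p_{\max})$, failure below threshold $\Delta$ is forced with probability close to one; when information is plentiful, the condition becomes vacuous.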

3. Proof Ideas, Quantile–Expectation Connections, and Conversions

The proof is grounded in the data-processing inequality and the chain rule for $f$-divergences along the interactive trajectory, ensuring that any adaptive querying strategy is captured. By introducing the indicator $\mathbf{1}\{L(M, X) \leq \Delta\}$, it relates loss-level tail probabilities to a Bernoulli $f$-divergence which, upon inversion, lower-bounds the risk at a given quantile level. This approach generalizes classical Fano, which is inapplicable to adaptive or interactive scenarios.

Structural connections:

  • Quantile-to-expectation conversion: for all $\delta \in (0, 1]$,

$$\mathcal{M} \geq \delta \cdot \mathcal{M}(\delta),$$

so any strict quantile lower bound immediately implies an expectation lower bound.

  • Strict–lower quantile equivalence: $\mathcal{M}_-(\delta) \leq \mathcal{M}(\delta) \leq \mathcal{M}_-(\delta - \xi)$ for any $0 < \xi < \delta$, so the strict and lower quantiles coincide except at countably many risk levels.
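The quantile-to-expectation conversion is a Markov-type inequality for nonnegative losses: since $P[L \geq \mathrm{Quantile}(1-\delta)] \geq \delta$, we get $\mathbb{E}[L] \geq \delta \cdot \mathrm{Quantile}(1-\delta)$. It is easy to verify numerically (the distribution below is an illustrative choice, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
losses = rng.exponential(scale=2.0, size=100_000)  # nonnegative losses L >= 0

for delta in (0.5, 0.1, 0.01):
    q = np.quantile(losses, 1 - delta)  # empirical Quantile(1-delta)
    # E[L] >= q * P[L >= q] >= delta * q, since at least a delta-fraction
    # of the mass sits at or above the (1-delta)-quantile.
    print(f"delta={delta}: E[L]={losses.mean():.3f} >= delta*q={delta * q:.3f}")
```

This is the direction used in the framework: any strict quantile lower bound at level $\delta$ translates into an expectation lower bound with a factor-$\delta$ loss.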

4. Applications: Bandits and Sample-Complexity Lower Bounds

A canonical instantiation is the two-armed Gaussian bandit:

  • $M_1$: mean vector $(+g/2,\ -g/2)$; key quantity $\mathrm{KL}(P_1\,\|\,P_2) = (g^2/2)\,T$; gap choice $g = \sqrt{(2/T)\log(1/(4\delta(1-\delta)))}$.
  • $M_2$: mean vector $(-g/2,\ +g/2)$; key quantity $L(M_1, x) + L(M_2, x) = gT$; threshold $\Delta = gT/2$.

The quantile bound recovers, for all $\delta \in (0, 1/2)$,

$$\mathcal{M}_-(\delta) \geq \sqrt{(T/2)\log(1/(4\delta(1-\delta)))}.$$

This matches the minimax lower bounds for high-probability regret scaling as $\sqrt{T\log(1/\delta)}$ (Bongole et al., 7 Oct 2025).

The framework is directly applicable to other bandit and RL problems, yielding tight uniform-in-algorithm, risk-level-explicit lower bounds.
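The two-armed construction above has a closed form, so the bound is straightforward to evaluate; this sketch (function name is mine) also checks that the gap choice $g$ and threshold $\Delta = gT/2$ reproduce the displayed quantile bound:

```python
import math

def two_armed_gaussian_bound(T, delta):
    """Quantile regret lower bound sqrt((T/2) * log(1/(4*delta*(1-delta))))
    from the two-armed Gaussian construction; valid for delta in (0, 1/2)."""
    assert 0 < delta < 0.5
    g = math.sqrt((2.0 / T) * math.log(1.0 / (4 * delta * (1 - delta))))  # gap
    bound = g * T / 2  # Delta = g * T / 2
    # Consistency with the closed form M_-(delta) >= sqrt((T/2) log(...)):
    closed = math.sqrt((T / 2) * math.log(1.0 / (4 * delta * (1 - delta))))
    assert abs(bound - closed) < 1e-9
    return bound

print(two_armed_gaussian_bound(T=10_000, delta=0.01))   # sqrt(T log(1/delta)) scaling
print(two_armed_gaussian_bound(T=10_000, delta=0.001))  # tighter delta, larger bound
```

As $\delta$ shrinks, the bound grows like $\sqrt{T \log(1/\delta)}$, making the price of high-probability guarantees explicit.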

5. Extensions and Generalizations

Recent work generalizes the interactive Fano approach in two directions:

  • Replacement of the hard-threshold event $\{L < \Delta\}$ by arbitrary bounded transforms of the loss. By analyzing a randomized one-bit statistic $Y = \mathbf{1}\{U \leq \phi(L(M, X))\}$, one obtains Bernoulli $f$-divergence inequalities for $\mathbb{E}[\phi(L)]$, yielding two-sided confidence intervals for expected transforms, including Bayesian CVaR (Bongole et al., 17 Jan 2026). Pinsker's inequality further quantifies the attainable bounds in terms of mutual information for bounded losses.
  • Functional extensions link the Fano-type lower bounds to broader risk functionals and allow explicit calibration of tail and expectation-based controls.
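The one-bit reduction in the first bullet works because, conditional on the loss, $Y$ is Bernoulli with mean $\phi(L)$, so $\mathbb{E}[Y] = \mathbb{E}[\phi(L)]$ and Bernoulli $f$-divergence inequalities transfer to the expected transform. A numerical illustration (the transform and loss distribution are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# A bounded transform phi of the loss, mapping into [0, 1] (illustrative choice).
phi = lambda loss: np.clip(loss / 10.0, 0.0, 1.0)

losses = rng.gamma(shape=2.0, scale=2.0, size=200_000)  # stand-in for L(M, X)
U = rng.uniform(size=losses.shape)                      # independent Uniform(0, 1)
Y = (U <= phi(losses)).astype(float)                    # randomized one-bit statistic

# E[Y] = E[phi(L)]: divergence bounds on Bern(E[Y]) become bounds on E[phi(L)].
print(Y.mean(), phi(losses).mean())  # the two means agree up to Monte Carlo error
```

Choosing $\phi$ as a threshold indicator recovers the original lemma; smoother choices of $\phi$ give the functional extensions in the second bullet.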

6. Relationship to Classical Lower Bound Methods and DEC

The interactive Fano method unifies and subsumes classical tools for minimax lower bounds:

  • Specializes to classical Fano, Le Cam two-point, and Assouad’s lemma in non-interactive problems.
  • Recovers decision–estimation coefficient (DEC)–based lower bounds developed by Foster et al., characterizing the fundamental complexity of interactive learning.
  • Introduces the "fractional covering number" $N_{\mathrm{frac}}(\mathcal{M}, A) := \inf_{p\in\Delta(\mathcal{A})} \sup_{M\in\mathcal{M}} 1/p\{a : L(M, a) \leq A\}$ as a tight, unified complexity measure for bandit and general interactive problems (Chen et al., 2024).
  • Enables minimax lower bounds with polynomial slack between lower and upper sample complexity in convex model classes.
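For a finite model class and action set, the fractional covering number is a small linear program: maximize $t$ subject to $p(S_M) \geq t$ for every model $M$, where $S_M$ is the set of good actions, with $p$ a distribution; then $N_{\mathrm{frac}} = 1/t$. A sketch using scipy (the matrix encoding and function name are my own):

```python
import numpy as np
from scipy.optimize import linprog

def fractional_covering_number(good_sets):
    """N_frac = inf_{p in Delta(A)} sup_M 1 / p(S_M), where
    good_sets[m, a] = 1 iff action a satisfies L(M_m, a) <= A.
    Solved as an LP over variables (p_1, ..., p_n, t), maximizing t."""
    n_models, n_actions = good_sets.shape
    c = np.zeros(n_actions + 1)
    c[-1] = -1.0                                           # minimize -t
    A_ub = np.hstack([-good_sets, np.ones((n_models, 1))]) # t - p(S_m) <= 0
    b_ub = np.zeros(n_models)
    A_eq = np.hstack([np.ones((1, n_actions)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                                 # p is a distribution
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n_actions + [(0, 1)])
    return 1.0 / res.x[-1]

# Three models whose good sets are three disjoint single actions:
print(fractional_covering_number(np.eye(3)))  # approximately 3.0 (uniform p optimal)
```

Disjoint good sets force $p$ to spread evenly, so $N_{\mathrm{frac}}$ equals the number of models; overlapping good sets shrink it, matching the intuition that shared good actions make the interactive problem easier.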

7. Summary and Impact

The Interactive Fano Framework provides a risk level–explicit, quantile-calibrated lower bounding methodology for interactive statistical decision making. By directly controlling the tail probabilities and linking quantile- and expectation-based minimax risk, it illuminates the sample complexity thresholds for interactive bandit and reinforcement learning protocols, unifies classical and modern lower-bound techniques, and enables rigorous quantile-centric risk analysis. Its generality and structural properties support the derivation of tight, algorithm-independent lower bounds for high-probability and distributional performance, with direct implications for safety-critical machine learning systems and the foundational theory of online learning (Bongole et al., 7 Oct 2025, Chen et al., 2024, Bongole et al., 17 Jan 2026).
