Interactive Fano Framework for Sequential Decision Making
- The paper introduces the Interactive Fano Framework as a generalization of classical lower bound methods, extending Fano’s, Le Cam’s, and Assouad’s lemmas to adaptive decision making.
- It develops quantile-based minimax lower bounds that explicitly control tail probabilities, delivering sharper risk and sample complexity characterizations for bandits and reinforcement learning.
- The method employs interactive protocols and f-divergence measures to unify classical and modern approaches, providing actionable insights for safety-critical and online learning problems.
The Interactive Fano Framework is a generalization of classical information-theoretic lower-bound methods, extending Fano's, Le Cam's, and Assouad's lemmas to fully adaptive, interactive statistical decision making. Its core contribution is providing risk level–explicit, quantile-based minimax lower bounds for losses incurred in sequential decision processes, capturing rare failures critical in safety- and robustness-sensitive environments, such as bandits and reinforcement learning. This approach yields a unified methodology for lower bounds on both expected risk and distributional tails, enabling sharper characterizations of sample complexity and algorithmic learnability in interactive settings.
1. Interactive Protocol and Minimax Quantile Formulation
An interactive statistical decision-making protocol is specified by a model class $\mathcal{M}$, where each model $M \in \mathcal{M}$ prescribes, for every action $a \in \mathcal{A}$, a conditional observation law $M(\cdot \mid a)$ on an outcome space $\mathcal{O}$. An algorithm (ALG), possibly randomized, sequentially selects an action $A_t$ at each round $t = 1, \dots, T$ based on the previously observed history $H_{t-1} = (A_1, O_1, \dots, A_{t-1}, O_{t-1})$. The interactive protocol proceeds for $T$ rounds, yielding a transcript $H_T = (A_1, O_1, \dots, A_T, O_T)$. After $T$ rounds, the algorithm incurs a nonnegative loss $L(M, H_T) \ge 0$. The law induced jointly by $M$ and ALG over $H_T$ is denoted $\mathbb{P}^{M,\mathrm{ALG}}$.
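To make the protocol concrete, the following minimal Python sketch simulates the interaction loop; the `GaussianBandit` model, `uniform_alg` policy, and `run_protocol` interface are illustrative assumptions for this sketch, not constructs from the paper.

```python
import random

class GaussianBandit:
    """Hypothetical two-armed Gaussian model M: arm a has mean mu[a], unit variance."""
    def __init__(self, mu):
        self.mu = mu

    def sample(self, a):
        return random.gauss(self.mu[a], 1.0)  # O_t ~ M(. | A_t) = N(mu_a, 1)

def uniform_alg(history):
    """A trivial, non-adaptive algorithm: picks an arm uniformly at random."""
    return random.randrange(2)

def run_protocol(model, act, T):
    """Simulate T rounds: A_t = act(H_{t-1}), O_t ~ M(. | A_t)."""
    history = []  # transcript so far: [(A_1, O_1), ..., (A_t, O_t)]
    for _ in range(T):
        a = act(history)
        o = model.sample(a)
        history.append((a, o))
    return history  # the full transcript H_T

transcript = run_protocol(GaussianBandit([0.5, 0.0]), uniform_alg, T=100)
```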
The minimax risk is defined by
$$\mathfrak{M}_T(\mathcal{M}) = \inf_{\mathrm{ALG}} \sup_{M \in \mathcal{M}} \mathbb{E}^{M,\mathrm{ALG}}\big[L(M, H_T)\big].$$
Crucially, quantile-based risk is formalized as:
- The $\delta$-quantile of a loss $L$ with law $P$, for $\delta \in (0, 1)$: $Q_\delta(P) = \inf\{x \in \mathbb{R} : P(L \le x) \ge 1 - \delta\}$.
- Strict minimax quantile: $\mathfrak{Q}^{\mathrm{strict}}_\delta = \inf_{\mathrm{ALG}} \sup_{M \in \mathcal{M}} \inf\{x \ge 0 : \mathbb{P}^{M,\mathrm{ALG}}(L(M, H_T) > x) \le \delta\}$.
- Lower minimax quantile (tail-probability version): $\mathfrak{Q}^{\mathrm{low}}_\delta = \inf_{\mathrm{ALG}} \sup_{M \in \mathcal{M}} \sup\{x \ge 0 : \mathbb{P}^{M,\mathrm{ALG}}(L(M, H_T) \ge x) > \delta\}$.
The framework aims to provide $\delta$-explicit lower bounds on the minimax quantile as a function of the risk level $\delta$.
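As a quick numerical illustration of the quantile definition above, the sketch below evaluates the empirical $\delta$-quantile of a sample of losses; the helper `delta_quantile` is a hypothetical name, and the code simply applies the inf-based definition to the empirical law.

```python
import numpy as np

def delta_quantile(losses, delta):
    """Empirical delta-quantile: smallest x with P(L <= x) >= 1 - delta."""
    losses = np.sort(np.asarray(losses))
    # index of the smallest order statistic covering probability mass 1 - delta
    k = int(np.ceil((1.0 - delta) * len(losses))) - 1
    return losses[max(k, 0)]

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=100_000)
print(delta_quantile(sample, delta=0.01))  # ~ log(1/0.01) = 4.61 for Exp(1) losses
```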
2. High-Probability Interactive Fano Lemma
The core technical tool is the interactive high-probability Fano lemma, which bounds the minimax quantile by relating the attainable tail probabilities to the average $f$-divergence between the transcript distributions induced by any algorithm and a reference law. For any $f$-divergence $D_f$, prior $\pi$ on $\mathcal{M}$, reference law $\mathbb{Q}$ on transcripts, and candidate threshold $x \ge 0$:
- Define $p_{\mathrm{ALG}}(x) = \mathbb{E}_{M \sim \pi}\big[\mathbb{P}^{M,\mathrm{ALG}}(L(M, H_T) < x)\big]$ and $q(x) = \mathbb{E}_{M \sim \pi}\big[\mathbb{Q}(L(M, H_T) < x)\big]$.
- Then for all algorithms ALG, $d_f\big(p_{\mathrm{ALG}}(x) \,\|\, q(x)\big) \le \mathbb{E}_{M \sim \pi}\big[D_f\big(\mathbb{P}^{M,\mathrm{ALG}} \,\|\, \mathbb{Q}\big)\big]$, where $d_f(p \,\|\, q)$ denotes the $f$-divergence between $\mathrm{Bernoulli}(p)$ and $\mathrm{Bernoulli}(q)$.
For $D_f = \mathrm{KL}$ and the mixture reference $\overline{\mathbb{P}} = \mathbb{E}_{M \sim \pi}\big[\mathbb{P}^{M,\mathrm{ALG}}\big]$, the result admits a mutual information–based variant:
- Let $I(M; H_T)$ be the mutual information between $M \sim \pi$ and the transcript $H_T$; then if for all algorithms
$$I(M; H_T) + \log 2 \le (1 - \delta)\,\log\frac{1}{q(x)},$$
then $\mathbb{E}_{M \sim \pi}\big[\mathbb{P}^{M,\mathrm{ALG}}(L(M, H_T) \ge x)\big] \ge \delta$, and hence $\mathfrak{Q}^{\mathrm{low}}_\delta \ge x$ for all such thresholds $x$ (Bongole et al., 7 Oct 2025).
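To see how a bound of this shape is used, the following sketch numerically inverts the binary KL inequality: given a divergence budget $\beta$ and the reference probability $q$ of the good event $\{L < x\}$, it computes the largest success probability any algorithm can attain, and hence the failure probability the inequality forces. The bisection routine and its names are illustrative, not from the paper.

```python
import math

def bin_kl(p, q):
    """Binary KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def max_success_prob(q, beta, tol=1e-10):
    """Largest p with bin_kl(p, q) <= beta, by bisection on [q, 1].

    bin_kl(., q) is increasing on [q, 1], so any algorithm whose average
    divergence from the reference is <= beta has success probability <= p*.
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bin_kl(mid, q) <= beta:
            lo = mid
        else:
            hi = mid
    return lo

# Example: reference succeeds with prob q = 1e-3, divergence budget beta = 1.
p_max = max_success_prob(1e-3, 1.0)
print(f"success <= {p_max:.3f}, so failure >= {1 - p_max:.3f}")
# The delta-quantile at level delta = 1 - p_max is therefore at least x.
```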
3. Proof Ideas, Quantile–Expectation Connections, and Conversions
The proof is grounded in the data-processing inequality and the chain rule for $f$-divergences along the interactive trajectory, ensuring that any adaptive querying strategy is captured. By introducing the indicator $\mathbf{1}\{L(M, H_T) \ge x\}$, it relates loss-level tail probabilities to a Bernoulli $f$-divergence which, upon inversion, lower-bounds the minimax quantile at each risk level $\delta$. This approach generalizes classical Fano's inequality, which is inapplicable to adaptive or interactive scenarios.
Structural connections:
- Quantile–to–expectation conversion: for all $\delta \in (0, 1)$,
$$\mathbb{E}^{M,\mathrm{ALG}}\big[L(M, H_T)\big] \ge \delta \cdot Q_\delta\big(\mathbb{P}^{M,\mathrm{ALG}}\big),$$
since $\mathbb{E}[L] \ge x\,\mathbb{P}(L \ge x)$ at $x = Q_\delta$ and $\mathbb{P}(L \ge Q_\delta) \ge \delta$. Hence any strict quantile lower bound immediately implies an expectation lower bound, and in particular $\mathfrak{M}_T \ge \delta\,\mathfrak{Q}^{\mathrm{strict}}_\delta$.
- Strict–lower quantile equivalence: $\mathfrak{Q}^{\mathrm{low}}_{\delta'} \le \mathfrak{Q}^{\mathrm{strict}}_{\delta} \le \mathfrak{Q}^{\mathrm{low}}_{\delta}$ for any $\delta' > \delta$, so by monotonicity in $\delta$ the strict and lower quantiles coincide except on countable exceptional sets of risk levels.
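A quick Monte Carlo sanity check of the quantile-to-expectation conversion, using a heavy-tailed lognormal loss purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
losses = rng.lognormal(mean=0.0, sigma=2.0, size=1_000_000)  # heavy-tailed loss L

for delta in (0.5, 0.1, 0.01):
    q = np.quantile(losses, 1.0 - delta)   # empirical delta-quantile Q_delta
    assert losses.mean() >= delta * q       # E[L] >= delta * Q_delta
    print(f"delta={delta}: E[L]={losses.mean():.2f} >= delta*Q_delta={delta * q:.2f}")
```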
4. Applications: Bandits and Sample-Complexity Lower Bounds
A canonical instantiation is the two-armed Gaussian bandit with unit-variance arms and mean gap $\Delta > 0$ (the standard two-point construction):
| Model | Mean Vectors | Key Quantities | Lower Bound |
|---|---|---|---|
| $M_1$ | $(\Delta, 0)$ | per-round $\mathrm{KL} = \Delta^2/2$, so $\mathrm{KL}\big(\mathbb{P}^{M_1,\mathrm{ALG}} \,\|\, \mathbb{P}^{M_2,\mathrm{ALG}}\big) = T\Delta^2/2$ for any ALG | $\mathfrak{Q}^{\mathrm{low}}_\delta(\mathrm{Regret}_T) \gtrsim \Delta T$ with $\Delta \asymp \sqrt{\log(1/\delta)/T}$ |
| $M_2$ | $(0, \Delta)$ | (symmetric) | (symmetric) |
The quantile bound recovers, for all $\delta \in (0, 1/2)$, a regret-quantile lower bound of order
$$\mathfrak{Q}^{\mathrm{low}}_\delta(\mathrm{Regret}_T) \gtrsim \sqrt{T \log(1/\delta)}.$$
This matches the known minimax lower bounds for high-probability regret, which scale as $\sqrt{T \log(1/\delta)}$ (Bongole et al., 7 Oct 2025).
The framework is directly applicable to other bandit and RL problems, yielding tight uniform-in-algorithm, risk-level-explicit lower bounds.
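The following heuristic sketch turns the two-point construction into numbers; the constants and the choice $\Delta \asymp \sqrt{\log(1/\delta)/T}$ follow the standard two-armed Gaussian argument rather than values quoted from the paper.

```python
import math

def gaussian_bandit_quantile_bound(T, delta):
    """Heuristic two-point lower bound for the two-armed Gaussian bandit.

    Choose the gap Delta so the transcript KL, T * Delta^2 / 2, stays of
    order log(1/delta); any algorithm then confuses M_1 with M_2 with
    probability >= delta, incurring regret on the order of Delta * T.
    """
    gap = math.sqrt(2.0 * math.log(1.0 / (4.0 * delta)) / T)
    return 0.5 * gap * T  # ~ c * sqrt(T * log(1/delta))

for delta in (0.1, 0.01, 0.001):
    print(delta, round(gaussian_bandit_quantile_bound(T=10_000, delta=delta), 1))
```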
5. Extensions and Generalizations
Recent work generalizes the interactive Fano approach in two directions:
- Replacement of the hard-threshold events $\mathbf{1}\{L \ge x\}$ by arbitrary bounded transforms of the loss. By analyzing a randomized one-bit statistic $B$ with $\mathbb{P}(B = 1 \mid H_T) = g(L)$ for a bounded transform $g$, one obtains Bernoulli $f$-divergence inequalities for $\mathbb{E}[g(L)]$, yielding two-sided confidence intervals for expected transforms, including Bayesian CVaR (Bongole et al., 17 Jan 2026); a sketch of this reduction appears after this list. Pinsker's inequality further quantifies the attainable bounds in terms of mutual information for bounded losses.
- Functional extensions link the Fano-type lower bounds to broader risk functionals and allow explicit calibration of tail and expectation-based controls.
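A minimal sketch of the randomized one-bit reduction referenced above, assuming an example bounded transform $g(L) = \min(L, 1)$; the function names here are illustrative, not the paper's.

```python
import numpy as np

def one_bit_statistic(losses, g, rng):
    """Randomized rounding: B_i ~ Bernoulli(g(L_i)), so E[B] = E[g(L)].

    This reduces statements about the expected bounded transform E[g(L)]
    to statements about a Bernoulli mean, where binary f-divergence
    (Fano-type) inequalities apply directly.
    """
    probs = np.clip(g(np.asarray(losses)), 0.0, 1.0)
    return rng.random(len(probs)) < probs  # boolean array of one-bit statistics

rng = np.random.default_rng(2)
losses = rng.exponential(size=100_000)
g = lambda ell: np.minimum(ell, 1.0)      # example bounded transform g(L) = min(L, 1)
bits = one_bit_statistic(losses, g, rng)
print(bits.mean(), g(losses).mean())      # both estimate E[g(L)] ~ 1 - 1/e
```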
6. Relationship to Classical Lower Bound Methods and DEC
The interactive Fano method unifies and subsumes classical tools for minimax lower bounds:
- Specializes to classical Fano, Le Cam two-point, and Assouad’s lemma in non-interactive problems.
- Recovers decision–estimation coefficient (DEC)–based lower bounds developed by Foster et al., characterizing the fundamental complexity of interactive learning.
- Introduces the "fractional covering number" as a tight, unified complexity measure for bandit and general interactive problems (Chen et al., 2024).
- Enables minimax lower bounds with only polynomial slack between the lower and upper bounds on sample complexity for convex model classes.
7. Summary and Impact
The Interactive Fano Framework provides a risk level–explicit, quantile-calibrated lower bounding methodology for interactive statistical decision making. By directly controlling the tail probabilities and linking quantile- and expectation-based minimax risk, it illuminates the sample complexity thresholds for interactive bandit and reinforcement learning protocols, unifies classical and modern lower-bound techniques, and enables rigorous quantile-centric risk analysis. Its generality and structural properties support the derivation of tight, algorithm-independent lower bounds for high-probability and distributional performance, with direct implications for safety-critical machine learning systems and the foundational theory of online learning (Bongole et al., 7 Oct 2025, Chen et al., 2024, Bongole et al., 17 Jan 2026).