Follow-The-Regularized-Leader (FTRL)

Updated 27 January 2026
  • Follow-The-Regularized-Leader (FTRL) is an online convex optimization paradigm that minimizes cumulative loss through regularized empirical risk minimization.
  • The framework employs adaptive learning-rate strategies like Stability–Penalty Matching to balance exploration and exploitation in adversarial, stochastic, and game-theoretic settings.
  • FTRL’s robust design uses carefully chosen regularizers and geometry-aware updates to achieve both minimax and instance-optimal regret guarantees in diverse applications.

Follow-The-Regularized-Leader (FTRL) is a foundational paradigm in online learning and online convex optimization, providing a unifying framework for algorithms that achieve low regret in adversarial, stochastic, and hybrid environments. Its modern theoretical and algorithmic development is characterized by advanced learning-rate adaptation, competitive analysis, and applicability to a wide range of sequential decision problems including multi-armed bandits, linear and contextual bandits, and learning in games.

1. Formal Framework and Algorithmic Structure

The FTRL methodology operates over a convex decision set $\mathcal{K} \subseteq \mathbb{R}^d$ by repeatedly solving regularized empirical risk minimization problems. After observing the losses of rounds $1, \dots, t$, the algorithm selects

$$x_{t+1} \in \arg\min_{x \in \mathcal{K}} \left\{ \sum_{s=1}^t \langle g_s, x \rangle + \frac{1}{\eta_t} R(x) \right\}$$

where $g_s$ is the subgradient or loss vector revealed at round $s$, $R : \mathcal{K} \to \mathbb{R}$ is a strongly convex regularizer (e.g., negative entropy, squared Euclidean norm), and $\eta_t$ is a learning rate that may vary over time (Ito et al., 2024). This generic template admits a broad spectrum of instantiations:

  • Full-information: $g_s$ is the loss or gradient over the decision variable.
  • Bandit feedback: $g_s$ is constructed via unbiased importance-weighted estimators.
  • Game-theoretic settings: $g_s$ may represent payoffs in adversarial or competitive games.

The overall update encapsulates both a cumulative loss minimization (via $\sum_s \langle g_s, x \rangle$) and an exploratory/stabilizing regularization (via $R(x)/\eta_t$), allowing explicit control over the exploitation-exploration trade-off and adaptation to problem geometry and feedback structure (Ahn et al., 2024, Moridomi et al., 2017).
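As an illustration of the template above, the following is a minimal sketch for one concrete instantiation: the decision set is the probability simplex and $R(x) = \sum_i x_i \log x_i$ (negative entropy), for which the FTRL minimizer has the well-known closed form $x_{t+1,i} \propto \exp(-\eta \sum_{s \le t} g_{s,i})$. The function names, the fixed learning rate `eta`, and the full-information setting are assumptions made for this example and are not taken from the cited works.

```python
import math
from typing import List, Sequence


def ftrl_entropy_step(cum_loss: Sequence[float], eta: float) -> List[float]:
    """Closed-form FTRL iterate on the simplex with R(x) = sum_i x_i log x_i."""
    # Shift by the minimum cumulative loss for numerical stability;
    # the shift cancels after normalization.
    shift = min(cum_loss)
    weights = [math.exp(-eta * (c - shift)) for c in cum_loss]
    total = sum(weights)
    return [w / total for w in weights]


def run_ftrl(losses: Sequence[Sequence[float]], eta: float) -> List[List[float]]:
    """Return the iterates x_1, ..., x_{T+1} for linear losses g_1, ..., g_T."""
    d = len(losses[0])
    cum_loss = [0.0] * d
    iterates = [ftrl_entropy_step(cum_loss, eta)]  # x_1 is uniform
    for g in losses:
        # Accumulate the loss vector revealed in this round.
        cum_loss = [c + gi for c, gi in zip(cum_loss, g)]
        iterates.append(ftrl_entropy_step(cum_loss, eta))
    return iterates
```

With a squared Euclidean regularizer the same loop applies, but the closed-form step would be replaced by a projection onto $\mathcal{K}$.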

2. Regret Decomposition, Competitive Analysis, and Learning-rate Adaptation

Standard FTRL analysis yields regret bounds of the form

$$\operatorname{Regret}_T \leq \sum_{t=1}^T \left[ \eta_t z_t + \left( \frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} \right) h_t \right]$$

where $z_t$ measures stability (sensitivity of iterates to incremental loss), while $h_t$ quantifies the change in regularization between steps. Defining $\beta_t = 1/\eta_t$, this can be written as

$$F(\beta_{1:T}; z_{1:T}, h_{1:T}) = \sum_{t=1}^T \left[ \frac{z_t}{\beta_t} + (\beta_t - \beta_{t-1}) h_t \right]$$

and the optimal "offline" learning-rate schedule is obtained by minimizing $F$ over nondecreasing $\beta_t$. The "competitive ratio" framework compares the performance of an online learning-rate policy $\pi$ against the offline optimum, with the ratio

$$\mathrm{CR}(\pi; z_{1:T}, h_{1:T}) = F^{\pi} / F^*$$

and analyzes adaptation hardness for varying degrees of penalty non-monotonicity (Ito et al., 2024).
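As a sanity check on the objective $F$, consider the simplest case, chosen here purely for illustration: constant inputs $z_t = z > 0$ and $h_t = h > 0$, a constant schedule $\beta_t = \beta$, and the convention $\beta_0 = 0$ (an assumption of this example). Only the first round then pays a penalty, and the trade-off can be solved by hand:

```latex
F = \sum_{t=1}^{T} \frac{z}{\beta} + (\beta - 0)\,h = \frac{Tz}{\beta} + \beta h,
\qquad
\frac{dF}{d\beta} = -\frac{Tz}{\beta^{2}} + h = 0
\;\Longrightarrow\;
\beta = \sqrt{\frac{Tz}{h}},
\qquad
F = 2\sqrt{T z h}.
```

This is the best constant schedule under these assumptions. It is not claimed to equal the offline optimum $F^*$ over all nondecreasing schedules.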

A critical advance is the development of adaptive learning-rate update rules such as Stability–Penalty Matching (SPM), which recursively select the learning rate so that the stability and penalty contributions of each round are equal:

$$\eta_t z_t = \left( \eta_t^{-1} - \eta_{t-1}^{-1} \right) h_t \quad \text{or equivalently} \quad \eta_t = \eta_{t-1} \cdot \frac{2}{1 + \sqrt{1 + 4 z_t \eta_{t-1}^2 / h_t}}$$

In terms of $\beta_t = 1/\eta_t$, the same condition reads $z_t / \beta_t = (\beta_t - \beta_{t-1}) h_t$, i.e. $\beta_t = \tfrac{1}{2} \left( \beta_{t-1} + \sqrt{\beta_{t-1}^2 + 4 z_t / h_t} \right)$. This matching principle yields a competitive ratio within a constant factor of the lower bound for any sequence $h_{1:T}$ with bounded "approximate monotonicity," characterized by a parameter $\xi$ such that $h_1 \ge \xi h_t$ for all $t$ (Ito et al., 2024).

If $h_t$ is nonincreasing ($\xi = 1$), the competitive gap is constant; otherwise, it scales as $\Theta(\sqrt{\xi})$.
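The matching idea can be sketched in a few lines. The code below assumes the simplified setting of the objective $F$ above: $z_t > 0$ and $h_t > 0$ are known when $\beta_t$ is chosen, and each $\beta_t$ is picked so that the stability term $z_t / \beta_t$ equals the penalty term $(\beta_t - \beta_{t-1}) h_t$, which amounts to taking the positive root of a quadratic. The function names and the default $\beta_0 = 0$ are illustrative assumptions; the rule analyzed in the cited paper may differ in indexing and in which quantities are observable.

```python
import math
from typing import List, Sequence


def spm_schedule(z: Sequence[float], h: Sequence[float],
                 beta0: float = 0.0) -> List[float]:
    """Choose beta_t so that z_t / beta_t == (beta_t - beta_{t-1}) * h_t."""
    betas = []
    prev = beta0
    for z_t, h_t in zip(z, h):
        # Positive root of beta^2 - prev * beta - z_t / h_t = 0.
        beta_t = 0.5 * (prev + math.sqrt(prev * prev + 4.0 * z_t / h_t))
        betas.append(beta_t)
        prev = beta_t
    return betas


def objective(betas: Sequence[float], z: Sequence[float],
              h: Sequence[float], beta0: float = 0.0) -> float:
    """Evaluate F = sum_t [ z_t / beta_t + (beta_t - beta_{t-1}) * h_t ]."""
    total = 0.0
    prev = beta0
    for b, z_t, h_t in zip(betas, z, h):
        total += z_t / b + (b - prev) * h_t
        prev = b
    return total
```

Because the two terms are equal in every round, the value of $F$ under this schedule is exactly twice the accumulated stability term, and the resulting $\beta_t$ is nondecreasing by construction.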

3. Regularizer Construction and Geometry Adaptation

The choice of the regularizer $R$ is pivotal. For general online linear optimization (OLO), the optimal regret constant is governed not only by strong convexity of $R$ with respect to a dual norm (induced by the geometry of $\mathcal{K}$ and the loss set $L$), but also by tight control of the regularizer's range over the feasible set. Recent algorithmic techniques construct piecewise-quadratic or smoothed barriers, whose strong-convexity constant and upper bound are tailored to the action and loss sets, achieving regret within a universal constant of the minimax optimum (Gatmiry et al., 2024).

For certain structured problems (e.g., simplexes, Euclidean balls, positive semidefinite cones), analytic barriers such as negative entropy, Burg entropy, or log-determinant regularizers are deployed to exploit intrinsic geometry and sparsity of the losses (Moridomi et al., 2017).
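For reference, the standard textbook forms of the three analytic regularizers named above are given below. These are the plain definitions; the cited works may use shifted or scaled variants, so the exact forms should be treated as an assumption.

```latex
\text{Negative entropy (simplex):} \quad R(x) = \sum_{i=1}^{d} x_i \log x_i
\\[4pt]
\text{Burg entropy (positive orthant):} \quad R(x) = -\sum_{i=1}^{d} \log x_i
\\[4pt]
\text{Log-determinant (positive definite matrices):} \quad R(X) = -\log \det X
```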

Self-concordant regularizers, notably those used in the SCRiBLe algorithm, yield dimension-tight $O(d \sqrt{n \log n})$ regret rates for adversarial bandits over polytopes and ellipsoids, with Fenchel conjugacy and local norm properties critical for controlling variance and boundary behavior (Lévy et al., 28 Oct 2025).

4. Applications: Bandits, Best-of-Both-Worlds, and Beyond

FTRL is instantiated in several canonical problems:

  • Multi-Armed Bandits (MAB): With Tsallis or Shannon entropy regularizers, FTRL with adaptive learning rates achieves adversarial $O(\sqrt{KT})$ regret and stochastic $O\left(\sum_{i} \frac{\log T}{\Delta_i}\right)$ (gap-dependent) regret, simultaneously satisfying the Best-of-Both-Worlds (BOBW) property (Ito et al., 2024, Zhan et al., 26 Oct 2025, Jin et al., 2023).
  • Graph-structured Bandits: By leveraging the independence number $\zeta$ of the feedback graph, FTRL achieves $O(\sqrt{\zeta T})$ adversarial and $O(\zeta \log T)$ stochastic regret, with the learning rate and regularizer geometry matched to feedback structure (Ito et al., 2024).
  • Linear and Contextual Bandits: FTRL equipped with self-concordant barriers and second-order estimators achieves nearly instance-optimal rates, with $O(d \sqrt{T})$ regret in the adversarial regime and $O(d \log T)$ regret in the stochastic regime (Ito et al., 2024, Kong et al., 2023, Lévy et al., 28 Oct 2025).
  • Partial Monitoring: SPB-matching and related learning-rate selection schemes extend BOBW guarantees to settings with indirect feedback and minimax regret of $\Theta(T^{2/3})$ (Tsuchiya et al., 2024).
  • Bandits with Structural Priors: Game-dependency, sparsity, and other prior knowledge can be incorporated through regularizer and learning-rate adaptation, yielding instance-dependent and structure-exploiting guarantees (Tsuchiya et al., 2023).
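To make the multi-armed bandit instantiation more concrete, here is a minimal sketch of one FTRL step with a Tsallis-type regularizer, together with the importance-weighted loss estimator used under bandit feedback. It assumes the specific scaling $R(p) = -2 \sum_i \sqrt{p_i}$, for which the first-order conditions give $p_i = 1 / (\eta (\hat{L}_i - \lambda))^2$ with a normalizing multiplier $\lambda < \min_i \hat{L}_i$ found by bisection. The scaling, the function names, and the fixed `eta` are assumptions for this example; the cited algorithms use their own constants and adaptive learning rates.

```python
import math
from typing import List, Sequence


def tsallis_probs(cum_est: Sequence[float], eta: float,
                  iters: int = 100) -> List[float]:
    """FTRL iterate on the simplex for R(p) = -2 * sum_i sqrt(p_i)."""
    k = len(cum_est)
    low = min(cum_est)
    lo = low - math.sqrt(k) / eta  # at this multiplier the total mass is <= 1
    hi = low - 1.0 / eta           # at this multiplier the total mass is >= 1
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        mass = sum(1.0 / (eta * (c - lam)) ** 2 for c in cum_est)
        if mass > 1.0:
            hi = lam  # too much mass: move the multiplier away from min loss
        else:
            lo = lam
    lam = 0.5 * (lo + hi)
    probs = [1.0 / (eta * (c - lam)) ** 2 for c in cum_est]
    total = sum(probs)  # remove residual bisection error
    return [p / total for p in probs]


def importance_weighted_update(cum_est: Sequence[float],
                               probs: Sequence[float],
                               arm: int, loss: float) -> List[float]:
    """Add the unbiased estimate loss / p_arm to the pulled arm only."""
    updated = list(cum_est)
    updated[arm] += loss / probs[arm]
    return updated
```

A full bandit loop would alternate the two functions: compute `probs`, sample an arm from it, observe that arm's loss, and update the cumulative estimates.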

For closely related adversarial/optimistic settings, FTRL-type algorithms can be interpreted as gradient-based prediction algorithms or mapped to distributionally robust FTPL variants, connecting regret-optimal potential functions and efficiently implementing updates via bisection or sampling (Li et al., 2024).

5. Theoretical Guarantees and Lower Bounds

General FTRL regret bounds decompose into stability and penalty contributions; their optimization is fundamentally limited by the monotonicity of regularization parameters. The sharpest known lower bound, proved via competitive analysis, states that no online learning-rate policy can achieve better than $\Omega(\sqrt{\xi})$ times the offline optimum in the worst case of $\xi$-approximately monotone penalty terms. SPM-based rules attain $O(\sqrt{\xi})$-competitive regret, which is tight (Ito et al., 2024).

For a fixed nonnegative regularizer $R$ that is 1-strongly convex with respect to a norm $\|\cdot\|$, a fixed learning rate $\eta$, and subgradients $g_t$ measured in the dual norm $\|\cdot\|_*$, the standard regret bound against any comparator $x^* \in \mathcal{K}$ is

$$\operatorname{Regret}_T(x^*) \leq \frac{R(x^*)}{\eta} + \frac{\eta}{2} \sum_{t=1}^T \|g_t\|_*^2$$

Adapting $\eta$ via SPM or similar schemes refines the trade-off and yields data-dependent regret (McMahan, 2014).
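For intuition about the fixed-learning-rate bound above, the trade-off between its two terms can be worked out explicitly. The derivation below assumes uniform bounds $\|g_t\|_* \le G$ and $R(x^*) \le D$; the symbols $G$ and $D$ are introduced here for illustration only.

```latex
\operatorname{Regret}_T(x^*) \;\le\; \frac{D}{\eta} + \frac{\eta}{2}\, G^{2} T,
\qquad
\frac{d}{d\eta}\!\left[\frac{D}{\eta} + \frac{\eta}{2}\, G^{2} T\right]
= -\frac{D}{\eta^{2}} + \frac{G^{2} T}{2} = 0
\;\Longrightarrow\;
\eta = \frac{1}{G}\sqrt{\frac{2D}{T}},
\qquad
\operatorname{Regret}_T(x^*) \;\le\; G\sqrt{2 D T}.
```

This tuning requires knowing $G$, $D$, and $T$ in advance, which is the dependence that adaptive learning-rate schemes aim to remove.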

In complex feedback models (partial monitoring, time-varying constraints), penalized or modified FTRL schemes achieve O(T2/3)O(T^{2/3}) or other problem-optimal rates, given structural properties or regularizer monotonicity (Tsuchiya et al., 2024, Leith et al., 2022).

6. Extensions, Variations, and Algorithmic Unification

The FTRL paradigm subsumes numerous variants via different choices of regularizer, linearization strategy, and implicit update mechanisms:

  • Generalized Implicit FTRL (GIFTRL) extends the update to interpolations between explicit (linearized) and implicit (full-loss) FTRL, capturing Mirror-Prox and aProx within the same duality framework via Fenchel-Young inequalities (Chen et al., 2023).
  • FTRL–Proximal and Centered FTRL differ in where each incremental regularizer is anchored: FTRL–Proximal centers it at the current iterate, whereas Centered FTRL centers it at a fixed point such as the origin. This distinction reveals deep equivalences with mirror descent and dual averaging, respectively, and enables adaptive and per-coordinate learning rates in diagonal or block-structured geometries (McMahan, 2014, Ahn et al., 2024).
  • Bandit-robust and last-iterate convergence: Recent theory addresses not only cumulative regret, but also convergence rates of sequence endpoints (last-iterate analysis), exploiting continuity and stability of FTRL trajectories, particularly with Tsallis regularization in bandit problems (Zhan et al., 26 Oct 2025, Abe et al., 2022).
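As a small illustration of per-coordinate learning rates in a diagonal geometry, the sketch below computes the iterate of an unconstrained, origin-centered FTRL with the quadratic regularizer $\frac{1}{2} \sum_i \sigma_{t,i} x_i^2$, where $\sigma_{t,i} = \sqrt{\sum_{s \le t} g_{s,i}^2} / \alpha$ (an AdaGrad-style choice). The minimizer is available in closed form, $x_{t+1,i} = -\sum_{s \le t} g_{s,i} / \sigma_{t,i}$. The function name, the parameter `alpha`, and this specific choice of $\sigma_{t,i}$ are assumptions for this example, not the exact algorithm of the cited works.

```python
import math
from typing import List, Sequence


def per_coordinate_ftrl(grads: Sequence[Sequence[float]],
                        alpha: float = 1.0) -> List[float]:
    """Iterate after seeing `grads`, with origin-centered quadratic regularization."""
    d = len(grads[0])
    grad_sum = [0.0] * d  # running sum of gradients per coordinate
    sq_sum = [0.0] * d    # running sum of squared gradients per coordinate
    for g in grads:
        for i, gi in enumerate(g):
            grad_sum[i] += gi
            sq_sum[i] += gi * gi
    x = []
    for i in range(d):
        if sq_sum[i] == 0.0:
            # Coordinate never received a nonzero gradient: stay at the origin.
            x.append(0.0)
        else:
            sigma = math.sqrt(sq_sum[i]) / alpha
            # Closed-form minimizer of grad_sum[i] * x + 0.5 * sigma * x**2.
            x.append(-grad_sum[i] / sigma)
    return x
```

Coordinates that see large or frequent gradients accumulate a larger $\sigma_{t,i}$ and therefore move more conservatively, which is the practical appeal of per-coordinate rates on sparse data.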

Optimal design of regularizers—potentially via high-dimensional convex programming exploiting action/loss set geometry—enables dimension-tight and instance-dependent regret, further strengthening FTRL universality (Gatmiry et al., 2024, Moridomi et al., 2017).

7. Insights, Implications, and Open Problems

FTRL enjoys multiple core advantages: universality with respect to problem structure, adaptability via learning-rate and regularizer tuning, and ability to yield both minimax and instance-optimal regret guarantees. Its deployment in bandit, contextual, and constrained online optimization covers settings with direct, partial, or ambiguous feedback. Key theoretical advances highlight:

  • The fundamental role of monotonicity in penalty coefficients for learning-rate adaptation, and competitive limits for online schedules.
  • The centrality of regularizer geometry, both for worst-case and data-dependent regret, and for algorithmic tractability in high dimensions.
  • The unification of last-iterate and cumulative-regret perspectives via Bregman divergence-based analysis.

Active research directions include (i) sharpening last-iterate and simple regret rates, (ii) automated selection or learning of optimal regularizer structures for non-canonical action/loss sets, (iii) extending FTRL unification to reinforcement learning and general non-convex settings, and (iv) efficient implementations bridging FTPL and FTRL in high-dimensional or bandit contexts (Ito et al., 2024, Gatmiry et al., 2024, Lévy et al., 28 Oct 2025).

FTRL remains a cornerstone of online learning theory and algorithm design, with ongoing impact across online decision problems involving adversarial dynamics, stochasticity, and adaptivity.
