
Adversarial Online Linear Optimization

Updated 9 February 2026
  • Adversarial Online Linear Optimization is the study of sequential decision-making under adversarial environments where a learner minimizes regret by choosing actions from a convex set.
  • Algorithms such as FTRL, Mirror Descent, and coin-betting leverage regularizers and context-aware strategies to achieve optimal sublinear regret bounds.
  • Advanced techniques integrate hints, side information, and alternating regret to improve performance across full-information, combinatorial, and bandit feedback settings.

Adversarial Online Linear Optimization (OLO) is the study of sequential decision-making under worst-case (adversarial) environments, where on each round a learner selects an action from a convex set and suffers a linear loss chosen adversarially. The principal objective is to design prediction algorithms that minimize regret: the difference between the learner’s cumulative loss and that of the best fixed comparator, evaluated in hindsight, over a prescribed action set. This framework is foundational in convex online learning, algorithmic game theory, bandit optimization, and control theory.

1. Formal Problem Statement and Regret Definition

The classical adversarial OLO protocol proceeds over $T$ rounds. Let $\mathcal{K} \subseteq \mathbb{R}^d$ denote a closed, convex, centrally symmetric, and bounded action set (for example, a Euclidean ball, simplex, or $\ell_q^+$ ball). At round $t$:

  • The learner picks $x_t \in \mathcal{K}$.
  • The adversary reveals a loss (or cost) vector $\ell_t \in \mathbb{R}^d$ (typically with $\|\ell_t\| \leq 1$ in some norm).
  • The learner incurs the linear loss $\langle \ell_t, x_t \rangle$.

The central metric is regret against a comparator $u \in \mathcal{K}$:
$$R_T(u) = \sum_{t=1}^T \langle \ell_t, x_t - u \rangle,$$
with the worst-case regret $R_T = \sup_{u \in \mathcal{K}} R_T(u)$. The OLO goal is to construct algorithms with sublinear $R_T$ under arbitrary $\ell_t$ sequences.
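As a concrete check of these definitions, worst-case regret against a finite set of candidate comparators (e.g., the vertices of $\mathcal{K}$ or a grid over it) can be evaluated directly. The function name and the finite-candidate restriction are our own simplification:

```python
import numpy as np

def worst_case_regret(losses, actions, candidates):
    """Regret of a played sequence against the best fixed comparator.

    losses:     (T, d) array of adversarial loss vectors ell_t
    actions:    (T, d) array of the learner's plays x_t
    candidates: (M, d) array of comparator points u in K (a finite cover)
    """
    losses = np.asarray(losses, dtype=float)
    # Learner's cumulative linear loss: sum_t <ell_t, x_t>.
    learner_loss = np.sum(losses * np.asarray(actions, dtype=float))
    # Best fixed comparator minimizes <sum_t ell_t, u> over the candidates,
    # since a fixed u's total loss is linear in the summed loss vector.
    comparator_loss = np.min(np.asarray(candidates, dtype=float) @ losses.sum(axis=0))
    return learner_loss - comparator_loss
```

Because the comparator is fixed in hindsight, only the summed loss vector matters for the second term, which is why a single matrix-vector product suffices.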

2. Algorithmic Frameworks and Optimality

The archetype OLO algorithms are based on Follow-the-Regularized-Leader (FTRL), Mirror Descent, Exponentiated Gradient (multiplicative weights), and, for certain combinatorial domains, exponential weights and their efficient variants.
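Of these building blocks, Exponentiated Gradient over the probability simplex is the simplest to state: play weights proportional to the exponentiated negative cumulative loss. The following is an illustrative sketch (the function name and the numerical-stability shift are our own choices); with $\eta \approx \sqrt{\log d / T}$ it attains the $O(\sqrt{T \log d})$ simplex rate:

```python
import numpy as np

def exponentiated_gradient(loss_seq, eta):
    """Exponentiated Gradient (multiplicative weights) on the simplex.

    Plays x_t proportional to exp(-eta * cumulative loss seen so far).
    """
    loss_seq = np.asarray(loss_seq, dtype=float)
    cum = np.zeros(loss_seq.shape[1])
    plays = []
    for ell in loss_seq:
        # Subtract the min before exponentiating: same normalized weights,
        # but no overflow for large cumulative losses.
        w = np.exp(-eta * (cum - cum.min()))
        plays.append(w / w.sum())
        cum += ell
    return np.array(plays)
```

On a loss sequence that consistently favors one coordinate, the plays concentrate on it exponentially fast.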

FTRL plays, at round $t$:
$$x_t = \arg\min_{x \in \mathcal{K}} \Big\{ \eta R(x) + \sum_{s=1}^{t-1} \langle \ell_s, x \rangle \Big\},$$
where $R$ is a strongly convex regularizer and $\eta$ is a learning rate tuned to the time horizon.
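For the unit Euclidean ball with the quadratic regularizer $R(x) = \|x\|^2/2$, this minimizer has a closed form, which the sketch below (an illustrative implementation, not taken from the cited papers) exploits:

```python
import numpy as np

def ftrl_euclidean_ball(loss_seq, eta):
    """FTRL on the unit Euclidean ball with R(x) = ||x||^2 / 2.

    x_t = argmin_{||x|| <= 1} eta * ||x||^2 / 2 + <theta, x>,
    where theta = sum_{s < t} ell_s. The unconstrained minimizer is
    -theta / eta; since the objective is an isotropic quadratic, the
    constrained minimizer is its Euclidean projection onto the ball.
    """
    loss_seq = np.asarray(loss_seq, dtype=float)
    theta = np.zeros(loss_seq.shape[1])
    plays = []
    for ell in loss_seq:
        x = -theta / eta
        n = np.linalg.norm(x)
        if n > 1.0:
            x = x / n  # project back onto the unit ball
        plays.append(x)
        theta += ell
    return np.array(plays)
```

Against a constant loss vector the iterates quickly saturate at the boundary point opposing the loss, which is the best fixed comparator.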

A fundamental result is that, for bounded $\mathcal{K}$ and loss norms, FTRL with a suitably chosen $R$ achieves minimax-optimal regret $O(\mathrm{Rate}(\mathcal{K},\mathcal{L}) \sqrt{T})$, where the rate is determined by the geometry of $\mathcal{K}$ and its duality with the loss norm (Gatmiry et al., 2024). Recent results construct, for any convex symmetric pair $(\mathcal{K},\mathcal{L})$, an explicit regularizer $R^*$ ensuring minimax-optimal regret up to a universal constant. However, selecting $R^*$ and certifying its strong convexity against arbitrary norms may be computationally hard in high-dimensional settings (Gatmiry et al., 2024).

For combinatorial action sets, such as the hypercube $\{0,1\}^n$, efficient instantiations (e.g., PolyExp) relying on coordinate-separable mirror descent provide optimal expected regret $O(n\sqrt{T})$ in the full-information setting and $O(n^2 \sqrt{T})$ under bandit feedback (Putta et al., 2018).
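The coordinate decoupling that makes such hypercube algorithms efficient is easy to see for linear losses: each coordinate reduces to an independent two-expert Hedge between playing 0 (zero loss) and playing 1 (loss $\ell_t[i]$). The sketch below illustrates only this decomposition; it is not PolyExp itself, and the function name is ours:

```python
import numpy as np

def hypercube_marginals(loss_seq, eta):
    """Coordinate-wise exponential weights on the hypercube {0,1}^n.

    Under linear losses the problem decouples: each coordinate i runs a
    two-expert Hedge between "play 0" (cumulative loss 0) and "play 1"
    (cumulative loss sum_s ell_s[i]). Returns the marginal probability of
    playing 1 at each round; an action x_t in {0,1}^n can then be sampled
    coordinate-wise from these marginals.
    """
    loss_seq = np.asarray(loss_seq, dtype=float)
    cum = np.zeros(loss_seq.shape[1])
    probs = []
    for ell in loss_seq:
        # P(x_i = 1) = exp(-eta*cum_i) / (1 + exp(-eta*cum_i)).
        probs.append(1.0 / (1.0 + np.exp(eta * cum)))
        cum += ell
    return np.array(probs)
```

Coordinates with persistently positive losses are driven toward 0, and coordinates with negative losses (gains) toward 1, each at the usual Hedge rate.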

3. Parameter-Free, Self-Adaptive, and Side-Information Methods

Traditional algorithms rely on a priori tuning of learning rates or competitor norms. Parameter-free methods remove this requirement, achieving regret bounds adaptive to the norm of the comparator without prior knowledge.

The coin-betting reduction (Ryu et al., 2022) provides a unified scheme: by interpreting OLO as a repeated wealth-betting game and leveraging universal compression (e.g., context-tree weighting), it constructs parameter-free algorithms that adapt to revealed temporal structure and side information (e.g., quantized, Markov, or tree-based contexts). Regret bounds adapt to the complexity of the best (possibly state-dependent) comparator, with overall guarantees $\tilde O(\|u\|\sqrt{T})$ in general and improved rates when exploitable structure exists in the loss sequence.
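A minimal instance of the coin-betting idea is the one-dimensional Krichevsky-Trofimov (KT) bettor, which needs no learning rate at all. The sketch below is the standard textbook reduction (treating the negative loss as a coin outcome), not the full CTW-OLO scheme of the cited paper:

```python
import numpy as np

def kt_coin_betting(grad_seq, init_wealth=1.0):
    """Parameter-free 1-D OLO via Krichevsky-Trofimov coin betting.

    Treats c_t = -g_t (negative loss, assumed in [-1, 1]) as a coin
    outcome and bets x_t = beta_t * Wealth_{t-1}, with the KT fraction
    beta_t = (sum of past coins) / t. No learning rate to tune.
    """
    wealth = init_wealth
    coin_sum = 0.0
    plays = []
    for t, g in enumerate(np.asarray(grad_seq, dtype=float), start=1):
        beta = coin_sum / t          # KT betting fraction
        x = beta * wealth
        plays.append(x)
        c = -g
        wealth += c * x              # W_t = W_{t-1} + c_t * x_t
        coin_sum += c
    return np.array(plays)
```

When the losses consistently point one way, the bets (and hence the comparator norm the algorithm effectively competes with) grow automatically, which is exactly the parameter-freeness being described.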

Extensions to side information enable competing with the best context-dependent or tree-adaptive policy. The context-tree weighting OLO (CTW-OLO) achieves regret of the form

$$R_T(u_{[T]}) \leq O\big(\|u\|_T \sqrt{T} + ID(T)\big),$$

where $\|u\|_T$ is a context-dependent comparator norm and $ID(T)$ the description length of the best model (Ryu et al., 2022).

4. Beyond the Classical Regret Bound: Tradeoffs, Hints, and Alternating Regret

Recent research has refined the basic minimax $\Theta(\sqrt{T})$ regret frontier:

  • Hints and Predictable Sequences: If, at each round, a collection of $K$ "hint" vectors (predictors of the upcoming loss) is available, and some convex combination of them is positively correlated with the realized loss, regret can be reduced to $O(\log T)$ from the classical $O(\sqrt{T})$ benchmark (Bhaskara et al., 2020). The main algorithm, $K$-Hints$_\alpha$, combines hints via FTRL over the simplex, with smooth-hinge surrogates and a single-hint oracle, and applies a meta-level combiner to select unknown parameters.
  • Comparator-Dependent and Loss vs. Regret Tradeoffs: Stein's method, originally a probabilistic tool, enables OLO algorithms that match not just the leading $\sqrt{T}$ order but also the sharp additive constants in both regret and total-loss bounds, realizing optimal Pareto tradeoffs between worst-case loss and regret, pointwise in the comparator (Zhang et al., 6 Feb 2026).
  • Alternating Regret: When the learner's move alternates with the adversary's, as in two-player zero-sum games, $\tilde O(T^{1/3})$ alternating regret is achievable for OLO (and more generally OCO), faster than the classical $O(\sqrt{T})$ rate. Continuous Hedge and FTRL with third-order smooth regularizers achieve these rates; lower bounds show this is optimal for a wide class of algorithms (Hait et al., 18 Feb 2025).

5. Specialized Domains and Structural Extensions

Different geometric domains and feedback models require tailored methods and analysis:

  • Combinatorial Spaces: On $\{0,1\}^n$ or $\{-1,+1\}^n$, PolyExp (equivalent to Exp2, to FTRL with an entropic regularizer, and to FTPL with logistic perturbations) achieves $O(n\sqrt{T})$ regret in full-information settings, resolving implementation and lower-bound questions (Putta et al., 2018).
  • Nonnegative $\ell_q$ Balls: For $\ell_q^+$ domains (important in load balancing and scheduling), smooth approximations of the $\ell_p$ norm enable algorithms (e.g., SmoothBaseline) that for any $\epsilon>0$ achieve $(1-\epsilon)$ multiplicative regret with additive $O(p\, m^{1/p}/\epsilon)$ overhead, sidestepping the cost of explicit projection (Molinaro, 2016).
  • Adversarial OLO with Memory and Dynamics: When action-dependent feedback propagates through system dynamics (e.g., in online control with adversarial disturbances), reductions to Online Convex Optimization (OCO) with memory yield $O(\sqrt{T}\log T)$ regret over convex disturbance-action policies, with truncation error controlled by system stability parameters (Agarwal et al., 2019).

6. Regret Lower Bounds, Complexity, and Universality

OLO regret bounds are tight in various regimes. Table 1 summarizes key minimax rates in prototypical settings:

| Domain | Feedback | Optimal Regret | Lower Bound |
|---|---|---|---|
| Euclidean ball ($\ell_2^d$) | Full info | $O(\sqrt{dT})$ | $\Omega(\sqrt{dT})$ (Gatmiry et al., 2024) |
| Simplex ($\Delta^d$) | Full info | $O(\sqrt{T \log d})$ | $\Omega(\sqrt{T \log d})$ (Gatmiry et al., 2024) |
| $\{0,1\}^n$ hypercube | Full info | $O(n\sqrt{T})$ | $\Omega(n\sqrt{T})$ (Putta et al., 2018) |
| $\{0,1\}^n$ hypercube | Bandit | $O(n^2\sqrt{T})$ | $\Omega(n^2\sqrt{T})$ (Putta et al., 2018) |
| OLO with $K$ correlated hints | Full info | $O(\log T)$ (with hints) | $\Omega(\sqrt{T})$ (worst case) (Bhaskara et al., 2020) |

The construction of minimax-optimal regularizers is possible for arbitrary domains, but the algorithmic complexity is exponential in $d$; even deciding strong convexity is NP-hard in general (Gatmiry et al., 2024). For some structured classes, e.g., combinatorial sets, polynomial-time reductions leveraging coordinate structure or entropic regularization recover efficient strategies (Putta et al., 2018).

7. Future Directions and Open Problems

Several directions remain at the frontier:

  • Data-dependent and adaptive regret: Developing OLO frameworks that adapt to intrinsic loss sequence "easiness," e.g., via path-length or predictable sequences.
  • Efficient universality: Reducing the exponential cost of computing nearly-optimal regularizers in complex domains remains open.
  • Feedback structures: Extending OLO with partial, bandit, or delayed feedback to new adversarial models, including those with side information or hints.
  • Memory and control: Further integration of OCO with memory, disturbances, and dynamic constraints for robust and adaptive control under adversarial setups (Agarwal et al., 2019).
  • Tight characterizations of tradeoff frontiers: Further operationalization of probabilistic and optimization-theoretic methods (e.g., Stein’s method, PDE frameworks) for instance-optimal, non-asymptotic, and computationally efficient OLO algorithms (Zhang et al., 6 Feb 2026, Zhang et al., 2022).
