
Online Convex Optimization

Updated 15 November 2025
  • Online Convex Optimization is a framework for sequential decision-making in which a learner faces a sequence of convex losses in an uncertain environment and aims to minimize regret over time.
  • Key algorithmic paradigms, including OGD, FTRL, OMD, and ONS, offer varied update rules and optimal regret bounds tailored to different convexity and feedback settings.
  • Practical applications span online learning, adaptive control, and resource allocation, improving strategies in both static and dynamic optimization contexts.

Online convex optimization (OCO) is a foundational paradigm for sequential decision-making under uncertainty, unifying adversarial, stochastic, and dynamic optimization procedures where the objective is convex. It models interactions over rounds between a learner, who selects decisions from a convex feasible set, and an environment, which sequentially reveals convex loss functions. Performance is benchmarked by regret relative to a prescribed competitor class, typically the best fixed decision in hindsight, though stronger comparators—such as those encompassing time-varying benchmarks or policies—are also considered.

1. Formal Model and Core Principles

In OCO, the learner operates over a closed convex set $\mathcal{K} \subset \mathbb{R}^d$ and proceeds as follows in each round $t = 1, \dots, T$:

  • Select $x_t \in \mathcal{K}$.
  • Observe convex loss $f_t : \mathcal{K} \to \mathbb{R}$ after committing to $x_t$.
  • Incur loss $f_t(x_t)$.

Regret with respect to any $u \in \mathcal{K}$ is defined as

$$\mathrm{Regret}_T(u) = \sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(u),$$

with the standard metric being static regret (the comparator $u$ is the best fixed decision in hindsight, i.e. $\max_{u \in \mathcal{K}} \mathrm{Regret}_T(u)$) or, in adversarial/dynamic settings, more general (dynamic/pathwise) regret.
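The regret definition above can be sketched on a toy instance. This is a minimal illustration, assuming one-dimensional quadratic losses $f_t(x) = (x - z_t)^2$ on $\mathcal{K} = [0, 1]$; the targets, the constant learner, and the helper name `static_regret` are assumptions made for the example.

```python
from typing import Callable, Sequence


def static_regret(losses: Sequence[Callable[[float], float]],
                  plays: Sequence[float],
                  comparator: float) -> float:
    """Regret_T(u) = sum_t f_t(x_t) - sum_t f_t(u) for a fixed comparator u."""
    learner_loss = sum(f(x) for f, x in zip(losses, plays))
    comparator_loss = sum(f(comparator) for f in losses)
    return learner_loss - comparator_loss


# Toy instance (illustrative): f_t(x) = (x - z_t)^2 on K = [0, 1].
targets = [0.0, 1.0, 1.0, 0.0, 1.0]
losses = [lambda x, z=z: (x - z) ** 2 for z in targets]

# A learner that ignores feedback and always plays 0.5.
plays = [0.5] * len(targets)

# For these quadratics the best fixed decision in hindsight is the mean of the targets.
best_fixed = sum(targets) / len(targets)
regret_vs_best = static_regret(losses, plays, best_fixed)
```

Here the best fixed decision is 0.6, and the constant learner accumulates a small positive regret against it.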

Common assumptions are that each $f_t$ is convex (possibly $G$-Lipschitz: $\|\nabla f_t(x)\| \le G$ for all $x \in \mathcal{K}$), and $\mathcal{K}$ is bounded with diameter $D$. Subdifferentials and projections onto $\mathcal{K}$ are assumed efficiently computable.

This model subsumes and extends classical online learning with expert advice (where $\mathcal{K}$ is the simplex and $f_t$ is linear), bandit settings (where only $f_t(x_t)$ is revealed), and many variants with additional structure or constraints.

2. Canonical Algorithms and Regret Bounds

The OCO literature is anchored on several algorithmic paradigms, each tailored to different geometric, statistical, or computational objectives:

| Method | Per-round update (summary) | Regret rate |
|---|---|---|
| Online Gradient Descent (OGD) | $x_{t+1} = \Pi_{\mathcal{K}}(x_t - \eta_t \nabla f_t(x_t))$ | $O(\sqrt{T})$ |
| Follow-The-Regularized-Leader (FTRL) | $x_{t+1} = \arg\min_{x \in \mathcal{K}} \{ \sum_{s=1}^{t} f_s(x) + R(x) \}$ | $O(\sqrt{T})$, mirrors OGD with suitable $R$ |
| Online Mirror Descent (OMD) | Mirror/proximal step in a $\psi$-divergence (Bregman) geometry | $O(\sqrt{T})$ |
| Online Newton Step (ONS) | Second-order update, exp-concave $f_t$ | $O(d \log T)$ for $\alpha$-exp-concave |
| Hedge/EG (Expert advice) | Entropic regularization ($R$ = negative entropy) | $O(\sqrt{T \log N})$ ($N$ experts) |
| Bandit OCO | Non-constructive, gradient estimation with partial feedback | $O(T^{3/4})$ (general), $\widetilde{O}(\sqrt{T})$ (linear) |
| Frank-Wolfe/Conditional Gradient | Linear optimization instead of projection | $O(T^{3/4})$ (general), projection-free |

Key regret bounds:

  • For convex $f_t$, OGD/OMD/FTRL with appropriate step-size $\eta_t \propto 1/\sqrt{t}$ yields regret $O(\sqrt{T})$.
  • If $f_t$ is $\mu$-strongly convex, step-size $\eta_t = 1/(\mu t)$ yields $O(\log T)$ regret.
  • ONS achieves $O(d \log T)$ when $f_t$ are $\alpha$-exp-concave.
  • Projection-free methods (Frank-Wolfe) do not reach the optimal $O(\sqrt{T})$ regret in all cases, but are useful for complex $\mathcal{K}$.
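The projected-gradient update behind the first bound can be sketched as follows. This is a minimal sketch, assuming $\mathcal{K}$ is a Euclidean ball centered at the origin and using the common step size $\eta_t = D / (G\sqrt{t})$; the function names, the ball-shaped domain, and the linear test losses are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def project_onto_ball(x: Vector, radius: float) -> Vector:
    """Euclidean projection onto {x : ||x|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def online_gradient_descent(grad_oracles: Sequence[Callable[[Vector], Vector]],
                            dim: int, radius: float, lipschitz: float) -> List[Vector]:
    """Play x_t, observe a (sub)gradient of f_t at x_t, take a projected step."""
    x = [0.0] * dim
    diameter = 2.0 * radius
    plays = []
    for t, grad in enumerate(grad_oracles, start=1):
        plays.append(list(x))                       # commit to x_t
        g = grad(x)                                 # gradient revealed after committing
        eta = diameter / (lipschitz * math.sqrt(t))  # assumed step-size schedule
        x = project_onto_ball([xi - eta * gi for xi, gi in zip(x, g)], radius)
    return plays


# Illustrative run: linear losses f_t(x) = <c, x> with c = (1, 0) on the unit ball.
T = 50
grads = [lambda x: [1.0, 0.0]] * T
plays = online_gradient_descent(grads, dim=2, radius=1.0, lipschitz=1.0)
learner_loss = sum(p[0] for p in plays)
best_fixed_loss = -float(T)                         # attained at u = (-1, 0)
regret = learner_loss - best_fixed_loss
```

On this instance the iterate reaches the minimizer $(-1, 0)$ after one step, so the regret stays constant, well below the worst-case $\tfrac{3}{2} G D \sqrt{T}$ guarantee usually quoted for this schedule.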

Mirror descent and the entropic geometry yield Hedge/Multiplicative Weights when $\mathcal{K}$ is the simplex, matching classical bounds in online combinatorial settings.
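As a hedged illustration of the entropic case, the sketch below implements the multiplicative-weights update on the probability simplex. It assumes losses in $[0, 1]$ and the textbook learning rate $\eta = \sqrt{8 \ln N / T}$, for which the expected regret is at most $\sqrt{(T/2) \ln N}$; the function name `hedge` and the two-expert instance are assumptions for the example.

```python
import math
from typing import List, Sequence


def hedge(loss_vectors: Sequence[Sequence[float]], learning_rate: float) -> List[List[float]]:
    """Return the distribution played in each round by multiplicative weights."""
    n = len(loss_vectors[0])
    weights = [1.0] * n
    plays = []
    for losses in loss_vectors:
        total = sum(weights)
        plays.append([w / total for w in weights])   # normalize to the simplex
        # Exponentially down-weight each expert by its observed loss.
        weights = [w * math.exp(-learning_rate * l) for w, l in zip(weights, losses)]
    return plays


# Illustrative instance: expert 0 never errs, expert 1 always suffers loss 1.
T, N = 100, 2
loss_vectors = [[0.0, 1.0] for _ in range(T)]
eta = math.sqrt(8.0 * math.log(N) / T)
plays = hedge(loss_vectors, eta)

expected_loss = sum(sum(p * l for p, l in zip(dist, losses))
                    for dist, losses in zip(plays, loss_vectors))
best_expert_loss = min(sum(l[i] for l in loss_vectors) for i in range(N))
regret = expected_loss - best_expert_loss
bound = math.sqrt(T * math.log(N) / 2.0)            # guarantee for losses in [0, 1]
```

The first distribution is uniform, and the mass on the bad expert decays geometrically, so the realized regret sits below the worst-case bound.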

3. Structural Regularities and Algorithmic Adaptation

OCO frameworks have evolved to leverage structural properties of loss sequences and the ambient geometry:

Strong Convexity and Exp-Concavity

For $\mu$-strongly convex or $\alpha$-exp-concave losses, static regret drops to $O(\log T)$ or $O(d \log T)$ respectively, as realized by variable-rate OGD or ONS.
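A small worked case may help show why the decaying step size $1/(\mu t)$ suits strongly convex losses. Assuming one-dimensional losses $f_t(x) = \tfrac{\mu}{2}(x - z_t)^2$ with targets $z_t \in [0, 1]$ (an illustrative choice, as is the function name), the OGD iterate $x_{t+1}$ is exactly the running mean of $z_1, \dots, z_t$, i.e. the leader in hindsight so far.

```python
from typing import List, Sequence, Tuple


def ogd_strongly_convex(targets: Sequence[float], mu: float = 1.0,
                        lower: float = 0.0, upper: float = 1.0) -> Tuple[List[float], float]:
    """OGD with step 1/(mu * t) on f_t(x) = (mu / 2) * (x - z_t)^2 over [lower, upper]."""
    x = lower
    plays = []
    for t, z in enumerate(targets, start=1):
        plays.append(x)                      # commit to x_t
        gradient = mu * (x - z)              # f_t'(x_t)
        eta = 1.0 / (mu * t)                 # step size for mu-strong convexity
        x = min(upper, max(lower, x - eta * gradient))  # projection onto the interval
    return plays, x


targets = [0.0, 1.0, 1.0, 0.0, 1.0]
plays, final_iterate = ogd_strongly_convex(targets)
running_mean = sum(targets) / len(targets)
```

With these targets the final iterate equals the mean 0.6, and the projection never activates because the running mean stays inside $[0, 1]$.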

Adaptive Bounds and Per-Coordinate Rates

FTPRL (McMahan et al., 2010) adapts regularizer strength per-coordinate, yielding regret bounds

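$$\mathrm{Regret}_T = O\!\left( \sum_{i=1}^{d} D_i \sqrt{\sum_{t=1}^{T} g_{t,i}^2} \right)$$

(where $D_i$ = width in coordinate $i$ and $g_{t,i}$ = the $i$-th coordinate of the gradient at round $t$), which can be much tighter than a global bound when losses are sparse or anisotropic.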
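The per-coordinate idea can be sketched with an AdaGrad-style diagonal step size. This is an illustration in the spirit of the adaptive bound, not the FTPRL algorithm itself: the box-shaped domain centered at the origin, the step size $D_i / \sqrt{2 \sum_s g_{s,i}^2}$, linear losses given directly by their gradients, and the function name are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def per_coordinate_ogd(gradients: Sequence[Sequence[float]],
                       widths: Sequence[float]) -> Tuple[List[List[float]], List[float]]:
    """Projected gradient steps on a box, with a separate step size per coordinate."""
    dim = len(widths)
    x = [0.0] * dim
    sum_sq = [0.0] * dim                     # running sum of squared gradients per coordinate
    plays = []
    for g in gradients:
        plays.append(list(x))
        for i in range(dim):
            sum_sq[i] += g[i] ** 2
            if sum_sq[i] == 0.0:
                continue                     # coordinate never active: leave it untouched
            eta = widths[i] / math.sqrt(2.0 * sum_sq[i])
            half = widths[i] / 2.0
            x[i] = min(half, max(-half, x[i] - eta * g[i]))  # clip to [-D_i/2, D_i/2]
    return plays, x


# Sparse gradients: only coordinate 0 is ever active.
gradients = [[1.0, 0.0]] * 10
plays, final_x = per_coordinate_ogd(gradients, widths=[2.0, 1.0])
```

Coordinates that never receive gradient signal contribute nothing to the bound and are never moved, which is where the gain over a single global step size comes from on sparse problems.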

Variation-Based Regret

For environments with temporal smoothness, dynamic regret bounds scale as

$$\mathrm{Regret}_T = O\!\left(\sqrt{V_T}\right)$$

where $V_T$ is (gradient) variation over time (Yang et al., 2011). This line mirrors advances in dynamic tracking and provides $o(\sqrt{T})$ regret whenever the environment drifts slowly.

Dynamic and Universal Regret

Regret against moving comparators (dynamic regret) provably scales with the path-length $P_T$ of the comparator sequence (Gokcesu et al., 2019): $O(\sqrt{T(1 + P_T)})$, where $P_T = \sum_{t=2}^{T} \|u_t - u_{t-1}\|$. Parameter-free, universal algorithms simultaneously achieve optimal $O(\sqrt{T(1 + P_T)})$ regret for all $P_T$.
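For concreteness, the path-length of a comparator sequence and the corresponding dynamic regret can be computed as below. This is a minimal sketch: the Euclidean norm, the squared-distance losses, and the helper names are illustrative assumptions.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def path_length(comparators: Sequence[Point]) -> float:
    """Sum over t >= 2 of ||u_t - u_{t-1}|| in the Euclidean norm."""
    return sum(math.dist(prev, cur) for prev, cur in zip(comparators, comparators[1:]))


def dynamic_regret(losses: Sequence[Callable[[Point], float]],
                   plays: Sequence[Point],
                   comparators: Sequence[Point]) -> float:
    """sum_t f_t(x_t) - sum_t f_t(u_t) against a time-varying comparator sequence."""
    return sum(f(x) - f(u) for f, x, u in zip(losses, plays, comparators))


# Illustrative instance: the comparator jumps once, then stays put.
comparators = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
losses = [lambda x, z=z: sum((a - b) ** 2 for a, b in zip(x, z)) for z in comparators]
plays = [(0.0, 0.0)] * 3                    # a learner that never moves

p_T = path_length(comparators)
regret = dynamic_regret(losses, plays, comparators)
```

A stationary comparator has $P_T = 0$, in which case dynamic regret reduces to static regret against that fixed point.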

Contaminated and Non-Stationary Regimes

Recent advances consider contaminated OCO (Kamijima et al., 2024): if $k$ out of $T$ rounds violate strong convexity (or exp-concavity), the optimal regret interpolates between $O(\log T)$ (pure strongly convex/exp-concave) and $O(\sqrt{T})$ (fully general), scaling as $O(\log T + \sqrt{k})$.

4. Generalizations and Advanced Frameworks

OCO admits numerous sophisticated generalizations:

Time-Varying and Stochastic Constraints

Incorporating time-varying or stochastic constraints, algorithms leveraging virtual queues and drift-Lyapunov analysis achieve $O(1/\epsilon^2)$ convergence time to $\epsilon$-feasibility and near-optimality under mild Slater-type conditions (Neely et al., 2017).

OCO with Unbounded Memory

OCO has been extended to settings where the loss at time $t$ depends on the entire history $(x_1, \dots, x_t)$. The $p$-effective memory capacity $H_p$ quantifies the influence of past decisions (Kumar et al., 2022), leading to tight regret rates $O(\sqrt{H_p T})$. This framework subsumes finite-memory and discounted-memory OCO, and yields sharper regret bounds for online control and performative prediction.

Stochastic, Bandit, and Decentralized Models

  • In the zeroth-order ("bandit") setting, only $f_t(x_t)$ is accessible per round. Quantum algorithms have enabled $O(\sqrt{T})$ regret (removing classical dimension dependence) for general convex and $O(\log T)$ for strongly convex loss sequences (He et al., 2020).
  • Decentralized OCO considers interacting networked agents receiving local losses. Algorithms based on accelerated gossip achieve near-optimal regret scaling as $\widetilde{O}(n \rho^{-1/4} \sqrt{T})$ (convex) and $\widetilde{O}(n \rho^{-1/2} \log T)$ (strongly convex), matching lower bounds up to logs in the number of agents $n$, time horizon $T$, and spectral gap $\rho$ (Wan et al., 2024).

5. Projection-Free and Oracle-Efficient Approaches

Scenarios with complex feasible sets $\mathcal{K}$ motivate projection-free OCO algorithms:

  • Frank–Wolfe type algorithms avoid the cost of Euclidean projections, requiring instead linear optimization oracle calls, but trade optimal regret for per-iteration efficiency.
  • Recent advancements (Mhammedi, 2021; Gatmiry et al., 2023) develop wrappers and barrier-based Newton methods that turn any OCO algorithm on a Euclidean ball into a projection-free variant for general convex domains, using only a membership oracle. Such methods yield optimal $O(\sqrt{T})$ regret, with only $\widetilde{O}(T)$ membership calls, and are computationally favorable in high-dimensional settings where projections or linear optimization are expensive.
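To make the linear-optimization oracle concrete, the sketch below shows a single Frank-Wolfe step on the probability simplex, where the oracle reduces to picking the coordinate with the smallest gradient entry. It is only an illustration of the oracle and the convex-combination update on a fixed quadratic objective, not an online Frank-Wolfe algorithm with a regret guarantee; the function names, the target point, and the $2/(t+2)$ step schedule are assumptions for the example.

```python
from typing import List, Sequence


def simplex_linear_oracle(gradient: Sequence[float]) -> List[float]:
    """argmin over the simplex of <gradient, v>: the vertex at the smallest gradient entry."""
    best = min(range(len(gradient)), key=lambda i: gradient[i])
    return [1.0 if i == best else 0.0 for i in range(len(gradient))]


def frank_wolfe_step(x: Sequence[float], gradient: Sequence[float], step: float) -> List[float]:
    """Move toward the oracle's vertex; a convex combination stays feasible without projecting."""
    v = simplex_linear_oracle(gradient)
    return [(1.0 - step) * xi + step * vi for xi, vi in zip(x, v)]


# Illustrative run on f(x) = 0.5 * ||x - target||^2 over the 3-dimensional simplex.
target = [0.0, 1.0, 0.0]
x = [1.0 / 3.0] * 3
for t in range(1, 21):
    gradient = [xi - ti for xi, ti in zip(x, target)]
    x = frank_wolfe_step(x, gradient, step=2.0 / (t + 2.0))
```

Every iterate is a convex combination of simplex vertices, so feasibility is maintained using one oracle call per step and no projection.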

6. Practical Applications and Specialized Models

OCO frameworks underpin online learning, control, network resource allocation, and adaptive signal processing:

  • Resource allocation under non-stationarity benefits from discounted/forgetting-factor OCO, trading off static and dynamic regret in environments with varying temporal smoothness (Yuan, 2020).
  • Predictive OCO integrates forecasts (e.g., estimated future gradients), leading to strictly improved dynamic regret without additional environmental assumptions (Lesage-Landry et al., 2019).
  • Coordinate descent variants (Lin et al., 2022) extend OCO to high-dimensional scenarios where only partial, block-wise updates are computationally feasible, still guaranteeing optimal regret rates and efficient scaling.

7. Open Directions and Frontiers

Active research directions in OCO include:

  • Tightening regret bounds in contaminated, partially informative, or hybrid adversarial/stochastic environments.
  • Minimizing oracle complexity (projections, membership, linear oracles) in nontrivial geometries and high-dimensional domains.
  • Advancing dynamic regret analysis for path-dependent or memory-dependent losses, extending the theory of predictor-adaptive and universal methods.
  • Decentralized, federated, and privacy-preserving OCO in distributed networks with communication constraints.
  • Quantum and bandit-feedback OCO bridging the gap between information-theoretic and computational lower bounds.

These developments continue to reinforce OCO as the central lens for designing and analyzing algorithms in adversarial and adaptive decision-making systems across machine learning, control, statistics, and operations research.
