
Adaptive Conditional Gradient Sliding (AdCGS)

Updated 29 January 2026
  • AdCGS is a projection-free, adaptive optimization method that uses an accelerated outer loop coupled with an inexact Frank–Wolfe inner subroutine.
  • It adaptively calibrates step sizes and local smoothness parameters using curvature estimates, eliminating the need for line search and global parameter tuning.
  • The method achieves optimal convergence rates for both smooth and strongly convex objectives while significantly reducing gradient and LMO call complexities.

The Adaptive Conditional Gradient Sliding (AdCGS) method is a projection-free, first-order optimization algorithm designed for constrained convex minimization. AdCGS achieves optimal accelerated convergence rates matching projection-based accelerated methods but is fully projection-free, relies solely on access to a linear minimization oracle (LMO), and is free of both line search and global parameter tuning. Leveraging an outer accelerated framework and an inner inexact Frank–Wolfe (conditional gradient) subroutine, AdCGS adaptively calibrates stepsizes and local smoothness parameters using local curvature estimates. The method demonstrates notable reductions in gradient and LMO call complexity compared to both classical and adaptive conditional gradient methods, as well as projection-based methods, particularly in high-dimensional and strongly convex settings (Takahashi, 28 Jan 2026).

1. Problem Framework and Oracle Structure

AdCGS addresses convex minimization problems of the form

$$\min_{x \in P} f(x)$$

where $f: \mathbb{R}^n \to \mathbb{R}$ is convex and continuously differentiable, and $P \subset \mathbb{R}^n$ is a compact convex set. The feasible set is assumed to have bounded diameter $D = \max_{x, y \in P} \|x - y\|$.

Crucially, the model assumes availability of:

  • A first-order (gradient) oracle producing $\nabla f(x)$ for $x \in P$.
  • An LMO that, for any vector $c \in \mathbb{R}^n$, solves $\min_{u \in P} \langle c, u \rangle$.

The objective function $f$ may be $L$-smooth ($\|\nabla f(x) - \nabla f(y)\| \leq L\|x - y\|$), and optionally $\mu$-strongly convex, although the algorithm requires no explicit knowledge of $L$ or $\mu$. Equivalently, smoothness can be restated as $D_f(x, y) := f(x) - f(y) - \langle \nabla f(y), x - y \rangle \leq (L/2)\|x - y\|^2$.
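As a concrete illustration of the oracle model, the probability simplex admits a closed-form LMO: a linear function over the simplex is minimized at a vertex. A minimal sketch (the choice of $P$ here is an illustrative example, not specific to the paper):

```python
import numpy as np

def lmo_simplex(c):
    """LMO over the probability simplex: argmin_{u in simplex} <c, u>.

    A linear function on the simplex attains its minimum at a vertex,
    so the oracle returns the standard basis vector e_i with i = argmin_i c_i.
    """
    u = np.zeros_like(c, dtype=float)
    u[np.argmin(c)] = 1.0
    return u
```

Closed-form LMOs of this kind (simplex, $\ell_1$-ball, nuclear-norm ball) are what make projection-free methods attractive when projection onto $P$ is expensive.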

2. Algorithmic Structure and Adaptive Step Selection

AdCGS follows the conditional gradient sliding architecture but eliminates projection and global linesearch. It integrates an accelerated outer loop with an inner conditional gradient routine applied to strongly convex surrogate subproblems.

Outer (accelerated) loop:

At each iteration $k$, AdCGS maintains iterates $x_{k-1}, y_{k-1}$ and solves the proximal subproblem
$$\min_{z \in P} \langle \nabla f(x_{k-1}), z \rangle + \frac{1}{2\eta_k}\|z - y_{k-1}\|^2$$
using an inexact Frank–Wolfe algorithm (see below) up to gap $\delta_k$. The update rules for the outer iterates are:

  • $y_k = (1 - \beta_k) y_{k-1} + \beta_k z_k$ (typically $\beta_1 = 0$ and $\beta_k = \beta \in (0, 1)$ for $k \geq 2$),
  • $x_k = \frac{\tau_k}{1 + \tau_k} x_{k-1} + \frac{1}{1 + \tau_k} z_k$, with polynomially growing $\tau_k$.
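The two outer updates can be sketched directly; variable names here are mine, and `z_k` stands for the inexact solution of the proximal subproblem:

```python
import numpy as np

def outer_update(x_prev, y_prev, z_k, beta_k, tau_k):
    """One outer AdCGS iterate update (sketch of the rules above).

    y_k is a convex combination of y_{k-1} and the subproblem solution z_k;
    x_k averages x_{k-1} and z_k with weights tau_k/(1+tau_k) and 1/(1+tau_k).
    """
    y_k = (1.0 - beta_k) * y_prev + beta_k * z_k
    x_k = (tau_k / (1.0 + tau_k)) * x_prev + (1.0 / (1.0 + tau_k)) * z_k
    return x_k, y_k
```

Both updates are convex combinations, so feasibility of $x_k, y_k$ follows from feasibility of $z_k$ and the previous iterates.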

Inner Frank–Wolfe (conditional gradient) subroutine:

To solve the surrogate
$$\min_{z \in P} \langle \nabla f(x_{k-1}), z \rangle + \frac{1}{2\eta_k}\|z - y_{k-1}\|^2,$$
the Frank–Wolfe update is iterated until the Wolfe gap satisfies

$$G_k(z_t) := \max_{v \in P} \left\langle \nabla f(x_{k-1}) + (z_t - y_{k-1})/\eta_k,\ z_t - v \right\rangle \leq \delta_k,$$

with at most $T_k = \lceil 6D^2/(\eta_k \delta_k) \rceil$ LMO calls per outer iteration.
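A sketch of the inner routine, assuming a generic `lmo` callable and the classic $2/(t+2)$ Frank–Wolfe step (the paper's exact inner step rule may differ):

```python
import numpy as np

def inexact_fw(grad_f_x, y_prev, eta, delta, lmo, z0, T_max):
    """Frank-Wolfe on the quadratic surrogate
        phi(z) = <grad_f_x, z> + ||z - y_prev||^2 / (2*eta),
    stopped once the Wolfe gap G_k(z_t) <= delta (or after T_max LMO calls).
    `lmo` is assumed to return argmin_{v in P} <c, v>.
    """
    z = z0.copy()
    for t in range(T_max):
        g = grad_f_x + (z - y_prev) / eta   # gradient of the surrogate at z
        v = lmo(g)                          # one LMO call
        gap = g @ (z - v)                   # Wolfe gap G_k(z_t)
        if gap <= delta:
            break
        gamma = 2.0 / (t + 2.0)             # standard FW step size
        z = z + gamma * (v - z)             # convex combination stays in P
    return z
```

Since every update is a convex combination of feasible points, the iterates never leave $P$, which is the core appeal of the projection-free inner loop.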

Adaptive stepsize and local smoothness estimation:

  • For $k = 1$, set $L_1 = \|\nabla f(x_1) - \nabla f(x_0)\| / \|x_1 - x_0\|$ and choose $\eta_2 \leq \min\{(1 - \beta)\eta_1,\ 1/(4L_1)\}$.
  • For $k \geq 2$, estimate the local smoothness as

$$L_k = \begin{cases} 0 & \text{if } D_f(x_{k-1}, x_k) = 0, \\[4pt] \dfrac{\|\nabla f(x_k) - \nabla f(x_{k-1})\|^2}{2 D_f(x_{k-1}, x_k)} & \text{otherwise}. \end{cases}$$

Then $\eta_{k+1}$ is chosen as

$$\eta_{k+1} \leq \min\left\{2(1-\beta)^2\eta_k,\ \frac{\tau_{k-1}+1}{\tau_k \eta_k},\ \frac{\tau_k}{4L_k}\right\},$$

ensuring the technical conditions (5)–(7) in (Takahashi, 28 Jan 2026).
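The curvature estimate $L_k$ requires no oracle calls beyond the gradients the algorithm already evaluates. A sketch (function names are mine), using the Bregman divergence $D_f(x_{k-1}, x_k)$ defined in Section 1:

```python
import numpy as np

def local_smoothness(f, grad_f, x_prev, x_curr):
    """Curvature-based estimate L_k used in place of line search.

    D_f(x_{k-1}, x_k) = f(x_{k-1}) - f(x_k) - <grad f(x_k), x_{k-1} - x_k>
    is the Bregman divergence of f; when it vanishes the estimate is 0,
    otherwise L_k = ||grad f(x_k) - grad f(x_{k-1})||^2 / (2 D_f).
    """
    g_curr, g_prev = grad_f(x_curr), grad_f(x_prev)
    bregman = f(x_prev) - f(x_curr) - g_curr @ (x_prev - x_curr)
    if bregman == 0.0:
        return 0.0
    diff = g_curr - g_prev
    return (diff @ diff) / (2.0 * bregman)
```

For a quadratic $f(x) = \frac{L}{2}\|x\|^2$ the estimate recovers $L$ exactly; for general smooth $f$ it tracks the curvature actually seen along the trajectory, which is at most the global $L$.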

No line search is invoked after the first iteration. This scheme contrasts with earlier variants (CGS-ls, UCGS), which maintain a line-search or backtracking component (Ouyang et al., 2021, Nazari et al., 2020).

3. Theoretical Guarantees and Complexity Bounds

The convergence analysis establishes the following complexity properties (Takahashi, 28 Jan 2026):

Convex objective ($L$-smooth):

  • Primal gap rate: for suitably chosen parameters and all $k$,

$$f(x_k) - f(x^*) \leq \frac{12\tilde{L}_k}{k(k+1)} \mathscr{R}$$

where $\tilde{L}_k = \max\{1/[4(1-\beta)\eta_1],\ \max_{i\leq k} L_i\}$ and $\mathscr{R}$ collects initial distance terms.

  • Oracle complexity (FOO calls): reaching $f(x_N) - f(x^*) \leq \epsilon$ requires

$$N = O\left(\sqrt{\frac{\tilde{L}_N \|z_0 - x^*\|^2}{\epsilon}}\right)$$

gradient (FOO) calls.

  • LMO complexity: each outer iteration requires $T_k = O(\tilde{L}_k)$ LMO calls (up to parameters), so the overall number of LMO calls also scales as $O(N)$.

Strongly convex objective ($\mu$-strongly convex):

  • Restarting mechanism: restart AdCGS every fixed $N$ iterations.
  • After $s$ stages,

$$f(w_s) - f(x^*) \leq \varphi_0 / 2^s$$

  • To reach accuracy $\epsilon$:
    • Total FOO calls: $O(\sqrt{L/\mu}\,\log(1/\epsilon))$
    • Total LMO calls: $O(\mu L D^2/\epsilon + \sqrt{L/\mu}\,\log(1/\epsilon))$
  • No geometric assumptions on $P$: linear convergence does not require $P$ to be a polytope or strongly convex.
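The restart mechanism is a thin wrapper around the convex-case algorithm; a sketch, where `adcgs_run` is a hypothetical interface standing for one fixed-length AdCGS stage:

```python
def adcgs_restarted(adcgs_run, w0, num_stages, N):
    """Restart scheme for the mu-strongly convex case (sketch).

    `adcgs_run(w, N)` is assumed to execute N AdCGS iterations from warm
    start w and return the final iterate. Strong convexity guarantees each
    stage halves the primal gap, so f(w_s) - f(x*) <= phi_0 / 2**s.
    """
    w = w0
    for _ in range(num_stages):
        w = adcgs_run(w, N)   # one fixed-length stage, warm-started
    return w
```

Because each stage halves the gap, $\lceil \log_2(\varphi_0/\epsilon) \rceil$ stages suffice for accuracy $\epsilon$, which is where the $\log(1/\epsilon)$ factors in the complexity bounds come from.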

Summary table of oracle costs:

| Setting | FOO calls | LMO calls |
| --- | --- | --- |
| Convex, $\epsilon$-accuracy | $O(\sqrt{L/\epsilon})$ | $O(\sqrt{L/\epsilon})$ |
| Strongly convex, $\epsilon$-accuracy | $O(\sqrt{L/\mu}\log(1/\epsilon))$ | $O(\mu L D^2/\epsilon + \sqrt{L/\mu}\log(1/\epsilon))$ |

For all settings, no projection oracle and no global parameter tuning are required, marking a distinct advantage over projection-based and even previous projection-free accelerated methods (Takahashi, 28 Jan 2026, Nazari et al., 2020, Ouyang et al., 2021).

4. Relation to Previous Sliding Methods

AdCGS extends and improves upon the conditional gradient sliding (CGS) method of Lan & Zhou (2016) and its line-search variants (Nazari et al., 2020). The classical CGS algorithm requires knowing the global Lipschitz constant and a pre-specified number of iterations. CGS-ls (Nazari et al., 2020) addresses this by introducing a backtracking linesearch to estimate the local LL adaptively, but still incurs line search overhead and requires careful tuning of stopping criteria.

Universal Conditional Gradient Sliding (UCGS) (Ouyang et al., 2021) generalized the approach further by allowing Hölder-smooth objectives ($\nu \in (0,1]$), adaptively estimating both the smoothness constant and exponent via backtracking, and providing universal complexity bounds in both gradient and LMO calls. However, UCGS runs a doubling/bisection line search at every iteration.

AdCGS (Takahashi, 28 Jan 2026) achieves matching theoretical rates and adaptivity without any explicit line search beyond the first iteration, employing a local curvature estimate based on observed progress, leading to more efficient practical performance and reduced tuning overhead.

5. Practical Implementation and Empirical Evaluation

AdCGS has been empirically validated across multiple problem families:

  • Least-squares on the simplex: on synthetic datasets with $A \in \mathbb{R}^{m \times n}$, AdCGS approximately halves the number of first-order oracle (FOO) calls and reduces LMO calls and runtime compared to CGS and line-search Frank–Wolfe.
  • $\ell_p$-regression for $p < 2$: AdCGS matches or outperforms projection-based accelerated methods in both FOO calls and runtime, whereas the projection-based methods must solve expensive projection subproblems.
  • Sparse logistic regression: in high-dimensional regimes ($n$ up to $5000$, $m$ up to $5 \times 10^4$), AdCGS reduces FOO calls, LMO calls, and wall-clock time by 20–50% compared to CGS, AC-CGM, and accelerated-gradient (projection-based) baselines.

In all cases, AdCGS achieves the theoretical $O(1/k^2)$ decrease in the primal gap without parameter tuning, projection, or line search, and robustly handles objectives lacking global Lipschitz smoothness (Takahashi, 28 Jan 2026).

6. Related Methods and Extensions

AdCGS is closely related to:

  • CGS-ls (Nazari et al., 2020), which adapts $L$ via repeated doubling and line search at each iteration, at the cost of line-search overhead.
  • UCGS (Ouyang et al., 2021), which further generalizes to Hölder-smooth objectives via line search/backtracking on both the smoothness constant and exponent.
  • Fixed-iteration sliding variants, which pre-specify the number of outer iterations and regulate subproblem inexactness schedule accordingly.

A notable extension reduces LMO call complexity by warm-starting the Frank–Wolfe subproblems or adapting the accuracy schedule based on observed progress (Ouyang et al., 2021). In UCGS, adaptive backtracking on the “smoothness exponent” $\nu$ is suggested to further improve adaptation in the weakly smooth regime, though this is not required in AdCGS (Ouyang et al., 2021).

AdCGS may be further tailored to specialized constraint sets via problem-specific LMO implementations, especially in large-scale or combinatorial settings.

7. Context and Impact

AdCGS represents the current optimal-in-theory projection-free acceleration paradigm for smooth and strongly convex optimization. Its adaptivity, elimination of projections and line search, and empirical superiority in several application domains distinguish it from both classical Frank–Wolfe variants and the majority of other projection-free accelerated methods. Its ability to achieve linear convergence for strongly convex objectives without additional geometric constraints on the feasible set further extends its range of practical applicability (Takahashi, 28 Jan 2026). The method is especially relevant in settings where projection is computationally prohibitive, such as high-dimensional sparse learning, semidefinite optimization, and combinatorial polytopes.
