
Adaptive Conditional Gradient Sliding (AdCGS)

Updated 29 January 2026
  • AdCGS is a projection-free, adaptive optimization method that uses an accelerated outer loop coupled with an inexact Frank–Wolfe inner subroutine.
  • It adaptively calibrates step sizes and local smoothness parameters using curvature estimates, eliminating the need for line search and global parameter tuning.
  • The method achieves optimal convergence rates for both smooth and strongly convex objectives while significantly reducing gradient and LMO call complexities.

The Adaptive Conditional Gradient Sliding (AdCGS) method is a projection-free, first-order optimization algorithm designed for constrained convex minimization. AdCGS achieves optimal accelerated convergence rates matching projection-based accelerated methods but is fully projection-free, relies solely on access to a linear minimization oracle (LMO), and is free of both line search and global parameter tuning. Leveraging an outer accelerated framework and an inner inexact Frank–Wolfe (conditional gradient) subroutine, AdCGS adaptively calibrates stepsizes and local smoothness parameters using local curvature estimates. The method demonstrates notable reductions in gradient and LMO call complexity compared to both classical and adaptive conditional gradient methods, as well as projection-based methods, particularly in high-dimensional and strongly convex settings (Takahashi, 28 Jan 2026).

1. Problem Framework and Oracle Structure

AdCGS addresses convex minimization problems of the form

$$\min_{x \in P} f(x)$$

where $f: \mathbb{R}^n \to \mathbb{R}$ is convex and continuously differentiable, and $P \subset \mathbb{R}^n$ is a compact convex set. The feasible set is assumed to have bounded diameter $D = \max_{x, y \in P} \|x - y\|$.

Crucially, the model assumes availability of:

  • A first-order (gradient) oracle producing $\nabla f(x)$ for $x \in P$.
  • An LMO that, for any vector $c \in \mathbb{R}^n$, solves $\min_{u \in P} \langle c, u \rangle$.

The objective function $f$ may be $L$-smooth ($\|\nabla f(x) - \nabla f(y)\| \leq L\|x - y\|$), and optionally $\mu$-strongly convex, although the algorithm requires no explicit knowledge of $L$ or $\mu$. Equivalently, smoothness can be restated as $D_f(x, y) := f(x) - f(y) - \langle \nabla f(y), x - y \rangle \leq (L/2)\|x - y\|^2$.
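As a concrete illustration of the oracle model, the probability simplex admits a closed-form LMO: a linear function over the simplex is minimized at a vertex. A minimal sketch (the choice of $P$ here is an illustrative example, not specific to the paper):

```python
import numpy as np

def lmo_simplex(c):
    """LMO over the probability simplex: argmin_{u in simplex} <c, u>.

    A linear function on the simplex attains its minimum at a vertex,
    so the oracle returns the standard basis vector e_i with i = argmin_i c_i.
    """
    u = np.zeros_like(c, dtype=float)
    u[np.argmin(c)] = 1.0
    return u
```

Closed-form LMOs of this kind (simplex, $\ell_1$-ball, nuclear-norm ball) are what make projection-free methods attractive when projection onto $P$ is expensive.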

2. Algorithmic Structure and Adaptive Step Selection

AdCGS follows the conditional gradient sliding architecture but eliminates projection and global linesearch. It integrates an accelerated outer loop with an inner conditional gradient routine applied to strongly convex surrogate subproblems.

Outer (accelerated) loop:

At each iteration $k$, AdCGS maintains iterates $x_{k-1}, y_{k-1}$ and solves the proximal subproblem
$$\min_{z \in P} \langle \nabla f(x_{k-1}), z \rangle + \frac{1}{2\eta_k}\|z - y_{k-1}\|^2$$
using an inexact Frank–Wolfe algorithm (see below) up to gap $\delta_k$. The update rules for the outer iterates are:

  • $y_k = (1 - \beta_k) y_{k-1} + \beta_k z_k$ (typically $\beta_1 = 0$ and $\beta_k = \beta \in (0, 1)$ for $k \geq 2$),
  • $x_k = \frac{\tau_k}{1 + \tau_k} x_{k-1} + \frac{1}{1 + \tau_k} z_k$, with polynomially growing $\tau_k$.
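The two outer updates can be sketched directly; variable names here are mine, and `z_k` stands for the inexact solution of the proximal subproblem:

```python
import numpy as np

def outer_update(x_prev, y_prev, z_k, beta_k, tau_k):
    """One outer AdCGS iterate update (sketch of the rules above).

    y_k is a convex combination of y_{k-1} and the subproblem solution z_k;
    x_k averages x_{k-1} and z_k with weights tau_k/(1+tau_k) and 1/(1+tau_k).
    """
    y_k = (1.0 - beta_k) * y_prev + beta_k * z_k
    x_k = (tau_k / (1.0 + tau_k)) * x_prev + (1.0 / (1.0 + tau_k)) * z_k
    return x_k, y_k
```

Both updates are convex combinations, so feasibility of $x_k, y_k$ follows from feasibility of $z_k$ and the previous iterates.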

Inner Frank–Wolfe (conditional gradient) subroutine:

To solve the surrogate
$$\min_{z \in P} \langle \nabla f(x_{k-1}), z \rangle + \frac{1}{2\eta_k}\|z - y_{k-1}\|^2,$$
the Frank–Wolfe update is iterated until the Wolfe gap satisfies

$$G_k(z_t) := \max_{v \in P} \left\langle \nabla f(x_{k-1}) + (z_t - y_{k-1})/\eta_k,\ z_t - v \right\rangle \leq \delta_k,$$

with at most $T_k = \lceil 6D^2/(\eta_k \delta_k) \rceil$ LMO calls per outer iteration.
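A sketch of the inner routine, assuming a generic `lmo` callable and the classic $2/(t+2)$ Frank–Wolfe step (the paper's exact inner step rule may differ):

```python
import numpy as np

def inexact_fw(grad_f_x, y_prev, eta, delta, lmo, z0, T_max):
    """Frank-Wolfe on the quadratic surrogate
        phi(z) = <grad_f_x, z> + ||z - y_prev||^2 / (2*eta),
    stopped once the Wolfe gap G_k(z_t) <= delta (or after T_max LMO calls).
    `lmo` is assumed to return argmin_{v in P} <c, v>.
    """
    z = z0.copy()
    for t in range(T_max):
        g = grad_f_x + (z - y_prev) / eta   # gradient of the surrogate at z
        v = lmo(g)                          # one LMO call
        gap = g @ (z - v)                   # Wolfe gap G_k(z_t)
        if gap <= delta:
            break
        gamma = 2.0 / (t + 2.0)             # standard FW step size
        z = z + gamma * (v - z)             # convex combination stays in P
    return z
```

Since every update is a convex combination of feasible points, the iterates never leave $P$, which is the core appeal of the projection-free inner loop.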

Adaptive stepsize and local smoothness estimation:

  • For $k = 1$, set $L_1 = \|\nabla f(x_1) - \nabla f(x_0)\| / \|x_1 - x_0\|$ and choose $\eta_2 \leq \min\{(1 - \beta)\eta_1,\ 1/(4L_1)\}$.
  • For $k \geq 2$, estimate the local smoothness as

$$L_k = \begin{cases} 0 & \text{if } D_f(x_{k-1}, x_k) = 0, \\[4pt] \dfrac{\|\nabla f(x_k) - \nabla f(x_{k-1})\|^2}{2 D_f(x_{k-1}, x_k)} & \text{otherwise}. \end{cases}$$

Then $\eta_{k+1}$ is chosen as

$$\eta_{k+1} \leq \min\left\{2(1-\beta)^2\eta_k,\ \frac{\tau_{k-1}+1}{\tau_k \eta_k},\ \frac{\tau_k}{4L_k}\right\},$$

ensuring the technical conditions (5)–(7) in (Takahashi, 28 Jan 2026).
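The curvature estimate $L_k$ requires no oracle calls beyond the gradients the algorithm already evaluates. A sketch (function names are mine), using the Bregman divergence $D_f(x_{k-1}, x_k)$ defined in Section 1:

```python
import numpy as np

def local_smoothness(f, grad_f, x_prev, x_curr):
    """Curvature-based estimate L_k used in place of line search.

    D_f(x_{k-1}, x_k) = f(x_{k-1}) - f(x_k) - <grad f(x_k), x_{k-1} - x_k>
    is the Bregman divergence of f; when it vanishes the estimate is 0,
    otherwise L_k = ||grad f(x_k) - grad f(x_{k-1})||^2 / (2 D_f).
    """
    g_curr, g_prev = grad_f(x_curr), grad_f(x_prev)
    bregman = f(x_prev) - f(x_curr) - g_curr @ (x_prev - x_curr)
    if bregman == 0.0:
        return 0.0
    diff = g_curr - g_prev
    return (diff @ diff) / (2.0 * bregman)
```

For a quadratic $f(x) = \frac{L}{2}\|x\|^2$ the estimate recovers $L$ exactly; for general smooth $f$ it tracks the curvature actually seen along the trajectory, which is at most the global $L$.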

No line search is invoked after the first iteration. This scheme contrasts with earlier variants (CGS-ls, UCGS), which maintain a line-search or backtracking component (Ouyang et al., 2021, Nazari et al., 2020).

3. Theoretical Guarantees and Complexity Bounds

The convergence analysis establishes the following complexity properties (Takahashi, 28 Jan 2026):

Convex objective ($L$-smooth):

  • Primal gap rate: for suitably chosen parameters and all $k$,

$$f(x_k) - f(x^*) \leq \frac{12\tilde{L}_k}{k(k+1)} \mathscr{R}$$

where $\tilde{L}_k = \max\{1/[4(1-\beta)\eta_1],\ \max_{i\leq k} L_i\}$ and $\mathscr{R}$ collects initial distance terms.

  • Oracle complexity (FOO calls): reaching $f(x_N) - f(x^*) \leq \epsilon$ requires

$$N = O\left(\sqrt{\frac{\tilde{L}_N \|z_0 - x^*\|^2}{\epsilon}}\right)$$

gradient (FOO) calls.

  • LMO complexity: each outer iteration requires $T_k = O(\tilde{L}_k)$ LMO calls (up to parameters), so the overall number of LMO calls also scales as $O(N)$.

Strongly convex objective ($\mu$-strongly convex):

  • Restarting mechanism: restart AdCGS every fixed $N$ iterations.
  • After $s$ stages,

$$f(w_s) - f(x^*) \leq \varphi_0 / 2^s$$

  • To reach accuracy $\epsilon$:
    • Total FOO calls: $O(\sqrt{L/\mu}\,\log(1/\epsilon))$
    • Total LMO calls: $O(\mu L D^2/\epsilon + \sqrt{L/\mu}\,\log(1/\epsilon))$
  • No geometric assumptions on $P$: linear convergence does not require $P$ to be a polytope or strongly convex.
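The restart mechanism is a thin wrapper around the convex-case algorithm; a sketch, where `adcgs_run` is a hypothetical interface standing for one fixed-length AdCGS stage:

```python
def adcgs_restarted(adcgs_run, w0, num_stages, N):
    """Restart scheme for the mu-strongly convex case (sketch).

    `adcgs_run(w, N)` is assumed to execute N AdCGS iterations from warm
    start w and return the final iterate. Strong convexity guarantees each
    stage halves the primal gap, so f(w_s) - f(x*) <= phi_0 / 2**s.
    """
    w = w0
    for _ in range(num_stages):
        w = adcgs_run(w, N)   # one fixed-length stage, warm-started
    return w
```

Because each stage halves the gap, $\lceil \log_2(\varphi_0/\epsilon) \rceil$ stages suffice for accuracy $\epsilon$, which is where the $\log(1/\epsilon)$ factors in the complexity bounds come from.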

Summary table of oracle costs:

| Setting | FOO calls | LMO calls |
| --- | --- | --- |
| Convex, $\epsilon$-accuracy | $O(\sqrt{L/\epsilon})$ | $O(\sqrt{L/\epsilon})$ |
| Strongly convex, $\epsilon$-accuracy | $O(\sqrt{L/\mu}\log(1/\epsilon))$ | $O(\mu L D^2/\epsilon + \sqrt{L/\mu}\log(1/\epsilon))$ |

For all settings, no projection oracle and no global parameter tuning are required, marking a distinct advantage over projection-based and even previous projection-free accelerated methods (Takahashi, 28 Jan 2026, Nazari et al., 2020, Ouyang et al., 2021).

4. Relation to Previous Sliding Methods

AdCGS extends and improves upon the conditional gradient sliding (CGS) method of Lan & Zhou (2016) and its line-search variants (Nazari et al., 2020). The classical CGS algorithm requires knowing the global Lipschitz constant and a pre-specified number of iterations. CGS-ls (Nazari et al., 2020) addresses this by introducing a backtracking linesearch to estimate the local LL adaptively, but still incurs line search overhead and requires careful tuning of stopping criteria.

Universal Conditional Gradient Sliding (UCGS) (Ouyang et al., 2021) generalized the approach further by allowing Hölder-smooth objectives ($\nu \in (0,1]$), adaptively estimating both the smoothness constant and exponent via backtracking, and providing universal complexity bounds in both gradient and LMO calls. However, UCGS runs a doubling/bisection line search at every iteration.

AdCGS (Takahashi, 28 Jan 2026) achieves matching theoretical rates and adaptivity without any explicit line search beyond the first iteration, employing a local curvature estimate based on observed progress, leading to more efficient practical performance and reduced tuning overhead.

5. Practical Implementation and Empirical Evaluation

AdCGS has been empirically validated across multiple problem families:

  • Least-squares on the simplex: on synthetic datasets with $A \in \mathbb{R}^{m \times n}$, AdCGS approximately halves the number of first-order oracle (FOO) calls and reduces LMO calls and runtime compared to CGS and line-search Frank–Wolfe.
  • $\ell_p$-regression for $p < 2$: AdCGS matches or outperforms projection-based accelerated methods in both FOO calls and runtime, whereas the projection-based methods must solve expensive projection subproblems.
  • Sparse logistic regression: in high-dimensional regimes ($n$ up to $5000$, $m$ up to $5 \times 10^4$), AdCGS reduces FOO calls, LMO calls, and wall-clock time by 20–50% compared to CGS, AC-CGM, and accelerated-gradient (projection-based) baselines.

In all cases, AdCGS achieves the theoretical $O(1/k^2)$ decrease in the primal gap without parameter tuning, projection, or line search, and robustly handles objectives lacking global Lipschitz smoothness (Takahashi, 28 Jan 2026).

6. Related Methods and Extensions

AdCGS is closely related to:

  • CGS-ls (Nazari et al., 2020), which adapts $L$ via repeated doubling and line search at each iteration, at the cost of line-search overhead.
  • UCGS (Ouyang et al., 2021), which further generalizes to Hölder-smooth objectives via line search/backtracking on both the smoothness constant and exponent.
  • Fixed-iteration sliding variants, which pre-specify the number of outer iterations and regulate subproblem inexactness schedule accordingly.

A notable extension reduces LMO call complexity by warm-starting the Frank–Wolfe subproblems or adapting the accuracy schedule based on observed progress (Ouyang et al., 2021). In UCGS, adaptive backtracking on the “smoothness exponent” $\nu$ is suggested to further improve adaptation in the weakly smooth regime, though this is not required in AdCGS (Ouyang et al., 2021).

AdCGS may be further tailored to specialized constraint sets via problem-specific LMO implementations, especially in large-scale or combinatorial settings.

7. Context and Impact

AdCGS represents the current optimal-in-theory projection-free acceleration paradigm for smooth and strongly convex optimization. Its adaptivity, elimination of projections and line search, and empirical superiority in several application domains distinguish it from both classical Frank–Wolfe variants and the majority of other projection-free accelerated methods. Its ability to achieve linear convergence for strongly convex objectives without additional geometric constraints on the feasible set further extends its range of practical applicability (Takahashi, 28 Jan 2026). The method is especially relevant in settings where projection is computationally prohibitive, such as high-dimensional sparse learning, semidefinite optimization, and combinatorial polytopes.
