Adaptive Conditional Gradient Sliding (AdCGS)
- AdCGS is a projection-free, adaptive optimization method that uses an accelerated outer loop coupled with an inexact Frank–Wolfe inner subroutine.
- It adaptively calibrates step sizes and local smoothness parameters using curvature estimates, eliminating the need for line search and global parameter tuning.
- The method achieves optimal convergence rates for both smooth and strongly convex objectives while significantly reducing gradient and LMO call complexities.
The Adaptive Conditional Gradient Sliding (AdCGS) method is a projection-free, first-order optimization algorithm designed for constrained convex minimization. AdCGS achieves optimal accelerated convergence rates matching projection-based accelerated methods but is fully projection-free, relies solely on access to a linear minimization oracle (LMO), and is free of both line search and global parameter tuning. Leveraging an outer accelerated framework and an inner inexact Frank–Wolfe (conditional gradient) subroutine, AdCGS adaptively calibrates stepsizes and local smoothness parameters using local curvature estimates. The method demonstrates notable reductions in gradient and LMO call complexity compared to both classical and adaptive conditional gradient methods, as well as projection-based methods, particularly in high-dimensional and strongly convex settings (Takahashi, 28 Jan 2026).
1. Problem Framework and Oracle Structure
AdCGS addresses convex minimization problems of the form
$$\min_{x \in \mathcal{X}} f(x),$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is convex and continuously differentiable, and $\mathcal{X} \subset \mathbb{R}^n$ is a compact convex set. The feasible set is assumed to have bounded diameter $D_{\mathcal{X}} := \max_{x, y \in \mathcal{X}} \|x - y\|$.
Crucially, the model assumes availability of:
- A first-order (gradient) oracle producing $\nabla f(x)$ for any $x \in \mathcal{X}$.
- An LMO that, for any vector $g \in \mathbb{R}^n$, solves $\mathrm{LMO}(g) \in \arg\min_{x \in \mathcal{X}} \langle g, x \rangle$.
The objective function may be $L$-smooth ($\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y \in \mathcal{X}$), and optionally $\mu$-strongly convex, although no explicit knowledge of $L$ or $\mu$ is required by the algorithm. Equivalently, smoothness can be restated as $f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|^2$.
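For concreteness, the LMO has a closed form on common feasible sets. The following minimal sketch covers the probability simplex and the $\ell_1$ ball; these are standard constructions, not specific to the paper:

```python
import numpy as np

def lmo_simplex(g):
    """LMO over the probability simplex: argmin_{x in simplex} <g, x>.

    The minimum of a linear function over the simplex is attained at a
    vertex, namely e_i with i = argmin_i g_i."""
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

def lmo_l1_ball(g, radius=1.0):
    """LMO over the l1 ball {x : ||x||_1 <= radius}.

    The minimizer is -radius * sign(g_i) * e_i at the coordinate of
    largest magnitude."""
    i = np.argmax(np.abs(g))
    v = np.zeros_like(g)
    v[i] = -radius * np.sign(g[i])
    return v
```

Each call costs one pass over the gradient, which is what makes projection-free methods attractive on sets where projection is expensive.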
2. Algorithmic Structure and Adaptive Step Selection
AdCGS follows the conditional gradient sliding architecture but eliminates projection and global linesearch. It integrates an accelerated outer loop with an inner conditional gradient routine applied to strongly convex surrogate subproblems.
Outer (accelerated) loop:
At each iteration $k$, AdCGS maintains iterates $x_k, y_k, z_k$, and solves a proximal subproblem
$$x_k \approx \operatorname*{arg\,min}_{x \in \mathcal{X}} \Big\{ \phi_k(x) := \langle \nabla f(z_k), x \rangle + \frac{\beta_k}{2} \|x - x_{k-1}\|^2 \Big\}$$
using an inexact Frank–Wolfe algorithm (see below) up to gap $\eta_k$. The update rules for the outer iterates are:
- $z_k = (1 - \gamma_k)\, y_{k-1} + \gamma_k\, x_{k-1}$ (typically $\gamma_k = O(1/k)$ for acceleration),
- $y_k = (1 - \gamma_k)\, y_{k-1} + \gamma_k\, x_k$, with polynomially growing averaging weights.
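The outer loop can be sketched in a few lines. The $\gamma_k = 3/(k+2)$ weighting and the exact-prox callable below are illustrative assumptions (one standard accelerated schedule), not the paper's precise parameter choices:

```python
import numpy as np

def adcgs_outer_sketch(grad_f, solve_prox, x0, K):
    """Sketch of the accelerated outer loop.

    grad_f(z)              -- first-order oracle.
    solve_prox(c, ctr, k)  -- hypothetical (in)exact solver for the proximal
                              subproblem min <c, x> + (beta_k/2)||x - ctr||^2,
                              e.g. the inner Frank-Wolfe routine.
    """
    x = x0.copy()
    y = x0.copy()
    for k in range(1, K + 1):
        gamma = 3.0 / (k + 2)              # assumed accelerated weight
        z = (1 - gamma) * y + gamma * x    # extrapolation point
        x = solve_prox(grad_f(z), x, k)    # (in)exact proximal step
        y = (1 - gamma) * y + gamma * x    # averaged output iterate
    return y
```

On an unconstrained toy quadratic with an exact prox solver, the loop reduces to an accelerated-gradient-style recursion.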
Inner Frank–Wolfe (conditional gradient) subroutine:
To solve the surrogate (the regularized linear model $\phi_k$ constructed by the outer loop), the Frank–Wolfe update is called until the Wolfe gap satisfies
$$\max_{v \in \mathcal{X}} \langle \nabla \phi_k(u_t),\, u_t - v \rangle \le \eta_k,$$
with at most $O(\beta_k D_{\mathcal{X}}^2 / \eta_k)$ LMO calls per outer iteration.
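A minimal sketch of the inner routine on the quadratic surrogate, with the Wolfe-gap stopping rule. The closed-form step size (exact for a quadratic with Hessian $\beta I$) is an illustrative choice; the paper's inner routine may use a different step rule:

```python
import numpy as np

def fw_subproblem(c, beta, x0, lmo, eta, max_iters=100000):
    """Inexact Frank-Wolfe on the strongly convex surrogate
        phi(u) = <c, u> + (beta/2) * ||u - x0||^2,
    stopped once the Wolfe gap max_v <grad phi(u), u - v> drops below eta.
    Assumes x0 is feasible (in CGS it is the previous outer iterate)."""
    u = x0.copy()
    gap = np.inf
    for _ in range(max_iters):
        grad = c + beta * (u - x0)      # gradient of the quadratic surrogate
        v = lmo(grad)                   # one LMO call per inner iteration
        gap = grad @ (u - v)            # Wolfe (Frank-Wolfe) gap
        if gap <= eta:
            break
        d = v - u
        # exact minimizing step for a quadratic with Hessian beta*I, clipped to [0, 1]
        alpha = min(1.0, gap / (beta * (d @ d)))
        u = u + alpha * d
    return u, gap
```

Because each update is a convex combination of feasible points, the iterates stay feasible without any projection.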
Adaptive stepsize and local smoothness estimation:
- For $k = 1$, estimate the smoothness once with a single line-search (backtracking) pass, and choose $\beta_1$ accordingly.
- For $k \ge 2$, estimate local smoothness as
$$L_k = \frac{\|\nabla f(z_k) - \nabla f(z_{k-1})\|}{\|z_k - z_{k-1}\|}.$$
Then $\beta_k$ is chosen as a function of $L_k$ (together with $\gamma_k$ and the inner tolerance $\eta_k$),
ensuring technical conditions (5)-(7) in (Takahashi, 28 Jan 2026).
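The curvature estimate amounts to a secant ratio of successive gradients; the exact safeguarding used in the paper may differ from the simple fallback sketched here:

```python
import numpy as np

def local_smoothness(grad_new, grad_old, z_new, z_old, L_prev, eps=1e-12):
    """Secant-based local smoothness estimate.

    The ratio ||g_k - g_{k-1}|| / ||z_k - z_{k-1}|| lower-bounds any valid
    Lipschitz constant of the gradient along the segment between the two
    points. Falls back to the previous estimate when the points (nearly)
    coincide, a hypothetical safeguard for numerical robustness."""
    dz = np.linalg.norm(z_new - z_old)
    if dz < eps:
        return L_prev
    return np.linalg.norm(grad_new - grad_old) / dz
```

For a quadratic $f(x) = \tfrac{c}{2}\|x\|^2$ the estimate recovers $c$ exactly, since the gradient is linear.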
No line search is invoked after the first iteration. This scheme contrasts with earlier variants (CGS-ls, UCGS) which maintain a linesearch or backtracking component (Ouyang et al., 2021, Nazari et al., 2020).
3. Theoretical Guarantees and Complexity Bounds
The convergence analysis establishes the following complexity properties (Takahashi, 28 Jan 2026):
Convex objective ($L$-smooth):
- Primal gap rate: For well-chosen parameters, for all $k \ge 1$,
$$f(y_k) - f^* \le \frac{C_0}{(k+1)^2},$$
where $f^* = \min_{x \in \mathcal{X}} f(x)$, and $C_0$ collects initial distance terms.
- Oracle complexity (FOO calls): To reach $f(y_k) - f^* \le \varepsilon$, AdCGS requires
$$O\Big(\sqrt{L D_{\mathcal{X}}^2 / \varepsilon}\Big)$$
gradient (FOO) calls.
- LMO complexity: Each outer iteration requires $O(k)$ LMO calls (up to parameters), so overall LMO calls scale as $O(L D_{\mathcal{X}}^2 / \varepsilon)$.
Strongly convex objective ($\mu$-strongly convex):
- Restarting mechanism: Restart AdCGS every fixed number of iterations.
- After $s$ restart stages, the primal gap contracts geometrically, $f(y^{(s)}) - f^* \le 2^{-s}\big(f(y^{(0)}) - f^*\big)$.
- To reach accuracy $\varepsilon$, $O(\log(1/\varepsilon))$ restart stages suffice.
- Total FOO calls: $O\big(\sqrt{L/\mu}\,\log(1/\varepsilon)\big)$.
- Total LMO calls: $O\big(L D_{\mathcal{X}}^2 / \varepsilon\big)$.
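Assuming each restart stage halves the primal gap (a common accounting for restarted accelerated schemes), the number of stages needed for a target accuracy is a one-line computation:

```python
import math

def restart_stage_count(gap0, eps):
    """Stages needed so that gap0 * 2^{-s} <= eps, i.e. s = ceil(log2(gap0/eps)).

    A sketch of the geometric-decrease accounting; gap0 is an upper bound on
    the initial primal gap and eps the target accuracy."""
    return max(0, math.ceil(math.log2(gap0 / eps)))
```

For example, going from an initial gap of $1$ to accuracy $10^{-3}$ takes $\lceil \log_2 1000 \rceil = 10$ stages, which is the source of the $\log(1/\varepsilon)$ factor in the FOO bound.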
- No geometric assumptions on $\mathcal{X}$: Linear convergence does not require $\mathcal{X}$ to be a polytope or strongly convex.
Summary table of oracle costs:
| Setting | FOO Calls | LMO Calls |
|---|---|---|
| Convex, $\varepsilon$-accuracy | $O\big(\sqrt{L D_{\mathcal{X}}^2/\varepsilon}\big)$ | $O\big(L D_{\mathcal{X}}^2/\varepsilon\big)$ |
| Strongly convex, $\varepsilon$-accuracy | $O\big(\sqrt{L/\mu}\,\log(1/\varepsilon)\big)$ | $O\big(L D_{\mathcal{X}}^2/\varepsilon\big)$ |
For all settings, no projection oracle and no global parameter tuning are required, marking a distinct advantage over projection-based and even previous projection-free accelerated methods (Takahashi, 28 Jan 2026, Nazari et al., 2020, Ouyang et al., 2021).
4. Relation to Previous Sliding Methods
AdCGS extends and improves upon the conditional gradient sliding (CGS) method of Lan & Zhou (2016) and its line-search variants (Nazari et al., 2020). The classical CGS algorithm requires knowing the global Lipschitz constant and a pre-specified number of iterations. CGS-ls (Nazari et al., 2020) addresses this by introducing a backtracking line search to estimate the local smoothness constant adaptively, but still incurs line-search overhead and requires careful tuning of stopping criteria.
Universal Conditional Gradient Sliding (UCGS) (Ouyang et al., 2021) generalized the approach further by allowing for Hölder-smooth objectives ($\nabla f$ Hölder continuous with exponent $\nu \in (0, 1]$), adaptively estimating both the smoothness constant and the exponent via backtracking, and by providing universal complexity bounds in both gradient and LMO calls. However, UCGS maintains a doubling/bisection line search at every iteration.
AdCGS (Takahashi, 28 Jan 2026) achieves matching theoretical rates and adaptivity without any explicit line search beyond the first iteration, employing a local curvature estimate based on observed progress, leading to more efficient practical performance and reduced tuning overhead.
5. Practical Implementation and Empirical Evaluation
AdCGS has been empirically validated across multiple problem families:
- Least-squares on the simplex: On synthetic datasets of varying dimension, AdCGS approximately halves the number of first-order oracle (FOO) calls and reduces LMO calls and runtime compared to CGS and line-search Frank–Wolfe.
- $\ell_p$-regression: AdCGS matches or outperforms projection-based accelerated methods in terms of both FOO calls and runtime, while projection-based methods must solve expensive projection subproblems at every step.
- Sparse logistic regression: In high-dimensional regimes (dimension up to $5000$), AdCGS reduces FOO calls, LMO calls, and wall-clock time by 20–50% compared to CGS, AC-CGM, and accelerated gradient (projection-based) baselines.
In all cases, AdCGS achieves the theoretical $O(1/k^2)$ decrease in primal gap without parameter tuning, projection, or line search, and robustly handles objectives lacking global Lipschitz smoothness (Takahashi, 28 Jan 2026).
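For reference, a plain Frank–Wolfe baseline for the first problem family (least squares over the simplex) is easy to state. The toy sketch below is the kind of baseline AdCGS is compared against, not the AdCGS algorithm itself; dimensions and step rule are illustrative:

```python
import numpy as np

def frank_wolfe_simplex_lsq(A, b, iters=5000):
    """Plain Frank-Wolfe for min_{x in simplex} 0.5 * ||A x - b||^2,
    using the standard open-loop step size 2/(t+2)."""
    n = A.shape[1]
    x = np.ones(n) / n                      # feasible start: simplex center
    for t in range(iters):
        grad = A.T @ (A @ x - b)            # gradient of the least-squares loss
        v = np.zeros(n)
        v[np.argmin(grad)] = 1.0            # closed-form simplex LMO
        x += 2.0 / (t + 2) * (v - x)        # convex-combination update
    return x
```

The $O(1/t)$ primal-gap decay of this baseline is exactly what the accelerated, adaptive sliding scheme improves upon in FOO calls.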
6. Extensions, Generalizations, and Related Algorithms
AdCGS is closely related to:
- CGS-ls (Nazari et al., 2020), which adapts the smoothness estimate via repeated doubling and line search at each iteration, incurring line-search overhead.
- UCGS (Ouyang et al., 2021), which further generalizes to Hölder-smooth objectives via a linesearch/backtracking on both the smoothness constant and exponent.
- Fixed-iteration sliding variants, which pre-specify the number of outer iterations and regulate subproblem inexactness schedule accordingly.
A notable extension includes the option to reduce complexity for LMO calls by warm-starting the Frank–Wolfe subproblem or varying the accuracy schedule adaptively based on observed progress (Ouyang et al., 2021). In UCGS, an adaptive backtracking on the “smoothness exponent” is suggested to further improve adaptation in the weakly smooth regime, though this is not required in AdCGS (Ouyang et al., 2021).
AdCGS may be further tailored to specialized constraint sets via problem-specific LMO implementations, especially in large-scale or combinatorial settings.
7. Context and Impact
AdCGS represents the current optimal-in-theory projection-free acceleration paradigm for smooth and strongly convex optimization. Its adaptivity, elimination of projections and line search, and empirical superiority in several application domains distinguish it from both classical Frank–Wolfe variants and the majority of other projection-free accelerated methods. Its ability to achieve linear convergence for strongly convex objectives without additional geometric constraints on the feasible set further extends its range of practical applicability (Takahashi, 28 Jan 2026). The method is especially relevant in settings where projection is computationally prohibitive, such as high-dimensional sparse learning, semidefinite optimization, and combinatorial polytopes.