
Lower Bounds for Linear Minimization Oracle Methods Optimizing over Strongly Convex Sets

Published 26 Feb 2026 in math.OC | (2602.22608v1)

Abstract: We consider the oracle complexity of constrained convex optimization given access to a Linear Minimization Oracle (LMO) for the constraint set and a gradient oracle for the $L$-smooth, strongly convex objective. This model includes Frank-Wolfe methods and their many variants. Over the problem class of strongly convex constraint sets $S$, our main result proves that no such deterministic method can guarantee a final objective gap less than $\varepsilon$ in fewer than $\Omega(\sqrt{L\,\mathrm{diam}(S)^2/\varepsilon})$ iterations. Our lower bound matches, up to constants, the accelerated Frank-Wolfe theory of Garber and Hazan (2015). Together, these establish this as the optimal complexity for deterministic LMO methods over strongly convex constraint sets. Second, we consider optimization over $\beta$-smooth sets, finding that in the modestly smooth regime of $\beta=\Omega(1/\sqrt{\varepsilon})$, no complexity improvement for span-based LMO methods is possible against either compact convex sets or strongly convex sets.

Summary

  • The paper establishes a matching lower bound of Θ(√(L·diam(S)²/ε)) for deterministic FO-LMO methods on strongly convex sets.
  • It employs zero-chain adversarial techniques to construct hard instances that restrict information gain per oracle call.
  • The results reveal that the geometry of constraint sets fundamentally limits acceleration in projection-free optimization and informs future research.

Lower Bounds for Linear Minimization Oracle Methods over Strongly Convex Sets

Problem Context and Oracle Model

The paper investigates the oracle complexity of convex optimization with constraint sets that possess structural properties (strong convexity or smoothness), where access is provided both to a gradient oracle for the objective function and a Linear Minimization Oracle (LMO) for the constraint set. This encompasses the Frank-Wolfe (FW) algorithm and its numerous projection-free variants. The optimization problem is formulated as minimizing an $L$-smooth, strongly convex function $f$ over a constraint set $S$, where the algorithm iteratively queries gradients and utilizes the LMO.

The focus is on high-dimensional settings and deterministic first-order algorithms using only gradient and LMO calls. The computational guarantee relates to the number of such oracle calls required to achieve an $\varepsilon$-suboptimal solution.

Structural Assumptions: Strong Convexity and Smoothness

Strong convexity and smoothness for sets are defined in terms of curvature and boundary regularity, parallel to the standard notions for functions. For sets, $\alpha$-strong convexity means every chord can be thickened by a ball whose radius is proportional to the squared distance between the chord's endpoints, while $\beta$-smoothness constrains the variation of boundary normals. These structures impact the convergence rates attainable by projection-free methods.
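In symbols (matching the definitions quoted in the glossary later on this page), the two set properties read:

```latex
% alpha-strong convexity of S (for all x, y in S and lambda in [0,1]):
\lambda x + (1-\lambda)y \;+\; B\!\left(0,\ \tfrac{\lambda(1-\lambda)\alpha}{2}\,\|y-x\|_2^2\right) \subseteq S

% beta-smoothness of S (n_x denotes a unit normal at boundary point x):
\|n_y - n_x\|_2 \;\leq\; \beta\,\|y-x\|_2 \qquad \forall\, x, y \in \operatorname{bdry} S
```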

Complexity Bounds for LMO-Based Methods

Classic results establish $O(1/T)$ convergence rates for FW over general convex sets, with iteration complexity for $\varepsilon$-suboptimality scaling as $O(L\,\mathrm{diam}(S)^2/\varepsilon)$ [jaggi2013revisiting, lan2013complexity]. However, Garber and Hazan [garber2015faster] showed that when both $f$ and $S$ are strongly convex, acceleration to $O(1/T^2)$ is achievable, yielding iteration complexity $O(\sqrt{L\,\mathrm{diam}(S)^2/\varepsilon})$.

This paper rigorously proves that no deterministic FO-LMO method—regardless of its span or convex hull restrictions—can surpass this accelerated $O(1/T^2)$ rate for $\alpha$-strongly convex sets, establishing a matching lower bound of $\Omega(\sqrt{L\,\mathrm{diam}(S)^2/\varepsilon})$, up to constants. It shows that this rate is universally optimal within the considered oracle model and problem class.

Hard Instance Construction and Zero-Chain Adversarial Techniques

A central technical device is the construction of hard instances: for any number of iterations $T$, the authors build an $\alpha$-strongly convex constraint set with carefully designed geometry (via intersections of shifted balls) such that the LMO exposes only a single new coordinate of information per step. This "zero-chain" property parallels classic adversarial designs for unconstrained gradient methods [nemirovski1983problem], but here is adapted to projection-free, constrained settings.

For any FO-LMO method, an adversarial "resisting oracle" dynamically assigns coordinates and weights to maximize the information gap, ensuring that after $T$ steps, at least $d-T$ coordinates remain zeroed or undetermined, certifying a lower bound on the achievable suboptimality.
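A toy sketch of this one-coordinate-per-query behavior (the class and its unit-norm responses are invented for illustration; the paper's actual resisting oracle additionally hides a permutation of set weights):

```python
import numpy as np

class ZeroChainLMO:
    """Toy resisting oracle (an illustration, not the paper's construction):
    whatever direction is queried, each call reveals at most one new
    coordinate, so after T calls every returned point is supported on the
    first T coordinates."""

    def __init__(self, d):
        self.d = d
        self.revealed = 0  # number of coordinates unlocked so far

    def __call__(self, p):
        # Unlock one more coordinate per query, independent of the query p.
        self.revealed = min(self.revealed + 1, self.d)
        z = np.zeros(self.d)
        z[:self.revealed] = 1.0 / np.sqrt(self.revealed)  # unit-norm response
        return z
```

Any span-based method interacting with such an oracle has iterates confined to a $T$-dimensional subspace after $T$ steps, which is the information bottleneck the lower bound exploits.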

Results for Smooth Sets and Span-Based Algorithms

The bounds extend to $\beta$-smooth sets: in the regime $\beta = \Omega(1/\sqrt{\varepsilon})$, the iteration complexity for span-based LMO methods remains $O(L\,\mathrm{diam}(S)^2/\varepsilon)$ (for convex sets) or $O(\sqrt{L\,\mathrm{diam}(S)^2/\varepsilon})$ (for strongly convex sets), with no further improvement due to smoothness unless $\beta$ becomes extremely large. The lower bounds are tight in these domains, confirming that modest smoothness fails to enable additional acceleration.

Implications, Gaps, and Future Directions

The paper closes the gap between accelerated upper bounds for FW over strongly convex sets [garber2015faster] and matching lower bounds, now proven for the full class of deterministic FO-LMO algorithms. The constant-factor gap between upper and lower complexity remains, which may be addressable by performance estimation techniques [luner2024performance, drori2014performance].

Practically, these results clarify the limitations of projection-free optimization protocols, particularly in high-dimensional or adversarial regimes, highlighting that the geometry of constraint sets, not just the conditioning of the objective, imposes fundamental barriers to linear convergence.

Theoretically, the generalization of zero-chain and adversarial oracle models opens pathways for future tight lower bounds in other settings such as affine-invariant algorithms [Pena2023, Wirth2025]. The paper also positions the interplay between LMO and gauge oracles as a fruitful direction [liu2023gauges, samakhoana2024scalable]. More nuanced complexity analysis contingent on explicit diameter or strong convexity parameters is left for further work.

Conclusion

This paper establishes that for optimization over $\alpha$-strongly convex sets, the optimal iteration complexity for deterministic linear minimization oracle methods is $\Theta(\sqrt{L\,\mathrm{diam}(S)^2/\varepsilon})$, with sharp lower bounds matching the best-known accelerated projection-free methods. The results are robust to oracle model assumptions and set structures, and show that further acceleration is fundamentally prevented by constraint geometry, even for perfectly conditioned objectives. The analytic framework and adversarial constructions deployed herein provide definitive complexity guarantees and suggest future lines of inquiry in algorithmic optimality and geometric optimization theory (2602.22608).


Explain it Like I'm 14

Overview

This paper studies how fast certain “projection-free” optimization methods can possibly work. These methods, like Frank–Wolfe, use a special helper called a Linear Minimization Oracle (LMO) to move around inside a constraint set without computing expensive projections. The authors prove limits (lower bounds) on the speed of all deterministic methods that rely on LMOs when the constraint set is nicely curved (strongly convex) and the function being minimized is smooth and strongly convex. They also look at sets whose boundaries are smooth and show that, in a certain regime, you still can’t beat known rates.

Key Questions

  • If the constraint set is strongly convex and the objective is smooth and strongly convex, what is the fastest possible rate any deterministic LMO-based method can achieve?
  • Can we go faster than the best known Frank–Wolfe rates in these cases?
  • If the constraint set is only “modestly smooth,” do span-based LMO methods gain any speed advantage?

Methods and Approach

Think of the optimization problem like trying to find the lowest point in a smooth bowl (the objective function) while being forced to stay inside a curved fence (the constraint set).

  • Smooth and strongly convex functions: A smooth function means its slope doesn’t change too wildly. Strongly convex means the bowl is well-curved and has a unique bottom. Together, this usually makes optimization easier.
  • Strongly convex sets: A strongly convex set is like a fence with no flat edges—its boundary is nicely curved everywhere.
  • Linear Minimization Oracle (LMO): Given a direction to look, the LMO returns the point in the set that is “most in that direction,” i.e., it solves “which point in S is most aligned with moving downhill?” This is cheaper than projecting, which would be “float back to the closest point inside the set,” but gives less information.
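To make the "cheaper but less informative" comparison concrete, here is a minimal sketch (function names are ours) of both oracles for a Euclidean ball, where each has a closed form:

```python
import numpy as np

def lmo_ball(p, center, radius):
    """LMO for the ball B(center, radius): the point of the ball
    furthest in the direction -p, i.e. argmin_{x in B} <p, x>."""
    return center - radius * p / np.linalg.norm(p)

def project_ball(y, center, radius):
    """Euclidean projection onto B(center, radius): pull y back to
    the boundary only if it lies outside the ball."""
    d = y - center
    return center + d * min(1.0, radius / np.linalg.norm(d))
```

For a ball the two oracles cost essentially the same, which is exactly the special case noted later on this page where LMO access can recover projections and hence linear rates; for general sets the LMO is usually much cheaper.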

The central idea is to build “hard” problem instances and adversarial oracles that force any LMO-based method to learn only a tiny bit per step. The authors adapt the classic “zero-chain” technique: in each iteration, the algorithm’s oracles reveal at most one new piece of useful information (like unlocking one coordinate in a high-dimensional problem). They do this in two stages:

  • LMO-span model: First, they analyze a restricted family of methods that choose directions from the span of past information and keep iterates in the convex hull of previously returned LMO points. They construct a special strongly convex set where the LMO behaves like “soft-thresholding” (it cuts off small components) and prove that each step only uncovers one more coordinate. This creates a bottleneck that limits speed.
  • General FO-LMO model: Next, they remove the restriction by inventing a “resisting oracle.” This oracle picks a hidden permutation of the set’s weights and reveals them in a way that still only grants one coordinate’s worth of information per step, no matter what deterministic method is used. This shows the lower bound applies to all deterministic first-order LMO methods, not just the span-restricted ones.
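The soft-thresholding LMO mentioned above is described later on this page as a 1D root-finding problem for a KKT multiplier $\lambda$ satisfying $\frac{1}{2}\|z\|_2^2 + \sum_i w_i|z_i| = C^2$. A hedged sketch, assuming the feasible set has the form $\{z : \tfrac{1}{2}\|z\|_2^2 + \sum_i w_i|z_i| \le C^2\}$ (the bracketing interval and geometric bisection are our choices, not the paper's):

```python
import numpy as np

def lmo_hard_set(p, w, C, iters=200):
    """LMO for S = {z : 0.5*||z||^2 + sum_i w_i*|z_i| <= C^2}.
    Stationarity of the KKT conditions gives a soft-thresholding form
        z_i(lam) = -sign(p_i) * max(|p_i|/lam - w_i, 0),
    with lam > 0 chosen so the constraint is active."""
    def z_of(lam):
        return -np.sign(p) * np.maximum(np.abs(p) / lam - w, 0.0)

    def constraint(lam):
        z = z_of(lam)
        return 0.5 * np.dot(z, z) + np.dot(w, np.abs(z))

    # constraint(lam) decreases monotonically in lam, so a geometric
    # bisection over a wide bracket converges to the multiplier.
    lo, hi = 1e-12, 1e12
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if constraint(mid) > C ** 2:
            lo = mid
        else:
            hi = mid
    return z_of(np.sqrt(lo * hi))
```

The thresholding of small components $|p_i| \le \lambda w_i$ is what lets an adversary hide coordinates: queries that are not yet "loud enough" in a coordinate get zero back for it.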

They also consider “smooth sets,” whose boundary normals change smoothly. By “smoothing” their hard sets with a small ball and analyzing the effect, they show that for modest smoothness levels, span-based methods don’t get meaningful acceleration.
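If the "smoothing with a small ball" step is read as a Minkowski sum $S + B(0, \delta)$ (our assumption; the paper may use a different construction), it interacts cleanly with the oracle via the standard identity that the LMO of a Minkowski sum is the sum of the summands' LMOs:

```python
import numpy as np

def smoothed_lmo(lmo_S, delta):
    """LMO for the Minkowski sum S + B(0, delta), via the identity
    LMO_{A+B}(p) = LMO_A(p) + LMO_B(p); the ball's LMO is -delta*p/||p||."""
    def lmo(p):
        return lmo_S(p) - delta * p / np.linalg.norm(p)
    return lmo
```

So smoothing costs nothing extra per oracle call, which is consistent with the finding that modest smoothness does not change the attainable rates.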

Main Findings

  • Strongly convex sets: Any deterministic LMO-based method needs at least on the order of

$\sqrt{\dfrac{L\,\mathrm{diam}(S)^2}{\varepsilon}}$

iterations to get error at most $\varepsilon$. Here, $L$ is the smoothness of the function, and $\mathrm{diam}(S)$ is the set's diameter (its "width" across).

This matches, up to constants, the best known accelerated Frank–Wolfe rate of $f(x_T)-f(x^\star)\le \mathcal{O}\!\left(\frac{L\,\mathrm{diam}(S)^2}{T^2}\right)$, meaning $T$ must be about $\sqrt{L\,\mathrm{diam}(S)^2/\varepsilon}$. So you cannot do better than the $1/T^2$ rate with deterministic LMO methods on strongly convex sets.

  • Modestly smooth sets: When the set's smoothness level $\beta$ is only modest (specifically $\beta=\Omega(1/\sqrt{\varepsilon})$), span-based LMO methods cannot improve the known optimal complexities:
    • For general convex sets: no better than $\Theta\!\left(\frac{L\,\mathrm{diam}(S)^2}{\varepsilon}\right)$.
    • For strongly convex sets: no better than $\Theta\!\left(\sqrt{\frac{L\,\mathrm{diam}(S)^2}{\varepsilon}}\right)$.

In short, “just” making the set modestly smooth doesn’t unlock faster rates for these methods.

Why this matters: It proves a hard limit on how fast projection-free methods can go, even when the objective is perfectly conditioned (smooth and strongly convex). The difficulty comes from the constraint set and the limited information LMOs provide.

Implications and Potential Impact

  • Projection-free methods (like Frank–Wolfe) are attractive in large-scale problems because LMOs are cheap compared to projections. However, this paper shows a fundamental barrier: even with a very nice objective, if you rely on LMOs and the set can be adversarially curved, you cannot get linear convergence (i.e., you cannot beat the $1/T^2$ rate in the strongly convex setting).
  • For practitioners, this suggests that to get faster convergence (like linear rates), you may need:
    • Different or stronger oracles (e.g., projection oracles),
    • Additional structure in the set (beyond general strong convexity),
    • Or randomized methods or other algorithmic ideas outside this deterministic LMO framework.
  • For theory, the “zero-chain for sets” and resisting-oracle construction are powerful tools. They open the door to proving more lower bounds for constrained, projection-free optimization and help guide the search for truly optimal algorithms and tight constants.

Knowledge Gaps

Below is a single, focused list of concrete gaps, limitations, and open questions that remain unresolved and could guide future research.

  • Close the constant-factor gap in the optimal strongly convex rate: identify minimax-optimal FO-LMO algorithms and matching worst-case instances to resolve the current disparity between the lower bound constant (1/528) and the best-known upper bound constant (9/2), potentially via Performance Estimation Problem (PEP) techniques.
  • Develop hard instances for arbitrary α and diameter selections: construct strongly convex sets with freely chosen diameter diam(S) ≤ 2/α (not tied to the current Θ(1/(α d)) scaling) and derive lower bounds that reflect the full interplay between α and diam(S).
  • Establish lower bounds in low dimensions: extend universal lower bounds (currently relying on d ≳ T for zero-chain arguments) to fixed small dimensions (e.g., d=2 or d independent of T) against all deterministic FO-LMO methods, beyond specific Frank-Wolfe variants.
  • Extend smooth-set results to all deterministic FO-LMO methods: generalize the modestly smooth-set lower bounds (currently proved for LMO-span methods) to the full FO-LMO family, especially for constant β (rather than β = Ω(1/√ε)).
  • Determine whether constant smoothness enables acceleration: rigorously resolve if β-smooth (with β = Θ(1)) constraint sets permit rate improvements beyond convex or strongly convex baselines for FO-LMO methods (current evidence is partial and mostly numerical).
  • Affine-invariant lower bounds: translate the Euclidean-norm-based theory into affine-covariant terms that match Frank-Wolfe’s geometry (e.g., via curvature-like affine-invariant set parameters), and provide corresponding minimax lower bounds.
  • Randomization: derive oracle complexity lower bounds for randomized FO-LMO methods; determine whether randomization can circumvent the resisting-oracle constructions used for deterministic methods.
  • Inexact or noisy oracles: analyze the robustness of the lower bounds under approximate LMOs, noisy gradients, or limited oracle accuracy, including how inexactness degrades the optimal rates.
  • Multi-oracle or hybrid models: characterize complexity when algorithms are allowed additional oracles (e.g., separation, prox, oracles for support/gauge) alongside the LMO; identify conditions under which these hybrid models can strictly improve rates.
  • Geometry-aware lower bounds beyond diameter/α: incorporate finer geometric descriptors (e.g., pyramidal width, curvature constants, smoothness of normals) into lower bounds to align with away-step and pairwise FW upper bounds that depend on such parameters.
  • Families where LMOs simulate projections: precisely classify constraint sets (e.g., Euclidean balls) where LMOs can recover projections, enabling linear convergence; quantify how such structure changes the minimax complexity.
  • Explicit conditioning dependence: provide lower bounds that depend transparently on objective conditioning (μ/L), rather than relying on μ=L and arguing extension to μ≤L; identify sharp μ/L-dependent rates for FO-LMO methods.
  • Feasibility-preserving algorithms: refine lower bounds under the constraint that iterates must remain feasible (some FO-LMO schemes may allow infeasible iterates); quantify any trade-offs between feasibility maintenance and worst-case complexity.
  • Parallel or multiple LMO calls per iteration: investigate whether allowing multiple LMO queries per iteration (or batched queries) yields provable rate improvements, and establish corresponding lower bounds with k-oracle calls per iteration.
  • Non-Euclidean geometries and mirror maps: extend the theory to Bregman/Fenchel geometries underpinning mirror-descent-like FW variants; provide lower bounds that capture non-Euclidean structure and potential preconditioning effects.
  • Alternative performance metrics: derive lower bounds in terms of Frank-Wolfe dual gap, feasibility violation, and other practical termination criteria, not just objective suboptimality.
  • Second-order or momentum information: assess whether access to limited second-order information (e.g., curvature along FW directions) or momentum can alter minimax rates under LMO constraints; establish corresponding lower bounds.
  • Stochastic settings: formulate lower bounds for stochastic FO-LMO optimization (e.g., stochastic gradients or LMOs), including the role of noise level and sample complexity.
  • Dimension-efficient zero-chain constructions: design adversarial sets with smoother boundaries or different combinatorial structures that enforce zero-chain behavior with smaller dimension requirements (d sublinear in T).
  • Strongly convex and smooth sets with unique normals: construct explicit hard instances that are both strongly convex and β-smooth (unique unit normals everywhere) and prove lower bounds that do not rely on nonsmooth boundary intersections.

Practical Applications

Overview

This paper establishes fundamental limits for “projection-free” first-order methods that rely on a Linear Minimization Oracle (LMO), including Frank–Wolfe (FW) variants. The main result proves a tight lower bound on iteration complexity for minimizing an L-smooth, strongly convex objective over strongly convex sets using deterministic LMO-based methods: no method can beat Ω(√(L·diam(S)²/ε)) iterations (i.e., at best an O(1/T²) rate in objective gap). A secondary contribution shows that for modestly smooth sets (β = Ω(1/√ε)), span-based LMO methods cannot improve over the known optimal rates for general compact sets or strongly convex sets. The technical innovations include constructing a strongly convex feasible region whose LMO exhibits a zero-chain property, and a resisting-oracle argument that extends the lower bound to all deterministic FO-LMO methods.

Below are practical applications, grouped by immediacy and mapped to relevant sectors. For each, we outline actionable uses, potential tools/workflows, and feasibility assumptions.

Immediate Applications

  • Algorithm selection and expectation setting in optimization and machine learning pipelines (software, ML, data science)
    • Use the tight lower bound to choose between LMO-based methods (e.g., FW variants) and projection-based methods when targeting high accuracy ε.
    • Practical rule: if you need ε-accuracy and only have an LMO over a strongly convex set, plan for at least on the order of √(L·diam(S)²/ε) iterations; if unacceptable, consider algorithms with projections, interior-point methods, or constraint reformulations.
    • Potential tools/workflows:
    • A “method advisor” module in optimization libraries that estimates L and diam(S), then provides a minimal iteration budget and method recommendations.
    • Dashboards that compare empirical progress to the lower-bound curve to flag when a projection-free approach is fundamentally rate-limited.
    • Assumptions/dependencies:
    • Deterministic FO-LMO setting; lower bound targets worst-case instances.
    • Requires rough estimates of L and diam(S); exact diameters may be hard for complex sets, but conservative bounds (e.g., known radius) often suffice.
  • Budgeting, stopping criteria, and SLAs for large-scale convex optimization (software platforms, MLOps)
    • Translate the bound into planning tools: given a time budget and per-iteration cost, determine realistic target ε, or conversely, compute minimal time to achieve ε.
    • Use the Ω(√(L·diam(S)²/ε)) bound to design principled early stopping rules (e.g., if the observed rate is near the bound, further speedups are unlikely without changing the method/oracle).
    • Assumptions/dependencies:
    • Requires stable per-iteration cost estimates and an LMO implementation.
    • Bound is worst-case; actual instances may be easier, but this avoids overpromising.
  • Robust benchmarking and QA for FW-style solvers (software, academia)
    • Adopt the paper’s hard-instance generators to test solver claims against adversarial cases and avoid overfitting to “easy” benchmarks.
    • Implement the LMO testbed:
    • Use the paper’s soft-thresholding form for the LMO (a 1D root-finding for λ) to generate oracle responses.
    • Include the resisting-oracle variant to ensure methods do not secretly exploit fixed-instance structure.
    • Assumptions/dependencies:
    • Needs a reliable 1D solver for the LMO multiplier; monotonicity makes this straightforward.
    • High-dimensional settings (d ≳ number of iterations) better expose worst cases.
  • Hybrid method switching when projections are available or can be emulated (software, ML)
    • When S is a Euclidean ball (or very close), projections are trivial and fast, and linear convergence for strongly convex objectives is feasible via projected gradient methods.
    • The paper notes: if diam(S) = 2/α and S is a ball, an LMO can be used to compute projections—then FO-LMO includes projected gradient methods with linear rates.
    • Workflow:
    • Detect when S is (or can be reformulated as) a ball (or near-ball).
    • Switch from FW to projected-gradient or proximal methods to exploit linear convergence.
    • Assumptions/dependencies:
    • Requires knowledge that S is a ball or can be approximated as such without breaking problem semantics.
    • If only an LMO interface is exposed (e.g., legacy code), mapping to projection may need additional derivations or access.
  • Practical guidance for constrained ML tasks with expensive projections (ML, healthcare, finance, energy)
    • For constraints where LMO is cheap and projection is expensive (e.g., ℓ1/ℓ∞-type sets, trace norm), use these bounds to set realistic accuracy targets (e.g., modest ε for practical time frames).
    • Sectors/use cases:
    • Finance: portfolio optimization with norm or simplex-like constraints.
    • Healthcare: radiation therapy planning with convex dose constraints.
    • Energy: convex relaxations in dispatch/planning where LMOs are easy.
    • Assumptions/dependencies:
    • Many popular sets (polytopes, simplices) are not strongly convex; standard bounds (O(1/T)) still apply. The main message survives: projection-free methods cannot generally achieve linear rates without additional structure.
  • Curriculum and internal training for optimization teams (academia, industry R&D)
    • Use the zero-chain and resisting-oracle constructions to educate teams about fundamental limitations and to avoid misdirected efforts at “universal acceleration” of LMO-based methods.
    • Assumptions/dependencies:
    • Requires minimal implementation effort for illustrative experiments.
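The "method advisor" idea above reduces to a one-line calculation from the paper's bounds. A hypothetical helper (the function name and interface are ours):

```python
import math

def lmo_iteration_budget(L, diam, eps, strongly_convex_set):
    """Worst-case iteration budget implied by the lower bounds:
    Omega(L*diam^2/eps) over general compact convex sets, and
    Omega(sqrt(L*diam^2/eps)) when the set is strongly convex."""
    base = L * diam ** 2 / eps
    return math.ceil(math.sqrt(base) if strongly_convex_set else base)
```

For example, with $L=1$, $\mathrm{diam}(S)=1$, $\varepsilon=0.25$ the budgets are 2 (strongly convex set) versus 4 (general set); the gap widens rapidly as $\varepsilon$ shrinks. These are worst-case floors, so an empirical run finishing faster is fine, but a plan assuming fewer iterations is not.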

Long-Term Applications

  • Development of randomized or adaptive FO-LMO methods (software, academia)
    • The lower bound targets deterministic methods; investigate whether carefully designed randomness or adaptivity can bypass worst-case constructions (or at least improve constants).
    • Potential outcomes:
    • New FW variants with provable gains on broad, non-worst-case classes.
    • Robust hybrid strategies that incorporate occasional projections or other oracles.
    • Assumptions/dependencies:
    • Requires new theoretical advances to establish whether randomness can fundamentally help in the LMO model.
  • Affine-invariant and PEP-driven optimal method design (academia, software)
    • Extend Performance Estimation Problem (PEP) frameworks to LMO settings (especially over strongly convex/smooth sets) to:
    • Identify constant-optimal methods (tightening the 1/528 vs 9/2 gap).
    • Produce certifiably optimal stepsize/line-search policies.
    • Products/tools:
    • PEP-based method tuner auto-generating optimal FW variants for a given oracle model.
    • Assumptions/dependencies:
    • Requires generalizing PEP to the LMO setting with set-geometry constraints; nontrivial but motivated by prior PEP advances.
  • Automated “feasibility-aware” method advisors in general-purpose solvers (software platforms)
    • Build system tools that:
    • Detect or bound set parameters (diameter, α-strong convexity, β-smoothness) and objective smoothness/conditioning online.
    • Recommend or auto-switch among LMO, projected, or proximal methods based on target ε and time budget.
    • Assumptions/dependencies:
    • Estimating α, β, and diam(S) online is challenging; will rely on conservative bounds or user-provided metadata.
  • Set reparameterization and preconditioning to approach ball-like geometry (ML, robotics, energy)
    • Learn affine transformations or equivalent constraint formulations that make S closer to a ball to exploit faster projection-based methods.
    • Examples:
    • Robotics motion planning: transform constraints to isotropic forms to accelerate convergence.
    • ML regularization: replace hard constraints with proximal-friendly penalties when feasible.
    • Assumptions/dependencies:
    • Requires domain expertise to ensure equivalence and maintain problem fidelity.
    • Gains depend on how close transformed sets get to ball-like geometry.
  • Cross-oracle frameworks combining LMO and gauge oracles (software, academia)
    • Leverage dual gauge models and support-function access to design algorithms that switch between oracle types depending on progress and target ε.
    • Potential outcomes:
    • New frameworks that avoid LMO-specific bottlenecks identified by the lower bound while retaining projection-free advantages when beneficial.
    • Assumptions/dependencies:
    • Needs engineering to expose multiple oracles and theory to guarantee convergence under switching.
  • Sector-specific solver strategies under fundamental limits (healthcare, finance, education, energy)
    • Embed lower-bound awareness into domain solvers to:
    • Choose viable ε given regulatory or operational time constraints (e.g., clinical planning turnarounds, trading windows, grid scheduling horizons).
    • Justify solver choices in audits and regulatory filings by referencing provable limits (risk management and governance).
    • Assumptions/dependencies:
    • Requires sector-specific pipelines capable of incorporating methodological justifications and runtime predictions.
  • Cloud optimization services with reliability guarantees (software, cloud providers)
    • Offer “feasibility-aware SLAs” that map target accuracy and constraints to worst-case runtime guarantees using these lower bounds.
    • Assumptions/dependencies:
    • Requires standardized metadata about constraint sets and oracles, and robust monitoring to verify SLA adherence.

Notes on Feasibility and Scope

  • Determinism vs. randomness: The main lower bound applies to deterministic FO-LMO methods. Whether randomized strategies can avoid worst cases remains an open research question.
  • Dimensionality: Constructions assume high-dimensional regimes (d ≳ number of iterations). Most large-scale applications do satisfy this, but in very low dimension the worst-case may be less constraining.
  • Set geometry: Many practical sets (e.g., polytopes) are not strongly convex. The paper’s secondary result suggests that modest levels of smoothness (β = Ω(1/√ε)) do not unlock faster rates for span-based methods either.
  • Oracle model: Results assume exact LMOs. Approximate LMOs typically worsen, not improve, performance relative to the lower bound.
  • Objective conditioning: Even with perfectly conditioned objectives (μ = L), LMO-based methods face geometric barriers—improving objective conditioning alone does not yield linear convergence without changing the oracle/model.
  • Special cases (balls): When S is a Euclidean ball, projections are trivial (or computable via LMO), and linear convergence via projected methods is attainable—these are notable exceptions to the general limitation.

Glossary

  • Affine-covariant: Invariant under affine transformations; analyses that do not depend on a fixed inner product or coordinate system. "provided Frank-Wolfe with ``affine-covariant'' convergence theory"
  • Alpha-strongly convex set (α-strongly convex set): A convex set with curvature bounded away from zero, so midpoints “bulge” inward in a quantified way. "$S$ is $\alpha$-strongly convex if $\lambda x + (1-\lambda)y + B\left(0,\ \frac{\lambda(1-\lambda)\alpha}{2}\|y-x\|_2^2\right) \subseteq S$"
  • Argmin: The set of points where a function attains its minimum value. "$z = \mathtt{LMO}_S(p) \in \operatorname{argmin}_{x\in S} \langle p, x\rangle$"
  • Beta-smooth set (β-smooth set): A convex set whose boundary normals change in a Lipschitz way with constant β. "$S$ is $\beta$-smooth if $\|n_y - n_x\|_2 \leq \beta \|y-x\|_2 \quad \forall x,y\in\operatorname{bdry}S$"
  • Big-O (O) notation: Asymptotic upper bound expressing rate of growth. "$f(x_T) - \min_{x\in S}f(x) \leq \mathcal{O}\left(\frac{L\,\mathrm{diam}(S)^2}{T}\right).$"
  • Big-Omega (Ω) notation: Asymptotic lower bound expressing minimal rate needed. "fewer than $\Omega(\sqrt{L\,\mathrm{diam}(S)^2/\varepsilon})$ iterations."
  • Big-Theta (Θ) notation: Asymptotically tight bound (both upper and lower) on growth rate. "this establishes the optimal iteration complexity for strongly convex constrained LMO-span optimization as $\Theta(\sqrt{L\,\mathrm{diam}(S)^2/\varepsilon})$."
  • Compact convex set: A convex set that is closed and bounded. "For the minimization of an LL-smooth convex function ff over a compact convex set SS"
  • Convex hull: The smallest convex set containing given points. "$x_{k+1} \in \operatorname{conv}\{x_k, z_{k+1}\}$."
  • Diameter (of a set): The maximum distance between any two points in a set. "We consider compact convex sets $S$ with diameter $\mathrm{diam}(S)=\max_{x,y\in S}\|x-y\|_2$"
  • Exact line search: Choosing step size by minimizing the objective along the search direction. "an exact line search implementation of Frank-Wolfe would set $x_{k+1}$ as the minimizer of $f$ on the segment $[x_k,z_{k+1}]$."
  • First-order oracle: An oracle returning function value and gradient at a query point. "each iteration of the algorithm can make one call to a first-order oracle, returning $(f(x_k), \nabla f(x_k))$"
  • First-Order Linear Minimization Oracle (FO-LMO) methods: Methods that use gradient information and an LMO to generate iterates. "a FO-LMO method generates sequences of search directions $p_k\in\mathbb{R}^d$, linear minimization solutions $z_{k+1} = \mathtt{LMO}_S(p_k)\in\operatorname{argmin}_{x\in S}\langle p_k, x\rangle$, and iterates $x_{k+1}$."
  • Frank-Wolfe methods: Projection-free first-order methods that use linear minimization over the feasible set. "Frank-Wolfe methods and the broader family of ``projection-free'' algorithms using a linear minimization subroutine have found renewed interest due to their scalability."
  • Gradient-span method: A method whose updates lie in the span of past gradients (and possibly other limited information). "can prevent {\it any gradient-span method} from having made substantial progress"
  • Iteration complexity: The number of oracle calls/iterations needed to reach a desired accuracy. "Our theory then bounds iteration complexity to measure the minimum number of such pairs of oracle calls needed to reach a target suboptimality."
  • Karush–Kuhn–Tucker (KKT) multiplier: A Lagrange multiplier satisfying the KKT optimality conditions in constrained optimization. "where $\lambda > 0$ is the unique KKT multiplier ensuring $\frac{1}{2}\|z\|_2^2 + \sum_{i=1}^d w_i |z_i| = C^2$."
  • Linear Minimization Oracle (LMO): An oracle that returns a minimizer of a linear function over the feasible set. "given access to a Linear Minimization Oracle (LMO) for the constraint set"
  • LMO-span method: An LMO-based method whose search directions and iterates are restricted to spans/convex hulls of past information. "an LMO-span method generates sequences of search directions $p_k$, linear minimization solutions $z_{k+1}$, and iterates $x_{k+1}$"
  • L-smooth function: A differentiable function whose gradient is L-Lipschitz. "$f$ is $L$-smooth if $\|\nabla f(y)-\nabla f(x)\|_2\leq L\|y-x\|_2 \quad \forall x,y\in \mathbb{R}^d$."
  • Minimax optimal: Optimal in the worst-case sense across all problem instances within a class. "leaves open the question of determining exactly minimax optimal algorithms and hard problem instances."
  • Minkowski gauge: A function measuring how much a vector must be scaled to enter a set; dual to the support function under polarity for sets containing the origin. "a polar transformation establishes this as dual to assuming first-order access to the Minkowski gauge $y \mapsto \inf\{ \gamma>0 \mid y/\gamma \in S\}$."
  • Minkowski sum: The set obtained by elementwise addition of two sets. "by taking Minkowski sums with a ball $B(0,1/\beta)$."
  • Normal cone: The set of vectors normal to a convex set at a point, defining supporting hyperplanes. "Normal vectors $n \in N_S(x) := \{n \mid \langle n, y-x\rangle \leq 0\ \forall y\in S\}$"
  • Open-loop stepsize: A predetermined step size schedule independent of feedback. "a fixed ``open-loop'' stepsize implementation would fix $x_{k+1} = \theta_k x_k + (1-\theta_k)z_{k+1}$"
  • Oracle complexity: Complexity measured in terms of the number of oracle queries required to achieve a target accuracy. "We consider the oracle complexity of constrained convex optimization given access to a Linear Minimization Oracle (LMO) for the constraint set"
  • Performance Estimation Problem (PEP): A framework that formulates tight worst-case performance bounds of optimization methods as optimization problems. "the Performance Estimation Problem (PEP) techniques pioneered by Drori and Teboulle and the convex interpolation works of Taylor, Hendrickx, and Glineur have provided such theory."
  • Polar transformation: A duality mapping between a set and its polar, relating support functions and gauges. "For sets with $0\in S$, a polar transformation establishes this as dual to assuming first-order access to the Minkowski gauge"
  • Projection-free algorithms: Optimization algorithms that avoid orthogonal projections, using LMOs instead. "the broader family of ``projection-free'' algorithms using a linear minimization subroutine"
  • Resisting oracle: An adversarial oracle that reveals information adaptively to make the problem hard for the algorithm. "By combining this with an adversarial ``resisting oracle'', lower bounds against {\it all deterministic gradient methods} were achieved."
  • Slater's condition: A regularity condition ensuring strong duality and KKT applicability via existence of a strictly feasible point. "Slater's condition holds here as the origin is in the interior ($h(0) = 0 < C^2$)."
  • Soft-thresholding operator: An elementwise shrinkage mapping that zeroes out small components and reduces larger ones by a threshold. "is given elementwise by the soft-thresholding operator"
  • Strong convexity (of a function): A function property implying quadratic lower curvature and uniqueness of minimizers. "$f$ is $\mu$-strongly convex if $f(\lambda x + (1-\lambda)y)\leq \lambda f(x) + (1-\lambda)f(y) - \frac{\lambda(1-\lambda)\mu}{2}\|y-x\|_2^2$"
  • Strongly convex set: A convex set with strictly positive curvature everywhere (no flat faces). "Over the problem class of strongly convex constraint sets $S$, our main result proves that no such deterministic method can guarantee a final objective gap less than $\varepsilon$"
  • Support function: Maps a direction to the maximal inner product with points in a set; characterizes a convex set. "first-order access to the support function $p \mapsto \sup\{\langle p, x\rangle \mid x\in S\}$."
  • Suboptimality: The gap between the objective value at a point and the optimal value. "needed to reach a target suboptimality."
  • Orthogonal projection oracle: An oracle that computes the Euclidean projection of a point onto the feasible set. "an LMO can be used to explicitly compute orthogonal projections onto the feasible region $S$."
  • Zero-chain property: A construction where each iteration reveals information about only one coordinate, slowing progress. "This property, known as a ``zero-chain'' property, can prevent {\it any gradient-span method} from having made substantial progress"
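Several of the terms above (LMO, open-loop stepsize, and the convex-hull iterate update) come together in the basic Frank-Wolfe loop. The sketch below is illustrative only: the Euclidean-ball constraint set, the quadratic objective, and the helper names (`lmo_ball`, `frank_wolfe`) are assumptions chosen for demonstration, not constructions from the paper.

```python
import numpy as np

def lmo_ball(p, radius=1.0):
    """LMO for the Euclidean ball: a minimizer of <p, x> over ||x|| <= radius."""
    norm = np.linalg.norm(p)
    return -radius * p / norm if norm > 0 else np.zeros_like(p)

def frank_wolfe(grad, x0, lmo, T=100):
    """Frank-Wolfe with the classic open-loop stepsize theta_k = 2/(k+2)."""
    x = x0.copy()
    for k in range(T):
        z = lmo(grad(x))                 # linear minimization solution z_{k+1}
        theta = 2.0 / (k + 2)            # open-loop stepsize, no feedback used
        x = (1 - theta) * x + theta * z  # iterate stays in conv{x_k, z_{k+1}}
    return x

# Illustrative L-smooth, strongly convex objective f(x) = 0.5 * ||x - c||^2
c = np.array([2.0, 0.0])
grad = lambda x: x - c
x_star = frank_wolfe(grad, np.zeros(2), lmo_ball, T=500)
# Minimizer of f over the unit ball is c / ||c|| = (1, 0).
```

Here the constrained minimizer lies on the boundary of the ball, the typical situation in which the curvature of a strongly convex constraint set helps Frank-Wolfe methods.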

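The soft-thresholding operator mentioned in the glossary has a one-line implementation. A minimal sketch (the function name `soft_threshold` and the sample inputs are my own, not from the paper):

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise shrinkage: reduce |v_i| by tau, zeroing entries with |v_i| <= tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

out = soft_threshold(np.array([3.0, -0.5, 1.5]), 1.0)
# Shrinks 3.0 to 2.0 and 1.5 to 0.5, and zeroes out the small entry -0.5.
```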
Authors (2)
