
Quasar-Convexity: Optimizing Nonconvex Functions

Updated 26 November 2025
  • Quasar-convexity is a relaxation of convexity defined by a one-point gradient condition that bridges star-convexity and general nonconvexity for global optimization guarantees.
  • Algorithms leveraging quasar-convexity, including deterministic, stochastic, and proximal methods, achieve accelerated convergence rates similar to those in convex optimization.
  • The framework has practical applications in machine learning, dynamical systems, reinforcement learning, and distributed or minimax settings, extending classical optimization bounds.

Quasar-convexity is a structural relaxation of convexity that characterizes a broad class of nonconvex functions enabling global optimization guarantees typically unattainable in generic nonconvex settings. This concept, and its generalizations such as strong quasar-convexity, proximal-quasar-convexity, and generalized quasar-convexity (GQC), provide frameworks in which first-order and proximal-type algorithms attain accelerated convergence rates comparable to convex optimization, with substantial applications in machine learning, dynamical systems, and reinforcement learning. Quasar-convexity interpolates between star-convexity and broader one-point relaxation hierarchies, supporting both deterministic and stochastic algorithmic guarantees and extending naturally to constrained, stochastic, distributed, and minimax settings.

1. Foundational Definitions and Generalizations

The canonical definition for a differentiable function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ with a global minimizer $x^*$ and $\gamma \in (0,1]$ is

$$f(x^*) \geq f(x) + \frac{1}{\gamma}\,\langle \nabla f(x),\, x^* - x \rangle \quad \forall x \in \mathbb{R}^n,$$

or, equivalently,

$$f(x) - f(x^*) \leq \frac{1}{\gamma}\,\langle \nabla f(x),\, x - x^* \rangle.$$

When $\gamma = 1$ this is star-convexity; for general $\gamma \in (0,1)$ it allows nonconvexity, relaxing the global affine support of convexity to a one-point condition anchored at the minimizer. Strong quasar-convexity adds a quadratic term,

$$f(x^*) \geq f(x) + \frac{1}{\gamma}\,\langle \nabla f(x),\, x^* - x \rangle + \frac{\mu}{2}\,\|x^* - x\|^2,$$

with $\mu > 0$ yielding uniqueness of the minimizer and robust error contraction (Jin, 2020; Hermant et al., 2024; Brito et al., 4 Sep 2025; Khanh et al., 28 Oct 2025).
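As a quick numerical illustration of the one-point condition (a sketch using an example function of our own choosing, not one drawn from the cited papers), the largest admissible $\gamma$ on a sampled region can be estimated by minimizing the ratio $\langle \nabla f(x), x - x^* \rangle / (f(x) - f(x^*))$ over sample points:

```python
import numpy as np

# Estimate the largest gamma for which the quasar-convex inequality
#   f(x) - f(x*) <= (1/gamma) * <grad f(x), x - x*>
# holds over a sampled region. Example function (an illustrative choice):
# f(x) = x^2 / (1 + x^2), nonconvex for |x| > 1/sqrt(3), minimizer x* = 0.

def f(x):
    return x**2 / (1.0 + x**2)

def grad_f(x):
    return 2.0 * x / (1.0 + x**2) ** 2

def empirical_gamma(xs, x_star=0.0):
    """Largest gamma in (0, 1] compatible with the inequality on the samples."""
    xs = xs[xs != x_star]          # the ratio is 0/0 at the minimizer itself
    ratios = grad_f(xs) * (xs - x_star) / (f(xs) - f(x_star))
    return float(min(ratios.min(), 1.0))

gamma_hat = empirical_gamma(np.linspace(-2.0, 2.0, 401))
print(gamma_hat)   # ~0.4: here the ratio equals 2/(1+x^2), minimized at x = ±2
```

On $[-2, 2]$ this function is nonconvex, yet the estimate comes out to $\gamma \approx 0.4$, so the one-point inequality holds throughout the region with that constant.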

Block-structured settings, as formalized in Generalized Quasar-Convexity (GQC), assign a per-block parameter $\gamma_i$ to each variable block $x_i$, defined over product spaces (e.g., products of probability simplices):

$$f(x^*) - f(x) \geq \sum_{i=1}^d \frac{1}{\gamma_i}\,\langle F_i(x),\, x_i^* - x_i \rangle,$$

where $F_i$ is a general internal oracle, often but not necessarily a gradient; a smaller $\gamma_i$ signifies greater nonconvexity in block $i$ (Ding et al., 2024). The extension to minimax settings defines Generalized Quasar-Convexity-Concavity (GQCC) via surrogate operators and weighting.

In constrained or compositional scenarios, proximal-quasar-convexity replaces $\nabla f$ with the proximal-gradient mapping, ensuring that the structure persists under constraints (Farzin et al., 4 May 2025; Martínez-Rubio, 2 Oct 2025).

2. Algorithmic Implications and Complexity Results

Quasar-convexity enables first-order and related algorithms to achieve convergence rates much sharper than for general nonconvex functions, frequently matching those of convex optimization up to $\gamma$-dependent factors.

  • Deterministic Gradient Descent and Acceleration: For smooth, $\gamma$-quasar-convex $f$, deterministic accelerated methods achieve

$$f(x_T) - f(x^*) = \widetilde{O}\!\left(\frac{L\,\|x_0 - x^*\|^2}{\gamma\,T^2}\right),$$

while strongly quasar-convex functions permit linear rates:

$$f(x_T) - f(x^*) \leq C\,\bigl(1 - \gamma\sqrt{\mu/L}\bigr)^T.$$

These mirror the classic convex and strongly convex rates, degraded by a $1/\gamma$ factor (Jin, 2020; Wang et al., 2023; Hermant et al., 2024).

  • Stochastic Optimization: Under quasar-convexity, SGD and variance-reduced methods yield $O(1/\sqrt{T})$ to $O(1/T)$ convergence, and $O(\log(1/\epsilon))$ for the strong variants. Adaptive stochastic mirror-descent frameworks (e.g., QASGD, QASVRG) exploit this in finite-sum and online settings (Fu et al., 2023).
  • Zeroth-order (Gaussian smoothing) algorithms: Randomized algorithms using smoothed function-value queries (instead of gradients) inherit complexity bounds $O(n/\epsilon)$ (QC) or $O(n\log(1/\epsilon))$ (SQC), with variance reduction tightening the solution neighborhood (Farzin et al., 4 May 2025).
  • Proximal Point and Constrained Methods: The proximal point algorithm (PPA), applied to quasar-convex functions, attains sublinear $O(1/\epsilon)$ complexity; under strong quasar-convexity it achieves linear contraction and $O(\log(1/\epsilon))$ complexity (Brito et al., 4 Sep 2025). Accelerated schemes for constrained quasar-convex minimization achieve $\widetilde{O}(1/(\gamma\sqrt{\epsilon}))$ rates, while projected gradient descent and Frank–Wolfe procedures scale sublinearly in $1/(\gamma^2\epsilon)$ (Martínez-Rubio, 2 Oct 2025).
  • Optimistic Mirror Descent in GQC/GQCC: For multi-block or multi-distribution setups, OMD achieves adaptive convergence:

$$\widetilde{O}\!\left(\Bigl(\sum_{i=1}^d \frac{1}{\gamma_i}\Bigr)\,\epsilon^{-1}\right),$$

which is strictly faster than standard mirror descent in terms of the block dimension $d$ (Ding et al., 2024).
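As a minimal sanity check of the deterministic guarantees above (the function and step size are illustrative choices, not taken from the cited papers), plain gradient descent on a nonconvex function that is quasar-convex on bounded sets drives the suboptimality gap to zero monotonically:

```python
# Gradient descent on f(x) = x^2/(1+x^2): nonconvex for |x| > 1/sqrt(3), but
# quasar-convex on bounded sets, with unique minimizer x* = 0 and f(x*) = 0.
# f is L-smooth with L = 2, so eta = 0.25 <= 1/L guarantees monotone descent.

def f(x):
    return x**2 / (1.0 + x**2)

def grad_f(x):
    return 2.0 * x / (1.0 + x**2) ** 2

x, eta = 1.5, 0.25
gaps = [f(x)]                      # suboptimality gaps f(x_k) - f(x*)
for _ in range(300):
    x -= eta * grad_f(x)
    gaps.append(f(x))

assert all(b <= a for a, b in zip(gaps, gaps[1:]))   # monotone decrease
print(f"final gap after 300 steps: {gaps[-1]:.3e}")
```

Once the iterate enters a neighborhood of the minimizer, the decrease becomes geometric, in line with the contraction behavior described for the strongly quasar-convex case.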

3. Relation to Other Function Classes

Quasar-convexity strictly generalizes star-convexity (the one-point condition with $\gamma = 1$), which in turn generalizes convexity; it encodes a one-point landscape lower bound rather than a global pairwise affine minorization. The corresponding inclusion chain is

$$\text{strongly convex} \implies \text{strongly star-convex} \implies \text{strongly quasar-convex}.$$

It is strictly stronger than the Polyak–Łojasiewicz (PL) and weak quasi-convexity inequalities, and it is distinct from weak convexity (a curvature bound) and from tilted convexity (Pun et al., 2024; Khanh et al., 28 Oct 2025; Martínez-Rubio, 2 Oct 2025).

Star-quasiconvexity (SSQC), a broader class, unifies convex, star-convex, quasiconvex, and quasar-convex functions. It is characterized geometrically by all sublevel sets being star-shaped with respect to the set of global minimizers, and its strong form admits linear convergence of both gradient and proximal point algorithms (Khanh et al., 28 Oct 2025).

4. Structural and Geometrical Interpretation

Quasar-convex functions enforce that along any ray from a minimizer, $f$ develops no flat regions or spurious critical points, and that the gradient maintains a sufficiently acute angle with the direction toward the minimizers. This rules out pathological local minima while allowing rich nonconvexity away from the minimizer, including regions of negative curvature, oscillation, and complicated level sets, especially in radial-angular composite constructions of the form $h(\|x\|)\,g(x/\|x\|)$ (Hermant et al., 2024; Brito et al., 4 Sep 2025).
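Both properties, monotonicity along rays from the minimizer and genuine nonconvexity away from it, can be checked numerically on a toy radial-angular product; this particular instance is an illustrative assumption of ours, not an example from the cited papers:

```python
import numpy as np

# Toy radial-angular composite F(x) = h(r) * g(theta) with
# h(r) = r^2/(1+r^2) and g(theta) = 2 + 0.5*sin(4*theta).
# Minimizer: the origin, with F = 0 there.

def F(x1, x2):
    r2 = x1**2 + x2**2
    theta = np.arctan2(x2, x1)
    return r2 / (1.0 + r2) * (2.0 + 0.5 * np.sin(4.0 * theta))

# 1) Along every ray from the minimizer, F is nondecreasing:
radii = np.linspace(0.0, 3.0, 200)
for theta in np.linspace(0.0, 2.0 * np.pi, 36):
    vals = F(radii * np.cos(theta), radii * np.sin(theta))
    assert np.all(np.diff(vals) >= -1e-12)

# 2) Yet F is nonconvex: the midpoint of (1,0) and (3,0) violates convexity.
mid = F(2.0, 0.0)                           # 1.6
avg = 0.5 * (F(1.0, 0.0) + F(3.0, 0.0))     # (1.0 + 1.8)/2 = 1.4
print(mid > avg)
```

The radial factor $h$ is concave for $r > 1/\sqrt{3}$, which produces the convexity violation, while the strictly positive angular factor $g$ preserves monotone growth along every ray.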

GQC extends this to product spaces and block-wise landscapes, permitting distinct convexity-like parameters per variable block and accommodating general function oracles.

5. Applications and Model Classes

Quasar-convexity and its variants have been identified in several high-impact model classes:

  • Linear Dynamical System Identification: Population risks for stable LDS are quasar-convex, facilitating global guarantees for system identification via stochastic or zeroth-order methods (Fu et al., 2023, Farzin et al., 4 May 2025).
  • Generalized Linear Models (GLMs): When the link function is monotonic (e.g., leaky-ReLU, logistic), the loss landscape is (strongly) quasar-convex, even in time-varying or stochastic regimes (Pun et al., 2024, Wang et al., 2023).
  • Reinforcement Learning: Policy optimization in discounted MDPs and two-player Markov games admit a GQC or GQCC structure; e.g., the performance-difference lemma in MDPs realizes the GQC lower bound, enabling dimension-free policy gradient convergence rates (Ding et al., 2024).
  • Matrix Factorization/Completion: Problems under restricted isometry property (RIP) display no spurious minima and satisfy strong quasar-convex inequalities (Jin, 2020).
  • Composite/Constrained Learning: Riemannian optimization and problems with convex constraints inherit proximal-quasar-convexity, leading to efficient constrained algorithms (Martínez-Rubio, 2 Oct 2025, Farzin et al., 4 May 2025).
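To make the GLM item above concrete, the following one-dimensional sketch minimizes a realizable squared loss over a monotone leaky-ReLU link with plain gradient descent; the data, slope, and step size are all illustrative assumptions, not values from the cited papers:

```python
import numpy as np

# 1-D GLM with a leaky-ReLU link: loss(w) = mean_i (sigma(x_i w) - y_i)^2 / 2,
# with realizable labels y_i = sigma(x_i w*). Monotonicity of the link is what
# makes this landscape (strongly) quasar-convex despite the kinks.

A = 0.25                                   # slope on the negative side

def sigma(z):
    return np.where(z >= 0, z, A * z)

def dsigma(z):
    return np.where(z >= 0, 1.0, A)

x = np.array([-1.0, -0.5, 0.5, 1.0, 2.0])  # fixed inputs
w_star = 1.0
y = sigma(x * w_star)                      # realizable targets

def loss(w):
    return 0.5 * np.mean((sigma(x * w) - y) ** 2)

def grad(w):
    return np.mean((sigma(x * w) - y) * dsigma(x * w) * x)

w, eta = -2.0, 0.5
for _ in range(500):
    w -= eta * grad(w)

print(w, loss(w))   # w approaches w* = 1 and the loss approaches 0
```

Despite starting on the far side of the kinks, gradient descent converges to the global minimizer, as the quasar-convexity of monotone-link GLM losses predicts.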

6. Extensions: Minimax, Online, and Distributed Optimization

Quasar-convexity generalizes to online and dynamic settings, with regret bounds scaling in the path variation and cumulative noise, supporting settings where minimizers drift over time (Pun et al., 2024). In minimax and game-theoretic optimization (GQCC), block-wise surrogate functions and contraction mappings ensure that decentralized variants of OMD find near-optimal Nash equilibria with explicit iteration complexity controlled by composite block-wise parameters (Ding et al., 2024). These tools have yielded tight last-iterate and average-iterate bounds, removing the dependence on problem dimension in several scenarios.

7. Open Directions and Limitations

Several open problems remain. The dependence of rates on $1/\gamma$ may not be tight, and lower bounds for stochastic, higher-order, and variance-reduced methods are only partially understood (Jin, 2020). Extensions to infinite-dimensional, non-Euclidean (e.g., Riemannian), and higher-order settings are active research areas, as is the identification of further problem classes with GQC or SQC structure (Martínez-Rubio, 2 Oct 2025; Farzin et al., 4 May 2025). Certain convexification, acceleration, and regularization tricks from convex optimization do not transfer naively to the quasar-convex setting because of the one-point anchoring requirement.


Table: Summary of Algorithmic Complexities under Quasar-Convexity

| Setting | Condition | Algorithm/Class | Iteration/Oracle Complexity |
|---|---|---|---|
| Unconstrained, deterministic | $\gamma$-QC, $L$-smooth | (A)GD | $\widetilde{O}(\sqrt{LR^2/(\gamma\epsilon)})$ |
| Unconstrained, strongly QC | $(\gamma,\mu)$-SQC | (A)GD, PPA | $O(\log(1/\epsilon)/\gamma)$ |
| Stochastic | $\gamma$-QC | QASGD | $O(R^2 L/(\gamma\epsilon) + R^2\sigma^2/(\gamma^2\epsilon^2))$ |
| Proximal point | $(\kappa,\gamma)$-SQC | PPA | $O(\log(1/\epsilon))$ |
| Blockwise (GQC) | GQC($\gamma_i$) | OMD | $\widetilde{O}\bigl((\sum_i 1/\gamma_i)\,\epsilon^{-1}\bigr)$ |
| Zeroth-order | (P)QC/SQC, Lipschitz | ZO-GS | $O(n/\epsilon)$ (QC); $O(n\log(1/\epsilon))$ (SQC) |
| Constrained | Proximal QC | Acc. Prox-Point | $\widetilde{O}(1/(\gamma\sqrt{\epsilon}))$ |

All rates are for $\epsilon$-optimality, i.e., $f(x) - f(x^*) \leq \epsilon$. Here $L$ is the smoothness constant, $R$ the diameter, $n$ the ambient (block) dimension, $\sigma^2$ the gradient-noise variance, and $\kappa$ a mixing parameter.


Quasar-convexity and its generalizations unify and extend much of the landscape-friendly structure that enables tractable nonconvex optimization, acting as a bridge between the theory of convex analysis and the real-world solvability of complex learning and control problems. The landscape-structural and algorithmic innovation around these classes is a major driver of progress in scalable first-order nonconvex optimization.
