
Successive Convex Optimization (SCO)

Updated 3 February 2026
  • Successive Convex Optimization (SCO) is a framework that iteratively replaces nonconvex objectives with convex surrogates that are tangent at the current iterate, yielding tractable subproblems and provable convergence to stationary points.
  • SCO employs iterative convex proxy minimization that preserves key first-order information, with variants addressing stochastic settings, distributed architectures, and block-coordinate updates.
  • Applications span machine learning, optimal control, communications, and signal processing, where empirical results show faster convergence and improved solution quality compared to traditional methods.

Successive Convex Optimization (SCO), also commonly known as Successive Convex Approximation (SCA), is a unifying algorithmic framework for tackling nonconvex optimization problems by iteratively constructing and minimizing convex surrogate subproblems. These procedures are designed to preserve essential first-order information of the original problem, enabling provable convergence guarantees under general smoothness or boundedness conditions. SCO methods have broad application in large-scale nonconvex optimization, nonconvex optimal control, machine learning, communications, signal processing, and distributed optimization.

1. Principle and General Framework

The central principle of SCO is to iteratively replace the nonconvex objective (and possibly constraints) with tractable, convex surrogates that are tangent to the original function at the current iterate. At iteration $t$, given the current point $x^t$, a convex function $\tilde f(x; x^t)$ is constructed such that

  • $\tilde f(\cdot\,; x^t)$ is convex in $x$,
  • $\tilde f(x^t; x^t) = f(x^t)$,
  • $\nabla_x \tilde f(x^t; x^t) = \nabla f(x^t)$, and typically (for global upper-bound surrogates) $\tilde f(x; x^t) \geq f(x)$ for all $x$.

The basic update is

$$x^{t+1} = \arg\min_{x \in \mathcal X} \left\{ \tilde f(x; x^t) + \frac{1}{2\alpha_t} \| x - x^t \|^2 \right\},$$

where $\mathcal X$ is a convex feasible set and $\alpha_t > 0$ is a proximal regularization parameter or step size. This template encompasses various schemes, including majorization–minimization, trust-region methods, and proximal algorithms (Scutari et al., 2018).
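For intuition, a minimal sketch of the template: with the linear-plus-proximal surrogate $\tilde f(x; x^t) = f(x^t) + \nabla f(x^t)^T (x - x^t) + \frac{1}{2\alpha}\|x - x^t\|^2$ over a box constraint, each subproblem has the closed form of a projected gradient step. The test function, box, and step size below are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

# Illustrative nonconvex test objective (an assumption for this sketch).
def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def plain_sca(x0, alpha=0.05, lo=-2.0, hi=2.0, iters=500, tol=1e-10):
    """Plain SCA: minimize the linear-plus-proximal surrogate over X = [lo, hi].

    The surrogate minimizer has the closed form of a projected gradient step,
    clip(x - alpha * grad_f(x), lo, hi).
    """
    x = x0
    for _ in range(iters):
        x_new = float(np.clip(x - alpha * grad_f(x), lo, hi))
        if abs(x_new - x) < tol:   # stop once the iterate gap is negligible
            break
        x = x_new
    return x

x_star = plain_sca(1.0)   # converges to a nearby first-order stationary point
```

Starting from $x^0 = 1$, the iterates converge to the nearby local minimizer of $f$, where the gradient vanishes.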

2. Algorithmic Realizations and Variants

Several algorithmic variants have been developed to address specific nonconvexities, constraints, distributed architectures, and stochastic scenarios:

  • Plain SCA for Smooth Nonconvex Problems: Each iteration solves a convex subproblem matching the function value and gradient at the expansion point, leading to convergence to first-order stationary points at $O(1/\epsilon^2)$ rates under $L$-smoothness and a suitable step size (Bedi et al., 2019, Scutari et al., 2018).
  • Perturbed SCA (P-SCA) for Saddle-Point Escape: Standard SCA can stall at strict saddle points. Introducing random perturbations when the gradient norm is small enables convergence to $\epsilon$-second-order stationary points (i.e., approximate local minima), with overall complexity $O(1/\epsilon^2 \cdot \mathrm{polylog}\, d)$ (Bedi et al., 2019).
  • Block-Coordinate and Parallel SCA: For problems with block structure, multiple variable blocks (coordinates) can be updated in parallel by minimizing local convex surrogates, improving scalability. Theoretical guarantees cover both cyclic and randomized block updates, with $O(1/\epsilon)$ complexity for nonconvex objectives under mild assumptions (Razaviyayn et al., 2014).
  • Distributed SCA: In multi-agent networks, each agent constructs a local surrogate and coordinates updates via consensus or gradient tracking, achieving global convergence to stationary solutions (Lorenzo et al., 2020, Scutari et al., 2018).
  • Stochastic SCA: When objectives or constraints involve expectations over random variables, surrogates are constructed using sampled gradients and recursive averaging (see the CSSCA/SSCA frameworks). These schemes guarantee almost sure convergence to KKT points under diminishing step sizes and suitable surrogate consistency (Liu et al., 2018, Ye et al., 2019, Idrees et al., 2024).
  • Convexification for Optimal Control (SCP/SCvx): SCO underpins direct methods in optimal control, especially trajectory optimization under nonconvex dynamics and constraints. Complex constraints are handled via exact penalization and linearization, with trust regions and virtual controls ensuring feasibility and strong convergence properties (Mao et al., 2018, Mao et al., 2017, Song et al., 2019, Elango et al., 2024).
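The stochastic variant above can be sketched in a few lines for a simple scalar problem, $\min_x \mathbb{E}[(x - \xi)^2]$ over a box: the surrogate gradient is a recursive average of sampled gradients with diminishing weights $\rho_t$, paired with diminishing step sizes $\alpha_t$. The noise model and step-size exponents are illustrative assumptions, not values from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_sca(x0, iters=3000):
    """Stochastic SCA sketch for min_x E[(x - xi)^2], xi ~ N(1, 0.5), X = [-5, 5].

    The surrogate gradient g is a recursive average of sampled gradients;
    each subproblem is a proximal-linear step projected onto the box.
    """
    x, g = x0, 0.0
    for t in range(1, iters + 1):
        xi = rng.normal(1.0, 0.5)                # draw one sample
        rho = 1.0 / t**0.6                       # diminishing averaging weight
        alpha = 1.0 / t**0.8                     # diminishing step size
        g = (1 - rho) * g + rho * 2 * (x - xi)   # recursively averaged gradient
        x = float(np.clip(x - alpha * g, -5.0, 5.0))  # surrogate minimizer
    return x

x_avg = stochastic_sca(4.0)   # approaches the true minimizer x = 1
```

The averaging weight decays more slowly than the step size, so the surrogate gradient tracks the true gradient while the iterate settles.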

3. Guarantees: Convergence and Complexity

Theoretical analysis of SCO methods provides rigorous guarantees under standard smoothness and boundedness assumptions:

| Variant | Stationarity Guarantee | Iteration Complexity | Key References |
|---|---|---|---|
| Plain SCA | 1st-order ($\|\nabla f(x)\| \leq \epsilon$) | $O(1/\epsilon^2)$ | Bedi et al., 2019; Scutari et al., 2018 |
| P-SCA (with perturbation) | 2nd-order (local minimum) | $O(1/\epsilon^2 \cdot \mathrm{polylog}\, d)$ | Bedi et al., 2019 |
| Block-coordinate/parallel | 1st-order (nonconvex, deterministic/randomized) | $O(1/\epsilon)$ | Razaviyayn et al., 2014 |
| Stochastic SCA | KKT (almost sure, general stochastic) | See below | Liu et al., 2018; Ye et al., 2019; Idrees et al., 2024 |
| SCvx for optimal control | KKT for original/penalized problem | Superlinear (under KL) | Mao et al., 2018; Mao et al., 2017 |

In stochastic settings, complexity is typically measured by the number of stochastic first-order oracle (SFO) calls needed to achieve $\epsilon$-stationarity in expectation. Recent methods (e.g., CoSTA) achieve near-optimal $O(\epsilon^{-3/2})$ SFO complexity, matching the lower bounds for unconstrained stochastic nonconvex optimization (Idrees et al., 2024).

4. Surrogate Design and Practical Considerations

SCO success hinges on the careful construction of convex surrogates, tailored to the problem structure:

  • Quadratic Majorant: $\tilde f(x; x^t) = f(x^t) + \nabla f(x^t)^T (x - x^t) + \frac{C}{2}\|x - x^t\|^2$, where choosing $C \geq L$ (the gradient Lipschitz constant) guarantees the global upper-bound property.
  • Structured/Partial Linearization: Linearize only nonconvex terms, retain exact convex terms, and add a proximal regularization (Scutari et al., 2018, Liu et al., 2018).
  • Adaptive Parameter Selection: Step-sizes or proximal regularization are chosen based on the surrogate curvature and Lipschitz constants. Practical implementation often replaces unknown constants with empirical or order-of-magnitude estimates (Bedi et al., 2019).
  • Penalty and Trust Region Methods: For feasibility and rapid convergence in nonconvex optimal control, penalization of dynamic residuals, virtual controls, and shrinking trust regions are effective for preventing artificial infeasibility (Mao et al., 2018, Mao et al., 2017, Elango et al., 2024).

Efficient convex solvers (e.g., QP and SOCP solvers, interior-point methods, dual ascent) keep the per-iteration computational cost manageable even for large-scale instances (Mao et al., 2018, Scutari et al., 2018).
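As a concrete instance of structured/partial linearization, consider $\lambda\|x\|_1 + h(x)$ with $h$ smooth but nonconvex: keeping the convex $\ell_1$ term exact and linearizing only $h$ yields a subproblem solved in closed form by soft-thresholding. The particular $h$, $\lambda$, and step size below are illustrative assumptions for this sketch.

```python
import numpy as np

LAM = 0.1  # l1 weight (illustrative)

def grad_h(x):
    # h(x) = sum(x_i^2 / (1 + x_i^2)) is smooth but nonconvex
    return 2 * x / (1 + x**2) ** 2

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def partial_lin_sca(x0, alpha=0.5, iters=300):
    """Minimize LAM*||x||_1 + h(x) by linearizing only the nonconvex term.

    Each subproblem argmin_x LAM*||x||_1 + grad_h(xt)^T x + ||x - xt||^2/(2*alpha)
    has the closed-form soft-thresholding solution below.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = soft_threshold(x - alpha * grad_h(x), alpha * LAM)
    return x

x_sparse = partial_lin_sca([1.0, -2.0])   # both terms are minimized at zero
```

Retaining the $\ell_1$ term exactly (rather than smoothing it) preserves the sparsity-inducing structure of the subproblem at no extra cost.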

5. Selected Applications and Empirical Results

SCO has achieved broad impact in machine learning, communications and networking, signal processing, and optimal control, including aerospace trajectory optimization.

Empirical studies consistently show that SCO-based methods converge faster than plain gradient descent, reach high-quality local minima, and, when applicable, escape saddle points efficiently (Bedi et al., 2019, Scutari et al., 2018). In convexified trajectory optimization, SCvx/SCO methods achieve real-time performance, with global optimality for the convexified subproblems and rapid convergence (often a few tens of iterations) (Mao et al., 2017, Mao et al., 2018, Elango et al., 2024).

6. Open Directions and Limitations

Active research topics include:

  • Global Certificate Extensions: While SCO converges to stationary points, global optimality in nonconvex settings remains out of reach except under special convexification or exact-penalty settings.
  • Refined Complexity Bounds: Closing the gap between theory and practice regarding tuning surrogate curvature and step-sizes; joint optimization for worst-case iteration complexity and per-iteration wall-clock time (Idrees et al., 2024).
  • Saddle-Point Escaping Beyond Smooth Unconstrained Case: Extending second-order guarantees to constrained, stochastic, and distributed SCA variants is an open problem (Bedi et al., 2019).
  • Scalability in Large-Scale Networks: Communication overhead can outweigh computational speed-up in distributed/parallel SCA with many agents; efficient orchestration is required (Razaviyayn et al., 2014, Scutari et al., 2018, Lorenzo et al., 2020).
  • Adaptive Surrogate Construction: Developing surrogates that exploit problem structure, such as structured sparsity or the decoupling of multiplicative terms via inequalities (e.g., AM/GM bounds; Qian et al., 2024), for improved empirical performance.

7. Summary Table: Core SCO Algorithmic Structure

| Step | Description |
|---|---|
| 1. Surrogate construction | Build convex $\tilde f(x; x^t)$ tangent to $f$ at $x^t$ |
| 2. (Optional) Constraint convexification | Build convex surrogates for the constraints |
| 3. Solve convex subproblem | $\hat x^{t+1} = \arg\min_{x} \tilde f(x; x^t) + \text{(regularization)}$ s.t. surrogate constraints |
| 4. (Optional) Relaxed update/step size | $x^{t+1} = x^t + \eta_t (\hat x^{t+1} - x^t)$ |
| 5. Termination test | Stop if stationarity/feasibility is achieved; else $t \leftarrow t+1$ |

These steps, adjustable for distributed, stochastic, or block-coordinate environments, constitute the core of all modern SCO/SCA methods (Scutari et al., 2018, Lorenzo et al., 2020, Liu et al., 2018, Razaviyayn et al., 2014).
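These steps can be sketched as a generic driver that accepts the surrogate minimizer as a callback; the example objective, step size, and relaxation factor are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def sco_loop(x0, surrogate_min, eta=0.9, tol=1e-8, max_iter=500):
    """Generic SCO/SCA loop: solve the surrogate subproblem, relax, test."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_hat = surrogate_min(x)             # steps 1-3: build and solve subproblem
        x_new = x + eta * (x_hat - x)        # step 4: relaxed update
        if np.linalg.norm(x_new - x) < tol:  # step 5: stationarity test
            return x_new
        x = x_new
    return x

# Illustrative objective f(x) = (||x||^2 - 1)^2 with a proximal-linear surrogate
# whose minimizer reduces to a gradient step (an assumption for this sketch).
def grad_obj(x):
    return 4 * (x @ x - 1.0) * x

x_star = sco_loop([2.0, 0.0], lambda x: x - 0.05 * grad_obj(x))
```

Swapping in a different `surrogate_min` callback (e.g., a call to a QP or SOCP solver) adapts the same loop to the other variants described above.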


References:

  • (Bedi et al., 2019): Escaping Saddle Points with the Successive Convex Approximation Algorithm
  • (Scutari et al., 2018): Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization
  • (Razaviyayn et al., 2014): Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization
  • (Liu et al., 2018): Stochastic Successive Convex Approximation for Non-Convex Constrained Stochastic Optimization
  • (Lorenzo et al., 2020): Distributed Stochastic Nonconvex Optimization and Learning based on Successive Convex Approximation
  • (Mao et al., 2018): Successive Convexification: A Superlinearly Convergent Algorithm for Non-convex Optimal Control Problems
  • (Mao et al., 2017): Successive Convexification of Non-Convex Optimal Control Problems with State Constraints
  • (Song et al., 2019): Solar-Sail Deep Space Trajectory Optimization Using Successive Convex Programming
  • (Idrees et al., 2024): Constrained Stochastic Recursive Momentum Successive Convex Approximation
  • (Qian et al., 2024): Applications of Inequalities to Optimization in Communication Networks: Novel Decoupling Techniques and Bounds for Multiplicative Terms Through Successive Convex Approximation
  • (Elango et al., 2024): Successive Convexification for Trajectory Optimization with Continuous-Time Constraint Satisfaction
