Projected Gradient Ascent (PGA)

Updated 23 January 2026
  • Projected Gradient Ascent (PGA) is a method that iteratively updates solutions via gradient steps and projects them onto convex sets to maintain feasibility.
  • It is widely applied in optimization, machine learning, and operations research, ensuring that every iterate adheres to the problem's constraints.
  • Algorithmic variants, including boosted PGA and methods like DEPS, enhance performance and guarantee approximation ratios even in complex, nonconvex settings.

Projected Gradient Ascent (PGA) is a foundational iterative optimization method for maximizing differentiable (not necessarily convex) objective functions over closed convex constraint sets. Each iteration performs an ascent step along the gradient direction followed by a projection onto the feasible set, ensuring that iterates remain admissible under constraints. PGA is extensively studied and deployed in optimization, machine learning, operations research, reinforcement learning, meta-learning, and submodular maximization, in both deterministic and stochastic (Monte Carlo) forms.

1. Mathematical Formulation and Basic Properties

Let $f : \mathbb{R}^d \to \mathbb{R}$ be a differentiable function and $C \subseteq \mathbb{R}^d$ a closed convex set. The canonical PGA iteration is

$$x_{k+1} = P_C\left(x_k + \alpha_k \nabla f(x_k)\right),$$

where $P_C(y) = \arg\min_{x \in C} \|x - y\|^2$ denotes the Euclidean projection and $\alpha_k$ is the step size at iteration $k$.

PGA produces a sequence $\{x_k\}$ that is always feasible: $x_k \in C$ for all $k$. For convex $f$ and $C$, the sequence of function values $\{f(x_k)\}$ is nondecreasing, and accumulation points are first-order stationary; that is, they satisfy

$$\langle \nabla f(\bar{x}), y - \bar{x} \rangle \leq 0 \quad \forall y \in C$$

(Felzenszwalb et al., 1 Nov 2025). As the step size tends to infinity, each PGA step approaches a linear maximization over $C$, recovering the Frank–Wolfe (conditional gradient) method.
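A minimal sketch of the iteration, assuming a hypothetical toy quadratic objective and a unit-ball constraint (chosen purely for illustration):

```python
import numpy as np

def pga(grad, project, x0, step=0.1, iters=500):
    """Projected Gradient Ascent: ascend along the gradient, then project onto C."""
    x = project(np.asarray(x0, dtype=float))
    for _ in range(iters):
        x = project(x + step * grad(x))
    return x

# Hypothetical toy problem: maximize f(x) = -||x - c||^2 over the unit ball.
c = np.array([2.0, 0.0])
grad = lambda x: -2.0 * (x - c)

def project_ball(y):
    """Euclidean projection onto {x : ||x|| <= 1}."""
    n = np.linalg.norm(y)
    return y if n <= 1.0 else y / n

x_star = pga(grad, project_ball, x0=[0.0, 0.0])
# Converges to the boundary point of the ball closest to c, i.e. (1, 0).
```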

2. Projection Operators and Step-Size Schedules

The projection operator used in PGA is given by

$$P_C(y) = \arg\min_{x \in C} \|x - y\|^2.$$

For box constraints $C = \{x \mid \ell_i \leq x_i \leq u_i\}$, the projection is componentwise: $[P_C(y)]_i = \min\{u_i, \max\{\ell_i, y_i\}\}$. Step-size rules in PGA include: constant step size $\alpha_k \equiv \alpha > 0$, diminishing step size $\alpha_k \to 0$ with $\sum_k \alpha_k = \infty$, and arbitrarily large step sizes for specific objectives (Felzenszwalb et al., 1 Nov 2025). In stochastic settings or high-dimensional problems, adaptive methods such as Adam may be used in place of vanilla SGD (Bolland et al., 2020).
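The box projection and the step-size schedules above can be sketched together; the objective here is a hypothetical separable concave quadratic, used only for illustration:

```python
import numpy as np

def project_box(y, lo, hi):
    """Componentwise projection onto {x : lo_i <= x_i <= hi_i}: clip each coordinate."""
    return np.clip(y, lo, hi)

def pga_box(grad, lo, hi, x0, steps):
    """PGA over a box with an arbitrary step-size schedule (constant or diminishing)."""
    x = project_box(np.asarray(x0, dtype=float), lo, hi)
    for a in steps:
        x = project_box(x + a * grad(x), lo, hi)
    return x

# Hypothetical objective: f(x) = sum(x) - 0.5 ||x||^2, maximized over [0, 0.6]^3,
# so the constrained optimum clips the unconstrained maximizer x = 1 to 0.6.
grad = lambda x: 1.0 - x
lo, hi = np.zeros(3), 0.6 * np.ones(3)

x_const = pga_box(grad, lo, hi, np.zeros(3), steps=[0.5] * 200)                       # constant
x_dimin = pga_box(grad, lo, hi, np.zeros(3), steps=[1.0 / (k + 1) for k in range(200)])  # diminishing
```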

3. Theory: Stationarity and Convergence

For smooth convex objectives over compact convex sets, monotonicity of the objective is ensured, and all limit points of the PGA sequence are first-order stationary, provided step sizes are bounded away from zero (Felzenszwalb et al., 1 Nov 2025). The normal cone condition is established by passing to the limit in

$$x_k + \alpha_k \nabla f(x_k) - x_{k+1} \in N_C(x_{k+1}),$$

where $N_C(x)$ denotes the normal cone to $C$ at $x \in C$.

For projected subgradient ascent with linear objectives, a single projection suffices, providing $\varepsilon$-approximate solutions as the step size $\eta \to \infty$ (Felzenszwalb et al., 1 Nov 2025). In general, no rate optimality is implied for nonconvex or nonmonotone problems, and stationary points may be arbitrarily suboptimal in certain nonconvex landscapes (Zhang et al., 2024).
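The stationarity guarantee can be checked numerically: run PGA to a fixed point $\bar{x}$ and verify $\langle \nabla f(\bar{x}), y - \bar{x} \rangle \leq 0$ on sampled feasible points. A sketch, assuming a hypothetical concave objective over the unit ball:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical concave objective over the unit ball: f(x) = c.x - ||x||^2.
c = np.array([3.0, 1.0])
grad = lambda x: c - 2.0 * x

def project_ball(y):
    n = np.linalg.norm(y)
    return y if n <= 1.0 else y / n

x = np.zeros(2)
for _ in range(2000):
    x = project_ball(x + 0.05 * grad(x))   # run PGA to (numerical) convergence

# First-order stationarity: <grad f(x), y - x> <= 0 for every feasible y.
ys = rng.normal(size=(1000, 2))
ys /= np.maximum(1.0, np.linalg.norm(ys, axis=1, keepdims=True))  # push samples into the ball
violations = int(np.sum((ys - x) @ grad(x) > 1e-6))
```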

4. Applications in DR-Submodular Maximization

PGA is the standard method for maximizing continuous DR-submodular functions under convex constraints. For twice-differentiable $f$, DR-submodularity (diminishing returns) requires all entries of the Hessian to be nonpositive, and monotonicity requires $f(x) \geq f(y)$ whenever $x \geq y$ coordinatewise.

Guarantees for standard PGA:

  • For monotone $\gamma$-weakly DR-submodular $f$, any stationary point $x$ satisfies $f(x) \geq \frac{\gamma^2}{1+\gamma^2} \cdot \mathrm{OPT}$.
  • For the DR-submodular case ($\gamma = 1$), this is a $1/2$-approximation.
  • For nonmonotone DR-submodular $f$, stationary points can be arbitrarily bad (Zhang et al., 2024).
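The $1/2$-approximation at stationary points can be illustrated on a small hypothetical monotone DR-submodular quadratic (Hessian entrywise nonpositive), comparing the PGA fixed point against a brute-force reference optimum:

```python
import numpy as np

# Hypothetical monotone DR-submodular quadratic on [0,1]^2:
#   f(x) = b.x + 0.5 x^T H x, H entrywise nonpositive (diminishing returns),
#   with b >= -H @ 1 so that grad f = b + H x stays >= 0 on the box (monotone).
H = np.array([[-1.0, -0.5], [-0.5, -1.0]])
b = np.array([2.0, 2.0])
f = lambda x: b @ x + 0.5 * x @ H @ x
grad = lambda x: b + H @ x

x = np.zeros(2)
for _ in range(1000):
    x = np.clip(x + 0.1 * grad(x), 0.0, 1.0)  # PGA step + box projection

# Brute-force reference optimum on a fine grid (feasible because d = 2).
g = np.linspace(0.0, 1.0, 201)
opt = max(f(np.array([u, v])) for u in g for v in g)
# The stationary point found by PGA satisfies f(x) >= opt / 2.
```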

Boosting techniques modify the ascent direction to operate on a non-oblivious auxiliary function $F$, enabling improved approximation guarantees:

  • For monotone $f$, PGA on the auxiliary $F$ obtains $f(x) \geq (1 - e^{-\gamma})\,\mathrm{OPT}$.
  • For non-monotone $f$ over general convex sets, a tight $\frac{1 - \|\bar{u}\|_\infty}{4}$-approximation holds, where $\bar{u} = \arg\min_{z \in \mathcal{C}} \|z\|_\infty$ (Zhang et al., 2024).

Boosted PGA achieves superior approximation ratios, improved empirical convergence on coverage, facility location, and quadratic programming problems, and extends to online and bandit feedback regimes.

5. Algorithmic Instantiations: DEPS and Meta-Backward

Bolland et al. introduce DEPS, a deep RL algorithm that employs projected stochastic gradient ascent to jointly optimize policy parameters $\theta$ and environment parameters $\phi$:

$$\begin{aligned} y^{(\theta)}_{k+1} &= \theta_k + \alpha_k \widehat{\nabla_\theta J}(\theta_k, \phi_k), \\ y^{(\phi)}_{k+1} &= \phi_k + \beta_k \widehat{\nabla_\phi J}(\theta_k, \phi_k), \\ \theta_{k+1} &= \mathrm{Proj}_\Theta\bigl(y^{(\theta)}_{k+1}\bigr), \\ \phi_{k+1} &= \mathrm{Proj}_\Phi\bigl(y^{(\phi)}_{k+1}\bigr). \end{aligned}$$

The projections enforce feasibility, and Monte Carlo sampling with baselined policy gradients supplies the unbiased gradient estimates. This approach enables joint optimization in co-design problems, outperforming alternative baselines in model-based RL settings (Bolland et al., 2020).
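A sketch of this alternating update, with exact gradients of a hypothetical scalar objective standing in for the paper's Monte Carlo policy-gradient estimates (the objective, box constraints, and step sizes below are all illustrative assumptions):

```python
import numpy as np

# Hypothetical stand-in for the co-design return:
#   J(theta, phi) = -(theta - phi)^2 - (phi - 1)^2
def grad_theta(theta, phi):
    return -2.0 * (theta - phi)

def grad_phi(theta, phi):
    return 2.0 * (theta - phi) - 2.0 * (phi - 1.0)

theta, phi = -1.5, 0.0
alpha = beta = 0.1
for _ in range(500):
    y_theta = theta + alpha * grad_theta(theta, phi)  # ascent on policy params
    y_phi = phi + beta * grad_phi(theta, phi)         # ascent on environment params
    theta = float(np.clip(y_theta, -2.0, 2.0))        # Proj_Theta (box)
    phi = float(np.clip(y_phi, 0.0, 0.5))             # Proj_Phi (box)
# phi is driven to its upper bound 0.5, and theta follows it.
```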

Meta-Backward in Federated Meta-Learning

In federated meta-learning, the Meta-Backward algorithm utilizes projected stochastic gradient ascent (P-SGA) to synthesize a global meta-model from locally optimized task-specific solutions. At each backward round, each agent performs a projected gradient step on a quadraticized surrogate loss, projecting onto a consensus ball of radius $\sqrt{\delta_k}$ around an aggregate solution $\Phi^{k+1}$:

$$\phi_i^k = \begin{cases} \phi_i^{k,0}, & \text{if } \|\phi_i^{k,0} - \Phi^{k+1}\|^2 \leq \delta_k, \\ \Phi^{k+1} + \sqrt{\delta_k}\, \dfrac{\phi_i^{k,0} - \Phi^{k+1}}{\|\phi_i^{k,0} - \Phi^{k+1}\|}, & \text{otherwise}. \end{cases}$$

This method eliminates the need for the Hessian computations, matrix inversions, and double loops found in MAML/iMAML, and is empirically more energy-efficient (Elgabli et al., 2021).

6. Complexity, Limitations, and Extensions

Each PGA iteration is dominated by evaluating the gradient and performing a projection, whose complexity depends on the convex set geometry. For balls and boxes, projection is efficiently computable. Total iteration complexity depends on the required accuracy and smoothness constants.
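Projections can remain cheap even beyond balls and boxes; for instance, projection onto the probability simplex costs $O(d \log d)$ via the standard sort-based method. A sketch (not tied to any particular paper cited above):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex {x >= 0, sum(x) = 1}
    using the common O(d log d) sort-and-threshold method."""
    u = np.sort(v)[::-1]                 # sort coordinates in decreasing order
    css = np.cumsum(u) - 1.0             # shifted cumulative sums
    # Largest index where the running threshold is still active.
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    tau = css[rho] / (rho + 1.0)         # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)
```

Box and ball projections, by contrast, are $O(d)$, which is why they dominate large-scale PGA applications.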

Limitations:

  • Stationary points of PGA in nonconvex and nonmonotone DR-submodular landscapes may be arbitrarily poor (Zhang et al., 2024).
  • Knowledge of problem-specific parameters (e.g., $\gamma$ in DR-submodular maximization) is required for optimal boosting methods.
  • Projection may be computationally intensive for complex sets.

Algorithmic extensions include non-oblivious boosting for submodular objectives, variance-reduction, and adaptation to delayed, online, and bandit learning scenarios (Zhang et al., 2024).

7. Empirical Performance and Benchmarks

Empirically, PGA-based methods are shown to:

  • Converge rapidly to high-return solutions in dynamical system co-design (mass–spring–damper, microgrid, drone) (Bolland et al., 2020).
  • Achieve superior or optimal approximation ratios and regret in continuous submodular maximization compared to classical Frank–Wolfe and meta-Frank–Wolfe variants, both in offline and online learning regimes (Zhang et al., 2024).
  • Outperform bi-level meta-learning baselines in federated environments in terms of compute and energy efficiency, with a 5–10× reduction in resource usage on image and regression benchmarks (Elgabli et al., 2021).
  • Attain $\varepsilon$-approximate solutions with single-projection steps for certain large-scale SDP problems (Felzenszwalb et al., 1 Nov 2025).

These findings underscore PGA's central role as a principled and general-purpose approach for constrained maximization in modern large-scale optimization.
