
Conflict-Aware Projected Gradient Ascent

Updated 6 February 2026
  • Conflict-Aware Projected Gradient Ascent is an optimization framework that resolves gradient conflicts by projecting gradients to balance competing objectives.
  • It employs projection techniques—such as orthogonal projection and Gram–Schmidt procedures—to mitigate negative interference in multi-task and multi-objective settings.
  • CAPGA has demonstrated improved convergence, fairness, and efficiency in applications including federated unlearning, LLM alignment, and multi-agent reinforcement learning.

Conflict-Aware Projected Gradient Ascent (CAPGA) refers to a class of optimization techniques that resolve conflicts between multiple objectives during gradient-based optimization by explicitly adjusting, projecting, or reweighting the gradient vectors associated with those objectives. The methodology is particularly significant in multitask learning, federated unlearning, multi-objective optimization, and reinforcement learning, where simultaneous improvement of several possibly competing objectives is inherently challenging. CAPGA schemes offer principled conflict resolution at the level of gradients, aiming to guarantee favourable tradeoffs, fairness, or constraint satisfaction while maintaining computational efficiency.

1. Multitask and Multi-Objective Optimization Problem Formulation

In canonical multitask learning (MTL), one optimizes a shared model with parameter vector $\theta \in \mathbb{R}^d$ over $K$ tasks, each associated with a loss function $L_i(\theta)$. The overall goal is to minimize a weighted sum or average:

$$L(\theta) = \frac{1}{K} \sum_{i=1}^{K} L_i(\theta).$$

Each task provides a gradient $g_i = \nabla_\theta L_i(\theta)$. If the gradients are aligned, naive gradient descent on $L(\theta)$ is efficient. However, when the gradients are not aligned (i.e., gradient conflicts), naive aggregation can degrade performance for some or all tasks, stall convergence, or induce negative transfer. This is formally observed when

$$g_i^\top g_j < 0,$$

indicating that improvement in one task may worsen another (Bohn et al., 2024).

Similarly, in multi-objective optimization, one seeks Pareto-critical points where no descent direction improves all objectives. Gradient conflicts are again central, with a solution defined by the condition that no direction $d$ exists such that $\nabla f_i(\theta)^\top d < 0$ for all $i$ (Chen et al., 2 Feb 2026).
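
To ground the formulation, the following PyTorch sketch computes per-task gradients $g_i$ for a shared parameter vector and checks the pairwise conflict condition. The toy quadratic losses and all names are illustrative assumptions, not taken from any of the cited papers.

```python
import torch

# Shared parameters theta (toy dimension; illustrative only).
theta = torch.randn(10, requires_grad=True)

def task_losses(theta):
    # Two deliberately competing quadratic objectives, chosen so that
    # their gradients are anti-correlated and a conflict is observed.
    return [((theta - 1.0) ** 2).sum(), ((theta + 1.0) ** 2).sum()]

# Per-task gradients g_i = grad_theta L_i(theta).
grads = [torch.autograd.grad(L_i, theta, retain_graph=True)[0]
         for L_i in task_losses(theta)]

# Pairwise conflict check: tasks i and j conflict iff g_i . g_j < 0.
for i in range(len(grads)):
    for j in range(i + 1, len(grads)):
        dot = torch.dot(grads[i], grads[j]).item()
        print(f"tasks ({i}, {j}): g_i.g_j = {dot:.3f}, conflict = {dot < 0}")
```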

2. Detection and Definition of Gradient Conflict

Conflict-aware optimization strategies hinge on reliable detection of gradient conflict. The general criterion is an anti-correlation of gradients for different tasks or objectives:

  • Pairwise conflict: Tasks $i$ and $j$ are in conflict if $g_i^\top g_j < 0$ (Bohn et al., 2024, Zhu et al., 5 Mar 2025, Yang et al., 14 Jan 2026).
  • Constraint violation: In constrained or utility-preserving settings, e.g., federated unlearning, the ascent direction $g_{\mathrm{tar}}$ (for unlearning/forgetting) and reference direction $g_{\mathrm{ref}}$ (for utility) are in conflict if $\langle g_{\mathrm{tar}}, g_{\mathrm{ref}}\rangle > 0$, i.e., forgetting increases (rather than preserves) utility loss (Li et al., 30 Jan 2026).
  • Conic or angular constraints: ConicGrad restricts the allowable update direction $d$ to an angular cone around a reference gradient $g_0 = \sum_{i=1}^T g_i$, enforcing $\langle g_0, d\rangle \geq c \|g_0\| \|d\|$ for some $c = \cos\theta$ (Hassanpour et al., 31 Jan 2025).

Module- or parameter-block-specific conflicts are also detectable (e.g., in Modular Gradient Conflict Mitigation (MGCM), conflict is checked locally within architectural modules) (Liu et al., 2024).
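
As a minimal illustration, the three criteria above reduce to simple predicates on gradient vectors. The NumPy helpers below are a sketch under the sign conventions stated in this section; the function names are my own, not from the cited works.

```python
import numpy as np

def pairwise_conflict(g_i, g_j):
    # Tasks i and j conflict iff their gradients are anti-correlated.
    return float(g_i @ g_j) < 0.0

def unlearning_conflict(g_tar, g_ref):
    # Ascending along g_tar (forgetting) conflicts with utility
    # preservation iff <g_tar, g_ref> > 0.
    return float(g_tar @ g_ref) > 0.0

def inside_cone(d, g0, cos_theta):
    # ConicGrad-style constraint: d stays within an angular cone around
    # g0, i.e. <g0, d> >= cos(theta) * ||g0|| * ||d||.
    return float(g0 @ d) >= cos_theta * np.linalg.norm(g0) * np.linalg.norm(d)
```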

3. Projection-Based Conflict Resolution Schemes

The core of CAPGA is the use of projections to adjust conflicting gradients. The canonical update for resolving a pairwise task conflict orthogonally projects one gradient onto the space perpendicular to the other, as in PCGrad and its descendants:

$$g_i^{\mathrm{proj}} = g_i - \frac{g_i^\top g_j}{\|g_j\|^2}\, g_j.$$

This update enforces $g_i^{\mathrm{proj}} \perp g_j$, eliminating immediate negative interference (Bohn et al., 2024, Yang et al., 14 Jan 2026). Modular schemes apply this at the module level, performing gradient surgery locally only where conflict actually occurs (Liu et al., 2024).
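
A compact NumPy sketch of this pairwise surgery, in the spirit of PCGrad; the random ordering of opposing tasks and the epsilon guard are common implementation choices, not prescribed by the text above.

```python
import numpy as np

def project_away(g_i, g_j, eps=1e-12):
    # If g_i conflicts with g_j (g_i . g_j < 0), remove the component of
    # g_i along g_j:  g_i <- g_i - (g_i . g_j / ||g_j||^2) g_j.
    dot = float(g_i @ g_j)
    if dot < 0.0:
        g_i = g_i - (dot / (float(g_j @ g_j) + eps)) * g_j
    return g_i

def pcgrad_combine(grads, seed=0):
    # PCGrad-style aggregation: each task gradient is projected against
    # the others in random order, then the adjusted gradients are summed.
    rng = np.random.default_rng(seed)
    adjusted = []
    for i, g in enumerate(grads):
        g = g.copy()
        order = [j for j in range(len(grads)) if j != i]
        rng.shuffle(order)
        for j in order:
            g = project_away(g, grads[j])
        adjusted.append(g)
    return np.sum(adjusted, axis=0)
```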

Full deconfliction via Gram–Schmidt-like procedures projects each $g_i$ onto the complement of the subspace spanned by all other $g_j$, guaranteeing “strong non-conflict” for all $i, j$ (Zhu et al., 5 Mar 2025).
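
One plausible instantiation of such full deconfliction, projecting each $g_i$ onto the orthogonal complement of the span of the other gradients via a QR factorization; the exact GradOPS procedure may differ in ordering and normalization.

```python
import numpy as np

def deconflict_all(grads):
    # For each task i, project g_i onto the orthogonal complement of
    # span{g_j : j != i}, so the result has zero inner product with every
    # other task gradient ("strong non-conflict"). If the other gradients
    # happen to span g_i, the projected gradient can vanish.
    G = np.stack(grads)                      # shape (T, d)
    out = []
    for i in range(len(G)):
        others = np.delete(G, i, axis=0).T   # shape (d, T-1)
        q, _ = np.linalg.qr(others)          # orthonormal basis of the span
        out.append(G[i] - q @ (q.T @ G[i]))
    return out
```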

ConicGrad seeks a direction $d^*$ maximizing the minimum improvement over all tasks subject to a cone constraint; the projection is given in closed form via the Sherman–Morrison identity (Hassanpour et al., 31 Jan 2025).
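
The Sherman–Morrison identity itself is standard linear algebra; the generic helper below shows the rank-one solve that makes such closed forms cheap. How ConicGrad assembles its particular system is specific to the cited paper and is not reproduced here.

```python
import numpy as np

def sherman_morrison_solve(A_inv, u, v, b):
    # Solve (A + u v^T) x = b given a precomputed A^{-1}, via
    # (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u),
    # turning a fresh d x d solve into a few matrix-vector products.
    Au = A_inv @ u
    vTA = v @ A_inv
    return A_inv @ b - Au * (vTA @ b) / (1.0 + float(v @ Au))
```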

Federated unlearning employs a constrained quadratic program:

$$\min_d \|d - g_{\mathrm{tar}}\|_2^2 \quad \text{subject to } \langle g_{\mathrm{ref}}, d\rangle \leq 0,$$

with closed-form solution

$$d = g_{\mathrm{tar}} - \frac{\max\left(0, \langle g_{\mathrm{tar}}, g_{\mathrm{ref}}\rangle\right)}{\|g_{\mathrm{ref}}\|^2 + \varepsilon}\, g_{\mathrm{ref}}$$

(Li et al., 30 Jan 2026).
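
This closed form transcribes directly into code; the sketch below follows the equation above, with $\varepsilon$ as a small numerical guard (variable names are mine).

```python
import numpy as np

def conflict_aware_ascent(g_tar, g_ref, eps=1e-12):
    # Closed-form solution of  min_d ||d - g_tar||^2  s.t.  <g_ref, d> <= 0:
    # subtract the g_ref component of g_tar only when the two directions
    # conflict, i.e., when <g_tar, g_ref> > 0 (forgetting would hurt utility).
    coef = max(0.0, float(g_tar @ g_ref)) / (float(g_ref @ g_ref) + eps)
    return g_tar - coef * g_ref
```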

In multi-objective/CAGrad-type schemes, the projection consists of directly solving a small quadratic program for a common direction along which no objective increases, with guaranteed Pareto criticality. In the CAGrad-Clip variant, the resulting mixture weights are clipped to respect user-provided preference bounds (Chen et al., 2 Feb 2026).
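
The full CAGrad/CAGrad-Clip QP spans $T$ objectives; as a hedged illustration of the underlying idea, here is the classical two-objective min-norm special case (an MGDA-style common-descent construction, plainly not the full CAGrad or CAGrad-Clip algorithm).

```python
import numpy as np

def min_norm_combination(g1, g2, eps=1e-12):
    # Smallest-norm point  w*g1 + (1-w)*g2,  w in [0, 1],  of the convex
    # hull of two gradients; stepping along its negation decreases both
    # objectives to first order whenever the returned vector is nonzero.
    diff = g1 - g2
    w = float(-(diff @ g2)) / (float(diff @ diff) + eps)
    w = min(1.0, max(0.0, w))
    return w * g1 + (1.0 - w) * g2
```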

4. Probabilistic and Weighted Projection Schemes

The basic projection mechanism can be augmented to encode task priorities, dynamic loss weighting, or user-defined tradeoff preferences.

  • Probabilistic task prioritization: In wPCGrad, a probability vector $\vec{p}$ is dynamically updated (e.g., via Dynamic Task Prioritization), and at each step a task is sampled as an “anchor” based on its most recent loss. Only gradients that conflict with the anchor are projected, and tasks with larger losses are more likely to be protected from adjustment. Explicitly,

$$p_i = \frac{(L_i^{(t-1)})^\gamma}{\sum_{j=1}^K (L_j^{(t-1)})^\gamma}$$

for focusing exponent $\gamma$ (Bohn et al., 2024); this sampling rule is sketched in code after this list.

  • Trade-off traversal: Weighted GradOPS uses a trade-off exponent $\alpha$ to control the final aggregation weights of the deconflicted gradients and efficiently explores the Pareto frontier (Zhu et al., 5 Mar 2025).
  • Preference-respecting clipping: In multi-objective LLM alignment under CAGrad-Clip (as in RACO), mixture weights are clipped to remain within user-specified quotas, preventing low-priority objectives from dominating the correction (Chen et al., 2 Feb 2026); see the sketch after this list.
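
A schematic sketch of both mechanisms: the power-law weighting follows the $p_i$ formula above, while the quota clipping is a schematic reading of the CAGrad-Clip description (its exact rule may differ, and renormalizing after clipping can leave quotas only approximately satisfied).

```python
import numpy as np

def anchor_probabilities(prev_losses, gamma=2.0):
    # wPCGrad-style prioritization: p_i proportional to (L_i^{(t-1)})^gamma,
    # so tasks with larger recent losses are more likely to become the
    # anchor that other gradients are projected against.
    powered = np.asarray(prev_losses, dtype=float) ** gamma
    return powered / powered.sum()

def sample_anchor(prev_losses, gamma=2.0, seed=0):
    p = anchor_probabilities(prev_losses, gamma)
    return np.random.default_rng(seed).choice(len(p), p=p)

def clip_mixture_weights(w, lo, hi):
    # Quota clipping: keep each objective's mixture weight inside the
    # user-specified band [lo_i, hi_i], then renormalize to sum to one.
    w = np.clip(np.asarray(w, dtype=float), lo, hi)
    return w / w.sum()
```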

5. Theoretical Guarantees and Convergence Properties

CAPGA strategies typically rely on first-order smoothness and Lipschitz continuity. Core theoretical results include:

  • Monotonic improvement: Under smoothness/Lipschitz conditions and sufficiently small step sizes, projected updates guarantee non-decreasing improvement in all objectives or (in the case of fairness) convergence of the gap between objectives to zero (Kim et al., 25 Aug 2025, Bohn et al., 2024).
  • Pareto-stationarity: Deconflicted updates converge to Pareto-stationary points under appropriate decrease lemmas (Zhu et al., 5 Mar 2025, Chen et al., 2 Feb 2026).
  • Task utility protection: In federated unlearning, projection-based ascent guarantees no first-order increase in utility loss on the reference set (Li et al., 30 Jan 2026); a one-line derivation follows this list.
  • Computational complexity: Principally first-order, with small overhead for projection (at worst $O(T^3)$ in CAGrad-style QPs, or $O(T)$ for pairwise methods). Efficient implementations exist, e.g., via Sherman–Morrison for ConicGrad (Hassanpour et al., 31 Jan 2025), or module-local projections for MGCM with minimal memory and compute costs (Liu et al., 2024).
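
For instance, the utility-protection guarantee follows from a standard first-order expansion: the projected direction satisfies $\langle g_{\mathrm{ref}}, d\rangle \leq 0$ by construction, so for step size $\eta$,

$$L_{\mathrm{ref}}(\theta + \eta d) = L_{\mathrm{ref}}(\theta) + \eta\,\langle g_{\mathrm{ref}}, d\rangle + O(\eta^2) \leq L_{\mathrm{ref}}(\theta) + O(\eta^2),$$

i.e., the reference (utility) loss cannot increase to first order.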

6. Empirical Results and Impact

Conflict-aware projected gradient ascent methods consistently outperform naive or uniform-weighted variants, especially in scenarios characterized by severe gradient conflict:

| Method/Setting | Task/Dataset | Baseline (PCGrad/joint) | CAPGA Variant | Performance Gain |
|---|---|---|---|---|
| wPCGrad+DTP | nuScenes | NDS = 0.329 | NDS = 0.344 | +4.6% NDS, +7.2% mAP |
| wPCGrad | CelebA (40 attr.) | 0.846 (avg. acc.) | 0.850 | +0.004 avg. acc. |
| wPCGrad | CIFAR-100 | 0.615 (avg. acc.) | 0.624 | +0.009 avg. acc. |
| Ortho-LoRA | GLUE (multi-task) | 88.4 (avg. acc., joint) | 89.6 | Recovers 80–95% of LoRA gap |
| MGCM | SimulST (BLEU) | 24.51 (beam 5, DiSeg) | 25.14 | +0.63 BLEU, 95% less memory |
| FedCARE (CAPGA) | CIFAR-10 unlearning | – | 82.7% R-Acc | Minimal overhead, near retrain |

Across these benchmarks, CAPGA variants show enhanced utility retention, accelerated convergence, and, in several cases, improved fairness in multitask or multi-agent regimes (Bohn et al., 2024, Zhu et al., 5 Mar 2025, Li et al., 30 Jan 2026, Yang et al., 14 Jan 2026, Liu et al., 2024).

7. Domain Extensions and Generalizations

Conflict-aware projected gradient ascent has been generalized beyond traditional MTL to a range of settings:

  • Federated Unlearning: CAPGA is central to unlearning in federated learning, efficiently removing private information from models while protecting global utility. The local CAPGA update solves a constrained ascent direction at each client (Li et al., 30 Jan 2026).
  • Multi-objective LLM alignment: CAGrad-Clip directly incorporates human preference weights and resolves conflicts between safety, helpfulness, or user utility objectives, resulting in superior Pareto frontiers compared to naive weighting or reward-model-based approaches (Chen et al., 2 Feb 2026).
  • Module-level conflict mitigation: MGCM precisely resolves local conflicts in simultaneous speech translation models, reducing both compute and GPU memory overhead (Liu et al., 2024).
  • Multi-agent RL and fairness: CAPGA, via gradient projection, guarantees monotonic non-decreasing improvement and achieves agent-equitable returns (Kim et al., 25 Aug 2025).
  • Parameter-efficient transfer: In LoRA-based multi-task LLM adaptation, projection in the low-dimensional subspace delivers near single-task performance with order-of-magnitude less inference overhead (Yang et al., 14 Jan 2026).

The flexibility and first-order efficiency of CAPGA make it suitable for complex, large-scale, and resource-constrained optimization regimes encountered in contemporary machine learning. As a consequence, CAPGA has become a foundational tool for robust and principled multi-objective optimization across diverse applications.
