
Conflict-Aware Projected Gradient Ascent

Updated 6 February 2026
  • Conflict-Aware Projected Gradient Ascent is an optimization framework that resolves gradient conflicts by projecting gradients to balance competing objectives.
  • It employs projection techniques—such as orthogonal projection and Gram–Schmidt procedures—to mitigate negative interference in multi-task and multi-objective settings.
  • CAPGA has demonstrated improved convergence, fairness, and efficiency in applications including federated unlearning, LLM alignment, and multi-agent reinforcement learning.

Conflict-Aware Projected Gradient Ascent (CAPGA) refers to a class of optimization techniques that resolve conflicts between multiple objectives during gradient-based optimization by explicitly adjusting, projecting, or reweighting the gradient vectors associated with those objectives. The methodology is particularly significant in multitask learning, federated unlearning, multi-objective optimization, and reinforcement learning, where simultaneous improvement of several possibly competing objectives is inherently challenging. CAPGA schemes offer principled conflict resolution at the level of gradients, aiming to guarantee favourable tradeoffs, fairness, or constraint satisfaction while maintaining computational efficiency.

1. Multitask and Multi-Objective Optimization Problem Formulation

In canonical multitask learning (MTL), one optimizes a shared model with parameter vector $\theta \in \mathbb{R}^d$ over $K$ tasks, each associated with a loss function $L_i(\theta)$. The overall goal is to minimize a weighted sum or average:

$$L(\theta) = \frac{1}{K} \sum_{i=1}^{K} L_i(\theta).$$

Each task provides a gradient $g_i = \nabla_\theta L_i(\theta)$. If the gradients are aligned, naive gradient descent on $L(\theta)$ is efficient. However, when the gradients are not aligned (i.e., gradient conflicts), naive aggregation can degrade performance for some or all tasks, stall convergence, or induce negative transfer. This is formally observed when

$$g_i^\top g_j < 0,$$

indicating that improvement in one task may worsen another (Bohn et al., 2024).

Similarly, in multi-objective optimization, one seeks Pareto-critical points where no descent direction improves all objectives. Gradient conflicts are again central, with a solution defined by the condition that no direction $d$ exists such that $\nabla f_i(\theta)^\top d < 0$ for all $i$ (Chen et al., 2 Feb 2026).
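
To ground the formulation, the following PyTorch sketch computes per-task gradients $g_i$ for a shared parameter vector and checks the pairwise conflict condition. The toy quadratic losses and all names are illustrative assumptions, not taken from any of the cited papers.

```python
import torch

# Shared parameters theta (toy dimension; illustrative only).
theta = torch.randn(10, requires_grad=True)

def task_losses(theta):
    # Two deliberately competing quadratic objectives, chosen so that
    # their gradients are anti-correlated and a conflict is observed.
    return [((theta - 1.0) ** 2).sum(), ((theta + 1.0) ** 2).sum()]

# Per-task gradients g_i = grad_theta L_i(theta).
grads = [torch.autograd.grad(L_i, theta, retain_graph=True)[0]
         for L_i in task_losses(theta)]

# Pairwise conflict check: tasks i and j conflict iff g_i . g_j < 0.
for i in range(len(grads)):
    for j in range(i + 1, len(grads)):
        dot = torch.dot(grads[i], grads[j]).item()
        print(f"tasks ({i}, {j}): g_i.g_j = {dot:.3f}, conflict = {dot < 0}")
```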

2. Detection and Definition of Gradient Conflict

Conflict-aware optimization strategies hinge on reliable detection of gradient conflict. The general criterion is an anti-correlation of gradients for different tasks or objectives:

  • Pairwise conflict: Tasks $i$ and $j$ are in conflict if $g_i^\top g_j < 0$ (Bohn et al., 2024, Zhu et al., 5 Mar 2025, Yang et al., 14 Jan 2026).
  • Constraint violation: In constrained or utility-preserving settings, e.g., federated unlearning, the ascent direction $g_{\mathrm{tar}}$ (for unlearning/forgetting) and reference direction $g_{\mathrm{ref}}$ (for utility) are in conflict if $\langle g_{\mathrm{tar}}, g_{\mathrm{ref}}\rangle > 0$, i.e., forgetting increases (rather than preserves) utility loss (Li et al., 30 Jan 2026).
  • Conic or angular constraints: ConicGrad restricts the allowable update direction $d$ to an angular cone around a reference gradient $g_0 = \sum_{i=1}^T g_i$, enforcing $\langle g_0, d\rangle \geq c \|g_0\| \|d\|$ for some $c = \cos\theta$ (Hassanpour et al., 31 Jan 2025).

Module- or parameter-block-specific conflicts are also detectable (e.g., in Modular Gradient Conflict Mitigation (MGCM), conflict is checked locally within architectural modules) (Liu et al., 2024).
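
As a minimal illustration, the three criteria above reduce to simple predicates on gradient vectors. The NumPy helpers below are a sketch under the sign conventions stated in this section; the function names are my own, not from the cited works.

```python
import numpy as np

def pairwise_conflict(g_i, g_j):
    # Tasks i and j conflict iff their gradients are anti-correlated.
    return float(g_i @ g_j) < 0.0

def unlearning_conflict(g_tar, g_ref):
    # Ascending along g_tar (forgetting) conflicts with utility
    # preservation iff <g_tar, g_ref> > 0.
    return float(g_tar @ g_ref) > 0.0

def inside_cone(d, g0, cos_theta):
    # ConicGrad-style constraint: d stays within an angular cone around
    # g0, i.e. <g0, d> >= cos(theta) * ||g0|| * ||d||.
    return float(g0 @ d) >= cos_theta * np.linalg.norm(g0) * np.linalg.norm(d)
```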

3. Projection-Based Conflict Resolution Schemes

The core of CAPGA is the use of projections to adjust conflicting gradients. The canonical update for resolving a pairwise task conflict orthogonally projects one gradient onto the space perpendicular to the other, as in PCGrad and its descendants:

$$g_i^{\mathrm{proj}} = g_i - \frac{g_i^\top g_j}{\|g_j\|^2}\, g_j.$$

This update enforces $g_i^{\mathrm{proj}} \perp g_j$, eliminating immediate negative interference (Bohn et al., 2024, Yang et al., 14 Jan 2026). Modular schemes apply this at the module level, performing gradient surgery locally only where conflict actually occurs (Liu et al., 2024).
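
A compact NumPy sketch of this pairwise surgery, in the spirit of PCGrad; the random ordering of opposing tasks and the epsilon guard are common implementation choices, not prescribed by the text above.

```python
import numpy as np

def project_away(g_i, g_j, eps=1e-12):
    # If g_i conflicts with g_j (g_i . g_j < 0), remove the component of
    # g_i along g_j:  g_i <- g_i - (g_i . g_j / ||g_j||^2) g_j.
    dot = float(g_i @ g_j)
    if dot < 0.0:
        g_i = g_i - (dot / (float(g_j @ g_j) + eps)) * g_j
    return g_i

def pcgrad_combine(grads, seed=0):
    # PCGrad-style aggregation: each task gradient is projected against
    # the others in random order, then the adjusted gradients are summed.
    rng = np.random.default_rng(seed)
    adjusted = []
    for i, g in enumerate(grads):
        g = g.copy()
        order = [j for j in range(len(grads)) if j != i]
        rng.shuffle(order)
        for j in order:
            g = project_away(g, grads[j])
        adjusted.append(g)
    return np.sum(adjusted, axis=0)
```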

Full deconfliction via Gram–Schmidt-like procedures projects each $g_i$ onto the complement of the subspace spanned by all other $g_j$, guaranteeing “strong non-conflict” for all $i, j$ (Zhu et al., 5 Mar 2025).
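
One plausible instantiation of such full deconfliction, projecting each $g_i$ onto the orthogonal complement of the span of the other gradients via a QR factorization; the exact GradOPS procedure may differ in ordering and normalization.

```python
import numpy as np

def deconflict_all(grads):
    # For each task i, project g_i onto the orthogonal complement of
    # span{g_j : j != i}, so the result has zero inner product with every
    # other task gradient ("strong non-conflict"). If the other gradients
    # happen to span g_i, the projected gradient can vanish.
    G = np.stack(grads)                      # shape (T, d)
    out = []
    for i in range(len(G)):
        others = np.delete(G, i, axis=0).T   # shape (d, T-1)
        q, _ = np.linalg.qr(others)          # orthonormal basis of the span
        out.append(G[i] - q @ (q.T @ G[i]))
    return out
```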

ConicGrad seeks a direction $d^*$ maximizing the minimum improvement over all tasks subject to a cone constraint; the projection is given in closed form via the Sherman–Morrison identity (Hassanpour et al., 31 Jan 2025).
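
The Sherman–Morrison identity itself is standard linear algebra; the generic helper below shows the rank-one solve that makes such closed forms cheap. How ConicGrad assembles its particular system is specific to the cited paper and is not reproduced here.

```python
import numpy as np

def sherman_morrison_solve(A_inv, u, v, b):
    # Solve (A + u v^T) x = b given a precomputed A^{-1}, via
    # (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u),
    # turning a fresh d x d solve into a few matrix-vector products.
    Au = A_inv @ u
    vTA = v @ A_inv
    return A_inv @ b - Au * (vTA @ b) / (1.0 + float(v @ Au))
```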

Federated unlearning employs a constrained quadratic program:

$$\min_d \|d - g_{\mathrm{tar}}\|_2^2 \quad \text{subject to } \langle g_{\mathrm{ref}}, d\rangle \leq 0,$$

with closed-form solution

$$d = g_{\mathrm{tar}} - \frac{\max\left(0, \langle g_{\mathrm{tar}}, g_{\mathrm{ref}}\rangle\right)}{\|g_{\mathrm{ref}}\|^2 + \varepsilon}\, g_{\mathrm{ref}}$$

(Li et al., 30 Jan 2026).
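
This closed form transcribes directly into code; the sketch below follows the equation above, with $\varepsilon$ as a small numerical guard (variable names are mine).

```python
import numpy as np

def conflict_aware_ascent(g_tar, g_ref, eps=1e-12):
    # Closed-form solution of  min_d ||d - g_tar||^2  s.t.  <g_ref, d> <= 0:
    # subtract the g_ref component of g_tar only when the two directions
    # conflict, i.e., when <g_tar, g_ref> > 0 (forgetting would hurt utility).
    coef = max(0.0, float(g_tar @ g_ref)) / (float(g_ref @ g_ref) + eps)
    return g_tar - coef * g_ref
```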

In multi-objective/CAGrad-type schemes, the projection consists of directly solving a small quadratic program for a common direction along which no objective increases, with guaranteed Pareto criticality. In the CAGrad-Clip variant, the resulting mixture weights are clipped to respect user-provided preference bounds (Chen et al., 2 Feb 2026).
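
The full CAGrad/CAGrad-Clip QP spans $T$ objectives; as a hedged illustration of the underlying idea, here is the classical two-objective min-norm special case (an MGDA-style common-descent construction, plainly not the full CAGrad or CAGrad-Clip algorithm).

```python
import numpy as np

def min_norm_combination(g1, g2, eps=1e-12):
    # Smallest-norm point  w*g1 + (1-w)*g2,  w in [0, 1],  of the convex
    # hull of two gradients; stepping along its negation decreases both
    # objectives to first order whenever the returned vector is nonzero.
    diff = g1 - g2
    w = float(-(diff @ g2)) / (float(diff @ diff) + eps)
    w = min(1.0, max(0.0, w))
    return w * g1 + (1.0 - w) * g2
```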

4. Probabilistic and Weighted Projection Schemes

The basic projection mechanism can be augmented to encode task priorities, dynamic loss weighting, or user-defined tradeoff preferences.

  • Probabilistic task prioritization: In wPCGrad, a probability vector $\vec{p}$ is dynamically updated (e.g., via Dynamic Task Prioritization), and at each step a task is sampled as an “anchor” based on its most recent loss. Only gradients that conflict with the anchor are projected, and tasks with larger losses are more likely to be protected from adjustment. Explicitly,

$$p_i = \frac{(L_i^{(t-1)})^\gamma}{\sum_{j=1}^K (L_j^{(t-1)})^\gamma}$$

for focusing exponent $\gamma$ (Bohn et al., 2024); this sampling rule is sketched in code after this list.

  • Trade-off traversal: Weighted GradOPS uses a trade-off exponent $\alpha$ to control the final aggregation weights of the deconflicted gradients and efficiently explores the Pareto frontier (Zhu et al., 5 Mar 2025).
  • Preference-respecting clipping: In multi-objective LLM alignment under CAGrad-Clip (as in RACO), mixture weights are clipped to remain within user-specified quotas, preventing low-priority objectives from dominating the correction (Chen et al., 2 Feb 2026); see the sketch after this list.
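
A schematic sketch of both mechanisms: the power-law weighting follows the $p_i$ formula above, while the quota clipping is a schematic reading of the CAGrad-Clip description (its exact rule may differ, and renormalizing after clipping can leave quotas only approximately satisfied).

```python
import numpy as np

def anchor_probabilities(prev_losses, gamma=2.0):
    # wPCGrad-style prioritization: p_i proportional to (L_i^{(t-1)})^gamma,
    # so tasks with larger recent losses are more likely to become the
    # anchor that other gradients are projected against.
    powered = np.asarray(prev_losses, dtype=float) ** gamma
    return powered / powered.sum()

def sample_anchor(prev_losses, gamma=2.0, seed=0):
    p = anchor_probabilities(prev_losses, gamma)
    return np.random.default_rng(seed).choice(len(p), p=p)

def clip_mixture_weights(w, lo, hi):
    # Quota clipping: keep each objective's mixture weight inside the
    # user-specified band [lo_i, hi_i], then renormalize to sum to one.
    w = np.clip(np.asarray(w, dtype=float), lo, hi)
    return w / w.sum()
```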

5. Theoretical Guarantees and Convergence Properties

CAPGA strategies typically rely on first-order smoothness and Lipschitz continuity. Core theoretical results include:

  • Monotonic improvement: Under smoothness/Lipschitz conditions and sufficiently small step sizes, projected updates guarantee non-decreasing improvement in all objectives or (in the case of fairness) convergence of the gap between objectives to zero (Kim et al., 25 Aug 2025, Bohn et al., 2024).
  • Pareto-stationarity: Deconflicted updates converge to Pareto-stationary points under appropriate decrease lemmas (Zhu et al., 5 Mar 2025, Chen et al., 2 Feb 2026).
  • Task utility protection: In federated unlearning, projection-based ascent guarantees no first-order increase in utility loss on the reference set (Li et al., 30 Jan 2026); a one-line derivation follows this list.
  • Computational complexity: Principally first-order, with small overhead for projection (at worst $O(T^3)$ in CAGrad-style QPs, or $O(T)$ for pairwise methods). Efficient implementations exist, e.g., via Sherman–Morrison for ConicGrad (Hassanpour et al., 31 Jan 2025), or module-local projections for MGCM with minimal memory and compute costs (Liu et al., 2024).
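
For instance, the utility-protection guarantee follows from a standard first-order expansion: the projected direction satisfies $\langle g_{\mathrm{ref}}, d\rangle \leq 0$ by construction, so for step size $\eta$,

$$L_{\mathrm{ref}}(\theta + \eta d) = L_{\mathrm{ref}}(\theta) + \eta\,\langle g_{\mathrm{ref}}, d\rangle + O(\eta^2) \leq L_{\mathrm{ref}}(\theta) + O(\eta^2),$$

i.e., the reference (utility) loss cannot increase to first order.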

6. Empirical Results and Impact

Conflict-aware projected gradient ascent methods consistently outperform naive or uniform-weighted variants, especially in scenarios characterized by severe gradient conflict:

| Method/Setting | Task/Dataset | Baseline (PCGrad/joint) | CAPGA Variant | Performance Gain |
|---|---|---|---|---|
| wPCGrad+DTP | nuScenes | NDS = 0.329 | NDS = 0.344 | +4.6% NDS, +7.2% mAP |
| wPCGrad | CelebA (40 attr.) | 0.846 (avg. acc.) | 0.850 | +0.004 avg. acc. |
| wPCGrad | CIFAR-100 | 0.615 (avg. acc.) | 0.624 | +0.009 avg. acc. |
| Ortho-LoRA | GLUE (multi-task) | 88.4 (avg. acc., joint) | 89.6 | Recovers 80–95% of LoRA gap |
| MGCM | SimulST (BLEU) | 24.51 (beam 5, DiSeg) | 25.14 | +0.63 BLEU, 95% less memory |
| FedCARE (CAPGA) | CIFAR-10 unlearning | – | 82.7% R-Acc | Minimal overhead, near retrain |

Across these benchmarks, CAPGA variants show enhanced utility retention, accelerated convergence, and, in several cases, improved fairness in multitask or multi-agent regimes (Bohn et al., 2024, Zhu et al., 5 Mar 2025, Li et al., 30 Jan 2026, Yang et al., 14 Jan 2026, Liu et al., 2024).

7. Domain Extensions and Generalizations

Conflict-aware projected gradient ascent has been generalized beyond traditional MTL to a range of settings:

  • Federated Unlearning: CAPGA is central to unlearning in federated learning, efficiently removing private information from models while protecting global utility. The local CAPGA update solves a constrained ascent direction at each client (Li et al., 30 Jan 2026).
  • Multi-objective LLM alignment: CAGrad-Clip directly incorporates human preference weights and resolves conflicts between safety, helpfulness, or user utility objectives, resulting in superior Pareto frontiers compared to naive weighting or reward-model-based approaches (Chen et al., 2 Feb 2026).
  • Module-level conflict mitigation: MGCM precisely resolves local conflicts in simultaneous speech translation models, reducing both compute and GPU memory overhead (Liu et al., 2024).
  • Multi-agent RL and fairness: CAPGA, via gradient projection, guarantees monotonic non-decreasing improvement and achieves agent-equitable returns (Kim et al., 25 Aug 2025).
  • Parameter-efficient transfer: In LoRA-based multi-task LLM adaptation, projection in the low-dimensional subspace delivers near single-task performance with order-of-magnitude less inference overhead (Yang et al., 14 Jan 2026).

The flexibility and first-order efficiency of CAPGA make it suitable for complex, large-scale, and resource-constrained optimization regimes encountered in contemporary machine learning. As a consequence, CAPGA has become a foundational tool for robust and principled multi-objective optimization across diverse applications.
