
Fully-Corrective Descents

Updated 2 February 2026
  • Fully-corrective descents are optimization strategies that re-optimize complete active sets per iteration to ensure rapid convergence and improved sparsity.
  • They encompass methods like FC-GCG, FCFW, and fully-corrective boosting, which solve small convex subproblems to adjust coefficients over active atoms.
  • These methods provide sublinear to linear convergence rates and leverage strategies such as ADMM and block-coordinate descent to mitigate computational costs.

Fully-corrective descent refers to optimization strategies where, at each iteration, the solution is re-optimized over all selected basis elements, active coordinates, or atoms rather than just progressing in a single greedy direction. Such methods are widely adopted in sparse minimization, boosting, conditional gradient (Frank–Wolfe) algorithms, and sequential model selection. The key property is the corrective re-optimization of the current support, guaranteeing rapid convergence and improved sparsity compared to stagewise or single-step alternatives.

1. Foundational Principles

Fully-corrective descents generalize simple greedy methods by introducing a corrective re-optimization over all active atoms or coordinates at each step. In the classic Frank–Wolfe (FW) and conditional gradient frameworks, an atom is added per iteration, and only its coefficient is updated. Conversely, fully-corrective variants—such as Fully-Corrective Frank–Wolfe (FCFW) and Fully-Corrective Generalized Conditional Gradient (FC-GCG)—recalculate the coefficients over the entire set of active atoms to minimize the objective, frequently solving a smaller, finite-dimensional convex problem restricted to the current active set (Bredies et al., 2021, Halbey et al., 3 Jun 2025).

Such decoupling of atom addition and full support minimization yields two critical improvements: rapid decay of residual error and efficient removal of obsolete atoms, leading to sparse and stable representations. For composite loss functions $J(u) = F(Ku) + G(u)$, where $F$ is smooth and $G$ is positively one-homogeneous, FC-GCG iteratively updates both the active atom set and the coefficients via finite-dimensional convex subproblems, ensuring strict monotonicity and convergence (Bredies et al., 2021).
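As a concrete illustration, the following minimal Python sketch runs a fully-corrective Frank–Wolfe loop on a toy quadratic over the probability simplex: each round adds the LMO vertex, re-optimizes all weights over the active vertices, and purges zero-weight atoms. All names (`fcfw`, `lmo_simplex`) and the toy objective are illustrative, not taken from the cited papers.

```python
import numpy as np
from scipy.optimize import minimize

def lmo_simplex(g):
    """Linear minimization oracle over the probability simplex: returns
    the index of the vertex e_i minimizing <g, e_i>."""
    return int(np.argmin(g))

def fcfw(c, iters=10):
    """Fully-corrective Frank-Wolfe for f(x) = 0.5 ||x - c||^2 over the
    probability simplex: add the LMO vertex, then re-optimize ALL weights
    over the active vertices and purge zero-weight atoms."""
    n = len(c)
    x = np.zeros(n)
    x[0] = 1.0                               # start at a vertex
    active = {0}
    for _ in range(iters):
        active.add(lmo_simplex(x - c))       # atom addition via LMO
        idx = sorted(active)
        def sub(lam):                        # objective on conv(active)
            y = np.zeros(n)
            y[idx] = lam
            return 0.5 * np.sum((y - c) ** 2)
        res = minimize(sub, np.full(len(idx), 1.0 / len(idx)),
                       bounds=[(0.0, 1.0)] * len(idx),
                       constraints={"type": "eq",
                                    "fun": lambda l: l.sum() - 1.0})
        x = np.zeros(n)
        x[idx] = res.x                       # fully-corrective update
        active = {j for j in idx if x[j] > 1e-10}  # drop obsolete atoms
    return x

x = fcfw(np.array([0.8, 0.6, -0.2, 0.1]))
# x is approximately [0.6, 0.4, 0, 0], the simplex projection of c
```

Here the corrective subproblem is handed to SciPy's SLSQP solver; in practice any small QP solver over the simplex serves the same role.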

2. Algorithmic Procedures

The fully-corrective framework is exemplified in several prominent algorithms:

  • FC-GCG:
  1. Update the dual variable $p_k = -K_* \nabla F(K u_k)$.
  2. Add a maximizer $v_k \in \arg\max_{v \in \text{Ext}\, B} \langle p_k, v \rangle$ to the active atoms.
  3. Solve for coefficients $\lambda^{k+1}$ minimizing $F(\sum_i \lambda_i K u_i) + \sum_i \lambda_i$, restricted to the expanded support.
  4. Purge atoms with zero coefficients and repeat.
  • FCFW:

At each step, recompute $x_{t+1} = \arg\min_{x \in \text{conv}(S_t)} f(x)$, where $S_t$ is the active set. Quadratic corrective strategies—QC-LP (a linear program on the simplex) and QC-MNP (Minimum-Norm-Point with pullback to the simplex)—approximate the fully-corrective update efficiently for quadratic objectives by solving low-dimensional linear or LP subproblems (Halbey et al., 3 Jun 2025).
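The quadratic-correction idea can be sketched as follows, under the assumption of a quadratic objective $f(x) = \frac{1}{2}x^\top Q x - b^\top x$: the corrective minimizer over the affine hull of the active atoms comes from one bordered KKT linear system, and a ratio test pulls it back to the simplex if needed. This is a toy rendition of the MNP-style shortcut, not the implementation from the cited paper:

```python
import numpy as np

def qc_affine_step(Q, b, A, lam):
    """One quadratic-correction step for f(x) = 0.5 x^T Q x - b^T x.

    Columns of A are the active atoms; lam holds the current simplex
    weights. Solve the corrective problem exactly over the AFFINE HULL
    of the atoms via a bordered KKT system, then pull the answer back
    to the simplex with a ratio test if it left the feasible set."""
    k = A.shape[1]
    H = A.T @ Q @ A                          # reduced Hessian
    ones = np.ones(k)
    kkt = np.block([[H, ones[:, None]],
                    [ones[None, :], np.zeros((1, 1))]])
    rhs = np.concatenate([A.T @ b, [1.0]])
    lam_aff = np.linalg.solve(kkt, rhs)[:k]  # affine-hull minimizer
    if np.all(lam_aff >= 0):
        return lam_aff                       # already feasible
    d = lam_aff - lam                        # ratio test toward lam_aff
    neg = d < 0
    t = np.min(lam[neg] / -d[neg])
    return lam + t * d

# toy data: atoms are the coordinate vectors, so H reduces to Q itself
Q = np.diag([1.0, 2.0, 4.0])
b = np.ones(3)
lam_new = qc_affine_step(Q, b, np.eye(3), np.full(3, 1.0 / 3.0))
# lam_new is approximately [4/7, 2/7, 1/7]
```

One linear solve plus a ratio test replaces a full constrained QP, which is the source of the runtime savings discussed below.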

In boosting, the fully-corrective greedy (FCG) update first selects the best weak learner, then re-optimizes all weights over the current basis—a convex problem over the span of chosen weak learners, typically solved by ADMM for the squared hinge loss.
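A minimal sketch of this fully-corrective greedy pattern, with a linear dictionary and a squared loss so that the corrective step is a single least-squares solve (the cited work uses the squared hinge loss with ADMM; `fully_corrective_greedy` is an illustrative name):

```python
import numpy as np

def fully_corrective_greedy(X, y, steps=3):
    """Per round: (1) greedily pick the dictionary column most correlated
    with the residual; (2) jointly refit the weights of EVERY selected
    column by least squares -- the fully-corrective step."""
    active = []
    r = y.copy()
    w = np.array([])
    for _ in range(steps):
        scores = np.abs(X.T @ r)
        scores[active] = -np.inf          # never re-pick an active atom
        active.append(int(np.argmax(scores)))
        w, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        r = y - X[:, active] @ w          # residual after full correction
    return active, w

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
y = 2.0 * X[:, 1] - X[:, 3]               # exactly 2-sparse in the dictionary
active, w = fully_corrective_greedy(X, y, steps=2)
```

Because every weight is refit jointly, the residual after two rounds is (numerically) zero here, whereas a stagewise update that only tunes the newest weight would leave an error term.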

The information-geometry-driven sequential graph growth for Gaussian graphical inference interprets edge activation as fully-corrective coordinate descent, selecting at each step the edge that most decreases the graphical loss and then re-optimizing the precision matrix over all activated edges.
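A toy sketch of one such growth step, assuming the Gaussian graphical loss $\operatorname{tr}(S\Theta) - \log\det\Theta$: score inactive edges by the gradient magnitude $|S - \Theta^{-1}|_{ij}$, activate the best one, and re-optimize all active entries with a generic solver (the cited work uses closed-form block-coordinate updates; function names and data here are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def graph_loss(theta_params, S, edges, p):
    """Gaussian graphical loss tr(S @ Theta) - logdet(Theta), with Theta
    parameterized by its p diagonal entries plus the active edges."""
    Theta = np.diag(theta_params[:p])
    for k, (i, j) in enumerate(edges):
        Theta[i, j] = Theta[j, i] = theta_params[p + k]
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return 1e10                       # outside the positive-definite cone
    return float(np.trace(S @ Theta)) - logdet

def grow_one_edge(S, edges, params, p):
    """One growth step: activate the off-diagonal entry with the largest
    gradient magnitude of the loss, then re-optimize the precision matrix
    over ALL active entries (the fully-corrective step)."""
    Theta = np.diag(params[:p])
    for k, (i, j) in enumerate(edges):
        Theta[i, j] = Theta[j, i] = params[p + k]
    G = S - np.linalg.inv(Theta)          # gradient of the loss at Theta
    best, (bi, bj) = -1.0, (0, 1)
    for i in range(p):
        for j in range(i + 1, p):
            if (i, j) not in edges and abs(G[i, j]) > best:
                best, (bi, bj) = abs(G[i, j]), (i, j)
    edges = edges + [(bi, bj)]
    res = minimize(graph_loss, np.concatenate([params, [0.0]]),
                   args=(S, edges, p))
    return edges, res.x

# toy problem: S is the exact inverse of a sparse ground-truth precision
Theta_true = np.array([[2.0, 0.8, 0.0],
                       [0.8, 2.0, 0.0],
                       [0.0, 0.0, 1.0]])
S = np.linalg.inv(Theta_true)
params = 1.0 / np.diag(S)                 # exact diagonal-only fit
edges, params = grow_one_edge(S, [], params, 3)
```

With exact data the step activates the one true edge and the refit recovers the ground-truth precision matrix.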

| Algorithm | Atom Addition | Corrective Step |
|---|---|---|
| FC-GCG | $\arg\max_{v\in\text{Ext} B}\langle p_k, v\rangle$ | Finite-dimensional convex minimization over the active set |
| FCFW | FW-LMO (linear minimization oracle) | Minimization over $\text{conv}(S_t)$ or its affine hull |
| FCG Boosting | Greedy selection by gradient inner product | Full joint convex re-optimization via ADMM |
| Graph Growth | Edge with maximal decrease in loss | Precision matrix minimization (block-CD or full re-optimization) |

3. Convergence Rates and Theoretical Guarantees

Fully-corrective algorithms yield provably faster convergence than stagewise or single-step methods. For FC-GCG, under global convexity and smoothness conditions, the sublinear rate $r_J(u_k) \le C/(k+1)$ is achieved, where $r_J$ denotes the optimality gap. Under strict convexity, finitely many atoms attaining the dual maximum, linear independence, and strict complementarity, geometric (linear) convergence $r_J(u_k) \le C\zeta^k$ with $\zeta \in (0,1)$ follows (Bredies et al., 2021). Similar linear rates hold for corrective FW algorithms when the feasible set is a polytope with positive pyramidal width and the objective is strongly convex (Halbey et al., 3 Jun 2025).

In boosting with the squared hinge loss, fully-corrective procedures yield fast statistical learning rates: $\mathcal{O}((m/\log m)^{-1/4})$ for misclassification risk, and up to $\mathcal{O}((m/\log m)^{-1/2})$ under Tsybakov noise assumptions, outperforming classical boosting methods (Zeng et al., 2020).

4. Sparsity, Robustness, and Adaptive Support

Fully-corrective descents rapidly attain sparse solutions due to aggressive pruning and re-weighting of the active set. In boosting, only weak learners contributing nonzero coefficients at the final fully-corrective iterate remain, enforcing explicit sparsity and margin maximization (Zeng et al., 2020). In Frank–Wolfe variants, obsolete atoms are eliminated by corrective steps, and when the optimal solution is supported on a low-dimensional face, QC-MNP and QC-LP procedures recover the correct support swiftly (Halbey et al., 3 Jun 2025).

In Gaussian graphical model growth, activation ranks and stability selection further distinguish signal from noise edges, exploiting the fully-corrective block descent properties to control false discovery (Bond et al., 29 Jan 2026). Early stopping based on excess risk or empirical margin further regularizes the solution by bounding sample variance.

5. Computational Strategies and Practical Considerations

The principal challenge of fully-corrective methods is computational cost, since full support re-optimization may be expensive. Several solution strategies have emerged:

  • Efficient Quadratic Corrections (FW): For convex quadratic problems, quadratic corrective steps replace expensive QP solves with a single LP or linear-system solve plus a ratio test (QC-MNP), reducing runtime by orders of magnitude while preserving linear convergence once the optimal support is identified (Halbey et al., 3 Jun 2025).
  • ADMM for Boosting: The fully-corrective subproblem is efficiently addressed by alternating direction method of multipliers, exploiting separability and closed-form updates for both coefficients and functional outputs (Zeng et al., 2020).
  • Block-Coordinate Descent for Graph Growth: Algorithmic relaxations—such as best-block improvement and Gauss–Southwell or Gauss–Southwell–Lipschitz rules—enable scalable approximate correction by updating $1\times 1$ or $2\times 2$ blocks or by pre-selecting edges with high partial correlations (Bond et al., 29 Jan 2026).
  • Active Set Management: Both FW and boosting frameworks use drop steps and zero-weight pruning to adapt the active set, confining optimization to the minimal support.

| Application | Full Correction Challenge | Solution Strategy |
|---|---|---|
| FCFW / quadratic problems | High-dimensional QP | QC-LP/QC-MNP; active-set pruning |
| Boosting | Joint convex re-optimization | ADMM-based efficient updates |
| Graphical model growth | Matrix re-inversion, edge scanning | Closed-form block update, GSL |

6. Connections and Extensions

Fully-corrective descent intersects with diverse areas:

  • Boosting and Regularized Risk Minimization: Fully-corrective boosting strategies generalize beyond AdaBoost by re-optimizing all coefficients, accommodating various losses and regularizers (CGBoost framework) (Shen et al., 2010).
  • Sparse Recovery and Greedy Pursuit: Information-optimal graph growth is mathematically equivalent to fully-corrective coordinate descent in convex graphical loss, bridging statistical inference and deterministic optimization (Bond et al., 29 Jan 2026).
  • Frank–Wolfe Methods and Conditional Gradient Extensions: Quadratic corrections and blended pairwise strategies inherit the geometric contraction rates of classical FW, augmenting scalability for large-scale polytope problems and high-dimensional regression (Halbey et al., 3 Jun 2025).
  • Measure Lifting and Choquet Theory: In generalized atomic decomposition, fully-corrective steps correspond to primal-dual-active-point measures in Radon space via Choquet’s representation, establishing equivalence between lifted and original problems (Bredies et al., 2021).

Fully-corrective descent methodologies are now standard tools for sparse learning, convex optimization, and graphical inference, with implementation strategies and theoretical guarantees aligning closely across the literature.
