Desingularization Subgradient Methods
- Desingularization subgradient methods redefine update rules by incorporating regularization and geometric reformulations to overcome non-descent directions in nonsmooth optimization.
- They employ techniques such as ε-regularization, radial reformulation, and selective exclusion of singular terms to stabilize iterations and promote convergence.
- Empirical and theoretical analyses show these methods achieve linear to superlinear convergence in applications like convex minimization, composite functions, and location problems.
The desingularization subgradient method comprises a set of algorithmic paradigms and theoretical frameworks in nonsmooth, possibly non-Lipschitz or singular optimization, wherein standard subgradient updates are regularized, geometrically projected, or otherwise modified to circumvent non-descent directions and singularities present in the subdifferential mapping. This class includes subgradient regularization methods, radial reformulations, active manifold reductions, and desingularizing subgradient strategies for convex, composite, and marginal functions. The desingularization principle enables stable, often provably convergent algorithms for nonsmooth objectives that otherwise exhibit pathological behavior under naive subgradient iteration.
1. Foundational Principles
Desingularization subgradient methods arise from the realization that, for general nonsmooth functions, a subgradient need not be a direction of descent; the Clarke subdifferential at a point may even contain directions orthogonal or adverse to the gradient flow. The core innovation is to transform the update rule, either via:
- Subgradient regularization: Introducing an auxiliary parameter ε > 0 and a family of regularized subdifferential maps G(x, ε) satisfying outer semicontinuity and recovering the minimal-norm element of the subdifferential ∂f(x) as ε → 0 (Li et al., 11 May 2025).
- Geometric reformulation: Operating on radial, manifold, or stratified representations of the feasible region or objective, leveraging properties such as Lipschitz continuity of transformed functions even when f itself lacks such structure (Grimmer, 2017, Davis et al., 2021).
- Subproblem modification at singular points: For cases like the extended Weber location problem, excising undefined or infinite terms at data sites and proceeding with a well-defined subgradient of a partial sum (Lai et al., 2024).
These approaches are united by the goal of stabilizing iteration near points where the classical subgradient method either fails to progress (due to stationary yet nonoptimal subdifferential structure) or suffers numerical divergence.
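The failure mode motivating all of these transforms is elementary: a nonminimal subgradient can point uphill. A minimal numeric illustration (the example function is constructed here for exposition, not taken from the cited papers): for f(x, y) = |x| + 2|y| at the point (1, 0), the Clarke subdifferential is {1} × [−2, 2], so (1, 2) is a valid subgradient whose negative increases f, while the minimal-norm subgradient (1, 0) yields descent.

```python
import numpy as np

def f(x):
    # f(x, y) = |x| + 2|y|; minimizer at the origin
    return abs(x[0]) + 2.0 * abs(x[1])

x = np.array([1.0, 0.0])
# Clarke subdifferential at (1, 0) is {1} x [-2, 2]
g_bad = np.array([1.0, 2.0])    # a valid subgradient...
g_min = np.array([1.0, 0.0])    # ...and the minimal-norm element

eta = 0.1
print(f(x - eta * g_bad) > f(x))   # True: this subgradient step *increases* f
print(f(x - eta * g_min) < f(x))   # True: the minimal-norm direction descends
```

This is exactly the pathology that ε-regularization (which tracks the minimal-norm element) is designed to avoid.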
2. Algorithmic Frameworks
Diverse algorithmic templates embody the desingularization concept:
| Method | Regularization/Structure | Main Update | Descent Guarantee |
|---|---|---|---|
| SRDescent | ε-regularized subdifferential G(x, ε) | x^{k+1} = x^k − η_k g^k, g^k ∈ G(x^k, ε_k) | Armijo-type: f(x^{k+1}) ≤ f(x^k) − α η_k‖g^k‖² (Li et al., 11 May 2025) |
| Radial Subgradient (RSM) | Radial lift Y_z of f | Projected subgradient step on the radial level set | Contraction of the radially transformed objective (Grimmer, 2017) |
| De-singularity for Weber | Removal of singular component | Omit the i-th term at x^k = a_i | Strict decrease or stationarity (Lai et al., 2024) |
In SRDescent, ε is reduced progressively and the associated regularized subgradient directions are computed to guarantee descent. Adaptive schemes, e.g., SRDescent-adapt, select ε_k based on line search or other criteria.
The Radial Subgradient Method maintains a strictly feasible anchor point and at each iteration moves radially towards or away from it, applying a structure-preserving transformation whose result is always convex and Lipschitz, thus circumventing the complications of f itself not being Lipschitz.
For singularity-prone problems such as the extended Weber median, iterations at singular data points are handled by removing the term(s) causing the singularity, ensuring the update remains defined and promotes descent.
3. Theoretical Guarantees and Rates
All desingularization subgradient methods feature rigorous descent and convergence theory:
- Descent property: For any x with 0 ∉ ∂f(x), there exist ε > 0 and a sufficiently small step η > 0 such that f(x − η g) < f(x) for g ∈ G(x, ε) (Li et al., 11 May 2025).
- Guarantees on stationarity: Algorithmic termination is certified at (approximate) Clarke-stationary points; under mild conditions, all cluster points of the iterates are Clarke-stationary.
- Rates:
- For SRDescent on composite or finite-max functions, sublinear or even linear convergence is observed empirically and, in certain settings, theoretically (Li et al., 11 May 2025).
- The Radial Subgradient Method achieves O(1/ε²) iteration complexity for absolute or relative error criteria, matching the classical rate for Lipschitz objectives without requiring f to be globally Lipschitz (Grimmer, 2017).
- In the extended Weber case, linear convergence is established for powers p with 1 ≤ p < 2 (Lai et al., 2024).
4. Desingularization Mechanisms
Each method achieves desingularization through distinct mathematical strategies:
- Subgradient regularization (SRDescent): The set-valued map G(x, ε) interpolates between robust regularized descent directions for positive ε and the minimal-norm subgradient as ε → 0. For finite maxima of smooth functions, G can be expressed via a quadratic penalty on the convex combination of gradients (a formulation combining simplex weights λ with an ε-weighted quadratic penalty) (Li et al., 11 May 2025).
- Radial reformulation: The radially transformed function Y_z, defined for each level z, smooths and regularizes the problem, yielding convex, globally Lipschitz level-set functions regardless of the unbounded slope or lack of regularity of f itself. All iterates are mapped back into interior regions via radial re-projection, maintaining feasibility and leveraging controlled geometry (Grimmer, 2017).
- Excising singularities: For functions with data-induced singularities, the subgradient at a data point a_i is replaced with the gradient of the partial sum omitting the singular i-th term, guaranteeing a valid update and ensuring escape from nonoptimal singular points in finitely many steps (Lai et al., 2024).
- Active manifold and stratification approach: In the context of subdifferentially regular and definable functions, convergence analysis reduces to a smooth gradient flow on the "active manifold" underlying the nonsmooth activity, with fast repulsion from normal directions and established Kurdyka–Łojasiewicz rates on the manifold (Davis et al., 2021).
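The active-manifold phenomenon can be seen on a toy example: for f(x, y) = |x| + y², the nonsmooth activity lives on the manifold {x = 0}, and plain subgradient iterations are driven to it (up to step-size resolution) long before the smooth coordinate converges. A minimal sketch; the objective and step size are illustrative choices here, not taken from the cited work:

```python
import numpy as np

# Toy objective with active manifold {x = 0}: f(x, y) = |x| + y^2
def subgrad(v):
    x, y = v
    # np.sign(0) = 0, so the kink contributes the minimal-norm element at x = 0
    return np.array([np.sign(x), 2.0 * y])

v = np.array([1.0, 1.0])
step = 0.05
hit_manifold, y_at_hit = None, None
for k in range(60):
    v = v - step * subgrad(v)
    if hit_manifold is None and abs(v[0]) < 1.5 * step:
        hit_manifold, y_at_hit = k, v[1]   # reached the manifold, to step resolution
```

The x-coordinate collapses to within one step of the manifold after about 20 iterations, while the y-coordinate is still far from its limit at that moment and thereafter follows the smooth gradient flow of y² along {x = 0}.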
5. Algorithmic Instantiations and Pseudocode
There are explicit and implementable pseudocode frameworks for desingularization subgradient methods:
SRDescent (Subgradient Regularization) (Li et al., 11 May 2025)
Input: x₀, initial ε₀ > 0, tolerances ε_tol, ν_tol ≥ 0
For k = 0, 1, 2, ...
    Inner loop (i = 0, 1, ...):
        ε_{k,i} = ε_{k,0}·2^{−i}
        Select g^{k,i} ∈ G(x^k, ε_{k,i})
        If ε_{k,i} ≤ ε_tol and ‖g^{k,i}‖ ≤ ν_tol:
            Return x^k
        For backtracking j = 0, ..., i:
            η = ε_{k,0}·2^{−j}
            If f(x^k − η g^{k,i}) ≤ f(x^k) − α η‖g^{k,i}‖²:
                Set η_k = η, i_k = i, break
    Update x^{k+1} = x^k − η_k g^{k,i_k}
    If ‖g^{k,i_k}‖ ≤ ν_k:
        ν_{k+1} = θ_ν·ν_k, ε_{k+1,0} = θ_ε·ε_{k,0}
    Else:
        ν_{k+1} = ν_k, ε_{k+1,0} = ε_{k,0}
End For
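As a concrete, simplified illustration, the following Python sketch instantiates the scheme above for a maximum of two smooth functions. The map G(x, ε) is approximated by the minimal-norm element over gradients of ε-active pieces, and the ε/ν bookkeeping is condensed into a single shrink rule; both are assumptions made for illustration, not the paper's exact construction:

```python
import numpy as np

# Example objective: f(x) = max(f1, f2), nonsmooth along the line x2 = 0
def f1(x): return x[0]**2 + (x[1] - 1.0)**2
def f2(x): return x[0]**2 + (x[1] + 1.0)**2
def grad1(x): return np.array([2*x[0], 2*(x[1] - 1.0)])
def grad2(x): return np.array([2*x[0], 2*(x[1] + 1.0)])
def f(x): return max(f1(x), f2(x))

def G(x, eps):
    """Surrogate regularized subdifferential: minimal-norm element of the
    convex hull of gradients of eps-active pieces (illustrative stand-in)."""
    fx = f(x)
    active = [g(x) for fi, g in ((f1, grad1), (f2, grad2)) if fx - fi(x) <= eps]
    if len(active) == 1:
        return active[0]
    a, b = active
    d = a - b
    # closed-form minimum of ||(1-t)a + t b|| over t in [0, 1]
    t = 0.0 if d @ d == 0 else np.clip((a @ d) / (d @ d), 0.0, 1.0)
    return (1 - t) * a + t * b

def srdescent(x, eps=1.0, alpha=0.25, tol=1e-8, max_iter=500):
    for _ in range(max_iter):
        g = G(x, eps)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol and eps <= tol:          # approximate stationarity
            break
        if gnorm <= np.sqrt(eps):                # direction too weak: shrink eps
            eps *= 0.5
            continue
        eta = 1.0                                # Armijo backtracking on the step
        while f(x - eta * g) > f(x) - alpha * eta * (g @ g) and eta > 1e-12:
            eta *= 0.5
        x = x - eta * g
    return x
```

Started at (1, 0.3), where a naive subgradient step oscillates across the kink at x2 = 0, the iterates settle near the minimizer (0, 0) with f → 1, because the min-norm combination of the two near-active gradients cancels the oscillating component.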
Radial Subgradient Method (Grimmer, 2017)
Input: initial level z₀ = f(0) < 0, x₀ = 0
For k = 0, 1, 2, ...
    Choose g_k ∈ ∂Y_{z_k}(x_k)
    Subgradient step: x̃_{k+1} = x_k − α_k g_k
    If Y_{z_k}(x̃_{k+1}) = 0:
        Terminate, unbounded below along ray
    Radial update: z_{k+1} = z_k / Y_{z_k}(x̃_{k+1}), x_{k+1} = x̃_{k+1} / Y_{z_k}(x̃_{k+1})
End For
De-singularity Subgradient for Extended Weber (Lai et al., 2024)
At a singular point x^k = a_i, use the subgradient of the partial sum omitting the i-th term, and perform backtracking on the step-size to ensure strict decrease, f(x^{k+1}) < f(x^k).
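A minimal sketch of this de-singularity step for the classical case p = 1 (the Fermat–Weber/geometric-median problem); the data points, weights, and halving step rule are illustrative choices, not from the cited paper:

```python
import numpy as np

def weber_f(x, A, w):
    # Weighted sum of distances to the anchors (p = 1 Weber objective)
    return sum(wi * np.linalg.norm(x - ai) for ai, wi in zip(A, w))

def desingularized_subgrad(x, A, w, tol=1e-12):
    """Sum of gradients of the smooth terms, omitting any term whose
    anchor coincides with x: the de-singularity step."""
    g = np.zeros_like(x)
    for ai, wi in zip(A, w):
        r = np.linalg.norm(x - ai)
        if r > tol:                  # skip the singular i-th term at x = a_i
            g += wi * (x - ai) / r
    return g

def weber_descent(x, A, w, iters=200):
    for _ in range(iters):
        g = desingularized_subgrad(x, A, w)
        if np.linalg.norm(g) < 1e-9:
            break
        eta, fx = 1.0, weber_f(x, A, w)
        # halve the step until the objective strictly decreases
        while weber_f(x - eta * g, A, w) >= fx and eta > 1e-12:
            eta *= 0.5
        x = x - eta * g
    return x

A = [np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.0, 4.0])]
w = [1.0, 1.0, 1.0]
# Start exactly at the singular anchor A[0]; the method escapes it and
# decreases the objective, where Weiszfeld's iteration would be undefined.
res = weber_descent(np.array([0.0, 0.0]), A, w)
```

Starting the iteration at a data point is exactly the configuration where the classical Weiszfeld update divides by zero; excising the singular term gives a well-defined direction along which f strictly decreases whenever the start is nonoptimal.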
6. Applications and Empirical Performance
Desingularization subgradient methods are applied to:
- Non-Lipschitz convex minimization: The radial method performs well when the domain of f is unbounded and f lacks any global slope bound, outperforming regular subgradient methods by avoiding orthogonal projections (Grimmer, 2017).
- Marginal and composite functions: SRDescent and its adaptive variant robustly minimize nonsmooth finite maxima, minima, and convex composites, demonstrating empirical superiority in high-dimensional quadratic problems and “cuspy” instances (Li et al., 11 May 2025).
- Singular location problems: The desingularization strategy solves the p-power Weber problem for 1 ≤ p < 2, escaping failure modes of the Weiszfeld algorithm and showing both theoretical and empirical linear or superlinear rates as p approaches 2 (Lai et al., 2024).
- Active manifold reduction: On functions definable in o-minimal structures, active manifold-based analysis establishes almost-sure convergence to local minimizers and enables transfer of smooth KL-type rates to nonsmooth settings (Davis et al., 2021).
Empirical results substantiate linear convergence for a range of problems, including the Chebyshev-Rosenbrock benchmark and large-scale marginal quadratic optimization, with notable robustness to nonregularity and high conditioning.
7. Connections and Generalizations
The desingularization paradigm connects intimately with:
- Bundle, gradient sampling, and prox-linear methods: SRDescent subsumes classical prox-linear steps as a regularization limit and interprets descent-oriented directions via auxiliary dual or primal-dual problems (Li et al., 11 May 2025).
- Active stratification and KL theory: Stratified and manifold-based approaches "factor out" singularity-driven slowdowns by explicit geometric reduction (Davis et al., 2021).
- Coordinate invariance: For composite and manifold-structured problems, certain desingularizing algorithms maintain oracle-complexity invariance under reparameterization, a property not shared by the naive subgradient flow (Davis et al., 2022).
A plausible implication is that future developments may further unify geometric, dual-regularized, and stratified reduction views for broader classes of nonconvex and non-Lipschitz objectives, enabling fast, reliable first-order methods even in the presence of pathological nonsmoothness.