Desingularization Subgradient Methods
- Desingularization subgradient methods redefine update rules by incorporating regularization and geometric reformulations to overcome non-descent directions in nonsmooth optimization.
- They employ techniques such as ε-regularization, radial reformulation, and selective exclusion of singular terms to stabilize iterations and promote convergence.
- Empirical and theoretical analyses show these methods achieve linear to superlinear convergence in applications like convex minimization, composite functions, and location problems.
The desingularization subgradient method comprises a set of algorithmic paradigms and theoretical frameworks in nonsmooth, possibly non-Lipschitz or singular optimization, wherein standard subgradient updates are regularized, geometrically projected, or otherwise modified to circumvent non-descent directions and singularities present in the subdifferential mapping. This class includes subgradient regularization methods, radial reformulations, active manifold reductions, and desingularizing subgradient strategies for convex, composite, and marginal functions. The desingularization principle enables stable, often provably convergent algorithms for nonsmooth objectives that otherwise exhibit pathological behavior under naive subgradient iteration.
1. Foundational Principles
Desingularization subgradient methods arise from the realization that, for general nonsmooth functions, a subgradient need not be a direction of descent; the Clarke subdifferential at a point may even contain directions orthogonal or adverse to the gradient flow. The core innovation is to transform the update rule, either via:
- Subgradient regularization: Introducing an auxiliary parameter ε > 0 and a family of regularized subdifferential maps G(x, ε) satisfying outer semicontinuity and recovering the minimal-norm element of the subdifferential ∂f(x) as ε → 0 (Li et al., 11 May 2025).
- Geometric reformulation: Operating on radial, manifold, or stratified representations of the feasible region or objective, leveraging properties such as Lipschitz continuity of transformed functions even when f itself lacks such structure (Grimmer, 2017, Davis et al., 2021).
- Subproblem modification at singular points: For cases like the extended Weber location problem, excising undefined or infinite terms at data sites and proceeding with a well-defined subgradient of a partial sum (Lai et al., 2024).
These approaches are united by the goal of stabilizing iteration near points where the classical subgradient method either fails to progress (due to stationary yet nonoptimal subdifferential structure) or suffers numerical divergence.
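The failure mode motivating all of these transforms is elementary: a nonminimal subgradient can point uphill. A minimal numeric illustration (the example function is constructed here for exposition, not taken from the cited papers): for f(x, y) = |x| + 2|y| at the point (1, 0), the Clarke subdifferential is {1} × [−2, 2], so (1, 2) is a valid subgradient whose negative increases f, while the minimal-norm subgradient (1, 0) yields descent.

```python
import numpy as np

def f(x):
    # f(x, y) = |x| + 2|y|; minimizer at the origin
    return abs(x[0]) + 2.0 * abs(x[1])

x = np.array([1.0, 0.0])
# Clarke subdifferential at (1, 0) is {1} x [-2, 2]
g_bad = np.array([1.0, 2.0])    # a valid subgradient...
g_min = np.array([1.0, 0.0])    # ...and the minimal-norm element

eta = 0.1
print(f(x - eta * g_bad) > f(x))   # True: this subgradient step *increases* f
print(f(x - eta * g_min) < f(x))   # True: the minimal-norm direction descends
```

This is exactly the pathology that ε-regularization (which tracks the minimal-norm element) is designed to avoid.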
2. Algorithmic Frameworks
Diverse algorithmic templates embody the desingularization concept:
| Method | Regularization/Structure | Main Update | Descent Guarantee |
|---|---|---|---|
| SRDescent | ε-regularized subdifferential G(x, ε) | x^{k+1} = x^k − η_k g^k, g^k ∈ G(x^k, ε_k) | Armijo-type: f(x^{k+1}) ≤ f(x^k) − α η_k‖g^k‖² (Li et al., 11 May 2025) |
| Radial Subgradient (RSM) | Radial lift Y_z of f | Projected subgradient step on the radial level set | Contraction of the radially transformed objective (Grimmer, 2017) |
| De-singularity for Weber | Removal of singular component | Omit the i-th term at x^k = a_i | Strict decrease or stationarity (Lai et al., 2024) |
In SRDescent, ε is reduced progressively and the associated regularized subgradient directions are computed to guarantee descent. Adaptive schemes, e.g., SRDescent-adapt, select ε_k based on line search or other criteria.
The Radial Subgradient Method maintains a strictly feasible anchor point and at each iteration moves radially towards or away from it, applying a structure-preserving transformation whose result is always convex and Lipschitz, thus circumventing the complications of f itself not being Lipschitz.
For singularity-prone problems such as the extended Weber median, iterations at singular data points are handled by removing the term(s) causing the singularity, ensuring the update remains defined and promotes descent.
3. Theoretical Guarantees and Rates
All desingularization subgradient methods feature rigorous descent and convergence theory:
- Descent property: For any x with 0 ∉ ∂f(x), there exist ε > 0 and a sufficiently small step η > 0 such that f(x − η g) < f(x) for g ∈ G(x, ε) (Li et al., 11 May 2025).
- Guarantees on stationarity: Algorithmic termination is certified at (approximate) Clarke-stationary points; under mild conditions, all cluster points of the iterates are Clarke-stationary.
- Rates:
- For SRDescent on composite or finite-max functions, sublinear or even linear convergence is observed empirically and, in certain settings, theoretically (Li et al., 11 May 2025).
- The Radial Subgradient Method achieves O(1/ε²) iteration complexity for absolute or relative error criteria, matching the classical rate for Lipschitz objectives without requiring f to be globally Lipschitz (Grimmer, 2017).
- In the extended Weber case, linear convergence is established for powers p with 1 ≤ p < 2 (Lai et al., 2024).
4. Desingularization Mechanisms
Each method achieves desingularization through distinct mathematical strategies:
- Subgradient regularization (SRDescent): The set-valued map G(x, ε) interpolates between robust regularized descent directions for positive ε and the minimal-norm subgradient as ε → 0. For finite maxima of smooth functions, G can be expressed via a quadratic penalty on the convex combination of gradients (a formulation combining simplex weights λ with an ε-weighted quadratic penalty) (Li et al., 11 May 2025).
- Radial reformulation: The radially transformed function Y_z, defined for each level z, smooths and regularizes the problem, yielding convex, globally Lipschitz level-set functions regardless of the unbounded slope or lack of regularity of f itself. All iterates are mapped back into interior regions via radial re-projection, maintaining feasibility and leveraging controlled geometry (Grimmer, 2017).
- Excising singularities: For functions with data-induced singularities, the subgradient at a data point a_i is replaced with the gradient of the partial sum omitting the singular i-th term, guaranteeing a valid update and ensuring escape from nonoptimal singular points in finitely many steps (Lai et al., 2024).
- Active manifold and stratification approach: In the context of subdifferentially regular and definable functions, convergence analysis reduces to a smooth gradient flow on the "active manifold" underlying the nonsmooth activity, with fast repulsion from normal directions and established Kurdyka–Łojasiewicz rates on the manifold (Davis et al., 2021).
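The active-manifold phenomenon can be seen on a toy example: for f(x, y) = |x| + y², the nonsmooth activity lives on the manifold {x = 0}, and plain subgradient iterations are driven to it (up to step-size resolution) long before the smooth coordinate converges. A minimal sketch; the objective and step size are illustrative choices here, not taken from the cited work:

```python
import numpy as np

# Toy objective with active manifold {x = 0}: f(x, y) = |x| + y^2
def subgrad(v):
    x, y = v
    # np.sign(0) = 0, so the kink contributes the minimal-norm element at x = 0
    return np.array([np.sign(x), 2.0 * y])

v = np.array([1.0, 1.0])
step = 0.05
hit_manifold, y_at_hit = None, None
for k in range(60):
    v = v - step * subgrad(v)
    if hit_manifold is None and abs(v[0]) < 1.5 * step:
        hit_manifold, y_at_hit = k, v[1]   # reached the manifold, to step resolution
```

The x-coordinate collapses to within one step of the manifold after about 20 iterations, while the y-coordinate is still far from its limit at that moment and thereafter follows the smooth gradient flow of y² along {x = 0}.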
5. Algorithmic Instantiations and Pseudocode
There are explicit and implementable pseudocode frameworks for desingularization subgradient methods:
SRDescent (Subgradient Regularization) (Li et al., 11 May 2025)
Input: x₀, initial ε₀ > 0, tolerances ε_tol, ν_tol ≥ 0
For k = 0, 1, 2, ...
    Inner loop (i = 0, 1, ...):
        ε_{k,i} = ε_{k,0}·2^{−i}
        Select g^{k,i} ∈ G(x^k, ε_{k,i})
        If ε_{k,i} ≤ ε_tol and ‖g^{k,i}‖ ≤ ν_tol:
            Return x^k
        For backtracking j = 0, ..., i:
            η = ε_{k,0}·2^{−j}
            If f(x^k − η g^{k,i}) ≤ f(x^k) − α η‖g^{k,i}‖²:
                Set η_k = η, i_k = i, break
    Update x^{k+1} = x^k − η_k g^{k,i_k}
    If ‖g^{k,i_k}‖ ≤ ν_k:
        ν_{k+1} = θ_ν·ν_k, ε_{k+1,0} = θ_ε·ε_{k,0}
    Else:
        ν_{k+1} = ν_k, ε_{k+1,0} = ε_{k,0}
End For
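As a concrete, simplified illustration, the following Python sketch instantiates the scheme above for a maximum of two smooth functions. The map G(x, ε) is approximated by the minimal-norm element over gradients of ε-active pieces, and the ε/ν bookkeeping is condensed into a single shrink rule; both are assumptions made for illustration, not the paper's exact construction:

```python
import numpy as np

# Example objective: f(x) = max(f1, f2), nonsmooth along the line x2 = 0
def f1(x): return x[0]**2 + (x[1] - 1.0)**2
def f2(x): return x[0]**2 + (x[1] + 1.0)**2
def grad1(x): return np.array([2*x[0], 2*(x[1] - 1.0)])
def grad2(x): return np.array([2*x[0], 2*(x[1] + 1.0)])
def f(x): return max(f1(x), f2(x))

def G(x, eps):
    """Surrogate regularized subdifferential: minimal-norm element of the
    convex hull of gradients of eps-active pieces (illustrative stand-in)."""
    fx = f(x)
    active = [g(x) for fi, g in ((f1, grad1), (f2, grad2)) if fx - fi(x) <= eps]
    if len(active) == 1:
        return active[0]
    a, b = active
    d = a - b
    # closed-form minimum of ||(1-t)a + t b|| over t in [0, 1]
    t = 0.0 if d @ d == 0 else np.clip((a @ d) / (d @ d), 0.0, 1.0)
    return (1 - t) * a + t * b

def srdescent(x, eps=1.0, alpha=0.25, tol=1e-8, max_iter=500):
    for _ in range(max_iter):
        g = G(x, eps)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol and eps <= tol:          # approximate stationarity
            break
        if gnorm <= np.sqrt(eps):                # direction too weak: shrink eps
            eps *= 0.5
            continue
        eta = 1.0                                # Armijo backtracking on the step
        while f(x - eta * g) > f(x) - alpha * eta * (g @ g) and eta > 1e-12:
            eta *= 0.5
        x = x - eta * g
    return x
```

Started at (1, 0.3), where a naive subgradient step oscillates across the kink at x2 = 0, the iterates settle near the minimizer (0, 0) with f → 1, because the min-norm combination of the two near-active gradients cancels the oscillating component.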
Radial Subgradient Method (Grimmer, 2017)
Input: initial level z₀ = f(0) < 0, x₀ = 0
For k = 0, 1, 2, ...
    Choose g_k ∈ ∂Y_{z_k}(x_k)
    Subgradient step: x̃_{k+1} = x_k − α_k g_k
    If Y_{z_k}(x̃_{k+1}) = 0:
        Terminate, unbounded below along ray
    Radial update: z_{k+1} = z_k / Y_{z_k}(x̃_{k+1}), x_{k+1} = x̃_{k+1} / Y_{z_k}(x̃_{k+1})
End For
De-singularity Subgradient for Extended Weber (Lai et al., 2024)
At a singular point x^k = a_i, use the subgradient of the partial sum omitting the i-th term, and perform backtracking on the step-size to ensure strict decrease, f(x^{k+1}) < f(x^k).
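A minimal sketch of this de-singularity step for the classical case p = 1 (the Fermat–Weber/geometric-median problem); the data points, weights, and halving step rule are illustrative choices, not from the cited paper:

```python
import numpy as np

def weber_f(x, A, w):
    # Weighted sum of distances to the anchors (p = 1 Weber objective)
    return sum(wi * np.linalg.norm(x - ai) for ai, wi in zip(A, w))

def desingularized_subgrad(x, A, w, tol=1e-12):
    """Sum of gradients of the smooth terms, omitting any term whose
    anchor coincides with x: the de-singularity step."""
    g = np.zeros_like(x)
    for ai, wi in zip(A, w):
        r = np.linalg.norm(x - ai)
        if r > tol:                  # skip the singular i-th term at x = a_i
            g += wi * (x - ai) / r
    return g

def weber_descent(x, A, w, iters=200):
    for _ in range(iters):
        g = desingularized_subgrad(x, A, w)
        if np.linalg.norm(g) < 1e-9:
            break
        eta, fx = 1.0, weber_f(x, A, w)
        # halve the step until the objective strictly decreases
        while weber_f(x - eta * g, A, w) >= fx and eta > 1e-12:
            eta *= 0.5
        x = x - eta * g
    return x

A = [np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.0, 4.0])]
w = [1.0, 1.0, 1.0]
# Start exactly at the singular anchor A[0]; the method escapes it and
# decreases the objective, where Weiszfeld's iteration would be undefined.
res = weber_descent(np.array([0.0, 0.0]), A, w)
```

Starting the iteration at a data point is exactly the configuration where the classical Weiszfeld update divides by zero; excising the singular term gives a well-defined direction along which f strictly decreases whenever the start is nonoptimal.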
6. Applications and Empirical Performance
Desingularization subgradient methods are applied to:
- Non-Lipschitz convex minimization: The radial method performs well when the domain of f is unbounded and f lacks any global slope bound, outperforming regular subgradient methods by avoiding orthogonal projections (Grimmer, 2017).
- Marginal and composite functions: SRDescent and its adaptive variant robustly minimize nonsmooth finite maxima, minima, and convex composites, demonstrating empirical superiority in high-dimensional quadratic problems and “cuspy” instances (Li et al., 11 May 2025).
- Singular location problems: The desingularization strategy solves the p-power Weber problem for 1 ≤ p < 2, escaping failure modes of the Weiszfeld algorithm and showing both theoretical and empirical linear or superlinear rates as p approaches 2 (Lai et al., 2024).
- Active manifold reduction: On functions definable in o-minimal structures, active manifold-based analysis establishes almost-sure convergence to local minimizers and enables transfer of smooth KL-type rates to nonsmooth settings (Davis et al., 2021).
Empirical results substantiate linear convergence for a range of problems, including the Chebyshev-Rosenbrock benchmark and large-scale marginal quadratic optimization, with notable robustness to nonregularity and high conditioning.
7. Connections and Generalizations
The desingularization paradigm connects intimately with:
- Bundle, gradient sampling, and prox-linear methods: SRDescent subsumes classical prox-linear steps as a regularization limit and interprets descent-oriented directions via auxiliary dual or primal-dual problems (Li et al., 11 May 2025).
- Active stratification and KL theory: Stratified and manifold-based approaches "factor out" singularity-driven slowdowns by explicit geometric reduction (Davis et al., 2021).
- Coordinate invariance: For composite and manifold-structured problems, certain desingularizing algorithms maintain oracle-complexity invariance under reparameterization, a property not shared by the naive subgradient flow (Davis et al., 2022).
A plausible implication is that future developments may further unify geometric, dual-regularized, and stratified reduction views for broader classes of nonconvex and non-Lipschitz objectives, enabling fast, reliable first-order methods even in the presence of pathological nonsmoothness.