
Desingularization Subgradient Methods

Updated 22 December 2025
  • Desingularization subgradient methods redefine update rules by incorporating regularization and geometric reformulations to overcome non-descent directions in nonsmooth optimization.
  • They employ techniques such as ε-regularization, radial reformulation, and selective exclusion of singular terms to stabilize iterations and promote convergence.
  • Empirical and theoretical analyses show these methods achieve linear to superlinear convergence in applications like convex minimization, composite functions, and location problems.

The desingularization subgradient method comprises a set of algorithmic paradigms and theoretical frameworks in nonsmooth, possibly non-Lipschitz or singular optimization, wherein standard subgradient updates are regularized, geometrically projected, or otherwise modified to circumvent non-descent directions and singularities present in the subdifferential mapping. This class includes subgradient regularization methods, radial reformulations, active manifold reductions, and desingularizing subgradient strategies for convex, composite, and marginal functions. The desingularization principle enables stable, often provably convergent algorithms for nonsmooth objectives that otherwise exhibit pathological behavior under naive subgradient iteration.

1. Foundational Principles

Desingularization subgradient methods arise from the realization that, for general nonsmooth functions, a subgradient $v\in\partial f(x)$ need not be a direction of descent; the Clarke subdifferential at a point may even contain directions orthogonal or adverse to the gradient flow. The core innovation is to transform the update rule via one of the following:

  • Subgradient regularization: Introducing an auxiliary parameter $\varepsilon>0$ and a family of regularized subdifferential maps $G(x,\varepsilon)$ satisfying outer semicontinuity and recovering the minimal-norm element of $\partial f(x)$ as $\varepsilon\to 0$ (Li et al., 11 May 2025).
  • Geometric reformulation: Operating on radial, manifold, or stratified representations of the feasible region or objective, leveraging properties such as Lipschitz continuity of transformed functions even when $f$ itself lacks such structure (Grimmer, 2017; Davis et al., 2021).
  • Subproblem modification at singular points: For cases like the extended Weber location problem, excising undefined or infinite terms at data sites and proceeding with a well-defined subgradient of a partial sum (Lai et al., 2024).

These approaches are united by the goal of stabilizing iteration near points where the classical subgradient method either fails to progress (due to stationary yet nonoptimal subdifferential structure) or suffers numerical divergence.
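To see why naive subgradient steps can fail, consider a toy example (illustrative, not from the cited papers): for $f(u,v)=|u|+2|v|$, the subdifferential at $(1,0)$ is $\{(1,s):s\in[-2,2]\}$. A generic subgradient can point uphill, while the minimal-norm element yields descent:

```python
# For f(u, v) = |u| + 2|v|, the Clarke subdifferential at (1, 0) is
# {(1, s) : s in [-2, 2]}.  A generic subgradient need not give a descent
# direction, while the minimal-norm subgradient does.

def f(u, v):
    return abs(u) + 2 * abs(v)

x = (1.0, 0.0)
t = 0.01  # small step size

# Generic subgradient g = (1, 2): stepping along -g *increases* f.
g_bad = (1.0, 2.0)
f_bad = f(x[0] - t * g_bad[0], x[1] - t * g_bad[1])

# Minimal-norm subgradient g = (1, 0): stepping along -g decreases f.
g_min = (1.0, 0.0)
f_min = f(x[0] - t * g_min[0], x[1] - t * g_min[1])

print(f(*x), f_bad, f_min)  # f rises to ~1.03 along -g_bad, falls to ~0.99 along -g_min
```

This is exactly the pathology that regularized maps $G(x,\varepsilon)$ are designed to avoid: they recover the minimal-norm element as $\varepsilon\to 0$.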

2. Algorithmic Frameworks

Diverse algorithmic templates embody the desingularization concept:

| Method | Regularization/Structure | Main Update | Descent Guarantee |
| --- | --- | --- | --- |
| SRDescent | $\varepsilon$-regularized $G(x,\varepsilon)$ | $x^+ = x - \eta g$, $g\in G(x,\varepsilon)$ | Armijo-type: $f(x^+)\leq f(x) - \alpha\eta\|g\|^2$ (Li et al., 11 May 2025) |
| Radial Subgradient (RSM) | Radial lift $Y_z(x)$ | Projected on radial level set | Contraction in terms of $Y_z$ (Grimmer, 2017) |
| De-singularity for Weber | Removal of singular component | Omit $k$-th term at $a_k$ | Strict decrease or stationarity (Lai et al., 2024) |

In SRDescent, $\varepsilon$ is reduced progressively and associated regularized subgradient directions $g\in G(x,\varepsilon)$ are computed to guarantee descent. Adaptive schemes, e.g., SRDescent-adapt, select $\varepsilon$ based on line search or other criteria.

The Radial Subgradient Method maintains a strictly feasible point and at each iteration moves radially towards or away from this anchor, applying a structure-preserving transformation $Y_z(x)$ that is always convex and Lipschitz, thus circumventing the complications of $f$ itself not being Lipschitz.
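The transformation can be evaluated numerically in the scalar case. The following sketch (illustrative; it assumes a 1-D convex $f$ with $f(0)<0$ and $z<0$, so that $y\mapsto y\,f(x/y)$ crosses the level $z$ exactly once) computes $Y_z(x)=\inf\{y>0 : y f(x/y)\le z\}$ by bisection and checks it against the closed form available for $f(u)=u^2-1$:

```python
import math

def radial_transform(f, x, z, lo=1e-8, hi=1e8, iters=200):
    """Y_z(x) = inf{ y > 0 : y * f(x / y) <= z } via bisection (1-D sketch).

    Assumes f convex with f(0) < 0 and z < 0, so the perspective
    y -> y * f(x / y) crosses level z exactly once on (0, inf).
    """
    g = lambda y: y * f(x / y)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) <= z:
            hi = mid       # condition holds: infimum lies at or below mid
        else:
            lo = mid       # condition fails: infimum lies above mid
    return hi

f = lambda u: u * u - 1.0   # convex, f(0) = -1 < 0
y = radial_transform(f, x=1.0, z=-0.5)
# Closed form for this f: Y_z(x) = (-z + sqrt(z^2 + 4 x^2)) / 2
print(y)  # ≈ 1.28078
```

The key point is that $Y_z$ stays Lipschitz and convex even though $f$ itself may have unbounded slope.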

For singularity-prone problems such as the extended Weber median, iterations at singular data points are handled by removing the term(s) causing the singularity, ensuring the update remains defined and promotes descent.

3. Theoretical Guarantees and Rates

Desingularization subgradient methods come with rigorous descent and convergence theory:

  • Descent property: For any $g\in G(x,\varepsilon)$ with $0\notin\partial f(x)$, there exist $\varepsilon$ and a sufficiently small step $\eta$ such that $f(x-\eta g)\le f(x)-\alpha \eta \|g\|^2$ (Li et al., 11 May 2025).
  • Guarantees on stationarity: Algorithmic termination is certified at (approximate) Clarke-stationary points with $0\in\partial f(x)$; under mild conditions, all cluster points of the iterates are Clarke-stationary.
  • Rates:
    • For SRDescent on composite or finite-max functions, sublinear or even linear convergence is observed empirically and, in certain settings, theoretically (Li et al., 11 May 2025).
    • The Radial Subgradient Method achieves $O(1/\varepsilon^2)$ iteration complexity for absolute or relative error criteria, matching the classical rate for Lipschitz objectives without requiring $f$ to be globally Lipschitz (Grimmer, 2017).
    • In the extended Weber case, linear convergence is established for $1\le q<2$ when the minimizer satisfies $x_*=a_k$ (Lai et al., 2024).

4. Desingularization Mechanisms

Each method achieves desingularization through distinct mathematical strategies:

  • Subgradient regularization (SRDescent): The set-valued map $G(x,\varepsilon)$ interpolates between robust regularized descent directions for positive $\varepsilon$ and the minimal-norm subgradient as $\varepsilon\to 0$. For finite maxima of smooth functions, $G(x,\varepsilon)$ can be expressed via a quadratic penalty on the convex combination of gradients (see the formulation with $\sum_{i=1}^m y_i\nabla f_i(x)$ and an $\varepsilon$-weighted $\ell_2$-penalty) (Li et al., 11 May 2025).
  • Radial reformulation: The function $Y_z(x) = \inf\{ y > 0 : y f(x / y) \le z \}$ for $z<0$ smooths and regularizes $f$, yielding convex, globally Lipschitz functions $Y_z(\cdot)$, regardless of the unbounded slope or lack of regularity in $f$ itself. All iterates are mapped back into interior regions via radial re-projection, maintaining feasibility and leveraging controlled geometry (Grimmer, 2017).
  • Excising singularities: For functions with data-induced singularities, the gradient at $x=a_k$ is replaced with the gradient of the sum omitting the term with $i=k$, guaranteeing a valid update and ensuring escape from nonoptimal singular points in finitely many steps (Lai et al., 2024).
  • Active manifold and stratification approach: In the context of subdifferentially regular and definable functions, convergence analysis reduces to a smooth gradient flow on the "active manifold" $M$ underlying the nonsmooth activity, with fast repulsion from normal directions and established Kurdyka–Łojasiewicz rates on $M$ (Davis et al., 2021).

5. Algorithmic Instantiations and Pseudocode

Explicit and implementable pseudocode frameworks exist for desingularization subgradient methods. Two templates follow: first SRDescent with adaptive $\varepsilon$-regularization, then the Radial Subgradient Method.

Input: x⁰, initial ε_{0,0} > 0, ν₀ > 0, α ∈ (0,1), θ_ν, θ_ε ∈ (0,1), tolerances ε_tol, ν_tol ≥ 0
For k = 0, 1, 2, ...
  Inner loop (i = 0, 1, ...):
    ε_{k,i} = ε_{k,0}·2^{−i}
    Select g^{k,i} ∈ G(x^k, ε_{k,i})
    If ε_{k,i} ≤ ε_tol and ‖g^{k,i}‖ ≤ ν_tol:
      Return x^k
    For backtracking j = 0, ..., i:
      η = ε_{k,0}·2^{−j}
      If f(x^k - η g^{k,i}) ≤ f(x^k) - α η‖g^{k,i}‖²:
        Set η_k = η, i_k = i, break
  Update x^{k+1} = x^k - η_k g^{k,i_k}
  If ‖g^{k,i_k}‖ ≤ ν_k:
    ν_{k+1} = θ_ν·ν_k, ε_{k+1,0} = θ_ε·ε_{k,0}
  Else:
    ν_{k+1} = ν_k, ε_{k+1,0} = ε_{k,0}
End For
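This template can be rendered as a compact runnable loop. The following sketch (an illustrative simplification, not the exact SRDescent of Li et al.) takes the regularized direction $g\in G(x,\varepsilon)$ to be the minimal-norm convex combination of gradients of $\varepsilon$-active pieces, applied to $f(x)=\max(x_1+x_2,\,x_1-x_2)+\|x\|^2$, whose minimizer is $(-1/2, 0)$:

```python
import numpy as np

# f(x) = max(f1, f2) with a quadratic term; minimizer is (-0.5, 0).
f1 = lambda x: x[0] + x[1] + x @ x
f2 = lambda x: x[0] - x[1] + x @ x
g1 = lambda x: np.array([1.0, 1.0]) + 2 * x
g2 = lambda x: np.array([1.0, -1.0]) + 2 * x
f = lambda x: max(f1(x), f2(x))

def direction(x, eps):
    # Minimal-norm convex combination of gradients of eps-active pieces
    # (a simplified stand-in for g in G(x, eps)).
    if abs(f1(x) - f2(x)) > eps:
        return (g1 if f1(x) > f2(x) else g2)(x)
    a, b = g1(x), g2(x)
    d = a - b
    t = np.clip(b @ (b - a) / (d @ d), 0.0, 1.0) if d @ d > 0 else 0.0
    return t * a + (1 - t) * b

x, eps, alpha = np.array([1.0, 1.0]), 0.5, 0.1
for _ in range(200):
    g = direction(x, eps)
    if np.linalg.norm(g) <= 1e-8:
        break
    eta = 1.0
    while f(x - eta * g) > f(x) - alpha * eta * (g @ g):
        eta *= 0.5          # backtracking Armijo line search
        if eta < 1e-12:
            break
    x = x - eta * g
    eps *= 0.9              # progressively reduce the regularization
print(x)  # converges to the minimizer (-0.5, 0)
```

Near the kink $x_2=0$, the one-sided gradients $(1+2x_1,\pm 1+2x_2)$ would make a plain subgradient step zigzag, whereas their minimal-norm combination cancels the $\pm 1$ components and descends cleanly.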

Input: initial level z₀ = f(0) < 0, x₀ = 0
For k = 0, 1, 2, ...
  Choose g_k ∈ ∂Y_{z_k}(x_k)
  Subgradient step: x̃_{k+1} = x_k - α_k g_k
  If Y_{z_k}(x̃_{k+1}) = 0:
    Terminate, unbounded below along ray
  Radial update: z_{k+1} = z_k / Y_{z_k}(x̃_{k+1}), x_{k+1} = x̃_{k+1} / Y_{z_k}(x̃_{k+1})
End For

For the extended Weber problem, at $x^p=a_k$ use $g^p = \sum_{i\neq k} q w_i \|a_k-a_i\|^{q-2}(a_k - a_i)$ and perform backtracking on the step size to ensure $f(a_k - \alpha g^p)<f(a_k)$.
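A minimal numerical sketch of this rule follows; the data, weights, and the helper names `weber_f` and `desingularized_grad` are illustrative choices, not from the paper:

```python
import numpy as np

def weber_f(x, A, w, q):
    # f(x) = sum_i w_i * ||x - a_i||^q
    return float(np.sum(w * np.linalg.norm(A - x, axis=1) ** q))

def desingularized_grad(k, A, w, q):
    # Gradient of the partial sum that omits the singular i = k term,
    # evaluated at x = a_k.
    d = A[k] - np.delete(A, k, axis=0)          # a_k - a_i for i != k
    nrm = np.linalg.norm(d, axis=1)
    wi = np.delete(w, k)
    return (q * wi * nrm ** (q - 2)) @ d

# Toy instance (illustrative data)
A = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
w = np.array([1.0, 1.0, 1.0])
q, k = 1.5, 0

g = desingularized_grad(k, A, w, q)
alpha, fk = 1.0, weber_f(A[k], A, w, q)
for _ in range(60):                             # backtrack on the step size
    if weber_f(A[k] - alpha * g, A, w, q) < fk:
        break                                   # strict decrease achieved
    alpha *= 0.5
print(weber_f(A[k] - alpha * g, A, w, q), "<", fk)
```

Since the $i=k$ term vanishes at $x=a_k$ but its gradient blows up for $q<2$, dropping it leaves a well-defined direction along which backtracking finds a strictly decreasing step.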

6. Applications and Empirical Performance

Desingularization subgradient methods are applied to:

  • Non-Lipschitz convex minimization: The radial method performs well when the domain of $f$ is unbounded and $f$ lacks any global slope constraint, outperforming regular subgradient methods by avoiding orthogonal projections (Grimmer, 2017).
  • Marginal and composite functions: SRDescent and its adaptive variant robustly minimize nonsmooth finite maxima, minima, and convex composites, demonstrating empirical superiority in high-dimensional quadratic problems and “cuspy” instances (Li et al., 11 May 2025).
  • Singular location problems: The desingularization strategy solves the $q$-power Weber problem for $1\leq q<2$, escaping failure modes of the Weiszfeld algorithm and showing both theoretical and empirical linear or superlinear rates as $q$ approaches 2 (Lai et al., 2024).
  • Active manifold reduction: On functions definable in o-minimal structures, active manifold-based analysis establishes almost-sure convergence to local minimizers and enables transfer of smooth KL-type rates to nonsmooth settings (Davis et al., 2021).

Empirical results substantiate linear convergence for a range of problems, including the Chebyshev-Rosenbrock benchmark and large-scale marginal quadratic optimization, with notable robustness to nonregularity and high conditioning.

7. Connections and Generalizations

The desingularization paradigm connects intimately with:

  • Bundle, gradient sampling, and prox-linear methods: SRDescent subsumes classical prox-linear steps as a regularization limit and interprets descent-oriented directions via auxiliary dual or primal-dual problems (Li et al., 11 May 2025).
  • Active stratification and KL theory: Stratified and manifold-based approaches "factor out" singularity-driven slowdowns by explicit geometric reduction (Davis et al., 2021).
  • Coordinate invariance: For composite and manifold-structured problems, certain desingularizing algorithms maintain oracle-complexity invariance under reparameterization, a property not shared by the naive subgradient flow (Davis et al., 2022).

A plausible implication is that future developments may further unify geometric, dual-regularized, and stratified reduction views for broader classes of nonconvex and non-Lipschitz objectives, enabling fast, reliable first-order methods even in the presence of pathological nonsmoothness.
