
Kurdyka-Łojasiewicz Inequality in Optimization

Updated 11 January 2026
  • The Kurdyka-Łojasiewicz inequality is a foundational result linking function value gaps with subgradient norms, crucial for analyzing convergence in optimization.
  • It generalizes the classical Łojasiewicz gradient inequality to nonsmooth, infinite-dimensional, and composite settings with explicit rate regimes.
  • Recent advances introduce the exact modulus and calculus rules for desingularizers, enhancing rate optimality across various algorithmic frameworks.

The Kurdyka-Łojasiewicz (KL) inequality is a foundational result in nonsmooth and nonconvex analysis that quantitatively relates function value gaps to subgradient norms and provides the key geometric condition underlying the global convergence and complexity analysis of modern optimization algorithms. The inequality generalizes the classical Łojasiewicz gradient inequality from real-analytic functions to a much broader class including subanalytic, semialgebraic, and even nonsmooth or infinite-dimensional energies. The KL property governs not just the existence of convergence but also explicit rates—linear, sublinear, or superlinear—depending on the geometric exponent or desingularizing function present. Recent advances define the exact modulus of the KL property and develop calculus rules for the generalized (possibly nondifferentiable) desingularizers, allowing very fine control and transfer of this property across sums, minima, compositions, and separable structures of functions.

1. Formal Statement and Generalizations

For a proper, lower semicontinuous function $f:\mathbb{R}^n \to (-\infty, +\infty]$, the KL inequality at a critical point $\bar{x} \in \operatorname{dom} \partial f$ postulates the existence of a neighborhood $U$, a threshold $\eta > 0$, and a concave, strictly increasing desingularizing function $\varphi: [0, \eta) \to \mathbb{R}_+$, right-continuous at $0$ with $\varphi(0) = 0$, such that

$$\varphi'_-(f(x) - f(\bar{x})) \cdot \operatorname{dist}(0, \partial f(x)) \geq 1$$

for all $x \in U$ with $0 < f(x) - f(\bar{x}) < \eta$ (Wang et al., 2020, Wang et al., 2021). Here, $\varphi'_-$ denotes the left derivative. In the nonsmooth (general Banach or Hilbert space) setting, $\operatorname{dist}(0, \partial f(x))$ is replaced by the minimal norm of a (limiting, Mordukhovich, or Clarke) subgradient (Gerth et al., 2019, Chill et al., 2016).

Of particular practical and theoretical interest is the exponent (power) desingularizer $\varphi(s) = c s^{1-\theta}$, $\theta \in [0,1)$, yielding the exponent KL inequality

$$\operatorname{dist}(0, \partial f(x)) \geq \frac{1}{c(1-\theta)} (f(x) - f(\bar{x}))^\theta$$

(Qian et al., 15 Apr 2025, Qian et al., 2022, Li et al., 2021, Bento et al., 2024). The parameter $\theta$ (the KL exponent) dictates the rate regime for algorithms.
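
As a quick sanity check on the exponent inequality, consider the model function $f(x) = |x|^p$ with $p > 1$, which satisfies the KL property at $\bar{x} = 0$ with exponent $\theta = 1 - 1/p$ and constant $c = 1$ (here $|f'(x)| = p|x|^{p-1} = p \, f(x)^\theta$, and $1/(c(1-\theta)) = p$). The following sketch is illustrative and not taken from the cited papers:

```python
import numpy as np

# Verify the exponent KL inequality for f(x) = |x|^p at x̄ = 0:
# dist(0, ∂f(x)) = |f'(x)| = p|x|^{p-1} must dominate (1/(c(1-θ))) f(x)^θ
# with θ = 1 - 1/p and c = 1 (equality holds for this model function).
p = 3.0
theta = 1.0 - 1.0 / p          # KL exponent of |x|^p
c = 1.0

x = np.linspace(-1, 1, 2001)
x = x[x != 0]                  # inequality is only required where f(x) > f(x̄)
f = np.abs(x) ** p
grad_norm = p * np.abs(x) ** (p - 1)              # |f'(x)|
lower_bound = (1.0 / (c * (1 - theta))) * f ** theta

assert np.all(grad_norm >= lower_bound - 1e-12)   # KL inequality holds on U
```

For this family the inequality is tight, which is why $|x|^p$ is the standard calibration example for exponent-based rate analyses.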

2. Exact Modulus and Optimal Desingularizing Functions

Recent work (Wang et al., 2020) constructs the exact modulus of the generalized concave KL property, answering the open question of what the optimal desingularizing function is. For a given $f$ as above, define

$$h(s) = \sup \{ \operatorname{dist}(0, \partial f(x))^{-1} : x \in U,\; s \leq f(x) - f(\bar{x}) < \eta \}$$

and

$$\Psi^*(t) = \int_0^t h(s)\, ds, \quad t \in [0, \eta), \quad \Psi^*(0) = 0.$$

This $\Psi^*$ is the minimal possible concave desingularizer: every other such function $\varphi$ satisfies $\Psi^*(t) \leq \varphi(t)$, and $\Psi^*$ itself delivers the sharpest possible finite-length bound for iterates in nonconvex, nonsmooth algorithms (e.g., PALM) (Wang et al., 2020). Constructed examples show that classical exponent desingularizers may significantly overestimate rates compared to the exact modulus.
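
For a concrete one-dimensional illustration (our own grid construction, not a worked example from the paper), take $f(x) = x^2$ on $U = (-1, 1)$ with $\bar{x} = 0$: analytically $h(s) = 1/(2\sqrt{s})$ and $\Psi^*(t) = \sqrt{t}$, so the exact modulus recovers the power desingularizer $\varphi(s) = s^{1/2}$.

```python
import numpy as np

# Grid approximation of h(s) and Ψ*(t) for f(x) = x², U = (-1, 1), x̄ = 0.
xs = np.linspace(-1, 1, 20001)
xs = xs[np.abs(xs) > 1e-8]
fvals = xs ** 2
grad_norms = np.abs(2 * xs)          # dist(0, ∂f(x)) = |f'(x)| = 2|x|

def h(s):
    """sup of 1/|f'(x)| over x in U with f(x) >= s (grid approximation)."""
    return np.max(1.0 / grad_norms[fvals >= s])

def psi_star(t, s0=1e-6, n=200):
    """Trapezoid approximation of ∫_{s0}^{t} h(s) ds ≈ Ψ*(t), using a
    √-spaced grid because h(s) ~ s^{-1/2} blows up at 0."""
    grid = np.linspace(np.sqrt(s0), np.sqrt(t), n) ** 2
    vals = np.array([h(s) for s in grid])
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid)))

ts = np.linspace(0.05, 0.5, 10)
psi = np.array([psi_star(t) for t in ts])
assert np.allclose(psi, np.sqrt(ts), rtol=0.05)   # matches Ψ*(t) = √t
```

The numerics reproduce the closed form $\Psi^*(t) = \sqrt{t}$ to within discretization error, confirming that for this function the classical exponent desingularizer is already optimal.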

3. Calculus Rules: Sums, Minima, Separable and Composite Structures

The generalized concave KL property, where desingularizers may be nondifferentiable, admits calculus rules far beyond the classical exponent case. For sums, minima, separable sums, and compositions, explicit procedures are given to deduce the KL property and to transfer or recompute the optimal desingularizer for the result (Wang et al., 2021).

For a sum $f = \sum_{i=1}^m f_i$, if each $f_i$ has the generalized KL property at $\bar{x}$ with modulus $\varphi_i$, then under linear regularity,

$$\varphi(t) = \frac{1}{\alpha} \int_0^t \max_i \varphi'_{i,-}(s/m)\, ds$$

is a valid desingularizer for $f$, potentially much smaller than the classical exponent-based one. For minima, the modulus is constructed from the active indices. For compositions $g \circ F$ with $\nabla F$ of full rank, a scaling of the modulus applies. The rules do not require $\varphi$ to be of the simple form $c t^{1-\theta}$ nor differentiable, greatly enlarging the set of functions for which sharp rates apply (Wang et al., 2021).
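
A minimal numerical sketch of the sum-rule formula, assuming $m = 2$, $\alpha = 1$, and hypothetical power moduli $\varphi_i(s) = c_i s^{1-\theta_i}$ with $\theta_1 = 1/2$, $\theta_2 = 2/3$ (all illustrative choices, not values from the paper):

```python
import numpy as np

# Evaluate φ(t) = (1/α) ∫_0^t max_i φ'_{i,-}(s/m) ds for two power moduli.
m, alpha = 2, 1.0
c = [1.0, 1.0]
theta = [0.5, 2.0 / 3.0]

def dphi(i, s):
    """Left derivative φ'_{i,-}(s) of the power desingularizer c_i s^{1-θ_i}."""
    return c[i] * (1 - theta[i]) * s ** (-theta[i])

def phi_sum(t, n=4000):
    # substitute s = u³ (ds = 3u² du) to tame the s^{-2/3} singularity at 0
    u = np.linspace(0.0, t ** (1.0 / 3.0), n)[1:]
    s = u ** 3
    integrand = np.maximum(dphi(0, s / m), dphi(1, s / m)) * 3 * u ** 2
    du = u[1] - u[0]
    return float(np.sum(integrand) * du) / alpha

# For small t the larger (θ = 2/3) term dominates: φ(t) = 2^{2/3} t^{1/3}
assert np.isclose(phi_sum(0.1), 2 ** (2.0 / 3.0) * 0.1 ** (1.0 / 3.0), rtol=1e-2)
```

Note how the combined modulus inherits the worse exponent near the critical value, consistent with the intuition that the slowest-desingularizing summand controls the local geometry.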

4. Algorithmic Consequences: Convergence Rates and Complexity

The KL inequality is the key tool that upgrades mere subsequence or function value convergence into full sequence convergence with explicit geometric or polynomial rates. For function sequences satisfying suitable descent, error, and continuity conditions (summarized in the Attouch-Bolte framework), the implications are:

  • If $\theta = 0$, finite-time convergence (termination) occurs.
  • If $0 < \theta \leq 1/2$, linear (exponential, geometric) convergence holds: $\|x^k - x^*\| \leq \gamma \rho^k$.
  • If $1/2 < \theta < 1$, polynomial (sublinear) convergence holds: $\|x^k - x^*\| = O(k^{-(1-\theta)/(2\theta-1)})$.
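
These regimes can be observed directly on the model functions $f(x) = |x|^p$, whose KL exponent at the origin is $\theta = 1 - 1/p$. The sketch below (illustrative, not from the cited papers) runs plain gradient descent for $p = 2$ ($\theta = 1/2$, linear rate) and $p = 4$ ($\theta = 3/4$, sublinear rate $O(k^{-(1-\theta)/(2\theta-1)}) = O(k^{-1/2})$):

```python
import numpy as np

# Gradient descent on f(x) = |x|^p, f'(x) = p|x|^{p-1} sign(x).
def gd(p, x0=0.5, step=0.1, iters=2000):
    x = x0
    traj = []
    for _ in range(iters):
        x = x - step * p * abs(x) ** (p - 1) * np.sign(x)
        traj.append(abs(x))
    return np.array(traj)

lin = gd(2)   # x_{k+1} = 0.8 x_k: geometric decay (θ = 1/2 regime)
sub = gd(4)   # x_{k+1} = x_k(1 - 0.4 x_k²): |x_k| ≍ k^{-1/2} (θ = 3/4 regime)

assert lin[-1] < 1e-100                       # linear rate: ~10^{-194} here
assert 0.1 < sub[-1] * np.sqrt(2000) < 10     # sublinear rate ~ k^{-1/2}
```

The contrast is stark after 2000 iterations: the $\theta = 1/2$ run is at machine-negligible error while the $\theta = 3/4$ run has only reached about $10^{-2}$.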

These rates apply to nonmonotone descent methods (Qian et al., 15 Apr 2025), PALM (Wang et al., 2020), stochastic algorithms (SGD, random reshuffling, and variance-reduced schemes) (Fatkhullin et al., 2022, Li et al., 2021), decentralized algorithms (Wu et al., 24 Nov 2025), and generalized descent frameworks (DEAL, boosted proximal gradient, high-order proximal methods) (Ahookhosh et al., 13 Nov 2025, Qian et al., 2022).

Several works establish optimality and sharpness: the exact-modulus constructions prove that classical exponent choices are not always rate-minimizing (Wang et al., 2020, Wang et al., 2021). In stochastic optimization under global KL conditions, the best known sample complexities for SGD and its variants are attained (Fatkhullin et al., 2022), with exponents dictating whether rates are $O(\varepsilon^{-2})$ or slower.
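
As a toy illustration of the stochastic regime (an experiment of ours, not a reproduction of the cited complexity proofs), SGD on the strongly convex quadratic $f(x) = \|x\|^2/2$, which satisfies the PL/KL inequality with $\theta = 1/2$ and $\mu = 1$, attains the familiar $O(1/k)$ rate in function value with decreasing steps $\gamma_k = 2/(k+2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
# SGD on f(x) = ||x||²/2 in R^10 with additive unit-Gaussian gradient noise.
x = np.full(10, 5.0)
fvals = []
for k in range(20000):
    g = x + rng.standard_normal(10)       # unbiased stochastic gradient
    x = x - (2.0 / (k + 2)) * g           # γ_k = 2/(k+2), the PL schedule
    fvals.append(0.5 * np.dot(x, x))

# Function values decay like O(σ²/k); k · f_k stays bounded.
assert fvals[-1] < 0.01
assert fvals[-1] * 20000 < 200
```

The bounded product $k \cdot f(x^k)$ is the empirical signature of the $O(1/k)$ regime; with a constant step size the iterates would instead stall at a noise-dominated plateau.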

5. Explicit and Effective KL Exponents

For semialgebraic, Nash, and polynomial cases, explicit degree-based formulas for the exponent in the KL/Łojasiewicz inequality are given (Dinh et al., 2015, Osińska-Ulrych et al., 2018). For the largest-eigenvalue functions of polynomial matrices,

$$\inf_{w \in \partial^\circ f(x)} \|w\| \geq c\, |f(x) - f(\bar{x})|^{1 - 1/\mathscr{R}(2n + p(n+1),\, d+3)}$$

with $\mathscr{R}(n, d) = d(3d-3)^{n-1}$, yielding concrete bounds for convergence and error estimates in eigenvalue optimization and SDP (Dinh et al., 2015). For Nash functions, universality of the effective exponent estimates is established via the degree of the minimal polynomial defining the function (Osińska-Ulrych et al., 2018).
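
The degree bound is straightforward to evaluate; the helper below (our own illustration, with hypothetical parameter values) shows how quickly $\mathscr{R}$ grows and hence how close the guaranteed exponent is to $1$ even for tiny problems:

```python
# Degree-based Łojasiewicz exponent bound R(n, d) = d(3d - 3)^{n-1}
# and the exponent 1 - 1/R(2n + p(n+1), d + 3) for the largest-eigenvalue
# function of a p x p matrix with n-variable, degree-d polynomial entries.
def R(n, d):
    return d * (3 * d - 3) ** (n - 1)

def kl_exponent(n, p, d):
    return 1.0 - 1.0 / R(2 * n + p * (n + 1), d + 3)

# Even for n = p = d = 2 the bound uses R(10, 5) = 5 * 12^9 ≈ 2.6e10,
# so the guaranteed exponent is astronomically close to 1 (worst case).
assert R(1, 2) == 2 and R(2, 2) == 6
assert 0.99 < kl_exponent(2, 2, 2) < 1.0
```

Such worst-case exponents are typically far from the true local geometry, which is why sharper structure-specific exponents (Sections 2 and 3) matter for realistic rate predictions.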

6. KL Inequality in Infinite-Dimensional and Banach Settings

The KL property extends beyond finite-dimensional and real-analytic cases. In Banach spaces, it is equivalent to classical regularity conditions (variational inequality, source conditions, distance bounds) for Tikhonov regularization. Explicit equivalence proofs connect KL and convergence rate regularity (Gerth et al., 2019). In semiconvex or lower semicontinuous energies in infinite-dimensional Hilbert spaces, the KL-Simon variant ensures stabilization of abstract gradient systems, with the scalar exponent delivering exponential or algebraic rates of convergence to equilibria (Chill et al., 2016).

Trust-region subproblems, including nonconvex quadratics over balls, also exhibit KL properties—with explicit local exponents (1/2 or 3/4) and matching Hölderian error bound moduli globally (Jiang et al., 2019).

7. Controversies, Limitations, and Open Problems

One key phenomenon is the incompatibility of low KL exponents ($\theta < 1/2$) with uniform Lipschitz gradient regularity in DC programming and related settings (Bento et al., 2024). Quadratic growth induced by Lipschitz continuity forbids such exponents locally, while mere gradient continuity admits them (e.g., for $|x|^{3/2}$). The precise boundaries between regularity, geometry, and achievable exponents remain an area of active research. Furthermore, in stochastic regimes, knowledge of KL/PL exponents and constants is necessary for optimal iteration scheduling; adaptivity without such knowledge is an open direction (Fatkhullin et al., 2022).

Generalization to composite, nonconvex, and nonsmooth settings has expanded the KL framework, but direct computation of exact moduli or exponents remains challenging except in structured (semi-)algebraic cases. Nonsmooth calculus, extensions to infinite-dimensional objective landscapes, and connections to error bounds and stability are ongoing themes.


In summary, the Kurdyka-Łojasiewicz inequality and its variants (generalized, exponent, and exact modulus) have become central to the unified geometric and analytic foundation for convergence analysis in nonconvex and nonsmooth optimization, offering explicit, computable, and often optimal rates as well as powerful regularity characterizations across a wide range of applied mathematical problems (Wang et al., 2020, Wang et al., 2021, Qian et al., 15 Apr 2025, Qian et al., 2022, Ahookhosh et al., 13 Nov 2025, Fatkhullin et al., 2022, Gerth et al., 2019, Dinh et al., 2015, Osińska-Ulrych et al., 2018, Bento et al., 2024).

