Decentralized Nonsmooth Nonconvex Optimization
- Decentralized nonsmooth nonconvex optimization is a framework for collaboratively minimizing nondifferentiable and nonconvex objectives across networked agents.
- The approach leverages first-order, proximal, and gradient-free techniques to handle structured problems, including minimax and nonlinear constraints, with robust convergence guarantees.
- Recent advancements demonstrate rate-optimal convergence to generalized stationary points while improving communication efficiency via consensus and spectral gap methods.
Decentralized nonsmooth nonconvex optimization addresses collaborative minimization and saddle-point problems with nonconvex, nonsmooth objectives distributed over a network of agents communicating under a specified topology. The area encompasses first-order, proximal, gradient-free, and stochastic methods, as well as minimax structures and nonlinear or coupled constraints. Recent advances have produced algorithms with provable global, and sometimes rate-optimal, convergence to generalized stationary points, often leveraging innovations in consensus control, randomized smoothing, subdifferential calculus, and online-to-nonconvex reduction.
1. Problem Formulation and Mathematical Setting
The canonical decentralized nonsmooth nonconvex optimization problem involves $n$ agents, each with access to a private local cost $f_i$ (possibly stochastic and nondifferentiable), seeking to minimize the aggregate

$$\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x).$$
Communication is restricted by a network graph $\mathcal{G}$, often modeled by a symmetric, doubly stochastic mixing matrix $W$ with spectral gap $1 - \lambda_2(W)$, where $\lambda_2(W)$ is the second-largest eigenvalue modulus. Agents may only communicate with their direct neighbors. In the nonsmooth nonconvex regime, each $f_i$ is typically assumed Lipschitz continuous (possibly accessed via local stochastic samples), and optimization targets the Goldstein $(\delta,\epsilon)$-stationarity criterion

$$\mathrm{dist}\big(0, \partial_\delta f(x)\big) \le \epsilon,$$

where $\partial_\delta f(x)$ denotes the δ-Goldstein subdifferential, i.e., the convex hull of Clarke subgradients taken over a δ-ball around $x$ (Chen et al., 27 Jan 2026, Lin et al., 2023, Sahinoglu et al., 2024). For structured problems, such as nonconvex–strongly concave minimax or composite objectives, additional local regularizers $r_1$ and $r_2$ may be included, and constraints may involve nonlinear or nonconvex couplings (Xu, 2023, Yang et al., 2020).
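The Goldstein criterion can be made concrete in one dimension. A minimal numerical sketch (the function `goldstein_gap` and all parameters are our own illustration, not from any cited paper): for $f(x) = |x|$, the δ-Goldstein subdifferential at $x$ is the interval spanned by Clarke subgradients over the δ-ball.

```python
import numpy as np

# One-dimensional illustration: for f(x) = |x|, the delta-Goldstein
# subdifferential at x is the interval spanned by Clarke subgradients
# sign(y) over y in [x - delta, x + delta]; the (delta, eps) criterion
# asks how far 0 is from that convex hull.

def goldstein_gap(x, delta, n_samples=10001):
    ys = np.linspace(x - delta, x + delta, n_samples)
    grads = np.sign(ys)                # Clarke subgradients of |.|
    lo, hi = grads.min(), grads.max()  # 1-D convex hull is [lo, hi]
    if lo <= 0.0 <= hi:
        return 0.0
    return min(abs(lo), abs(hi))       # distance from 0 to the hull

# x = 0.3 is not Clarke-stationary (its gradient is 1), but it becomes
# (delta, 0)-Goldstein-stationary once the delta-ball covers the kink at 0.
print(goldstein_gap(0.3, 0.5))   # 0.0
print(goldstein_gap(0.3, 0.1))   # 1.0
```

This illustrates why Goldstein stationarity is a meaningful relaxation: points within δ of a kink can certify near-stationarity even where the Clarke subdifferential excludes zero.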
2. Algorithmic Methodologies
2.1. First-Order and Subgradient-Based Algorithms
Decentralized stochastic subgradient descent (DSGD) forms the primary baseline. Each agent $i$ updates its local copy $x_i^k$ via a local subgradient $g_i^k \in \partial f_i(x_i^k)$ (possibly stochastic) and mixes updates from neighbors via the matrix $W$:

$$x_i^{k+1} = \sum_{j=1}^{n} w_{ij} x_j^k - \alpha_k g_i^k,$$

with diminishing stepsizes $\alpha_k$ (Kungurtsev, 2019, Zhang et al., 2024). Ergodic convergence to Clarke-stationary points (or stable sets) is guaranteed under standard stochastic-approximation assumptions.
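The DSGD update above can be sketched on a toy problem (the ring topology, data, and stepsize schedule are our own illustration): $n$ agents jointly minimize $\frac{1}{n}\sum_i |x - b_i|$, whose minimizer is the median of the $b_i$.

```python
import numpy as np

# Minimal DSGD sketch: n agents on a ring minimize (1/n) * sum_i |x - b_i|,
# whose minimizer is the median of b.
n = 5
b = np.array([-2.0, -1.0, 0.0, 1.0, 4.0])    # local data; median is 0

# Symmetric doubly stochastic ring mixing matrix: self weight 1/2,
# each of the two neighbors 1/4.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

rng = np.random.default_rng(0)
x = 5.0 * rng.normal(size=n)                 # local copies, far from consensus
for k in range(1, 3001):
    g = np.sign(x - b)                       # local subgradients of |x - b_i|
    x = W @ x - g / np.sqrt(k)               # gossip step + diminishing stepsize

print(np.ptp(x))                             # disagreement across agents: small
print(np.mean(x))                            # network average: near the median
```

Note the two error sources the theory must control: consensus disagreement (driven to zero by the mixing step) and optimality error (driven to zero by the diminishing stepsizes).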
2.2. Proximal and Composite Approaches
When the problem involves composite terms $f_i = s_i + r_i$, with $s_i$ smooth and $r_i$ nonsmooth (possibly nonconvex but proximable), Prox-DGD applies the operator splitting

$$x_i^{k+1} = \mathrm{prox}_{\alpha_k r_i}\Big( \sum_{j=1}^{n} w_{ij} x_j^k - \alpha_k \nabla s_i(x_i^k) \Big),$$

enabling treatment of $\ell_q$ quasi-norms ($0 < q < 1$), SCAD, MCP, and indicator functions of (possibly nonconvex) sets (Zeng et al., 2016).
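A minimal sketch of this splitting (the toy instance, graph, and parameters are our own; a convex $\ell_1$ regularizer is used for readability, though the scheme also admits nonconvex proximable terms):

```python
import numpy as np

# Prox-DGD sketch. Each agent holds a smooth term s_i(x) = 0.5 * (x - b_i)^2
# plus a shared proximable regularizer r(x) = lam * |x| (convex here for
# readability; SCAD or MCP proxes would slot in the same way).

def soft_threshold(z, t):
    # proximal operator of t * |.|
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

n = 4
b = np.array([0.5, 1.0, 1.5, 2.0])
lam, alpha = 0.5, 0.1
W = np.full((n, n), 1.0 / n)           # complete graph: uniform averaging

x = np.zeros(n)
for _ in range(500):
    grad = x - b                        # gradients of the smooth parts s_i
    x = soft_threshold(W @ x - alpha * grad, alpha * lam)

# Centralized solution: argmin 0.5*mean((x - b_i)^2) + lam*|x|
#                     = soft_threshold(mean(b), lam) = 1.25 - 0.5 = 0.75.
print(np.mean(x))                       # network average near 0.75
```

Each iteration is a gossip step, a local gradient step, and a local prox, so the per-agent cost stays a single proximal evaluation.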
2.3. Gradient-Free and Zeroth-Order Methods
When only function-value oracles (not gradients) are available, two-point randomized-smoothing estimators are utilized. DGFM (Lin et al., 2023) and ME-DOL (Sahinoglu et al., 2024) build sufficiently accurate gradient surrogates via

$$\hat g = \frac{d}{2\delta}\big(f(x + \delta u) - f(x - \delta u)\big)\, u, \qquad u \sim \mathrm{Unif}(\mathbb{S}^{d-1}),$$

enabling decentralized, gradient-tracking–augmented updates. Variance-reduced variants (e.g., DGFM$^+$) further improve query complexity.
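A quick sanity check of the two-point estimator (the quadratic test function and sample counts are our own illustration): averaged over many sphere directions, the estimator approximates the gradient of the δ-smoothed surrogate, which for a smooth quadratic is essentially the gradient itself.

```python
import numpy as np

# Monte Carlo check of g_hat = (d/(2*delta)) * (f(x+delta*u) - f(x-delta*u)) * u
# with u uniform on the unit sphere.
rng = np.random.default_rng(1)
d, delta, m = 5, 1e-3, 20000
A = np.diag(np.arange(1.0, d + 1.0))            # f(x) = 0.5 * x^T A x
x = np.ones(d)

U = rng.normal(size=(m, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # m points on the unit sphere

def fvals(X):
    return 0.5 * np.einsum('ij,jk,ik->i', X, A, X)

diff = fvals(x + delta * U) - fvals(x - delta * U)
g_hat = np.mean((d / (2 * delta)) * diff[:, None] * U, axis=0)

print(np.linalg.norm(g_hat - A @ x))            # small Monte Carlo error
```

The dimension factor $d$ in front of the difference is exactly what makes the estimator unbiased for the smoothed gradient, and is the source of the polynomial dimension dependence in zeroth-order complexity bounds.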
2.4. Minimax and Nonlinear Constrained Problems
Structured minimax problems (e.g., decentralized nonconvex–strongly-concave) require simultaneous minimization over the primal variable $x$ and maximization over the dual variable $y$, with possible nonsmooth regularizers. The D-GDMax method (Xu, 2023) reformulates the saddle-point problem to absorb dual consensus constraints into Lagrange multipliers, enabling exact local maximization (not just a gradient-ascent step) and more aggressive stepsizes, decoupling consensus from nonsmoothness in the dual variables. For nonlinear constraints (e.g., coupled equalities $c(x) = 0$), methods such as PLDM use proximal linearization together with an augmented Lagrangian scheme to avoid exact local solves at each iteration (Yang et al., 2020).
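The exact-inner-maximization idea can be sketched on a scalar toy problem (our own illustration, not the full D-GDMax scheme): when the inner maximum has a closed form, the outer loop is plain gradient descent on the primal function $\phi(x) = \max_y f(x, y)$.

```python
import numpy as np

# Toy GDmax-style loop. f(x, y) = sin(x)*y - 0.5*y^2 is nonconvex in x and
# strongly concave in y; the inner maximization is exact (y*(x) = sin(x)),
# mirroring an exact local max step, and the outer loop is gradient descent
# on phi(x) = max_y f(x, y) = 0.5*sin(x)^2.
x, alpha = 1.0, 0.2
for _ in range(500):
    y = np.sin(x)                   # exact inner maximization
    x -= alpha * np.cos(x) * y      # gradient step on phi: phi'(x) = sin(x)cos(x)

print(abs(np.sin(x) * np.cos(x)))   # stationarity measure of phi, near zero
```

Exact inner maximization is what licenses the larger outer stepsizes: the outer iteration sees the true primal function rather than a lagged ascent iterate.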
3. Convergence, Complexity, and Theoretical Guarantees
3.1. Stationarity Measures
Owing to nonsmoothness and nonconvexity, stationarity is measured via:
- Clarke stationarity: $0 \in \partial f(x)$ (Kungurtsev, 2019, Zhang et al., 2024)
- Goldstein $(\delta,\epsilon)$-stationarity: $\mathrm{dist}(0, \partial_\delta f(x)) \le \epsilon$ (Chen et al., 27 Jan 2026, Lin et al., 2023, Sahinoglu et al., 2024)
- $\epsilon$-critical KKT residuals for constrained problems (Yang et al., 2020).
3.2. Complexity Bounds
Optimal (dimension-independent) sample complexity for Goldstein $(\delta,\epsilon)$-stationarity in the decentralized, nonsmooth, nonconvex, stochastic first-order setting is $O(\delta^{-1}\epsilon^{-3})$ (Sahinoglu et al., 2024, Chen et al., 27 Jan 2026). For zeroth-order/gradient-free settings, the best results match the centralized bound up to a polynomial dimension factor, e.g., $O(d^{3/2}\delta^{-1}\epsilon^{-3})$ with variance reduction (Lin et al., 2023). Communication complexity generally matches sample complexity times a network-dependent factor (inverse spectral gap). Chebyshev-accelerated gossip contracts consensus error by a factor of roughly $1 - \sqrt{\rho}$ per iteration, where $\rho$ is the mixing-matrix spectral gap (Chen et al., 27 Jan 2026).
3.3. Global and Local Convergence
Asymptotic convergence (without nonasymptotic rates) is established for decentralized stochastic subgradient methods under mild assumptions via perturbed differential inclusion and Lyapunov methods (Kungurtsev, 2019, Zhang et al., 2024). PLDM and related augmented Lagrangian/prox-linear methods guarantee convergence to critical points under the Kurdyka–Łojasiewicz property, with possible linear (or sublinear) rates depending on the KL exponent (Yang et al., 2020, Zeng et al., 2016).
3.4. Complexity Comparison Table
| Algorithm | Setting | Sample Complexity |
|---|---|---|
| ME-DOL | First-/zeroth-order | $O(\delta^{-1}\epsilon^{-3})$ (Sahinoglu et al., 2024) |
| DGFM$^+$ | Zeroth-order | $O(d^{3/2}\delta^{-1}\epsilon^{-3})$ (Lin et al., 2023) |
| DOCS | First-/zeroth-order | $O(\delta^{-1}\epsilon^{-3})$, with communication improved via Chebyshev-accelerated consensus (Chen et al., 27 Jan 2026) |
| D-GDMax | Minimax, NC–SC composite | polynomial in $\epsilon^{-1}$ (Xu, 2023) |
| Prox-DGD | Proximable composite | $O(1/k)$ ergodic rate (convex); asymptotic convergence otherwise (Zeng et al., 2016) |
4. Fundamental Techniques: Smoothing, Consensus, and Subdifferential Calculus
4.1. Randomized Smoothing
Randomized smoothing approximates a nonsmooth $f$ by $f_\delta(x) = \mathbb{E}_{u \sim \mathrm{Unif}(\mathbb{B})}[f(x + \delta u)]$, yielding a smooth surrogate whose gradient lies in the Goldstein subdifferential: $\nabla f_\delta(x) \in \partial_\delta f(x)$. This underpins both theoretical analysis and practical implementations in DGFM, ME-DOL, and related methods (Lin et al., 2023, Sahinoglu et al., 2024).
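The inclusion $\nabla f_\delta(x) \in \partial_\delta f(x)$ can be verified numerically in one dimension (our own illustration): for $f(x) = |x|$ and $u$ uniform on $[-1,1]$, the smoothed derivative is $x/\delta$ on $|x| \le \delta$, which stays inside the Goldstein interval $[-1,1]$.

```python
import numpy as np

# For f(x) = |x| and u ~ Uniform[-1, 1], the smoothed function
# f_delta(x) = E[f(x + delta*u)] has derivative
# f_delta'(x) = E[sign(x + delta*u)] = x/delta for |x| <= delta,
# which always lies in the delta-Goldstein subdifferential [-1, 1] of |.|.
rng = np.random.default_rng(2)
delta, x = 0.5, 0.2
u = rng.uniform(-1.0, 1.0, size=1_000_000)
mc = np.mean(np.sign(x + delta * u))   # Monte Carlo estimate of f_delta'(x)
print(mc)                              # close to x / delta = 0.4
```
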
4.2. Consensus Mechanisms
Almost all decentralized algorithms deploy spectral-mixing (gossip or Metropolis) matrices to control disagreement. Temporally decaying step-sizes and Chebyshev-accelerated consensus (especially for communication-critical settings) are essential for provable convergence in sparse or poorly connected graphs (Kungurtsev, 2019, Chen et al., 27 Jan 2026). The spectral gap determines the rate of consensus contraction.
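The link between spectral gap and contraction rate can be checked directly (the path graph and parameters are our own illustration): consensus error under repeated gossip contracts per round at roughly the second-largest eigenvalue modulus of $W$, i.e., one minus the spectral gap.

```python
import numpy as np

# Gossip on a path graph with Metropolis weights w_ij = 1/(1 + max(deg_i, deg_j)).
n = 10
deg = np.array([1] + [2] * (n - 2) + [1])    # path-graph degrees
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0 / (1 + max(deg[i], deg[i + 1]))
for i in range(n):
    W[i, i] = 1.0 - W[i].sum()               # make rows sum to one

lam2 = np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]   # 2nd-largest modulus

rng = np.random.default_rng(3)
x = rng.normal(size=n)
err = [np.linalg.norm(x - x.mean())]
for _ in range(200):
    x = W @ x                                 # one gossip round
    err.append(np.linalg.norm(x - x.mean()))

rate = (err[-1] / err[-51]) ** (1 / 50)       # empirical contraction factor
print(lam2, rate)                             # the two numbers nearly coincide
```

For poorly connected graphs $\lambda_2$ is close to 1, which is exactly the regime where Chebyshev-type acceleration pays off.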
4.3. Gradient Tracking and Variance Reduction
Decentralized gradient-tracking adds auxiliary variables to enable the local recovery of global directional information, improving error contraction and sample complexity. Variance reduction via SPIDER or multi-batch schemes further improves efficiency in the stochastic zeroth-order regime (Lin et al., 2023).
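A minimal gradient-tracking sketch (our own toy instance) makes the mechanism concrete: the auxiliary variables $y_i$ preserve the invariant $\overline{y^k} = \overline{\nabla f_i(x_i^k)}$ at every iteration, so each agent locally tracks global first-order information.

```python
import numpy as np

# Gradient tracking on a smooth toy problem: f_i(x) = 0.5 * (x - b_i)^2.
n = 4
b = np.array([1.0, 2.0, 3.0, 4.0])

def grad(x):
    return x - b                               # stacked per-agent gradients

W = np.full((n, n), 1.0 / n)                   # uniform averaging for simplicity
alpha = 0.3

x = np.zeros(n)
y = grad(x)                                    # initialize trackers
for _ in range(100):
    x_new = W @ x - alpha * y
    y = W @ y + grad(x_new) - grad(x)          # gradient-tracking update
    x = x_new

print(abs(y.mean() - grad(x).mean()))          # invariant holds to round-off
print(np.abs(x - b.mean()).max())              # all agents near x* = 2.5
```

The invariant follows because each update adds the change in local gradients while the mixing step preserves the mean of $y$.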
4.4. Subgradient and Set-Valued Analysis
Owing to nonsmoothness, the analysis requires careful use of generalized subdifferentials (Clarke, Goldstein, or conservative field mappings), together with SA-based or differential inclusion arguments for convergence (Zhang et al., 2024).
5. Structured Problem Classes and Applications
5.1. Composite Minimax Optimization
D-GDMax targets decentralized nonconvex–strongly-concave minimax games with convex nonsmooth terms in both variables. Reformulation introduces local copies and dual variables, allowing aligned maximization and decoupled nonsmoothness handling, achieving improved complexity and global convergence guarantees (Xu, 2023).
5.2. Nonlinear and Coupled Constraints
PLDM addresses decentralized problems with nonlinear equality and bound constraints by combining local proximal linearization with Gauss–Seidel updates and adaptive penalty Lagrangian mechanisms (Yang et al., 2020). This technique avoids heavy local solves required in ADMM-like frameworks and enables provable convergence in coupled settings.
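The proximal-linearization idea can be sketched on a centralized toy problem (our own simplification of the mechanism, not the decentralized scheme of Yang et al., 2020): each step linearizes the nonlinear constraint at the current iterate inside an augmented Lagrangian, so the primal update is a single damped gradient step rather than an exact local solve.

```python
import numpy as np

# Toy proximal-linearization / augmented-Lagrangian iteration:
#   minimize ||x - x0||^2  subject to  c(x) = ||x||^2 - 1 = 0.
# lam is the multiplier, rho the penalty, tau the primal stepsize.
x0 = np.array([2.0, 0.0])
x = np.array([1.5, 0.5])
lam, rho, tau = 0.0, 1.0, 0.05

for _ in range(500):
    c = x @ x - 1.0
    gc = 2.0 * x                                # gradient of the constraint c
    gf = 2.0 * (x - x0)                         # gradient of the objective f
    x = x - tau * (gf + (lam + rho * c) * gc)   # prox-linearized primal step
    lam = lam + rho * (x @ x - 1.0)             # multiplier (dual) update

print(x, x @ x - 1.0)                           # near the projection [1, 0], feasible
```

The iterate converges to the projection of $x_0$ onto the unit circle with multiplier $\lambda \to 1$, satisfying the KKT conditions without ever solving a constrained subproblem exactly.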
5.3. Empirical Benchmarks
Applications include:
- Distributionally robust logistic regression (minimax, D-GDMax) (Xu, 2023).
- Nonconvex SVM with capped-$\ell_1$ penalty and adversarial attacks (DGFM, ME-DOL, DOCS) (Lin et al., 2023, Chen et al., 27 Jan 2026, Sahinoglu et al., 2024).
- Federated deep neural network training (subgradient-based, ResNet on CIFAR, ReLU networks) (Zhang et al., 2024).
Empirical results consistently confirm the theoretical iteration/sample/communication advantages of the corresponding algorithms over previous baselines.
6. Open Directions and Future Work
Relevant challenges and future prospects include:
- Extension to general (merely concave or even nonconcave) dual variables in minimax problems (Xu, 2023).
- Acceleration via variance reduction, adaptive stepsizes, or momentum in nonsmooth/nonconvex decentralized regimes (Lin et al., 2023, Zhang et al., 2024).
- Precise trade-off analysis between communication, computation, and sample complexity, especially as network topology varies (Chen et al., 27 Jan 2026).
- Handling structured constraints using non-Euclidean prox setups, or relaxing smoothness/regularity conditions via advanced subgradient interpolation or bundle methods (Yang et al., 2020, Lin et al., 2023).
- Global nonasymptotic rates beyond purely asymptotic guarantees, particularly sublinear or linear rates for classes with additional structure (KL property, weak convexity) (Zeng et al., 2016, Yang et al., 2020).
Fundamental questions persist regarding information-theoretic lower bounds, robustness to heterogeneous stochasticity, and the design of adaptive, communication-efficient decentralized protocols capable of scaling to extremely large networks or high-dimensional Lipschitz nonconvex regimes.