Prox-Convex Functions Overview
- Prox-convex functions are a generalized form of convex functions that guarantee a unique, firmly nonexpansive proximity operator even in nonconvex settings.
- They underpin various proximal algorithms, such as the proximal point method and splitting techniques, ensuring convergence and robust performance.
- These functions bridge convex, weakly convex, and difference-of-convex paradigms and are vital in applications like image recovery and variational inequality analysis.
A prox-convex function is a generalized convexity notion applied to functions for which the proximity operator is single-valued, firmly nonexpansive, and admits suitable descent-type inequalities, often even in certain nonconvex settings. Prox-convexity subsumes convex functions and many weakly convex, quasiconvex, or difference-of-convex (DC) classes, but does not coincide with any of them. This notion is foundational to proximal point algorithms and their extensions to composite or nonconvex problems, enabling both theoretical guarantees and practical algorithms that go beyond the classical convex regime. Recent research elaborates diverse definitions, properties, structural results, algorithmic templates, and practical scenarios in which prox-convexity is essential.
1. Formal Definitions and Basic Properties
Prox-convexity is defined via the behavior of the proximity operator. For a proper, lower semicontinuous function $h:\mathcal{H}\to\mathbb{R}\cup\{+\infty\}$ on a Hilbert space $\mathcal{H}$ and a closed set $K\subseteq\mathcal{H}$, $h$ is prox-convex on $K$ with constant $\gamma>0$ if for every $x\in K$, the subproblem
$$\operatorname{prox}_h(x)=\operatorname{argmin}_{y\in K}\Big\{h(y)+\tfrac{1}{2}\|y-x\|^2\Big\}$$
has a unique solution $\bar{x}=\operatorname{prox}_h(x)$, and for all $y\in K$, the inequality
$$\gamma\,\big(h(y)-h(\bar{x})\big)\;\ge\;\langle x-\bar{x},\,y-\bar{x}\rangle$$
holds (Grad et al., 2021). This captures both existence and a firm-nonexpansivity property for the prox mapping. Convex functions are prox-convex automatically with $\gamma=1$; prox-convexity extends to $\rho$-weakly convex functions, i.e., functions $h$ for which $h+\tfrac{\rho}{2}\|\cdot\|^2$ is convex, for suitable proximal scalings (Davis et al., 2019).
Prox-convex functions strictly include some nonconvex cases (Grad et al., 2021). The proximity operator of a prox-convex function is always single-valued, and the associated map $\operatorname{prox}_h$ is firmly nonexpansive:
$$\|\operatorname{prox}_h(x)-\operatorname{prox}_h(y)\|^2\;\le\;\langle x-y,\,\operatorname{prox}_h(x)-\operatorname{prox}_h(y)\rangle\quad\text{for all }x,y\in K.$$
2. Connections to Generalized Convexity Paradigms
Prox-convexity relates to—but does not coincide with—several important generalized convexity classes:
- Convex functions: Every convex function is prox-convex with constant $\gamma=1$.
- Weakly convex functions: If $h+\tfrac{\rho}{2}\|\cdot\|^2$ is convex (i.e., $h$ is $\rho$-weakly convex), then $\operatorname{prox}_{\lambda h}$ is single-valued for any stepsize $\lambda\in(0,1/\rho)$, and such functions are prox-convex (Davis et al., 2019, Grad et al., 2021).
- Quasiconvex and DC functions: Many (strongly) quasiconvex and difference-of-convex functions are prox-convex, but the inclusion is not bidirectional (Grad et al., 2021), as certain counterexamples demonstrate.
- Semi-algebraic and o-minimal losses: Prox-convex models encompass semi-algebraic structures, e.g., nonsmooth penalties, ReLU activations, nuclear-norm surrogates, and truncated quadratic clustering losses (Davis et al., 2019).
A summary of containment relationships:
| Class | Contains all prox-convex functions? | Contained in prox-convex? | Nontrivial intersection? |
|---|---|---|---|
| Convex | No | Yes | Yes |
| Weakly convex | No | Yes | Yes |
| DC functions | No | No | Yes |
| Strongly quasiconvex | No | No | Yes |
3. Proximal Operators and Decomposition
For two proper, convex, lower semicontinuous functions $f$ and $g$ on a Hilbert space, the proximity operator of their sum satisfies a decomposition in terms of $\operatorname{prox}_f$ and an auxiliary point defined as the unique solution of an associated fixed-point equation (Adly et al., 2017). This naturally generalizes the classical Douglas-Rachford splitting, aids variational sensitivity analysis, and gives tractable fixed-point algorithms for sums of prox-convex functions.
For composite functions of the form $g + h\circ c$ (with $g$ convex, $c$ smooth, and $h$ convex in its components), the prox-convex approach forms a convex subproblem at each iteration by linearizing only the smooth maps and keeping the convex terms exact, with strong convexification via a proximal quadratic term (Uzun et al., 22 Dec 2025). This structure enables robust global convergence and local Q-linear contraction.
4. Algorithmic Frameworks for Prox-Convex Minimization
Classical and modern proximal algorithms leverage prox-convexity for both convex and certain nonconvex objectives. Key schemes include:
- Proximal Point Algorithm (PPA): Iteratively solves $x_{k+1}=\operatorname{prox}_{\lambda h}(x_k)=\operatorname{argmin}_y\{h(y)+\tfrac{1}{2\lambda}\|y-x_k\|^2\}$. For prox-convex $h$, PPA yields monotonic decrease, bounded iterates in sublevel sets, and a quantified decrease rate for the Moreau-envelope gap (Grad et al., 2021). Convergence to stationary points is guaranteed on closed convex sets.
- Splitting PPA: For $f=f_1+\cdots+f_m$ (each $f_i$ prox-convex and Lipschitz), splitting PPA evaluates individual proximal steps in cyclic or random order (Brito et al., 11 Jan 2026). Both deterministic and stochastic variants admit global convergence; the stochastic variant exploits supermartingale arguments for almost sure convergence.
- Composite/Relaxed Proximal Algorithms: For composite objectives, the prox-convex step (typically
$$x_{k+1}=\operatorname{argmin}_y\Big\{m_k(y)+\tfrac{\mu}{2}\|y-x_k\|^2\Big\},$$
where $m_k$ linearizes the smooth maps while keeping the convex terms exact, with $\mu>0$) preserves all convex structure and achieves sublinear complexity bounds for prox-gradient residuals, with Q-linear rates under local error bounds (Uzun et al., 22 Dec 2025).
- Douglas-Rachford and other fully proximal splitting: These methods employ only proximal activations, which are now feasible with closed-form prox operators established for many smooth convex penalties (Combettes et al., 2018).
5. Interpretation of Prox-Convexity in Applications and Composite Optimization
Prox-convexity is instrumental in numerous domains:
- Weakly Convex/Nonsmooth Optimization: Proximal algorithms can escape strict saddles and converge to local minima for weakly convex functions satisfying the strict-saddle property, particularly in problems with semi-algebraic loss landscapes (Davis et al., 2019). This underpins robust minimization in modern machine learning and signal processing.
- Image Recovery and Structured Inverse Problems: Fully proximal activation, enabled by closed-form prox operators for smooth terms, substantially accelerates convergence in image deconvolution, reconstruction, interpolation, and inconsistent feasibility relaxation. Practical comparisons consistently show that the fully proximal strategy outperforms mixed gradient/proximal approaches (Combettes et al., 2018).
- Variational Inequality Sensitivity: The decomposition formula for the sum of two convex (or prox-convex) functions computes directional derivatives of solution maps for linear variational inequalities, integrating seamlessly with established sensitivity analysis frameworks (Adly et al., 2017).
6. Comparison Principles, Determination, and Lipschitz Characterization
Proximal mapping properties encode substantial information about the underlying function:
- Comparison principles: If the Moreau envelopes satisfy $e_\lambda f(x)\le e_\lambda g(x)$ for all $\lambda>0$ and all $x$, then $f\le g$ pointwise (Vilches, 2020).
- Determination by Proximal Norm: The pointwise norm $\|x-\operatorname{prox}_{\lambda f}(x)\|$ uniquely determines a convex $f$: if two convex functions have matching norms for all $x$ and $\lambda>0$, they differ only by an additive constant. This equivalence extends through minimal-norm subgradients and Moreau envelopes (Vilches, 2020).
- Lipschitz Characterization: A convex $f$ is $L$-Lipschitz iff $\|x-\operatorname{prox}_{\lambda f}(x)\|\le\lambda L$ for all $x$ and $\lambda>0$ (Vilches, 2020).
These results facilitate function identification and classification purely through observed proximal dynamics.
7. Extensions in Dual Averaging and Prox-Functions
Dual averaging schemes commonly require a prox-function that is strongly convex; recent work relaxes this, establishing rates under prox-convex-like assumptions:
- Prox-Compact + Domain Inclusion: Strong convexity on the compact region containing iterates suffices, provided certain primal/dual domain inclusion conditions hold (Zhao, 4 Apr 2025).
- Dual-Monotonicity and Open Domain: When the Fenchel dual domain of the prox-function is open, dual monotonicity can be enforced to obtain convergence (Zhao, 4 Apr 2025).
- Function Classes: Many entropic, barrier, and "indicator + barrier" combinations meet these requirements, broadening the set of viable regularizers in machine learning and structured optimization.
8. Discrete-Choice Prox-Functions and Specialized Models
Discrete-choice geometry induces a specialized class of prox-functions on the simplex, derived as convex conjugates of the surplus functions (log-sum-exp in the logit case) arising in random utility models (Müller et al., 2019). These prox-functions are strongly convex with respect to a suitable norm (by Legendre duality) and admit closed-form mirror descent updates with a probabilistic interpretation in terms of choice probabilities. Strong convexity parameters are computed explicitly for generalized extreme value and nested logit models. Such constructions enable dual-averaging schemes with provable convergence of the duality gap, naturally fitting economic application contexts.
References
- Adly et al., 2017
- Combettes et al., 2018
- Davis et al., 2019
- Müller et al., 2019
- Vilches, 2020
- Grad et al., 2021
- Zhao, 4 Apr 2025
- Uzun et al., 22 Dec 2025
- Brito et al., 11 Jan 2026