
Proximal and Contraction Methods in Optimization

Updated 1 December 2025
  • Proximal and contraction methods are analytical frameworks that leverage proximal operators and contraction mappings to ensure convergence and robustness in optimization problems.
  • They provide a unified approach for analyzing convex minimization, fixed-point theory, and variational inequalities through clear contraction properties and rate-optimality.
  • These methods extend to accelerated algorithms, variable metric techniques, and optimal transport, enhancing both theoretical insights and practical algorithmic performance.

The proximal and contraction method refers to a broad collection of analytical and algorithmic frameworks in mathematical optimization, variational analysis, nonlinear functional analysis, and related metric space theory, which leverage the interplay between proximity operators (proximal maps) and contractive mappings to guarantee convergence, stability, and rate-optimality of iterative procedures. These concepts unify discrete and continuous variational evolution, fixed point and best proximity point theory, and state-of-the-art accelerated optimization algorithms under a common abstract structure.

1. The Classical Proximal Point Method and Its Contraction Properties

The proximal point method (PPM) is a foundational iterative method for convex minimization. Given a closed convex function $f:\mathbb{R}^d\to\mathbb{R}$ and a stepsize $\eta>0$, the update is

$$x_{k+1} = \arg\min_{x\in\mathbb{R}^d} \left\{ f(x) + \frac{1}{2\eta}\|x - x_k\|^2 \right\}$$

or equivalently in operator terms,

$$x_{k+1} = (I + \eta\,\nabla f)^{-1}(x_k) =: \mathcal{R}_{\eta}(x_k)$$

where $\mathcal{R}_{\eta}$ is called the resolvent of $\nabla f$ (Ahn et al., 2020). The resolvent $\mathcal{R}_{\eta}$ is nonexpansive whenever $f$ is convex,

$$\|\mathcal{R}_{\eta}(x) - \mathcal{R}_{\eta}(y)\| \leq \|x - y\| \quad \forall x,y,$$

and strictly contractive in the strongly convex case,

$$\|\mathcal{R}_{\eta}(x) - \mathcal{R}_{\eta}(y)\| \leq \frac{1}{1 + \mu\eta}\,\|x - y\|$$

with strong convexity parameter $\mu$.

PPM's contraction properties enable robust convergence analyses, unifying the convergence rates of a variety of algorithms, including Nesterov's accelerated gradient method, via first-order approximations and Lyapunov function techniques (Ahn et al., 2020). Contraction arguments apply not only in classical Euclidean space but also under more general metrics, e.g., the 2-Wasserstein distance for measure-valued flows (Carlen et al., 2012).
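As a concrete check of the strongly convex contraction bound, the following sketch runs one resolvent step on a quadratic $f(x) = \tfrac{1}{2}x^\top A x$; the matrix, stepsize, and random seed are illustrative choices, not taken from the cited works:

```python
import numpy as np

# One resolvent step R_eta = (I + eta*A)^{-1} for f(x) = 0.5 * x'Ax.
# For strongly convex f, distances contract by at most 1/(1 + mu*eta),
# where mu is the smallest eigenvalue of A.  Data below are illustrative.

def resolvent(A, eta, x):
    """Proximal-point step for f(x) = 0.5 x'Ax: solve (I + eta*A) x_next = x."""
    return np.linalg.solve(np.eye(A.shape[0]) + eta * A, x)

A = np.diag([1.0, 4.0, 10.0])    # eigenvalues >= mu = 1.0
mu, eta = 1.0, 0.5

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)
ratio = np.linalg.norm(resolvent(A, eta, x) - resolvent(A, eta, y)) / np.linalg.norm(x - y)
print(ratio <= 1 / (1 + mu * eta))   # True: the contraction bound holds
```

Since the resolvent of a diagonal quadratic rescales coordinate $i$ by $1/(1+\eta a_i)$, the observed ratio can never exceed $1/(1+\mu\eta)$.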

2. Proximal Contraction Principles in Metric Fixed Point and Best Proximity Theory

In metric fixed point theory, the notion of a proximal contraction generalizes Banach's contraction principle to non-self-mappings between two sets $A,B$ in a metric space $(X,d)$. Let $d(A,B) = \inf\{ d(a,b) : a\in A,\, b\in B \}$, and define $A_0 = \{a\in A : d(a,b) = d(A,B) \text{ for some } b\in B\}$ (Fernández-León, 2012, Som, 2021). A mapping $T:A\to B$ is a proximal contraction of the first kind if there exists $0\leq\alpha<1$ such that

$$d(u_1, u_2) \leq \alpha\, d(x_1, x_2)$$

for all $u_1,u_2,x_1,x_2\in A$ with $d(u_1, T x_1) = d(u_2, T x_2) = d(A,B)$. Analogous second-kind definitions measure the contraction in the $T$-image.

The cornerstone theorem asserts that, under completeness, closedness (or approximate compactness), and technical image conditions, such a $T$ has a unique best proximity point, i.e., a point $x^*\in A$ with $d(x^*, T x^*) = d(A,B)$. The Picard-type iteration

$$x_{n+1}\in A_0 \quad\text{with}\quad d(x_{n+1}, T x_n) = d(A,B)$$

converges geometrically to $x^*$ (Fernández-León, 2012, Som, 2021). When $A=B$, this result reduces to the Banach contraction fixed-point theorem.
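The Picard-type iteration can be illustrated on a hypothetical toy example: take $A = [0,1]\times\{0\}$, $B = [0,1]\times\{1\}$, so $d(A,B)=1$ and $A_0=A$, and let $T(a,0)=(a/2,1)$. Then $T$ is a proximal contraction of the first kind with $\alpha = 1/2$ and best proximity point $(0,0)$ (all choices here are illustrative, not from the cited papers):

```python
# Picard-type best-proximity iteration on a toy proximal contraction.
# A = [0,1] x {0}, B = [0,1] x {1}, so d(A,B) = 1 and A_0 = A.  The map
# T(a, 0) = (a/2, 1) is a proximal contraction of the first kind with
# alpha = 1/2: the u in A with d(u, T(x,0)) = d(A,B) is u = (x/2, 0).

def picard_step(a):
    """x_{n+1} in A_0 with d(x_{n+1}, T x_n) = d(A,B): here a -> a/2."""
    return a / 2.0

a, dists = 1.0, []
for _ in range(20):
    a = picard_step(a)
    dists.append(abs(a - 0.0))    # distance to the best proximity point a* = 0

ratios = [dists[k + 1] / dists[k] for k in range(len(dists) - 1)]
print(max(ratios))   # 0.5: geometric convergence at rate alpha
```

Each step halves the distance to $x^*$, matching the geometric rate $\alpha^n$ asserted by the theorem.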

Extensions include implicit relation approaches (Mondal et al., 2020), generalized $\theta$-$\phi$ proximal contractions (Rossafi et al., 2023), and modified "proximally closed/complete" frameworks (Alam, 2024); these relax the classical closedness and compactness requirements and enable applications to variational inequalities under weaker metric assumptions.

3. Accelerated and Contracting Proximal Algorithms in Convex Optimization

The contraction structure of the PPM extends naturally to the design of accelerated algorithms, particularly via two perspectives:

  • Approximate resolvent via first-order models: Nesterov's acceleration can be interpreted as a first-order surrogate of PPM, alternating between an aggressive gradient step and a conservative proximal update; this yields the classical momentum forms with convergence rate $O(1/k^2)$ in the convex case and geometric rate $(1-\sqrt{\mu/L})^k$ in the strongly convex case (Ahn et al., 2020).
  • Contracting Proximal Methods with Variable Metrics: Contracted versions of the objective (a.k.a. contracted-proximal steps) take the form

$$v_{k+1} \approx \arg\min_x \left\{ f(\lambda x + (1-\lambda)v_k) + \psi(x) + \frac{1}{\alpha} D_h(v_k; x) \right\}$$

with $\lambda\in(0,1]$, regularization parameter $\alpha>0$, and Bregman divergence $D_h$. The contraction in $x\mapsto f(\lambda x + (1-\lambda)v_k)$ induces strong convexity in the subproblem, ensuring rapid convergence when paired with an outer momentum/averaging loop (Doikov et al., 2019).

Variable metric variants, prediction-correction methods, and inertial and relaxation modifications further expand the class, yielding formal $O(1/k^{p+1})$ or $O(1/k^2)$ rates depending on problem structure, and geometric rates under uniform convexity (Wang et al., 2023, Nwakpa et al., 23 Nov 2025). These frameworks are unified via Lyapunov-style descent identities exploiting the contraction in the proximal operator's update.
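A minimal sketch of one contracted-proximal step, with $\psi = 0$ and Euclidean Bregman divergence $D_h(v;x)=\tfrac12\|x-v\|^2$ so the subproblem is solvable in closed form; the quadratic data and parameter values are illustrative:

```python
import numpy as np

# One exact contracted-proximal step for f(x) = 0.5 x'Qx + b'x, psi = 0,
# Euclidean Bregman D_h(v; x) = 0.5 ||x - v||^2.  The inner objective
#   g(x) = f(lam*x + (1-lam)*v) + (1/(2*alpha)) ||x - v||^2
# has Hessian lam^2 Q + I/alpha, hence is (1/alpha)-strongly convex even when
# f is merely convex: the contraction x -> lam*x + (1-lam)*v supplies curvature.

def contracted_prox_step(Q, b, v, lam, alpha):
    """Closed-form minimizer: (lam^2 Q + I/alpha) x = v/alpha - lam*((1-lam) Q v + b)."""
    H = lam**2 * Q + np.eye(Q.shape[0]) / alpha
    rhs = v / alpha - lam * ((1 - lam) * Q @ v + b)
    return np.linalg.solve(H, rhs)

Q = np.array([[2.0, 0.0], [0.0, 0.0]])    # convex but NOT strongly convex
b = np.array([1.0, -1.0])
v = np.zeros(2)
x_next = contracted_prox_step(Q, b, v, lam=0.5, alpha=1.0)

# first-order optimality of the subproblem at x_next
grad = 0.5 * (Q @ (0.5 * x_next + 0.5 * v) + b) + (x_next - v) / 1.0
print(np.linalg.norm(grad) < 1e-10)   # True
```

Note that the subproblem stays well-posed even though $Q$ has a zero eigenvalue, which is exactly the point of the contraction.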

4. Wasserstein-Proximal Maps and Gradient Flows in Optimal Transport

The proximal point and contraction principles admit a direct extension to metric spaces of probability measures equipped with the 2-Wasserstein metric $W_2$. Here the Moreau–Yosida regularization and its associated proximal map are defined by

$$E_\tau(\mu) = \inf_{\nu} \left\{ \tfrac{1}{2\tau} W_2^2(\mu, \nu) + E(\nu) \right\}$$

$$\operatorname{prox}_{\tau E}(\mu) = \arg\min_\nu \left\{ \tfrac{1}{2\tau} W_2^2(\mu, \nu) + E(\nu) \right\}$$

for a proper, coercive, lower semicontinuous, $\lambda$-convex functional $E$ (Carlen et al., 2012). The contraction property generalizes to

$$\Lambda_\tau(\mu_\tau, \nu_\tau) \leq \Lambda_\tau(\mu, \nu)$$

with

$$\Lambda_\tau(\mu, \nu) = W_2^2(\mu, \nu) + \tfrac{\tau^2}{2}\,|\nabla_W E(\mu)|^2 + \tfrac{\tau^2}{2}\,|\nabla_W E(\nu)|^2$$

where $|\nabla_W E|$ denotes the metric slope. This ensures monotonic decrease under discrete Wasserstein flows, fundamental to the analysis of gradient flows and evolution equations in the space of measures. For quadratic energies the contraction constant is explicit: $\operatorname{prox}_{\tau E}(\mu) = \big((1+\lambda\tau)^{-1}\,\mathrm{id}\big)_{\#}\,\mu$ yields $W_2^2(\mu_\tau, \nu_\tau) = (1+\lambda\tau)^{-2}\, W_2^2(\mu, \nu)$ (Carlen et al., 2012).
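The explicit quadratic-energy contraction can be checked numerically on one-dimensional Gaussians, for which $W_2$ has a closed form; the means, standard deviations, and parameters below are illustrative:

```python
# Numerical check of the quadratic-energy contraction
# prox_{tau E}(mu) = ((1 + lam*tau)^{-1} id)_# mu on 1-D Gaussians, where
# W2^2(N(m1, s1^2), N(m2, s2^2)) = (m1 - m2)^2 + (s1 - s2)^2 in closed form.

def w2_sq(g1, g2):
    """Squared 2-Wasserstein distance between 1-D Gaussians g = (mean, std)."""
    return (g1[0] - g2[0]) ** 2 + (g1[1] - g2[1]) ** 2

def prox_quadratic(g, lam, tau):
    """Pushforward of N(m, s^2) under the scaling map x -> x / (1 + lam*tau)."""
    c = 1.0 + lam * tau
    return (g[0] / c, g[1] / c)

mu, nu = (1.0, 2.0), (-0.5, 0.5)
lam, tau = 0.8, 0.25
before = w2_sq(mu, nu)
after = w2_sq(prox_quadratic(mu, lam, tau), prox_quadratic(nu, lam, tau))
print(abs(after / before - (1 + lam * tau) ** -2) < 1e-12)   # True
```

Because the scaling map rescales both mean and standard deviation by $(1+\lambda\tau)^{-1}$, the squared distance contracts by exactly $(1+\lambda\tau)^{-2}$, as the formula states.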

Further, discrete-time flows generated by such proximal maps reproduce fine properties of the associated continuum PDEs, e.g., the porous medium/fast diffusion equations and the invariance and contraction of Barenblatt profiles.

5. Applications to Variational Inequalities and Composite Problems

The proximal and contraction framework underpins existence and uniqueness results for variational inequality problems and monotone inclusions in Hilbert spaces (Alam, 2024, Nwakpa et al., 23 Nov 2025). A canonical result states:

Given a monotone, $\beta$-Lipschitz continuous $T:\mathcal{H}\to\mathcal{H}$ and a proper convex $g$, the iterative scheme

$$x_{n+1} = \operatorname{prox}_{\lambda_n g}(x_n - \lambda_n T x_n)$$

with suitable adaptive step sizes and possibly inertial, correction, and relaxation terms, produces weakly convergent sequences to solutions of the mixed variational inequality

$$\langle T x^*, u - x^* \rangle + g(u) - g(x^*) \geq 0 \quad \forall u.$$

The contraction of the composite proximal mapping ensures convergence, even under nontrivial problem geometry and weak assumptions (Nwakpa et al., 23 Nov 2025). Algorithmic enhancements (inertial/correction/relaxation) are shown to accelerate convergence and damp numerical oscillations in practice.
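A minimal sketch of this scheme with a fixed step size, taking $T(x) = A^\top(Ax - y)$ (monotone and Lipschitz, and here also cocoercive, which the plain forward-backward iteration requires) and $g = \|\cdot\|_1$, whose proximal map is soft-thresholding; the problem data and iteration count are illustrative:

```python
import numpy as np

# Sketch of x_{n+1} = prox_{lam*g}(x_n - lam*T(x_n)) with T(x) = A'(Ax - y)
# and g = ||.||_1.  The prox of t*||.||_1 is componentwise soft-thresholding.

def soft_threshold(z, t):
    """prox of t*||.||_1: componentwise shrinkage toward zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
lam = 1.0 / np.linalg.norm(A.T @ A, 2)   # step size scaled by the Lipschitz constant

x = np.zeros(5)
for _ in range(500):
    Tx = A.T @ (A @ x - y)                   # forward (operator) step
    x = soft_threshold(x - lam * Tx, lam)    # backward (proximal) step

# Optimality for the mixed VI: 0 in T(x*) + subdiff ||.||_1(x*),
# i.e. every component of -T(x*) lies in [-1, 1].
res = A.T @ (A @ x - y)
print(np.max(np.abs(res)) <= 1.0 + 1e-8)   # True at (near-)optimality
```

The printed check verifies the mixed variational inequality's first-order condition: at a solution, $-T(x^*)$ must be a subgradient of $\|\cdot\|_1$, so each component of $T(x^*)$ is bounded by 1 in absolute value.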

6. Generalizations and Algorithmic Patterns

The proximal-and-contraction paradigm extends to:

  • Nonlinear convex programs with nonlinear (possibly non-Euclidean) constraints, via variable-metric PPA and prediction–correction schemes (Wang et al., 2023).
  • Min-max and saddle-point problems: contraction-based approximations of implicit proximal-point steps enable near-optimal $O(1/T)$ ergodic and $O(1/\sqrt{T})$ last-iterate rates with explicit error tracking (Cevher et al., 2023).
  • Implicit relation (e.g., $\mathcal{A}$- and $\mathcal{A}'$-type) contraction mappings and best proximity principles under generalized nonexpansive and control-function schemes (Mondal et al., 2020, Rossafi et al., 2023).

These generalizations leverage contraction mappings (possibly abstract or metric/prox-based) as the structural foundation for both theoretical guarantees and computable algorithms, frequently with strong rate results and flexible applicability across problem domains. Many classical results in fixed-point theory, optimization, variational analysis, and PDE theory are recovered as special or limiting cases.

7. Impact, Open Directions, and Unification

The proximal and contraction method acts as a unifying theme in the analysis of modern iterative algorithms and variational problems. Its reach spans convex optimization, metric fixed point and best proximity theory, Wasserstein gradient flows and optimal transport, and variational inequalities.

Current research includes further weakening of metric and completeness conditions, extension to generalized and implicit contractive frameworks, and direct design of new iterative algorithms for challenging nonlinear and high-dimensional settings. The common algebraic and geometric structures induced by contraction/proximality continue to drive progress in discrete and continuous optimization, equilibrium problems, and analysis of evolution equations.
