
Time-Smoothing Tikhonov Regularization

Updated 18 January 2026
  • Time-smoothing Tikhonov regularization is a method that adds a decaying quadratic penalty to convex optimization, guiding trajectories toward minimum-norm solutions.
  • It modifies continuous and discrete dynamics by enforcing early strong convexity, which stabilizes the process while achieving fast decay of function values and gradients.
  • This approach underpins accelerated optimization, stochastic approximation, and inverse problems by unifying inertial dynamics with robust convergence guarantees.

Time-smoothing Tikhonov regularization refers to the incorporation of a time-dependent (typically vanishing) regularization parameter into continuous and discrete optimization dynamics, often for convex minimization in infinite-dimensional Hilbert or Banach spaces. The core idea is to add a Tikhonov-type quadratic penalty whose strength decays smoothly over time, thereby initially enforcing strong convexity or stability, but asymptotically allowing the dynamics to converge to solutions of the original unregularized problem—preferably to the minimum-norm minimizer when the solution set is non-unique. This regularization paradigm is central to accelerated and stabilized dynamics for convex (possibly nonsmooth) optimization, stochastic approximation, and regularized inverse problems.

1. Fundamental Principles and Definitions

Time-smoothing Tikhonov regularization modifies either discrete optimization algorithms (e.g., stochastic gradient descent) or continuous-time flows by adding a time-dependent penalty term $\varepsilon(t)\|x\|^2$ (or $\lambda_k\|x\|^2$ for discrete index $k$) to the original objective $f(x)$. The penalty coefficient satisfies $\varepsilon(t)\to 0$ as $t\to\infty$, ensuring that the regularization does not bias the solution at equilibrium while still steering the trajectory toward the minimum-norm minimizer during the evolution.

In the continuous-time context, prototypical systems studied are:

First-order dynamics (gradient/Tikhonov flow):

$$\dot{x}(t) + \nabla f(x(t)) + \varepsilon(t)\,x(t) = 0.$$

Second-order inertial systems:

$$\ddot{x}(t) + a\,\dot{x}(t) + B\,\nabla^2\varphi_{\lambda(t)}(x(t))[\dot{x}(t)] + b(t)\,\nabla\varphi_{\lambda(t)}(x(t)) + \varepsilon(t)\,x(t) = 0,$$

where $\varphi_{\lambda}$ is the Moreau envelope of $f$ with parameter $\lambda$, and $b(t)$ is a time-scaling factor (Csetnek et al., 2022; Bot et al., 2019; Bagy et al., 2024; Attouch et al., 2022).
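
To make the continuous dynamics concrete, the following forward-Euler sketch (not taken from the cited papers; the Hessian-driven damping $B$ and the time scaling $b(t)$ are omitted, and the damping $a$, the schedule $\varepsilon(t)$, and the test objective are illustrative choices) shows how the vanishing Tikhonov term selects the minimum-norm minimizer when the minimizer set is a whole line:

```python
def inertial_tikhonov_flow(x0, v0, a=2.0, h=0.01, T=400.0):
    """Forward-Euler sketch of the simplified inertial Tikhonov system
        x'' + a x' + grad f(x) + eps(t) x = 0
    for f(x1, x2) = 0.5 * x1**2, whose minimizers form the whole x2-axis
    (minimum-norm minimizer: the origin). a, h, T, eps are illustrative.
    """
    x, v = list(x0), list(v0)
    t = 0.0
    for _ in range(int(round(T / h))):
        eps = (1.0 + t) ** -0.5          # slowly vanishing Tikhonov parameter
        grad = (x[0], 0.0)               # gradient of f
        for i in range(2):
            acc = -(a * v[i] + grad[i] + eps * x[i])  # acceleration from the ODE
            x[i] += h * v[i]
            v[i] += h * acc
        t += h
    return x

x = inertial_tikhonov_flow((4.0, 3.0), (0.0, 0.0))
# x approaches the minimum-norm minimizer (0, 0); with eps = 0 the flat
# coordinate x[1] would stay at 3 forever, since no force acts on it.
```

Without the $\varepsilon(t)x$ term the flat coordinate settles wherever the inertia leaves it; with it, the trajectory drifts toward the origin, the minimum-norm minimizer.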

Discrete-time versions include regularized SGD (reg-SGD):

$$X_k = X_{k-1} - \alpha_k\bigl(\nabla f(X_{k-1}) + \lambda_k X_{k-1} + D_k\bigr),$$

with $D_k$ representing noise and $\lambda_k \to 0$ (Kassing et al., 16 May 2025).
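
As a concrete illustration (a minimal sketch, not code from the cited paper; the objective and the schedules $\alpha_k = k^{-0.6}$, $\lambda_k = k^{-0.3}$ are illustrative choices satisfying $\sum_k \alpha_k \lambda_k = \infty$), the update can be run on a problem whose minimizer set is a whole line:

```python
def reg_sgd_2d(x0, num_iters):
    """reg-SGD sketch for f(x1, x2) = 0.5 * (x1 - 1)**2.

    Minimizer set: the line x1 = 1 (x2 free); minimum-norm minimizer: (1, 0).
    Update: X_k = X_{k-1} - alpha_k * (grad f(X_{k-1}) + lambda_k * X_{k-1} + D_k),
    with D_k = 0 (noiseless case) and illustrative schedules alpha_k = k**-0.6,
    lambda_k = k**-0.3, chosen so that sum(alpha_k * lambda_k) diverges.
    """
    x1, x2 = x0
    for k in range(1, num_iters + 1):
        alpha_k = k ** -0.6        # vanishing step size
        lambda_k = k ** -0.3       # vanishing Tikhonov parameter
        g1, g2 = x1 - 1.0, 0.0     # gradient of f
        x1 -= alpha_k * (g1 + lambda_k * x1)
        x2 -= alpha_k * (g2 + lambda_k * x2)
    return x1, x2

x1, x2 = reg_sgd_2d((3.0, 4.0), 50000)
# (x1, x2) approaches the minimum-norm minimizer (1, 0) up to an O(lambda_k)
# bias in x1; plain GD from the same start would leave x2 = 4 untouched.
```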

2. Core Theoretical Results

Under suitable choices of the $\varepsilon(t)$ or $\lambda_k$ schedules and step-size parameters, time-smoothing Tikhonov regularization achieves three principal properties:

  • Fast function value and gradient decay: For continuous second-order systems (with appropriately tuned damping and time-scaling), one achieves

$$f(x(t)) - \min f = O\!\left(\frac{1}{t^2\,b(t)}\right) \quad\text{and}\quad \|\nabla f(x(t))\| = O\!\left(\frac{1}{t\,b(t)}\right),$$

where $b(t)$ is the time-scaling parameter (Csetnek et al., 2022).

  • Strong convergence to the minimum-norm solution: When $\int t\,\varepsilon(t)\,dt < \infty$ and additional technical conditions hold, the trajectory converges strongly to $x^* = \operatorname{proj}_{\arg\min f}(0)$ (Csetnek et al., 2022; Bot et al., 2019). For reg-SGD, if $\sum_k \alpha_k \lambda_k = \infty$ and $\lambda_k \to 0$ sufficiently slowly,

$$X_k \to x^* \quad \text{almost surely}$$

(Kassing et al., 16 May 2025).

  • Stability and controlled trajectories: The addition of $\varepsilon(t)x(t)$ or $\lambda_k X_{k-1}$ ensures early strong convexity, preventing large excursions, while its vanishing ensures the regularization bias disappears asymptotically.

The following table summarizes key convergence properties from representative models:

| System/Algorithm | Function Value Rate | Strong Convergence Condition |
|---|---|---|
| Second-order ODE (Csetnek et al., 2022) | $o\!\left(1/(t^2 b(t))\right)$ | $\int t\,\varepsilon(t)\,dt < \infty$ |
| reg-SGD (Kassing et al., 16 May 2025) | $O(k^{-\min(p,\,q-p)})$ | $\sum_k \alpha_k \lambda_k = \infty$ |
| First-order ODE (Bagy et al., 2024) | $O(1/\beta(t))$ | $\dot\beta(t)/\beta(t)\to 0$ |

For the choice $\varepsilon(t) = t^{-\gamma}$ with $1<\gamma<2$, both $O(1/t^2)$ rates and strong convergence are guaranteed in a wide class of inertial and gradient systems (Bot et al., 2019; Attouch et al., 2022). The Lyapunov analysis in these works demonstrates that energy functionals incorporating the regularization yield differential inequalities whose integrability and decay rates lead to the desired convergence properties.

3. Variational and Dynamical Formulations

Time-smoothing Tikhonov regularization appears in several closely related forms:

  • Moreau envelope embedding: For nonsmooth convex $f$, the Moreau envelope $\varphi_\lambda$ regularizes $f$ and allows for dynamics involving $\nabla \varphi_\lambda(x)$, which is $C^1$ and globally Lipschitz (Csetnek et al., 2022).
  • Inertial (second-order) and geometric damping: Hessian-driven damping terms, such as $B\,\nabla^2\varphi_{\lambda(t)}(x)\,\dot{x}$, enhance stability and allow for faster rates with reduced oscillations (Bot et al., 2019; Attouch et al., 2022).
  • Temporal scaling: The explicit time scaling $b(t)$ multiplies the descent direction, allowing arbitrarily accelerated rates $O(1/(t^2 b(t)))$ while the Tikhonov term ensures projection toward minimum-norm solutions (Csetnek et al., 2022; Bagy et al., 2024).
  • Stochastic approximation: In reg-SGD, vanishing regularization schedules can be optimized together with learning rates to balance bias, variance, and stability (Kassing et al., 16 May 2025).
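
To make the Moreau-envelope point tangible, here is an illustrative closed-form computation (standard convex analysis, not code from the cited papers) for the nonsmooth model function $f(x) = |x|$, using the identity $\nabla\varphi_\lambda(x) = (x - \operatorname{prox}_{\lambda f}(x))/\lambda$:

```python
def prox_abs(x, lam):
    """Proximal operator of f(x) = abs(x): soft-thresholding."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def moreau_envelope_abs(x, lam):
    """Moreau envelope phi_lam(x) = min_y ( abs(y) + (x - y)**2 / (2*lam) ).

    For f = abs this is the Huber function: quadratic near 0, linear far away.
    """
    p = prox_abs(x, lam)
    return abs(p) + (x - p) ** 2 / (2.0 * lam)

def grad_moreau_abs(x, lam):
    """Gradient (x - prox_{lam f}(x)) / lam: continuous and (1/lam)-Lipschitz."""
    return (x - prox_abs(x, lam)) / lam
```

The gradient simply clips $x/\lambda$ at $\pm 1$, so dynamics driven by $\nabla\varphi_{\lambda(t)}$ fall under standard smooth ODE theory even though $f$ itself is nonsmooth.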

In inverse problems, variational regularization in Lebesgue-Bochner spaces incorporates both spatial and temporal regularization, with penalties on norms of $u$ and its time-derivative, enabling distinct space-time smoothing (Sarnighausen et al., 12 Jun 2025).

4. Proof Techniques and Lyapunov Analysis

The convergence analysis for time-smoothing Tikhonov schemes is typically based on carefully constructed Lyapunov functionals that combine potential gaps $f(x(t))-\min f$, kinetic terms (e.g., $\|\dot{x}(t)\|^2$), and regularization terms ($\varepsilon(t)\|x(t)\|^2$). Essential steps are:

  • Differentiation along trajectories: Standard identities such as

$$\frac{d}{dt}\,\varphi_{\lambda(t)}(x(t)) = \bigl\langle\nabla\varphi_{\lambda(t)}(x(t)),\, \dot{x}(t)\bigr\rangle + \dot{\lambda}(t)\,\partial_\lambda \varphi_{\lambda}(x(t))\big|_{\lambda=\lambda(t)}$$

are used in the analysis (Csetnek et al., 2022).

  • Decay via integral bounds: Integrability of $\varepsilon(t)$ and sign conditions on time-scaling and damping coefficients ensure the Lyapunov functional decays and trajectories remain bounded.
  • Time-scaling and auxiliary minimizers: Comparing $x(t)$ with the Tikhonov-regularized minimizer $x_{\varepsilon(t)}$ and bounding their separation quantifies the bias and ultimately guarantees convergence to the minimum-norm element (provided the regularization vanishes slowly enough).

This analysis framework extends to both continuous and discrete settings, with Lyapunov or supermartingale arguments adapted as appropriate (Kassing et al., 16 May 2025, Attouch et al., 2022).
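
As a minimal instance of this machinery (a sketch for the first-order flow with smooth $f$ and nonincreasing $\varepsilon$; the cited works use more elaborate weighted functionals), consider

```latex
\begin{aligned}
E(t) &= f(x(t)) - \min f + \tfrac{\varepsilon(t)}{2}\,\|x(t)\|^2,\\
\dot E(t) &= \langle \nabla f(x(t)), \dot x(t)\rangle
           + \varepsilon(t)\,\langle x(t), \dot x(t)\rangle
           + \tfrac{\dot\varepsilon(t)}{2}\,\|x(t)\|^2\\
         &= \bigl\langle \nabla f(x(t)) + \varepsilon(t)\,x(t),\, \dot x(t)\bigr\rangle
           + \tfrac{\dot\varepsilon(t)}{2}\,\|x(t)\|^2
          = -\|\dot x(t)\|^2 + \tfrac{\dot\varepsilon(t)}{2}\,\|x(t)\|^2 \le 0,
\end{aligned}
```

using $\dot x(t) = -(\nabla f(x(t)) + \varepsilon(t)x(t))$ and $\dot\varepsilon(t) \le 0$. Integrating this inequality bounds the trajectory and the function values; the finer rates quoted above come from time-weighted versions of such functionals.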

5. Applications and Numerical Evidence

Time-smoothing Tikhonov regularization has been verified in controlled experiments and practical settings:

  • Synthetic and toy problems: For nonsmooth convex objectives (e.g., $f(x) = |x| + x^4$), increasing the time-scaling $b(t)$ empirically accelerates the decay of function values and gradients, matching the theoretical rates $O(1/(t^2 b(t)))$ (Csetnek et al., 2022).
  • Piecewise-linear objectives: In problems where $\arg\min f$ is non-unique (e.g., equal to $[-1,1]$), lack of Tikhonov regularization leads to merely weak convergence, whereas adding $\varepsilon(t)x(t)$ yields strong convergence to the minimum-norm solution (Csetnek et al., 2022).
  • Inverse problems: In dynamic computerized tomography and time-dependent parameter identification, incorporating time-derivative (temporal smoothing) penalties in Banach/Lebesgue-Bochner space frameworks yields improved stability, lower relative error, and better PSNR compared with frame-wise or spatial-only regularization approaches (Sarnighausen et al., 12 Jun 2025).
  • Stochastic optimization: reg-SGD with polynomially decaying regularization achieves pathwise convergence to the minimum-norm solution in both noiseless and noisy linear inverse problems. Explicit rates in terms of the schedule exponents $p$ and $q$ recover the best known convergence rates for convex stochastic algorithms (Kassing et al., 16 May 2025).
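
The piecewise-linear phenomenon is easy to reproduce numerically. The following sketch (illustrative choices throughout: the objective $f(x)=\max(|x|-1,0)$ with $\arg\min f=[-1,1]$, the schedule $\varepsilon(t)=(1+t)^{-1/2}$, and a forward-Euler discretization) compares the flow with and without the Tikhonov term:

```python
def subgrad_flow(x0, T, h=0.01, tikhonov=True):
    """Forward-Euler sketch of x' = -(g(x) + eps(t) * x) for
    f(x) = max(abs(x) - 1, 0), whose minimizer set is [-1, 1] (min-norm: 0).
    g is a subgradient selection; eps(t) = (1 + t)**-0.5 vanishes slowly.
    """
    x, t = x0, 0.0
    for _ in range(int(round(T / h))):
        if x > 1.0:
            g = 1.0
        elif x < -1.0:
            g = -1.0
        else:
            g = 0.0               # interior of the minimizer set: zero subgradient
        eps = (1.0 + t) ** -0.5 if tikhonov else 0.0
        x -= h * (g + eps * x)
        t += h
    return x

x_plain = subgrad_flow(5.0, T=400.0, tikhonov=False)  # stalls near the boundary point 1
x_reg = subgrad_flow(5.0, T=400.0, tikhonov=True)     # drifts on toward 0, the min-norm point
```

The unregularized flow stops at the boundary point $1$ of the minimizer set, while the regularized flow continues toward the minimum-norm point $0$.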

A key empirical finding is the trade-off: rapid decay of the Tikhonov parameter accelerates value convergence but may slow or prevent strong convergence unless the integrability and sign conditions are matched carefully.

6. Connections to Related Methods

Time-smoothing Tikhonov regularization unifies and generalizes several themes:

  • Nesterov-type acceleration: Second-order and inertial methods extended with time-varying regularization balance fast value decay and minimum-norm selection (Attouch et al., 2022).
  • Heavy-ball methods and strong convexity analogy: Vanishing regularization mimics an asymptotically vanishing strong convexity, interpolating between exponentially fast (strongly convex) rates and the optimal $O(1/t^2)$ rates for general convex functions.
  • General Banach space settings: Extensions to non-Hilbert geometries use variational regularization in Lebesgue-Bochner spaces, exploiting duality maps and smoothness-of-power-type for flexible space-time smoothing (Sarnighausen et al., 12 Jun 2025).
  • Stability and instability mitigation: Time-smoothing Tikhonov terms control the transient dynamics and stabilize both continuous and discrete iterative schemes, avoiding the excessive oscillations of unregularized inertial heuristics or the drift in standard SGD (Kassing et al., 16 May 2025).

Ongoing research explores variants such as closed-loop (state-dependent) regularization, adaptive time-scaling, discrete symplectic discretizations, and extensions to nonlinear and nonconvex objectives (Bot et al., 2019, Attouch et al., 2022).

7. Significance and Outlook

Time-smoothing Tikhonov regularization is now a central tool in continuous optimization theory, stochastic algorithms, and regularized inverse problems. Its appeal lies in providing—via decaying, carefully tuned quadratic penalties—a unified principle for:

  • Guaranteeing strong convergence to minimum-norm solutions,
  • Preserving or improving fast rates for function value and gradient decay,
  • Stabilizing dynamics against noise or instability,
  • Flexible adaptation to deterministic, stochastic, and infinite-dimensional settings.

This approach, combining explicit temporal control of regularization and appropriate inertial, damping, or scaling strategies, continues to inform the development of optimization methods with both theoretical guarantees and practical impact across mathematical optimization, inverse problems, and machine learning (Csetnek et al., 2022, Kassing et al., 16 May 2025, Bot et al., 2019, Sarnighausen et al., 12 Jun 2025, Bagy et al., 2024, Attouch et al., 2022).
