Levenberg-Marquardt Optimization

Updated 17 February 2026
  • Levenberg-Marquardt is an iterative method for nonlinear least-squares that dynamically adjusts a damping parameter to balance between Gauss–Newton and gradient descent.
  • It achieves robust local convergence with linear to quadratic rates and is applied in parameter estimation, machine learning, and inverse problems.
  • Enhancements like adaptive damping, singular scaling, and matrix-free implementations improve its stability and performance on ill-conditioned and large-scale problems.

The Levenberg–Marquardt (LM) optimization algorithm is a prominent iterative method for solving nonlinear least-squares problems. It serves as an interpolation between the Gauss–Newton algorithm and gradient descent by dynamically adjusting a damping (regularization) parameter. Across its diverse applications—including parameter estimation, machine learning, inverse problems, and trajectory design—the LM method is characterized by robust local convergence and the ability to control update magnitudes, making it especially effective in scenarios involving ill-conditioned or highly nonlinear models.

1. Mathematical Foundations and Classical Formulation

The standard LM method addresses the minimization of sum-of-squares objectives of the form

\min_{x\in\mathbb{R}^n}~f(x) = \frac{1}{2} \|F(x)\|^2,

where F:\mathbb{R}^n\to\mathbb{R}^m is twice continuously differentiable. The method forms a local quadratic (Gauss–Newton) model at iterate x_k:

m_k(s) = \frac{1}{2} \|F(x_k) + J_k s\|^2 + \frac{1}{2} \lambda_k \|s\|^2,

with Jacobian J_k = J(x_k) = \nabla F(x_k) and a scalar damping parameter \lambda_k > 0. The LM update s_k is obtained by solving

(J_k^T J_k + \lambda_k I)\, s_k = -J_k^T F(x_k),

so that x_{k+1} = x_k + s_k (Bergou et al., 2020; Philipps et al., 2020). As \lambda_k \to 0, LM reduces to the Gauss–Newton step; as \lambda_k \to \infty, the step approaches a short gradient-descent step.
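
For orientation, here is a minimal NumPy sketch of this single step; the names F_val, J, and lam are illustrative inputs (residual vector, dense Jacobian, damping), not taken from the cited papers:

    import numpy as np

    def lm_step(F_val, J, lam):
        """Solve the damped normal equations (J^T J + lam*I) s = -J^T F."""
        n = J.shape[1]
        A = J.T @ J + lam * np.eye(n)    # damped Gauss-Newton matrix
        g = J.T @ F_val                  # gradient of f(x) = 0.5*||F(x)||^2
        return np.linalg.solve(A, -g)    # LM step s_k

For small lam the result approaches the Gauss–Newton step; for large lam it shrinks toward -g/lam, a short gradient-descent step.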

In maximum likelihood estimation for exponential families, LM generalizes to

\theta^{(t+1)} = \theta^{(t)} - \left[H_l(\theta^{(t)}) + \gamma^{(t)} P(\theta^{(t)})\right]^{-1} s(\theta^{(t)}),

where H_l is the observed Hessian of the log-likelihood, s is the score function, and P is a user-chosen negative-definite penalty, often P = \operatorname{diag} H_l (Giordan et al., 2014).
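
A minimal sketch of this penalized update, assuming hypothetical callables score() and hessian() for the log-likelihood and the common choice P = diag(H_l):

    import numpy as np

    def penalized_newton_step(theta, score, hessian, gamma):
        """One update theta - [H_l + gamma*P]^{-1} s(theta) with P = diag(H_l)."""
        H = hessian(theta)               # observed Hessian H_l(theta)
        P = np.diag(np.diag(H))          # diagonal penalty P = diag(H_l)
        return theta - np.linalg.solve(H + gamma * P, score(theta))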

2. Damping Parameter Adaptation and Trust-Region Interpretation

The choice and update of the damping parameter \lambda_k critically affect LM performance. A standard strategy decreases \lambda_k when a step yields sufficient reduction in f(x) (step acceptance) and increases it otherwise:

  • If f(x_k + s_k) < f(x_k), then \lambda_{k+1} = \rho \lambda_k with \rho \in (0,1);
  • otherwise, \lambda_{k+1} = \tau \lambda_k with \tau > 1 (Protic et al., 2021).

The "gain ratio"

\rho_k = \frac{f(x_k) - f(x_k + s_k)}{m_k(0) - m_k(s_k)}

can be used to refine the update, as in \lambda_{k+1} = \lambda_k \max\{1/3,\, 1-(2\rho_k-1)^3\} when \rho_k > 0 (Giordan et al., 2014; Philipps et al., 2020; Nadjiasngar et al., 2011). This dynamic adjustment underpins the trust-region interpretation of LM: the damping constrains the step size to maintain stability and guarantee monotonic decrease of the residual norm.
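
Putting the pieces together, a hedged sketch of the full damping-adaptation loop; F(x) and J(x) are assumed user-supplied callables returning the residual vector and Jacobian, and the constants are illustrative:

    import numpy as np

    def lm_fit(F, J, x0, lam=1e-3, tol=1e-8, max_iter=200):
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            r, Jx = F(x), J(x)
            g = Jx.T @ r                       # gradient of f(x) = 0.5*||F(x)||^2
            if np.linalg.norm(g) < tol:
                break                          # gradient-norm stopping test
            s = np.linalg.solve(Jx.T @ Jx + lam * np.eye(x.size), -g)
            r_new = F(x + s)
            actual = 0.5 * (r @ r - r_new @ r_new)
            predicted = -0.5 * (g @ s)         # m_k(0) - m_k(s_k), via the normal equations
            rho = actual / predicted           # gain ratio rho_k
            if rho > 0:                        # accept step and relax damping
                x = x + s
                lam *= max(1.0 / 3.0, 1.0 - (2.0 * rho - 1.0) ** 3)
            else:                              # reject step and increase damping (tau = 2)
                lam *= 2.0
        return x

The acceptance branch implements the \max\{1/3, 1-(2\rho_k-1)^3\} rule quoted above; the rejection branch is the simple \tau > 1 increase.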

3. Convergence Properties: Local and Global Behavior

The LM method is globally convergent under mild regularity and bounded level-set assumptions. For the unconstrained case, if F is twice differentiable and the Jacobian is Lipschitz continuous and bounded, the sequence \{x_k\} converges to stationary points of f, and an upper bound on the number of iterations needed to reach gradient norm \leq \epsilon is \widetilde{O}(\epsilon^{-2}) (Bergou et al., 2020).

Local convergence is guaranteed to be at least linear, and becomes quadratic in the so-called "zero-residual" regime (true solution with F(x_*) = 0) provided the error-bound condition holds:

\operatorname{dist}(x, X^*) \leq M \|F(x) - F(\bar{x})\|,

where X^* is the set of minimizers and \bar{x} denotes a point of X^*. In the nonzero-residual case, the rate is linear (Bergou et al., 2020). For nonlinear inverse problems and certain regularized settings, quadratic local convergence can be ensured by adaptively choosing the regularization to match the data misfit (Daijun et al., 2015).
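
For orientation, the zero-residual rate can be made explicit. A standard estimate of this type (a general Yamashita–Fukushima-style bound, stated here for context rather than quoted from the cited papers) takes \lambda_k = \|F(x_k)\|^2 and, under the error-bound condition above, yields for iterates near X^*

\operatorname{dist}(x_{k+1}, X^*) \leq C\, \operatorname{dist}(x_k, X^*)^2

for some constant C > 0, which is precisely the quadratic rate claimed above.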

4. Algorithmic Enhancements and Generalizations

Numerous variants of the classical LM algorithm have been developed to address specific difficulties:

  • Singular Scaling: Incorporating possibly rank-deficient regularization matrices L^T L to enforce smoothness or problem structure, notably in ill-posed inverse problems (Boos et al., 2023).
  • Robust Estimation: Employing LM within robust objective functions, such as LOVO (Low Order-Value Optimization), to downweight outliers. Here, at each iteration, the algorithm restricts attention to the residuals of lowest magnitude, updating the active set and recomputing the LM step (Castelani et al., 2019).
  • q-Generalization: Replacing classical derivatives with q-derivatives to enhance global search properties and escape shallow local minima (Protic et al., 2021).
  • Accelerated-Proximal Schemes: For composite minimization (e.g., nonsmooth plus smooth-composite objectives), the "prox-linear" or generalized LM method solves damped, strongly convex subproblems with accelerated gradient sub-solvers, achieving quadratic convergence under quadratic growth conditions and providing sharp bounds on oracle complexity (Marumo et al., 2022).
  • Matrix-Free and Distributed Implementations: Large-scale settings benefit from algorithms leveraging dynamic programming and block-structured decompositions to parallelize the LM step, or from formulations that use only Jacobian-vector products (Haring et al., 2022; Bergou et al., 2020); see the sketch after this list.
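
A minimal matrix-free sketch of the LM step in this spirit: conjugate gradients solve (J^T J + \lambda I) s = -J^T F while touching J only through products. The callables jvp (v -> J v) and vjp (w -> J^T w) are assumed inputs, not an API from the cited papers:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def lm_step_matrix_free(F_val, jvp, vjp, n, lam):
        def matvec(v):
            return vjp(jvp(v)) + lam * v       # (J^T J + lam*I) v, no matrix formed
        A = LinearOperator((n, n), matvec=matvec, dtype=float)
        s, info = cg(A, -vjp(F_val))           # info == 0 signals CG convergence
        return s

Since J^T J + \lambda I is symmetric positive definite for \lambda > 0, CG is a natural sub-solver here.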

5. Applications Across Domains

The LM algorithm finds use across a broad range of fields, including:

  • Statistical estimation: maximum-likelihood fitting in exponential-family and compositional models such as the Dirichlet and Aitchison families (Giordan et al., 2014).
  • Tracking and filtering: Gauss–Newton radar-tracking filters incorporating LM damping (Nadjiasngar et al., 2011).
  • Astrodynamics: trajectory design in the Earth-Moon system (Nunes et al., 21 Oct 2025).
  • Signal and image processing: tensor CP decomposition for image compression (Karim et al., 2024).
  • Machine learning: training of neural networks and physics-informed neural networks (PINNs) (Shahab et al., 9 Feb 2026; Pooladzandi et al., 2022).
  • System identification: sparse identification of dynamical systems (Haring et al., 2022).
  • Inverse problems: parameter identification in elliptic and parabolic PDEs (Daijun et al., 2015; Boos et al., 2023).
  • Hierarchical optimization: penalty-based reformulations in bilevel optimization (Tin et al., 2021).

6. Practical Implementation and Performance Considerations

Implementations of LM must address several key concerns:

  • Jacobian Computation: Analytical expressions, automatic differentiation, finite differences, or quasi-Newton (e.g., Broyden) updates may be used, with structured or block-sparse assembly wherever possible (Transtrum et al., 2012).
  • Stopping Criteria: Convergence is typically declared when parameter increments, the gradient norm, or objective changes fall below prescribed thresholds (e.g., \|\nabla f(x)\| < \epsilon), or when a maximum iteration count is reached (Bergou et al., 2020; Philipps et al., 2020); a short usage sketch with a library implementation follows this list.
  • Globalization: Line-search or trust-region strategies ensure sufficient decrease and safeguard against divergence. Nonmonotone acceptance rules or heuristic step rejections are recommended for constrained or ill-conditioned scenarios (Bergou et al., 2020).
  • Parallelization: Large-scale problems and computational bottlenecks in Jacobian or Hessian computation are alleviated via distributed or blockwise solution techniques (Haring et al., 2022; Philipps et al., 2020).
  • Tuning of Regularization: The choice of initial damping, regularization matrix, or adaptive weighting in the cost function has substantial impact. Model-informed scaling (e.g., \lambda_k = \|F(x_k)\|^2) and specialized adaptive schemes are preferred (Boos et al., 2023).
  • Robustness: The R package marqLevAlg integrates stringent stopping rules, including parameter-, objective-, and Hessian-based tests, to reduce the risk of premature convergence at stationary but nonoptimal points (Philipps et al., 2020).
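
As a usage sketch, SciPy's MINPACK-based LM implementation bundles many of these concerns (Jacobian handling via finite differences, the three stopping tests, internal scaling); the exponential model and data below are made up for illustration:

    import numpy as np
    from scipy.optimize import least_squares

    t = np.linspace(0.0, 4.0, 50)
    y = 2.5 * np.exp(-1.3 * t) + 0.01 * np.random.default_rng(0).normal(size=t.size)

    def residuals(p):
        a, b = p
        return a * np.exp(-b * t) - y          # residual vector F(x)

    # method='lm' dispatches to MINPACK's Levenberg-Marquardt; xtol, ftol, and
    # gtol realize the parameter-, objective-, and gradient-based stopping tests.
    sol = least_squares(residuals, x0=[1.0, 1.0], method='lm',
                        xtol=1e-10, ftol=1e-10, gtol=1e-10)
    print(sol.x)                               # estimated (a, b), close to (2.5, 1.3)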

Table: Summary of Damping Parameter Adaptation in Representative LM Variants

Variant / Paper | Damping Update Rule | Empirical Regime
Classical LM (Bergou et al., 2020) | Increase on step rejection, decrease on acceptance (trust-region test) | Locally quadratic, globally convergent
Exponential Family (Giordan et al., 2014) | Gain-ratio-based: \gamma_{t+1} = \gamma_t \max\{1/3,\, 1-(2\rho-1)^3\} | Fast and stable
Robust LM-LOVO (Castelani et al., 2019) | Increase or decrease proportional to current gradient norm | Handles outliers
Generalized LM (Marumo et al., 2022) | Backtracking: increase until sufficient descent is achieved | Composite objectives
Heat/Inverse (Boos et al., 2023) | \lambda_k = \|F(x_k)\|^2; step size via Armijo or fixed | Smoothness-regularized

7. Comparative Empirical Evidence

Empirical evaluations consistently demonstrate that the LM algorithm, particularly with adaptive damping and curvature-informed scaling, outperforms naive Gauss–Newton or basic first-order methods when the objective is a nonlinear least-squares problem with moderate to strong nonlinearity or ill-conditioning. For example, in large compositional models (Dirichlet, Aitchison), LM converges reliably and rapidly where Newton–Raphson or fixed-point iteration may stall or diverge (Giordan et al., 2014). In neural networks and PINNs, LM attains lower final losses and solution errors than BFGS, SGD, Adam, or L-BFGS, often with fewer iterations and comparable or lower computational cost for moderate problem sizes (Shahab et al., 9 Feb 2026; Pooladzandi et al., 2022). In inverse problems with PDEs, LM with singular scaling solves ill-posed parameter identification problems with higher accuracy and fewer iterations than unregularized alternatives (Boos et al., 2023).

References

  • "On the maximization of likelihoods belonging to the exponential family using ideas related to the Levenberg-Marquardt approach" (Giordan et al., 2014)
  • "The q-Levenberg-Marquardt method for unconstrained nonlinear optimization" (Protic et al., 2021)
  • "A robust method based on LOVO functions for solving least squares problems" (Castelani et al., 2019)
  • "Designing trajectories in the Earth-Moon system: a Levenberg-Marquardt approach" (Nunes et al., 21 Oct 2025)
  • "Convergence and Complexity Analysis of a Levenberg-Marquardt Algorithm for Inverse Problems" (Bergou et al., 2020)
  • "Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization" (Transtrum et al., 2012)
  • "Gauss-Newton Filtering incorporating Levenberg-Marquardt Methods for Radar Tracking" (Nadjiasngar et al., 2011)
  • "Modified Levenberg-Marquardt Algorithm For Tensor CP Decomposition in Image Compression" (Karim et al., 2024)
  • "Levenberg-Marquardt method and partial exact penalty parameter selection in bilevel optimization" (Tin et al., 2021)
  • "Robust and Efficient Optimization Using a Marquardt-Levenberg Algorithm with R Package marqLevAlg" (Philipps et al., 2020)
  • "Levenberg-Marquardt method with Singular Scaling and applications" (Boos et al., 2023)
  • "Quadratic Convergence of Levenberg-Marquardt Method for Elliptic and Parabolic Inverse Robin Problems" (Daijun et al., 2015)
  • "Accelerated-gradient-based generalized Levenberg--Marquardt method with oracle complexity bound and local quadratic convergence" (Marumo et al., 2022)
  • "Do physics-informed neural networks (PINNs) need to be deep? Shallow PINNs using the Levenberg-Marquardt algorithm" (Shahab et al., 9 Feb 2026)
  • "A Nonmonotone Matrix-Free Algorithm for Nonlinear Equality-Constrained Least-Squares Problems" (Bergou et al., 2020)
  • "Improving Levenberg-Marquardt Algorithm for Neural Networks" (Pooladzandi et al., 2022)
  • "A Levenberg-Marquardt algorithm for sparse identification of dynamical systems" (Haring et al., 2022)