Newton-Kaczmarz Method for Nonlinear Systems
- The Newton-Kaczmarz method is an iterative algorithm that linearizes and projects individual nonlinear equations to achieve efficient convergence.
- It employs a row-wise update with a relaxation parameter to tackle overdetermined and ill-conditioned systems without needing a full Jacobian.
- The approach supports parallelization on disjoint parameter subsets and demonstrates robust, potentially superlinear convergence in high dimensions.
The Newton-Kaczmarz method is an iterative algorithm for solving systems of nonlinear equations, leveraging the strengths of both the classical Newton and Kaczmarz approaches. In its canonical form, it addresses the solution of $F(x) = 0$ or $\min_x \|F(x)\|^2$ for a vector-valued, generally nonlinear map $F : \mathbb{R}^n \to \mathbb{R}^m$, by performing sequential (row-wise) Newton-type updates per component equation, rather than processing all equations simultaneously. This methodology enables efficient memory usage, admits straightforward parallelization, and exhibits robust convergence properties, particularly in overdetermined and ill-conditioned problem settings. Several recent works rigorously analyze the Newton-Kaczmarz method and its generalizations, establish sharp convergence guarantees, and demonstrate effectiveness on high-dimensional nonlinear least squares, Kolmogorov-Arnold representation learning, and structured inverse problems (Poluektov et al., 2023, Gower et al., 2023).
1. Mathematical Framework and Iteration Rule
Consider a system of nonlinear equations $f_i(x) = 0$, $i = 1, \dots, m$, with parameter vector $x \in \mathbb{R}^n$. The Newton-Kaczmarz method operates by iteratively projecting the current iterate onto the hyperplane defined by the linearization of a single residual equation at the current point. For step $k$, let $f_{i_k}(x^k)$ denote the $i_k$-th residual and $\nabla f_{i_k}(x^k)$ its gradient. The standard update is

$$x^{k+1} = x^k - \lambda\,\frac{f_{i_k}(x^k)}{\|\nabla f_{i_k}(x^k)\|^2}\,\nabla f_{i_k}(x^k),$$

with a relaxation parameter $\lambda > 0$. This constructs a projection of $x^k$ onto the hyperplane $\{x : f_{i_k}(x^k) + \nabla f_{i_k}(x^k)^\top (x - x^k) = 0\}$. The index $i_k$ is generally cycled or randomly selected over $\{1, \dots, m\}$. This paradigm can be characterized as a row-by-row Newton method, where each step enforces agreement with a single example or measurement (Poluektov et al., 2023, Gower et al., 2023).
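As a concrete sketch, one relaxed projection step fits in a few lines of NumPy (the callables `f_i` and `grad_f_i` and the degeneracy tolerance `eps` are illustrative choices, not taken from the references):

```python
import numpy as np

def nk_step(x, f_i, grad_f_i, lam=1.0, eps=1e-12):
    """One Newton-Kaczmarz update for a single residual equation.

    x         current iterate (1-D array)
    f_i       callable returning the scalar residual f_i(x)
    grad_f_i  callable returning the gradient of f_i at x
    lam       relaxation parameter; lam = 1 is the pure projection
    """
    g = grad_f_i(x)
    gnorm2 = g @ g
    if gnorm2 < eps:                 # skip near-degenerate gradients
        return x
    return x - lam * (f_i(x) / gnorm2) * g
```

For `lam = 1` the returned point satisfies the linearized equation exactly, i.e. `f_i(x) + grad_f_i(x) @ (x_new - x)` vanishes up to rounding.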
2. Derivation and Relationship to Other Methods
The Newton-Kaczmarz step is derived by linearizing a single equation $f_i$ about the current iterate $x^k$, replacing $f_i(x)$ by $f_i(x^k) + \nabla f_i(x^k)^\top (x - x^k)$, and finding the update that zeroes this linearization. This gives the minimal correction, in the Euclidean-norm sense, that enforces the $i$-th constraint to first order. The parameter $\lambda$ controls relaxation and numerical stability: $\lambda = 1$ corresponds to the pure orthogonal projection, while $\lambda < 1$ increases robustness to poorly scaled gradients or ill-conditioning.
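The minimal-correction property follows from a short Lagrange-multiplier argument (standard material, not specific to either reference):

```latex
\min_{\Delta x}\ \tfrac12\|\Delta x\|^{2}
\quad\text{s.t.}\quad
f_i(x^k) + \nabla f_i(x^k)^{\top}\Delta x = 0 .
```

Stationarity of the Lagrangian forces $\Delta x = -\mu\,\nabla f_i(x^k)$ for some multiplier $\mu$; substituting into the constraint gives $\mu = f_i(x^k)/\|\nabla f_i(x^k)\|^{2}$, which recovers the update with $\lambda = 1$.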
Viewed through the lens of Bregman projections, the Newton-Kaczmarz iteration is the special case of nonlinear Kaczmarz in the Euclidean metric, as established in (Gower et al., 2023). In the general setting, one may perform mirror descent via Bregman divergence, with an adaptive step size determined by a one-dimensional convex subproblem. For Euclidean projections and unconstrained problems, these generalizations coincide with the classical Newton-Kaczmarz routine.
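A schematic form of such a generalized step, in standard mirror-map notation with $\varphi$ a distance-generating function and $\varphi^*$ its convex conjugate (a sketch of the general pattern, not a verbatim transcription of the algorithm in the cited work):

```latex
x_*^{k+1} = x_*^{k} - t_k\,\nabla f_{i_k}(x^k),
\qquad
x^{k+1} = \nabla\varphi^{*}\!\left(x_*^{k+1}\right),
```

where the step size $t_k$ is obtained from a one-dimensional convex subproblem over $t \mapsto \varphi^{*}\!\left(x_*^{k} - t\,\nabla f_{i_k}(x^k)\right) + t\left(\nabla f_{i_k}(x^k)^\top x^k - f_{i_k}(x^k)\right)$. For $\varphi(x) = \tfrac12\|x\|^2$ one has $\nabla\varphi^{*} = \mathrm{id}$, and the exact minimizer reproduces the Euclidean Newton-Kaczmarz projection.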
3. Algorithmic Workflow and Parallelization
The canonical sequential Newton-Kaczmarz algorithm iterates as follows:
- Given data samples, initialize the parameter vector $x^0$.
- For each equation index $i$ (in cyclic or random order):
  - Compute the model output for the $i$-th equation.
  - Evaluate the residual $f_i(x)$.
  - Calculate the gradient $\nabla f_i(x)$.
  - Update $x \leftarrow x - \lambda\, f_i(x)\,\nabla f_i(x)/\|\nabla f_i(x)\|^2$ if $\|\nabla f_i(x)\|$ is sufficiently large.
- Test for convergence based on residuals or parameter change.
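Put together, the workflow above can be sketched on a small synthetic problem (the exponential model, problem sizes, and tolerances here are illustrative choices, not taken from the references):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 50                          # parameters, equations (overdetermined)
A = rng.normal(size=(m, n))           # synthetic design: f_i(x) = exp(a_i . x) - y_i
x_true = 0.2 * rng.normal(size=n)
y = np.exp(A @ x_true)                # consistent right-hand side

def residual(i, x):
    return np.exp(A[i] @ x) - y[i]

def gradient(i, x):
    return np.exp(A[i] @ x) * A[i]

x = np.zeros(n)                       # initial guess
lam = 1.0                             # relaxation parameter
for sweep in range(200):
    for i in rng.permutation(m):      # random equation order within each sweep
        g = gradient(i, x)
        gn2 = g @ g
        if gn2 > 1e-14:               # skip near-degenerate rows
            x -= lam * residual(i, x) / gn2 * g
    if np.linalg.norm(np.exp(A @ x) - y) < 1e-10:   # residual-based stopping test
        break
```

Since the system is consistent and the design matrix has full column rank, driving the residual to zero also recovers the generating parameters.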
Parallelization is highly efficient: if two equations $f_i$ and $f_j$ depend on disjoint sets of parameters, their projections can be performed simultaneously without write conflicts. In practice, parallel (batched) updates require synchronization, locking, or coordination over shared parameter subsets. The memory footprint is modest, as only the parameter vector and basis function supports are maintained; neither the full Jacobian nor the Hessian is required (Poluektov et al., 2023).
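The disjoint-support argument can be illustrated with a toy block-sparse system (the residual functions and block layout below are invented for illustration):

```python
import numpy as np

# Toy block-sparse system: equation i touches only the parameters in supp[i].
supp = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]   # disjoint supports
targets = [1.0, 2.0, 3.0]

def residual(i, x):
    u = x[supp[i]]
    return float(u @ u + u.sum() - targets[i])   # demo residual on block i only

def sparse_grad(i, x):
    return 2.0 * x[supp[i]] + 1.0                # gradient restricted to supp[i]

x = np.zeros(6)
# Because the supports are disjoint, the three projections in each pass
# commute and could run in parallel workers without locking; here they are
# simply executed in a loop.
for _ in range(50):
    for i in range(3):
        g = sparse_grad(i, x)
        x[supp[i]] -= residual(i, x) / (g @ g) * g   # touches only supp[i]
```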
4. Convergence Properties and Robustness
Under standard conditions, namely that each $f_i$ is continuously differentiable with gradient $\nabla f_i$ nonvanishing near the solution $x^*$, the Newton-Kaczmarz method enjoys local convergence. The expected error typically decreases at least linearly with the number of iterations, with possible superlinear convergence when $F$ exhibits additional smoothness. The relaxation parameter $\lambda$ tunes stability and rate: a smaller $\lambda$ increases robustness but slows convergence. Empirical evidence demonstrates that Newton-Kaczmarz is less sensitive to the initial guess than the Gauss-Newton method and has a broader basin of attraction under poor initialization (Poluektov et al., 2023).
In the generalized Bregman-Kaczmarz context, convergence theorems are established for convex (nonnegative-interpolation) and locally smooth nonlinear systems. In the convex regime, the method yields monotonic decrease of Bregman distances and sublinear convergence of expected squared residuals. With full-rank Jacobians, linear convergence in Bregman distance is attained in expectation (Gower et al., 2023).
5. Computational Efficiency and Implementation
Each Newton-Kaczmarz iteration incurs computational cost proportional to the number of active parameters in the sampled equation: typically $O(s)$ operations for residual and gradient evaluation, where $s$ is the number of parameters on which that equation depends, followed by a rank-one update of $x$. There is no need to materialize a global Jacobian or its factorizations, in contrast to batch Newton-type or Gauss-Newton methods. This feature enables the method to handle massive datasets efficiently and makes it well-suited for high-dimensional regression, system identification, and function representation problems (Poluektov et al., 2023).
Initialization and regularization play a significant role: the initial parameter vector $x^0$ is often chosen via uniform randomization over the feasible output range, and the relaxation parameter $\lambda$ is generally initialized at $1$ and reduced if the iterates exhibit oscillation. Stopping criteria typically combine residual thresholds, parameter increments, and a maximum allowable number of sweeps through the data.
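These heuristics can be collected into a small sweep driver (the halving rule, tolerances, and interface below are one plausible realization, not prescribed by the references):

```python
import numpy as np

def newton_kaczmarz(x0, residual, gradient, m, lam0=1.0, max_sweeps=100,
                    tol_res=1e-12, tol_step=1e-14):
    """Sweep driver with the heuristics above: lam starts at 1 and is halved
    whenever the sweep-level residual norm increases (a crude oscillation
    test); iteration stops on small residuals, a small parameter increment,
    or exhaustion of the sweep budget."""
    x, lam = np.array(x0, dtype=float), lam0
    prev = np.inf
    for _ in range(max_sweeps):
        x_old = x.copy()
        for i in range(m):
            g = gradient(i, x)
            gn2 = g @ g
            if gn2 > 1e-14:
                x -= lam * residual(i, x) / gn2 * g
        res = np.sqrt(sum(residual(i, x) ** 2 for i in range(m)))
        if res > prev:
            lam *= 0.5          # damp the relaxation on oscillation
        prev = res
        if res < tol_res or np.linalg.norm(x - x_old) < tol_step:
            break
    return x
```

On a linear system the inner update reduces to the classical Kaczmarz projection, which makes the driver easy to sanity-check.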
6. Numerical Performance and Comparison
Empirical results on the Kolmogorov-Arnold model fitting task demonstrate substantial practical advantages of the Newton-Kaczmarz approach. For a 25-input, 10-million-sample nonlinear regression problem, the method produces solutions in 4-10 minutes, in contrast to the 4-8 hours required by MATLAB's built-in multilayer-perceptron routines, while yielding higher accuracy, reduced CPU load, and smaller memory requirements. In robustness studies, Newton-Kaczmarz achieves successful recovery from poor initial guesses at frequencies one order of magnitude higher than Gauss-Newton, particularly in high-dimensional and highly nonlinear regimes (Poluektov et al., 2023).
Generalizations to the Bregman-Kaczmarz framework further enable solution of structured systems under sparsity or simplex constraints, with superior convergence in both overdetermined and underdetermined cases. In all examined benchmarks, the exact-step Bregman-Kaczmarz variant exhibits faster residual reduction and improved robustness to ill-conditioned linearizations compared to relaxed or classical Euclidean variants (Gower et al., 2023).
7. Extensions and Research Directions
The Newton-Kaczmarz method provides a foundational algorithmic block for data-driven identification of nonlinear systems, parametric function learning, and inverse problems in high dimensions. Current research extends the basic method to distributed architectures, block-iterative projections, and Bregman-divergence based generalizations for incorporating constraints and promoting solution structure (e.g., sparsity, simplex membership). The literature demonstrates efficacy not only for function approximation (Kolmogorov-Arnold models) but also for constraint systems arising in scientific computing, sparse quadratic equations, and simplex-constrained decompositions. A plausible implication is that further integration with adaptive sampling, probabilistic selection rules, and advanced relaxation schemes could enhance both convergence theory and practical scalability (Gower et al., 2023, Poluektov et al., 2023).