MP-LBFGS: Efficient Optimization for FBPINNs

Updated 20 January 2026
  • The algorithm leverages local LBFGS steps within FBPINN subdomains to form quasi-Newton corrections, significantly reducing global synchronizations.
  • It aggregates local corrections through a nonlinear subspace minimization, preserving curvature benefits while ensuring global secant consistency.
  • Empirical benchmarks on PDE problems show reduced training epochs and improved accuracy by trading off local computation for fewer communications.

Multi-Preconditioned LBFGS (MP-LBFGS) is an optimization algorithm developed to accelerate and robustify the training of finite-basis physics-informed neural networks (FBPINNs). Inspired by the nonlinear additive Schwarz method from domain decomposition in numerical PDEs, MP-LBFGS exploits the intrinsic additive structure of FBPINNs by constructing parallel, subdomain-local quasi-Newton corrections and then optimally combining them via a nonlinear, low-dimensional subspace minimization. This approach yields a preconditioned search direction that both improves convergence and reduces communication overhead in distributed learning settings (Salvadó-Benasco et al., 13 Jan 2026).

1. FBPINN Architecture and Domain Decomposition

The motivating problem is the solution of a partial differential equation (PDE) $\mathcal{P}[u](x) = f(x)$ on a bounded domain $\Omega \subset \mathbb{R}^d$ with boundary conditions $u = g$ on $\partial \Omega$. A physics-informed neural network (PINN) approximates the unknown $u$ by a global neural network $N(\theta; x)$, trained to minimize the squared PDE residual over collocation points.

FBPINNs address limitations such as spectral bias by decomposing $\Omega$ into $n_s$ overlapping subdomains $\{\Omega_j\}$ (with overlap width $\delta$). Each $\Omega_j$ receives a local network $N_j(\theta_j; x)$; the global solution is assembled via a smooth partition of unity $\{w_j\}$ satisfying $\sum_j w_j(x) \equiv 1$ and $\mathrm{supp}\, w_j \subset \Omega_j$:

$$N(\theta; x) = \sum_{j=1}^{n_s} w_j(x)\, N_j(\theta_j; \mathrm{norm}_j(x)), \qquad \theta = (\theta_1, \dots, \theta_{n_s}) \in \mathbb{R}^p, \quad p = \sum_j p_j.$$

The normalization $\mathrm{norm}_j$ maps local coordinates to $(-1, 1)^d$. Collocation data and model parameters thus admit a natural subdomain-local partitioning, facilitating parallelism.
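The assembly above can be sketched in a few lines. The following is a hypothetical 1D toy (hand-chosen subdomains, cosine-squared windows, stand-in "local networks"), not the paper's implementation:

```python
import numpy as np

# Toy 1D FBPINN assembly on [0, 1] with two overlapping subdomains.
# The windows are normalized pointwise so that sum_j w_j(x) == 1
# (partition of unity); the "local networks" are stand-in functions.
subdomains = [(0.0, 0.6), (0.4, 1.0)]          # overlap width delta = 0.2

def raw_window(x, a, b):
    """Smooth bump supported on [a, b] (cosine-squared taper)."""
    t = np.clip((x - a) / (b - a), 0.0, 1.0)
    return np.sin(np.pi * t) ** 2

def norm_j(x, a, b):
    """Map subdomain coordinates onto (-1, 1), as in the FBPINN definition."""
    return 2.0 * (x - a) / (b - a) - 1.0

def assemble(x, local_nets):
    """Global prediction N(theta; x) = sum_j w_j(x) N_j(theta_j; norm_j(x))."""
    raw = np.stack([raw_window(x, a, b) for a, b in subdomains])
    w = raw / raw.sum(axis=0)                  # normalize: partition of unity
    u = sum(w[j] * local_nets[j](norm_j(x, *subdomains[j]))
            for j in range(len(subdomains)))
    return u, w

nets = [np.sin, np.cos]                        # stand-ins for N_1, N_2
x = np.linspace(0.05, 0.95, 101)               # avoid the unwindowed endpoints
u, w = assemble(x, nets)
```

The normalization step makes the windows a partition of unity by construction, so the global prediction interpolates smoothly between the local models in the overlap.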

2. Nonlinear Additive Schwarz Motivation

In classical PDE solvers, nonlinear additive Schwarz preconditioning accelerates Newton/quasi-Newton methods by performing local solves on each subdomain and then aggregating the results. MP-LBFGS transposes this methodology: each subnetwork solves a local training subproblem using several LBFGS steps, generating a local correction. Aggregation is enforced by a right-preconditioned LBFGS step that ensures global secant consistency.

The full-space optimality condition $\nabla L(\theta) = 0$ is reformulated as

$$\mathcal{F}(\theta) := \nabla L(P(\theta)) = 0,$$

with the nonlinear additive Schwarz map

$$P(\theta) = \theta + \sum_{j=1}^{n_s} R_j^\top (\theta_j^* - R_j \theta),$$

where $R_j$ restricts the global parameters to subdomain $j$, and $\theta_j^*$ approximates the local minimizer via local LBFGS updates. This induces a lifted, preconditioned variable on which global LBFGS is performed, retaining curvature benefits while reducing communication frequency.
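The map $P$ can be made concrete on a toy quadratic loss. Here an exact block solve stands in for the approximate local minimizer $\theta_j^*$, and index arrays play the role of the restrictions $R_j$ (everything below is a synthetic sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 6
A = np.diag(np.arange(1.0, p + 1))          # SPD Hessian of the toy loss
b = rng.normal(size=p)                      # L(theta) = 0.5 th'A th - b'th

def schwarz_map(theta, blocks):
    """P(theta) = theta + sum_j R_j^T (theta_j* - R_j theta)."""
    out = theta.copy()
    for idx in blocks:
        comp = np.setdiff1d(np.arange(p), idx)       # frozen coordinates
        # Exact local solve over theta[idx], other coordinates held fixed,
        # standing in for a few local LBFGS steps.
        rhs = b[idx] - A[np.ix_(idx, comp)] @ theta[comp]
        theta_star = np.linalg.solve(A[np.ix_(idx, idx)], rhs)
        out[idx] += theta_star - theta[idx]          # overlap corrections add
    return out

blocks = [np.array([0, 1, 2, 3]), np.array([2, 3, 4, 5])]   # overlapping
theta0 = rng.normal(size=p)
theta1 = schwarz_map(theta0, blocks)
```

Note the additive character of the map: in the overlap (indices 2 and 3) both local corrections contribute, exactly as in the sum over $j$ above.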

3. Algorithmic Components of MP-LBFGS

3.1 Local LBFGS Corrections

At each global iteration $k$, the global parameters $\theta^{(k)}$ are distributed to the subdomains, yielding $\theta_j^{(k)} = R_j \theta^{(k)}$. Each local network minimizes its loss $L_j$ by performing $\eta$ LBFGS steps, generating the secant pairs

$$s_j^{(k,q)} = \theta_j^{(k,q+1)} - \theta_j^{(k,q)}, \qquad y_j^{(k,q)} = \nabla L_j(\theta_j^{(k,q+1)}) - \nabla L_j(\theta_j^{(k,q)}), \qquad q = 0, \dots, \eta - 1,$$

resulting in an approximate minimizer $\theta_j^{(k,*)} = \theta_j^{(k,\eta)}$ and the local correction $c_j^{(k)} = \theta_j^{(k,*)} - \theta_j^{(k)}$.
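A single local solve can be sketched with an off-the-shelf LBFGS. Below, SciPy's L-BFGS-B (capped at $\eta$ iterations) stands in for the local solver and the Rosenbrock function stands in for the local loss $L_j$, so all specifics are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def L_j(th):
    """Stand-in local loss (Rosenbrock), not an actual FBPINN residual."""
    return (1 - th[0]) ** 2 + 100 * (th[1] - th[0] ** 2) ** 2

def grad_L_j(th):
    return np.array([
        -2 * (1 - th[0]) - 400 * th[0] * (th[1] - th[0] ** 2),
        200 * (th[1] - th[0] ** 2),
    ])

eta = 5                                    # number of local LBFGS steps
theta_jk = np.array([-1.2, 1.0])           # R_j theta^{(k)}, chosen arbitrarily
res = minimize(L_j, theta_jk, jac=grad_L_j,
               method="L-BFGS-B", options={"maxiter": eta})
theta_j_star = res.x                       # approximate minimizer theta_j^{(k,*)}
c_jk = theta_j_star - theta_jk             # local correction c_j^{(k)}
```

In the actual algorithm, each subdomain runs such a solve independently and in parallel; only the resulting corrections are communicated.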

3.2 Subspace Aggregation

Local corrections are collected as

$$C^{(k)} = \bigl[\, R_1^\top c_1^{(k)}, \; \dots, \; R_{n_s}^\top c_{n_s}^{(k)} \,\bigr] \in \mathbb{R}^{p \times n_s}.$$

Coefficients $\beta = (\beta_1, \dots, \beta_{n_s})$ are sought such that $d^{(k)} = C^{(k)} \beta$ minimizes the overall objective. While linear preconditioning would suggest

$$\min_{\beta} \biggl\| \sum_{j=1}^{n_s} \beta_j \bigl(H_j^{(k)}\bigr)^{-1} g^{(k)} - g^{(k)} \biggr\|^2, \qquad \sum_j \beta_j = 1,$$

robustness in the neural setting is enhanced by instead solving the nonlinear subspace minimization

$$\min_{\beta \in \mathbb{R}^{n_s}} \phi(\beta) := L\bigl(\theta^{(k)} + C^{(k)} \beta\bigr).$$

This problem is solved efficiently with a few (damped) Newton steps; the Hessian $(C^{(k)})^\top \nabla^2 L(\theta^{(k)})\, C^{(k)}$ is only $n_s \times n_s$ and thus cheap to form and factor.
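The subspace step is easy to illustrate on a quadratic stand-in loss, where $\phi(\beta)$ is itself quadratic and a single (undamped) Newton step minimizes it exactly. All values below are synthetic; $C$'s columns merely stand in for the lifted local corrections $R_j^\top c_j$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_s = 8, 3
A = np.diag(np.arange(1.0, p + 1))             # stand-in for nabla^2 L
b = rng.normal(size=p)
L = lambda th: 0.5 * th @ A @ th - b @ th      # toy quadratic loss
grad = lambda th: A @ th - b

theta_k = rng.normal(size=p)
C = rng.normal(size=(p, n_s))                  # correction matrix C^{(k)}

g_beta = C.T @ grad(theta_k)                   # gradient of phi at beta = 0
H_beta = C.T @ A @ C                           # n_s x n_s Hessian: cheap
beta_star = -np.linalg.solve(H_beta, g_beta)   # one Newton step on phi
theta_tilde = theta_k + C @ beta_star          # updated parameters
```

The key point is the dimensionality: regardless of $p$, the Newton solve acts on an $n_s \times n_s$ system, so its cost is negligible next to a gradient evaluation.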

3.3 Global LBFGS Step

The updated parameters are $\tilde{\theta}^{(k)} = \theta^{(k)} + C^{(k)} \beta^*$, with new gradient $\tilde{g} = \nabla L(\tilde{\theta}^{(k)})$. The global secant pair $(s^{(k)}, y^{(k)})$ is then formed:

$$s^{(k)} = \tilde{\theta}^{(k)} - \theta^{(k)}, \qquad y^{(k)} = \tilde{g} - g^{(k)}.$$

The global LBFGS memory is updated, the descent direction $d_{\text{LBFGS}}^{(k)} = -\bigl(B^{(k)}\bigr)^{-1} \tilde{g}$ is computed with the standard two-loop recursion, and a Wolfe-condition line search yields

$$\theta^{(k+1)} = \tilde{\theta}^{(k)} + \alpha^{(k)} d_{\text{LBFGS}}^{(k)}.$$
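The global step's two ingredients, the two-loop recursion and the secant-pair bookkeeping, can be sketched as follows. This is a generic LBFGS skeleton on a toy quadratic, with a simple Armijo backtracking search standing in for the Wolfe search described above:

```python
import numpy as np

def two_loop(g, memory):
    """Standard LBFGS two-loop recursion: returns d = -B^{-1} g."""
    q = g.copy()
    alphas = []
    for s, y in reversed(memory):              # newest pair first
        a = (s @ q) / (y @ s)
        q -= a * y
        alphas.append(a)
    if memory:
        s, y = memory[-1]
        q *= (s @ y) / (y @ y)                 # initial Hessian scaling
    for (s, y), a in zip(memory, reversed(alphas)):   # oldest pair first
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return -q

rng = np.random.default_rng(2)
p, Q = 10, 5                                   # dimension, memory size
A = np.diag(np.arange(1.0, p + 1))             # toy SPD Hessian
bvec = rng.normal(size=p)
L = lambda th: 0.5 * th @ A @ th - bvec @ th
grad = lambda th: A @ th - bvec

theta, memory = rng.normal(size=p), []
for _ in range(30):
    g = grad(theta)
    d = two_loop(g, memory)
    alpha = 1.0                                # Armijo backtracking stands in
    while L(theta + alpha * d) > L(theta) + 1e-4 * alpha * (g @ d):
        alpha *= 0.5                           # for the Wolfe line search
    theta_new = theta + alpha * d
    s, y = theta_new - theta, grad(theta_new) - g
    if s @ y > 1e-12:                          # curvature check before storing
        memory = (memory + [(s, y)])[-Q:]      # keep the Q most recent pairs
    theta = theta_new
```

In MP-LBFGS the pair $(s^{(k)}, y^{(k)})$ fed into this memory is the preconditioned one built from $\tilde{\theta}^{(k)}$, which is what enforces global secant consistency.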

4. Iterative Structure and Communication Patterns

The MP-LBFGS outer iteration proceeds as follows:

| Step | Operation | Communication pattern |
|------|-----------|-----------------------|
| 1 | Compute $\nabla L(\theta^{(k)})$ | Global all-reduce |
| 2 | Parallel local LBFGS | None; fully local |
| 3 | Aggregate local corrections | Minimal (local to global) |
| 4 | Subspace minimization | Fully local or negligible |
| 5–10 | Global LBFGS / line search | Standard (as in vanilla LBFGS) |

Relative to traditional LBFGS, MP-LBFGS reduces the number of global synchronizations by a factor of roughly $\eta$, replacing some forward/backward passes over the full network with local computations.

5. Computational Complexity per Outer Iteration

Let $p$ be the total parameter count, $p_j$ the count per subdomain, $Q$ the global LBFGS memory size, and $\eta$ the number of local steps:

  • Forward/backward passes: one global gradient evaluation plus $\#\text{ls}$ extra loss evaluations (one per line-search trial); $\eta$ local gradient evaluations per subdomain, computed in parallel. The subspace minimization adds a few further parallel evaluations.
  • LBFGS recursion: $O(pQ)$ flops and $O(pQ)$ memory for the global step; $O(p_j Q_j)$ per local memory.
  • Communication: one all-reduce per global gradient and per line-search trial; MP-LBFGS thus reduces the number of synchronizations by a factor of roughly $\eta$ compared to standard LBFGS.

This configuration enables a tradeoff between local computational work and synchronization frequency.
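A back-of-envelope count makes the tradeoff concrete. The numbers below are hypothetical, not taken from the paper:

```python
# If standard LBFGS needs T iterations with one all-reduce each, and
# MP-LBFGS amortizes eta local steps per outer iteration so that roughly
# T / eta outer iterations suffice, the all-reduce count drops by ~eta.
T, eta = 1000, 10                  # hypothetical iteration counts
sync_standard = T                  # one all-reduce per LBFGS iteration
sync_mp = T // eta                 # one per MP-LBFGS outer iteration
speedup = sync_standard / sync_mp  # ~eta fewer synchronizations
```

Whether the extra local gradient work pays off depends on the ratio of communication latency to per-subdomain compute cost on the target hardware.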

6. Empirical Performance and Benchmarks

Numerical experiments on 1D and 2D Poisson equations and the 2D time-dependent Burgers' equation used FBPINN configurations with four-layer ResNets (width 20), 2,000–20,000 Hammersley-sampled collocation points, and various uniform decompositions ($2 \times 2$, $4 \times 2$, $3 \times 3$).

Scaling strategies compared:

  • Uniform scaling (UniS)
  • Line-search scaling (LSS)
  • Subspace minimization (SPM)

Key empirical findings:

  • SPM was the most robust and stable strategy as $n_s$ increased.
  • MP-LBFGS with SPM reduced the required outer epochs (global synchronizations) by up to an order of magnitude compared with standard LBFGS.
  • Total per-device gradient work was comparable or lower; the final validation error in the $L^2$ norm was often an order of magnitude smaller.
  • Increasing the number of local LBFGS steps $\eta$ further reduced synchronizations, at the cost of increased local computation.

7. Practical Recommendations and Insights

The nonlinear Schwarz-type preconditioner enabled by the FBPINN decomposition allows extensive local computation, substantially reducing communication overhead in distributed settings. Subspace minimization is critical for stability: uniform or sequential line searches fail to scale as the number of subdomains increases.

Implementation recommendations:

  • Tune $\eta$ (local steps), $Q$ (memory size), and $n_s$ (number of subdomains) to balance local computation against synchronization frequency.
  • Retain separate local and global LBFGS memories to avoid mixing curvature information.
  • Precompute and cache the small Hessian blocks $C^\top \nabla^2 L\, C$ to expedite the subspace Newton solves.
  • MP-LBFGS can be integrated into existing FBPINN codebases by wrapping local training in an outer loop, collecting the corrections, solving the $n_s$-dimensional subspace problem, and then applying a standard global LBFGS update.
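One possible shape for such a wrapper is sketched below, with every routine reduced to a trivial stub so the control flow runs. The names and stub behavior are hypothetical, not the paper's code:

```python
import numpy as np

def local_lbfgs(theta_j, eta):
    """Stub for eta local LBFGS steps; here a damped step on 0.5||theta||^2."""
    return 0.9 * theta_j

def mp_lbfgs_outer(theta, blocks, eta=5, outer_iters=3):
    for _ in range(outer_iters):
        # restrict to subdomains, run local solves, lift the corrections
        C = np.zeros((theta.size, len(blocks)))
        for j, idx in enumerate(blocks):
            C[idx, j] = local_lbfgs(theta[idx], eta) - theta[idx]
        # subspace step (stub: uniform coefficients instead of the SPM solve)
        beta = np.ones(len(blocks))
        theta = theta + C @ beta
        # a global LBFGS step on theta would follow here (omitted)
    return theta

theta_out = mp_lbfgs_outer(np.ones(4), [np.array([0, 1]), np.array([2, 3])])
```

The outer loop touches the existing local training code only through `local_lbfgs`, which is what makes the wrapping approach minimally invasive.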

In summary, MP-LBFGS leverages the additive decomposition of FBPINNs to perform parallel, local quasi-Newton updates, followed by a global preconditioned update whose direction is defined by a small, nonlinear subspace optimization. This mechanism accelerates convergence, lowers wall-clock time in distributed environments, and can improve final model accuracy relative to standard LBFGS (Salvadó-Benasco et al., 13 Jan 2026).
