MP-LBFGS: Efficient Optimization for FBPINNs
- The algorithm leverages local LBFGS steps within FBPINN subdomains to form quasi-Newton corrections, significantly reducing global synchronizations.
- It aggregates local corrections through a nonlinear subspace minimization, preserving curvature benefits while ensuring global secant consistency.
- Empirical benchmarks on PDE problems show reduced training epochs and improved accuracy by trading off local computation for fewer communications.
Multi-Preconditioned LBFGS (MP-LBFGS) is an optimization algorithm developed to accelerate and robustify the training of finite-basis physics-informed neural networks (FBPINNs). Inspired by the nonlinear additive Schwarz method from domain decomposition in numerical PDEs, MP-LBFGS exploits the intrinsic additive structure of FBPINNs by constructing parallel, subdomain-local quasi-Newton corrections and then optimally combining them via a nonlinear, low-dimensional subspace minimization. This approach yields a preconditioned search direction that both improves convergence and reduces communication overhead in distributed learning settings (Salvadó-Benasco et al., 13 Jan 2026).
1. FBPINN Architecture and Domain Decomposition
The motivating problem is the solution of a partial differential equation (PDE) on a bounded domain $\Omega \subset \mathbb{R}^d$ with boundary conditions on $\partial\Omega$. A physics-informed neural network (PINN) approximates the unknown solution $u$ by a global neural network $u_\theta$, trained to minimize the squared PDE residual over collocation points.
FBPINNs address limitations such as spectral bias by decomposing $\Omega$ into $J$ overlapping subdomains $\Omega_j$ (with overlap width $\delta$). Each $\Omega_j$ receives a local network $u_{\theta_j}$; the global solution is assembled via a smooth partition of unity $\{\omega_j\}$ satisfying $\operatorname{supp}(\omega_j) \subseteq \Omega_j$ and $\sum_{j=1}^{J} \omega_j \equiv 1$ on $\Omega$:
$$u_\theta(x) = \sum_{j=1}^{J} \omega_j(x)\, u_{\theta_j}\big(n_j(x)\big).$$
The normalization $n_j$ maps local coordinates in $\Omega_j$ to $[-1,1]^d$. Collocation data and model parameters thus admit natural subdomain-local partitioning, facilitating parallelism.
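The partition-of-unity assembly can be sketched concretely. In this minimal 1D example the cosine-squared window, the subdomain layout, and the "networks" are all illustrative stand-ins, not the paper's exact choices:

```python
import numpy as np

def window(x, a, b):
    """Smooth bump supported on [a, b], zero outside (illustrative cosine window)."""
    t = np.clip((x - a) / (b - a), 0.0, 1.0)
    return np.sin(np.pi * t) ** 2

def fbpinn_solution(x, subdomains, local_nets):
    """Assemble u(x) = sum_j w_j(x) * u_j(n_j(x)) with normalized window weights."""
    ws = np.array([window(x, a, b) for (a, b) in subdomains])
    ws = ws / ws.sum(axis=0)                  # partition of unity: weights sum to 1
    out = 0.0
    for (a, b), net, w in zip(subdomains, local_nets, ws):
        xn = 2.0 * (x - a) / (b - a) - 1.0    # normalization n_j: map to [-1, 1]
        out = out + w * net(xn)
    return out

# toy usage: two overlapping 1D subdomains with trivial "networks"
subs = [(0.0, 0.6), (0.4, 1.0)]
nets = [lambda z: z**2, lambda z: 1.0 - z]
x = np.linspace(0.05, 0.95, 5)
u = fbpinn_solution(x, subs, nets)
```

Outside the overlap only one window is active, so the assembled solution reduces to the corresponding local network evaluated at normalized coordinates.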
2. Nonlinear Additive Schwarz Motivation
In classical PDE solvers, nonlinear additive Schwarz preconditioning accelerates Newton/quasi-Newton methods by performing local solves on each subdomain and then aggregating the results. MP-LBFGS transposes this methodology: each subnetwork solves a local training subproblem using several LBFGS steps, generating a local correction. Aggregation is enforced by a right-preconditioned LBFGS step that ensures global secant consistency.
The full-space optimality condition $\nabla L(\theta) = 0$ is reformulated as
$$\nabla L\big(F(\theta)\big) = 0,$$
with the nonlinear additive Schwarz map
$$F(\theta) = \theta + \sum_{j=1}^{J} R_j^{\top}\big(G_j(R_j\theta) - R_j\theta\big),$$
where $R_j$ restricts global parameters to subdomain $j$, and $G_j$ approximates the local minimizer via $\nu$ local LBFGS updates. This induces a lifted, preconditioned variable on which global LBFGS is performed, retaining curvature benefits while reducing communication frequency.
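A minimal sketch of the Schwarz map on a toy quadratic loss, assuming index-set restrictions for $R_j$ and an exact block solve standing in for the local LBFGS solve $G_j$ (all names illustrative):

```python
import numpy as np

def schwarz_map(theta, blocks, local_solve):
    """F(theta) = theta + sum_j R_j^T (G_j(R_j theta) - R_j theta).

    blocks: list of index arrays defining each restriction R_j.
    local_solve(j, theta_j): approximate local minimizer G_j (here exact;
    in MP-LBFGS this would be a few local LBFGS steps).
    """
    F = theta.copy()
    for j, idx in enumerate(blocks):
        theta_j = theta[idx]                          # R_j theta
        F[idx] += local_solve(j, theta_j) - theta_j   # prolong correction back
    return F

# toy usage: L(x) = 0.5 x^T A x - b^T x with a diagonal A
A = np.diag([1.0, 2.0, 3.0, 4.0])
b = np.ones(4)
blocks = [np.array([0, 1]), np.array([2, 3])]

def local_solve(j, theta_j):
    idx = blocks[j]
    return np.linalg.solve(A[np.ix_(idx, idx)], b[idx])  # exact block minimizer

theta1 = schwarz_map(np.zeros(4), blocks, local_solve)
```

Because this toy problem decouples across the two blocks, one application of the map already lands on the global minimizer; in general $F$ only preconditions the outer iteration.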
3. Algorithmic Components of MP-LBFGS
3.1 Local LBFGS Corrections
At each global iteration $k$, the global parameters $\theta^k$ are distributed to each subdomain, yielding $\theta_j^k = R_j\theta^k$. Each local network minimizes its local loss $L_j$ by performing $\nu$ LBFGS steps, resulting in an approximate minimizer $\tilde{\theta}_j^k \approx G_j(\theta_j^k)$ and local correction $s_j^k = \tilde{\theta}_j^k - \theta_j^k$.
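A hedged sketch of one local correction, using SciPy's L-BFGS-B with a capped iteration count to emulate the $\nu$ local steps; the quadratic local loss is a stand-in for the subdomain PINN loss:

```python
import numpy as np
from scipy.optimize import minimize

def local_correction(theta_j, local_loss, local_grad, nu=5):
    """Run at most nu LBFGS steps from theta_j; return (approx minimizer, correction)."""
    res = minimize(local_loss, theta_j, jac=local_grad,
                   method="L-BFGS-B", options={"maxiter": nu})
    return res.x, res.x - theta_j

# toy subdomain loss: 0.5 * ||theta - target||^2 (illustrative)
target = np.array([1.0, -2.0, 0.5])
loss = lambda th: 0.5 * np.sum((th - target) ** 2)
grad = lambda th: th - target

theta_j = np.zeros(3)
theta_tilde, s_j = local_correction(theta_j, loss, grad, nu=5)
```

These solves run fully in parallel, one per subdomain, with no communication until the corrections are aggregated.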
3.2 Subspace Aggregation
Local corrections are collected (after prolongation to the full space) as
$$S^k = \big[R_1^{\top}s_1^k, \ \dots, \ R_J^{\top}s_J^k\big] \in \mathbb{R}^{n \times J}.$$
Coefficients $\alpha \in \mathbb{R}^J$ are sought such that $\theta^k + S^k\alpha$ minimizes the overall objective. While linear preconditioning would suggest the uniform choice $\alpha = (1, \dots, 1)^{\top}$, robustness in the neural setting is enhanced by instead solving the nonlinear subspace minimization
$$\alpha^k = \arg\min_{\alpha \in \mathbb{R}^J} L\big(\theta^k + S^k\alpha\big).$$
This problem is efficiently solved via a few (damped) Newton steps; the Hessian is only $J \times J$ and thus computationally cheap.
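A minimal sketch of the damped Newton subspace solve on a toy quadratic. Here the full-loss gradient and Hessian are assumed available as callables; in practice $J$ Hessian-vector products against the columns of $S$ suffice to form the small $J \times J$ system:

```python
import numpy as np

def subspace_newton(theta, S, grad_L, hess_L, steps=5, damping=1e-8):
    """Minimize phi(a) = L(theta + S a) over a in R^J with damped Newton steps."""
    a = np.zeros(S.shape[1])
    for _ in range(steps):
        x = theta + S @ a
        g = S.T @ grad_L(x)                 # J-dimensional reduced gradient
        H = S.T @ hess_L(x) @ S             # small J x J reduced Hessian
        a -= np.linalg.solve(H + damping * np.eye(len(a)), g)
    return a

# toy usage: L(x) = 0.5 x^T A x - b^T x
A = np.diag([1.0, 2.0, 3.0, 4.0])
b = np.ones(4)
grad_L = lambda x: A @ x - b
hess_L = lambda x: A
theta = np.zeros(4)
S = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # two "corrections"
alpha = subspace_newton(theta, S, grad_L, hess_L)
```

For a quadratic loss a single Newton step is exact; the damping guards the solve when the reduced Hessian is ill-conditioned, as can happen with nearly parallel corrections.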
3.3 Global LBFGS Step
The updated parameter is $\hat{\theta}^k = \theta^k + S^k\alpha^k$, with new gradient $\nabla L(\hat{\theta}^k)$. The global secant pair is then formed:
$$s^k = \hat{\theta}^k - \theta^k, \qquad y^k = \nabla L(\hat{\theta}^k) - \nabla L(\theta^k).$$
The global LBFGS memory is updated, the descent direction $p^k$ is computed with the standard two-loop recursion, and a Wolfe-condition line search yields
$$\theta^{k+1} = \hat{\theta}^k + \eta^k p^k.$$
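The standard two-loop recursion used for the global descent direction can be sketched as follows (a textbook implementation, not code from the paper):

```python
import numpy as np

def lbfgs_direction(grad, memory):
    """Two-loop recursion: return the direction -H_k grad from stored secant pairs.

    memory: list of (s, y) pairs, oldest first, with s = x_{i+1} - x_i
    and y = g_{i+1} - g_i.
    """
    q = grad.copy()
    stack = []
    for s, y in reversed(memory):           # newest to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        stack.append((a, rho, s, y))
    if memory:                              # scale by gamma I as initial Hessian
        s, y = memory[-1]
        q *= (s @ y) / (y @ y)
    for a, rho, s, y in reversed(stack):    # oldest to newest
        beta = rho * (y @ q)
        q += (a - beta) * s
    return -q
```

With an empty memory the recursion degenerates to steepest descent; each stored pair costs only $\mathcal{O}(n)$ work, giving the $\mathcal{O}(mn)$ total noted in the complexity section.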
4. Iterative Structure and Communication Patterns
The MP-LBFGS outer iteration proceeds as follows:
| Step | Operation | Communication Pattern |
|---|---|---|
| 1 | Compute global gradient $\nabla L(\theta^k)$ | Global all-reduce |
| 2 | Parallel local LBFGS | None; fully local |
| 3 | Aggregate local corrections | Minimal (local to global) |
| 4 | Subspace minimization | Fully local or negligible |
| 5-10 | Global LBFGS/line search | Standard (as in vanilla LBFGS) |
Relative to traditional LBFGS, MP-LBFGS reduces the number of global synchronizations by a factor of roughly $\nu$ (the number of local steps), replacing some forward/backward passes over the full network with local computations.
5. Computational Complexity per Outer Iteration
Let $n$ be the total parameter count, $n_j$ the count per subdomain, $m$ the global LBFGS memory size, and $\nu$ the number of local steps:
- Forward/backward passes: $1$ global gradient evaluation plus extra line-search loss evaluations; $\nu$ local gradients per subdomain in parallel. Subspace minimization involves a few additional, fully parallel evaluations.
- LBFGS recursion: $\mathcal{O}(mn)$ flops and memory for the global step; $\mathcal{O}(m n_j)$ per local memory.
- Communication: one all-reduce per global gradient and per line-search evaluation; MP-LBFGS thus reduces the number of synchronizations by roughly a factor of $\nu$ compared to standard LBFGS.
This configuration enables a tradeoff between local computational work and synchronization frequency.
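The tradeoff can be made explicit with a back-of-the-envelope count. Under the simple model that standard LBFGS needs one all-reduce per gradient evaluation while MP-LBFGS interleaves $\nu$ communication-free local gradient steps per global one (an illustrative model, not the paper's exact accounting):

```python
def syncs(total_grad_evals, nu):
    """Global synchronizations needed when nu local steps precede each global one."""
    return total_grad_evals // nu

standard = syncs(1000, 1)    # every gradient step pays an all-reduce
mp = syncs(1000, 10)         # a factor-nu reduction in all-reduces
```

The same budget of gradient work thus incurs roughly $\nu\times$ fewer synchronizations, which is where the distributed wall-clock savings come from.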
6. Empirical Performance and Benchmarks
Numerical experiments on 1D and 2D Poisson equations and the 2D time-dependent Burgers' equation used FBPINN configurations with four-layer ResNets (width 20), 2,000–20,000 Hammersley-sampled collocation points, and uniform domain decompositions of varying size.
Scaling strategies compared:
- Uniform scaling (UniS)
- Line-search scaling (LSS)
- Subspace minimization (SPM)
Key empirical findings:
- SPM was the most robust and stable as the number of subdomains $J$ increased.
- MP-LBFGS with SPM reduced the required outer epochs (global synchronizations) by up to an order of magnitude compared with standard LBFGS.
- Total per-device gradient work was comparable or lower; final validation error was often an order of magnitude smaller.
- Increasing the number of local LBFGS steps $\nu$ further reduced synchronizations, at the cost of increased local computation.
7. Practical Recommendations and Insights
The nonlinear Schwarz-type preconditioner enabled by the FBPINN decomposition allows extensive local computation, substantially reducing communication overhead in distributed settings. Subspace minimization is critical for stability: uniform or sequential line searches fail to scale as the number of subdomains increases.
Implementation recommendations:
- Tune $\nu$ (local steps), $m$ (memory size), and $J$ (number of subdomains) to balance local computation against synchronization.
- Retain separate local and global LBFGS memories to avoid mixing curvature information.
- Precompute and cache small Hessian blocks to expedite subspace Newton solves.
- MP-LBFGS can be integrated into existing FBPINN codebases by wrapping local training in an outer loop, collecting corrections, solving the $J$-dimensional subspace problem, then applying a standard global LBFGS update.
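The outer-loop wrapper described above can be sketched end to end on a toy quadratic. Everything here is illustrative: the restriction operators are index sets, the local solve is exact rather than a truncated LBFGS, and a simple backtracking scale on the summed correction stands in for the full Newton subspace solve:

```python
import numpy as np

def mp_lbfgs_step(theta, blocks, local_solve, loss, grad, memory, nu=5):
    """One outer iteration: local corrections -> combine -> secant-pair update."""
    g = grad(theta)                                   # global all-reduce
    # 1) parallel, communication-free local corrections
    S = []
    for j, idx in enumerate(blocks):
        s_j = np.zeros_like(theta)
        s_j[idx] = local_solve(j, theta[idx], nu) - theta[idx]
        S.append(s_j)
    # 2) combine corrections (backtracking stand-in for the J-dim subspace solve)
    d = np.sum(S, axis=0)
    t = 1.0
    while loss(theta + t * d) > loss(theta) and t > 1e-8:
        t *= 0.5
    theta_hat = theta + t * d
    # 3) form the global secant pair for the LBFGS memory
    memory.append((theta_hat - theta, grad(theta_hat) - g))
    return theta_hat

# toy usage: diagonal quadratic with exact local block solves
A = np.diag([1.0, 2.0, 3.0, 4.0])
b = np.ones(4)
loss = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
blocks = [np.array([0, 1]), np.array([2, 3])]
local_solve = lambda j, tj, nu: np.linalg.solve(
    A[np.ix_(blocks[j], blocks[j])], b[blocks[j]])
memory = []
theta1 = mp_lbfgs_step(np.zeros(4), blocks, local_solve, loss, grad, memory)
```

In a real FBPINN integration, step 3 would feed the stored pair into the global two-loop recursion and a Wolfe line search, as in the algorithm description above.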
In summary, MP-LBFGS leverages the additive decomposition of FBPINNs to perform parallel, local quasi-Newton updates, followed by a global preconditioned update whose direction is defined by a small, nonlinear subspace optimization. This mechanism accelerates convergence, lowers wall-clock time in distributed environments, and can improve final model accuracy relative to standard LBFGS (Salvadó-Benasco et al., 13 Jan 2026).