Trust-Region Memory Updates
- Trust-region memory updates are algorithmic strategies that combine trust-region methods with memory-efficient quasi-Newton techniques to achieve robust second-order optimization in high dimensions.
- They employ limited-memory representations like L-BFGS and spectral projections to solve shifted linear systems efficiently while maintaining positive definiteness and convergence guarantees.
- These updates are vital for applications in reinforcement learning, PINNs, and sparse recovery, offering enhanced numerical stability and computational efficiency over full Hessian methods.
Trust-region memory updates refer to algorithmic strategies that couple trust-region optimization methods with memory-efficient quasi-Newton or curvature-aggregation techniques, enabling large-scale and robust second-order optimization. These mechanisms are critical in large-dimensional unconstrained and constrained optimization, reinforcement learning, and scientific machine learning settings where forming or inverting the full Hessian is computationally prohibitive. Trust-region memory updates facilitate the efficient solution of trust-region subproblems by utilizing limited-memory representations and update rules, while maintaining stability, accuracy, and convergence guarantees.
1. Foundations of Trust-Region Memory Updates
Classical trust-region methods iteratively solve subproblems of the form

$$\min_{p \in \mathbb{R}^n} \; m_k(p) = g_k^\top p + \tfrac{1}{2} p^\top B_k p \quad \text{subject to} \quad \|p\| \le \Delta_k,$$

where $g_k = \nabla f(x_k)$, $B_k$ is a Hessian or an approximation thereof, and $\Delta_k$ is the trust-region radius. Due to the cost of assembling and manipulating $B_k$, especially for large dimension $n$, memory-efficient approximations such as limited-memory BFGS (L-BFGS) or low-rank-plus-shift formats are employed.
Trust-region memory updates augment these schemes by (i) deriving compact recursions for shifted linear systems (Erway et al., 2011), (ii) resetting or projecting curvature information through spectral/nearest-matrix projections (Berglund et al., 2024), or (iii) utilizing memory banks of previous policies or iterates as in policy optimization (Le et al., 2022).
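The interplay of these pieces can be sketched as a generic trust-region loop. The sketch below uses a dense Cauchy-point subproblem solver as a stand-in for the limited-memory machinery discussed in the following sections; all function names, tolerances, and radius-update constants are illustrative, not taken from any cited paper:

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Steepest-descent minimizer of the quadratic model within the trust region."""
    gBg = g @ B @ g
    gnorm = np.linalg.norm(g)
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm**3 / (delta * gBg))
    return -tau * (delta / gnorm) * g

def trust_region_minimize(f, grad, hess, x0, delta0=1.0, delta_max=10.0,
                          eta=0.1, tol=1e-8, max_iter=100):
    """Minimal trust-region loop (hypothetical sketch with an exact Hessian).

    In the limited-memory schemes of the text, hess(x) would be replaced by a
    quasi-Newton approximation whose memory is updated only on accepted steps.
    """
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = cauchy_point(g, B, delta)              # approximate subproblem solve
        pred = -(g @ p + 0.5 * p @ B @ p)          # predicted decrease
        ared = f(x) - f(x + p)                     # actual decrease
        rho = ared / pred if pred > 0 else -1.0
        if rho > eta:                              # accept step (memory update here)
            x = x + p
        if rho < 0.25:                             # standard radius adjustment
            delta *= 0.25
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2 * delta, delta_max)
    return x
```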
2. Limited-Memory Recursive Solvers for Trust-Region Subproblems
The shifted linear systems central to the trust-region subproblem arise from the Moré–Sorensen optimality conditions. When $B$ is an L-BFGS matrix, the canonical two-loop recursion efficiently computes $B^{-1}v$ for arbitrary $v$ in $O(mn)$ operations, but does not address shifted systems $(B + \sigma I)p = -g$, where $\sigma \ge 0$ is a parameter determined (e.g., by Newton's method) such that $\|p(\sigma)\| = \Delta$.
Erway & Marcia developed a diagonal-update recursion for $(B + \sigma I)^{-1}$ that views $B + \sigma I$ as a base matrix plus a sequence of rank-one updates, enabling a recursive application of the Sherman–Morrison–Woodbury formula (Erway et al., 2011). This approach maintains $O(mn)$ complexity (optimal for memory size $m \ll n$) and preserves positive-definiteness under mild conditions. The recursion only involves vector inner products and can be implemented efficiently in high-level or low-level languages using standard linear algebra primitives.
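A minimal illustration of the underlying idea (not the exact diagonal-update recursion of Erway & Marcia) is a single Sherman–Morrison–Woodbury solve against the compact representation $B = \gamma I + \Psi M \Psi^\top$; the function name and parameterization below are assumptions made for the sketch:

```python
import numpy as np

def solve_shifted(gamma, Psi, M, sigma, v):
    """Solve (B + sigma*I) p = v for B = gamma*I + Psi @ M @ Psi.T.

    Sketch of a Sherman-Morrison-Woodbury solve for the compact
    (low-rank-plus-shift) quasi-Newton representation: only an m-by-m
    system is factored, at O(n m^2) cost for memory size m << n, and
    the n-by-n matrix B is never formed.
    """
    tau = gamma + sigma                              # shifted diagonal
    PtV = Psi.T @ v                                  # m-vector
    inner = np.linalg.inv(M) + (Psi.T @ Psi) / tau   # small m-by-m system
    correction = Psi @ np.linalg.solve(inner, PtV) / tau**2
    return v / tau - correction
```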
3. Memory Update Strategies and Representations
Several trust-region memory update paradigms emerge in the literature:
- L-BFGS Memory with Trust-Region Control: L-BFGS stores the latest $m$ pairs $\{(s_i, y_i)\}$, and recursive strategies enable maintaining and applying approximate inverse Hessians during subproblem solves. Successful steps update the memory; rejected steps do not (Adhikari et al., 2016, Luo et al., 2020, Aravkin et al., 2021).
- Projection-Based Low-Rank Updates: Instead of canonical L-BFGS, the Hessian approximation can be constructed via a two-stage process: (i) perform a Broyden-class update; (ii) project the result onto the class of limited-memory (low-rank + shift) matrices via a nearest-matrix problem in a unitarily invariant norm (e.g., the Frobenius norm), or a Stein divergence (Berglund et al., 2024). Storage and update are performed in terms of eigenvalue decompositions, allowing for efficient solutions of the trust-region subproblem via the spectral representation.
- Memory in Reinforcement Learning Trust Regions: In memory-constrained policy optimization (MCPO), memory buffers of previous policies are used to define a “virtual trust region,” with the update objective incorporating KL-divergence to both the latest policy and a convex combination (via a learned attention mechanism) of stored prior policies (Le et al., 2022). The weighting between current and virtual trust regions is dynamically adjusted based on advantage-weighted returns, enhancing robustness when recent policies perform poorly.
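As a toy illustration of the MCPO-style virtual trust region, the sketch below assumes discrete action distributions and treats the attention logits as given (in MCPO they are learned); all names and the simple softmax weighting are assumptions for the sketch:

```python
import numpy as np

def virtual_trust_region_penalty(pi_new, pi_old, memory, scores, beta=0.5):
    """Hypothetical sketch of an MCPO-style objective term over discrete
    action distributions (1-D probability vectors summing to 1).

    `memory` is a list of past policies' action probabilities, `scores`
    are unnormalized attention logits; beta trades off KL to the latest
    policy against KL to the memory-derived "virtual" policy.
    """
    w = np.exp(scores - scores.max())
    w /= w.sum()                                             # attention weights
    pi_virtual = sum(wi * pi for wi, pi in zip(w, memory))   # convex combination
    kl = lambda p, q: np.sum(p * np.log(p / q))
    return beta * kl(pi_new, pi_old) + (1 - beta) * kl(pi_new, pi_virtual)
```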
4. Algorithmic Procedures and Recursion Details
Below is a tabulation of core update mechanisms:
| Method/Reference | Curvature Update | Memory Format | Trust-Region Step |
|---|---|---|---|
| Erway & Marcia (Erway et al., 2011) | Diagonal-update L-BFGS (SMW recursion) | Last $m$ pairs $(s_i, y_i)$ | Recursively solve $(B + \sigma I)p = -g$ |
| Projected Quasi-Newton (Berglund et al., 2024) | Broyden-class + spectral projection | Shift + eigenbasis + eigenvalues | Direct spectral solution to TR subproblem |
| MCPO (Le et al., 2022) | Policy memory buffer | Buffer of past policies, attention weights | KL divergence to memory-derived “virtual” policy |
The recursion from (Erway et al., 2011) computes $p = (B + \sigma I)^{-1}v$ as:
- Initialize with the shifted base matrix: $p_0 = (B_0 + \sigma I)^{-1}v$, where $B_0$ is the initial (diagonal) matrix.
- For each stored rank-one update $k = 1, \dots, 2m$, recursively update intermediate vectors and correct $p_k$ using inner products and low-rank terms.
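The outer Newton iteration on $\sigma$ that such recursions accelerate can be sketched as follows. This dense version uses a direct solve where the limited-memory recursion would be substituted, and assumes the unconstrained step already exceeds the radius; the function name and safeguards are illustrative:

```python
import numpy as np

def more_sorensen_sigma(B, g, delta, sigma0=0.0, tol=1e-10, max_iter=50):
    """Newton iteration for the shift sigma such that ||p(sigma)|| = delta,
    where p(sigma) = -(B + sigma*I)^{-1} g (More-Sorensen conditions).

    Dense sketch: in the limited-memory setting the two solves below would
    be replaced by the O(mn) shifted recursion, but the outer Newton update
    on sigma is identical. Assumes B + sigma0*I is positive definite.
    """
    n, sigma = len(g), sigma0
    for _ in range(max_iter):
        shifted = B + sigma * np.eye(n)
        L = np.linalg.cholesky(shifted)
        p = -np.linalg.solve(shifted, g)
        pnorm = np.linalg.norm(p)
        if abs(pnorm - delta) < tol * delta:
            break
        q = np.linalg.solve(L, p)                  # solve L q = p
        # Standard More-Sorensen Newton step on 1/delta - 1/||p(sigma)||.
        sigma += (pnorm / np.linalg.norm(q))**2 * (pnorm - delta) / delta
        sigma = max(sigma, 0.0)
    return sigma, p
```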
For projection-based updates (Berglund et al., 2024), the eigenstructure is updated after a Broyden step and projected back into the limited-memory constraint set, allowing for efficient subproblem solutions and memory resets compatible with curvature and trust-region constraints.
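A greedy sketch of such a projection step follows: eigendecompose, retain $m$ eigenpairs, and set the shift to the mean of the discarded spectrum. This is an illustrative simplification (keeping the eigenvalues deviating most from the overall mean), not the exact projection of Berglund et al.:

```python
import numpy as np

def project_low_rank_plus_shift(H, m):
    """Project a symmetric matrix H toward the class gamma*I + rank-m
    correction (greedy sketch of the limited-memory reset step).

    Averaging the discarded eigenvalues is the Frobenius-optimal shift
    for a fixed choice of retained eigenpairs.
    """
    vals, vecs = np.linalg.eigh(H)
    mean_all = vals.mean()
    # Greedily keep the m eigenvalues deviating most from the overall mean.
    keep = np.argsort(np.abs(vals - mean_all))[-m:]
    rest = np.setdiff1d(np.arange(len(vals)), keep)
    gamma = vals[rest].mean()            # shift = mean of discarded spectrum
    U, lam = vecs[:, keep], vals[keep]
    # Projected matrix: gamma*I + U @ diag(lam - gamma) @ U.T
    return gamma, U, lam
```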
5. Integration with Nonlinear Constraints and Nonsmooth Terms
Recent advances incorporate trust-region memory update strategies into constrained and nonsmooth optimization:
- trSQP-PINN applies a trust-region Sequential Quadratic Programming algorithm to PINN problems, where a quasi-Newton memory is used to approximate the Lagrangian Hessian, and a trust-region radius is adaptively updated via a soft-penalty merit function (Cheng et al., 2024). Quasi-Newton memory facilitates efficient curvature updates (using damped BFGS or SR1), ensuring regularization of search directions in ill-conditioned regions.
- Proximal Trust-Region Quasi-Newton methods for nonsmooth composite problems maintain limited-memory curvature for the smooth term and couple it with a proximal term (for the nonsmooth part) within the trust-region subproblem, updating memory only upon successful steps and employing Powell-type damping in nonconvex regimes (Aravkin et al., 2021).
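A sketch of the successful-step-only memory update with Powell-type damping follows; the threshold $\theta_{\min} = 0.2$ is the conventional choice, and the function names are illustrative:

```python
import numpy as np

def damped_pair(s, y, B_dot_s, theta_min=0.2):
    """Powell-damped curvature pair (sketch): blend y with B @ s so that the
    damped pair satisfies s^T y_bar >= theta_min * s^T B s, keeping the BFGS
    update positive definite in nonconvex regions."""
    sBs = s @ B_dot_s
    sy = s @ y
    if sy >= theta_min * sBs:
        theta = 1.0
    else:
        theta = (1.0 - theta_min) * sBs / (sBs - sy)
    return theta * y + (1.0 - theta) * B_dot_s

def update_memory(memory, s, y, B_dot_s, m=5, step_accepted=True):
    """Append a damped (s, y) pair only when the trust-region step was
    accepted; evict the oldest pair beyond memory size m."""
    if not step_accepted:
        return memory
    memory.append((s, damped_pair(s, y, B_dot_s)))
    return memory[-m:]
```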
6. Stability Guarantees and Practical Performance
Trust-region memory updates possess key stability and efficiency properties:
- Positive definiteness: The use of a trust-region shift ($\sigma \ge 0$) and the curvature conditions on inner products ($s_i^\top y_i > 0$) maintain positive-definiteness throughout the update sequence (Erway et al., 2011).
- Numerical stability: Recursive updates use only vector operations, minimizing roundoff error and permitting robust implementation for moderate memory size $m$.
- Empirical efficiency: Across large-scale regression, sparse recovery, and control problems, limited-memory trust-region schemes outperform classical line-search and full-memory BFGS, requiring fewer stored vectors and trust-region iterations (Adhikari et al., 2016, Luo et al., 2020, Berglund et al., 2024); in policy optimization and PINNs, memory-based trust-region updates yield superior sample efficiency, resilience against poor local minima, and tolerance to ill-conditioning (Le et al., 2022, Cheng et al., 2024).
7. Extensions and Domain-Specific Adaptations
Memory update strategies have been adapted to diverse domains:
- Sparse relaxation: Efficient removal of spurious solutions and better computational scaling in LASSO-type problems (Adhikari et al., 2016).
- Physics-informed neural networks: trSQP-PINN leverages hard-constrained trust-region updates and quasi-Newton memory to overcome ill-conditioning endemic to penalty-based losses, showing two to three orders of magnitude error improvements (Cheng et al., 2024).
- Deep RL: Memory-constrained policy optimization dynamically constructs a trust region from historical policy memory, enabling robust progress in sparse-reward and challenging environments (Le et al., 2022).
A plausible implication is the emergence of hybrid algorithms that combine projection-based curvature resetting, trust-region subproblem structure, and dynamic memory management to achieve scalability and stability across increasingly complex optimization landscapes.