
Stochastic Trust-Region Methods

Updated 28 January 2026
  • Stochastic trust-region methods are algorithms that employ probabilistic model accuracy and dynamically adjusted radii to optimize noisy, nonconvex objectives.
  • They construct local stochastic models using subsampled gradients and Hessians, ensuring efficient progress in large-scale machine learning and simulation tasks.
  • These methods integrate variance reduction and adaptive parameter tuning to achieve robust convergence guarantees and sample-efficient performance.

Stochastic trust-region methods are a family of algorithms for stochastic optimization that combine trust-region principles and model-based iteration with probabilistic control over the accuracy of subsampled gradients, Hessians, or function-value estimates. Such methods are designed to address large-scale, nonconvex, and possibly noisy objectives, as frequently encountered in machine learning, simulation optimization, and modern inverse problems. The defining feature is the use of dynamically sized trust regions constrained by random, inexact models of the objective, along with sampling-based or probabilistic mechanisms for model construction, step acceptance, and radius adaptation.

1. Foundational Concepts and Problem Settings

Stochastic trust-region methods have been developed to generalize classical trust-region strategies to settings in which only noisy, sampled, or approximate information about the objective and its derivatives is available. The canonical problem is an unconstrained minimization:

$$\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{N} \sum_{i=1}^N f_i(x)$$

where each $f_i$ is $C^1$ and $f$ is bounded below. This encompasses both empirical risk minimization (finite-sum) and stochastic expectation forms (e.g., $f(x) = \mathbb{E}_\xi[F(x,\xi)]$). The stochastic trust-region framework can also be extended to constrained (Fang et al., 2024, Fang et al., 2022), composite nonsmooth (Baraldi et al., 3 Oct 2025), multiobjective (Krejić et al., 10 Jan 2025), and minimax (Gao et al., 16 Sep 2025) settings.

Key to all variants is the iteration-wise construction of a local, stochastic model $m_k(p)$ of $f$ near $x_k$, the solution of a subproblem (often quadratic, subject to $\|p\| \leq \Delta_k$), and the adaptation of the trust-region radius $\Delta_k$ based on model quality and progress. Models are built using stochastic gradients, Hessians, or interpolation/regression fitted to noisy samples.

2. Algorithmic Structures and Representative Methods

A broad taxonomy of stochastic trust-region methods includes:

  • Model-based trust-region algorithms with probabilistic accuracy: Algorithms like STORM construct random models $m_k$ (quadratic in $p$) whose first- and second-order Taylor expansions are “fully linear” with fixed high probability. Acceptance of a trial step is based on the ratio

$$\rho_k = \frac{\text{actual reduction}}{\text{predicted reduction}}$$

using noisy function estimates (Chen et al., 2015).

  • Variance-reduced trust-region algorithms: TRSVR and TR-SVR combine stochastic trust-region updates with variance-reduced gradient estimators, typically in SVRG style, to accelerate convergence and improve sample complexity. The trust-region radius is adaptively proportional to the norm of the variance-reduced gradient (Fang et al., 21 Jan 2026, Zheng, 2024).
  • Second-order and inexact Newton trust-region methods: Methods like STRON and the stochastic second-order TRish employ subsampled or stochastic Hessians, often using conjugate gradient solvers for the subproblem, and in some cases incorporate curvature or negative curvature directions for faster convergence and escape from saddle points (Chauhan et al., 2018, Curtis et al., 2019).
  • Radius adaptation and probabilistic model control: Algorithms such as STRME determine the trust-region radius as $\delta_k = \mu_k \|g_k\|$, with $\mu_k$ and acceptance thresholds updated via stochastic criteria (Wang et al., 2019). In “trust-region-ish” (TRish) variants, piecewise rules based on the gradient norm control the “effective radius” without classical acceptance-rejection (Curtis et al., 2017, Bellavia et al., 2024).
  • Derivative-free and random-subspace trust-region methods: STARS confines model fitting and subproblem solving to a low-dimensional random subspace, substantially reducing per-iteration cost and making derivative-free stochastic trust-region optimization scalable (Dzahini et al., 2022).
  • Bi-fidelity and composite extensions: Methods such as ASTRO-BFDF leverage low-fidelity surrogates for variance reduction and reduced sample cost, while ProxSTORM extends the trust-region paradigm to composite functions with possibly nonsmooth convex regularizers (Ha et al., 2024, Baraldi et al., 3 Oct 2025).

These methods share a structure of iterative model construction, subproblem solution under a trust-region constraint, and adaptively controlled radii, with step acceptance (and possibly model sample size) governed by probabilistic reduction or improvement tests.
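This shared loop can be sketched in a minimal form. The code below is illustrative rather than taken from any single cited paper: it uses a first-order model, a Cauchy step, and a midpoint-gradient surrogate for the noisy actual-reduction estimate; the names (`stochastic_trust_region`, `grad_sample`) are hypothetical.

```python
import numpy as np

def stochastic_trust_region(grad_sample, x0, delta0=1.0, max_iter=200,
                            eta=0.1, gamma_inc=2.0, gamma_dec=0.5, seed=0):
    """Illustrative first-order stochastic trust-region loop (hypothetical).

    grad_sample(x, rng) returns a noisy gradient estimate. The model is the
    linear m_k(p) = g^T p, minimized over ||p|| <= delta by the Cauchy step.
    """
    rng = np.random.default_rng(seed)
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g = grad_sample(x, rng)
        gn = np.linalg.norm(g)
        if gn < 1e-12:
            break
        p = -delta * g / gn              # Cauchy step: to the boundary along -g
        pred = delta * gn                # predicted reduction of the linear model
        # Noisy actual-reduction estimate via a midpoint-gradient rule (exact
        # for quadratics), standing in for noisy function-value estimates.
        actual = -grad_sample(x + 0.5 * p, rng) @ p
        rho = actual / pred              # stochastic analogue of rho_k
        if rho >= eta:                   # successful step: accept and expand
            x, delta = x + p, gamma_inc * delta
        else:                            # unsuccessful: reject and shrink
            delta = gamma_dec * delta
    return x
```

For instance, with `grad_sample` returning `x` plus small Gaussian noise (the gradient of $\tfrac{1}{2}\|x\|^2$), the iterates contract toward the origin while the radius expands on successes and collapses on rejections.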

3. Mathematical Models, Variance Reduction, and Subproblem Formulation

Stochastic trust-region methods hinge on the formulation of the model $m_k(p)$ and the subproblem constraints. The general model is

$$m_k(p) = g_k^T p + \frac{1}{2} p^T H_k p \quad \text{subject to} \quad \|p\| \leq \Delta_k$$

where $g_k$ is a stochastic (mini-batch or variance-reduced) estimator of the gradient, and $H_k$ is a stochastic Hessian, a diagonal/BB quasi-Newton approximation, or simply the identity. In several advanced variants, $H_k$ may depend on $g_k$ (“gradient-dependent Hessian”) and may be indefinite (Fang et al., 21 Jan 2026).
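A concrete, cheap way to make progress on this subproblem is the classical Cauchy point, which minimizes the quadratic model along the negative-gradient direction within the radius. The sketch below is generic (the function name is illustrative, not from any cited method) and handles nonpositive curvature along $-g_k$:

```python
import numpy as np

def cauchy_point(g, H, delta):
    """Cauchy point for m(p) = g^T p + 0.5 p^T H p over ||p|| <= delta.

    Steps along -g: to the trust-region boundary when the curvature along g
    is nonpositive, otherwise to the 1-D minimizer clipped to the boundary.
    """
    gn = np.linalg.norm(g)
    if gn == 0.0:
        return np.zeros_like(g)
    gHg = g @ (H @ g)                      # model curvature along g
    if gHg <= 0.0:
        tau = 1.0                          # nonpositive curvature: boundary step
    else:
        tau = min(gn**3 / (delta * gHg), 1.0)
    return -tau * (delta / gn) * g
```

Trust-region convergence theory typically requires only that each accepted step attain a fixed fraction of the model decrease this point achieves, which is why inexact subproblem solvers suffice.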

Variance-reduced gradient estimators of SVRG type are central in high-accuracy, low-variance methods:

$$g_{k,s} = \frac{1}{b} \sum_{i \in I_{k,s}} \left[\nabla f_i(x_{k,s}) - \nabla f_i(x_{k,0})\right] + g_{k,0}$$

where $g_{k,0}$ is the full gradient at a reference point (Fang et al., 21 Jan 2026, Zheng, 2024).
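The estimator can be written down directly; in this sketch, `grad_i(i, x)` is an assumed callback returning the component gradient $\nabla f_i(x)$:

```python
import numpy as np

def svrg_gradient(grad_i, x, x_ref, g_ref, batch):
    """SVRG-type variance-reduced gradient estimate (illustrative sketch).

    grad_i(i, x) -> component gradient of f_i at x; g_ref is the full
    gradient at the reference point x_ref, recomputed once per outer loop.
    The estimator is unbiased, and its variance vanishes as x -> x_ref.
    """
    correction = sum(grad_i(i, x) - grad_i(i, x_ref) for i in batch)
    return correction / len(batch) + g_ref
```

The key property is that the mini-batch term only estimates the *difference* of gradients between the current and reference points, so its variance shrinks as the iterates approach the reference point.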

In derivative-free or bandit settings, $m_k(p)$ may be a regression/interpolation model fitted to noisy zeroth-order data, possibly in a random subspace (Dzahini et al., 2022). In composite or nonsmooth optimization, the model incorporates the proximal mapping of a convex term (Baraldi et al., 3 Oct 2025).

Radius rules are diverse, but prominent forms are:

  • $\Delta_k = \alpha \|g_k\|$ (gradient norm–proportional)
  • Multi-zone piecewise rules depending on $\|g_k\|$ (TRish: normalized when $\|g_k\|$ is moderate, scaled otherwise) (Curtis et al., 2017, Bellavia et al., 2024).
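A sketch of such a piecewise rule, in the spirit of TRish (with $\gamma_1 > \gamma_2 > 0$; the regime boundaries and constants here are illustrative, not the exact published rule):

```python
import numpy as np

def trish_step(g, alpha, gamma1, gamma2):
    """Piecewise TRish-style step (sketch); requires gamma1 > gamma2 > 0.

    Takes a normalized step of length alpha when ||g|| lies in the moderate
    band [1/gamma1, 1/gamma2], and a plainly scaled step otherwise, so no
    accept/reject test is needed.
    """
    gn = np.linalg.norm(g)
    if gn < 1.0 / gamma1:
        scale = gamma1 * alpha           # small gradient: amplified scaling
    elif gn <= 1.0 / gamma2:
        scale = alpha / gn               # moderate: step length exactly alpha
    else:
        scale = gamma2 * alpha           # large gradient: damped scaling
    return -scale * g
```

The middle branch mimics a trust-region constraint (step length capped at `alpha`), while the outer branches avoid the instability of dividing by a very small or very noisy gradient norm.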

4. Theoretical Convergence and Complexity Analysis

The central theoretical contributions establish global convergence in expectation or almost surely and (when possible) quantitative complexity or sample complexity rates. A representative convergence theorem (using SVRG gradients, as in TRSVR) is:

$$\mathbb{E} \left[ \frac{1}{KS} \sum_{k,s} \|\nabla f(x_{k,s})\|^2 \right] \leq \frac{C}{KS \mu_0 \mu_1 v_0}$$

with $\gamma = 2/3$, total sample complexity $O(N + N^{2/3}\epsilon^{-1})$ to reach $\epsilon$-stationarity (Fang et al., 21 Jan 2026). This matches the theoretical best rates of first-order variance-reduced methods.

Classical model-based schemes (e.g., STORM) prove almost-sure convergence to stationary points under high-probability “full linearity” of models and an adaptive trust-region process. Under these assumptions (including $\alpha\beta > \frac{1}{2}$ for model and estimate accuracy), $\sum_k \Delta_k^2 < \infty$ a.s. and $x_k$ converges to a point $x^*$ with $\|\nabla f(x^*)\| = 0$ (Chen et al., 2015).

Second-order methods for nonconvex minimization (STR) achieve an $\mathcal{O}(n^{1/2}/\epsilon^{1.5})$ stochastic Hessian oracle complexity for finding $(\epsilon, \sqrt{\epsilon})$-approximate local minima, outperforming existing cubic/subsampled cubic approaches (Shen et al., 2019).

In composite, constrained, or multiobjective extensions, analogous Lyapunov or potential-function constructions, combined with martingale and renewal-reward arguments, underpin global convergence results, possibly to KKT points or Pareto criticality (Baraldi et al., 3 Oct 2025, Krejić et al., 10 Jan 2025, Fang et al., 2024, Fang et al., 2022).

5. Practical Implementation and Parameter Selection

Efficient realization of stochastic trust-region methods requires careful choices of mini-batch size, inner-loop length, radius-control parameters, and subproblem solver tolerance:

  • Mini-batch size $b$ and inner-loop length $S$ are tuned to balance per-epoch cost and variance: for dense problems, $b \sim 100$–$400$ and $S \sim 100$–$400$; for high-dimensional sparse data, small $b$ and large $S$ are favored. The radius-control parameter $\alpha$ is grid-searched (Fang et al., 21 Jan 2026).
  • For STORM and probabilistic model-based methods, function-value estimation error is typically required to scale as $O(\delta_k^2)$ (using $O(\delta_k^{-4})$ samples per iteration); probabilistically accurate models with smaller batch sizes can be achieved under less restrictive conditions (Chen et al., 2015, Wang et al., 2019).
  • Subproblem solvers range from exact or inexact CG (typically 3–20 iterations suffice), to closed-form updates in first-order (TRish) or diagonal BB steplength methods (Chauhan et al., 2018, Bellavia et al., 2024).
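An inexact CG subproblem solver of the truncated (Steihaug) type needs only Hessian-vector products, so the operator can be a subsampled or stochastic Hessian that is never formed explicitly. This is a generic sketch of the technique, not the implementation of any cited method:

```python
import numpy as np

def steihaug_cg(g, hess_vec, delta, tol=1e-8, max_iter=50):
    """Steihaug truncated CG for min g^T p + 0.5 p^T H p s.t. ||p|| <= delta.

    hess_vec(v) returns H @ v, so H may be a subsampled/stochastic operator.
    Terminates on the trust-region boundary when negative curvature is met
    or the CG iterate leaves the region; otherwise returns an interior step.
    """
    p = np.zeros_like(g, dtype=float)
    r = np.asarray(g, dtype=float).copy()
    if np.linalg.norm(r) < tol:
        return p
    d = -r
    for _ in range(max_iter):
        Hd = hess_vec(d)
        dHd = d @ Hd
        if dHd <= 0.0:                       # negative curvature direction
            return p + _to_boundary(p, d, delta)
        alpha = (r @ r) / dHd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:  # iterate left the trust region
            return p + _to_boundary(p, d, delta)
        r_next = r + alpha * Hd
        if np.linalg.norm(r_next) < tol:     # interior solution found
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        p, r, d = p_next, r_next, -r_next + beta * d
    return p

def _to_boundary(p, d, delta):
    """Positive tau solving ||p + tau*d|| = delta (quadratic formula)."""
    a, b, c = d @ d, 2.0 * (p @ d), p @ p - delta**2
    return ((-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)) * d
```

On a positive-definite problem with a generous radius this reduces to plain CG on the Newton system; the boundary exits are what make it safe for indefinite stochastic Hessians.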
  • Adaptive sampling and bi-fidelity approaches further reduce cost by leveraging low-fidelity surrogates or streaming variance estimates (Ha et al., 2024).

6. Empirical Evaluation and Application Domains

Benchmark suites for stochastic trust-region methods span large-scale logistic regression, SVMs, deep neural network training, reinforcement learning policy optimization, multi-objective learning, and derivative-free black-box optimization.

7. Advances, Limitations, and Research Directions

Stochastic trust-region methodology has advanced significantly in recent years, with key innovations including:

  • Fully stochastic frameworks: Modern algorithms eliminate the need for exact function measurements, full gradients, or deterministic Hessians, enabling scalable deployment on large or simulation-generated datasets (Fang et al., 21 Jan 2026, Chen et al., 2015).
  • Variance-reduction integration: SVRG- and SAGA-type estimators, when combined with trust-region geometry, substantially lower sample complexity and improve asymptotic precision without sacrificing robustness.
  • Adaptive, probabilistic radius and model control: Designs such as $\Delta_k = \mu_k \|g_k\|$, random-subspace models, and renewal-reward-based complexity analysis grant both theoretical guarantees and practical scalability.
  • Extension to diverse settings: The trust-region principle now underpins state-of-the-art algorithms for nonsmooth composite optimization, multi-objective decision-making, derivative-free and high-dimensional settings, and minimax game-theory problems.

Nevertheless, limitations include:

  • The need for probabilistic or variance assumptions for global convergence proofs.
  • Sample sizes for model accuracy still scale as $O(\delta_k^{-4})$ in nonsmooth, derivative-free, or multiobjective problems.
  • Hyperparameter tuning (radius-scaling, acceptance thresholds, etc.) remains nontrivial, though default regimes are suggested.
  • Second-order (Hessian-based) stochastic approaches, though empirically strong, rely on accurate curvature estimation, which can be expensive in noisy or mini-batch regimes.

Current research is actively developing adaptive sample sizing, non-convex constraint handling, hybrid variance reduction/model-based techniques, and improved theoretical rates under weaker noise assumptions.


References

  • TRSVR: "TRSVR: An Adaptive Stochastic Trust-Region Method with Variance Reduction" (Fang et al., 21 Jan 2026)
  • STORM: "Stochastic Optimization Using a Trust-Region Method and Random Models" (Chen et al., 2015)
  • TRON/STRON: "Stochastic Trust Region Inexact Newton Method for Large-scale Machine Learning" (Chauhan et al., 2018)
  • TR-SVR: "Trust-Region Stochastic Optimization with Variance Reduction Technique" (Zheng, 2024)
  • ProxSTORM: "ProxSTORM -- A Stochastic Trust-Region Algorithm for Nonsmooth Optimization" (Baraldi et al., 3 Oct 2025)
  • SMOP: "SMOP: Stochastic trust region method for multi-objective problems" (Krejić et al., 10 Jan 2025)
  • STRME: "Stochastic Trust Region Methods with Trust Region Radius Depending on Probabilistic Models" (Wang et al., 2019)
  • TRish/BB: "Fully stochastic trust-region methods with Barzilai-Borwein steplengths" (Bellavia et al., 2024), "A Stochastic Trust Region Algorithm Based on Careful Step Normalization" (Curtis et al., 2017)
  • STARS: "Stochastic trust-region algorithm in random subspaces" (Dzahini et al., 2022)
  • ASTRO-BFDF: "Adaptive Sampling-Based Bi-Fidelity Stochastic Trust Region Method" (Ha et al., 2024)
  • TR-SQP-STORM: "Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models" (Fang et al., 2024)
  • SIRTR: "A stochastic first-order trust-region method with inexact restoration" (Bellavia et al., 2021)
  • Policy optimization: "A Stochastic Trust-Region Framework for Policy Optimization" (Zhao et al., 2019)
  • Stochastic minimax: "Trust Region Algorithm for Stochastic Minimax Problems with Decision-Dependent Distributions" (Gao et al., 16 Sep 2025)