
Stochastic Trust-Region Methods

Updated 28 January 2026
  • Stochastic trust-region methods are algorithms that employ probabilistic model accuracy and dynamically adjusted radii to optimize noisy, nonconvex objectives.
  • They construct local stochastic models using subsampled gradients and Hessians, ensuring efficient progress in large-scale machine learning and simulation tasks.
  • These methods integrate variance reduction and adaptive parameter tuning to achieve robust convergence guarantees and sample-efficient performance.

Stochastic trust-region methods are a family of algorithms for stochastic optimization that combine trust-region principles and model-based iteration with probabilistic control over the accuracy of subsampled gradients, Hessians, or function-value estimates. Such methods are designed to address large-scale, nonconvex, and possibly noisy objectives, as frequently encountered in machine learning, simulation optimization, and modern inverse problems. The defining feature is the use of dynamically sized trust regions constrained by random, inexact models of the objective, along with sampling-based or probabilistic mechanisms for model construction, step acceptance, and radius adaptation.

1. Foundational Concepts and Problem Settings

Stochastic trust-region methods have been developed to generalize classical trust-region strategies to settings in which only noisy, sampled, or approximate information about the objective and its derivatives is available. The canonical problem is an unconstrained minimization:

$$\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{N} \sum_{i=1}^N f_i(x)$$

where each $f_i$ is $C^1$ and $f$ is bounded below. This encompasses both empirical risk minimization (finite-sum) and stochastic expectation forms (e.g., $f(x) = \mathbb{E}_\xi[F(x,\xi)]$). The stochastic trust-region framework can also be extended to constrained (Fang et al., 2024, Fang et al., 2022), composite nonsmooth (Baraldi et al., 3 Oct 2025), multiobjective (Krejić et al., 10 Jan 2025), and minimax (Gao et al., 16 Sep 2025) settings.

Key to all variants is the iteration-wise construction of a local, stochastic model $m_k(p)$ of $f$ near $x_k$, the solution of a subproblem (often quadratic, subject to $\|p\| \leq \Delta_k$), and the adaptation of the trust-region radius $\Delta_k$ based on model quality and progress. Models are built using stochastic gradients, Hessians, or interpolation/regression fitted to noisy samples.

2. Algorithmic Structures and Representative Methods

A broad taxonomy of stochastic trust-region methods includes:

  • Model-based trust-region algorithms with probabilistic accuracy: Algorithms like STORM construct random models $m_k$ (quadratic in $p$) whose first- and second-order Taylor expansions are “fully linear” with fixed high probability. Acceptance of a trial step is based on the ratio

$$\rho_k = \frac{\text{actual reduction}}{\text{predicted reduction}}$$

using noisy function estimates (Chen et al., 2015).

  • Variance-reduced trust-region algorithms: TRSVR and TR-SVR combine stochastic trust-region updates with variance-reduced gradient estimators, typically in SVRG style, to accelerate convergence and improve sample complexity. The trust-region radius is adaptively proportional to the norm of the variance-reduced gradient (Fang et al., 21 Jan 2026, Zheng, 2024).
  • Second-order and inexact Newton trust-region methods: Methods like STRON and the stochastic second-order TRish employ subsampled or stochastic Hessians, often using conjugate gradient solvers for the subproblem, and in some cases incorporate curvature or negative curvature directions for faster convergence and escape from saddle points (Chauhan et al., 2018, Curtis et al., 2019).
  • Radius adaptation and probabilistic model control: Algorithms such as STRME determine the trust-region radius as $\delta_k = \mu_k \|g_k\|$, with $\mu_k$ and acceptance thresholds updated via stochastic criteria (Wang et al., 2019). In “trust-region-ish” (TRish) variants, piecewise rules based on the gradient norm control the “effective radius” without classical acceptance-rejection (Curtis et al., 2017, Bellavia et al., 2024).
  • Derivative-free and random-subspace trust-region methods: STARS confines model fitting and subproblem solving to a low-dimensional random subspace, substantially reducing per-iteration cost and making derivative-free stochastic trust-region optimization scalable (Dzahini et al., 2022).
  • Bi-fidelity and composite extensions: Methods such as ASTRO-BFDF leverage low-fidelity surrogates for variance reduction and reduced sample cost, while ProxSTORM extends the trust-region paradigm to composite functions with possibly nonsmooth convex regularizers (Ha et al., 2024, Baraldi et al., 3 Oct 2025).

These methods share a structure of iterative model construction, subproblem solution under a trust-region constraint, and adaptively controlled radii, with step acceptance (and possibly model sample size) governed by probabilistic reduction or improvement tests.
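This shared loop can be sketched in a minimal form. The code below is illustrative rather than taken from any single cited paper: it uses a first-order model, a Cauchy step, and a midpoint-gradient surrogate for the noisy actual-reduction estimate; the names (`stochastic_trust_region`, `grad_sample`) are hypothetical.

```python
import numpy as np

def stochastic_trust_region(grad_sample, x0, delta0=1.0, max_iter=200,
                            eta=0.1, gamma_inc=2.0, gamma_dec=0.5, seed=0):
    """Illustrative first-order stochastic trust-region loop (hypothetical).

    grad_sample(x, rng) returns a noisy gradient estimate. The model is the
    linear m_k(p) = g^T p, minimized over ||p|| <= delta by the Cauchy step.
    """
    rng = np.random.default_rng(seed)
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g = grad_sample(x, rng)
        gn = np.linalg.norm(g)
        if gn < 1e-12:
            break
        p = -delta * g / gn              # Cauchy step: to the boundary along -g
        pred = delta * gn                # predicted reduction of the linear model
        # Noisy actual-reduction estimate via a midpoint-gradient rule (exact
        # for quadratics), standing in for noisy function-value estimates.
        actual = -grad_sample(x + 0.5 * p, rng) @ p
        rho = actual / pred              # stochastic analogue of rho_k
        if rho >= eta:                   # successful step: accept and expand
            x, delta = x + p, gamma_inc * delta
        else:                            # unsuccessful: reject and shrink
            delta = gamma_dec * delta
    return x
```

For instance, with `grad_sample` returning `x` plus small Gaussian noise (the gradient of $\tfrac{1}{2}\|x\|^2$), the iterates contract toward the origin while the radius expands on successes and collapses on rejections.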

3. Mathematical Models, Variance Reduction, and Subproblem Formulation

Stochastic trust-region methods hinge on the formulation of the model $m_k(p)$ and the subproblem constraints. The general model is

$$m_k(p) = g_k^T p + \frac{1}{2} p^T H_k p \quad \text{subject to} \quad \|p\| \leq \Delta_k$$

where $g_k$ is a stochastic (mini-batch or variance-reduced) estimator of the gradient, and $H_k$ is a stochastic Hessian, a diagonal/BB quasi-Newton approximation, or simply the identity. In several advanced variants, $H_k$ may depend on $g_k$ (“gradient-dependent Hessian”) and may be indefinite (Fang et al., 21 Jan 2026).
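A concrete, cheap way to make progress on this subproblem is the classical Cauchy point, which minimizes the quadratic model along the negative-gradient direction within the radius. The sketch below is generic (the function name is illustrative, not from any cited method) and handles nonpositive curvature along $-g_k$:

```python
import numpy as np

def cauchy_point(g, H, delta):
    """Cauchy point for m(p) = g^T p + 0.5 p^T H p over ||p|| <= delta.

    Steps along -g: to the trust-region boundary when the curvature along g
    is nonpositive, otherwise to the 1-D minimizer clipped to the boundary.
    """
    gn = np.linalg.norm(g)
    if gn == 0.0:
        return np.zeros_like(g)
    gHg = g @ (H @ g)                      # model curvature along g
    if gHg <= 0.0:
        tau = 1.0                          # nonpositive curvature: boundary step
    else:
        tau = min(gn**3 / (delta * gHg), 1.0)
    return -tau * (delta / gn) * g
```

Trust-region convergence theory typically requires only that each accepted step attain a fixed fraction of the model decrease this point achieves, which is why inexact subproblem solvers suffice.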

Variance-reduced gradient estimators of SVRG type are central in high-accuracy, low-variance methods:

$$g_{k,s} = \frac{1}{b} \sum_{i \in I_{k,s}} \left[\nabla f_i(x_{k,s}) - \nabla f_i(x_{k,0})\right] + g_{k,0}$$

where $g_{k,0}$ is the full gradient at a reference point (Fang et al., 21 Jan 2026, Zheng, 2024).
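The estimator can be written down directly; in this sketch, `grad_i(i, x)` is an assumed callback returning the component gradient $\nabla f_i(x)$:

```python
import numpy as np

def svrg_gradient(grad_i, x, x_ref, g_ref, batch):
    """SVRG-type variance-reduced gradient estimate (illustrative sketch).

    grad_i(i, x) -> component gradient of f_i at x; g_ref is the full
    gradient at the reference point x_ref, recomputed once per outer loop.
    The estimator is unbiased, and its variance vanishes as x -> x_ref.
    """
    correction = sum(grad_i(i, x) - grad_i(i, x_ref) for i in batch)
    return correction / len(batch) + g_ref
```

The key property is that the mini-batch term only estimates the *difference* of gradients between the current and reference points, so its variance shrinks as the iterates approach the reference point.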

In derivative-free or bandit settings, $m_k(p)$ may be a regression/interpolation model fitted to noisy zeroth-order data, possibly in a random subspace (Dzahini et al., 2022). In composite or nonsmooth optimization, the model incorporates the proximal mapping of a convex term (Baraldi et al., 3 Oct 2025).

Radius rules are diverse, but prominent forms are:

  • $\Delta_k = \alpha \|g_k\|$ (gradient norm–proportional)
  • Multi-zone piecewise rules depending on $\|g_k\|$ (TRish: normalized when $\|g_k\|$ is moderate, scaled otherwise) (Curtis et al., 2017, Bellavia et al., 2024).
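A sketch of such a piecewise rule, in the spirit of TRish (with $\gamma_1 > \gamma_2 > 0$; the regime boundaries and constants here are illustrative, not the exact published rule):

```python
import numpy as np

def trish_step(g, alpha, gamma1, gamma2):
    """Piecewise TRish-style step (sketch); requires gamma1 > gamma2 > 0.

    Takes a normalized step of length alpha when ||g|| lies in the moderate
    band [1/gamma1, 1/gamma2], and a plainly scaled step otherwise, so no
    accept/reject test is needed.
    """
    gn = np.linalg.norm(g)
    if gn < 1.0 / gamma1:
        scale = gamma1 * alpha           # small gradient: amplified scaling
    elif gn <= 1.0 / gamma2:
        scale = alpha / gn               # moderate: step length exactly alpha
    else:
        scale = gamma2 * alpha           # large gradient: damped scaling
    return -scale * g
```

The middle branch mimics a trust-region constraint (step length capped at `alpha`), while the outer branches avoid the instability of dividing by a very small or very noisy gradient norm.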

4. Theoretical Convergence and Complexity Analysis

The central theoretical contributions establish global convergence in expectation or almost surely and (when possible) quantitative complexity or sample complexity rates. A representative convergence theorem (using SVRG gradients, as in TRSVR) is:

$$\mathbb{E} \left[ \frac{1}{KS} \sum_{k,s} \|\nabla f(x_{k,s})\|^2 \right] \leq \frac{C}{KS \mu_0 \mu_1 v_0}$$

with $\gamma = 2/3$, total sample complexity $O(N + N^{2/3}\epsilon^{-1})$ to reach $\epsilon$-stationarity (Fang et al., 21 Jan 2026). This matches the theoretical best rates of first-order variance-reduced methods.

Classical model-based schemes (e.g., STORM) prove almost-sure convergence to stationary points under high-probability “full linearity” of models and an adaptive trust-region process. Under these assumptions (including $\alpha\beta > \frac{1}{2}$ for model and estimate accuracy), $\sum_k \Delta_k^2 < \infty$ a.s. and $x_k$ converges to a point $x^*$ with $\|\nabla f(x^*)\| = 0$ (Chen et al., 2015).

Second-order methods for nonconvex minimization (STR) achieve an $\mathcal{O}(n^{1/2}/\epsilon^{1.5})$ stochastic Hessian oracle complexity for finding $(\epsilon, \sqrt{\epsilon})$-approximate local minima, outperforming existing cubic/subsampled cubic approaches (Shen et al., 2019).

In composite, constrained, or multiobjective extensions, analogous Lyapunov or potential-function constructions, combined with martingale and renewal-reward arguments, underpin global convergence results, possibly to KKT points or Pareto criticality (Baraldi et al., 3 Oct 2025, Krejić et al., 10 Jan 2025, Fang et al., 2024, Fang et al., 2022).

5. Practical Implementation and Parameter Selection

Efficient realization of stochastic trust-region methods requires careful choices of mini-batch size, inner-loop length, radius-control parameters, and subproblem solver tolerance:

  • Mini-batch size $b$ and inner-loop length $S$ are tuned to balance per-epoch cost and variance: for dense problems, $b \sim 100$–$400$ and $S \sim 100$–$400$; for high-dimensional sparse data, small $b$ and large $S$ are favored. The radius-control parameter $\alpha$ is grid-searched (Fang et al., 21 Jan 2026).
  • For STORM and probabilistic model-based methods, function-value estimation error is typically required to scale as $O(\delta_k^2)$ (using $O(\delta_k^{-4})$ samples per iteration); probabilistically accurate models with smaller batch sizes can be achieved under less restrictive conditions (Chen et al., 2015, Wang et al., 2019).
  • Subproblem solvers range from exact or inexact CG (typically 3–20 iterations suffice), to closed-form updates in first-order (TRish) or diagonal BB steplength methods (Chauhan et al., 2018, Bellavia et al., 2024).
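An inexact CG subproblem solver of the truncated (Steihaug) type needs only Hessian-vector products, so the operator can be a subsampled or stochastic Hessian that is never formed explicitly. This is a generic sketch of the technique, not the implementation of any cited method:

```python
import numpy as np

def steihaug_cg(g, hess_vec, delta, tol=1e-8, max_iter=50):
    """Steihaug truncated CG for min g^T p + 0.5 p^T H p s.t. ||p|| <= delta.

    hess_vec(v) returns H @ v, so H may be a subsampled/stochastic operator.
    Terminates on the trust-region boundary when negative curvature is met
    or the CG iterate leaves the region; otherwise returns an interior step.
    """
    p = np.zeros_like(g, dtype=float)
    r = np.asarray(g, dtype=float).copy()
    if np.linalg.norm(r) < tol:
        return p
    d = -r
    for _ in range(max_iter):
        Hd = hess_vec(d)
        dHd = d @ Hd
        if dHd <= 0.0:                       # negative curvature direction
            return p + _to_boundary(p, d, delta)
        alpha = (r @ r) / dHd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:  # iterate left the trust region
            return p + _to_boundary(p, d, delta)
        r_next = r + alpha * Hd
        if np.linalg.norm(r_next) < tol:     # interior solution found
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        p, r, d = p_next, r_next, -r_next + beta * d
    return p

def _to_boundary(p, d, delta):
    """Positive tau solving ||p + tau*d|| = delta (quadratic formula)."""
    a, b, c = d @ d, 2.0 * (p @ d), p @ p - delta**2
    return ((-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)) * d
```

On a positive-definite problem with a generous radius this reduces to plain CG on the Newton system; the boundary exits are what make it safe for indefinite stochastic Hessians.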
  • Adaptive sampling and bi-fidelity approaches further reduce cost by leveraging low-fidelity surrogates or streaming variance estimates (Ha et al., 2024).

6. Empirical Evaluation and Application Domains

Benchmark suites for stochastic trust-region methods span large-scale logistic regression, SVMs, deep neural network training, reinforcement learning policy optimization, multi-objective learning, and derivative-free black-box optimization.

7. Advances, Limitations, and Research Directions

Stochastic trust-region methodology has advanced significantly in recent years, with key innovations including:

  • Fully stochastic frameworks: Modern algorithms eliminate the need for exact function measurements, full gradients, or deterministic Hessians, enabling scalable deployment on large or simulation-generated datasets (Fang et al., 21 Jan 2026, Chen et al., 2015).
  • Variance-reduction integration: SVRG- and SAGA-type estimators, when combined with trust-region geometry, substantially lower sample complexity and improve asymptotic precision without sacrificing robustness.
  • Adaptive, probabilistic radius and model control: Designs such as $\Delta_k = \mu_k \|g_k\|$, random-subspace models, and renewal-reward-based complexity analysis grant both theoretical guarantees and practical scalability.
  • Extension to diverse settings: The trust-region principle now underpins state-of-the-art algorithms for nonsmooth composite optimization, multi-objective decision-making, derivative-free and high-dimensional settings, and minimax game-theory problems.

Nevertheless, limitations include:

  • The need for probabilistic or variance assumptions for global convergence proofs.
  • Sample sizes for model accuracy still scale as $O(\delta_k^{-4})$ in nonsmooth, derivative-free, or multiobjective problems.
  • Hyperparameter tuning (radius-scaling, acceptance thresholds, etc.) remains nontrivial, though default regimes are suggested.
  • Second-order (Hessian-based) stochastic approaches, though empirically strong, rely on accurate curvature estimation, which can be expensive in noisy or mini-batch regimes.

Current research is actively developing adaptive sample sizing, non-convex constraint handling, hybrid variance reduction/model-based techniques, and improved theoretical rates under weaker noise assumptions.


References

  • TRSVR: "TRSVR: An Adaptive Stochastic Trust-Region Method with Variance Reduction" (Fang et al., 21 Jan 2026)
  • STORM: "Stochastic Optimization Using a Trust-Region Method and Random Models" (Chen et al., 2015)
  • TRON/STRON: "Stochastic Trust Region Inexact Newton Method for Large-scale Machine Learning" (Chauhan et al., 2018)
  • TR-SVR: "Trust-Region Stochastic Optimization with Variance Reduction Technique" (Zheng, 2024)
  • ProxSTORM: "ProxSTORM -- A Stochastic Trust-Region Algorithm for Nonsmooth Optimization" (Baraldi et al., 3 Oct 2025)
  • SMOP: "SMOP: Stochastic trust region method for multi-objective problems" (Krejić et al., 10 Jan 2025)
  • STRME: "Stochastic Trust Region Methods with Trust Region Radius Depending on Probabilistic Models" (Wang et al., 2019)
  • TRish/BB: "Fully stochastic trust-region methods with Barzilai-Borwein steplengths" (Bellavia et al., 2024), "A Stochastic Trust Region Algorithm Based on Careful Step Normalization" (Curtis et al., 2017)
  • STARS: "Stochastic trust-region algorithm in random subspaces" (Dzahini et al., 2022)
  • ASTRO-BFDF: "Adaptive Sampling-Based Bi-Fidelity Stochastic Trust Region Method" (Ha et al., 2024)
  • TR-SQP-STORM: "Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models" (Fang et al., 2024)
  • SIRTR: "A stochastic first-order trust-region method with inexact restoration" (Bellavia et al., 2021)
  • Policy optimization: "A Stochastic Trust-Region Framework for Policy Optimization" (Zhao et al., 2019)
  • Stochastic minimax: "Trust Region Algorithm for Stochastic Minimax Problems with Decision-Dependent Distributions" (Gao et al., 16 Sep 2025)