
RNVI: Regularized Normalized Value Iteration

Updated 13 December 2025
  • RNVI is a fixed-point iterative algorithm that reformulates state-feedback stabilization as a nonlinear eigenvalue problem with regularization and normalization.
  • It leverages regularization to ensure strictly positive-definite Lyapunov certificates, offering robust performance even in degenerate or high-dimensional settings.
  • The algorithm converges linearly under contractive conditions and employs parameter continuation to efficiently handle non-contractive scenarios.

Regularized Normalized Value Iteration (RNVI) is a fixed-point iterative algorithm developed to compute both near-optimal state-feedback controllers and rigorous two-sided bounds for the optimal mean-square stabilizing rate $\rho^*$ in stochastic discrete-time linear systems with multiplicative noise. The method reformulates the classic stabilization objective as a nonlinear matrix eigenvalue problem augmented by regularization and normalization, ensuring the existence and strict positivity of Lyapunov-type solutions even in high-dimensional or degenerate settings. RNVI synthesizes feedback gains and certified performance guarantees while maintaining computational tractability for moderately large-scale systems (Jia et al., 6 Dec 2025).

1. Nonlinear Eigenvalue Formulation

The stochastic discrete-time linear system under consideration is governed by state-transition matrices $(A, \bar A)$ and control matrices $(B, \bar B)$, together with a noise process $\omega \sim \mathcal N(0, \sigma^2)$. Restricting to time-invariant state-feedback laws $u_k = -K x_k$, any quadratic Lyapunov-type certificate $h(x) = x^\top P x$, $P \in \mathbb S^n_{++}$, must satisfy the Bellman-type equality

$$e^{\lambda} h(x) = \inf_{v \in \mathbb R^m} \mathbb E\left[h\big((A + \bar A \omega)x + (B + \bar B \omega)v\big) \,\big|\, x\right].$$

Optimization over $v$ yields the so-called stochastic Riccati blocks

$$R(P) = B^\top P B + \sigma^2 \bar B^\top P \bar B, \qquad S(P) = A^\top P B + \sigma^2 \bar A^\top P \bar B,$$

and the nonlinear operator

$$\Phi(P) = A^\top P A + \sigma^2 \bar A^\top P \bar A - S(P)\, R(P)^{-1} S(P)^\top.$$

The stabilization-rate problem is then equivalent to the nonlinear matrix eigenvalue problem (EV):

$$\Phi(P) = \gamma P, \qquad \gamma = e^{\lambda}, \qquad P \succ 0.$$
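The Riccati blocks and the operator $\Phi$ translate directly into a few matrix products. A minimal NumPy sketch follows; the system matrices here are hypothetical random examples for illustration only, not taken from the paper. Since $x^\top \Phi(P)\, x$ equals an infimum of a nonnegative expectation, $\Phi(P)$ is symmetric positive semi-definite whenever $P \succeq 0$, which the example checks.

```python
import numpy as np

def riccati_blocks(P, A, Abar, B, Bbar, sigma2):
    # R(P) = B'PB + s^2 Bbar'P Bbar,  S(P) = A'PB + s^2 Abar'P Bbar
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    return R, S

def Phi(P, A, Abar, B, Bbar, sigma2):
    # Phi(P) = A'PA + s^2 Abar'P Abar - S(P) R(P)^{-1} S(P)'
    R, S = riccati_blocks(P, A, Abar, B, Bbar, sigma2)
    return (A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
            - S @ np.linalg.solve(R, S.T))

# Hypothetical small system (illustrative values, not from the paper)
rng = np.random.default_rng(0)
n, m, sigma2 = 4, 2, 0.1
A = rng.standard_normal((n, n)) / np.sqrt(n)
Abar = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Bbar = 0.1 * rng.standard_normal((n, m))

P0 = np.eye(n) / n          # unit-trace positive-definite certificate
F = Phi(P0, A, Abar, B, Bbar, sigma2)
```

Using `np.linalg.solve(R, S.T)` instead of forming `R⁻¹` explicitly keeps the $m \times m$ solve numerically stable.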

2. Regularization and Normalization Approach

The principal challenge is that (EV) may only admit positive semi-definite solutions on the boundary of $\mathbb S^n_+$, rendering them inadequate for certifying exponential stabilization or synthesizing controllers. To guarantee strictly positive-definite matrices and keep iterates interior to the cone, the operator $\Phi$ is perturbed and normalized using a regularization parameter $\tau \in (0,1)$:

$$Y_\tau(P) = (1 - \tau)\,\Phi(P) + \frac{\tau}{n} I, \qquad \widehat\Phi_\tau(P) = \frac{Y_\tau(P)}{\operatorname{Tr}(Y_\tau(P))}.$$

This regularization mixes the image of $\Phi(P)$ with the full-rank matrix $\frac{1}{n}I$ and ensures that the mapping $\widehat\Phi_\tau$ preserves the compact, convex trace slice $\{P \succeq 0 : \operatorname{Tr} P = 1\}$.
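The regularized, normalized map can be sketched as below (the random system is again a hypothetical illustration). Because $\Phi(P) \succeq 0$ and the $\frac{\tau}{n}I$ term is full rank, the image $Y_\tau(P)$ satisfies $Y_\tau(P) \succeq \frac{\tau}{n}I \succ 0$, so after trace normalization the output stays strictly inside the unit-trace slice:

```python
import numpy as np

def Phi(P, A, Abar, B, Bbar, sigma2):
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    return (A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
            - S @ np.linalg.solve(R, S.T))

def Phi_hat(P, tau, A, Abar, B, Bbar, sigma2):
    # Mix Phi(P) with (1/n) I, then renormalize onto the unit-trace slice
    n = P.shape[0]
    Y = (1.0 - tau) * Phi(P, A, Abar, B, Bbar, sigma2) + (tau / n) * np.eye(n)
    return Y / np.trace(Y)

# Hypothetical system and tau, for illustration
rng = np.random.default_rng(1)
n, m, sigma2, tau = 4, 2, 0.1, 0.2
A = rng.standard_normal((n, n)) / np.sqrt(n)
Abar = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Bbar = 0.1 * rng.standard_normal((n, m))

P1 = Phi_hat(np.eye(n) / n, tau, A, Abar, B, Bbar, sigma2)
```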

3. RNVI Algorithmic Procedure

RNVI is specified as a fixed-point map on positive-definite matrices with unit trace:

$$P_{k+1} = \widehat\Phi_\tau(P_k), \qquad P_0 \in \mathbb S^n_{++}, \quad \operatorname{Tr}(P_0) = 1.$$

The update steps are:

Input: (A, Ā, B, B̄), noise variance σ², τ ∈ (0,1), initial P₀ ≻ 0, Tr(P₀) = 1.
For k = 0,1,2,...:
   1. Rₖ = BᵀPₖB + σ²·B̄ᵀPₖB̄;  Sₖ = AᵀPₖB + σ²·ĀᵀPₖB̄.
   2. Φₖ = AᵀPₖA + σ²·ĀᵀPₖĀ − Sₖ·Rₖ⁻¹·Sₖᵀ.
   3. Yₖ = (1−τ)·Φₖ + (τ/n)·I.
   4. Trₖ = trace(Yₖ).
   5. Pₖ₊₁ = Yₖ / Trₖ.
   6. If ∥Pₖ₊₁ − Pₖ∥ ≤ ε, stop.
End For
Output: P^(τ) ≈ fixed point of Φ̂_τ.
Each update guarantees $P_k \in \mathbb S^n_{++}$ with $\operatorname{Tr} P_k = 1$.
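The loop above fits in a short NumPy routine. This is a minimal sketch under assumed illustrative parameters (random system, τ = 0.3, ε = 10⁻¹⁰); the stopping rule and warm-start interface mirror the pseudocode:

```python
import numpy as np

def rnvi(A, Abar, B, Bbar, sigma2, tau, P0=None, eps=1e-10, max_iter=5000):
    """Fixed-point iteration P_{k+1} = Phi_hat_tau(P_k) on the unit-trace slice."""
    n = A.shape[0]
    P = np.eye(n) / n if P0 is None else P0
    for _ in range(max_iter):
        # Steps 1-2: stochastic Riccati blocks and Phi_k
        R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
        S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
        Phi_k = (A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
                 - S @ np.linalg.solve(R, S.T))
        # Steps 3-5: regularize, then normalize trace to 1
        Y = (1.0 - tau) * Phi_k + (tau / n) * np.eye(n)
        P_next = Y / np.trace(Y)
        # Step 6: stopping criterion
        if np.linalg.norm(P_next - P) <= eps:
            return P_next
        P = P_next
    return P

# Hypothetical system, for illustration only
rng = np.random.default_rng(2)
n, m = 4, 2
A = rng.standard_normal((n, n)) / np.sqrt(n)
Abar = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Bbar = 0.1 * rng.standard_normal((n, m))

P_tau = rnvi(A, Abar, B, Bbar, sigma2=0.1, tau=0.3)
```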

4. Existence and Convergence Properties

  • Existence: Brouwer’s fixed-point theorem ensures that, for each $\tau \in (0,1)$, the map $\widehat\Phi_\tau$ admits a fixed point $P^{(\tau)} \succ 0$ with $\operatorname{Tr} P^{(\tau)} = 1$. All eigenvalues are uniformly bounded away from zero: $P^{(\tau)} \succeq \delta_\tau I$ for some $\delta_\tau > 0$.
  • Contraction and Rate: Whenever the local Lipschitz constant $\Lambda(\tau)$ satisfies $\Lambda(\tau) < 1$, the map is a contraction, guaranteeing global linear convergence:

$$\|P_k - P^{(\tau)}\| \leq \Lambda(\tau)^k\, \|P_0 - P^{(\tau)}\|.$$

  • Continuation in $\tau$: If $\Lambda(\tau) \geq 1$, convergence is attained by parameter continuation: starting from a “safe” large $\tau_0$, RNVI is run to convergence, $\tau$ is reduced (e.g., halved) at each step, and each fixed point is used to initialize the next run. Local contraction arguments ensure success of this procedure.

5. Feedback Controller Synthesis and Certified Bounds

Upon obtaining $P^{(\tau)}$, the corresponding state-feedback gain is

$$K^{(\tau)} = R(P^{(\tau)})^{-1} S(P^{(\tau)})^\top.$$

The algorithm additionally produces certified lower and upper bounds for the optimal stabilization performance. Define

$$L(P) := \lambda_{\min}\big(P^{-1/2}\Phi(P)P^{-1/2}\big), \qquad U(P) := \lambda_{\max}\big(P^{-1/2}\Phi(P)P^{-1/2}\big).$$

Let $J^*$ be the infimum of the rate objective, $J^* = 2\log\rho^*$. For any feedback policy $u$, $J(x_0, u) \geq \log L(P)$, and under $u^{(\tau)}: u_k = -K^{(\tau)} x_k$,

$$J(x_0, u^{(\tau)}) \leq \log\!\left(\frac{\gamma^{(\tau)}}{1 - \tau}\right), \qquad \gamma^{(\tau)} = \operatorname{Tr}\!\left((1-\tau)\,\Phi(P^{(\tau)}) + \frac{\tau}{n} I\right).$$

Thus, the true rate $\rho^*$ is bounded by

$$\sqrt{L(P^{(\tau)})} \leq \rho^* \leq \sqrt{\frac{\gamma^{(\tau)}}{1 - \tau}}.$$

A $\tau$-sweep (Algorithm 1) identifies the tightest certified bounds by collecting results across a regularization path.
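Extracting the gain and the certified interval from a (near-)fixed point is a few lines of linear algebra. A sketch under assumed illustrative parameters (random system, τ = 0.3); $P^{-1/2}$ is formed via an eigendecomposition, and the lower bound is clipped at zero before the square root in case $L(P) < 0$:

```python
import numpy as np

def Phi(P, A, Abar, B, Bbar, sigma2):
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    return (A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
            - S @ np.linalg.solve(R, S.T))

def rnvi(A, Abar, B, Bbar, sigma2, tau, eps=1e-12, max_iter=5000):
    n = A.shape[0]
    P = np.eye(n) / n
    for _ in range(max_iter):
        Y = (1 - tau) * Phi(P, A, Abar, B, Bbar, sigma2) + (tau / n) * np.eye(n)
        P_next = Y / np.trace(Y)
        if np.linalg.norm(P_next - P) <= eps:
            break
        P = P_next
    return P_next

def gain_and_bounds(P, tau, A, Abar, B, Bbar, sigma2):
    n = P.shape[0]
    # Feedback gain K = R(P)^{-1} S(P)'
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    K = np.linalg.solve(R, S.T)
    # L(P), U(P) via the symmetric similarity P^{-1/2} Phi(P) P^{-1/2}
    w, V = np.linalg.eigh(P)
    P_inv_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    F = Phi(P, A, Abar, B, Bbar, sigma2)
    eigs = np.linalg.eigvalsh(P_inv_half @ F @ P_inv_half)
    L, U = eigs[0], eigs[-1]
    # gamma^(tau) and the certified two-sided bound on rho*
    gamma = np.trace((1 - tau) * F + (tau / n) * np.eye(n))
    lower = np.sqrt(max(L, 0.0))
    upper = np.sqrt(gamma / (1 - tau))
    return K, lower, upper

rng = np.random.default_rng(4)
n, m, sigma2, tau = 4, 2, 0.1, 0.3
A = rng.standard_normal((n, n)) / np.sqrt(n)
Abar = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Bbar = 0.1 * rng.standard_normal((n, m))

P_tau = rnvi(A, Abar, B, Bbar, sigma2, tau)
K, lower, upper = gain_and_bounds(P_tau, tau, A, Abar, B, Bbar, sigma2)
```

At a fixed point $Y_\tau(P^{(\tau)}) = \gamma^{(\tau)} P^{(\tau)}$, so $L(P^{(\tau)}) \leq \gamma^{(\tau)}/(1-\tau)$ and the interval is well ordered.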

6. Computational Complexity and Practical Aspects

Each RNVI iterate comprises:

  • Two quadratic-form computations ($A^\top P A$, $\bar A^\top P \bar A$)
  • Solving one $m \times m$ linear system $R(P)X = S(P)^\top$
  • Computing a matrix trace and an $m$-dimensional matrix inversion

The per-iteration computational cost is $O(n^3 + m^3)$. Under contraction ($\Lambda(\tau) < 1$), achieving accuracy $\varepsilon$ requires $O(\ln(1/\varepsilon)/\ln(1/\Lambda(\tau)))$ steps. The continuation in $\tau$ typically involves $M = 10$–$50$ steps, with each local RNVI converging geometrically. The overall numerical complexity is $O(M n^3 \ln(1/\varepsilon))$, making RNVI feasible for state dimensions up to several hundred.

7. Significance and Scope

RNVI resolves the fundamental difficulty of well-posed control synthesis for stochastic systems with multiplicative noise, converting the intractable nonlinear spectral condition into an interior fixed-point iteration that is both computationally viable and rigorously certifiable. It guarantees strictly positive-definite certificates, supports systematic controller extraction, and yields verifiable performance bounds for the optimal mean-square stabilization rate. The regularization and normalization paradigm ensures robustness to ill-conditioning, and the algorithmic structure supports practical deployment in moderate-dimensional linear stochastic control scenarios (Jia et al., 6 Dec 2025).

References (1)
