
RNVI: Regularized Normalized Value Iteration

Updated 13 December 2025
  • RNVI is a fixed-point iterative algorithm that reformulates state-feedback stabilization as a nonlinear eigenvalue problem with regularization and normalization.
  • It leverages regularization to ensure strictly positive-definite Lyapunov certificates, offering robust performance even in degenerate or high-dimensional settings.
  • The algorithm converges linearly under contractive conditions and employs parameter continuation to efficiently handle non-contractive scenarios.

Regularized Normalized Value Iteration (RNVI) is a fixed-point iterative algorithm developed to compute both near-optimal state-feedback controllers and rigorous two-sided bounds for the optimal mean-square stabilizing rate $\rho^*$ in stochastic discrete-time linear systems with multiplicative noise. The method reformulates the classic stabilization objective as a nonlinear matrix eigenvalue problem augmented by regularization and normalization, ensuring the existence and strict positivity of Lyapunov-type solutions even in high-dimensional or degenerate settings. RNVI synthesizes feedback gains and certified performance guarantees while maintaining computational tractability for moderately large-scale systems (Jia et al., 6 Dec 2025).

1. Nonlinear Eigenvalue Formulation

The stochastic discrete-time linear system under consideration is governed by state-transition matrices $(A, \bar A)$ and control matrices $(B, \bar B)$, together with a noise process $\omega \sim \mathcal N(0, \sigma^2)$. Restricting to time-invariant state-feedback laws $u_k = -K x_k$, any quadratic Lyapunov-type certificate $h(x) = x^\top P x$, $P \in \mathbb S^n_{++}$, must satisfy the Bellman-type equality

$$e^{\lambda} h(x) = \inf_{v \in \mathbb R^m} \mathbb E\left[h\big((A + \bar A \omega)x + (B + \bar B \omega)v\big) \,\big|\, x\right].$$

Optimization over $v$ yields the so-called stochastic Riccati blocks

$$R(P) = B^\top P B + \sigma^2 \bar B^\top P \bar B, \qquad S(P) = A^\top P B + \sigma^2 \bar A^\top P \bar B,$$

and the nonlinear operator

$$\Phi(P) = A^\top P A + \sigma^2 \bar A^\top P \bar A - S(P)\, R(P)^{-1} S(P)^\top.$$

The stabilization-rate problem is then equivalent to the nonlinear matrix eigenvalue problem (EV):

$$\Phi(P) = \gamma P, \qquad \gamma = e^{\lambda}, \qquad P \succ 0.$$
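The Riccati blocks and the operator $\Phi$ translate directly into a few matrix products. A minimal NumPy sketch follows; the system matrices here are hypothetical random examples for illustration only, not taken from the paper. Since $x^\top \Phi(P)\, x$ equals an infimum of a nonnegative expectation, $\Phi(P)$ is symmetric positive semi-definite whenever $P \succeq 0$, which the example checks.

```python
import numpy as np

def riccati_blocks(P, A, Abar, B, Bbar, sigma2):
    # R(P) = B'PB + s^2 Bbar'P Bbar,  S(P) = A'PB + s^2 Abar'P Bbar
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    return R, S

def Phi(P, A, Abar, B, Bbar, sigma2):
    # Phi(P) = A'PA + s^2 Abar'P Abar - S(P) R(P)^{-1} S(P)'
    R, S = riccati_blocks(P, A, Abar, B, Bbar, sigma2)
    return (A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
            - S @ np.linalg.solve(R, S.T))

# Hypothetical small system (illustrative values, not from the paper)
rng = np.random.default_rng(0)
n, m, sigma2 = 4, 2, 0.1
A = rng.standard_normal((n, n)) / np.sqrt(n)
Abar = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Bbar = 0.1 * rng.standard_normal((n, m))

P0 = np.eye(n) / n          # unit-trace positive-definite certificate
F = Phi(P0, A, Abar, B, Bbar, sigma2)
```

Using `np.linalg.solve(R, S.T)` instead of forming `R⁻¹` explicitly keeps the $m \times m$ solve numerically stable.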

2. Regularization and Normalization Approach

The principal challenge is that (EV) may only admit positive semi-definite solutions on the boundary of $\mathbb S^n_+$, rendering them inadequate for certifying exponential stabilization or synthesizing controllers. To guarantee strictly positive-definite matrices and keep iterates interior to the cone, the operator $\Phi$ is perturbed and normalized using a regularization parameter $\tau \in (0,1)$:

$$Y_\tau(P) = (1 - \tau)\,\Phi(P) + \frac{\tau}{n} I, \qquad \widehat\Phi_\tau(P) = \frac{Y_\tau(P)}{\operatorname{Tr}(Y_\tau(P))}.$$

This regularization mixes the image of $\Phi(P)$ with the full-rank matrix $\frac{1}{n}I$ and ensures that the mapping $\widehat\Phi_\tau$ preserves the compact, convex trace slice $\{P \succeq 0 : \operatorname{Tr} P = 1\}$.
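The regularized, normalized map can be sketched as below (the random system is again a hypothetical illustration). Because $\Phi(P) \succeq 0$ and the $\frac{\tau}{n}I$ term is full rank, the image $Y_\tau(P)$ satisfies $Y_\tau(P) \succeq \frac{\tau}{n}I \succ 0$, so after trace normalization the output stays strictly inside the unit-trace slice:

```python
import numpy as np

def Phi(P, A, Abar, B, Bbar, sigma2):
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    return (A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
            - S @ np.linalg.solve(R, S.T))

def Phi_hat(P, tau, A, Abar, B, Bbar, sigma2):
    # Mix Phi(P) with (1/n) I, then renormalize onto the unit-trace slice
    n = P.shape[0]
    Y = (1.0 - tau) * Phi(P, A, Abar, B, Bbar, sigma2) + (tau / n) * np.eye(n)
    return Y / np.trace(Y)

# Hypothetical system and tau, for illustration
rng = np.random.default_rng(1)
n, m, sigma2, tau = 4, 2, 0.1, 0.2
A = rng.standard_normal((n, n)) / np.sqrt(n)
Abar = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Bbar = 0.1 * rng.standard_normal((n, m))

P1 = Phi_hat(np.eye(n) / n, tau, A, Abar, B, Bbar, sigma2)
```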

3. RNVI Algorithmic Procedure

RNVI is specified as a fixed-point map on positive-definite matrices with unit trace:

$$P_{k+1} = \widehat\Phi_\tau(P_k), \qquad P_0 \in \mathbb S^n_{++}, \quad \operatorname{Tr}(P_0) = 1.$$

The update steps are:

Input: (A, Ā, B, B̄), noise variance σ², τ ∈ (0,1), initial P₀ ≻ 0, Tr(P₀) = 1.
For k = 0,1,2,...:
   1. Rₖ = BᵀPₖB + σ²·B̄ᵀPₖB̄;  Sₖ = AᵀPₖB + σ²·ĀᵀPₖB̄.
   2. Φₖ = AᵀPₖA + σ²·ĀᵀPₖĀ − Sₖ·Rₖ⁻¹·Sₖᵀ.
   3. Yₖ = (1−τ)·Φₖ + (τ/n)·I.
   4. Trₖ = trace(Yₖ).
   5. Pₖ₊₁ = Yₖ / Trₖ.
   6. If ∥Pₖ₊₁ − Pₖ∥ ≤ ε, stop.
End For
Output: P^(τ) ≈ fixed point of Φ̂_τ.
Each update guarantees $P_k \in \mathbb S^n_{++}$ with $\operatorname{Tr} P_k = 1$.
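The loop above fits in a short NumPy routine. This is a minimal sketch under assumed illustrative parameters (random system, τ = 0.3, ε = 10⁻¹⁰); the stopping rule and warm-start interface mirror the pseudocode:

```python
import numpy as np

def rnvi(A, Abar, B, Bbar, sigma2, tau, P0=None, eps=1e-10, max_iter=5000):
    """Fixed-point iteration P_{k+1} = Phi_hat_tau(P_k) on the unit-trace slice."""
    n = A.shape[0]
    P = np.eye(n) / n if P0 is None else P0
    for _ in range(max_iter):
        # Steps 1-2: stochastic Riccati blocks and Phi_k
        R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
        S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
        Phi_k = (A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
                 - S @ np.linalg.solve(R, S.T))
        # Steps 3-5: regularize, then normalize trace to 1
        Y = (1.0 - tau) * Phi_k + (tau / n) * np.eye(n)
        P_next = Y / np.trace(Y)
        # Step 6: stopping criterion
        if np.linalg.norm(P_next - P) <= eps:
            return P_next
        P = P_next
    return P

# Hypothetical system, for illustration only
rng = np.random.default_rng(2)
n, m = 4, 2
A = rng.standard_normal((n, n)) / np.sqrt(n)
Abar = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Bbar = 0.1 * rng.standard_normal((n, m))

P_tau = rnvi(A, Abar, B, Bbar, sigma2=0.1, tau=0.3)
```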

4. Existence and Convergence Properties

  • Existence: Brouwer’s fixed-point theorem ensures that, for each $\tau \in (0,1)$, the map $\widehat\Phi_\tau$ admits a fixed point $P^{(\tau)} \succ 0$ with $\operatorname{Tr} P^{(\tau)} = 1$. All eigenvalues are uniformly bounded away from zero: $P^{(\tau)} \succeq \delta_\tau I$ for some $\delta_\tau > 0$.
  • Contraction and Rate: Whenever the local Lipschitz constant $\Lambda(\tau)$ satisfies $\Lambda(\tau) < 1$, the map is a contraction, guaranteeing global linear convergence:

$$\|P_k - P^{(\tau)}\| \leq \Lambda(\tau)^k\, \|P_0 - P^{(\tau)}\|.$$

  • Continuation in $\tau$: If $\Lambda(\tau) \geq 1$, convergence is attained by parameter continuation: starting from a “safe” large $\tau_0$, RNVI is run to convergence, $\tau$ is reduced (e.g., halved) at each step, and each fixed point is used to initialize the next run. Local contraction arguments ensure success of this procedure.

5. Feedback Controller Synthesis and Certified Bounds

Upon obtaining $P^{(\tau)}$, the corresponding state-feedback gain is

$$K^{(\tau)} = R(P^{(\tau)})^{-1} S(P^{(\tau)})^\top.$$

The algorithm additionally produces certified lower and upper bounds for the optimal stabilization performance. Define

$$L(P) := \lambda_{\min}\big(P^{-1/2}\Phi(P)P^{-1/2}\big), \qquad U(P) := \lambda_{\max}\big(P^{-1/2}\Phi(P)P^{-1/2}\big).$$

Let $J^*$ be the infimum of the rate objective, $J^* = 2\log\rho^*$. For any feedback policy $u$, $J(x_0, u) \geq \log L(P)$, and under $u^{(\tau)}: u_k = -K^{(\tau)} x_k$,

$$J(x_0, u^{(\tau)}) \leq \log\!\left(\frac{\gamma^{(\tau)}}{1 - \tau}\right), \qquad \gamma^{(\tau)} = \operatorname{Tr}\!\left((1-\tau)\,\Phi(P^{(\tau)}) + \frac{\tau}{n} I\right).$$

Thus, the true rate $\rho^*$ is bounded by

$$\sqrt{L(P^{(\tau)})} \leq \rho^* \leq \sqrt{\frac{\gamma^{(\tau)}}{1 - \tau}}.$$

A $\tau$-sweep (Algorithm 1) identifies the tightest certified bounds by collecting results across a regularization path.
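Extracting the gain and the certified interval from a (near-)fixed point is a few lines of linear algebra. A sketch under assumed illustrative parameters (random system, τ = 0.3); $P^{-1/2}$ is formed via an eigendecomposition, and the lower bound is clipped at zero before the square root in case $L(P) < 0$:

```python
import numpy as np

def Phi(P, A, Abar, B, Bbar, sigma2):
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    return (A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
            - S @ np.linalg.solve(R, S.T))

def rnvi(A, Abar, B, Bbar, sigma2, tau, eps=1e-12, max_iter=5000):
    n = A.shape[0]
    P = np.eye(n) / n
    for _ in range(max_iter):
        Y = (1 - tau) * Phi(P, A, Abar, B, Bbar, sigma2) + (tau / n) * np.eye(n)
        P_next = Y / np.trace(Y)
        if np.linalg.norm(P_next - P) <= eps:
            break
        P = P_next
    return P_next

def gain_and_bounds(P, tau, A, Abar, B, Bbar, sigma2):
    n = P.shape[0]
    # Feedback gain K = R(P)^{-1} S(P)'
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    K = np.linalg.solve(R, S.T)
    # L(P), U(P) via the symmetric similarity P^{-1/2} Phi(P) P^{-1/2}
    w, V = np.linalg.eigh(P)
    P_inv_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    F = Phi(P, A, Abar, B, Bbar, sigma2)
    eigs = np.linalg.eigvalsh(P_inv_half @ F @ P_inv_half)
    L, U = eigs[0], eigs[-1]
    # gamma^(tau) and the certified two-sided bound on rho*
    gamma = np.trace((1 - tau) * F + (tau / n) * np.eye(n))
    lower = np.sqrt(max(L, 0.0))
    upper = np.sqrt(gamma / (1 - tau))
    return K, lower, upper

rng = np.random.default_rng(4)
n, m, sigma2, tau = 4, 2, 0.1, 0.3
A = rng.standard_normal((n, n)) / np.sqrt(n)
Abar = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Bbar = 0.1 * rng.standard_normal((n, m))

P_tau = rnvi(A, Abar, B, Bbar, sigma2, tau)
K, lower, upper = gain_and_bounds(P_tau, tau, A, Abar, B, Bbar, sigma2)
```

At a fixed point $Y_\tau(P^{(\tau)}) = \gamma^{(\tau)} P^{(\tau)}$, so $L(P^{(\tau)}) \leq \gamma^{(\tau)}/(1-\tau)$ and the interval is well ordered.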

6. Computational Complexity and Practical Aspects

Each RNVI iterate comprises:

  • Two quadratic-form computations ($A^\top P A$, $\bar A^\top P \bar A$)
  • Solving one $m \times m$ linear system $R(P)X = S(P)^\top$
  • Computing a matrix trace and an $m$-dimensional matrix inversion

The per-iteration computational cost is $O(n^3 + m^3)$. Under contraction ($\Lambda(\tau) < 1$), achieving accuracy $\varepsilon$ requires $O(\ln(1/\varepsilon)/\ln(1/\Lambda(\tau)))$ steps. The continuation in $\tau$ typically involves $M = 10$–$50$ steps, with each local RNVI converging geometrically. The overall numerical complexity is $O(M n^3 \ln(1/\varepsilon))$, making RNVI feasible for state dimensions up to several hundred.

7. Significance and Scope

RNVI resolves the fundamental difficulty of well-posed control synthesis for stochastic systems with multiplicative noise, converting the intractable nonlinear spectral condition into an interior fixed-point iteration that is both computationally viable and rigorously certifiable. It guarantees strictly positive-definite certificates, supports systematic controller extraction, and yields verifiable performance bounds for the optimal mean-square stabilization rate. The regularization and normalization paradigm ensures robustness to ill-conditioning, and the algorithmic structure supports practical deployment in moderate-dimensional linear stochastic control scenarios (Jia et al., 6 Dec 2025).

References (1)
