RNVI: Regularized Normalized Value Iteration
- RNVI is a fixed-point iterative algorithm that reformulates state-feedback stabilization as a nonlinear eigenvalue problem with regularization and normalization.
- It leverages regularization to ensure strictly positive-definite Lyapunov certificates, offering robust performance even in degenerate or high-dimensional settings.
- The algorithm converges linearly under contractive conditions and employs parameter continuation to efficiently handle non-contractive scenarios.
Regularized Normalized Value Iteration (RNVI) is a fixed-point iterative algorithm developed to compute both near-optimal state-feedback controllers and rigorous two-sided bounds for the optimal mean-square stabilizing rate in stochastic discrete-time linear systems with multiplicative noise. The method reformulates the classic stabilization objective as a nonlinear matrix eigenvalue problem augmented by regularization and normalization, ensuring the existence and strict positivity of Lyapunov-type solutions even in high-dimensional or degenerate settings. RNVI synthesizes feedback gains and certified performance guarantees while maintaining computational tractability for moderately large-scale systems (Jia et al., 6 Dec 2025).
1. Nonlinear Eigenvalue Formulation
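The stochastic discrete-time linear system under consideration is governed by state transition matrices $A, \bar{A}$ and control matrices $B, \bar{B}$, alongside a multiplicative noise process with variance $\sigma^2$. Restricting to time-invariant state-feedback laws, any quadratic Lyapunov-type certificate defined by a symmetric matrix $P \succeq 0$ must satisfy a Bellman-type equality. Optimization over the feedback gain yields the so-called stochastic Riccati blocks:

$$R(P) = B^\top P B + \sigma^2 \bar{B}^\top P \bar{B}, \qquad S(P) = A^\top P B + \sigma^2 \bar{A}^\top P \bar{B},$$

and the nonlinear operator:

$$\Phi(P) = A^\top P A + \sigma^2 \bar{A}^\top P \bar{A} - S(P)\, R(P)^{-1} S(P)^\top.$$

The stabilization rate problem is then equivalent to the nonlinear matrix eigenvalue problem (EV): find a scalar $\lambda \ge 0$ and a nonzero $P \succeq 0$ such that

$$\Phi(P) = \lambda P.$$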
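As a minimal sketch of the operator described above, the following NumPy function evaluates the Riccati blocks and the nonlinear operator in the form given by steps 1 and 2 of the update in Section 3. The function name `riccati_operator`, the argument names, and the numerical data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def riccati_operator(P, A, Abar, B, Bbar, sigma2):
    """Return (R, S, Phi) for a symmetric P, following steps 1-2 of the update."""
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    Q = A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
    # Solve R X = S^T rather than forming R^{-1} explicitly.
    Phi = Q - S @ np.linalg.solve(R, S.T)
    # Symmetrize to remove rounding asymmetry.
    return R, S, 0.5 * (Phi + Phi.T)

# Hypothetical 2-state, 1-input example (not from the paper).
A = np.array([[1.1, 0.3], [0.0, 0.9]])
Abar = 0.2 * np.eye(2)
B = np.array([[0.0], [1.0]])
Bbar = np.array([[0.0], [0.1]])
P = np.eye(2) / 2  # positive definite, unit trace

R, S, Phi = riccati_operator(P, A, Abar, B, Bbar, sigma2=1.0)
print("Phi(P) =\n", Phi)
```

The operator is a Schur complement, so it equals the closed-loop quadratic form evaluated at the minimizing gain and is no larger (in the positive semi-definite order) than that form for any other gain.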
2. Regularization and Normalization Approach
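The principal challenge is that (EV) may only admit positive semi-definite solutions on the boundary of the positive semi-definite cone, rendering them inadequate for certifying exponential stabilization or synthesizing controllers. To guarantee strictly positive-definite matrices and keep iterates interior to the cone, the operator is perturbed and normalized using a regularization parameter $\tau \in (0,1)$:

$$\Phi_\tau(P) = (1-\tau)\,\Phi(P) + \frac{\tau}{n} I, \qquad \widehat{\Phi}_\tau(P) = \frac{\Phi_\tau(P)}{\operatorname{Tr}\Phi_\tau(P)},$$

where $n$ is the state dimension. This regularization mixes the image of $\Phi$ with the full-rank matrix $I/n$ and ensures that the mapping $\widehat{\Phi}_\tau$ preserves the compact, convex trace slice $\{P \succeq 0 : \operatorname{Tr}(P) = 1\}$.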
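The effect of the regularization can be illustrated with a small sketch. It starts from a singular, unit-trace matrix on the boundary of the cone, for which the unregularized operator again returns a singular matrix, and shows that one regularized, normalized step lands strictly inside the cone with unit trace. The function names `phi` and `phi_hat` and the test system are hypothetical; the formulas follow steps 1 to 5 of the update in Section 3.

```python
import numpy as np

def phi(P, A, Abar, B, Bbar, sigma2):
    """Unregularized operator from steps 1-2 of the update."""
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    Q = A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
    return Q - S @ np.linalg.solve(R, S.T)

def phi_hat(P, tau, A, Abar, B, Bbar, sigma2):
    """Mix with (tau/n) I, then rescale to unit trace (steps 3-5)."""
    n = P.shape[0]
    Y = (1.0 - tau) * phi(P, A, Abar, B, Bbar, sigma2) + (tau / n) * np.eye(n)
    Y = 0.5 * (Y + Y.T)  # remove rounding asymmetry
    return Y / np.trace(Y)

# Hypothetical system (not from the paper).
A = np.array([[1.1, 0.3], [0.0, 0.9]])
Abar = 0.2 * np.eye(2)
B = np.array([[0.0], [1.0]])
Bbar = np.array([[0.0], [0.1]])
system = (A, Abar, B, Bbar, 1.0)

P_boundary = np.diag([0.0, 1.0])          # singular, unit trace
Phi_boundary = phi(P_boundary, *system)   # singular as well for this data
P_next = phi_hat(P_boundary, 0.1, *system)

print("min eig of Phi(P):     ", np.linalg.eigvalsh(Phi_boundary).min())
print("min eig of Phi_hat(P): ", np.linalg.eigvalsh(P_next).min())
print("trace of Phi_hat(P):   ", np.trace(P_next))
```

Note that the linear solve requires the block `R` to be invertible, which this sketch assumes; the chosen boundary point satisfies it, and strictly positive-definite iterates satisfy it whenever the stacked input matrices have full column rank.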
3. RNVI Algorithmic Procedure
RNVI is specified as a fixed-point map on positive-definite matrices with unit trace: $P_{k+1} = \widehat{\Phi}_\tau(P_k)$. The update steps are:
```
Input: (A, Ā, B, B̄), noise variance σ², τ ∈ (0,1), tolerance ε > 0, initial P₀ ≻ 0, Tr(P₀) = 1.
For k = 0, 1, 2, ...:
  1. Rₖ = BᵀPₖB + σ²·B̄ᵀPₖB̄;  Sₖ = AᵀPₖB + σ²·ĀᵀPₖB̄.
  2. Φₖ = AᵀPₖA + σ²·ĀᵀPₖĀ - Sₖ·Rₖ⁻¹·Sₖᵀ.
  3. Yₖ = (1-τ)·Φₖ + (τ/n)·I.
  4. Trₖ = trace(Yₖ).
  5. Pₖ₊₁ = Yₖ / Trₖ.
  6. If ∥Pₖ₊₁ - Pₖ∥ ≤ ε, stop.
End For
Output: P^(τ) ≈ fixed point of Φ̂_τ.
```
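The update steps above can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names `rnvi_step` and `rnvi`, the Frobenius norm in the stopping test, the default tolerance, the iteration cap, and the example system are all choices made here.

```python
import numpy as np

def rnvi_step(P, tau, A, Abar, B, Bbar, sigma2):
    """One pass through steps 1-5 of the update."""
    n = P.shape[0]
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    Phi = A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar) - S @ np.linalg.solve(R, S.T)
    Y = (1.0 - tau) * Phi + (tau / n) * np.eye(n)
    Y = 0.5 * (Y + Y.T)  # remove rounding asymmetry
    return Y / np.trace(Y)

def rnvi(A, Abar, B, Bbar, sigma2, tau, P0=None, tol=1e-10, max_iter=10_000):
    """Iterate the map until successive iterates differ by at most tol.

    Returns (P, iterations_used, converged).
    """
    n = A.shape[0]
    P = np.eye(n) / n if P0 is None else P0 / np.trace(P0)
    for k in range(1, max_iter + 1):
        P_next = rnvi_step(P, tau, A, Abar, B, Bbar, sigma2)
        if np.linalg.norm(P_next - P) <= tol:  # step 6 (Frobenius norm assumed)
            return P_next, k, True
        P = P_next
    return P, max_iter, False

# Hypothetical 2-state, 1-input system (not from the paper).
A = np.array([[1.1, 0.3], [0.0, 0.9]])
Abar = 0.2 * np.eye(2)
B = np.array([[0.0], [1.0]])
Bbar = np.array([[0.0], [0.1]])

P_tau, iters, converged = rnvi(A, Abar, B, Bbar, sigma2=1.0, tau=0.5)
print("converged:", converged, "after", iters, "iterations")
print("P^(tau) =\n", P_tau)
```

Returning a convergence flag rather than raising lets a caller decide whether to fall back to a larger regularization parameter when the iteration cap is reached.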
4. Existence and Convergence Properties
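- Existence: Brouwer’s theorem ensures that, for each $\tau \in (0,1)$, the map $\widehat{\Phi}_\tau$ admits a fixed point $P^{(\tau)}$ with $P^{(\tau)} \succ 0$ and $\operatorname{Tr}(P^{(\tau)}) = 1$. All eigenvalues are uniformly bounded away from zero: $\lambda_{\min}(P^{(\tau)}) \ge c_\tau$ for some $c_\tau > 0$.
- Contraction and Rate: Whenever the local Lipschitz constant $L_\tau$ of $\widehat{\Phi}_\tau$ satisfies $L_\tau < 1$, the map is a contraction, guaranteeing global linear convergence: $\|P_k - P^{(\tau)}\| \le L_\tau^k \, \|P_0 - P^{(\tau)}\|$.
- Continuation in $\tau$: If $L_\tau \ge 1$, convergence is attained by parameter continuation: starting from a “safe” large $\tau$, RNVI is run to convergence, $\tau$ is reduced (e.g., halved) at each step, and each fixed point is used to initialize the next run. Local contraction arguments ensure success of this procedure.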
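The continuation strategy in the last item can be sketched as a warm-started loop over a halving schedule. The starting value 0.8, the number of halvings, the tolerance, the iteration cap, and all function names below are assumptions made for illustration; the paper's own schedule may differ.

```python
import numpy as np

def phi_hat(P, tau, A, Abar, B, Bbar, sigma2):
    """Regularized, normalized map (steps 1-5 of the update)."""
    n = P.shape[0]
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    Phi = A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar) - S @ np.linalg.solve(R, S.T)
    Y = (1.0 - tau) * Phi + (tau / n) * np.eye(n)
    Y = 0.5 * (Y + Y.T)
    return Y / np.trace(Y)

def solve_fixed_point(P, tau, system, tol=1e-10, max_iter=5_000):
    """Run the inner iteration from the warm start P; return (P, converged)."""
    for _ in range(max_iter):
        P_next = phi_hat(P, tau, *system)
        if np.linalg.norm(P_next - P) <= tol:
            return P_next, True
        P = P_next
    return P, False

def continuation(system, tau_start=0.8, n_steps=6):
    """Halve tau at each outer step, reusing the previous fixed point."""
    n = system[0].shape[0]
    P = np.eye(n) / n
    tau = tau_start
    path = []
    for _ in range(n_steps):
        P, converged = solve_fixed_point(P, tau, system)
        path.append((tau, P.copy(), converged))
        tau *= 0.5
    return path

# Hypothetical system (not from the paper).
A = np.array([[1.1, 0.3], [0.0, 0.9]])
Abar = 0.2 * np.eye(2)
B = np.array([[0.0], [1.0]])
Bbar = np.array([[0.0], [0.1]])
path = continuation((A, Abar, B, Bbar, 1.0))

for tau, P, converged in path:
    print(f"tau={tau:.4f} converged={converged} min_eig={np.linalg.eigvalsh(P).min():.3e}")
```

Each entry of `path` keeps its own convergence flag, so a failed inner run at a small regularization value can be detected and the schedule refined instead of silently propagating an unconverged warm start.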
5. Feedback Controller Synthesis and Certified Bounds
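Upon obtaining $P^{(\tau)}$, the corresponding state-feedback gain, under the convention $u_k = K x_k$, is $K^{(\tau)} = -R^{-1} S^\top$, where $R$ and $S$ are the blocks from step 1 of the update evaluated at $P^{(\tau)}$.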
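A minimal sketch of the gain extraction follows. It assumes the sign convention in which the control is the gain times the state, so that the minimizer of the closed-loop quadratic form is the negative of the inverse of `R` applied to the transpose of `S`; a source using the opposite convention would flip the sign. The function name `feedback_gain`, the stand-in matrix `P`, and the system data are hypothetical.

```python
import numpy as np

def feedback_gain(P, A, Abar, B, Bbar, sigma2):
    """Gain K minimizing the closed-loop quadratic form, assuming u = K x."""
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    # Stationarity condition R K + S^T = 0, solved without forming R^{-1}.
    return -np.linalg.solve(R, S.T)

# Hypothetical system and a stand-in for the RNVI fixed point.
A = np.array([[1.1, 0.3], [0.0, 0.9]])
Abar = 0.2 * np.eye(2)
B = np.array([[0.0], [1.0]])
Bbar = np.array([[0.0], [0.1]])
P = np.eye(2) / 2

K = feedback_gain(P, A, Abar, B, Bbar, sigma2=1.0)
A_cl = A + B @ K           # closed-loop nominal matrix
Abar_cl = Abar + Bbar @ K  # closed-loop noise matrix
print("K =", K)
```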
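The algorithm additionally produces certified lower and upper bounds for the optimal stabilization performance.
The optimal rate is the infimum of the rate objective over feedback policies. Any particular feedback policy attains a rate no smaller than this infimum, which gives a certified upper bound, and a complementary lower bound is obtained from the regularized solution. Thus, the true rate is bounded on both sides. A $\tau$-sweep (Algorithm 1) identifies the tightest certified bounds by collecting results across a regularization path.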
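The paper's exact bound definitions are not reproduced here. As a stand-in, the sketch below uses standard Collatz-Wielandt-style bounds, which may differ from the authors' certified quantities. It assumes dynamics of the form $x_{k+1} = (A + w_k \bar{A}) x_k + (B + w_k \bar{B}) u_k$ with zero-mean scalar noise of variance $\sigma^2$, and linear feedback $u_k = K x_k$. For any $P \succ 0$, if $\alpha P \preceq \Phi(P) \preceq \beta P$, then every gain has mean-square rate at least $\alpha$, and the gain extracted from $P$ has rate at most $\beta$. All names and data are hypothetical.

```python
import numpy as np

def gain_and_phi(P, A, Abar, B, Bbar, sigma2):
    """Return the minimizing gain K (u = K x assumed) and Phi(P)."""
    R = B.T @ P @ B + sigma2 * (Bbar.T @ P @ Bbar)
    S = A.T @ P @ B + sigma2 * (Abar.T @ P @ Bbar)
    Q = A.T @ P @ A + sigma2 * (Abar.T @ P @ Abar)
    K = -np.linalg.solve(R, S.T)
    Phi = Q + S @ K  # equals Q - S R^{-1} S^T
    return K, 0.5 * (Phi + Phi.T)

def eigen_bounds(P, Phi):
    """Extreme alpha, beta with alpha*P <= Phi <= beta*P (generalized eigenvalues)."""
    L = np.linalg.cholesky(P)
    M = np.linalg.solve(L, np.linalg.solve(L, Phi).T)  # L^{-1} Phi L^{-T}
    w = np.linalg.eigvalsh(0.5 * (M + M.T))
    return w[0], w[-1]

def closed_loop_rate(K, A, Abar, B, Bbar, sigma2):
    """Spectral radius of X -> AK X AK^T + sigma2 * AbarK X AbarK^T."""
    AK, AbarK = A + B @ K, Abar + Bbar @ K
    T = np.kron(AK, AK) + sigma2 * np.kron(AbarK, AbarK)
    return float(np.max(np.abs(np.linalg.eigvals(T))))

# Hypothetical system and a positive-definite, unit-trace stand-in for P^(tau).
A = np.array([[1.1, 0.3], [0.0, 0.9]])
Abar = 0.2 * np.eye(2)
B = np.array([[0.0], [1.0]])
Bbar = np.array([[0.0], [0.1]])
P = np.array([[0.6, 0.1], [0.1, 0.4]])

K, Phi = gain_and_phi(P, A, Abar, B, Bbar, 1.0)
alpha, beta = eigen_bounds(P, Phi)
rate = closed_loop_rate(K, A, Abar, B, Bbar, 1.0)
print(f"lower {alpha:.4f} <= achieved rate {rate:.4f} <= upper {beta:.4f}")
```

The gap between the two bounds closes when $P$ is an exact eigen-matrix of the operator, which is why a sweep over the regularization parameter that drives the iterate toward a solution of (EV) can be expected to tighten the bracket.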
6. Computational Complexity and Practical Aspects
Each RNVI iterate comprises:
- Two quadratic-form computations ($R_k$, $S_k$)
- Solving one linear system
- Computing a matrix trace and inverting $R_k$, whose dimension equals the number of control inputs
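The per-iteration computational cost is dominated by dense matrix products, i.e. $O(n^3)$ for state dimension $n$ when the number of inputs does not exceed $n$. Under contraction ($L_\tau < 1$, with $L_\tau$ the Lipschitz constant of the RNVI map), achieving accuracy $\varepsilon$ requires $O(\log(1/\varepsilon))$ steps. The continuation in $\tau$ typically involves up to $50$ steps, with each local RNVI converging geometrically. The overall numerical complexity is the per-iteration cost multiplied by the total number of inner iterations along the continuation path, making RNVI feasible for state dimensions up to several hundred.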
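As a small worked illustration of the logarithmic iteration count, the helper below computes how many steps a contraction with factor `q` needs to shrink an initial error `d0` below `eps`, using the geometric bound `q**k * d0 <= eps`. The function name and the sample values are illustrative assumptions.

```python
import math

def iterations_needed(q, eps, d0=1.0):
    """Iteration count k with q**k * d0 <= eps for a contraction factor 0 < q < 1.

    This is the smallest such k up to floating-point rounding in the logarithms.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("contraction factor must lie in (0, 1)")
    if eps >= d0:
        return 0
    return math.ceil(math.log(eps / d0) / math.log(q))

for q in (0.5, 0.9, 0.99):
    print(f"q={q}: {iterations_needed(q, 1e-6)} iterations for eps=1e-6")
```

The count grows like the reciprocal of `log(1/q)`, so a contraction factor close to one, as can happen for small regularization parameters, makes warm starts from the continuation path correspondingly more valuable.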
7. Significance and Scope
RNVI resolves the fundamental difficulty of well-posed control synthesis for stochastic systems with multiplicative noise, converting the intractable nonlinear spectral condition into an interior fixed-point iteration that is both computationally viable and rigorously certifiable. It guarantees strictly positive-definite certificates, supports systematic controller extraction, and yields verifiable performance bounds for the optimal mean-square stabilization rate. The regularization and normalization paradigm ensures robustness to ill-conditioning, and the algorithmic structure supports practical deployment in moderate-dimensional linear stochastic control scenarios (Jia et al., 6 Dec 2025).