
Recursive Least Squares with Forgetting Factor

Updated 14 February 2026
  • RLS with Forgetting Factor is an adaptive algorithm that recursively estimates parameters using exponential weighting of past data to handle time-varying systems.
  • It balances rapid tracking with reduced steady-state estimation variance by adjusting the forgetting factor, whether through static or adaptive schemes.
  • Advanced variants incorporate variable-rate, partial, and directional forgetting to optimize performance in nonstationary and sparse environments.

Recursive Least Squares (RLS) with Forgetting Factor is a foundational family of algorithms for online identification, parameter estimation, and adaptive filtering in nonstationary environments. By progressively discounting the impact of old data, these algorithms adapt to time-varying systems while preserving the computational advantages of exact recursive least squares. The forgetting factor—whether scalar, vector, direction-dependent, or dynamically optimized—controls the balance between rapid tracking and estimation variance. This article reviews the unified mathematical structures, algorithmic variants, adaptation strategies, stability guarantees, and key application domains of RLS with forgetting, referencing recent arXiv research.

1. Exponentially Weighted Least Squares and Classic RLS Recursions

The prototype RLS with forgetting factor minimizes the exponentially weighted cost

$$J_k(\theta) = \sum_{i=0}^{k} \lambda^{k-i}\,(y_i - \phi_i^\top\theta)^2,$$

where $0 < \lambda \le 1$ is the forgetting factor. The scalar $\lambda$ controls the exponential rate at which past data are down-weighted. This yields the hallmark matrix-vector recursions:

$$\begin{aligned}
K_k &= \frac{P_{k-1}\phi_k}{\lambda + \phi_k^\top P_{k-1}\phi_k},\\
\theta_k &= \theta_{k-1} + K_k\,[y_k - \phi_k^\top\theta_{k-1}],\\
P_k &= \frac{1}{\lambda}\left[P_{k-1} - K_k\phi_k^\top P_{k-1}\right].
\end{aligned}$$

Here $P_k$ is the error covariance ("inverse information") matrix, and the gain $K_k$ scales the update based on current data novelty (Lai et al., 2024, Lai et al., 2023).

For vector-output or multi-output systems, the updates carry over via matrix analogues, with the forgetting factor entering the weighted information matrix and all major terms (Brüggemann et al., 2020).
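The recursions above translate directly into a few lines of NumPy. The following is a minimal sketch (function and variable names are illustrative, not taken from any cited implementation):

```python
import numpy as np

def rls_forgetting_step(theta, P, phi, y, lam=0.98):
    """One update of RLS with exponential forgetting factor lam."""
    K = P @ phi / (lam + phi @ P @ phi)    # gain K_k
    theta = theta + K * (y - phi @ theta)  # parameter update
    P = (P - np.outer(K, phi) @ P) / lam   # covariance update
    return theta, P

# Identify a static 2-parameter model y = phi^T theta_true (noise-free).
rng = np.random.default_rng(0)
theta_true = np.array([1.5, -0.7])
theta, P = np.zeros(2), 1e3 * np.eye(2)
for _ in range(200):
    phi = rng.standard_normal(2)
    theta, P = rls_forgetting_step(theta, P, phi, phi @ theta_true)
print(np.round(theta, 3))  # converges to [1.5, -0.7]
```

With noise-free data and persistently exciting regressors, the recursion recovers the exact weighted least-squares solution at each step.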

2. Role and Effects of the Forgetting Factor

A small $\lambda$ aggressively discounts older data, providing fast tracking but amplifying steady-state estimation error and sensitivity to noise. A large $\lambda$ (approaching one) yields slower adaptation but reduces variance and improves noise immunity (Lai et al., 2024; Boya et al., 2014; Yuan et al., 2019). Dynamic trade-offs are captured quantitatively: in online learning, a static $\lambda$ yields a regret bound interpolating between $O(\log T)$ (static environment) and $O(\sqrt{TV})$ (bounded path length $V$ of drift), with explicit control via $\lambda$ selection (Yuan et al., 2019).
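The tracking-versus-variance trade-off is visible in a small noise-free experiment (an illustrative sketch, not from the cited works): after an abrupt parameter jump, λ < 1 recovers quickly, while λ = 1 averages over the entire history and recovers slowly.

```python
import numpy as np

def rls_run(lam, thetas_true, seed=1):
    """Run RLS with forgetting factor lam over a sequence of true parameters."""
    rng = np.random.default_rng(seed)
    theta, P = np.zeros(2), 1e3 * np.eye(2)
    errs = []
    for theta_true in thetas_true:
        phi = rng.standard_normal(2)
        y = phi @ theta_true                   # noise-free measurement
        K = P @ phi / (lam + phi @ P @ phi)
        theta = theta + K * (y - phi @ theta)
        P = (P - np.outer(K, phi) @ P) / lam
        errs.append(np.linalg.norm(theta - theta_true))
    return np.array(errs)

# True parameter vector jumps halfway through the data stream.
thetas = [np.array([1.0, 2.0])] * 150 + [np.array([-2.0, 0.5])] * 150
err_fast = rls_run(0.90, thetas)  # strong forgetting: quick recovery
err_slow = rls_run(1.00, thetas)  # no forgetting: sluggish after the jump
print(err_fast[-1] < err_slow[-1])  # True: lambda < 1 tracks the change
```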

3. Generalizations: Variable-Rate, Directional, and Multi-Scheme Forgetting

Beyond scalar, constant forgetting, recent research details several structured extensions:

  • Variable-rate forgetting (VRF): The forgetting factor sequence $\lambda_k$ (or $\beta_k = 1/\lambda_k$) adapts online, often as a function of residuals, yielding rapid responsiveness during abrupt parameter changes and noise rejection in stationary periods (Bruce et al., 2020).
  • Multiple/partial forgetting: A vector $\lambda = (\lambda_1,\dots,\lambda_p)$ allows each parameter or direction to evolve with its own forgetting profile. Generalized mapping schemes (e.g., Tuned/Correlated, Cubic-Spline-inspired) are designed for problems with heterogeneous rates of change in system subcomponents (Fraccaroli et al., 2015). This approach preserves positive definiteness of the information matrix and is computationally comparable to standard RLS.
  • Directional or subspace forgetting (SIFt-RLS, VDF-RLS): Forgetting is applied only in directions excited by new data (i.e., “information subspaces”). This prevents information loss and parameter drift in unexcited directions, bounding the covariance without persistent excitation and ensuring robust operation in low-rank or poorly excited environments (Lai et al., 2024, Park et al., 2024).
  • Segmented forgetting profiles: Designing a composite forgetting function with piecewise segments—fast (recent data), constant (plateau), and slow (distant past)—supports control over tracking speed, condition number, and estimator robustness. This allows encoding prior knowledge of system time scales and periodicities (Stotsky, 19 Nov 2025).
| Scheme | Adaptation Target | Typical Use Case |
|---|---|---|
| Scalar $\lambda$ | Uniform weight | Generic time-varying systems |
| Time-varying $\lambda_k$ | Residual/adaptive | Abrupt or context-dependent changes |
| Vector multi-forgetting | Parameter/direction-selective | Heterogeneous subsystem variation |
| Directional forgetting | Information subspace | Sparse excitation, low-rank data, stability |
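The motivation for directional forgetting can be reproduced in a few lines: with uniform λ < 1 and a persistently unexcited direction, the covariance in that direction grows without bound ("estimator windup"). A minimal sketch of this effect, with the setup chosen purely for illustration:

```python
import numpy as np

lam = 0.95
P = np.eye(2)
# Regressors excite only the first coordinate: phi = [1, 0] every step.
for _ in range(100):
    phi = np.array([1.0, 0.0])
    K = P @ phi / (lam + phi @ P @ phi)
    P = (P - np.outer(K, phi) @ P) / lam

print(P[0, 0] < 1.0)    # True: excited direction stays bounded
print(P[1, 1] > 100.0)  # True: unexcited direction grows like (1/lam)^k
```

Directional/subspace variants avoid this by discounting information only along excited directions, leaving the unexcited covariance entry fixed.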

4. Robustness, Stability, and Optimality Results

The underlying quadratic cost structure of RLS with forgetting allows direct analysis via Lyapunov techniques. Explicit global exponential stability and robustness (global uniform ultimate boundedness) are established for several variants, including with noise, drift, or errors-in-variables—assuming appropriate excitation and boundedness conditions (Lai et al., 2023).

In the generalized forgetting RLS (GF-RLS) framework, all major RLS extensions (exponential, variable-rate, resetting, directional/partial forgetting) are unified as specific choices of a per-step forgetting matrix $F_k$:

$$P_{k+1}^{-1} = P_k^{-1} - F_k + \phi_k\phi_k^\top.$$

Selecting $F_k = (1-\lambda)P_k^{-1}$ recovers exponential forgetting; $F_k = (1-\lambda_k)P_k^{-1}$ yields adaptive/variable-rate forms; directional and partial forgetting correspond to more general $F_k$ structures (Lai et al., 2023).
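The exponential-forgetting special case can be checked numerically: substituting F_k = (1 − λ)P_k⁻¹ into the GF-RLS information update reproduces the standard exponentially weighted update P⁻¹ ← λP⁻¹ + φφᵀ. A small sketch of that check:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.97
P = np.eye(3)
phi = rng.standard_normal(3)

# Standard exponential forgetting, information (inverse-covariance) form:
P_exp_inv = lam * np.linalg.inv(P) + np.outer(phi, phi)

# GF-RLS update with the forgetting matrix F_k = (1 - lam) * P^{-1}:
F = (1 - lam) * np.linalg.inv(P)
P_gf_inv = np.linalg.inv(P) - F + np.outer(phi, phi)

print(np.allclose(P_exp_inv, P_gf_inv))  # True: identical updates
```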

In the context of impulsive or non-Gaussian noise, robust RLS generalizations employ M-estimators and sparsity regularizers. The jointly optimized S-RRLS (JO-S-RRLS) algorithm extends this by adaptively optimizing both $\lambda_k$ and the sparsity weighting $\rho_k$ at each step, achieving superior tracking and misadjustment trade-offs in sparse estimation under impulsive perturbations (Yu et al., 2022).

5. Practical Algorithms and Adaptive Mechanisms

Several adaptive mechanisms for online $\lambda$ tuning have emerged:

  • Error-correlation driven VFF: The forgetting factor is set via a running average of error energy, with explicit bounding to avoid instability (CTVFF). This technique outperforms both fixed-$\lambda$ and gradient-based VFF schemes in rapid adaptation and steady-state noise performance, while incurring minimal additional computation (Cai et al., 2013).
  • Criterion-aware VFF for blind adaptive filtering: Error metric violation (e.g., constant modulus) directly modulates $\lambda$, yielding optimal steady-state MSE and faster nonstationarity tracking (Boya et al., 2014).
  • Augmented regressor and two-layered forgetting: Outer-loop (exponential) and inner-loop (directional) forgetting are combined to guarantee parameter convergence even under finite excitation, with global exponential stability established via Lyapunov arguments (Tsuruhara et al., 28 Apr 2025).
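A minimal sketch of a residual-driven variable forgetting factor in this spirit; the error-energy smoothing, the linear mapping from error energy to λ_k, and all constants below are illustrative assumptions, not the published CTVFF rule:

```python
import numpy as np

def vff_rls(phis, ys, lam_min=0.90, lam_max=0.999, alpha=0.95, c=1.0):
    """RLS where lam_k shrinks when smoothed residual energy is high.
    The error-energy-to-lambda mapping is an illustrative choice."""
    n = phis.shape[1]
    theta, P, e_bar = np.zeros(n), 1e3 * np.eye(n), 0.0
    lams = []
    for phi, y in zip(phis, ys):
        e = y - phi @ theta                          # a priori residual
        e_bar = alpha * e_bar + (1 - alpha) * e**2   # running error energy
        lam = np.clip(lam_max - c * e_bar, lam_min, lam_max)  # bounded lam_k
        lams.append(lam)
        K = P @ phi / (lam + phi @ P @ phi)
        theta = theta + K * e
        P = (P - np.outer(K, phi) @ P) / lam
    return theta, np.array(lams)

rng = np.random.default_rng(2)
phis = rng.standard_normal((300, 2))
theta_true = np.array([0.5, -1.0])
theta, lams = vff_rls(phis, phis @ theta_true)
print(np.round(theta, 3))  # converges; lam_k returns to lam_max at steady state
```

During transients the large residuals pull λ_k down (fast adaptation); once residual energy decays, λ_k is driven back to its upper bound for low steady-state variance.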

6. State-of-the-Art Directions and Domain Applications

Recent RLS with forgetting variants address context-specific challenges:

  • Sliding window RLS with rank-two/low-rank updates: These variants employ rank-two updates and composite forgetting, allowing precise trade-offs between transient adaptation, memory length, and numerical conditioning. The segmented-forgetting-profile RLS exemplifies this by partitioning memory into regions tailored for rapid estimation or condition-number control (Stotsky, 15 Jul 2025, Stotsky, 19 Nov 2025).
  • Robustness under nonstationarity and noise: Theoretical and experimental analyses confirm that carefully constructed forgetting profiles, variable-rate mechanisms, and sparsity-promoting penalties yield exponential convergence, accurate tracking, and bounded estimator variance in both time-invariant and abruptly changing environments (Yu et al., 2022, Bruce et al., 2020).
  • Online learning and regret guarantees: Forgetting-factor RLS achieves order-optimal dynamic regret bounds in nonstationary data streams, rigorously balancing “static” performance with adaptation to time-varying targets, matching the best achievable rates up to logarithmic factors (Yuan et al., 2019).
  • Connections to Kalman filtering: RLS with (generalized) forgetting is a special case of adaptive Kalman filtering for static or slow parameter dynamics, and extensions to combined RLS/Kalman filters adopt richer forgetting structures for improved estimation in systems with unmodeled or abrupt dynamics (Lai et al., 2024).
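The Kalman connection can be verified in one step: RLS with forgetting factor λ coincides with a Kalman measurement update for a static state whose prior covariance is inflated to P/λ, with unit measurement noise R = 1. A numerical sketch of this standard identity:

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 0.95
phi = rng.standard_normal(2)
theta = rng.standard_normal(2)
P = np.array([[2.0, 0.3], [0.3, 1.0]])
y = 0.7

# One step of RLS with forgetting:
K_rls = P @ phi / (lam + phi @ P @ phi)
theta_rls = theta + K_rls * (y - phi @ theta)
P_rls = (P - np.outer(K_rls, phi) @ P) / lam

# Kalman filter for a static state: the predict step inflates the
# covariance, P_pred = P / lam, and the update uses R = 1.
P_pred = P / lam
K_kf = P_pred @ phi / (1.0 + phi @ P_pred @ phi)
theta_kf = theta + K_kf * (y - phi @ theta)
P_kf = (np.eye(2) - np.outer(K_kf, phi)) @ P_pred

print(np.allclose(theta_rls, theta_kf) and np.allclose(P_rls, P_kf))  # True
```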

7. Summary Table: Key RLS Forgetting Variants

| Variant | Key Reference | Distinctive Feature(s) |
|---|---|---|
| Classic exponential forgetting | (Lai et al., 2024) | Uniform time-discounting |
| Variable-rate forgetting | (Bruce et al., 2020) | Residual-adapted $\lambda_k$, proven convergence |
| Multiple/partial forgetting | (Fraccaroli et al., 2015) | Parameter/group-wise tuning of $\lambda_i$ |
| Subspace/directional forgetting | (Lai et al., 2024; Park et al., 2024) | Forgetting only in excited directions |
| Jointly optimized robust/sparse RLS | (Yu et al., 2022) | $\lambda_k$ and $\rho_k$ adapted via closed-form formulas |
| Segmented forgetting profile | (Stotsky, 19 Nov 2025) | Piecewise composite exponential/plateau decay |
| Two-layered (outer+inner) forgetting | (Tsuruhara et al., 28 Apr 2025) | FE→PE lift, global exponential stability |
| Adaptive Kalman-RLS fusion | (Lai et al., 2024) | Generalized forgetting as structural design |

References

  • Lai & Bernstein, “Generalized Forgetting Recursive Least Squares: Stability and Robustness Guarantees” (Lai et al., 2023)
  • Glushchenko et al., “Robust method to provide exponential convergence of model parameters…” (Glushchenko et al., 2020)
  • Stotsky, “Performance Enhancement of the Recursive Least Squares Algorithms with Rank Two Updates” (Stotsky, 15 Jul 2025)
  • Stotsky, “RLS Framework with Segmentation of the Forgetting Profile and Low Rank Updates” (Stotsky, 19 Nov 2025)
  • Xian, “SIFt-RLS: Subspace of Information Forgetting Recursive Least Squares” (Lai et al., 2024)
  • Uehara et al., “Discrete-time Two-Layered Forgetting RLS Identification under Finite Excitation” (Tsuruhara et al., 28 Apr 2025)
  • Wu et al., “Inverter Output Impedance Estimation in Power Networks: A Variable Direction Forgetting Recursive-Least-Square Algorithm Based Approach” (Park et al., 2024)
  • Roman et al., “Study of Robust Sparsity-Aware RLS algorithms with Jointly-Optimized Parameters…” (Yu et al., 2022)
  • Yu & Bernstein, “A New Recursive Least-Squares Method with Multiple Forgetting Schemes” (Fraccaroli et al., 2015)
  • Paleologu et al., “Low-Complexity Variable Forgetting Factor Techniques…” (Cai et al., 2013)
  • de Lamare & Sampaio-Neto, “Low-Complexity Variable Forgetting Factor… for Adaptive Beamforming” (Boya et al., 2014)
  • Hazan & Luo, “Trading-Off Static and Dynamic Regret in Online Least-Squares and Beyond” (Yuan et al., 2019)
  • Bernstein, “Adaptive Kalman Filtering Developed from Recursive Least Squares Forgetting Algorithms” (Lai et al., 2024)
  • Qiu et al., “Exponential convergence of recursive least squares with forgetting factor for multiple-output systems” (Brüggemann et al., 2020)

These works represent the current technical landscape and provide the rigorous foundations for design, implementation, and theoretical guarantees for RLS with various forgetting factor methodologies.
