
Recurrent Equilibrium Networks (RENs)

Updated 13 November 2025
  • Recurrent Equilibrium Networks are nonlinear dynamical models defined via implicit fixed-point equations that ensure unique equilibria under contractivity.
  • They employ an unconstrained parametrization that guarantees stability, dissipativity, and robust performance in tasks like system identification and control.
  • RENs generalize standard RNNs with both discrete and continuous-time formulations, preserving formal robustness via LMI-based certificates.

Recurrent Equilibrium Networks (RENs) are a class of nonlinear dynamical models that generalize recurrent neural networks (RNNs) via implicit layers, capturing a broad set of fading-memory, stable, and robust system behaviors. They are defined by fixed-point equations rather than explicit state transitions, and can be parameterized to be contracting and dissipative by construction, a property preserved even when scaling to large parameter spaces. RENs admit unconstrained learning, enabling the use of generic stochastic gradient-descent methods while guaranteeing strong stability, robustness, and incremental integral quadratic constraint (IQC) properties. The REN framework encompasses discrete- and continuous-time models and supports advanced applications in system identification, observer design, control, distributed control, and reduction of dynamics with formal preservation of robustness guarantees.

1. Mathematical Formulation and Core Properties

A discrete-time Recurrent Equilibrium Network is specified by the implicit layer equation $x = f(x, u)$, where $x \in \mathbb{R}^n$ is the equilibrium state, $u \in \mathbb{R}^m$ is the input, and $f: \mathbb{R}^n \times \mathbb{R}^m \rightarrow \mathbb{R}^n$ is typically affine in $(x, u)$ with a nonlinear component:

  • v=C1x+D11σ(v)+D12u+bvv = C_1 x + D_{11} \sigma(v) + D_{12} u + b_v
  • w=σ(v)w = \sigma(v)
  • x=Ax+B1w+B2u+bxx = A x + B_1 w + B_2 u + b_x
  • y=C2x+D21w+D22u+byy = C_2 x + D_{21} w + D_{22} u + b_y

The solution $x$ for a given input $u$ is the unique fixed point of $f(\cdot, u)$, provided $f$ is globally Lipschitz in $x$ with constant $L < 1$. In that case, existence and uniqueness follow from the Banach fixed-point theorem, and the equilibrium can be computed by simple iteration $x_{k+1} = f(x_k, u)$. Contractivity of $f$ is equivalently enforced via the Jacobian spectral-norm bound $\sup_{x,u} \left\| \frac{\partial f}{\partial x}(x, u) \right\|_2 < 1$.
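The simple fixed-point iteration for the inner equilibrium can be sketched in a few lines of NumPy. The function name, tolerances, and the choice $\sigma = \tanh$ are illustrative assumptions, not the papers' implementation:

```python
import numpy as np

def ren_equilibrium(C1, D11, D12, bv, x, u, sigma=np.tanh,
                    tol=1e-10, max_iter=200):
    """Solve the implicit layer v = C1 x + D11 sigma(v) + D12 u + bv
    by simple fixed-point iteration. Converges when the map is a
    contraction, e.g. ||D11||_2 * Lip(sigma) < 1."""
    v = np.zeros(C1.shape[0])
    for _ in range(max_iter):
        v_next = C1 @ x + D11 @ sigma(v) + D12 @ u + bv
        if np.linalg.norm(v_next - v) < tol:
            return v_next
        v = v_next
    return v
```

Under the contraction condition the residual shrinks geometrically, so a handful of iterations typically suffices.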

Continuous-time generalizations, termed NodeRENs, embed the REN structure into neural ODEs:

$$\begin{aligned} \dot{x}(t) &= A x(t) + B_1 w(t) + B_2 u(t) + b_x \\ v(t) &= C_1 x(t) + D_{11} w(t) + D_{12} u(t) + b_v \\ w(t) &= \sigma(v(t)) \\ y(t) &= C_2 x(t) + D_{21} w(t) + D_{22} u(t) + b_y \end{aligned}$$

Here, $\sigma$ is slope-restricted to $[0, 1]$, and structural constraints (e.g., a strictly lower-triangular $D_{11}$) ensure well-posedness and uniqueness of the solution for a given $u(t)$.

2. Unconstrained Parametrization, Contractivity, and Dissipativity

The central innovation for large-scale, robust learning in RENs is the unconstrained parametrization of network weights, so that contractivity and dissipativity are maintained for all parameter values. In the discrete-time case, the weight matrices are constructed from free parameters via block matrices (e.g., $H = X^\top X + \varepsilon I$ partitioned into suitable blocks), followed by closed-form recovery of REN parameters $(A, B_1, C_1, D_{11}, \ldots)$ that satisfy a contractivity linear matrix inequality (LMI). For contractive RENs (C-RENs), parameters $X, X_P, U, Y_1$, etc., are used to synthesize weight matrices guaranteeing

$$\begin{bmatrix} -A^\top P - P A & -C_1^\top \Lambda - P B_1 \\ * & 2\Lambda - \Lambda D_{11} - D_{11}^\top \Lambda \end{bmatrix} \succ 0$$

where $P \succ 0$ and $\Lambda \succ 0$ is diagonal.
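The core of the construction is that positive definiteness holds for every free-parameter value. A minimal sketch of this idea, assuming the simplest variant $H = X^\top X + \varepsilon I$ (the block partitioning and closed-form recovery of $(A, B_1, C_1, D_{11}, \ldots)$ are omitted):

```python
import numpy as np

def build_H(X, eps=1e-4):
    """Direct (unconstrained) parameterization: for ANY real matrix X,
    H = X^T X + eps*I is positive definite by construction, so every
    free-parameter value maps to a feasible certificate."""
    n = X.shape[1]
    return X.T @ X + eps * np.eye(n)
```

Because feasibility is automatic, no projection or constraint handling is needed during training.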

For dissipativity or incremental IQC-enforced RENs (IQC-RENs), free parameters feed through a Cayley-type parameterization, ensuring satisfaction of the extended LMI corresponding to a supply matrix $[Q, S^\top; S, R]$ (e.g., for $\mathcal{L}_2$-gain, Lipschitz, or passivity properties). The continuous-time NodeREN versions mirror this process by constructing appropriate Lyapunov and supply matrices via the same parameterization logic, guaranteeing that contractivity and dissipativity hold for all parameter values $\theta$.

A summary of key contractivity and IQC LMI structures is provided below:

| LMI Type | Key Block-Form Condition | Guarantee |
|---|---|---|
| Contractivity | $[-A^\top P - P A,\ -C_1^\top \Lambda - P B_1;\ *,\ 2\Lambda - \Lambda D_{11} - D_{11}^\top \Lambda] \succ 0$ | Uniqueness, geometric convergence to equilibrium |
| Incremental IQC | See text (block matrix involving $A$, $B$, $C$, $D$, $P$, $\Lambda$, $Q$, $S$, $R$) | Dissipativity in the sense of the given IQC |

The unconstrained design allows solvers like Adam (SGD) to operate directly on free parameters, without recourse to constrained optimization or projections.
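Since every free-parameter iterate is feasible, a generic first-order optimizer can update the free parameters directly. A toy sketch with a hand-rolled Adam update (the quadratic loss and hyperparameters are illustrative, not from the papers):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update on the free (unconstrained) parameters theta."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    mhat = m / (1 - b1**t)              # bias-corrected first moment
    vhat = v / (1 - b2**t)              # bias-corrected second moment
    return theta - lr * mhat / (np.sqrt(vhat) + eps), m, v
```

No matter where Adam moves the free matrix $X$, the derived certificate $H = X^\top X + \varepsilon I$ stays positive definite, so stability never has to be checked or restored mid-training.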

3. Expressiveness, Universal Approximation, and System-Theoretic Perspective

RENs are strictly more general than standard RNNs and represent numerous important model classes:

  • All stable linear time-invariant (LTI) systems are included as special cases of the REN architecture.
  • Contracting classical RNNs and echo state networks appear as specializations through selected zero structures in the REN matrices.
  • Static deep feedforward neural networks are realized as block-triangular cases.
  • Wiener/Hammerstein and block-structured models emerge via structured zeroing.

The universal approximation property is established: as the REN's state and nonlinear dimension increase, the model class is dense in the set of fading-memory operators and contracting nonlinear systems with finite incremental $\ell^2$-gain. Special cases (e.g., truncated Volterra series) show that RENs can approximate a broad class of nonlinear dynamical systems while maintaining stability constraints.

4. Training Methodology and Implementation Considerations

Training an REN consists of:

  • Simulating the system via the implicit state update, solving the inner equilibrium by fixed-point iteration, which is efficient in practice.
  • Optimization of a simulation- or trajectory-based loss (e.g., mean-squared error in system identification, or cost in control scenarios) via Adam or similar optimizers, acting on parameterizations that guarantee feasibility of contractivity/dissipativity.
  • Computing gradients with respect to parameters efficiently using the implicit function theorem. For the inner equilibrium $w^*$:

$$\frac{\partial w^*}{\partial \theta} = (I - J D_{11})^{-1} J \frac{\partial (D_{11} w^* + b_v)}{\partial \theta}$$

where $J$ is the diagonal matrix of activation slopes $\sigma'(v^*)$.
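Specializing the formula to $\theta = b_v$ (so the inner derivative is the identity) gives $\partial w^*/\partial b_v = (I - J D_{11})^{-1} J$, which can be checked against finite differences. Here $\sigma = \tanh$ and the dimensions are illustrative assumptions:

```python
import numpy as np

def solve_w(D11, c, tol=1e-12):
    """Equilibrium w* = sigma(D11 w* + c) via fixed-point iteration;
    c collects C1 x + D12 u + b_v, and sigma = tanh."""
    w = np.zeros(len(c))
    for _ in range(500):
        w_new = np.tanh(D11 @ w + c)
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w

def grad_w_wrt_bv(D11, c):
    """Implicit-function-theorem gradient: with J = diag(sigma'(v*)),
    dw*/db_v = (I - J D11)^{-1} J."""
    w = solve_w(D11, c)
    v = D11 @ w + c
    J = np.diag(1.0 - np.tanh(v) ** 2)   # tanh'(v*)
    n = len(c)
    return np.linalg.solve(np.eye(n) - J @ D11, J)
```

The key point is that backpropagation never has to unroll the fixed-point iterations; one linear solve at the equilibrium suffices.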

Adaptive ODE solvers, e.g., Dormand-Prince (dopri5), can be employed for NodeREN time integration, with trade-offs in accuracy and function evaluations.
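As an illustration, SciPy's `RK45` integrator (the Dormand-Prince (4,5) pair) can integrate a toy continuous-time state equation of the NodeREN form, here simplified by taking $D_{11} = 0$ so the equilibrium layer is explicit; all matrices and inputs below are arbitrary stand-ins, not a trained model:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy NodeREN-style flow: x' = A x + B1 tanh(C1 x) + B2 u,
# with D11 = 0 so w = sigma(v) needs no inner fixed-point solve.
A = np.array([[-1.0, 0.5], [-0.5, -1.0]])   # Hurwitz linear part
B1 = 0.3 * np.eye(2)
C1 = np.eye(2)
B2 = np.array([[1.0], [0.0]])
u = np.array([1.0])                          # constant input

def f(t, x):
    return A @ x + B1 @ np.tanh(C1 @ x) + B2 @ u

# RK45 = adaptive Dormand-Prince; rtol/atol trade accuracy for
# function evaluations, as noted above.
sol = solve_ivp(f, (0.0, 5.0), np.zeros(2), method="RK45",
                rtol=1e-8, atol=1e-10)
```

Tightening `rtol`/`atol` increases the number of right-hand-side evaluations, which is the accuracy/cost trade-off mentioned in the text.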

5. Applications: System Identification, Observer Design, and Robust Nonlinear Control

RENs have wide applicability:

  • System identification of nonlinear systems: contracting RENs have demonstrated superior performance versus RNNs, LSTMs, and robust RNNs, particularly in sensitivity and certification of Lipschitz bounds. For example, a contracting REN on the F-16 ground vibration dataset achieved NRMSE ≈ 20.1% with observed sensitivity γ̂ ≈ 36.7 under a $\gamma \leq 40$ bound, whereas RNNs/LSTMs failed to meet the robustness requirement (Revay et al., 2021).
  • Observer design: by learning a contracting nonlinear observer and imposing correctness with respect to the nominal model, RENs guarantee convergence of the estimated to the true state, for instance, in semilinear PDE discretizations (Revay et al., 2021).
  • Nonlinear robust and distributed control: RENs are integrated into control frameworks, including the nonlinear Youla parameterization ("Youla-REN"), robust data-driven control, and networked/distributed settings with certifiable $\mathcal{L}_2$-gain on interconnections (Wang et al., 2021, Saccani et al., 2024).
    • Distributed RENs can be interconnected per a communication graph, each with gain-imposed certificates, and the composite system achieves a global $\mathcal{L}_2$-stability bound, enforced by design and without solver constraints (Saccani et al., 2024).

In direct reinforcement learning (RL) scenarios, projected policy-gradient approaches allow maximization of arbitrary reward functions under enforced closed-loop stability. The projection onto the feasible set specified by the LMI can be posed as a standard semidefinite program (SDP) and solved via, e.g., CVXPY (Junnarkar et al., 2022).
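As a simplified stand-in for that SDP projection, the special case of projecting a symmetric matrix onto the positive-semidefinite cone (in Frobenius norm) has a closed form via eigenvalue clipping. A real projected policy-gradient step would project the policy parameters onto the full LMI-feasible set with an SDP solver such as CVXPY; this toy version only illustrates the projection idea:

```python
import numpy as np

def project_psd(M, eps=0.0):
    """Frobenius-norm projection of a symmetric matrix onto the cone
    {X : X >= eps*I}, computed by clipping negative eigenvalues.
    Stand-in for the general LMI projection described in the text."""
    S = 0.5 * (M + M.T)                  # symmetrize defensively
    vals, vecs = np.linalg.eigh(S)
    vals = np.maximum(vals, eps)         # clip eigenvalues below eps
    return vecs @ np.diag(vals) @ vecs.T
```

Matrices already inside the cone are fixed points of the projection, which is the defining property one also expects of the full SDP-based projection.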

6. Model Order Reduction and Scaling via Contraction Certificate

Large-scale (high-dimensional) RENs present challenges for real-time deployment on resource-constrained platforms. RENs support dimension reduction by projection, uniquely leveraging the pre-learned contraction or robustness certificate. The method constructs two projection matrices: one, involving $P$, guarantees preservation of contractivity/robustness by construction,

$$W^\top = (V^\top P V)^{-1} V^\top P$$

The other, $V$, is updated iteratively to minimize the $H_2$ error in the LTI component using necessary optimality conditions, mirroring IRKA-type approaches (Shakib, 4 Aug 2025). The reduced-order REN satisfies the same LMI-based guarantees as the original. Numerical results confirm that up to 90% state reduction preserves the contraction and IQC properties with minimal accuracy loss.
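The $P$-weighted projection and its key algebraic properties can be checked numerically. The matrices below are random stand-ins for illustration, not a trained REN:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 10, 3
X = rng.standard_normal((n, n))
P = X @ X.T + 1e-3 * np.eye(n)       # contraction certificate P > 0
V = rng.standard_normal((n, r))      # trial basis, refined by IRKA-like updates

# W^T = (V^T P V)^{-1} V^T P, computed via a linear solve
WT = np.linalg.solve(V.T @ P @ V, V.T @ P)

# This is an oblique (P-orthogonal) projection: W^T V = I_r, and the
# reduced certificate P_r = V^T P V is positive definite by construction,
# which is why the reduced model inherits the contraction guarantee.
```

Because $P_r = V^\top P V \succ 0$ holds for any full-rank $V$, the robustness certificate carries over to the reduced model regardless of how $V$ is refined.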

7. Empirical Validation, Robustness, and Limitations

RENs and NodeRENs have been validated on standard system identification and control benchmarks:

  • System identification of nonlinear pendulum systems, with noisy and irregularly sampled data, demonstrating stability even beyond training horizons where unconstrained models fail (Martinelli et al., 2023).
  • Comparative studies against conventional RNNs, LSTMs, and robust baselines, consistently showing superior sensitivity margins and lower cost when explicit robustness constraints are imposed.
  • Distributed control applications, where REN networks enable formation control under obstacle avoidance, maintaining network-level stability by construction.
  • Scaling analyses showing that the unconstrained parameterization supports efficient and stable training on large models.

A key property is robustness to irregular sampling and model uncertainty: RENs trained on differently sampled datasets achieve tight clusters of test loss, confirming their performance consistency.

The primary computational cost lies in the fixed-point solve at each timestep or time-continuous step, but practical architectures and modern hardware easily amortize this overhead, especially as the parameterization eliminates the need for costly projections or stability checks during SGD.


RENs thus unify system-theoretic stability principles with expressive nonlinear modeling and scalable, unconstrained learning. Their structural guarantees enable principled deployment in sensitive applications demanding formal robustness, such as safety-critical control, distributed multi-agent systems, and nonlinear observer design.
