Liquid-S4: Adaptive Sequence Modeling
- Liquid-S4 is a state-space model that integrates a linear liquid time-constant ODE with a diagonal-plus-low-rank (DPLR) decomposition for efficient long-range sequential learning.
- The model employs input-dependent state transitions and multi-order liquid kernels to enhance adaptive representation and encode sequence correlations.
- Empirical evaluations show Liquid-S4 consistently outperforms S4 on benchmarks like Long-Range Arena and audio classification while reducing model parameters.
Liquid-S4 is a structural state-space model (SSM) designed for high-performance representation learning on long-range sequential data. It is constructed by integrating a linear liquid time-constant (LTC) ODE with a diagonal plus low-rank decomposition (DPLR) of state transition matrices, leveraging methodology from S4 (Structured State Spaces). Liquid-S4 is characterized by its input-dependent state transitions and kernel structure that encodes similarities and correlations within the sequence, enabling state-of-the-art results across image, text, audio, and medical signal domains (Hasani et al., 2022).
1. Mathematical Structure of Liquid-S4
Liquid-S4 is founded upon a continuous-time linearized LTC ODE, specified for an $N$-dimensional hidden state $x(t)$, scalar input $u(t)$, and scalar output $y(t)$:

$$\dot{x}(t) = \big(A + B\,u(t)\big)\,x(t) + B\,u(t), \qquad y(t) = C\,x(t),$$

where $A \in \mathbb{R}^{N \times N}$ is the transition matrix, $B \in \mathbb{R}^{N \times 1}$ serves as both the bias and the input modulation, and $C \in \mathbb{R}^{1 \times N}$ computes the readout.
Discretization via the bilinear (trapezoidal) rule, using step size $\Delta$, yields:

$$\bar{A} = \big(I - \tfrac{\Delta}{2}A\big)^{-1}\big(I + \tfrac{\Delta}{2}A\big), \qquad \bar{B} = \big(I - \tfrac{\Delta}{2}A\big)^{-1}\Delta B, \qquad \bar{C} = C.$$

The resulting discrete-time recurrence for each timestep $k \geq 0$ is:

$$x_k = \big(\bar{A} + \bar{B}\,u_k\big)\,x_{k-1} + \bar{B}\,u_k, \qquad y_k = \bar{C}\,x_k.$$
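A minimal NumPy sketch of the discretization and recurrence above. The matrices are random placeholders rather than the HiPPO-based initialization used in practice, and the input-dependent term $\bar{B}u_k$ is applied elementwise to the state, which is one concrete reading of $(\bar{A} + \bar{B}u_k)\,x_{k-1}$.

```python
import numpy as np

N, L, dt = 8, 32, 0.1                  # state size, sequence length, step size
rng = np.random.default_rng(0)
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))   # stable-ish placeholder for A
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(L)             # scalar input sequence

# Bilinear (trapezoidal) discretization.
I = np.eye(N)
inv = np.linalg.inv(I - dt / 2 * A)
A_bar = inv @ (I + dt / 2 * A)
B_bar = inv @ (dt * B)

# Liquid recurrence: the B_bar * u_k term both drives the state and modulates
# it (here elementwise), making the transition input-dependent.
x, ys = np.zeros((N, 1)), []
for k in range(L):
    x = A_bar @ x + (B_bar * u[k]) * x + B_bar * u[k]
    ys.append((C @ x).item())
```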
2. Diagonal-Plus-Low-Rank Parameterization
The state transition matrix $A$ is initialized using the HiPPO-LegS matrix (scaled Legendre measure) and rewritten in the Normal Plus Low-Rank (NPLR) decomposition:

$$A = V\,\Lambda\,V^{*} - P\,Q^{\top} = V\big(\Lambda - (V^{*}P)(V^{*}Q)^{*}\big)V^{*},$$

with $V \in \mathbb{C}^{N \times N}$ unitary, $\Lambda$ diagonal (initialized in the left half-plane for stability), $P Q^{\top}$ low-rank ($r = 1$ in practice), and $P, Q \in \mathbb{R}^{N \times r}$.
This DPLR form ensures computational efficiency, numerical stability, and the flexibility to learn long temporal dependencies. Input dependence is injected strictly via the $\bar{B}\,u_k$ modulation of the state transition.
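The NPLR structure above can be checked numerically. The sketch below builds the HiPPO-LegS matrix and its rank-1 correction $P$ (so $r = 1$ and $Q = P$ in this case), and verifies that the remaining part is normal with eigenvalues on the $\mathrm{Re} = -\tfrac{1}{2}$ line; the state size is an arbitrary choice.

```python
import numpy as np

N = 8
n = np.arange(N)

# HiPPO-LegS transition matrix: A[i, j] = -sqrt((2i+1)(2j+1)) for i > j,
# -(i+1) on the diagonal, and 0 above the diagonal.
A = np.tril(-np.sqrt(np.outer(2 * n + 1, 2 * n + 1)), k=-1) - np.diag(n + 1.0)

# Rank-1 term P such that A + P P^T is normal (skew-symmetric minus I/2).
P = np.sqrt(n + 0.5)
A_normal = A + np.outer(P, P)

# Diagonalize the normal part: A = V Lambda V^{-1} - P P^T, with V close to
# unitary and all eigenvalues sitting at Re = -1/2 (left half-plane).
Lam, V = np.linalg.eig(A_normal)
print(np.allclose(V @ np.diag(Lam) @ np.linalg.inv(V) - np.outer(P, P), A))  # True
print(np.allclose(V.conj().T @ V, np.eye(N), atol=1e-6))                     # ~unitary
print(Lam.real)                                                              # all ~ -0.5
```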
3. Kernel Construction and Correlation Structure
Liquid-S4 introduces a kernel framework comprising both linear and higher-order correlation terms. For the input-independent part of the recurrence, the convolutional kernel over a length-$L$ sequence is the standard S4 kernel:

$$\bar{K}_L = \big(\bar{C}\bar{B},\; \bar{C}\bar{A}\bar{B},\; \dots,\; \bar{C}\bar{A}^{L-1}\bar{B}\big), \qquad y = \bar{K}_L * u.$$

Efficient computation via FFT leverages the Cauchy kernel pipeline, offering nearly linear-time complexity in sequence length.
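A minimal NumPy sketch of this input-independent kernel: it is materialized naively (the DPLR/Cauchy machinery is what avoids this cost in practice), applied with an FFT-based causal convolution, and checked against the plain recurrence. All matrices are random placeholders.

```python
import numpy as np

N, L, dt = 8, 64, 0.1
rng = np.random.default_rng(0)
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(L)

I = np.eye(N)
inv = np.linalg.inv(I - dt / 2 * A)
A_bar, B_bar, C_bar = inv @ (I + dt / 2 * A), inv @ (dt * B), C

# Naive O(L N^2) kernel materialization: K[i] = C_bar A_bar^i B_bar.
K = np.zeros(L)
x = B_bar
for i in range(L):
    K[i] = (C_bar @ x).item()
    x = A_bar @ x

# Causal convolution y_k = sum_i K_i u_{k-i} via FFT (zero-padded to 2L).
n_fft = 2 * L
y = np.fft.irfft(np.fft.rfft(K, n_fft) * np.fft.rfft(u, n_fft), n_fft)[:L]

# Check against the plain (input-independent) recurrence.
x, y_rec = np.zeros((N, 1)), []
for k in range(L):
    x = A_bar @ x + B_bar * u[k]
    y_rec.append((C_bar @ x).item())
print(np.allclose(y, y_rec))  # True
```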
The LTC recurrence introduces additional “liquid” kernels that act on input auto-correlations:

$$y = \bar{K}_L * u + \bar{K}^{\mathrm{liquid}} * u_{\mathrm{correlations}},$$

where $u_{\mathrm{correlations}}$ collects products of input samples. For each chosen order $p \geq 2$, the liquid kernel has entries of the form $\bar{C}\,\bar{A}^{\,j}\,\bar{B}^{\,p}$ (with the power of $\bar{B}$ taken elementwise); the KB and PB (poly-B) variants of Liquid-S4 differ in how these powers of the input matrix are computed. Liquid-S4 concatenates the length-$L$ S4 kernel with the higher-order liquid kernels (orders $2$ through $p$), computed in $\tilde{O}(N + L)$ time through the same Cauchy-kernel pipeline.
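The sketch below illustrates one order-$p$ correction term under simplifying assumptions: the kernel entries take the form $\bar{C}\,\bar{A}^{\,j}\,\bar{B}^{\,p}$ with an elementwise power of $\bar{B}$, and the correlation signal is built from products of $p$ consecutive inputs. This is a simplified reading of the liquid kernel for exposition, not the paper's exact construction, which again runs through the Cauchy/FFT pipeline.

```python
import numpy as np

N, L, dt, p = 8, 64, 0.1, 2
rng = np.random.default_rng(0)
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(L)

I = np.eye(N)
inv = np.linalg.inv(I - dt / 2 * A)
A_bar, B_bar = inv @ (I + dt / 2 * A), inv @ (dt * B)

# Order-p liquid kernel (assumed form): entries C A_bar^j (B_bar ** p).
K_liq = np.zeros(L - p + 1)
x = B_bar ** p                      # elementwise power of the (N, 1) vector
for j in range(L - p + 1):
    K_liq[j] = (C @ x).item()
    x = A_bar @ x

# Products of p consecutive inputs: (u_0...u_{p-1}, u_1...u_p, ...).
u_corr = np.array([np.prod(u[i:i + p]) for i in range(L - p + 1)])

# Correlation contribution, added on top of the standard S4 convolution output.
n_fft = 2 * L
y_liq = np.fft.irfft(np.fft.rfft(K_liq, n_fft) * np.fft.rfft(u_corr, n_fft), n_fft)[:L]
```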
4. Model Architecture and Training Protocols
Liquid-S4 constructs deep sequence models by stacking multiple state space blocks, each comprising the Liquid-S4 kernel convolution, residual connections, feed-forward layers, and pointwise nonlinearities (GELU, ReLU). The architecture is causal: each output $y_k$ depends exclusively on past and current inputs $u_{0:k}$.
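The block structure can be illustrated with a short PyTorch sketch. The per-channel learnable kernel below is only a stand-in for the Liquid-S4 kernel, and the layer sizes, normalization placement, and names are illustrative choices rather than the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSMBlock(nn.Module):
    def __init__(self, d_model: int, seq_len: int, dropout: float = 0.1):
        super().__init__()
        # Stand-in for the Liquid-S4 kernel: one learnable causal kernel per channel.
        self.kernel = nn.Parameter(torch.randn(d_model, seq_len) * 0.02)
        self.norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                nn.Dropout(dropout), nn.Linear(d_model, d_model))

    def forward(self, u):                          # u: (batch, seq_len, d_model)
        B, L, D = u.shape
        # Causal FFT convolution of each channel with its kernel.
        n_fft = 2 * L
        U = torch.fft.rfft(u.transpose(1, 2), n=n_fft)     # (B, D, n_fft//2 + 1)
        K = torch.fft.rfft(self.kernel[:, :L], n=n_fft)    # (D, n_fft//2 + 1)
        y = torch.fft.irfft(U * K, n=n_fft)[..., :L].transpose(1, 2)
        y = u + F.gelu(y)                          # residual connection + nonlinearity
        return y + self.ff(self.norm(y))           # feed-forward submodule, residual

# A stack of blocks, analogous to the deep Liquid-S4 architecture.
model = nn.Sequential(*[SSMBlock(d_model=128, seq_len=1024) for _ in range(4)])
```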
Training employs cross-entropy loss for classification or an $\ell_2$ (mean-squared-error) loss for regression. Regularization strategies include weight decay on learnable parameters, dropout in feed-forward submodules, and gradient penalties on eigenvalues to enhance stability. Optimization is via Adam/AdamW with learning rates tuned per task, typically smaller for Liquid-S4 than for S4.
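A sketch of a corresponding optimizer setup, reusing `model` from the previous sketch. Splitting parameters into groups, with no weight decay and a lower learning rate for the kernel/state-space parameters, follows common S4-style training practice; the `"kernel"` name filter and the learning-rate values are placeholders, not the paper's settings.

```python
import torch

# Separate the (stand-in) kernel parameters from the rest of the model.
ssm_params = [p for n, p in model.named_parameters() if "kernel" in n]
other_params = [p for n, p in model.named_parameters() if "kernel" not in n]

optimizer = torch.optim.AdamW([
    {"params": ssm_params, "lr": 1e-3, "weight_decay": 0.0},    # SSM parameters
    {"params": other_params, "lr": 4e-3, "weight_decay": 0.01}, # everything else
])
criterion = torch.nn.CrossEntropyLoss()   # or torch.nn.MSELoss() for regression
```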
The kernel’s input dependence—modulating the recurrence at each timestep—enables Liquid-S4 to re-weight historical inputs adaptively, yielding improved generalization in non-stationary and highly correlated data regimes.
5. Empirical Evaluation and Performance Benchmarks
Liquid-S4 establishes new state-of-the-art results on multiple long-range sequence modeling benchmarks:
Long-Range Arena (1K–16K sequences)
| Method | ListOps | IMDB | AAN | CIFAR | PathFinder | Path-X | Average |
|---|---|---|---|---|---|---|---|
| S4-LegS (reprod.) | 59.60 | 86.82 | 90.90 | 88.65 | 94.20 | 96.35 | 86.09 |
| Liquid-S4 (PB) | 62.75 | 89.02 | 91.20 | 89.50 | 94.80 | 96.66 | 87.32 |
Raw Speech Commands (35 classes, 16 kHz)
| Model | Params | Accuracy |
|---|---|---|
| S4-LegS | 307 K | 96.08% |
| S4D-Lin | 306 K | 96.25% |
| Liquid-S4 | 224 K | 96.78% |
BIDMC Vital Signs (RMSE)
| Model | HR | RR | SpO₂ |
|---|---|---|---|
| S4-LegS | 0.332 | 0.247 | 0.090 |
| S4D-Inv | 0.373 | 0.254 | 0.110 |
| Liquid-S4 (best) | 0.303 | 0.158 | 0.066 |
These results indicate consistent 1–3% gains over S4, with Liquid-S4 frequently achieving higher accuracy and reduced parameterization (e.g., 30% fewer parameters on Speech Commands).
6. Implementation Details and Hyperparameter Choices
Standard settings are:
- Number of state space blocks (“depth”): 4–9, task-dependent
- Hidden units per block ($H$): 128–512
- State dimension ($N$): task-dependent, from smaller values (ListOps/IMDB) up to 512 (CIFAR)
- Low-rank factor ($r$): 1
- Liquid kernel order ($p$): typically 2–4, default 3
- Cauchy kernel evaluation via PyKeOps and FFT-based convolution for memory efficiency
- Forward pass complexity: $\tilde{O}(N + L)$ per kernel; memory similar to S4 with minor overhead for the correlation kernels (a configuration sketch collecting these settings follows this list)
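For concreteness, the settings above can be gathered into a single configuration; every key name and value below is illustrative, drawn from the ranges listed rather than from a specific tuned task configuration.

```python
# Illustrative hyperparameter configuration (placeholder names and values).
config = {
    "n_blocks": 6,           # number of state space blocks (4-9, task-dependent)
    "d_model": 256,          # hidden units per block H (128-512)
    "d_state": 64,           # state dimension N (task-dependent, up to 512)
    "rank": 1,               # low-rank factor r of the DPLR correction
    "liquid_order": 3,       # liquid kernel order p (typically 2-4)
    "liquid_mode": "pb",     # "pb" (poly-B) or "kb" liquid kernel variant
}
```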
7. Interpretations and Relevance
Liquid-S4’s lightweight kernel modulation introduces input-correlation structure into SSMs, directly encoding similarities of input samples during both training and inference. This property facilitates data-dependent adaptive filtering and generalization under long-sequence, highly-correlated, or non-stationary conditions. The empirical improvement at negligible computational or parameter cost suggests further applicability in domains typified by long-range dependencies and dynamic signal correlation.
Liquid-S4 extends S4’s diagonal-plus-low-rank parametrization by input-dependent kernel construction, resulting in a robust and scalable sequence modeling framework with consistent empirical benefits across modalities (Hasani et al., 2022).