Liquid-S4: Adaptive Sequence Modeling
- Liquid-S4 is a state-space model that integrates a linear liquid time-constant ODE with a diagonal-plus-low-rank (DPLR) decomposition for efficient long-range sequential learning.
- The model employs input-dependent state transitions and multi-order liquid kernels to enhance adaptive representation and encode sequence correlations.
- Empirical evaluations show Liquid-S4 consistently outperforms S4 on benchmarks like Long-Range Arena and audio classification while reducing model parameters.
Liquid-S4 is a structural state-space model (SSM) designed for high-performance representation learning on long-range sequential data. It is constructed by integrating a linear liquid time-constant (LTC) ODE with a diagonal plus low-rank decomposition (DPLR) of state transition matrices, leveraging methodology from S4 (Structured State Spaces). Liquid-S4 is characterized by its input-dependent state transitions and kernel structure that encodes similarities and correlations within the sequence, enabling state-of-the-art results across image, text, audio, and medical signal domains (Hasani et al., 2022).
1. Mathematical Structure of Liquid-S4
Liquid-S4 is founded upon a continuous-time linearized LTC ODE, specified for an $N$-dimensional hidden state $x(t)$, scalar input $u(t)$, and scalar output $y(t)$:

$$\dot{x}(t) = \big(A + B\,u(t)\big)\,x(t) + B\,u(t), \qquad y(t) = C\,x(t),$$

where $A \in \mathbb{R}^{N \times N}$ is the transition matrix, $B \in \mathbb{R}^{N \times 1}$ serves as both the bias and the input modulation, and $C \in \mathbb{R}^{1 \times N}$ computes the readout.
Discretization via the bilinear (trapezoidal) rule, using step size $\Delta$, yields:

$$\bar{A} = \big(I - \tfrac{\Delta}{2}A\big)^{-1}\big(I + \tfrac{\Delta}{2}A\big), \qquad \bar{B} = \big(I - \tfrac{\Delta}{2}A\big)^{-1}\Delta B, \qquad \bar{C} = C.$$

The resulting discrete-time recurrence for each timestep $k \geq 0$ is:

$$x_k = \big(\bar{A} + \bar{B}\,u_k\big)\,x_{k-1} + \bar{B}\,u_k, \qquad y_k = \bar{C}\,x_k.$$
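A minimal NumPy sketch of the discretization and recurrence above. The matrices are random placeholders rather than the HiPPO-based initialization used in practice, and the input-dependent term $\bar{B}u_k$ is applied elementwise to the state, which is one concrete reading of $(\bar{A} + \bar{B}u_k)\,x_{k-1}$.

```python
import numpy as np

N, L, dt = 8, 32, 0.1                  # state size, sequence length, step size
rng = np.random.default_rng(0)
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))   # stable-ish placeholder for A
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(L)             # scalar input sequence

# Bilinear (trapezoidal) discretization.
I = np.eye(N)
inv = np.linalg.inv(I - dt / 2 * A)
A_bar = inv @ (I + dt / 2 * A)
B_bar = inv @ (dt * B)

# Liquid recurrence: the B_bar * u_k term both drives the state and modulates
# it (here elementwise), making the transition input-dependent.
x, ys = np.zeros((N, 1)), []
for k in range(L):
    x = A_bar @ x + (B_bar * u[k]) * x + B_bar * u[k]
    ys.append((C @ x).item())
```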
2. Diagonal-Plus-Low-Rank Parameterization
The state transition matrix $A$ is initialized using the HiPPO-LegS matrix (scaled Legendre measure) and rewritten in the Normal Plus Low-Rank (NPLR) decomposition:

$$A = V\,\Lambda\,V^{*} - P\,Q^{\top} = V\big(\Lambda - (V^{*}P)(V^{*}Q)^{*}\big)V^{*},$$

with $V \in \mathbb{C}^{N \times N}$ unitary, $\Lambda$ diagonal (initialized in the left half-plane for stability), $P Q^{\top}$ low-rank ($r = 1$ in practice), and $P, Q \in \mathbb{R}^{N \times r}$.
This DPLR form ensures computational efficiency, numerical stability, and the flexibility to learn long temporal dependencies. Input dependence is injected strictly via the $\bar{B}\,u_k$ modulation of the state transition.
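The NPLR structure above can be checked numerically. The sketch below builds the HiPPO-LegS matrix and its rank-1 correction $P$ (so $r = 1$ and $Q = P$ in this case), and verifies that the remaining part is normal with eigenvalues on the $\mathrm{Re} = -\tfrac{1}{2}$ line; the state size is an arbitrary choice.

```python
import numpy as np

N = 8
n = np.arange(N)

# HiPPO-LegS transition matrix: A[i, j] = -sqrt((2i+1)(2j+1)) for i > j,
# -(i+1) on the diagonal, and 0 above the diagonal.
A = np.tril(-np.sqrt(np.outer(2 * n + 1, 2 * n + 1)), k=-1) - np.diag(n + 1.0)

# Rank-1 term P such that A + P P^T is normal (skew-symmetric minus I/2).
P = np.sqrt(n + 0.5)
A_normal = A + np.outer(P, P)

# Diagonalize the normal part: A = V Lambda V^{-1} - P P^T, with V close to
# unitary and all eigenvalues sitting at Re = -1/2 (left half-plane).
Lam, V = np.linalg.eig(A_normal)
print(np.allclose(V @ np.diag(Lam) @ np.linalg.inv(V) - np.outer(P, P), A))  # True
print(np.allclose(V.conj().T @ V, np.eye(N), atol=1e-6))                     # ~unitary
print(Lam.real)                                                              # all ~ -0.5
```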
3. Kernel Construction and Correlation Structure
Liquid-S4 introduces a kernel framework comprising both linear and higher-order correlation terms. For the input-independent part of the recurrence, the convolutional kernel over a length-$L$ sequence is the standard S4 kernel:

$$\bar{K}_L = \big(\bar{C}\bar{B},\; \bar{C}\bar{A}\bar{B},\; \dots,\; \bar{C}\bar{A}^{L-1}\bar{B}\big), \qquad y = \bar{K}_L * u.$$

Efficient computation via FFT leverages the Cauchy kernel pipeline, offering nearly linear-time complexity in sequence length.
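A minimal NumPy sketch of this input-independent kernel: it is materialized naively (the DPLR/Cauchy machinery is what avoids this cost in practice), applied with an FFT-based causal convolution, and checked against the plain recurrence. All matrices are random placeholders.

```python
import numpy as np

N, L, dt = 8, 64, 0.1
rng = np.random.default_rng(0)
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(L)

I = np.eye(N)
inv = np.linalg.inv(I - dt / 2 * A)
A_bar, B_bar, C_bar = inv @ (I + dt / 2 * A), inv @ (dt * B), C

# Naive O(L N^2) kernel materialization: K[i] = C_bar A_bar^i B_bar.
K = np.zeros(L)
x = B_bar
for i in range(L):
    K[i] = (C_bar @ x).item()
    x = A_bar @ x

# Causal convolution y_k = sum_i K_i u_{k-i} via FFT (zero-padded to 2L).
n_fft = 2 * L
y = np.fft.irfft(np.fft.rfft(K, n_fft) * np.fft.rfft(u, n_fft), n_fft)[:L]

# Check against the plain (input-independent) recurrence.
x, y_rec = np.zeros((N, 1)), []
for k in range(L):
    x = A_bar @ x + B_bar * u[k]
    y_rec.append((C_bar @ x).item())
print(np.allclose(y, y_rec))  # True
```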
The LTC recurrence introduces additional “liquid” kernels that act on input auto-correlations:

$$y = \bar{K}_L * u + \bar{K}^{\mathrm{liquid}} * u_{\mathrm{correlations}},$$

where $u_{\mathrm{correlations}}$ collects products of input samples. For each chosen order $p \geq 2$, the liquid kernel has entries of the form $\bar{C}\,\bar{A}^{\,j}\,\bar{B}^{\,p}$ (with the power of $\bar{B}$ taken elementwise); the KB and PB (poly-B) variants of Liquid-S4 differ in how these powers of the input matrix are computed. Liquid-S4 concatenates the length-$L$ S4 kernel with the higher-order liquid kernels (orders $2$ through $p$), computed in $\tilde{O}(N + L)$ time through the same Cauchy-kernel pipeline.
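The sketch below illustrates one order-$p$ correction term under simplifying assumptions: the kernel entries take the form $\bar{C}\,\bar{A}^{\,j}\,\bar{B}^{\,p}$ with an elementwise power of $\bar{B}$, and the correlation signal is built from products of $p$ consecutive inputs. This is a simplified reading of the liquid kernel for exposition, not the paper's exact construction, which again runs through the Cauchy/FFT pipeline.

```python
import numpy as np

N, L, dt, p = 8, 64, 0.1, 2
rng = np.random.default_rng(0)
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(L)

I = np.eye(N)
inv = np.linalg.inv(I - dt / 2 * A)
A_bar, B_bar = inv @ (I + dt / 2 * A), inv @ (dt * B)

# Order-p liquid kernel (assumed form): entries C A_bar^j (B_bar ** p).
K_liq = np.zeros(L - p + 1)
x = B_bar ** p                      # elementwise power of the (N, 1) vector
for j in range(L - p + 1):
    K_liq[j] = (C @ x).item()
    x = A_bar @ x

# Products of p consecutive inputs: (u_0...u_{p-1}, u_1...u_p, ...).
u_corr = np.array([np.prod(u[i:i + p]) for i in range(L - p + 1)])

# Correlation contribution, added on top of the standard S4 convolution output.
n_fft = 2 * L
y_liq = np.fft.irfft(np.fft.rfft(K_liq, n_fft) * np.fft.rfft(u_corr, n_fft), n_fft)[:L]
```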
4. Model Architecture and Training Protocols
Liquid-S4 constructs deep sequence models by stacking multiple state space blocks, each comprising the Liquid-S4 kernel convolution, residual connections, feed-forward layers, and pointwise nonlinearities (GELU, ReLU). The architecture is causal: each output $y_k$ depends exclusively on past and current inputs $u_{0:k}$.
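The block structure can be illustrated with a short PyTorch sketch. The per-channel learnable kernel below is only a stand-in for the Liquid-S4 kernel, and the layer sizes, normalization placement, and names are illustrative choices rather than the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSMBlock(nn.Module):
    def __init__(self, d_model: int, seq_len: int, dropout: float = 0.1):
        super().__init__()
        # Stand-in for the Liquid-S4 kernel: one learnable causal kernel per channel.
        self.kernel = nn.Parameter(torch.randn(d_model, seq_len) * 0.02)
        self.norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                nn.Dropout(dropout), nn.Linear(d_model, d_model))

    def forward(self, u):                          # u: (batch, seq_len, d_model)
        B, L, D = u.shape
        # Causal FFT convolution of each channel with its kernel.
        n_fft = 2 * L
        U = torch.fft.rfft(u.transpose(1, 2), n=n_fft)     # (B, D, n_fft//2 + 1)
        K = torch.fft.rfft(self.kernel[:, :L], n=n_fft)    # (D, n_fft//2 + 1)
        y = torch.fft.irfft(U * K, n=n_fft)[..., :L].transpose(1, 2)
        y = u + F.gelu(y)                          # residual connection + nonlinearity
        return y + self.ff(self.norm(y))           # feed-forward submodule, residual

# A stack of blocks, analogous to the deep Liquid-S4 architecture.
model = nn.Sequential(*[SSMBlock(d_model=128, seq_len=1024) for _ in range(4)])
```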
Training employs cross-entropy loss for classification or an $\ell_2$ (mean-squared-error) loss for regression. Regularization strategies include weight decay on learnable parameters, dropout in feed-forward submodules, and gradient penalties on eigenvalues to enhance stability. Optimization is via Adam/AdamW with learning rates tuned per task, typically smaller for Liquid-S4 than for S4.
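A sketch of a corresponding optimizer setup, reusing `model` from the previous sketch. Splitting parameters into groups, with no weight decay and a lower learning rate for the kernel/state-space parameters, follows common S4-style training practice; the `"kernel"` name filter and the learning-rate values are placeholders, not the paper's settings.

```python
import torch

# Separate the (stand-in) kernel parameters from the rest of the model.
ssm_params = [p for n, p in model.named_parameters() if "kernel" in n]
other_params = [p for n, p in model.named_parameters() if "kernel" not in n]

optimizer = torch.optim.AdamW([
    {"params": ssm_params, "lr": 1e-3, "weight_decay": 0.0},    # SSM parameters
    {"params": other_params, "lr": 4e-3, "weight_decay": 0.01}, # everything else
])
criterion = torch.nn.CrossEntropyLoss()   # or torch.nn.MSELoss() for regression
```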
The kernel’s input dependence—modulating the recurrence at each timestep—enables Liquid-S4 to re-weight historical inputs adaptively, yielding improved generalization in non-stationary and highly correlated data regimes.
5. Empirical Evaluation and Performance Benchmarks
Liquid-S4 establishes new state-of-the-art results on multiple long-range sequence modeling benchmarks:
Long-Range Arena (1K–16K sequences)
| Method | ListOps | IMDB | AAN | CIFAR | PathFinder | Path-X | Average |
|---|---|---|---|---|---|---|---|
| S4-LegS (reprod.) | 59.60 | 86.82 | 90.90 | 88.65 | 94.20 | 96.35 | 86.09 |
| Liquid-S4 (PB) | 62.75 | 89.02 | 91.20 | 89.50 | 94.80 | 96.66 | 87.32 |
Raw Speech Commands (35 classes, 16 kHz)
| Model | Params | Accuracy |
|---|---|---|
| S4-LegS | 307 K | 96.08% |
| S4D-Lin | 306 K | 96.25% |
| Liquid-S4 | 224 K | 96.78% |
BIDMC Vital Signs (RMSE)
| Model | HR | RR | SpO₂ |
|---|---|---|---|
| S4-LegS | 0.332 | 0.247 | 0.090 |
| S4D-Inv | 0.373 | 0.254 | 0.110 |
| Liquid-S4 (best) | 0.303 | 0.158 | 0.066 |
These results indicate consistent 1–3% gains over S4, with Liquid-S4 frequently achieving higher accuracy and reduced parameterization (e.g., 30% fewer parameters on Speech Commands).
6. Implementation Details and Hyperparameter Choices
Standard settings are:
- Number of state space blocks (“depth”): 4–9, task-dependent
- Hidden units per block ($H$): 128–512
- State dimension ($N$): task-dependent, from smaller values (ListOps/IMDB) up to 512 (CIFAR)
- Low-rank factor ($r$): 1
- Liquid kernel order ($p$): typically 2–4, default 3
- Cauchy kernel evaluation via PyKeOps and FFT-based convolution for memory efficiency
- Forward pass complexity: $\tilde{O}(N + L)$ per kernel; memory similar to S4 with minor overhead for the correlation kernels (a configuration sketch collecting these settings follows this list)
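For concreteness, the settings above can be gathered into a single configuration; every key name and value below is illustrative, drawn from the ranges listed rather than from a specific tuned task configuration.

```python
# Illustrative hyperparameter configuration (placeholder names and values).
config = {
    "n_blocks": 6,           # number of state space blocks (4-9, task-dependent)
    "d_model": 256,          # hidden units per block H (128-512)
    "d_state": 64,           # state dimension N (task-dependent, up to 512)
    "rank": 1,               # low-rank factor r of the DPLR correction
    "liquid_order": 3,       # liquid kernel order p (typically 2-4)
    "liquid_mode": "pb",     # "pb" (poly-B) or "kb" liquid kernel variant
}
```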
7. Interpretations and Relevance
Liquid-S4’s lightweight kernel modulation introduces input-correlation structure into SSMs, directly encoding similarities of input samples during both training and inference. This property facilitates data-dependent adaptive filtering and generalization under long-sequence, highly-correlated, or non-stationary conditions. The empirical improvement at negligible computational or parameter cost suggests further applicability in domains typified by long-range dependencies and dynamic signal correlation.
Liquid-S4 extends S4’s diagonal-plus-low-rank parametrization by input-dependent kernel construction, resulting in a robust and scalable sequence modeling framework with consistent empirical benefits across modalities (Hasani et al., 2022).