State Space Models: Fundamentals & Applications

Updated 3 February 2026

State space models (SSMs) are mathematical frameworks that model sequential data by capturing latent dynamics through process and observation equations.
They enable practical applications in forecasting, control theory, and deep learning using efficient inference techniques like Kalman filtering and variational methods.
Recent advances extend SSMs with structured, scalable architectures that enhance expressivity for modeling complex temporal dependencies in real-world data.

A state space model (SSM) is a mathematical framework for modeling sequential data, widely used in time-series analysis, signal processing, control theory, econometrics, and, more recently, large-scale sequence learning architectures. SSMs characterize the evolution of a latent state via a Markov process and relate this latent state to observed outputs through a possibly stochastic observation mechanism. This modeling approach enables explicit representation of temporal dependencies, separation of process and observation noise, flexible incorporation of domain structure, and—when combined with modern machine learning—integration into deep, scalable sequence models.

1. Mathematical Formulation and Core Architectures

An SSM is composed of two key components: the latent state evolution (the process equation) and the observation generation mechanism (the observation equation). In continuous time, with input $x(t)$, latent state $h(t)$, and output $y(t)$, the model is generally specified as $\frac{d}{dt}h(t) = A\,h(t) + B\,x(t), \quad y(t) = C\,h(t)$, with $h(0) = 0$ for the zero-initialized case. In discrete time (e.g., via zero-order hold discretization),

$h_{k+1} = \bar{A}\,h_k + \bar{B}\,x_k, \quad y_k = \bar{C}\,h_k$

where $(A, B, C)$ (continuous) or $(\bar{A}, \bar{B}, \bar{C})$ (discrete) are matrices of appropriate shape encoding system dynamics and input/output mappings (Liu et al., 2024, Zhang et al., 2023).
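As a minimal sketch of the discrete-time recurrence above, the following NumPy code applies zero-order hold discretization under the simplifying assumption that $A$ is diagonal with nonzero entries, so the matrix exponential reduces to an elementwise exponential. The function names (`discretize_zoh_diag`, `run_ssm`), the step size `dt`, and the parameter values are illustrative choices, not taken from the cited works.

```python
import numpy as np

def discretize_zoh_diag(a, b, dt):
    """Zero-order hold for diagonal A = diag(a), assuming every a_i != 0.

    A_bar = exp(dt * A),  B_bar = A^{-1} (A_bar - I) B.
    """
    a_bar = np.exp(dt * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def run_ssm(a_bar, b_bar, c, x):
    """Iterate h_{k+1} = A_bar h_k + B_bar x_k, y_k = C h_k, starting from h_0 = 0."""
    h = np.zeros_like(a_bar)
    y = np.empty(len(x))
    for k, x_k in enumerate(x):
        y[k] = c @ h                  # observation equation
        h = a_bar * h + b_bar * x_k   # process equation (diagonal A_bar)
    return y

# Illustrative two-state system driven by a unit step input.
a = np.array([-1.0, -0.5])
b = np.array([1.0, 1.0])
c = np.array([1.0, 2.0])
a_bar, b_bar = discretize_zoh_diag(a, b, dt=0.1)
y_step = run_ssm(a_bar, b_bar, c, np.ones(50))
```

For a piecewise-constant input such as this unit step, zero-order hold is exact, so `y_step[k]` should agree with the continuous-time step response evaluated at time `k * dt`.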

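In the general probabilistic setting, SSMs define a latent Markov chain (here $x_t$ denotes the latent state and $y_t$ the observation, unlike the deterministic formulation above, where $x$ is the input and $h$ the state),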

$x_0 \sim p_\theta(x_0), \quad x_t \sim p_\theta(x_t | x_{t-1}), \quad y_t \sim p_\theta(y_t | x_t)$

and seek to model the joint density $p_\theta(x_{0:T}, y_{1:T})$. In the classical linear Gaussian case, the process and observation noises are both Gaussian, facilitating closed-form recursive estimation through the Kalman filter (Auger-Méthé et al., 2020, Hargreaves et al., 29 May 2025). Nonlinear or non-Gaussian SSMs require approximate inference, typically via sequential Monte Carlo (particle filters) or variational techniques.
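The closed-form recursion for the linear Gaussian case can be sketched as follows. This is a textbook Kalman filter written for the assumed model $x_t = F x_{t-1} + w_t$, $y_t = H x_t + v_t$ with $w_t \sim \mathcal{N}(0, Q)$ and $v_t \sim \mathcal{N}(0, R)$; the symbols `F`, `H`, `Q`, `R`, `m0`, `P0` and the local-level example are illustrative assumptions, not notation from the cited references.

```python
import numpy as np

def kalman_filter(y, F, H, Q, R, m0, P0):
    """Filtered means and covariances of x_t given y_{1:t}."""
    m, P = m0, P0
    means, covs = [], []
    for y_t in y:
        # Predict: push the previous posterior through the dynamics.
        m_pred = F @ m
        P_pred = F @ P @ F.T + Q
        # Update: correct the prediction with the new observation.
        S = H @ P_pred @ H.T + R                 # innovation covariance
        K = np.linalg.solve(S, H @ P_pred).T     # gain K = P_pred H^T S^{-1}
        m = m_pred + K @ (y_t - H @ m_pred)
        P = (np.eye(len(m)) - K @ H) @ P_pred
        means.append(m)
        covs.append(P)
    return np.array(means), np.array(covs)

# Illustrative local-level model: random-walk state observed in noise.
q, r, T = 0.01, 1.0, 200
rng = np.random.default_rng(0)
x, ys = 0.0, np.empty((T, 1))
for t in range(T):
    x += np.sqrt(q) * rng.normal()
    ys[t, 0] = x + np.sqrt(r) * rng.normal()

kf_means, kf_covs = kalman_filter(
    ys, F=np.eye(1), H=np.eye(1), Q=q * np.eye(1), R=r * np.eye(1),
    m0=np.zeros(1), P0=np.eye(1),
)
```

In this scalar example the filtered variance does not depend on the data and converges to the positive root of $P^2 + qP - qr = 0$, which gives a simple sanity check on the recursion.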

Extensions include models with structured, sparse, or modular state spaces (e.g., SlotSSM (Jiang et al., 2024)), graph-based dynamics (GG-SSM (Zubić et al., 2024)), continuous-time SSMs for irregular sampling (Mews et al., 2020), and deep SSM layers parameterized by neural networks (Lin et al., 2024, Gedon et al., 2020).

2. Expressivity, Temporal Logic, and Theoretical Scope

The expressive power of SSMs is sharply stratified by (i) the gating mechanism in the recurrence and (ii) the numerical precision available. When SSMs employ time-invariant or purely diagonal gates (as in S4, S5, or Mamba) with bounded-precision arithmetic, they characterize exactly the star-free fragment of regular languages corresponding to pure-past linear temporal logic ($\mathrm{pLTL}_\mathsf{fe}$). If gates are input-dependent or precision is logarithmic in input length, SSMs can realize the richer counting logic $\mathrm{pLTL}_\mathsf{fe}[\#]$, capturing counting properties and some non-regular languages (Alsmann et al., 27 Jan 2026). However, under fixed-width arithmetic, SSMs are not more expressive than hard-attention transformers, and cannot capture non-monotonic or counting patterns unless one specifically augments the architecture.

The structural form of the SSM (e.g., time-invariant, companion, or block-diagonal) directly maps to its temporal logic expressiveness and corresponding model class (Zhang et al., 2023, Zubić et al., 2024).

3. Generalization Bounds, Optimization, and Training Schemes

Recent developments in the generalization theory of SSMs provide data-dependent risk bounds, revealing how the interplay between the model parameterization (i.e., the system's memory kernel $\rho_\theta(s) = C e^{A s} B$) and the covariance structure of the input data determines generalization performance. For a sample of i.i.d. process realizations, the excess risk is bounded in terms of a complexity measure that quantifies the data-dependent interaction between the model's memory and the data (Liu et al., 2024).

Two practical optimization implications arise:

Initialization Scaling Rule: Normalizing the output matrix $C$ by a quantity derived from the model parameters and the data covariance, so that the generalization complexity is controlled at initialization, ensuring robust output scales across diverse temporal patterns.
Generalization-based Regularization: Augmenting the training objective with a penalty proportional to this complexity measure, directly targeting generalization rather than parameter norm.

Empirical validation on synthetic and real sequence modeling (Long Range Arena) demonstrates that these procedures stabilize training and yield consistent improvements in test accuracy at minimal computational overhead (Liu et al., 2024).

4. Parameterization Choices and Model Classes

The state transition and input matrices may be parameterized in a diversity of ways, critically affecting the model’s representational capabilities:

Companion Matrix SSMs: Adopting the canonical companion form enables exact realization of discrete AR($p$), ARMA, and LTI systems. This approach, as in the SpaceTime architecture, is strictly more expressive for such tasks than SSMs restricted to continuous-time exponential or diagonal-plus-low-rank parameterizations (Zhang et al., 2023). The companion form also enables efficient FFT-based inference, supporting both open-loop and closed-loop (long-range) forecasting.
HiPPO, SaFARi, and Lag-Operator Frameworks: The HiPPO framework (and its generalization, SaFARi) shows how SSMs may be constructed to track rolling projections onto arbitrary basis functions (e.g., Legendre polynomials, Fourier bases, wavelets) rather than polynomials alone (Gu et al., 2022, Babaei et al., 13 May 2025, Tomonaga et al., 22 Dec 2025). The lag-operator formalism provides a direct geometric construction of discrete-time SSMs via inner products on lagged basis functions, circumventing the need for continuous-time modeling and discretization, and supporting modular basis and memory kernel design (Tomonaga et al., 22 Dec 2025).
Graph-Generating SSMs: By dynamically constructing minimum spanning trees based on feature similarity, GG-SSMs efficiently model non-local, adaptive dependencies in high-dimensional settings, lifting sequential scanning restrictions and achieving linear per-layer complexity (Zubić et al., 2024).
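To illustrate how a companion-form transition matrix realizes an AR($p$) process exactly, the sketch below builds one standard companion layout (coefficients in the first row, a shift on the subdiagonal) and checks that the noise-free state rollout reproduces the direct AR recursion. The layout, the helper name `companion`, and the coefficient values are assumptions made for illustration; the SpaceTime architecture may arrange the matrix differently.

```python
import numpy as np

def companion(phi):
    """Companion matrix for y_t = phi_1 y_{t-1} + ... + phi_p y_{t-p}."""
    p = len(phi)
    A = np.zeros((p, p))
    A[0, :] = phi                  # first row holds the AR coefficients
    A[1:, :-1] = np.eye(p - 1)     # subdiagonal shifts past values down
    return A

phi = np.array([0.5, -0.3, 0.1])   # illustrative AR(3) coefficients
p = len(phi)
A_comp = companion(phi)
C_comp = np.zeros(p)
C_comp[0] = 1.0                    # read out the most recent value

y_init = [1.0, 0.5, -0.2]          # y_0, y_1, y_2

# Direct AR(3) recursion.
y_direct = list(y_init)
for _ in range(p, 20):
    y_direct.append(float(phi @ np.array(y_direct[-1:-p - 1:-1])))

# Closed-loop SSM rollout with state h_t = [y_t, y_{t-1}, y_{t-2}].
h = np.array(y_init[::-1])
y_ssm = list(y_init)
for _ in range(p, 20):
    h = A_comp @ h
    y_ssm.append(float(C_comp @ h))
```

The two sequences should coincide up to floating-point error, since each application of the companion matrix performs one AR step and shifts the history.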

5. Practical Inference, Computational Aspects, and Implementations

Inference in SSMs is performed via a hierarchy of algorithms:

Kalman Filter and Smoother: For linear, Gaussian SSMs, the Kalman filter/smoother delivers exact recursive estimates, supporting both filtering (on-line estimation) and smoothing (retrospective state decoding) (Hargreaves et al., 29 May 2025).
Expectation-Maximization (EM): EM algorithms, including advanced variants like GraphEM, facilitate maximum a posteriori estimation of unknown transition matrices, accommodating structural and sparsity priors via convex-constrained M-steps implemented with consensus-based proximal splitting (Elvira et al., 2022).
Particle Filtering and PMCMC: For nonlinear and/or non-Gaussian SSMs, bootstrap and Rao-Blackwellized particle filters, as well as particle MCMC, provide approximate yet scalable inference and parameter learning (Hargreaves et al., 29 May 2025, Ryder et al., 2018).
Variational Inference and Deep SSMs: Recent deep SSM frameworks use autoregressive flows, variational autoencoders, doubly stochastic inference, and neural network–parameterized dynamics to support amortized, end-to-end learning, combining flexibility and uncertainty quantification with scalability to high-dimensional and long-range data (Lin et al., 2024, Gedon et al., 2020, Doerr et al., 2018).
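For the nonlinear or non-Gaussian case, a bootstrap particle filter can be sketched as below. It propagates particles through the transition density, weights them by the observation likelihood, and resamples at every step (multinomial resampling is chosen here for simplicity). The nonlinear model, the noise scales `SIGMA_X` and `SIGMA_Y`, and all function names are hypothetical choices for illustration and are not drawn from the cited packages.

```python
import numpy as np

def bootstrap_particle_filter(y, n_particles, init, transition, log_lik, rng):
    """Return filtered means E[x_t | y_{1:t}] and a log-likelihood estimate."""
    x = init(n_particles, rng)
    means = np.empty(len(y))
    log_z = 0.0
    for t, y_t in enumerate(y):
        x = transition(x, rng)                 # propose from the dynamics
        logw = log_lik(y_t, x)                 # weight by the observation model
        shift = logw.max()                     # stabilize the exponentials
        w = np.exp(logw - shift)
        log_z += shift + np.log(w.mean())      # estimate of log p(y_t | y_{1:t-1})
        w /= w.sum()
        means[t] = np.sum(w * x)
        x = x[rng.choice(n_particles, size=n_particles, p=w)]  # resample
    return means, log_z

SIGMA_X, SIGMA_Y = 0.3, 0.5  # assumed process / observation noise scales

def init(n, rng):
    return rng.normal(0.0, 1.0, size=n)

def transition(x, rng):
    # Hypothetical nonlinear dynamics with additive Gaussian noise.
    return 0.9 * x + 0.5 * np.sin(x) + SIGMA_X * rng.normal(size=x.shape)

def log_lik(y_t, x):
    # Gaussian observation y_t = x_t + noise.
    return -0.5 * ((y_t - x) / SIGMA_Y) ** 2 - 0.5 * np.log(2 * np.pi * SIGMA_Y ** 2)

# Simulate data from the same model, then filter it.
rng = np.random.default_rng(0)
T_pf = 100
x_true = np.empty(T_pf)
x_prev = rng.normal()
for t in range(T_pf):
    x_prev = transition(np.array([x_prev]), rng)[0]
    x_true[t] = x_prev
y_obs = x_true + SIGMA_Y * rng.normal(size=T_pf)

pf_means, pf_log_z = bootstrap_particle_filter(
    y_obs, 500, init, transition, log_lik, rng
)
```

Resampling at every step keeps the code short; practical implementations often resample only when the effective sample size drops, or use lower-variance schemes such as systematic resampling.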

Efficient large-scale SSM implementations leverage diagonalizability (S4, S5), FFT-based convolution, and GPU-friendly parallel scans. Modular Julia ecosystems (SSMProblems.jl, GeneralisedFilters.jl) and quantization-aware compute-in-memory hardware implementations (via memristive crossbar arrays) further expand the SSM computational paradigm (Hargreaves et al., 29 May 2025, Zhang et al., 17 Nov 2025).
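The equivalence between the recurrent and convolutional views that underlies FFT-based implementations can be sketched as follows, assuming a diagonal discrete transition $\bar{A}$. Under the convention $h_{k+1} = \bar{A} h_k + \bar{B} x_k$, $y_k = \bar{C} h_k$, $h_0 = 0$, the output is a causal convolution of the input with the kernel $K_0 = 0$, $K_m = \bar{C} \bar{A}^{m-1} \bar{B}$ for $m \geq 1$. Function names and parameter values are illustrative.

```python
import numpy as np

def ssm_kernel(a_bar, b_bar, c, length):
    """Kernel K[0] = 0, K[m] = sum_i c_i * a_bar_i**(m-1) * b_bar_i for m >= 1."""
    powers = a_bar[None, :] ** np.arange(length - 1)[:, None]  # (length-1, n)
    K = np.zeros(length)
    K[1:] = powers @ (c * b_bar)
    return K

def fft_causal_conv(K, x):
    """Causal convolution via FFT, zero-padded to avoid circular wrap-around."""
    L = len(x)
    n = 2 * L
    return np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(x, n), n)[:L]

def recurrent_ssm(a_bar, b_bar, c, x):
    """Reference sequential evaluation of the same diagonal SSM."""
    h = np.zeros_like(a_bar)
    y = np.empty(len(x))
    for k, x_k in enumerate(x):
        y[k] = c @ h
        h = a_bar * h + b_bar * x_k
    return y

rng = np.random.default_rng(0)
a_bar = np.array([0.9, 0.5, -0.3])   # illustrative stable diagonal dynamics
b_bar = np.array([1.0, 0.5, 2.0])
c_out = np.array([0.3, -1.0, 0.7])
x_in = rng.normal(size=256)

y_fft = fft_causal_conv(ssm_kernel(a_bar, b_bar, c_out, len(x_in)), x_in)
y_rec = recurrent_ssm(a_bar, b_bar, c_out, x_in)
```

Both paths should produce the same outputs up to floating-point error. The convolutional form processes the whole sequence at once, while the recurrence is the natural choice for step-by-step inference.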

6. Applications, Limitations, and Empirical Performance

SSMs are foundational across a spectrum of domains:

Forecasting and Signal Processing: SSM-based sequence models (e.g., S4, SpaceTime, GG-SSM) match or exceed state-of-the-art results on time series forecasting benchmarks (Informer, Monash) and the Long Range Arena (LRA), as well as in speech and ECG classification and event-based camera processing. Empirical studies report that SSMs often train faster and generalize more robustly than Transformer or RNN-based architectures, particularly under distribution shift or frequency scaling (Zhang et al., 2023, Zubić et al., 2024).
Multimodal, Modular, and High-dimensional Data: Recent advances such as SlotSSM introduce modular state representations for object-centric and multi-entity sequence modeling, maintaining parallelizable training and scaling efficiently to multi-slot, long-context reasoning problems (Jiang et al., 2024).
Probabilistic Uncertainty and Control: Fully probabilistic and deep SSMs quantify predictive uncertainty critical for robust system identification, Bayesian filtering, and safety-critical control tasks (Doerr et al., 2018, Gedon et al., 2020).
Hardware-Efficient Sequence Processing: Hardware/software co-design enables real-time, event-driven SSM computation on RRAM/STM arrays, leveraging real-valued parameterizations and native device dynamics for extreme energy efficiency (Zhang et al., 17 Nov 2025).

Despite their versatility, SSMs—especially in high-noise or high-dimension regimes—can suffer from identifiability and estimation challenges (notably in cases where measurement noise dominates process noise), necessitating careful diagnostics, external calibration, or regularization (Auger-Méthé et al., 2015, Auger-Méthé et al., 2020). Advances in expressiveness (e.g., to count or address non-regular patterns) require either increased arithmetic precision or architectural augmentation beyond current diagonal-gated and fixed-width models (Alsmann et al., 27 Jan 2026).

7. Future Directions and Open Challenges

Ongoing research on SSMs targets several fronts:

Expressivity Augmentation: Investigating SSM architectures with richer gating or higher-precision arithmetic to transcend regular language expressiveness and enable direct counting or nested temporal logic.
Compositional, Graph, and Modular Designs: Systematic exploration of frame-agnostic and graph-structured SSMs (e.g., SaFARi, GG-SSM, SlotSSM) to enhance flexibility, scalability, and inductive bias alignment in complex domains (Babaei et al., 13 May 2025, Zubić et al., 2024, Jiang et al., 2024).
Efficient Training and Inference: Continued development of scalable, robust inference procedures for high-dimensional and long-range tasks, including end-to-end GPU implementations, parallel scan algorithms, and full exploitation of hardware acceleration (Zhang et al., 17 Nov 2025, Hargreaves et al., 29 May 2025).
Application Expansion: Extending SSM modeling to domains beyond vision and time series—such as language, multi-agent systems, and unsupervised structure learning—while ensuring computational and statistical reliability (Zubić et al., 2024, Jiang et al., 2024).
Uncertainty Quantification and Bayesian Learning: Integrating robust, scalable Bayesian and variational methods into deep SSMs for reliable uncertainty estimation in prediction and counterfactual inference (Lin et al., 2024, Gedon et al., 2020).

SSMs thus remain a dynamically evolving paradigm, uniting principled dynamical systems theory with the demands of modern large-scale machine learning.