SINDYc: Sparse Identification with Control

Updated 18 February 2026

SINDYc is a data-driven framework that extends sparse identification to nonlinear systems under external inputs and control actions.
It employs libraries of nonlinear candidate functions to recover parsimonious governing equations, supporting control, trajectory optimization, and feedback integration.
The method enhances interpretability and computational efficiency in industrial, PDE, and networked applications while maintaining high prediction accuracy.

Sparse Identification of Nonlinear Dynamics with Control (SINDYc) generalizes the Sparse Identification of Nonlinear Dynamics (SINDy) framework to data-driven modeling of nonlinear dynamical systems under external inputs, control actions, and exogenous parameters. By extending sparse regression to libraries of nonlinear candidate functions of both states and controls, SINDYc enables the parsimonious recovery of governing equations that capture forced, actuated, or feedback-modulated phenomena with a high degree of interpretability and robustness. This paradigm directly supports model design for control, trajectory optimization, and integration into modern feedback architectures, including model predictive control (MPC) and reinforcement-learning-based closed-loop control of both finite- and infinite-dimensional systems.

1. Mathematical Formulation

The SINDYc formulation targets a controlled nonlinear dynamical system of the form

$\dot{x}(t) = f(x(t), u(t)), \qquad x \in \mathbb{R}^n,\, u \in \mathbb{R}^m$

or, in discrete time,

$x_{k+1} = f_d(x_k, u_k)$

To identify $f$ (or $f_d$), one constructs a “library” $\Theta(x,u)$ of candidate nonlinear functions (e.g., monomials, trigonometric terms, and state-control cross terms), each evaluated at the measured data points. A typical library is

$\Theta(x,u) = [\,1,\ x_1,\dots,x_n,\ u_1,\dots,u_m,\ x_i x_j,\ x_i u_j,\ u_j^2,\ \sin(x_i),\ \cdots\,]$

The derivative data $\dot X$ (or, in discrete time, the shifted data $X^+$) is then modeled as

$\dot X \approx \Xi\,\Theta(X, U)$

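where $\Xi \in \mathbb{R}^{n \times p}$ is a sparse coefficient matrix and $\Theta(X, U) \in \mathbb{R}^{p \times N}$ stacks the $p$ candidate functions evaluated at the $N$ snapshots. The core regression problem for each state component $i$ is

$\min_{\xi_i}\ \|\dot X_i - \xi_i\,\Theta(X, U)\|_2^2 + \lambda \|\xi_i\|_1$

where $\dot X_i$ and $\xi_i$ denote the $i$-th rows of $\dot X$ and $\Xi$. Sparsity is enforced either through the $\ell_1$ regularization shown here (LASSO), or by sequential thresholded least squares (STLSQ), which replaces the penalty with repeated least-squares fits and hard thresholding of small coefficients (Brunton et al., 2016, Fasel et al., 2021, Vancayseele et al., 27 Feb 2025, Kaiser et al., 2017, Yahagi et al., 7 Mar 2025).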
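The regression above can be illustrated with a minimal NumPy sketch of STLSQ. Everything specific here is an assumption made for illustration: the scalar test system $\dot x = -2x - x^3 + u$, the seven-term polynomial library, the threshold of 0.1, and the helper names `library` and `stlsq`. Derivatives are taken as known up to small additive noise, so numerical differentiation is not exercised.

```python
import numpy as np

def library(X, U):
    """Evaluate candidate functions on data.

    X has shape (n, N) and U has shape (m, N); this sketch assumes n = m = 1.
    Returns Theta with shape (p, N), matching the layout dX ~ Xi @ Theta.
    """
    x, u = X[0], U[0]
    return np.vstack([np.ones_like(x), x, u, x**2, x * u, u**2, x**3])

def stlsq(Theta, dX, threshold=0.1, n_iter=10):
    """Sequential thresholded least squares for dX ~ Xi @ Theta."""
    # Initial dense least-squares fit; Xi has shape (n, p).
    Xi = np.linalg.lstsq(Theta.T, dX.T, rcond=None)[0].T.copy()
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0                      # hard-threshold small coefficients
        for i in range(dX.shape[0]):
            keep = ~small[i]
            if keep.any():                   # refit only on the surviving terms
                Xi[i, keep] = np.linalg.lstsq(Theta[keep].T, dX[i], rcond=None)[0]
    return Xi

# Hypothetical data: random state/input samples with slightly noisy derivatives.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(1, 200))
U = rng.uniform(-1.0, 1.0, size=(1, 200))
dX = -2.0 * X - X**3 + U + 0.01 * rng.standard_normal(X.shape)

Xi = stlsq(library(X, U), dX)
print(np.round(Xi, 2))   # columns: 1, x, u, x^2, x*u, u^2, x^3
```

In this setup only the $x$, $u$, and $x^3$ columns should survive thresholding. The threshold plays the role that $\lambda$ plays in the LASSO form and has to be tuned to the coefficient scale of the problem at hand.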

For feedback or exogenous control laws such as $u = K(x)$, the control columns of the library become functions of the state columns, so the regression cannot separate the two contributions. Identifiability is preserved by actively perturbing $u$ or by decoupling its dependencies in the library construction (Brunton et al., 2016).
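The identifiability issue can be seen directly in the rank of the library. The following sketch assumes a hypothetical linear feedback law $u = -Kx$ with $K = 0.5$ and a two-term library $[x,\ u]$; both are illustrative choices, not taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 100)
K = 0.5                                   # hypothetical feedback gain

def linear_library(x, u):
    """Two candidate functions, x and u, stacked as rows: shape (2, N)."""
    return np.vstack([x, u])

u_feedback = -K * x                                       # pure state feedback
u_perturbed = -K * x + 0.1 * rng.standard_normal(100)     # feedback plus excitation

rank_feedback = np.linalg.matrix_rank(linear_library(x, u_feedback))
rank_perturbed = np.linalg.matrix_rank(linear_library(x, u_perturbed))
print(rank_feedback, rank_perturbed)
```

Under pure feedback the two rows are proportional, so the library loses rank and the coefficients on $x$ and $u$ cannot be told apart; a small perturbation added to the input restores full rank.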

2. Algorithmic Workflow

The canonical workflow for SINDYc proceeds through the following major steps (Fasel et al., 2021, Kaiser et al., 2017, Vancayseele et al., 27 Feb 2025, Yahagi et al., 7 Mar 2025):

Data Collection and Preprocessing: Acquire snapshot sequences of the states $X$ and inputs $U$, and estimate $\dot X$ or $X^+$ (using robust numerical differentiation or differences). Optionally, normalize or non-dimensionalize the data.
Library Construction: Assemble the basis $\Theta(X, U)$ across all data. Typical choices involve polynomial, trigonometric, and cross-term expansions up to some prescribed degree. Domain knowledge can further inform the inclusion of physically relevant features.
Sparse Regression: For each dynamical component, solve the $\ell_1$-regularized or STLSQ regression to obtain the sparse coefficients $\Xi$. Hyperparameters (the regularization weight $\lambda$ for LASSO; the threshold for STLSQ) are chosen by cross-validation, Pareto-front analysis, or information-theoretic criteria.
Model Selection and Validation: Validate the identified model on held-out data, systematically varying the library and sparsity threshold. For robustness, ensemble strategies such as bagging, elite gathering, and clustering can be employed to retain only models that achieve a high multi-step $R^2$ over long prediction horizons before averaging and final selection (Yahagi et al., 7 Mar 2025).
Deployment: The discovered model is embedded in an estimation, control, or simulation loop, supporting further online adaptation or closed-loop optimization.
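The steps above can be sketched end to end for a discrete-time system. The ground-truth map $x_{k+1} = 0.9x_k - 0.1x_k^3 + 0.2u_k$, the noise level, the 300/100 train/test split, and the threshold are all assumptions chosen for illustration. The library is stored with one row per sample, i.e. the transpose of the layout used in the formulation.

```python
import numpy as np

def f_true(x, u):
    return 0.9 * x - 0.1 * x**3 + 0.2 * u      # hypothetical ground-truth map

def library(x, u):
    """Polynomial library up to degree 3 in x, degree 2 in u; rows are samples."""
    x, u = np.atleast_1d(x), np.atleast_1d(u)
    return np.column_stack([np.ones_like(x), x, u, x**2, x * u, u**2, x**3])

def stlsq(A, b, threshold, n_iter=10):
    """Sequential thresholded least squares for b ~ A @ xi."""
    xi = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(xi) >= threshold
        xi[~keep] = 0.0
        if keep.any():
            xi[keep] = np.linalg.lstsq(A[:, keep], b, rcond=None)[0]
    return xi

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# 1. Data collection: random inputs, small process noise.
rng = np.random.default_rng(0)
N, split = 400, 300
u = rng.uniform(-1.0, 1.0, N)
x = np.empty(N + 1)
x[0] = 0.5
for k in range(N):
    x[k + 1] = f_true(x[k], u[k]) + 1e-3 * rng.standard_normal()

# 2-3. Library construction and sparse regression on the training segment.
xi = stlsq(library(x[:split], u[:split]), x[1:split + 1], threshold=0.05)

# 4. Validation: free-running multi-step rollout on held-out inputs.
x_hat = np.empty(N - split + 1)
x_hat[0] = x[split]
for k in range(N - split):
    x_hat[k + 1] = (library(x_hat[k], u[split + k]) @ xi)[0]
r2_multi = r2(x[split:], x_hat)
print(np.round(xi, 3), round(float(r2_multi), 4))
```

The rollout feeds the model its own predictions rather than measured states, which is what distinguishes a multi-step score from a one-step fit and makes it a stricter test of the identified model.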

3. Library Design and Regression Strategies

A SINDYc library must span the anticipated nonlinearities and couplings between states and controls relevant to the system class:

Polynomial Libraries: All monomials in $x$ and $u$ up to degree $d$ (e.g., $d = 3$ for cubic models) (Kaiser et al., 2017, Brunton et al., 2016).
Trigonometric or Rational Terms: Inclusion of $\sin(x_i)$, $\cos(x_i)$, rational functions of the states, etc., is dictated by the physical context (oscillatory or non-polynomial effects).
Composite/Hybrid Libraries: For high-dimensional or PDE states, latent space embeddings—e.g., from autoencoders—enable SINDYc on reduced-order coordinates, jointly optimizing the SINDYc coefficients and the encoder/decoder weights (Wolf et al., 2024).
Bagged and Elite Models: To address multicollinearity and overfitting in very large libraries, random sublibrary bagging and “elite” filtering by multi-step prediction skill are essential for industrial or noisy systems (Yahagi et al., 7 Mar 2025).
Weak Formulation: For noisy measurements or PDEs, weak formulations project the regression onto test functions, bypassing the need for direct differentiation and increasing noise robustness (Nicolaou et al., 2023).
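The weak formulation in the last item can be sketched for a scalar system. Multiplying $\dot x = \xi_1 x + \xi_2 u$ by a test function $\phi$ that vanishes at both ends of a time window and integrating by parts gives $-\int \phi'\,x\,dt = \xi_1 \int \phi\,x\,dt + \xi_2 \int \phi\,u\,dt$, so the noisy state is never differentiated. The system $\dot x = -2x + u$, the input signal, the test function $\phi(s) = (1 - s^2)^2$, and the window sizes below are illustrative assumptions.

```python
import numpy as np

def u_of_t(t):
    return np.sin(t) + 0.5 * np.sin(3.3 * t)     # hypothetical input signal

def f_true(x, t):
    return -2.0 * x + u_of_t(t)                  # hypothetical ground truth

# Simulate with classical RK4, then corrupt the state with measurement noise.
dt, n_steps = 0.01, 2000
t = dt * np.arange(n_steps + 1)
x = np.empty(n_steps + 1)
x[0] = 1.0
for k in range(n_steps):
    k1 = f_true(x[k], t[k])
    k2 = f_true(x[k] + 0.5 * dt * k1, t[k] + 0.5 * dt)
    k3 = f_true(x[k] + 0.5 * dt * k2, t[k] + 0.5 * dt)
    k4 = f_true(x[k] + dt * k3, t[k] + dt)
    x[k + 1] = x[k] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
rng = np.random.default_rng(0)
x_meas = x + 0.01 * rng.standard_normal(x.shape)
u = u_of_t(t)

# Test function phi(s) = (1 - s^2)^2 on s in [-1, 1]; phi and phi' vanish at both ends.
width, stride = 100, 20                          # window length and stride, in samples
s = np.linspace(-1.0, 1.0, width + 1)
phi = (1.0 - s**2) ** 2
dphi_dt = -4.0 * s * (1.0 - s**2) * (2.0 / (width * dt))   # chain rule, ds/dt = 2 / duration

G, b = [], []
for start in range(0, n_steps - width + 1, stride):
    w = slice(start, start + width + 1)
    # Left-hand side after integration by parts: -int(phi' * x) dt.
    b.append(-np.sum(dphi_dt * x_meas[w]) * dt)
    # Library terms x and u, each integrated against phi.
    G.append([np.sum(phi * x_meas[w]) * dt, np.sum(phi * u[w]) * dt])
xi = np.linalg.lstsq(np.array(G), np.array(b), rcond=None)[0]
print(np.round(xi, 3))    # coefficients on x and u
```

Because the integrands vanish at the window ends, the plain Riemann sums used here coincide with the trapezoidal rule. A sparsifying step such as STLSQ would normally follow the least-squares solve; it is omitted because the two-term library contains no inactive candidates.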

Solvers include standard LASSO, sequential thresholding, or SR3; the choice is largely pragmatic and depends on problem size, speed, and noise sensitivity (Vancayseele et al., 27 Feb 2025).
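A simplified sketch of sublibrary bagging with elite filtering is given below. It is not the procedure of the cited work: the ground-truth map, the sublibrary size, the number of models, and the use of a one-step validation $R^2$ in place of a multi-step score are all assumptions made to keep the example short.

```python
import numpy as np

def library(x, u):
    """Seven polynomial candidates; rows are samples."""
    return np.column_stack([np.ones_like(x), x, u, x**2, x * u, u**2, x**3])

def stlsq(A, b, threshold=0.05, n_iter=10):
    xi = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(xi) >= threshold
        xi[~keep] = 0.0
        if keep.any():
            xi[keep] = np.linalg.lstsq(A[:, keep], b, rcond=None)[0]
    return xi

def r2(y_true, y_pred):
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# Hypothetical one-step data with measurement noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 300)
u = rng.uniform(-1.0, 1.0, 300)
y = 0.9 * x - 0.1 * x**3 + 0.2 * u + 0.01 * rng.standard_normal(300)
A_train, y_train = library(x[:200], u[:200]), y[:200]
A_val, y_val = library(x[200:], u[200:]), y[200:]

n_models, n_sub, n_elite = 100, 5, 5
p = A_train.shape[1]
coefs, scores = [], []
for _ in range(n_models):
    cols = rng.choice(p, size=n_sub, replace=False)   # random sublibrary
    xi = np.zeros(p)
    xi[cols] = stlsq(A_train[:, cols], y_train)
    coefs.append(xi)
    scores.append(r2(y_val, A_val @ xi))              # validation skill of this model

elite = np.argsort(scores)[-n_elite:]                 # keep the best-scoring models
xi_ens = np.mean(np.array(coefs)[elite], axis=0)      # average the elite coefficients
print(np.round(xi_ens, 3))
```

Models whose sublibrary omits a true term score lower on validation data and are filtered out, so the averaged elite model should concentrate on the $x$, $u$, and $x^3$ terms in this example.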

4. Integration with Control Architectures

SINDYc is architected for direct integration into modern control strategies:

Model Predictive Control (MPC): The SINDYc-discovered model defines the state transition or flow map. At each time step, the receding-horizon optimal control problem is solved using the learned $f$ (or $f_d$), subject to constraints and optimality criteria (Kaiser et al., 2017, Fasel et al., 2021).
Feed-forward and Bifurcation Control: By leveraging SINDYc regression on reduced-order models, bifurcation and fixed-point analyses become tractable, enabling computation of constant or switching control inputs to manipulate the global attractor structure (Morrison et al., 2020).
Reinforcement Learning (RL) Acceleration: SINDYc serves as a model-based planning surrogate (e.g., in Dyna-style RL loops), generating synthetic rollouts for sample-efficient TD3, PPO, or actor-critic policy updates. This hybridization has been shown to reduce real-environment data requirements by up to 10x in PDE control and low-dimensional physical systems (Abdelsalam et al., 24 Dec 2025, Wolf et al., 2024).
Closed-loop Embedding for Industrial Systems: For complex constrained systems (e.g., airpath control of diesel engines, induction motors), SINDYc models can be embedded as low-latency update rules within real-time control software (Vancayseele et al., 27 Feb 2025, Yahagi et al., 7 Mar 2025).
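As an illustration of the MPC item, the sketch below runs a receding-horizon loop on a sparse discrete-time model. The coefficient vector is a hypothetical identification result, the optimizer is a brute-force enumeration over a coarse input grid rather than a proper optimal-control solver, and the plant is assumed to equal the model; all names and parameter values are assumptions.

```python
import itertools
import numpy as np

# Hypothetical identified coefficients for the library [1, x, u, x^2, x*u, u^2, x^3].
xi = np.array([0.0, 0.9, 0.2, 0.0, 0.0, 0.0, -0.1])

def model(x, u):
    """One-step prediction x_next = Theta(x, u) @ xi; works on arrays of candidates."""
    theta = np.stack([np.ones_like(x), x, u, x**2, x * u, u**2, x**3], axis=-1)
    return theta @ xi

def mpc_step(x0, x_ref, horizon=3, grid=np.linspace(-1.0, 1.0, 9), r=1e-3):
    """Enumerate all input sequences on the grid; return the first input of the cheapest."""
    candidates = np.array(list(itertools.product(grid, repeat=horizon)))  # (M, horizon)
    x = np.full(len(candidates), float(x0))
    cost = np.zeros(len(candidates))
    for h in range(horizon):
        x = model(x, candidates[:, h])                       # roll all candidates forward
        cost += (x - x_ref) ** 2 + r * candidates[:, h] ** 2 # tracking plus input penalty
    return candidates[np.argmin(cost), 0]

# Closed loop: apply only the first input, then re-plan from the new state.
x, x_ref = 0.0, 0.5
trajectory = [x]
for _ in range(30):
    u = mpc_step(x, x_ref)
    x = float(model(np.array(x), np.array(u)))   # plant assumed identical to the model
    trajectory.append(x)
trajectory = np.array(trajectory)
print(np.round(trajectory[-5:], 3))
```

The input constraint is enforced implicitly by the grid bounds. Because the sparse model is cheap to evaluate, even this exhaustive search stays small here; in practice a gradient-based or dedicated MPC solver would replace the enumeration.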

5. Quantitative Performance, Validation, and Limitations

SINDYc models achieve high accuracy and are substantially more data-efficient than black-box neural network (NN) surrogates, while capturing nonlinear behavior that classical linear DMDc models miss, particularly in the low-data regime or under moderate noise (Kaiser et al., 2017). Relative coefficient errors are reported to remain small at moderate signal-to-noise ratios, and parameter support recovery is exact in noiseless settings (Brunton et al., 2016). For stiff or partially observed systems with strong nonlinear or cross-coupling effects, SINDYc outperforms DMDc (which frequently extrapolates poorly), and is computationally more efficient and more interpretable than deep neural models (Kaiser et al., 2017, Vancayseele et al., 27 Feb 2025).

Enhancements relying on bagging, $R^2$ elite filtering, and clustering yield robust multi-step predictions at up to 20% additive measurement noise (Yahagi et al., 7 Mar 2025). Limitations include sensitivity to library size and structure (overfitting or multicollinearity for excessively large or uninformative libraries), and possible loss of physical interpretability if models are selected by averaging across highly heterogeneous clusters.

Performance summary of SINDYc relative to DMDc and neural-network models (qualitative comparison extracted from Kaiser et al., 2017, Yahagi et al., 7 Mar 2025):

Criterion	DMDc	SINDYc	Neural Net
Data Efficiency	Excellent	High	Poor
Nonlinear Expressivity	Poor	High	High
Noise Robustness	Fair	Excellent	Fair
Computational Speed	High	High	Low
Interpretability	Good	Excellent	Poor

6. Extensions and Representative Applications

SINDYc has been extended and rigorously validated for:

Feedback and Exogenous Input Modeling: SINDYc can accommodate time-varying, feedback, or exogenous disturbances via library expansion (Brunton et al., 2016, Yahagi et al., 7 Mar 2025).
High-dimensional/Networked Systems: Dimensionality reduction (e.g., POD, autoencoders) prior to SINDYc regression provides access to interpretable governing equations for very large network or PDE systems, with back-projection enabling practical control synthesis (Morrison et al., 2020, Wolf et al., 2024).
PDE and Spatiotemporal Control (SINDyCP, Latent SINDYc): For parameterized pattern-forming PDEs, SINDyCP augments the library with tensorized state-feature/parameter terms, and weak formulations ensure noise robustness (Nicolaou et al., 2023).
Industrial and Real-Time Applications: Demonstrations encompass flight control, induction motor torque and force estimation (MAE reported in Nm, with a 65% improvement over classical models (Vancayseele et al., 27 Feb 2025)), and diesel engine airpath modeling under closed-loop noise disturbance (Yahagi et al., 7 Mar 2025).
Synthetic Model-Based RL for Nonlinear Systems: In Dyna-style RL (TD3-SINDYc), the approach halves learning time and improves control accuracy in bi-rotor systems and Navier-Stokes PDE controllers (Abdelsalam et al., 24 Dec 2025, Wolf et al., 2024).
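The tensorized library mentioned for SINDyCP can be sketched as the set of all products between state features and parameter features. The example assumes a hypothetical pitchfork normal form $\dot x = \mu x - x^3$ with noise-free derivatives, state features $[1, x, x^2, x^3]$, and parameter features $[1, \mu]$; the function name `tensor_library` is illustrative, and plain least squares with a small cutoff stands in for the sparse solver.

```python
import numpy as np

def tensor_library(x, mu):
    """All products of state features and parameter features; rows are samples."""
    F_x = np.column_stack([np.ones_like(x), x, x**2, x**3])   # state features
    F_mu = np.column_stack([np.ones_like(mu), mu])            # parameter features
    # Row-wise outer product, flattened so that column index = 2 * i + j.
    return np.einsum("ni,nj->nij", F_x, F_mu).reshape(len(x), -1)

# Hypothetical data pooled over many parameter values.
rng = np.random.default_rng(0)
x = rng.uniform(-1.5, 1.5, 300)
mu = rng.uniform(-1.0, 1.0, 300)
dx = mu * x - x**3                       # noise-free derivatives (assumption)

Theta = tensor_library(x, mu)            # shape (300, 8)
xi = np.linalg.lstsq(Theta, dx, rcond=None)[0]
xi[np.abs(xi) < 1e-6] = 0.0              # remove round-off-level coefficients

names = [f"{a}*{b}" for a in ["1", "x", "x^2", "x^3"] for b in ["1", "mu"]]
active = {name: round(float(c), 6) for name, c in zip(names, xi) if c != 0.0}
print(active)
```

Pooling data across parameter values lets a single regression recover how each term depends on $\mu$, which is what allows the identified model to be evaluated at parameter values not seen during training.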

7. Discussion and Contextualization

SINDYc occupies a central position among modern system identification techniques by combining the interpretability and computational efficiency of sparse regression with the capacity to capture nonlinear, input-affine, and feedback-modulated dynamics. Its link to DMDc is direct when the library contains only terms linear in the state and input; its connection to Koopman operator theory is formalized when observables in the library “lift” the dynamics linearly in an augmented space (Brunton et al., 2016, Fasel et al., 2021). Emerging trends include:

Enhanced ensemble and elite strategies to guarantee stability and prediction skill in large or noisy candidate sets (Yahagi et al., 7 Mar 2025).
Weak form and latent-space SINDYc for PDE and distributed-parameter systems where robust differentiation and dimensionality reduction are essential (Nicolaou et al., 2023, Wolf et al., 2024).
Real-time and adaptive updating for tracking abrupt system changes or embedding within receding-horizon architectures (Kaiser et al., 2017, Fasel et al., 2021).
Model-extrapolation capability for parametric regime changes and bifurcation analysis, supported by explicit feature-parameter library tensorization (SINDyCP) (Nicolaou et al., 2023).

The method’s continued development addresses challenges in ill-conditioning, high dimensionality, and integration with neural-network-inspired surrogate modeling for dynamics inaccessible to sparse polynomial expansions. As a data-driven model discovery tool for control, SINDYc is recognized for interpretable, fast, and reliable closed-loop deployment across diverse scientific, engineering, and industrial domains.
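The link to DMDc noted in this section can be checked numerically. With a library containing only the linear terms $[x;\ u]$ and no thresholding, the SINDYc least-squares solution coincides with the DMDc estimate $[A\ \ B] = X^+\,[X;\ U]^{\dagger}$ (without rank truncation). The system matrices below are hypothetical and the data are noise-free.

```python
import numpy as np

A = np.array([[0.9, 0.1], [-0.2, 0.8]])   # hypothetical stable linear system
B = np.array([[0.0], [1.0]])

rng = np.random.default_rng(0)
N = 50
U = rng.standard_normal((1, N))
X = np.empty((2, N + 1))
X[:, 0] = [1.0, -1.0]
for k in range(N):
    X[:, k + 1] = A @ X[:, k] + B @ U[:, k]

Theta = np.vstack([X[:, :-1], U])         # linear library [x; u], shape (3, N)
X_plus = X[:, 1:]

# SINDYc regression without thresholding: X_plus ~ Xi @ Theta.
Xi = np.linalg.lstsq(Theta.T, X_plus.T, rcond=None)[0].T
# DMDc estimate without rank truncation: [A B] = X_plus @ pinv([X; U]).
G = X_plus @ np.linalg.pinv(Theta)

print(np.round(Xi, 6))
```

Both estimates solve the same least-squares problem, so they agree up to round-off and, for noise-free data, reproduce the stacked matrix $[A\ \ B]$.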