Discovery of interpretable structural model errors by combining Bayesian sparse regression and data assimilation: A chaotic Kuramoto-Sivashinsky test case

Published 1 Oct 2021 in physics.comp-ph, cs.NA, math.NA, physics.flu-dyn, stat.CO, and stat.ML | (2110.00546v2)

Abstract: Models of many engineering and natural systems are imperfect. The discrepancy between the mathematical representations of a true physical system and its imperfect model is called the model error. These model errors can lead to substantial differences between the numerical solutions of the model and the state of the system, particularly in those involving nonlinear, multi-scale phenomena. Thus, there is increasing interest in reducing model errors, particularly by leveraging the rapidly growing observational data to understand their physics and sources. Here, we introduce a framework named MEDIDA: Model Error Discovery with Interpretability and Data Assimilation. MEDIDA only requires a working numerical solver of the model and a small number of noise-free or noisy sporadic observations of the system. In MEDIDA, first the model error is estimated from differences between the observed states and model-predicted states (the latter are obtained from a number of one-time-step numerical integrations from the previous observed states). If observations are noisy, a data assimilation (DA) technique such as ensemble Kalman filter (EnKF) is employed to provide the analysis state of the system, which is then used to estimate the model error. Finally, an equation-discovery technique, here the relevance vector machine (RVM), a sparsity-promoting Bayesian method, is used to identify an interpretable, parsimonious, and closed-form representation of the model error. Using the chaotic Kuramoto-Sivashinsky (KS) system as the test case, we demonstrate the excellent performance of MEDIDA in discovering different types of structural/parametric model errors, representing different types of missing physics, using noise-free and noisy observations.

Summary

The paper introduces MEDIDA, a framework that uses Bayesian sparse regression to identify structural errors in chaotic dynamic models.
It integrates data assimilation via the Ensemble Kalman Filter to mitigate observational noise and enhance model state estimation.
The method achieves up to 99% error reduction in noise-free cases and precise corrections in noisy scenarios, highlighting its practical utility.

Discovering Interpretable Structural Model Errors with Bayesian Sparse Regression

The paper presents MEDIDA, a novel framework combining Bayesian sparse regression and data assimilation (DA) to identify structural model errors. The focus is on complex nonlinear systems, exemplified by the chaotic Kuramoto-Sivashinsky (KS) equation. The paper systematically explores how MEDIDA can efficiently uncover interpretable structural errors in dynamical models, leveraging limited observational data.

Framework Overview

Methodology

MEDIDA consists of three main steps. First, the model error is estimated from the differences between sporadic observations of the system and the model-predicted states, where the latter are obtained from one-time-step numerical integrations of the model started from the previous observed states. Second, if the observations are noisy, a data assimilation technique such as the ensemble Kalman filter (EnKF) is used to provide the analysis state of the system, which then takes the place of the raw observations in the error estimate. Third, Bayesian sparse regression, specifically the relevance vector machine (RVM), identifies an interpretable, closed-form representation of the estimated model error.
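
The first step can be sketched as follows. This is a minimal illustration, not the paper's code: the toy scalar dynamics, the forward-Euler functions `model_step` and `true_step`, and the finite-difference form of the error estimate are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def model_step(u, dt):
    """Hypothetical imperfect model: one forward-Euler step of du/dt = -u."""
    return u + dt * (-u)

def true_step(u, dt):
    """Stand-in for the real system: du/dt = -u + 0.5*u**2.
    The quadratic term plays the role of the missing physics."""
    return u + dt * (-u + 0.5 * u**2)

def estimate_model_error(u_prev_obs, u_next_obs, dt):
    """Difference between the observed state and the one-step model
    prediction started from the previous observation, divided by dt."""
    u_pred = model_step(u_prev_obs, dt)
    return (u_next_obs - u_pred) / dt

rng = np.random.default_rng(0)
dt = 1e-3
# 10 sporadic observation pairs of a 4-dimensional state (noise-free here)
u_prev = rng.uniform(-1.0, 1.0, size=(10, 4))
u_next = true_step(u_prev, dt)

h_est = estimate_model_error(u_prev, u_next, dt)
print(np.max(np.abs(h_est - 0.5 * u_prev**2)))  # discrepancy from the known missing term
```

Because each prediction starts from an observed state and runs for a single step, the samples need not be consecutive in time, which is why sporadic observations suffice in this sketch.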

Figure 1: Improvement in the performance of MEDIDA as the ensemble size, $N$, is increased ($n=100$). $N/M=0$ corresponds to no DA.

Bayesian Sparse Regression

The RVM yields a parsimonious representation of the model error by promoting sparsity in the learned correction terms. This sparsity makes the correction interpretable, which matters when a model must remain transparent and generalize beyond the training data. The RVM solves the regression problem by selecting, from a library of candidate terms, the few bases that are relevant for describing the model discrepancy.
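
A sparse Bayesian regression of this kind can be sketched in a few lines of NumPy. The code below is a simplified sparse Bayesian learning loop in the spirit of the RVM (iterative re-estimation of per-weight precisions with pruning), not the authors' implementation. The function name `sparse_bayes_fit`, the polynomial library, the pruning threshold `alpha_max`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def sparse_bayes_fit(Phi, y, n_iter=200, alpha_max=1e6):
    """Simplified sparse Bayesian linear regression.
    Each weight has its own prior precision alpha; weights whose
    precision exceeds alpha_max are pruned from the model."""
    n, m = Phi.shape
    alpha = np.ones(m)               # prior precision of each weight
    beta = 1.0 / np.var(y)           # noise precision (initial guess)
    keep = np.ones(m, dtype=bool)    # currently active library terms
    w = np.zeros(m)
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))
        mu = beta * Sigma @ P.T @ y  # posterior mean of active weights
        w[:] = 0.0
        w[keep] = mu
        gamma = 1.0 - alpha[keep] * np.diag(Sigma)
        alpha[keep] = gamma / (mu**2 + 1e-12)
        beta = (n - gamma.sum()) / (np.sum((y - P @ mu) ** 2) + 1e-12)
        keep &= alpha < alpha_max    # prune terms with very large precision
        if not keep.any():
            break
    w[~keep] = 0.0
    return w, keep

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=200)
names = ["1", "u", "u^2", "u^3"]     # assumed toy library of candidate terms
Phi = np.column_stack([np.ones_like(u), u, u**2, u**3])
# Synthetic "model error": 0.5*u^2 plus small noise
h = 0.5 * u**2 + 0.01 * rng.standard_normal(u.size)

w, keep = sparse_bayes_fit(Phi, h)
for name, wi, k in zip(names, w, keep):
    if k:
        print(f"{name}: {wi:+.3f}")
```

With noisy data, a sparse Bayesian fit can retain a spurious term with a small weight, so in practice the surviving coefficients are worth inspecting rather than trusting the active set blindly.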

Data Assimilation Integration

Incorporating DA allows MEDIDA to handle noisy observations. The EnKF filters out observational noise and provides a more accurate estimate of the system's state for the subsequent regression. Because each ensemble member only needs to be integrated over one time step, the approach remains computationally efficient even with large ensemble sizes.
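
To make the analysis step concrete, here is a minimal sketch of a stochastic (perturbed-observation) EnKF update. It assumes that every state variable is observed directly (identity observation operator) and that the observation noise is uncorrelated with a known standard deviation; the function name `enkf_analysis` and the synthetic ensemble are hypothetical, and the paper's EnKF variant may differ.

```python
import numpy as np

def enkf_analysis(ensemble_forecast, obs, obs_noise_std, rng):
    """One stochastic EnKF analysis step with H = I (assumed).
    ensemble_forecast: (N, M) array, N members of an M-dimensional state.
    obs: (M,) noisy observation. Returns the analysis mean, shape (M,)."""
    N, M = ensemble_forecast.shape
    anomalies = ensemble_forecast - ensemble_forecast.mean(axis=0)
    P = anomalies.T @ anomalies / (N - 1)      # sample forecast covariance
    R = obs_noise_std**2 * np.eye(M)           # observation-error covariance
    # Each member assimilates its own perturbed copy of the observation
    perturbed = obs + obs_noise_std * rng.standard_normal((N, M))
    innovations = perturbed - ensemble_forecast
    # K^T = (P + R)^{-1} P, since P and R are symmetric
    analysis = ensemble_forecast + innovations @ np.linalg.solve(P + R, P)
    return analysis.mean(axis=0)

rng = np.random.default_rng(0)
truth = np.array([1.0, -0.5, 0.2, 0.8])
# Forecast ensemble: biased and spread around the truth (synthetic)
forecast = truth + 0.3 + 0.5 * rng.standard_normal((40, 4))
obs = truth + 0.1 * rng.standard_normal(4)

u_analysis = enkf_analysis(forecast, obs, 0.1, rng)
print("forecast mean:", forecast.mean(axis=0))
print("analysis mean:", u_analysis)
```

The analysis mean weights the forecast and the observation by their estimated uncertainties; it would then stand in for the raw observation when the model error is estimated.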

Practical Application and Results

The framework's efficacy is demonstrated using the KS equation, a canonical example in studying spatiotemporal chaos. The study compares the performance of MEDIDA-corrected models against baseline models with known structural errors, both under noise-free and noisy conditions.
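
For reference, the KS equation in its standard one-dimensional form is given first below, followed by a generic way of writing a system as an imperfect model plus a model error. The symbols $f$ and $h$ are notation assumed here for illustration and may differ from the paper's.

```latex
\begin{align}
  \partial_t u + u\,\partial_x u + \partial_x^2 u + \partial_x^4 u &= 0, \\
  \partial_t u &= f(u) + h(u).
\end{align}
```

As a hypothetical example, a model that omitted the fourth-order term would have $h(u) = -\partial_x^4 u$, a structural error; a model with a wrong coefficient on an existing term would have a parametric error.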

Noise-Free Observations: Even with sparse data (as few as 10 sample points), MEDIDA effectively discovers the underlying structural errors and corrects the models with minimal deviation from the true dynamics, achieving error reductions upwards of 99% in most scenarios.
Noisy Observations: By deploying EnKF, MEDIDA achieves significant error correction even with high observational noise levels. When ensemble sizes are adequately large relative to state dimensionality, model errors are corrected with deviations as low as 0.5% from the reference.

A table in the paper lists the performance of the corrected models across the test cases, showing that MEDIDA handles different kinds of model error, both parametric and structural.

Implications and Future Work

MEDIDA highlights a scalable approach for enhancing model fidelity in nonlinear systems, particularly useful in fields like climate modeling where accurate representation of system dynamics is crucial. This methodology bridges physical understanding and data-driven approaches, offering a structured path for augmenting incomplete models with interpretable corrections.

Future work could aim at automating the selection of the regression library, broadening the range of applicable systems, and improving the computational efficiency of the EnKF for very high-dimensional problems. Adapting the framework to online learning could further extend MEDIDA's applicability to real-time systems. The study points to the potential of this integrated framework as a general methodology for model error correction in complex dynamical systems.