Distilling Governing Laws and Source Input for Dynamical Systems from Videos

Published 3 May 2022 in cs.CV, cs.AI, cs.LG, and physics.app-ph | (2205.01314v1)

Abstract: Distilling interpretable physical laws from videos has attracted growing interest in the computer vision community thanks to recent advances in deep learning, but it remains a great challenge. This paper introduces an end-to-end unsupervised deep learning framework to uncover the explicit governing equations of dynamics presented by moving object(s), based on recorded videos. Instead of the pixel (spatial) coordinate system of image space, the physical law is modeled in a regressed underlying physical coordinate system where the physical states follow potential explicit governing equations. A numerical integrator-based sparse regression module is designed; it serves as a physical constraint on the autoencoder and coordinate system regression and, at the same time, uncovers the parsimonious closed-form governing equations from the learned physical states. Experiments on simulated dynamical scenes show that the proposed method is able to distill closed-form governing equations and simultaneously identify unknown excitation input for several dynamical systems recorded by videos, which fills a gap in the literature, where no existing methods are available and applicable for solving this type of problem.

Summary

The paper presents a deep learning framework that extracts explicit governing equations and identifies unknown system excitations directly from raw videos.
It employs an encoder-decoder network with physical coordinate regression and sparse regression to model and discover dynamic system laws.
Experimental evaluations demonstrate robustness to noise and improved accuracy over traditional trajectory-based approaches in dynamic law discovery.

Introduction

The study presents an approach to uncovering explicit governing equations for dynamical systems directly from raw video, without assuming any prior physical laws. It uses a deep learning framework to address the difficulty of deriving data-driven models directly from video sequences. Earlier approaches mainly derived the laws from pre-extracted motion trajectories or required extensive prior knowledge of the system's physical constraints. The proposed framework overcomes these limitations by discovering the governing equations and identifying unknown excitations simultaneously, in a single unsupervised learning task.

Methodology

Network Architecture

The framework combines several network components that regress the physical coordinate system and identify governing equations from high-dimensional video data:

Encoder-Decoder Network: Maps video frames into a low-dimensional latent space representing the spatial (pixel) coordinates of the moving objects. This compact representation is the basis for characterizing the physical dynamics.
Physical Coordinate System Regression: Applies a learned Cartesian coordinate transformation that maps the spatial coordinates into physical states, the coordinate system in which the underlying physical laws are hypothesized to hold.
Physical Law Embedding and Discovery: Physical laws are modeled using a sparse regression approach. The equations describe system dynamics as a combination of candidate functions, each parameterized by coefficients optimized during training. This provides a mechanism for embedding and subsequently identifying governing equations and unknown system inputs.
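The coordinate mapping and the physical-law embedding described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the affine map `to_physical`, the candidate library `candidate_library` for a single degree of freedom, and the additive excitation term `u` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical affine pixel-to-physical map x_p = A @ x_pix + b; the paper's
# exact parameterization of the coordinate regression may differ.
def to_physical(x_pix, A, b):
    """Map pixel coordinates of shape (N, 2) to physical coordinates."""
    return x_pix @ A.T + b

# Hypothetical candidate library for one degree of freedom, state x = [q, q_dot].
LIBRARY_NAMES = ["1", "q", "q_dot", "q^2", "q*q_dot", "q_dot^2", "sin q", "cos q"]

def candidate_library(x):
    """Evaluate Theta(x) row-wise; returns shape (N, len(LIBRARY_NAMES))."""
    q, qd = x[:, 0], x[:, 1]
    return np.stack([np.ones_like(q), q, qd, q**2, q * qd, qd**2,
                     np.sin(q), np.cos(q)], axis=1)

def dynamics(x, Xi, u):
    """x_dot = Theta(x) @ Xi + u, with u an unknown excitation (assumed additive)."""
    return candidate_library(x) @ Xi + u

# Image y-axis points down, so this hypothetical map flips its sign.
x_pix = np.array([[120.0, 80.0]])
print(to_physical(x_pix, np.diag([0.01, -0.01]), np.array([-1.2, 0.8])))  # approximately [[0, 0]]

# Example: forced, damped oscillator q_ddot = -k q - c q_dot + u(t).
k, c = 4.0, 0.1
Xi = np.zeros((len(LIBRARY_NAMES), 2))
Xi[2, 0] = 1.0   # dq/dt = q_dot
Xi[1, 1] = -k    # d(q_dot)/dt = -k q ...
Xi[2, 1] = -c    # ... - c q_dot (+ u)
x = np.array([[1.0, 0.5]])
u = np.array([[0.0, 2.0]])   # excitation acts on the velocity equation only
print(dynamics(x, Xi, u))    # approximately [[0.5, -2.05]]
```

Each column of `Xi` is one governing equation; reading off its nonzero entries against `LIBRARY_NAMES` gives the closed-form law, which is why sparsity in `Xi` translates directly into interpretability.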

Loss Functions and Training

The learning process involves a combination of multiple loss functions, including:

Reconstruction Loss ($\mathcal{L}_{recon}$): Ensures accurate reconstruction of the video frames.
Physical Derivative Loss ($\mathcal{L}_{\dot{\mathbf{x}}_p}$): Aligns the time derivatives of the learned physical states with those given by the discovered equations.
Integration Loss ($\mathcal{L}_{int}$): Checks forward predictions, obtained by integrating the discovered equations with a Runge-Kutta scheme, against the learned states.
Regularization Loss ($\mathcal{L}_{reg}$): Encourages sparsity in the discovered equations by applying an $\ell_{0.5}$ penalty to the coefficients.
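How the four terms might combine can be illustrated with a small NumPy sketch. It is not the paper's training code: the loss weights, the use of central finite differences to estimate $\dot{\mathbf{x}}_p$, and holding the excitation constant over each Runge-Kutta step are all assumptions, and a real implementation would compute these terms in an autodiff framework so they can be backpropagated.

```python
import numpy as np

def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for x_dot = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def total_loss(frames, recon, x_p, Xi, u, theta, dt,
               weights=(1.0, 1.0, 1.0, 1e-3)):
    """Weighted sum of the four loss terms (weights are hypothetical).

    frames, recon: (T, H, W) video and its reconstruction
    x_p: (T, d) learned physical states; u: (T, d) excitation estimate
    theta: callable mapping (N, d) states to the (N, m) candidate library
    """
    w_rec, w_der, w_int, w_reg = weights
    rhs = lambda z, uk: theta(z) @ Xi + uk

    l_recon = np.mean((frames - recon) ** 2)
    # Derivative loss: central differences stand in for x_p_dot (assumption).
    xdot_fd = (x_p[2:] - x_p[:-2]) / (2 * dt)
    l_der = np.mean((xdot_fd - rhs(x_p[1:-1], u[1:-1])) ** 2)
    # Integration loss: one RK4 step from each state, u held constant over the step.
    x_next = rk4_step(lambda z: rhs(z, u[:-1]), x_p[:-1], dt)
    l_int = np.mean((x_p[1:] - x_next) ** 2)
    # l_0.5 sparsity penalty on the coefficients (not differentiable at 0).
    l_reg = np.sum(np.abs(Xi) ** 0.5)

    parts = {"recon": l_recon, "der": l_der, "int": l_int, "reg": l_reg}
    total = w_rec * l_recon + w_der * l_der + w_int * l_int + w_reg * l_reg
    return total, parts

# Sanity check on x_dot = -x with an identity library and no excitation.
dt = 0.01
t = np.arange(0.0, 1.0, dt)
x_p = np.exp(-t)[:, None]
frames = np.zeros((len(t), 4, 4))
total, parts = total_loss(frames, frames.copy(), x_p, np.array([[-1.0]]),
                          np.zeros_like(x_p), lambda z: z, dt)
print(parts)  # recon is 0, der and int are near 0, reg is 1.0
```

When the states, coefficients, and excitation are mutually consistent, only the sparsity term remains, which is the intended behavior: the penalty then trades off against fit only when extra terms would be needed.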

The network is trained in multiple phases, which optimize the initial conditions, prune candidate terms through sequential thresholding, and let the training loss stabilize.
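To illustrate the thresholding idea only, the sketch below applies the classic sequentially thresholded least squares (STLSQ) procedure from SINDy to a fixed candidate library. This is a swapped-in stand-in, not the paper's procedure: in the paper, thresholding is interleaved with network training, and the threshold, library, and iteration count used here are assumptions.

```python
import numpy as np

def stlsq(Theta, Xdot, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (STLSQ), as in SINDy.

    Theta: (N, m) candidate library; Xdot: (N, d) time derivatives.
    Returns Xi of shape (m, d) with small coefficients pruned to zero.
    """
    Xi = np.linalg.lstsq(Theta, Xdot, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold       # candidates to prune
        Xi[small] = 0.0
        for j in range(Xdot.shape[1]):
            keep = ~small[:, j]
            if keep.any():                   # refit only the surviving terms
                Xi[keep, j] = np.linalg.lstsq(Theta[:, keep], Xdot[:, j], rcond=None)[0]
    return Xi

# Example: recover x_dot = -2 x + 0.5 x^3 from noiseless samples.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
Xdot = (-2.0 * x + 0.5 * x**3)[:, None]
Xi = stlsq(Theta, Xdot)
print(Xi.ravel())  # approximately [0, -2, 0, 0.5]
```

Refitting only the surviving terms after each pruning step is what makes the thresholding "sequential": coefficients that were propped up by spurious terms get a chance to settle before the next cut.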

Experimental Evaluation

A series of experiments demonstrates the efficacy of the proposed framework on simulated dynamical systems of varying complexity.

Discovery Results

The system successfully derives governing laws for single- and multi-object systems and identifies external excitations. Comparison with baselines such as the method of Champion et al. highlights the framework's ability to work directly on video frames, whereas those baselines rely on highly specific, functional input representations. Because the learned physical coordinates can differ from the ground truth by scaling and translation, the discovered equations are compared after applying these adjustments; once aligned, they remain faithful to the original dynamics even where the raw predictions deviate slightly.
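The scaling-and-translation adjustment mentioned above can be sketched as a per-axis least-squares fit between learned and ground-truth states. The function name `align_scale_translation` and the purely per-axis form are assumptions for illustration; if the learned frame were also rotated, a full affine fit would be needed instead.

```python
import numpy as np

def align_scale_translation(z, x):
    """Fit x[:, j] ≈ a[j] * z[:, j] + b[j] per state dimension by least squares.

    z: (T, d) learned physical states; x: (T, d) ground-truth states.
    """
    d = z.shape[1]
    a, b = np.empty(d), np.empty(d)
    for j in range(d):
        A = np.column_stack([z[:, j], np.ones(len(z))])
        a[j], b[j] = np.linalg.lstsq(A, x[:, j], rcond=None)[0]
    return a, b

# Example: learned states equal the true ones up to a per-axis scale and offset.
t = np.linspace(0.0, 10.0, 500)
x_true = np.column_stack([np.sin(t), np.cos(t)])
z = (x_true - np.array([0.3, -1.0])) / np.array([2.0, -0.5])
a, b = align_scale_translation(z, x_true)
print(a, b)  # approximately [2, -0.5] and [0.3, -1]
x_aligned = a * z + b   # now directly comparable with x_true
```

The same map also carries the equations across: if $\dot{\mathbf{z}} = f(\mathbf{z})$ and $\mathbf{x} = \mathbf{a} \odot \mathbf{z} + \mathbf{b}$, then $\dot{\mathbf{x}} = \mathbf{a} \odot f\big((\mathbf{x} - \mathbf{b}) \oslash \mathbf{a}\big)$, so a law discovered in learned coordinates can be rewritten in the reference coordinates for comparison.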

Figure 1: The studied dynamical systems excited by unknown inputs.

Robustness and Ablation Studies

Tests on noisy data show that the model is resilient to noise: the reconstructed dynamics and identified inputs remain accurate, albeit with increased variability. Ablation studies further confirm that the coordinate transformation step is critical for accurate physical law discovery.

Figure 2: Schematic architecture of the proposed framework for governing law discovery.

Baseline Comparisons

Baseline models that use conventional autoencoders or alternative regression strategies (Figure 3) struggle to distill accurate physical laws, primarily because they rely on dynamics expressed explicitly in pixel coordinates or cannot accommodate unknown inputs.

Figure 3: Discovery results of the studied dynamical systems.

Conclusions

The methodological advances presented in this paper significantly enhance the ability to discover interpretable governing equations from video data. By combining unsupervised deep learning with physical coordinate regression and sparse equation discovery, the approach fills a significant gap in physics-driven video analysis. Limitations remain, however, particularly in handling dynamic backgrounds or three-dimensional scenes; these point to directions for future research.

This study sets a precedent for the automated, data-driven discovery of governing laws and shows potential for broader applications that require understanding complex dynamics from non-trivial observational data.