- The paper introduces the Denoising Hamiltonian Network (DHN), which generalizes Hamiltonian neural operators using block-wise processing, a denoising mechanism, and global conditioning to enhance physical reasoning.
- DHN effectively captures non-local temporal relationships through state blocks and mitigates numerical integration errors by iteratively refining predictions via its integrated denoising objective.
- Evaluations show DHN outperforms baselines on physical reasoning tasks, including trajectory prediction and completion, inferring physical parameters, and interpolating sparse trajectories, demonstrating improved accuracy and generalization.
The Denoising Hamiltonian Network (DHN) generalizes Hamiltonian mechanics into neural operators, addressing limitations of existing methods that primarily model local temporal relations and focus on forward simulation. Here's a breakdown of its architecture, methodology, and evaluation:
1. Architecture and Methodology:
- Generalization of Hamiltonian Mechanics: DHN extends Hamiltonian neural operators to capture non-local temporal relationships and mitigate numerical integration errors through a denoising mechanism. It supports multi-system modeling with global conditioning.
- Block-Wise Discrete Hamiltonian: DHN operates on state blocks: stacks of generalized coordinates (q) and momenta (p) concatenated along the time dimension, parameterized by a block size b and a stride s. This lets the network capture broader temporal correlations while preserving the underlying Hamiltonian structure; classical HNNs can be viewed as the special case b=1, s=1.
- Denoising Mechanism: Inspired by denoising diffusion models, DHN integrates a denoising objective to mitigate numerical integration errors. It refines predictions toward physically valid trajectories, enhancing stability and adapting to diverse noise conditions. It uses a masked modeling strategy where input states are perturbed with noise sampled at varying magnitudes. A sequence of increasing noise levels is used, and the network learns to recover physically meaningful states from corrupted observations. During inference, the unknown states are progressively denoised with a sequence of decreasing noise scales.
- Global Conditioning: DHN employs global conditioning to facilitate multi-system modeling. A shared global latent code z encodes system-specific properties (e.g., mass, pendulum length), enabling DHN to model heterogeneous physical systems under a unified framework while maintaining disentangled representations of underlying dynamics. An autodecoder framework is used, maintaining a learnable latent code z for each trajectory.
- Network Architecture: DHN uses a decoder-only transformer architecture. The inputs are stacks of q and p states along with the global latent code z. Self-attention is applied to all input tokens, and the latent code serves as a query token for outputting the Hamiltonian value H. Per-state noise scales are encoded and added to the positional embedding.
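The token layout described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the learned input projections and attention weights are replaced by random matrices, and all dimensions are made up.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(3)
b, d_model = 4, 8                      # block size, token width (illustrative)

# One token per (q_t, p_t) state in the block; the learned input
# projection is replaced by random features here.
state_tokens = rng.standard_normal((b, d_model))
# Per-state noise scales are encoded and added to the positional embedding.
pos_emb = rng.standard_normal((b, d_model))
noise_emb = 0.1 * rng.standard_normal((b, d_model))
tokens = state_tokens + pos_emb + noise_emb

# The global latent code z is appended as a query token.
z = rng.standard_normal((1, d_model))
seq = np.concatenate([tokens, z], axis=0)       # (b + 1, d_model)

# A single self-attention layer over all tokens (random stand-in weights).
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
Q, K, V = seq @ Wq, seq @ Wk, seq @ Wv
attn = softmax(Q @ K.T / np.sqrt(d_model)) @ V  # (b + 1, d_model)

# The Hamiltonian value H is read off the latent-query position.
w_out = rng.standard_normal(d_model)
H = attn[-1] @ w_out                            # scalar output
```

Because self-attention spans every state token plus the latent query, the predicted H can depend on the whole block and on the system-specific code at once.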
2. Capturing Non-Local Temporal Relationships:
DHN captures non-local temporal relationships by treating groups of system states as tokens in state blocks, allowing it to reason holistically about system dynamics rather than in isolated steps. By using block sizes b>1 and strides s>1, the network observes the system at different scales, enabling it to incorporate information from more temporally distant states.
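The block slicing can be sketched as below; `make_state_blocks` is a hypothetical helper, and the trajectory values are random stand-ins:

```python
import numpy as np

def make_state_blocks(q, p, b, s):
    """Slice a trajectory into overlapping state blocks.

    q, p: arrays of shape (T, d) -- generalized coordinates and momenta.
    Returns an array of shape (num_blocks, b, 2*d): each block stacks b
    consecutive (q_t, p_t) pairs along the time dimension.
    """
    T = q.shape[0]
    states = np.concatenate([q, p], axis=-1)            # (T, 2d)
    starts = range(0, T - b + 1, s)
    return np.stack([states[t:t + b] for t in starts])  # (N, b, 2d)

# Toy trajectory: T=8 time steps, d=2 coordinates.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 2))
p = rng.standard_normal((8, 2))

blocks = make_state_blocks(q, p, b=4, s=2)
print(blocks.shape)  # (3, 4, 4): 3 blocks of 4 time steps, 2d=4 features

# b=1, s=1 recovers the classical HNN view: one state per "block".
print(make_state_blocks(q, p, b=1, s=1).shape)  # (8, 1, 4)
```

Larger b and s let each block span more temporally distant states, which is what gives the network its non-local view of the dynamics.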
3. Mitigating Numerical Integration Errors:
DHN mitigates numerical integration errors through its denoising mechanism: the network refines predictions by iteratively updating q and p using both the forward and backward Hamiltonians (H+ and H−). This optimization process is built into the network, unifying the denoising update rule for state optimization at each time step with the Hamiltonian-modeled state relations across time steps.
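One way to picture this refinement is as gradient descent on the residual of a discrete Hamilton equation. The toy Hamiltonian, the implicit-midpoint discretization, and the update rule below are illustrative stand-ins for DHN's learned forward/backward Hamiltonians, not the paper's exact formulation:

```python
import numpy as np

def grad_H(q, p):
    # Toy harmonic-oscillator Hamiltonian H = (q**2 + p**2) / 2,
    # standing in for the learned Hamiltonian.
    return q, p   # (dH/dq, dH/dp)

def residual(q0, p0, q1, p1, dt):
    """Residual of the implicit-midpoint discrete Hamilton equations."""
    qm, pm = 0.5 * (q0 + q1), 0.5 * (p0 + p1)
    dHdq, dHdp = grad_H(qm, pm)
    return q1 - q0 - dt * dHdp, p1 - p0 + dt * dHdq

# Start from a crude (noisy) guess for the next state and refine it by
# gradient descent on the squared residual -- the optimization step that
# DHN folds into its denoising updates.
q0, p0, dt, lr = 1.0, 0.0, 0.1, 0.2
q1, p1 = q0 + 0.3, p0 - 0.3                  # deliberately bad guess
for _ in range(200):
    r1, r2 = residual(q0, p0, q1, p1, dt)
    # Hand-derived gradients of (r1**2 + r2**2) w.r.t. (q1, p1).
    q1 -= lr * (2 * r1 + r2 * dt)
    p1 -= lr * (2 * r2 - r1 * dt)

r1, r2 = residual(q0, p0, q1, p1, dt)
print(abs(r1) + abs(r2) < 1e-6)  # True: the refined state now satisfies
                                 # the discrete Hamilton equations
```

The point of the sketch is that a bad initial state is pulled onto a physically consistent trajectory by iterative updates, rather than being produced in one shot by an explicit integrator that accumulates error.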
4. Multi-System Modeling with Global Conditioning:
Global conditioning is achieved by using a shared global latent code z that encodes system-specific properties. This latent code is used as a conditioning variable in the decoder-only transformer, allowing the network to model heterogeneous physical systems under a unified framework. The latent codes are learned using an autodecoder framework.
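An autodecoder in this sense keeps one learnable code per trajectory and optimizes it directly by gradient descent, with no encoder network. A toy sketch, where the targets and squared-error loss are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

num_trajectories, latent_dim = 3, 4
# Autodecoder: one learnable latent code z per trajectory, no encoder.
Z = 0.01 * rng.standard_normal((num_trajectories, latent_dim))

# Hypothetical per-trajectory targets standing in for system properties
# (e.g. mass, pendulum length) that the codes should come to encode.
targets = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])

lr = 0.1
for _ in range(200):
    grad = 2 * (Z - targets)   # gradient of a toy squared-error loss
    Z -= lr * grad             # each trajectory's code is updated independently

# After optimization, each trajectory's code reflects its own system.
print(np.round(Z, 2))
```

In DHN the "loss" is the actual training objective, so each code z is shaped by how well it helps the shared network explain that trajectory's dynamics.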
5. Physical Reasoning Tasks for Evaluation and Results:
DHN was evaluated on three physical reasoning tasks:
- Trajectory Prediction and Completion (Forward Simulation): The model predicts future states given initial conditions or completes trajectories from partial observations.
- Results: DHN demonstrates more accurate state prediction with better energy conservation compared to HNN baselines. Smaller block sizes show stable energy conservation, while larger block sizes can cause energy fluctuations.
- Inferring Physical Parameters from Partial Observations (Representation Learning): The model infers physical parameters (e.g., length ratio of a double pendulum) from partial observations using random masking.
- Results: DHN achieves a much lower MSE in predicting physical parameters compared to baseline networks, indicating that the learned latent codes capture meaningful physical properties. A block size of 4 was found to be the best temporal scale for inferring parameters in the double pendulum system.
- Interpolating Sparse Trajectories via Progressive Super-Resolution: The model interpolates sparse trajectories using a progressive super-resolution approach.
- Results: DHN demonstrates strong generalization to trajectories with unseen initial states compared to a CNN-based implementation. DHN's physically constrained representations enable it to infer plausible intermediate states even under distribution shifts.
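The progressive scheme can be sketched as repeated resolution doubling, where a denoising pass fills in the newly inserted states. Here simple midpoint interpolation stands in for the learned model, and the trajectory values are made up:

```python
import numpy as np

def refine_midpoints(traj):
    """Stand-in for the DHN denoising pass: the inserted states are
    midpoint-interpolated here; the real model would denoise them into
    physically valid states."""
    mids = 0.5 * (traj[:-1] + traj[1:])
    out = np.empty((2 * len(traj) - 1, traj.shape[1]))
    out[0::2] = traj       # keep the coarse states
    out[1::2] = mids       # insert one new state between each pair
    return out

# Sparse observed trajectory (4 states of a 2-DoF system).
sparse = np.array([[0.0, 1.0], [0.5, 0.8], [0.9, 0.3], [1.0, -0.2]])

# Progressive super-resolution: double the temporal resolution each round.
traj = sparse
for _ in range(2):
    traj = refine_midpoints(traj)

print(traj.shape)  # (13, 2): 4 -> 7 -> 13 states
```

The observed states are preserved at every round while the gaps between them are filled at progressively finer temporal scales.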
In summary, the DHN architecture leverages block-wise processing, a denoising mechanism, and global conditioning to enhance Hamiltonian Neural Networks. This enables the model to capture non-local temporal relationships, mitigate numerical integration errors, and effectively handle multi-system modeling, as validated across three physical reasoning tasks.