Denoising Hamiltonian Network for Physical Reasoning

Published 10 Mar 2025 in cs.LG and cs.AI | (2503.07596v1)

Abstract: Machine learning frameworks for physical problems must capture and enforce physical constraints that preserve the structure of dynamical systems. Many existing approaches achieve this by integrating physical operators into neural networks. While these methods offer theoretical guarantees, they face two key limitations: (i) they primarily model local relations between adjacent time steps, overlooking longer-range or higher-level physical interactions, and (ii) they focus on forward simulation while neglecting broader physical reasoning tasks. We propose the Denoising Hamiltonian Network (DHN), a novel framework that generalizes Hamiltonian mechanics operators into more flexible neural operators. DHN captures non-local temporal relationships and mitigates numerical integration errors through a denoising mechanism. DHN also supports multi-system modeling with a global conditioning mechanism. We demonstrate its effectiveness and flexibility across three diverse physical reasoning tasks with distinct inputs and outputs.

Summary

  • The paper introduces the Denoising Hamiltonian Network (DHN), which generalizes Hamiltonian neural operators using block-wise processing, a denoising mechanism, and global conditioning to enhance physical reasoning.
  • DHN effectively captures non-local temporal relationships through state blocks and mitigates numerical integration errors by iteratively refining predictions via its integrated denoising objective.
  • Evaluations show DHN outperforms baselines on physical reasoning tasks, including trajectory prediction and completion, inferring physical parameters, and interpolating sparse trajectories, demonstrating improved accuracy and generalization.

The Denoising Hamiltonian Network (DHN) generalizes Hamiltonian mechanics into neural operators, addressing limitations of existing methods that primarily model local temporal relations and focus on forward simulation. Here's a breakdown of its architecture, methodology, and evaluation:

1. Architecture and Methodology:

  • Generalization of Hamiltonian Mechanics: DHN extends Hamiltonian neural operators to capture non-local temporal relationships and mitigate numerical integration errors through a denoising mechanism. It supports multi-system modeling with global conditioning.
  • Block-Wise Discrete Hamiltonian: DHN uses state blocks, which are stacks of generalized coordinates (q) and momenta (p) concatenated along the time dimension. A block size b and stride s are introduced. This allows the network to capture broader temporal correlations while preserving the underlying Hamiltonian structure. Classical HNNs can be viewed as a special case where b = 1 and s = 1.
  • Denoising Mechanism: Inspired by denoising diffusion models, DHN integrates a denoising objective to mitigate numerical integration errors. It refines predictions toward physically valid trajectories, enhancing stability and adapting to diverse noise conditions. It uses a masked modeling strategy where input states are perturbed with noise sampled at varying magnitudes. A sequence of increasing noise levels is used, and the network learns to recover physically meaningful states from corrupted observations. During inference, the unknown states are progressively denoised with a sequence of decreasing noise scales.
  • Global Conditioning: DHN employs global conditioning to facilitate multi-system modeling. A shared global latent code z encodes system-specific properties (e.g., mass, pendulum length), enabling DHN to model heterogeneous physical systems under a unified framework while maintaining disentangled representations of underlying dynamics. An autodecoder framework is used, maintaining a learnable latent code z for each trajectory.
  • Network Architecture: DHN uses a decoder-only transformer architecture. The inputs are stacks of q and p states along with the global latent code z. Self-attention is applied to all input tokens, and the latent code serves as a query token for outputting the Hamiltonian value H. Per-state noise scales are encoded and added to the positional embedding.
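
The pieces above (state block in, latent code z and a noise scale as conditioning, scalar H out) can be sketched numerically. The paper's decoder-only transformer is replaced here by a tiny MLP purely to keep the sketch self-contained; `mlp_hamiltonian`, the layer widths, and all dimensions are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_hamiltonian(block, z, noise_scale, params):
    """Toy stand-in for DHN's Hamiltonian head: maps a flattened state
    block, a global latent code z, and a scalar noise level to a scalar
    H. The paper applies self-attention over state tokens with z as the
    query token and adds noise-scale embeddings to the positional
    embedding; an MLP is used here only for brevity."""
    x = np.concatenate([block.ravel(), z, [noise_scale]])
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ x + b1)          # single hidden layer
    return float(W2 @ h + b2)         # scalar Hamiltonian value

# Illustrative sizes: a block of b=4 steps, each holding q and p in R^2,
# an 8-dim latent code, and one noise scale.
d_in = 4 * 4 + 8 + 1
params = (0.1 * rng.normal(size=(32, d_in)), np.zeros(32),
          0.1 * rng.normal(size=(1, 32)), np.zeros(1))
H = mlp_hamiltonian(rng.normal(size=(4, 4)), rng.normal(size=8), 0.1, params)
```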

2. Capturing Non-Local Temporal Relationships:

DHN captures non-local temporal relationships by treating groups of system states as tokens in state blocks, allowing it to reason holistically about system dynamics rather than in isolated steps. By using block sizes b > 1 and strides s > 1, the network observes the system at different scales, enabling it to incorporate information from more temporally distant states.
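
The block construction can be sketched as a sliding window over the trajectory; `make_state_blocks` and the shapes below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def make_state_blocks(q, p, b, s):
    """Stack (q, p) pairs into state blocks along the time dimension.

    q, p: arrays of shape (T, d) -- generalized coordinates / momenta.
    b: block size (time steps per block); s: stride between block starts.
    Returns an array of shape (num_blocks, b, 2*d).
    """
    T = q.shape[0]
    states = np.concatenate([q, p], axis=-1)          # (T, 2d)
    starts = range(0, T - b + 1, s)
    return np.stack([states[t:t + b] for t in starts])

rng = np.random.default_rng(0)
q, p = rng.normal(size=(8, 2)), rng.normal(size=(8, 2))

# Classical HNN tokenization is the special case b = 1, s = 1:
assert make_state_blocks(q, p, b=1, s=1).shape == (8, 1, 4)
# Larger blocks and strides expose longer-range temporal context:
assert make_state_blocks(q, p, b=4, s=2).shape == (3, 4, 4)
```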

3. Mitigating Numerical Integration Errors:

DHN mitigates numerical integration errors through its denoising mechanism. The network iteratively refines predictions, updating q and p using both the forward and backward Hamiltonians (H+ and H−). This optimization process is incorporated into the network, unifying the denoising update rules for state optimization at each time step with the Hamiltonian-modeled state relations across time steps.
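
As a toy analogue of this refinement (not the paper's actual update rule, which uses the learned forward and backward Hamiltonians), the sketch below denoises a perturbed state by gradient descent toward an energy-consistent one, using a hand-written harmonic-oscillator Hamiltonian as a stand-in:

```python
def H(q, p):
    # Hand-written harmonic-oscillator Hamiltonian standing in for the
    # learned block Hamiltonian (an assumption for this sketch).
    return 0.5 * (q**2 + p**2)

def denoise_state(q, p, H_target, steps=200, lr=0.05):
    """Refine a noise-perturbed (q, p) by gradient descent on the
    objective 0.5 * (H(q, p) - H_target)**2, pulling the state back
    onto an energy-consistent level set. A toy analogue of DHN's
    iterative denoising refinement, not the paper's exact update."""
    for _ in range(steps):
        err = H(q, p) - H_target
        q -= lr * err * q   # err * dH/dq
        p -= lr * err * p   # err * dH/dp
    return q, p

q, p = 1.0, 0.0                       # clean state, H = 0.5
qn, pn = q + 0.3, p - 0.2             # noise-perturbed observation
qd, pd = denoise_state(qn, pn, H_target=0.5)
assert abs(H(qd, pd) - 0.5) < 1e-3    # energy error largely removed
```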

4. Multi-System Modeling with Global Conditioning:

Global conditioning is achieved by using a shared global latent code z that encodes system-specific properties. This latent code is used as a conditioning variable in the decoder-only transformer, allowing the network to model heterogeneous physical systems under a unified framework. The latent codes are learned using an autodecoder framework.
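
The autodecoder idea can be sketched as follows: there is no encoder, and each trajectory owns a learnable code z_k fitted by gradient descent on reconstruction error. In the paper the codes are optimized jointly with the network; here the decoder is a fixed random linear map, and all names and dimensions are stand-in assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

num_traj, z_dim, obs_dim = 3, 4, 6
codes = 0.1 * rng.normal(size=(num_traj, z_dim))   # one learnable z per trajectory
W = rng.normal(size=(obs_dim, z_dim))              # stand-in (fixed) decoder
targets = rng.normal(size=(num_traj, obs_dim))     # per-system observations

def recon_loss():
    return float(np.mean((codes @ W.T - targets) ** 2))

init_loss = recon_loss()
lr = 0.01
for _ in range(500):
    grads = (codes @ W.T - targets) @ W            # d/dz of 0.5 * ||W z - x||^2
    codes -= lr * grads
final_loss = recon_loss()
assert final_loss < init_loss                      # codes absorb system identity
```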

5. Physical Reasoning Tasks for Evaluation and Results:

DHN was evaluated on three physical reasoning tasks:

  • Trajectory Prediction and Completion (Forward Simulation): The model predicts future states given initial conditions or completes trajectories from partial observations.
    • Results: DHN demonstrates more accurate state prediction with better energy conservation compared to HNN baselines. Smaller block sizes show stable energy conservation, while larger block sizes can cause energy fluctuations.
  • Inferring Physical Parameters from Partial Observations (Representation Learning): The model infers physical parameters (e.g., length ratio of a double pendulum) from partial observations using random masking.
    • Results: DHN achieves a much lower MSE in predicting physical parameters compared to baseline networks, indicating that the learned latent codes capture meaningful physical properties. A block size of 4 was found to be the best temporal scale for inferring parameters in the double pendulum system.
  • Interpolating Sparse Trajectories via Progressive Super-Resolution: The model interpolates sparse trajectories using a progressive super-resolution approach.
    • Results: DHN demonstrates strong generalization to trajectories with unseen initial states compared to a CNN-based implementation. DHN's physically constrained representations enable it to infer plausible intermediate states even under distribution shifts.

In summary, the DHN architecture leverages block-wise processing, a denoising mechanism, and global conditioning to enhance Hamiltonian Neural Networks. This enables the model to capture non-local temporal relationships, mitigate numerical integration errors, and effectively handle multi-system modeling, as validated across three physical reasoning tasks.
