
POBench-PDE Benchmark Suite

Updated 24 January 2026
  • POBench-PDE is a standardized family of benchmarks designed to test PDE solvers and scientific machine learning algorithms across diverse models.
  • It provides reproducible datasets, unified APIs, and detailed evaluation metrics for tasks such as operator learning, Bayesian inversion, mesh quality analysis, and dynamic poroelasticity.
  • The framework enables quantitative comparisons with high-fidelity ground-truth solutions and rigorous diagnostic protocols for both forward and inverse problems.

POBench-PDE denotes a family of systematically designed benchmarks for Partial Differential Equations (PDEs) that facilitate evaluation and comparison of algorithms in scientific machine learning, uncertainty quantification, mesh geometry, and numerical PDE solvers. Through its various forms—spanning data-driven operator learning, Bayesian inverse problems, mesh quality analysis for polytopal elements, handling partial observations, and dynamic poroelasticity—it provides reproducible, extensible platforms for standardized algorithmic assessment.

1. Scope and Rationale

The core objective of POBench-PDE is to address the need for rigorous, widely-adopted PDE benchmarks supporting diverse scientific machine learning and computational PDE research. These benchmarks feature:

  • Canonical and real-world PDE models (time-dependent, steady, 1D–3D, linear/nonlinear, scalar/vector, forward/inverse)
  • Large, off-the-shelf datasets with code for further customizable generation and parameter variation
  • Unified APIs for data access, model evaluation, and extension
  • Reference solutions and high-fidelity ground-truth statistics for quantitative, like-for-like algorithm comparison

Specific implementations include:

2. Benchmark Architectures and PDE Catalogs

Data-Driven Operator Learning Benchmarks

PDEBench includes eleven canonical and application-oriented PDEs characterized by varied spatial dimension (1D–3D), type, and solution complexity (Takamoto et al., 2022):

  • 1D: Advection, Burgers’, Diffusion–Reaction, Diffusion–Sorption
  • 2D: Diffusion–Reaction (FitzHugh–Nagumo), Darcy flow, Incompressible/Compressible Navier–Stokes, Shallow water
  • 3D: Compressible Navier–Stokes

Each PDE is precisely specified by governing equations, initial/boundary conditions, and randomized parameters. For example, the 1D advection equation is

$\partial_t u(t,x) + \beta\,\partial_x u(t,x) = 0$

with periodic boundary conditions and randomly sampled initial conditions constructed as sums of sines.
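Because the advection equation with periodic boundaries has the exact solution $u(t,x) = u_0(x - \beta t)$, reference trajectories can be generated directly from sampled initial conditions. The sketch below illustrates a sum-of-sines initial condition and this exact transport; the mode count, amplitude, and wavenumber ranges are illustrative assumptions, not PDEBench's exact sampling specification.

```python
import numpy as np

def sum_of_sines_ic(x, n_modes=4, rng=None):
    """Random IC as a superposition of sine modes (illustrative ranges;
    PDEBench's actual sampling distribution may differ)."""
    if rng is None:
        rng = np.random.default_rng()
    u0 = np.zeros_like(x)
    for _ in range(n_modes):
        k = rng.integers(1, 8)             # wavenumber
        A = rng.uniform(-1.0, 1.0)         # amplitude
        phi = rng.uniform(0, 2 * np.pi)    # phase
        u0 += A * np.sin(2 * np.pi * k * x + phi)
    return u0

# Exact solution under periodic BCs: shift the IC by beta * t.
x = np.linspace(0.0, 1.0, 1024, endpoint=False)
u0 = sum_of_sines_ic(x, rng=np.random.default_rng(0))
beta = 0.5
u_t = np.interp((x - beta * 1.0) % 1.0, x, u0, period=1.0)
```

Generating data from the closed-form solution avoids accumulating numerical dispersion error, which is why advection is a useful sanity-check problem for learned surrogates.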

Bayesian Inversion Benchmark (MCMC)

The problem is coefficient identification in the Poisson equation on $\Omega = (0,1)^2$:

$-\nabla\cdot[a(x)\nabla u(x)] = f(x), \quad u|_{\partial\Omega} = 0$

where $a(x)$ is piecewise constant, parameterized by $\theta\in\mathbb{R}^{64}$ over an $8\times 8$ grid. Observables are $u(x)$ at 169 points with i.i.d. Gaussian noise. The task is to reconstruct $\theta$ from data using MCMC, with precisely specified priors and acceptance ratios (Aristoff et al., 2021).
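A minimal random-walk Metropolis sampler for a 64-dimensional $\theta$ can be sketched as follows. The Gaussian likelihood/prior, step size, and the toy linear map standing in for the actual Poisson solve at 169 observation points are all assumptions for illustration; the benchmark prescribes its own priors and acceptance ratios.

```python
import numpy as np

def log_posterior(theta, forward, y_obs, noise_std, prior_std):
    """Gaussian likelihood + Gaussian prior (illustrative; the benchmark
    specifies its own noise model and priors)."""
    resid = forward(theta) - y_obs
    return (-0.5 * np.sum(resid**2) / noise_std**2
            - 0.5 * np.sum(theta**2) / prior_std**2)

def metropolis(forward, y_obs, n_steps=200, step=0.05,
               noise_std=0.05, prior_std=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(64)        # 8x8 coefficient grid, flattened
    lp = log_posterior(theta, forward, y_obs, noise_std, prior_std)
    chain = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(64)
        lp_prop = log_posterior(prop, forward, y_obs, noise_std, prior_std)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis acceptance
            theta, lp = prop, lp_prop
        chain.append(theta.copy())
    return np.array(chain)

# Hypothetical linear stand-in for the PDE forward solve at 169 points.
A = np.random.default_rng(1).standard_normal((169, 64)) * 0.1
chain = metropolis(lambda t: A @ t, y_obs=np.zeros(169))
```

In practice each proposal evaluation requires a full elliptic solve, which is what makes the archived reference posterior (Section 3) valuable: new samplers can be validated without regenerating it.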

Mesh Quality and Solver Performance for PEM

POBench-PDE analyzes the effect of polygonal mesh geometry on solver conditioning and approximation:

  • Eight parametric polygon families, each systematically degenerating in shape
  • Twelve per-cell quality metrics: kernel-area ratio (KAR), minimum angle (MA), circle ratio (CR), edge ratio (ER), perimeter-area ratio (PAR), etc.
  • Solver metrics: relative $L_\infty$, $L_2$, and energy-norm errors; condition number; empirical convergence rates
  • Statistical Pearson correlations between shape descriptors and solver performance (Attene et al., 2019)
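The correlation analysis above can be reproduced in miniature: given one quality metric and one solver error per mesh in a degenerating family, a Pearson coefficient against the log-error reveals whether the metric predicts breakdown. The numbers below are hypothetical illustration data, not benchmark results.

```python
import numpy as np

# Hypothetical per-mesh measurements across a degenerating polygon family:
# a shape metric (e.g. kernel-area ratio) and the relative L2 solver error.
kar = np.array([0.95, 0.80, 0.60, 0.40, 0.20, 0.05])
l2_err = np.array([1e-4, 2e-4, 8e-4, 3e-3, 2e-2, 1e-1])

# Correlating the metric with log-error makes the trend near-linear,
# since errors typically grow geometrically as cells degenerate.
r = np.corrcoef(kar, np.log10(l2_err))[0, 1]
print(f"Pearson r (KAR vs log10 L2 error): {r:.3f}")
```

A strongly negative coefficient here would indicate that shrinking kernel-area ratio predicts solver degradation, which is the kind of finding the benchmark formalizes across its twelve metrics and eight mesh families.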

Partial-Observation Operator Learning

POBench-PDE (partial observation) assesses neural operator performance under missing data regimes for:

  • 2D incompressible Navier–Stokes turbulence
  • Reaction–diffusion (Gray–Scott system)
  • Real-world climate fields (ERA5)

Diverse missingness types (point- and patch-wise), sparsity up to 75%, and a unified evaluation protocol are provided (Hou et al., 22 Jan 2026).
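The two missingness regimes can be sketched as masking operators on a 2D field. The function below is an illustrative assumption about how point- and patch-wise masks might be constructed; the benchmark's exact masking protocol may differ in patch sizing and sampling.

```python
import numpy as np

def apply_mask(field, mode="point", sparsity=0.75, patch=8, seed=0):
    """Zero out observations point-wise or patch-wise (illustrative of the
    two missingness regimes; not the benchmark's exact protocol)."""
    rng = np.random.default_rng(seed)
    mask = np.ones_like(field, dtype=bool)
    H, W = field.shape
    if mode == "point":
        # Drop each grid point independently with probability `sparsity`.
        mask &= rng.uniform(size=field.shape) >= sparsity
    else:
        # Drop random patch x patch blocks until the target sparsity is hit.
        while mask.mean() > 1.0 - sparsity:
            i = rng.integers(0, H - patch)
            j = rng.integers(0, W - patch)
            mask[i:i + patch, j:j + patch] = False
    return np.where(mask, field, 0.0), mask

field = np.random.default_rng(1).standard_normal((64, 64))
masked, mask = apply_mask(field, mode="point", sparsity=0.75)
```

Patch-wise masking is generally the harder regime for neural operators, since entire coherent structures (vortices, reaction fronts) can be hidden rather than merely subsampled.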

Dynamic Poroelasticity (Biot Equations)

A coupled hyperbolic–parabolic system on a 2D L-shaped domain:

$$\begin{align*}
\rho\,\partial_t^2 u - \nabla\cdot(C\epsilon(u)) + \alpha\nabla p &= \rho f \\
c_0\,\partial_t p + \alpha\nabla\cdot(\partial_t u) - \nabla\cdot(K\nabla p) &= g
\end{align*}$$

with specific Dirichlet/Neumann/slip boundary regimes. Reference goal quantities are line integrals of $u$ and $p$ on predefined subdomains (Anselmann et al., 2023).

3. Dataset Generation, Formats, and Task Protocols

Datasets are distributed in HDF5 format, structured as arrays of shape $(N_\text{samples} \times N_\text{time} \times X \times Y \times Z \times V)$, where $V$ is the number of physical fields. Sample counts per PDE and resolutions are problem-specific (e.g., 1D advection: $10^4$ samples, $T = 200$–$500$, $X = 1024$).
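Reading and sharding such a file follows the usual h5py pattern. The dataset name `tensor`, the attribute keys, and the reduced 1D shape below are illustrative assumptions; the actual PDEBench files use problem-specific names and metadata layouts.

```python
import numpy as np
import h5py

# Illustrative layout for a 1D problem: (N_samples, N_time, X, V).
# The dataset name "tensor" and attribute keys are hypothetical.
data = np.random.default_rng(0).standard_normal((16, 200, 1024, 1)).astype("float32")

with h5py.File("advection_demo.h5", "w") as f:
    dset = f.create_dataset("tensor", data=data)
    dset.attrs["pde"] = "1D_Advection"
    dset.attrs["beta"] = 0.5

with h5py.File("advection_demo.h5", "r") as f:
    u = f["tensor"][:8]          # slice the first trajectories lazily
    name = f["tensor"].attrs["pde"]
```

Because h5py slices read only the requested bytes from disk, training loops can stream subsets of very large benchmark files without loading the full array into memory.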

Initial conditions and PDE parameters are randomly sampled according to specified ranges or distributions. Metadata (PDE type, parameter values, BC types) is encoded as YAML attributes. Data splits for training/evaluation are user-configurable, with a typical 90/10 train/test division (Takamoto et al., 2022).

For Bayesian benchmarks, the full posterior is archived based on $2\times 10^{11}$ MCMC samples, providing a statistical reference for new algorithms (Aristoff et al., 2021).

PEM mesh benchmarks provide C++ and MATLAB code for generating, measuring, and solving on polygonal mesh families, including full reproducibility for all experiments (Attene et al., 2019).

4. APIs, Extension Points, and Reproducibility

PDEBench (and related POBench-PDEs) expose unified Python/PyTorch APIs:

  • Data loading via specialized Dataset wrappers
  • Hydra-configured generation scripts for new cases
  • Uniform model evaluation interfaces: evaluate_model(model, dataset, metrics)
  • Extending to new PDEs: subclass BasePDE, implement methods for IC generation, stepping, and BC enforcement, register in the benchmark factory, and supply YAML configs
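The extension recipe above can be sketched as a minimal subclass. The `BasePDE` interface below is assumed from the described recipe (IC generation, stepping, BC enforcement); the real base class and method names in the benchmark codebase may differ.

```python
import numpy as np

class BasePDE:
    """Minimal stand-in for the benchmark's base class (interface assumed
    from the extension recipe; actual API may differ)."""
    def generate_ic(self, x): raise NotImplementedError
    def step(self, u, dt): raise NotImplementedError
    def apply_bc(self, u): raise NotImplementedError

class Advection1D(BasePDE):
    def __init__(self, beta=0.5, dx=1.0 / 1024):
        self.beta, self.dx = beta, dx

    def generate_ic(self, x):
        return np.sin(2 * np.pi * x) + 0.5 * np.sin(4 * np.pi * x)

    def step(self, u, dt):
        # First-order upwind update (stable for beta > 0, CFL < 1).
        return self.apply_bc(u - self.beta * dt / self.dx * (u - np.roll(u, 1)))

    def apply_bc(self, u):
        return u  # periodic BCs: np.roll already wraps around

x = np.linspace(0.0, 1.0, 1024, endpoint=False)
pde = Advection1D()
u = pde.generate_ic(x)
for _ in range(10):
    u = pde.step(u, dt=5e-4)
```

Registering such a subclass in the benchmark factory and supplying a YAML config then makes the new PDE available through the same data-generation and evaluation entry points as the built-in problems.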

Reproducibility is ensured by version-controlled configs, random seed logging, and modular APIs for integrating new baseline models, metrics, or task variants (Takamoto et al., 2022; Attene et al., 2019).

5. Baseline Methods and Performance Summaries

PDEBench provides baselines across operator learning paradigms:

  • Fourier Neural Operator (FNO): Spectral convolutions, MLP update, high resolution invariance (Takamoto et al., 2022)
  • U-Net: Multiscale CNN, open/closed-loop training, "pushforward trick" for multi-step stability
  • PINN: Fully-connected networks (DeepXDE), physics-based losses, per-sample training
  • Gradient-Based Inverse: IC/parameter inference via differentiable surrogates (FNO/UNet)

Reported performance:

  • FNO: RMSE $10^{-3}$–$10^{-1}$, robust across frequencies, best conservation and boundary error
  • U-Net: Strong on diffusion, less accurate for shocks/high frequency
  • PINN: Competitive at high frequency (costly, small domains)
  • Gradient inversion: FNO outperforms U-Net, errors at mid/high frequencies

Partial-observation variants benchmark LNO, LANO, and associated frameworks, with LANO achieving a $\sim$18–69% reduction in relative $\ell_2$ error at moderate sparsity (Hou et al., 22 Jan 2026).

In the PEM benchmark, lowest-order VEM is stress-tested, revealing critical geometric metrics for solver stability and accuracy (notably KAR and MA) (Attene et al., 2019).

6. Evaluation Metrics and Diagnostic Protocols

PDEBench and related benchmarks define multi-faceted evaluation suites (Takamoto et al., 2022), including:

| Metric | Definition | Targeted effect |
|---|---|---|
| nRMSE | $\lVert u_\text{pred} - u_\text{true}\rVert_2 / \lVert u_\text{true}\rVert_2$ | Relative error norm |
| cRMSE | $(1/N)\lVert \sum_x u_\text{pred} - \sum_x u_\text{true}\rVert_2$ | Conservation (mass/energy) fidelity |
| bRMSE | RMSE on boundary cells | Boundary condition enforcement |
| fRMSE | Frequency-banded RMSE via discrete Fourier transform | Multi-scale spatial fidelity |
| $L_2$, $L_\infty$ | Standard $L_2$, $L_\infty$ norms | Error measures for solver analysis (PEM) |
| Condition $\kappa_1$ | Matrix condition number | Solver stability (PEM) |
| ESS | Effective sample size (MCMC) | Posterior sampling efficiency |

Interpretations: cRMSE diagnoses conservation properties, bRMSE tests BC learning, fRMSE quantifies fidelity at various spatial frequencies, and error norms/conditioning expose mesh-related solver breakdowns.
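Three of these metrics can be computed directly from prediction and reference arrays. The frequency-band boundaries below are illustrative assumptions; PDEBench defines its own low/mid/high band split.

```python
import numpy as np

def nrmse(pred, true):
    """Relative error norm."""
    return np.linalg.norm(pred - true) / np.linalg.norm(true)

def crmse(pred, true):
    """Error in the conserved spatial total, normalized by cell count."""
    return np.abs(pred.sum() - true.sum()) / pred.size

def frmse(pred, true, bands=((0, 4), (4, 16), (16, None))):
    """RMSE per spatial-frequency band of the 1D error spectrum
    (band edges here are illustrative, not PDEBench's)."""
    err_hat = np.abs(np.fft.rfft(pred - true))
    return [np.sqrt(np.mean(err_hat[lo:hi] ** 2)) for lo, hi in bands]

true = np.sin(2 * np.pi * np.linspace(0.0, 1.0, 256, endpoint=False))
pred = true + 1e-3 * np.random.default_rng(0).standard_normal(256)
scores = (nrmse(pred, true), crmse(pred, true), frmse(pred, true))
```

Reporting all three together separates failure modes: a model can have low nRMSE yet leak mass (high cRMSE) or miss fine scales (high high-band fRMSE).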

For Bayesian inverse problems, the reference mean, covariance, convergence diagnostics, and posterior histograms must agree within $2\sigma$ of the archived ground truth from $2\times 10^{11}$ MCMC samples to validate new algorithms (Aristoff et al., 2021).

7. Impact, Extensions, and Best Practices

POBench-PDE provides the first reproducible community standards for many classes of PDE learning and simulation challenges:

  • PDEBench enables rapid benchmarking of new neural surrogates, emulators, and inverse solvers, supporting plug-in extension of PDEs, models, and evaluation metrics (Takamoto et al., 2022).
  • In PEM, systematic analysis across geometric degeneracies guides relaxed mesh quality criteria, promoting flexible, practical meshing strategies (Attene et al., 2019).
  • The Bayesian inversion benchmark supports algorithmic research on high-dimensional inference by providing precise, high-fidelity reference posteriors (Aristoff et al., 2021).
  • For partial observation, POBench-PDE sets the standard for quantifying robustness to missing data across neural operator families, crucial for deployment in real-world sensing applications (Hou et al., 22 Jan 2026).
  • In dynamic poroelasticity, the benchmarked problem, discretization, and output functionals establish a common testbed for evaluating accuracy, efficiency, and robustness of space–time discretizations and iterative solvers (Anselmann et al., 2023).

Standard practices:

  • Explicitly subclass base benchmark modules when adding models/PDEs
  • Commit task-parameter configs and random seeds for reproducibility
  • Use published diagnostic protocols (e.g., error-vs-cost curves, correlation matrices, standardized evaluation metrics) for transparent, objective comparison

POBench-PDE benchmarks thus form an essential infrastructure for advancing data-driven scientific PDE modeling, uncertainty quantification, and numerical simulation.