POBench-PDE Benchmark Suite
- POBench-PDE is a standardized family of benchmarks designed to test PDE solvers and scientific machine learning algorithms across diverse models.
- It provides reproducible datasets, unified APIs, and detailed evaluation metrics for tasks such as operator learning, Bayesian inversion, mesh quality analysis, and dynamic poroelasticity.
- The framework enables quantitative comparisons with high-fidelity ground-truth solutions and rigorous diagnostic protocols for both forward and inverse problems.
POBench-PDE denotes a family of systematically designed benchmarks for Partial Differential Equations (PDEs) that facilitate evaluation and comparison of algorithms in scientific machine learning, uncertainty quantification, mesh geometry, and numerical PDE solvers. Through its various forms—spanning data-driven operator learning, Bayesian inverse problems, mesh quality analysis for polytopal elements, handling partial observations, and dynamic poroelasticity—it provides reproducible, extensible platforms for standardized algorithmic assessment.
1. Scope and Rationale
The core objective of POBench-PDE is to address the need for rigorous, widely-adopted PDE benchmarks supporting diverse scientific machine learning and computational PDE research. These benchmarks feature:
- Canonical and real-world PDE models (time-dependent, steady, 1D–3D, linear/nonlinear, scalar/vector, forward/inverse)
- Large, off-the-shelf datasets with code for further customizable generation and parameter variation
- Unified APIs for data access, model evaluation, and extension
- Reference solutions and high-fidelity ground-truth statistics for quantitative, like-for-like algorithm comparison
Specific implementations include:
- Data-driven emulation and surrogate learning for dynamical systems (Takamoto et al., 2022)
- Bayesian inversion of spatially variable coefficients via MCMC (Aristoff et al., 2021)
- Benchmarking mesh quality for polytopal element methods (PEM) (Attene et al., 2019)
- Operator learning from partial observations (Hou et al., 22 Jan 2026)
- Benchmarking dynamic poroelasticity solvers (Anselmann et al., 2023)
2. Benchmark Architectures and PDE Catalogs
Data-Driven Operator Learning Benchmarks
PDEBench includes eleven canonical and application-oriented PDEs, varying in spatial dimension (1D–3D), type, and solution complexity (Takamoto et al., 2022):
- 1D: Advection, Burgers’, Diffusion–Reaction, Diffusion–Sorption
- 2D: Diffusion–Reaction (FitzHugh–Nagumo), Darcy flow, Incompressible/Compressible Navier–Stokes, Shallow water
- 3D: Compressible Navier–Stokes
Each PDE is precisely specified by governing equations, initial/boundary conditions, and randomized parameters. For example, the 1D advection equation is
$$\partial_t u(t,x) + \beta\,\partial_x u(t,x) = 0,$$
with periodic boundary conditions and randomly sampled initial conditions constructed as sums of sines.
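A sketch of this setup, assuming sum-of-sines initial conditions with random amplitudes, phases, and integer wavenumbers (PDEBench's exact sampling ranges may differ). With periodic BCs, pure advection has the exact translated solution, which makes it a convenient sanity check:

```python
import numpy as np

def sum_of_sines_ic(x, n_modes=4, rng=None):
    """Random IC as a superposition of sine modes with random amplitudes,
    phases, and wavenumbers (illustrative sampling ranges)."""
    rng = np.random.default_rng() if rng is None else rng
    L = x[-1] - x[0] + (x[1] - x[0])  # periodic domain length (uniform grid)
    u0 = np.zeros_like(x)
    for k in rng.integers(1, 8, size=n_modes):
        amp = rng.uniform(0.0, 1.0)
        phase = rng.uniform(0.0, 2 * np.pi)
        u0 += amp * np.sin(2 * np.pi * k * x / L + phase)
    return u0

def advect_exact(u0_fn, x, t, beta, L):
    """Exact solution of u_t + beta * u_x = 0 with periodic BCs:
    the initial profile translated by beta * t."""
    return u0_fn((x - beta * t) % L)

# usage: periodic interpolation of a sampled IC, advected for half a period
x = np.linspace(0.0, 1.0, 256, endpoint=False)
u0 = sum_of_sines_ic(x, rng=np.random.default_rng(0))
u0_fn = lambda xs: np.interp(xs, x, u0, period=1.0)
u_half = advect_exact(u0_fn, x, t=0.5, beta=1.0, L=1.0)
```

Reference solvers in the benchmark replace the exact translation with numerical schemes; the closed form above is what those schemes are checked against.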
Bayesian Inversion Benchmark (MCMC)
The problem is coefficient identification in the Poisson equation $-\nabla\cdot(\theta\nabla u) = f$ on the unit square, where the diffusion coefficient $\theta$ is piecewise constant, parameterized by one value per cell of an $8\times 8$ grid. Observables are point values of $u$ at $169$ measurement locations, corrupted by i.i.d. Gaussian noise. The task is to reconstruct $\theta$ from these data using MCMC, with precisely specified priors and acceptance ratios (Aristoff et al., 2021).
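The sampling loop can be sketched with a random-walk Metropolis kernel. Here a stand-in linear forward map replaces the benchmark's actual FEM Poisson solve, and the prior and noise parameters are illustrative, not the benchmark's specification:

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_steps=5000, step=0.1, rng=None):
    """Random-walk Metropolis: propose theta' = theta + step * N(0, I),
    accept with probability min(1, exp(log_post(theta') - log_post(theta)))."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    lp = log_post(theta)
    chain, n_accept = [], 0
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
            n_accept += 1
        chain.append(theta.copy())
    return np.array(chain), n_accept / n_steps

# Illustrative posterior: Gaussian likelihood through a stand-in linear
# forward map G (the real benchmark solves the Poisson PDE here).
rng = np.random.default_rng(1)
G = rng.standard_normal((169, 64))           # 169 observations, 64 coefficients
theta_true = rng.uniform(1.0, 10.0, 64)
data = G @ theta_true + 0.05 * rng.standard_normal(169)

def log_post(theta):
    resid = data - G @ theta
    # Gaussian likelihood (noise sd 0.05) + broad Gaussian prior (illustrative)
    return -0.5 * np.sum(resid**2) / 0.05**2 - 0.5 * np.sum((theta - 5.0)**2) / 9.0

chain, acc_rate = metropolis_hastings(log_post, np.full(64, 5.0),
                                      n_steps=2000, step=0.01)
```

The archived reference posterior lets any such sampler be validated against ground-truth moments rather than against another approximate run.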
Mesh Quality and Solver Performance for PEM
POBench-PDE analyzes the effect of polygonal mesh geometry on solver conditioning and approximation:
- Eight parametric polygon families, each systematically degenerating in shape
- Twelve per-cell quality metrics: kernel-area ratio (KAR), minimum angle (MA), circle ratio (CR), edge ratio (ER), perimeter-area ratio (PAR), etc.
- Solver metrics: relative $L^2$, $L^\infty$, and energy-norm errors; condition number; empirical convergence rates
- Statistical Pearson correlations between shape descriptors and solver performance (Attene et al., 2019)
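Two of the named per-cell metrics, edge ratio (ER) and perimeter-area ratio (PAR), can be sketched as follows (the benchmark's exact normalizations may differ), with the Pearson correlation against a solver-performance proxy computed via `np.corrcoef`:

```python
import numpy as np

def edge_lengths(poly):
    """Edge lengths of a polygon given as an (n, 2) array of vertices."""
    d = np.roll(poly, -1, axis=0) - poly
    return np.hypot(d[:, 0], d[:, 1])

def edge_ratio(poly):
    """ER: shortest edge over longest edge (1 for regular polygons;
    normalization conventions vary)."""
    e = edge_lengths(poly)
    return e.min() / e.max()

def perimeter_area_ratio(poly):
    """PAR: perimeter over area, with the area from the shoelace formula."""
    x, y = poly[:, 0], poly[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return edge_lengths(poly).sum() / area

# Pearson correlation between a quality metric and a (mock) solver error
# across a family of progressively degenerating quadrilaterals.
metrics, errors = [], []
for eps in np.linspace(0.05, 0.9, 10):
    quad = np.array([[0, 0], [1, 0], [1, 1 - eps], [0, 1]], float)  # collapsing corner
    metrics.append(edge_ratio(quad))
    errors.append(1.0 / metrics[-1])  # mock error proxy, not real solver output
pearson_r = np.corrcoef(metrics, errors)[0, 1]
```

In the benchmark itself, the error column comes from actual VEM solves on each degenerating mesh family rather than a proxy.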
Partial-Observation Operator Learning
POBench-PDE (partial observation) assesses neural operator performance under missing data regimes for:
- 2D incompressible Navier–Stokes turbulence
- Reaction–diffusion (Gray–Scott system)
- Real-world climate fields (ERA5)
It covers diverse missingness types (point- and patch-wise) at a range of sparsity levels, under a unified evaluation protocol (Hou et al., 22 Jan 2026).
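The two missingness types and the standard relative $L^2$ metric can be sketched as follows (mask conventions here, 1 = observed and 0 = missing, are illustrative):

```python
import numpy as np

def point_mask(shape, sparsity, rng):
    """Point-wise missingness: each grid point is dropped independently
    with probability `sparsity`."""
    return (rng.uniform(size=shape) >= sparsity).astype(float)

def patch_mask(shape, n_patches, patch, rng):
    """Patch-wise missingness: zero out `n_patches` random square
    patches of side `patch`."""
    mask = np.ones(shape)
    H, W = shape
    for _ in range(n_patches):
        i = rng.integers(0, H - patch)
        j = rng.integers(0, W - patch)
        mask[i:i + patch, j:j + patch] = 0.0
    return mask

def relative_l2(pred, true):
    """Relative L2 error, the standard operator-learning metric."""
    return np.linalg.norm(pred - true) / np.linalg.norm(true)

# usage: a smooth 2D field observed through a 90%-sparse point mask
rng = np.random.default_rng(0)
g = np.linspace(0, 2 * np.pi, 64)
field = np.sin(g)[:, None] * np.cos(g)[None, :]
observed = field * point_mask(field.shape, sparsity=0.9, rng=rng)
```

Operators are then trained and scored on such masked inputs, with `relative_l2` reported per missingness type and sparsity level.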
Dynamic Poroelasticity (Biot Equations)
A coupled hyperbolic–parabolic system on a 2D L-shaped domain,
$$\rho\,\partial_t^2 u - \nabla\cdot\sigma(u) + \alpha\,\nabla p = f,\qquad c_0\,\partial_t p + \alpha\,\nabla\cdot\partial_t u - \nabla\cdot(K\nabla p) = g,$$
with specific Dirichlet/Neumann/slip boundary regimes. Reference goal quantities are line integrals of the displacement $u$ and pressure $p$ on predefined subdomains (Anselmann et al., 2023).
3. Dataset Generation, Formats, and Task Protocols
Datasets are distributed in HDF5 format, structured as arrays of shape $(N_{\text{samples}}, N_t, N_x[, N_y, N_z], N_{\text{fields}})$, where $N_{\text{fields}}$ is the number of physical fields. Sample counts and spatio-temporal resolutions are problem-specific.
Initial conditions and PDE parameters are randomly sampled according to specified ranges or distributions. Metadata (PDE type, parameter values, BC types) is encoded as YAML attributes. Data splits for training/evaluation are user-configurable, with a typical 90/10 train/test division (Takamoto et al., 2022).
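A minimal sketch of this storage layout using `h5py` (the dataset and attribute names here are illustrative, not PDEBench's exact schema):

```python
import numpy as np
import h5py

def write_benchmark_file(path, fields, pde_name, params):
    """Write samples in a PDEBench-style layout: one array of shape
    (n_samples, n_t, n_x, n_fields) with PDE metadata as attributes
    (key names are illustrative)."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("data", data=fields, compression="gzip")
        dset.attrs["pde"] = pde_name
        for key, val in params.items():
            dset.attrs[key] = val

def load_split(path, train_frac=0.9, seed=0):
    """Load the array and split along the sample axis (typical 90/10)."""
    with h5py.File(path, "r") as f:
        data = f["data"][:]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(train_frac * len(data))
    return data[idx[:n_train]], data[idx[n_train:]]

# usage: 100 samples, 21 time steps, 128 grid points, 1 field
samples = np.random.rand(100, 21, 128, 1).astype(np.float32)
write_benchmark_file("advection_demo.h5", samples, "advection", {"beta": 1.0})
train_set, test_set = load_split("advection_demo.h5")
```

Keeping metadata as attributes on the dataset (rather than in separate files) means a single `.h5` file fully specifies the task instance.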
For Bayesian benchmarks, the full posterior is archived based on MCMC samples, ensuring statistical reference for new algorithms (Aristoff et al., 2021).
PEM mesh benchmarks provide C++ and MATLAB code for generating, measuring, and solving on polygonal mesh families, including full reproducibility for all experiments (Attene et al., 2019).
4. APIs, Extension Points, and Reproducibility
PDEBench (and related POBench-PDE variants) exposes unified Python/PyTorch APIs:
- Data loading via specialized Dataset wrappers
- Hydra-configured generation scripts for new cases
- Uniform model evaluation interfaces: `evaluate_model(model, dataset, metrics)`
- Extending to new PDEs: subclass `BasePDE`, implement methods for IC generation, stepping, and BC enforcement, register in the benchmark factory, and supply YAML configs
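A minimal sketch of this extension pattern (the method names on `BasePDE` and the exact `evaluate_model` signature are assumptions; the real API may differ):

```python
import numpy as np

class BasePDE:
    """Stand-in for the benchmark's base class; method names are assumed."""
    def initial_condition(self, x, rng):
        raise NotImplementedError
    def step(self, u, dt):
        raise NotImplementedError
    def apply_bc(self, u):
        raise NotImplementedError

class Advection1D(BasePDE):
    """First-order upwind advection with periodic BCs (illustrative)."""
    def __init__(self, beta=1.0, dx=1.0 / 128):
        self.beta, self.dx = beta, dx
    def initial_condition(self, x, rng):
        return np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal()
    def step(self, u, dt):
        # upwind difference for beta > 0; np.roll encodes periodicity
        return self.apply_bc(u - self.beta * dt / self.dx * (u - np.roll(u, 1)))
    def apply_bc(self, u):
        return u  # periodic BCs are implicit in np.roll

def evaluate_model(model, dataset, metrics):
    """Apply each named metric to (prediction, target) pairs and average."""
    scores = {name: [] for name in metrics}
    for inp, target in dataset:
        pred = model(inp)
        for name, fn in metrics.items():
            scores[name].append(fn(pred, target))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}
```

A registered subclass plus a YAML config is then all a new PDE needs to participate in the shared evaluation loop.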
Reproducibility is ensured by version-controlled configs, random seed logging, and modular APIs for integrating new baseline models, metrics, or task variants (Takamoto et al., 2022; Attene et al., 2019).
5. Baseline Methods and Performance Summaries
PDEBench provides baselines across operator learning paradigms:
- Fourier Neural Operator (FNO): spectral convolutions, MLP updates, resolution invariance (Takamoto et al., 2022)
- U-Net: Multiscale CNN, open/closed-loop training, "pushforward trick" for multi-step stability
- PINN: Fully-connected networks (DeepXDE), physics-based losses, per-sample training
- Gradient-Based Inverse: IC/parameter inference via differentiable surrogates (FNO/UNet)
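The closed-loop rollout and the "pushforward trick" mentioned for U-Net can be sketched as follows (framework-agnostic numpy sketch; in an autograd framework the pushforward step would be wrapped in stop-gradient/detach):

```python
import numpy as np

def rollout(model, u0, n_steps):
    """Closed-loop (autoregressive) rollout: the model consumes its own
    previous prediction at every step."""
    traj, u = [u0], u0
    for _ in range(n_steps):
        u = model(u)
        traj.append(u)
    return np.stack(traj)

def pushforward_inputs(model, u0):
    """Pushforward trick, in sketch form: instead of training on the clean
    state u0, push it one step through the (frozen) model first, so training
    inputs carry the model's own distribution shift without backpropagating
    through the extra step."""
    return model(u0)

# usage with a toy "model": a damped identity map
model = lambda u: 0.99 * u
traj = rollout(model, np.ones(32), n_steps=10)
```

Open-loop training, by contrast, always feeds ground-truth states in; the pushforward variant is what stabilizes long multi-step rollouts.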
Reported performance:
- FNO: lowest nRMSE on most tasks, robust across frequency bands, best conservation and boundary errors
- U-Net: Strong on diffusion, less accurate for shocks/high frequency
- PINN: Competitive at high frequency (costly, small domains)
- Gradient inversion: FNO surrogates outperform U-Net, with errors concentrated at mid/high frequencies
Partial-observation variants benchmark LNO, LANO, and associated frameworks, with LANO achieving a substantial reduction in relative error at moderate sparsity (Hou et al., 22 Jan 2026).
In the PEM benchmark, lowest-order VEM is stress-tested, revealing critical geometric metrics for solver stability and accuracy (notably KAR and MA) (Attene et al., 2019).
6. Evaluation Metrics and Diagnostic Protocols
PDEBench and related benchmarks define multi-faceted evaluation suites (Takamoto et al., 2022), including:
| Metric | Definition | Targeted effect |
|---|---|---|
| nRMSE | Normalized root-mean-square error | Overall relative accuracy |
| cRMSE | RMSE of conserved quantities | Conservation (mass/energy) fidelity |
| bRMSE | RMSE on boundary cells | Boundary condition enforcement |
| fRMSE | Frequency-banded RMSE via discrete Fourier transform | Multi-scale spatial fidelity |
| L2, Linf | Standard $L^2$, $L^\infty$ norms | Error measures for solver analysis (PEM) |
| Condition | Matrix condition number | Solver stability (PEM) |
| ESS | Effective sample size (MCMC) | Posterior sampling efficiency |
Interpretations: cRMSE diagnoses conservation properties, bRMSE tests BC learning, fRMSE quantifies fidelity at various spatial frequencies, and error norms/conditioning expose mesh-related solver breakdowns.
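Hedged sketches of the four RMSE variants (PDEBench's exact normalizations and band edges may differ):

```python
import numpy as np

def nrmse(pred, true):
    """Normalized RMSE: L2 error relative to the target norm."""
    return np.linalg.norm(pred - true) / np.linalg.norm(true)

def crmse(pred, true):
    """Conserved-quantity error: difference of spatial means, a proxy
    for mass conservation (exact definition may differ)."""
    return abs(pred.mean() - true.mean())

def brmse(pred, true):
    """RMSE restricted to the boundary cells of a 2D field."""
    b = np.zeros(pred.shape, dtype=bool)
    b[0, :] = b[-1, :] = b[:, 0] = b[:, -1] = True
    return np.sqrt(np.mean((pred[b] - true[b]) ** 2))

def frmse(pred, true, bands=((0, 4), (4, 16), (16, None))):
    """Frequency-banded RMSE via the 2D DFT: spectral error within low,
    mid, and high radial-wavenumber bands (normalization conventions vary)."""
    err = np.fft.fftshift(np.fft.fft2(pred - true))
    H, W = err.shape
    ky, kx = np.meshgrid(np.arange(H) - H // 2, np.arange(W) - W // 2,
                         indexing="ij")
    k = np.hypot(ky, kx)
    out = []
    for lo, hi in bands:
        sel = (k >= lo) & (k < (np.inf if hi is None else hi))
        out.append(float(np.sqrt(np.mean(np.abs(err[sel]) ** 2))))
    return out
```

Reading the three `frmse` values side by side shows whether a model's error sits in the smooth large-scale structure or in fine-scale detail.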
For Bayesian inverse problems, the reference posterior mean, covariance, convergence diagnostics, and marginal histograms must agree, within stated tolerances, with the archived ground-truth MCMC samples to validate new algorithms (Aristoff et al., 2021).
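Effective sample size, the table's posterior-efficiency metric, can be estimated from a chain's autocorrelation; this sketch uses one common truncation convention (several estimators exist):

```python
import numpy as np

def effective_sample_size(chain):
    """ESS = N / (1 + 2 * sum of autocorrelations), with the sum truncated
    at the first non-positive autocorrelation (one common convention)."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:]  # autocovariance, lags 0..n-1
    rho = acov / acov[0]                            # autocorrelation
    tau = 1.0
    for k in range(1, n):
        if rho[k] <= 0:
            break
        tau += 2.0 * rho[k]
    return n / tau

# usage: an i.i.d. chain mixes well; a random walk mixes poorly
rng = np.random.default_rng(0)
iid = rng.standard_normal(2000)
walk = np.cumsum(iid)
ess_iid, ess_walk = effective_sample_size(iid), effective_sample_size(walk)
```

A chain whose ESS is a small fraction of its length is exploring the posterior slowly, which is exactly what the archived reference runs let new samplers be compared on.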
7. Impact, Extensions, and Best Practices
POBench-PDE provides the first reproducible community standards for many classes of PDE learning and simulation challenges:
- PDEBench enables rapid benchmarking of new neural surrogates, emulators, and inverse solvers, supporting plug-in extension of PDEs, models, and evaluation metrics (Takamoto et al., 2022).
- In PEM, systematic analysis across geometric degeneracies guides relaxed mesh quality criteria, promoting flexible, practical meshing strategies (Attene et al., 2019).
- The Bayesian inversion benchmark supports algorithmic research on high-dimensional inference by providing precise, high-fidelity reference posteriors (Aristoff et al., 2021).
- For partial observation, POBench-PDE sets the standard for quantifying robustness to missing data across neural operator families, crucial for deployment in real-world sensing applications (Hou et al., 22 Jan 2026).
- In dynamic poroelasticity, the benchmarked problem, discretization, and output functionals establish a common testbed for evaluating accuracy, efficiency, and robustness of space–time discretizations and iterative solvers (Anselmann et al., 2023).
Standard practices:
- Explicitly subclass base benchmark modules when adding models/PDEs
- Commit task-parameter configs and random seeds for reproducibility
- Use published diagnostic protocols (e.g., error-vs-cost curves, correlation matrices, standardized evaluation metrics) for transparent, objective comparison
POBench-PDE benchmarks thus form an essential infrastructure for advancing data-driven scientific PDE modeling, uncertainty quantification, and numerical simulation.