- The paper introduces a data-driven framework combining denoising diffusion probabilistic models with conformal prediction via the Learn Then Test procedure to achieve PAC guarantees.
- The paper demonstrates tight statistical and geometric coverage, ensuring low false negative rates and robust reachable set estimation in high-dimensional nonlinear systems.
- The paper validates the approach on systems ranging from the chaotic Duffing oscillator to an 8192-dimensional Gray–Scott PDE, outperforming traditional methods.
Data-Driven Reachability Analysis via Diffusion Models with PAC Guarantees
Introduction and Motivation
This paper introduces a data-driven framework for reachability analysis of nonlinear dynamical systems that dispenses with the need for explicit models. The methodology leverages denoising diffusion probabilistic models (DDPMs) to learn the distribution of system states purely from trajectory data, and calibrates statistical guarantees on reachable states using recent advances in conformal prediction via the Learn Then Test (LTT) procedure. By directly modeling the state distribution and avoiding a surrogate dynamics model, the framework circumvents coverage degradation due to model error and the intractability faced by polynomial and grid-based methods in high-dimensional spaces.
The approach yields PAC-type statistical coverage guarantees on the predicted reachable sets, quantified as bounds on the probability of excluding an actually reachable state at each time step. Additionally, the authors provide geometric volume bounds on the excluded set under mild assumptions; these bounds tighten for dissipative systems.
Methodological Framework
Pipeline and Theoretical Foundation
The core pipeline (Figure 1) consists of four phases: (1) collecting trajectory data from the unknown dynamical system, (2) training a time-conditional DDPM to denoise potentially corrupted state samples, (3) calibrating a nonconformity score threshold via LTT so that the predicted reachable set achieves the desired false negative rate within a prescribed PAC confidence, and (4) constructing the reachable set as the sublevel set of all states whose denoising error does not exceed this threshold.
Figure 1: Pipeline overview for the data-driven reachability analysis via diffusion models, from data collection to PAC-calibrated reachable set prediction.
The DDPM is trained to predict the noise added to state samples by the forward diffusion process, and the squared denoising error, summed over selected diffusion timesteps, serves as the nonconformity score. Through a detailed analysis (see Proposition: score-likelihood correspondence), the score is shown to approximate a negative evidence lower bound (ELBO) plus an explicit data-dependent quadratic term. This connection provides the theoretical motivation: score sublevel sets approximate density level sets of the induced state distribution, converging to Neyman-Pearson optimality under calibration.
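The score described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the linear noise schedule, the `eps_model(x_t, t, k)` signature, the choice of diffusion timesteps, and the toy noise predictor (exact for data concentrated at the origin) are all assumptions made for the example.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    # Assumed linear beta schedule; returns the cumulative products alpha_bar_t.
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def denoising_score(x, k, eps_model, alpha_bar, timesteps, rng):
    """Squared noise-prediction error of state x at physical step k,
    summed over the selected diffusion timesteps."""
    score = 0.0
    for t in timesteps:
        eps = rng.standard_normal(x.shape)
        # Forward diffusion: corrupt x with Gaussian noise at level t.
        x_t = np.sqrt(alpha_bar[t]) * x + np.sqrt(1.0 - alpha_bar[t]) * eps
        eps_hat = eps_model(x_t, t, k)
        score += float(np.sum((eps_hat - eps) ** 2))
    return score

def in_reachable_set(score, threshold):
    # Membership test: the predicted set is a sublevel set of the score.
    return score <= threshold

alpha_bar = make_alpha_bar()
timesteps = [100, 300, 500]          # hypothetical selection of diffusion steps
rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained network: the exact noise predictor
# when all training states sit at the origin (k is ignored here).
toy_model = lambda x_t, t, k: x_t / np.sqrt(1.0 - alpha_bar[t])

score_in = denoising_score(np.zeros(2), 0, toy_model, alpha_bar, timesteps, rng)
score_far = denoising_score(np.array([3.0, 0.0]), 0, toy_model, alpha_bar, timesteps, rng)
```

In this toy setting the in-support state scores essentially zero, while the state away from the data receives a score that grows with its squared distance from the support, which is the behavior a threshold on the score exploits. With a trained network the score depends on the sampled noise, so in practice the noise draws would likely be fixed or averaged.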
The final step employs LTT to select a threshold that ensures the probability of missing a reachable state is tightly controlled across the time horizon, with a union bound providing a familywise PAC guarantee.
For each physical time step, the LTT method chooses a score threshold on held-out calibration data such that the empirical miss rate does not exceed the targeted risk level, with joint PAC guarantees across all steps. The Hoeffding–Bentkus hybrid p-value computation further tightens these thresholds relative to standard quantile- or Bonferroni-based conformal methods, thus improving precision and set tightness.
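The calibration step for a single physical time step can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: it uses the Hoeffding–Bentkus p-value in the form given in the Learn Then Test literature, fixed-sequence testing over a pre-specified threshold grid, and a per-step level of `delta / num_time_steps` for the union bound. The function names, the grid, and the synthetic calibration scores are hypothetical.

```python
import math
import numpy as np

def binom_cdf(m, n, p):
    """P(Bin(n, p) <= m), summed in log space to avoid overflow."""
    total = 0.0
    for j in range(min(m, n) + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def kl_bernoulli(a, b):
    """h1(a, b) = a log(a/b) + (1-a) log((1-a)/(1-b)), with 0 log 0 = 0."""
    out = 0.0
    if a > 0:
        out += a * math.log(a / b)
    if a < 1:
        out += (1 - a) * math.log((1 - a) / (1 - b))
    return out

def hb_p_value(misses, n, alpha):
    """Hoeffding-Bentkus p-value for the null 'true miss rate > alpha'."""
    risk_hat = misses / n
    hoeffding = math.exp(-n * kl_bernoulli(min(risk_hat, alpha), alpha))
    bentkus = math.e * binom_cdf(misses, n, alpha)
    return min(1.0, hoeffding, bentkus)

def calibrate_threshold(cal_scores, grid, alpha, delta, num_time_steps=1):
    """Fixed-sequence testing: walk the grid from the loosest threshold to the
    tightest and keep the last one whose null hypothesis is rejected."""
    scores = np.asarray(cal_scores, dtype=float)
    n = scores.size
    level = delta / num_time_steps          # union bound across time steps
    chosen = None
    for tau in sorted(grid, reverse=True):
        misses = int(np.sum(scores > tau))  # reachable states the set would exclude
        if hb_p_value(misses, n, alpha) > level:
            break
        chosen = float(tau)
    return chosen

rng = np.random.default_rng(0)
cal_scores = rng.exponential(size=5000)     # synthetic stand-in for calibration scores
grid = np.linspace(0.0, 10.0, 201)          # grid fixed before seeing the data
tau = calibrate_threshold(cal_scores, grid, alpha=0.05, delta=0.05, num_time_steps=10)
```

The grid is fixed in advance because the fixed-sequence argument assumes the tested hypotheses do not depend on the calibration data. A return value of `None` would mean that no threshold on the grid could be certified at the requested level.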
From Statistical to Geometric Guarantees
Under a uniform lower bound on the initial state density and mild regularity of the flow map (it is assumed to be a diffeomorphism), a geometric volume bound is given: the expected missed volume of the reachable set at any time step is upper bounded by the product of the coverage miss rate, the supremum of the flow map's Jacobian determinant, and the reciprocal of the initial density lower bound. For systems with constant negative divergence, this volume bound shrinks exponentially in time by Liouville’s formula.
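The shape of this bound can be written out as follows. The notation is assumed for illustration and is not taken from the paper: \Phi_k is the flow map to step k, \mathcal{R}_k the true reachable set, \hat{\mathcal{R}}_k the predicted set, p_0 \ge p_{\min} > 0 the initial density on the initial set, and \alpha the miss rate. Expectation and confidence qualifiers are omitted, so this is a sketch of the argument rather than the paper's exact statement.

```latex
% Missed set and its preimage under the flow map
M_k = \mathcal{R}_k \setminus \hat{\mathcal{R}}_k, \qquad A_k = \Phi_k^{-1}(M_k)

% Coverage gives P(x_0 \in A_k) \le \alpha, and p_0 \ge p_{\min} converts
% probability into Lebesgue volume
\operatorname{vol}(A_k) \le \frac{P(x_0 \in A_k)}{p_{\min}} \le \frac{\alpha}{p_{\min}}

% Change of variables through the diffeomorphism \Phi_k
\operatorname{vol}(M_k) = \int_{A_k} \bigl|\det D\Phi_k(x_0)\bigr| \, dx_0
  \le \frac{\alpha}{p_{\min}} \, \sup_{x_0} \bigl|\det D\Phi_k(x_0)\bigr|

% Liouville's formula for \dot{x} = f(x, t) with \nabla \cdot f \equiv -c, \; c > 0
\det D\Phi_t(x_0) = \exp\!\Bigl( \int_0^t \nabla \cdot f\bigl(\Phi_s(x_0), s\bigr) \, ds \Bigr) = e^{-ct}
\quad \Longrightarrow \quad
\operatorname{vol}(M_t) \le \frac{\alpha}{p_{\min}} \, e^{-ct}
```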
Experimental Validation
The framework is validated on three canonical systems of varied complexity and dimension: the forced Duffing oscillator (2D, chaotic), a planar quadrotor (6D), and the Gray–Scott reaction-diffusion PDE (8192D).
Duffing Oscillator
The DDPM-based method exhibits significant advantages over normalizing flows and Christoffel-function polynomial sublevel sets for the Duffing oscillator. The DDPM achieves IoU of 0.887 and precision of 0.918, with observed false negative rate (FNR) 0.08% (well below the α=0.10% calibration), outperforming baselines which are limited either by mode collapse (normalizing flows) or polynomial degree truncation artifacts (Christoffel). The PAC geometric bound ensures the actual missed volume remains uniformly negligible across time.
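For reference, the three reported metrics can be computed from boolean membership masks as sketched below. How the paper evaluates them (for example on a grid or on held-out trajectory samples) is not stated here, so the mask-based setup and the function name `set_metrics` are assumptions.

```python
import numpy as np

def set_metrics(pred_mask, true_mask):
    """IoU, precision, and false negative rate of a predicted set against
    a reference set, both given as boolean masks over the same test points."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = int(np.sum(pred & true))    # reachable and predicted reachable
    fp = int(np.sum(pred & ~true))   # predicted reachable but not reachable
    fn = int(np.sum(~pred & true))   # reachable but excluded (a miss)
    return {
        "iou": tp / (tp + fp + fn),
        "precision": tp / (tp + fp),
        "fnr": fn / (tp + fn),
    }

# Six hypothetical test points: four reachable, one missed, one false alarm.
true_mask = [True, True, True, True, False, False]
pred_mask = [True, True, True, False, True, False]
metrics = set_metrics(pred_mask, true_mask)
```

The sketch does not guard against empty sets, so a caller would need to handle zero denominators. A low FNR with high precision and IoU corresponds to a set that rarely excludes reachable states and is not overly conservative.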
Figure 2: Predicted reachable sets for the Duffing oscillator at six time steps, visualizing fine-structure captured by the diffusion approach.
The ablation study shows that set tightness improves monotonically with more data and larger models, while coverage (FNR) remains invariant across the model sweep.
Figure 3: Ablation visualizations at k=149, highlighting prediction gains with more data and larger model capacities.
Planar Quadrotor
On the 6D quadrotor model, the DDPM approach, projected onto (x,h) space at t=5.0, achieves IoU 0.827, tightly concentrating on the reachable region with an empirical miss rate of 0.091%. Christoffel-function polynomials (degree 4) underfit the set boundaries and cannot resolve finer geometric features even as sample complexity is relaxed.
Figure 4: Predicted reachable sets for the planar quadrotor projected onto (x,h) at t=5.0, comparing DDPM and polynomial approaches.
High-dimensional Reaction-Diffusion System
In the 8192-dimensional Gray–Scott system, the DDPM generalizes: it keeps the empirical FNR below the target risk level α, and is verified against the PAC coverage criterion over multiple random calibration/evaluation splits. The VAE baseline, nearly the only comparably scalable alternative, produces looser sets, with a much more gradual drop in acceptance rate under systematic perturbations.
Figure 5: Acceptance rate under input perturbation, demonstrating sharper DDPM selectivity compared to VAE and enhanced sensitivity to out-of-support states.
High-dimensional score distributions confirm the calibrated threshold robustly discriminates clean in-support, noisy, and out-of-distribution samples. The diffusion-based score provides substantially more pronounced separation than VAE-based negative ELBO.
Implications and Future Directions
This work bridges generative modeling and conformal prediction, allowing rigorous, data-driven reachability analysis with explicit PAC guarantees even in previously intractable high-dimensional regimes. The replacement of surrogate model error bounds with direct density-level thresholding, combined with distribution-free statistical calibration, leads to both practical precision and robust, interpretable guarantees.
Strong numerical results include:
- Tightest reachable sets among all tested approaches, with empirical miss rates strictly below the specified PAC risk level for both the Duffing system and the high-dimensional PDE.
- Demonstrated scalability to state spaces that exceed the limits of grid-based (curse-of-dimensionality) and polynomial methods.
Practically, this implies tractable, quantifiable safety verification for complex nonlinear and high-dimensional systems from data alone, with minimal user heuristic tuning. For theory, the connection between diffusion model likelihood and conformal risk control suggests new avenues for optimal support estimation.
Further research directions include:
- Instance-adaptive refinement of score functions,
- Efficient algorithms for online or streaming calibration,
- Tighter union corrections beyond the familywise LTT method,
- Extensions to non-autonomous or partially observed systems,
- Formal robustness guarantees under trajectory or measurement noise.
Conclusion
The presented framework realizes principled data-driven reachability analysis by coupling diffusion models and conformal risk control. It achieves PAC-calibrated guarantees, precise set estimates, and robustness to uncertainty or high dimensionality. The approach is supported by theoretical reductions and strong empirical validation across system classes. Its modularity and rigor position it as a general tool for safety verification and probabilistic control in scenarios where analytic models are absent or intractable.
Reference:
"Data-Driven Reachability Analysis via Diffusion Models with PAC Guarantees" (2604.00283)