Diffusion Maps Kernel Ridge Regression
- DM-KRR is a kernel method that integrates diffusion maps to encode intrinsic data geometry and correct sampling bias in high-dimensional dynamical systems.
- It enhances long-term prediction by adapting the kernel to the manifold structure, thereby improving sample efficiency and forecasting stability.
- Empirical evaluations show DM-KRR outperforms conventional methods in accuracy and robustness across nonlinear ODEs and chaotic PDEs.
Diffusion Maps Kernel Ridge Regression (DM-KRR) is a kernel-based framework for learning solution operators of high-dimensional dynamical systems by incorporating data-driven diffusion geometry into kernel ridge regression. DM-KRR is designed to address the geometric and sampling challenges encountered when learning long-term dynamics, particularly when system trajectories are constrained to an invariant set, such as a smooth manifold or fractal attractor. By utilizing a diffusion maps kernel that encodes the intrinsic geometry and local sampling density of the data, DM-KRR achieves both higher accuracy and sample efficiency relative to conventional isotropic kernels. Empirical evaluations across a range of nonlinear ODEs and PDEs demonstrate that DM-KRR consistently outperforms random-feature, neural-network, and operator-learning alternatives in long-horizon forecasting and solution operator approximation (Song et al., 19 Dec 2025).
1. Motivation and Methodological Overview
Many scientific and engineering systems governed by high-dimensional ordinary or partial differential equations exhibit dynamics that are concentrated on low-dimensional invariant sets. These sets may be smooth manifolds or fractal attractors and are typically unknown a priori. Standard KRR with isotropic radial basis function kernels does not account for this intrinsic geometry or nonuniform sampling density, resulting in inefficient learning and poor long-term predictions.
DM-KRR integrates the diffusion maps (DM) algorithm to construct a data-driven kernel aligned with the underlying geometry. This approach adapts to local density variations, removes sampling bias through normalization, and leverages eigenfunctions of the graph Laplacian to approximate the heat kernel on the invariant set. The fundamental regression map for approximating the time-$\tau$ solution operator $F_\tau$ is parameterized as

$$\hat{F}_\tau(x) \;=\; \sum_{i=1}^{N} c_i\, k_{\mathrm{DM}}(x, x_i),$$

where $k_{\mathrm{DM}}$ is the DM kernel and $c_i$ are coefficient vectors. KRR solves for the coefficients via minimization of the regularized least-squares error. This can be used directly for one-step prediction (direct estimator, targets $y_i = x_{i+1}$) or on skip-connection targets (e.g., $y_i = x_{i+1} - x_i$), with long-term forecasting performed by iterative rollouts of $\hat{F}_\tau$ (Song et al., 19 Dec 2025).
2. Construction of the Diffusion Maps Kernel
The DM kernel is constructed to respect the invariance and geometry of the data as follows:
- Affinity (Unnormalized): Compute the Gaussian affinity $W_{ij} = \exp\big(-\|x_i - x_j\|^2/\epsilon\big)$ from all pairs in the dataset $\{x_i\}_{i=1}^{N}$.
- Density Normalization: Estimate the local density $q_i = \sum_j W_{ij}$. Normalize to obtain $\tilde{W}_{ij} = W_{ij}/(q_i\, q_j)$.
- Markov Normalization: Row-sum normalization yields the degrees $d_i = \sum_j \tilde{W}_{ij}$. Define the reversible Markov kernel $P_{ij} = \tilde{W}_{ij}/d_i$.
- Symmetrization: The final DM kernel is $k_{\mathrm{DM}}(x_i, x_j) = \tilde{W}_{ij}/\sqrt{d_i\, d_j}$.
In the limit $N \to \infty$, $\epsilon \to 0$, this kernel converges to the heat kernel on the underlying manifold.
An optional spectral expansion is available, where the kernel is expressed using only the leading $m$ eigenpairs, $k_m(x, y) = \sum_{l=1}^{m} \sigma_l\, \varphi_l(x)\, \varphi_l(y)$. This provides a low-rank approximation beneficial for scalability and the bias–variance trade-off.
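The four construction steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name `dm_kernel` and the data-matrix convention (rows are samples) are assumptions.

```python
import numpy as np

def dm_kernel(X, eps):
    """Sketch of the DM kernel: affinity -> density norm. -> degrees -> symmetrization."""
    # 1. Unnormalized Gaussian affinity W_ij = exp(-||x_i - x_j||^2 / eps).
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-np.maximum(D2, 0.0) / eps)   # clamp tiny negatives from round-off
    # 2. Density normalization: q_i = sum_j W_ij, then W~_ij = W_ij / (q_i q_j).
    q = W.sum(axis=1)
    W_tilde = W / np.outer(q, q)
    # 3. Row sums (degrees) of the density-normalized affinity.
    d = W_tilde.sum(axis=1)
    # 4. Symmetrization: K_ij = W~_ij / sqrt(d_i d_j).
    return W_tilde / np.sqrt(np.outer(d, d))
```

Because the final matrix is symmetric positive, its eigendecomposition directly yields the optional low-rank spectral expansion.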
3. Kernel Ridge Regression Formulation
Given training pairs $\{(x_i, y_i)\}_{i=1}^{N}$, the KRR objective seeks $f$ minimizing

$$\frac{1}{N}\sum_{i=1}^{N} \|y_i - f(x_i)\|^2 \;+\; \lambda\, \|f\|_{\mathcal{H}}^2 .$$

By the representer theorem, $f$ reduces to a finite sum over kernel evaluations at the training points. Let $K \in \mathbb{R}^{N \times N}$ denote the kernel Gram matrix, $K_{ij} = k_{\mathrm{DM}}(x_i, x_j)$, and $Y \in \mathbb{R}^{N \times d}$ the matrix of $d$-dimensional targets. For each output dimension, the closed-form solution is

$$C \;=\; (K + \lambda N I)^{-1} Y,$$

with predictions given by $\hat{f}(x) = \sum_{i=1}^{N} k_{\mathrm{DM}}(x, x_i)\, C_i$.
For complex systems, rollout prediction is achieved by recursively applying $\hat{F}_\tau$ to propagate the state forward over time.
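The closed-form fit and the iterative rollout can be sketched as below. This is an illustrative toy, not the paper's code: out-of-sample evaluation of the DM kernel requires a Nyström-style extension, so a plain Gaussian kernel stands in for `kernel` here, and the system learned is a simple linear contraction.

```python
import numpy as np

def krr_fit(K, Y, lam):
    """Closed-form KRR coefficients C = (K + lam*N*I)^{-1} Y."""
    N = K.shape[0]
    return np.linalg.solve(K + lam * N * np.eye(N), Y)

def rollout(x0, X_train, C, kernel, steps, skip=True):
    """Iterate x -> x + f(x) (skip-connection) or x -> f(x) (direct)."""
    x, traj = x0, [x0]
    for _ in range(steps):
        k_row = kernel(x, X_train)       # k(x, x_i) against all training points
        fx = k_row @ C                   # predicted target at the current state
        x = x + fx if skip else fx
        traj.append(x)
    return np.array(traj)

# Toy usage: learn x_{t+1} = 0.9 x via skip targets y_i = x_{i+1} - x_i = -0.1 x_i.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
Y = -0.1 * X
gauss = lambda x, Xt: np.exp(-np.sum((Xt - x)**2, axis=1) / 0.5)  # stand-in kernel
K = np.exp(-((X[:, None, :] - X[None, :, :])**2).sum(-1) / 0.5)
C = krr_fit(K, Y, 1e-6)
traj = rollout(np.array([0.5, -0.5]), X, C, gauss, steps=10)
```

The skip-connection variant keeps the identity map as the zero-coefficient baseline, which is what stabilizes rollouts for slowly varying states.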
4. Algorithmic Workflow and Hyperparameter Selection
DM-KRR proceeds according to the following steps:
- Data Preparation: Targets may be set as $y_i = x_{i+1}$ (direct) or $y_i = x_{i+1} - x_i$ (skip-connection) to enhance stability for non-stiff systems.
- DM Kernel Computation: Sequentially build the unnormalized affinity, density-normalized kernel, Markov-normalized kernel, and symmetrize.
- (Optional) Spectral Expansion: Compute the leading eigenpairs $(\sigma_l, \varphi_l)_{l=1}^{m}$; define $k_m(x, y) = \sum_{l=1}^{m} \sigma_l\, \varphi_l(x)\, \varphi_l(y)$ with $m \ll N$.
- Gram Matrix Formation and Training: Construct $K$ (or its low-rank approximation $K_m$) and solve $C = (K + \lambda N I)^{-1} Y$.
- Prediction (Rollout): Generate long-term trajectories by successive applications of $\hat{F}_\tau$, either adding skip-connection outputs or substituting direct predictions.
- Hyperparameter Selection:
- Perform a random search over logarithmic ranges of $\epsilon$ (bandwidth), $\lambda$ (regularization), and $m$ (spectral rank, if used).
- For smooth manifolds, select by trajectory RMSE: $\mathrm{RMSE} = \big(\tfrac{1}{T}\sum_{t=1}^{T} \|\hat{x}_t - x_t\|^2\big)^{1/2}$.
- For chaotic attractors maximize valid prediction time (VPT), defined as the time until the normalized error exceeds a tolerance.
- Heuristic initialization involves estimating the intrinsic dimension and bandwidth from graph-integral scaling and setting $\epsilon$, $m$ accordingly.
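The random-search step can be sketched as follows. The search ranges, trial count, and the `score_fn` callback (e.g. validation trajectory RMSE, or negated VPT for chaotic systems) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def random_search(score_fn, n_trials=32, seed=0,
                  eps_range=(1e-3, 1e1), lam_range=(1e-10, 1e-2)):
    """Random search over log-spaced (eps, lam); returns (best_score, (eps, lam))."""
    rng = np.random.default_rng(seed)
    best = (np.inf, None)
    for _ in range(n_trials):
        # Sample each hyperparameter uniformly in log10 space.
        eps = 10 ** rng.uniform(np.log10(eps_range[0]), np.log10(eps_range[1]))
        lam = 10 ** rng.uniform(np.log10(lam_range[0]), np.log10(lam_range[1]))
        score = score_fn(eps, lam)   # lower is better (e.g. validation RMSE)
        if score < best[0]:
            best = (score, (eps, lam))
    return best
```

The spectral rank $m$, when used, would be sampled the same way over an integer log range.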
5. Theoretical Properties and Insights
The DM kernel converges to the heat kernel on the invariant set as sampling becomes dense and the kernel bandwidth vanishes. The RKHS associated with $k_{\mathrm{DM}}$ approximates the span of the leading Laplace–Beltrami eigenfunctions, up to the smoothing bandwidth. This results in superior targeting of manifold-constrained functions and mitigates extrapolation into unsupported ambient regions.
For any continuous kernel on a compact invariant set, the RKHS is dense in $L^2$, so expressivity alone does not distinguish kernels. The DM kernel weights the eigenmodes as $\sigma_l \approx e^{-\epsilon \mu_l}$ (with $\mu_l$ the Laplacian eigenvalues), leading to an advantageous bias–variance trade-off relative to generic kernels. The normalization within DM-KRR compensates for variable sampling density, providing robustness across heterogeneous datasets (Song et al., 19 Dec 2025).
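The bias–variance claim can be made concrete via the standard spectral-filtering form of the KRR fit; the notation below is illustrative ($\varphi_l$ the DM-kernel eigenvectors, $\sigma_l$ its eigenvalues, $\mu_l$ the Laplacian eigenvalues):

$$\hat{Y} \;=\; K\,(K + \lambda N I)^{-1} Y \;=\; \sum_{l=1}^{N} \frac{\sigma_l}{\sigma_l + \lambda N}\; \varphi_l\, \varphi_l^{\top} Y, \qquad \sigma_l \approx e^{-\epsilon \mu_l},$$

so each mode is shrunk by the factor $\sigma_l/(\sigma_l + \lambda N)$: rough modes (large $\mu_l$) are suppressed exponentially fast, while smooth modes pass through nearly unchanged.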
6. Empirical Evaluation and Benchmarks
DM-KRR was evaluated across several dynamical systems with varying intrinsic and ambient dimensions, sample sizes, and geometry (manifolds, chaotic attractors, high-dimensional flows):
| System | Ambient dim. | Samples N | Metric | DM-KRR Outcome | RBF-KRR Outcome | Other Baselines |
|---|---|---|---|---|---|---|
| Torus rotation | 3, 7, 15 | 1K–8K | RMSE | ≈10× lower error | Higher error | n/a |
| Lorenz-63 (chaotic) | 3 | 512–4096 | VPT (Lyap. time) | 11–14.1 (±0.5) | 8.9–13.5 (±0.9) | DeepSkip (RF): VPT ∼12 (50K points) |
| KS PDE (chaotic) | 64 | 2K–16K | VPT | 0.86–4.98 (±0.2) | 0.79–4.31 (±0.3) | n/a |
| KS (travelling wave) | 64 | 3K (skip) | RMSE | 5–6 orders of magnitude lower | Higher error | GMKRR, NODE, LDNet |
| Pitch–plunge flow | PCA | traj. | WNRMSE | 10% error, stable | Diverges | ResDMD: stable, 30% worse |
Key findings include:
- For Lorenz-63, DM-KRR surpasses random-feature methods trained on an order of magnitude more data.
- On travelling-wave dynamics, DM-KRR achieves RMSE reductions by $5$–$6$ orders of magnitude compared to neural operator and operator-valued kernel methods, even with specialized preprocessing by those baselines.
- On high-dimensional turbulent fluid flows (e.g., pitch–plunge), DM-KRR maintains stable error, while conventional KRR diverges.
7. Summary and Practical Implications
DM-KRR provides a methodology for learning solution operators that is robust to unknown geometry and heterogeneous sampling, without requiring explicit manifold reconstruction or attractor encoding. Its core advantages derive from treating the data's diffusion geometry as a first-class object: normalization steps correct sampling bias, and spectral alignment matches the heat semigroup on the underlying set. The method is implemented as a direct modification to KRR, retaining algorithmic simplicity while achieving major empirical improvements in stringent long-term prediction settings.
A plausible implication is that respecting the geometric constraints of data distributions may set a new standard for both forecasting skill and data efficiency, especially in complex, high-dimensional, or chaotic dynamical systems where prevailing operator-learning paradigms encounter substantial degradation (Song et al., 19 Dec 2025).