Diffusion Maps Kernel Ridge Regression
- DM-KRR is a kernel method that integrates diffusion maps to encode intrinsic data geometry and correct sampling bias in high-dimensional dynamical systems.
- It enhances long-term prediction by adapting the kernel to the manifold structure, thereby improving sample efficiency and forecasting stability.
- Empirical evaluations show DM-KRR outperforms conventional methods in accuracy and robustness across nonlinear ODEs and chaotic PDEs.
Diffusion Maps Kernel Ridge Regression (DM-KRR) is a kernel-based framework for learning solution operators of high-dimensional dynamical systems by incorporating data-driven diffusion geometry into kernel ridge regression. DM-KRR is designed to address the geometric and sampling challenges encountered when learning long-term dynamics, particularly when system trajectories are constrained to an invariant set, such as a smooth manifold or fractal attractor. By utilizing a diffusion maps kernel that encodes the intrinsic geometry and local sampling density of the data, DM-KRR achieves both higher accuracy and sample efficiency relative to conventional isotropic kernels. Empirical evaluations across a range of nonlinear ODEs and PDEs demonstrate that DM-KRR consistently outperforms random-feature, neural-network, and operator-learning alternatives in long-horizon forecasting and solution operator approximation (Song et al., 19 Dec 2025).
1. Motivation and Methodological Overview
Many scientific and engineering systems governed by high-dimensional ordinary or partial differential equations exhibit dynamics that are concentrated on low-dimensional invariant sets. These sets may be smooth manifolds or fractal attractors and are typically unknown a priori. Standard KRR with isotropic radial basis function kernels does not account for this intrinsic geometry or nonuniform sampling density, resulting in inefficient learning and poor long-term predictions.
DM-KRR integrates the diffusion maps (DM) algorithm to construct a data-driven kernel aligned with the underlying geometry. This approach adapts to local density variations, removes sampling bias through normalization, and leverages eigenfunctions of the graph Laplacian to approximate the heat kernel on the invariant set. The fundamental regression map for approximating the time-$\tau$ solution operator $F_\tau$ is parameterized as

$$\hat{F}_\tau(x) \;=\; \sum_{i=1}^{N} c_i\, k_{\mathrm{DM}}(x, x_i),$$

where $k_{\mathrm{DM}}$ is the DM kernel and $c_i$ are coefficient vectors. KRR solves for the coefficients via minimization of the regularized least-squares error. This can be used directly for one-step prediction (direct estimator, targets $y_i = x_{i+1}$) or on skip-connection targets (e.g., $y_i = x_{i+1} - x_i$), with long-term forecasting performed by iterative rollouts of $\hat{F}_\tau$ (Song et al., 19 Dec 2025).
2. Construction of the Diffusion Maps Kernel
The DM kernel is constructed to respect the invariance and geometry of the data as follows:
- Affinity (Unnormalized): Compute the Gaussian affinity $W_{ij} = \exp\big(-\|x_i - x_j\|^2/\epsilon\big)$ from all pairs in the dataset $\{x_i\}_{i=1}^{N}$.
- Density Normalization: Estimate the local density $q_i = \sum_j W_{ij}$. Normalize to obtain $\tilde{W}_{ij} = W_{ij}/(q_i\, q_j)$.
- Markov Normalization: Row-sum normalization yields the degrees $d_i = \sum_j \tilde{W}_{ij}$. Define the reversible Markov kernel $P_{ij} = \tilde{W}_{ij}/d_i$.
- Symmetrization: The final DM kernel is $k_{\mathrm{DM}}(x_i, x_j) = \tilde{W}_{ij}/\sqrt{d_i\, d_j}$.
In the limit $N \to \infty$, $\epsilon \to 0$, this kernel converges to the heat kernel on the underlying manifold.
An optional spectral expansion is available, where the kernel is expressed using only the leading $m$ eigenpairs, $k_m(x, y) = \sum_{l=1}^{m} \sigma_l\, \varphi_l(x)\, \varphi_l(y)$. This provides a low-rank approximation beneficial for scalability and the bias–variance trade-off.
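The four construction steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name `dm_kernel` and the data-matrix convention (rows are samples) are assumptions.

```python
import numpy as np

def dm_kernel(X, eps):
    """Sketch of the DM kernel: affinity -> density norm. -> degrees -> symmetrization."""
    # 1. Unnormalized Gaussian affinity W_ij = exp(-||x_i - x_j||^2 / eps).
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-np.maximum(D2, 0.0) / eps)   # clamp tiny negatives from round-off
    # 2. Density normalization: q_i = sum_j W_ij, then W~_ij = W_ij / (q_i q_j).
    q = W.sum(axis=1)
    W_tilde = W / np.outer(q, q)
    # 3. Row sums (degrees) of the density-normalized affinity.
    d = W_tilde.sum(axis=1)
    # 4. Symmetrization: K_ij = W~_ij / sqrt(d_i d_j).
    return W_tilde / np.sqrt(np.outer(d, d))
```

Because the final matrix is symmetric positive, its eigendecomposition directly yields the optional low-rank spectral expansion.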
3. Kernel Ridge Regression Formulation
Given training pairs $\{(x_i, y_i)\}_{i=1}^{N}$, the KRR objective seeks $f$ minimizing

$$\frac{1}{N}\sum_{i=1}^{N} \|y_i - f(x_i)\|^2 \;+\; \lambda\, \|f\|_{\mathcal{H}}^2 .$$

By the representer theorem, $f$ reduces to a finite sum over kernel evaluations at the training points. Let $K \in \mathbb{R}^{N \times N}$ denote the kernel Gram matrix, $K_{ij} = k_{\mathrm{DM}}(x_i, x_j)$, and $Y \in \mathbb{R}^{N \times d}$ the matrix of $d$-dimensional targets. For each output dimension, the closed-form solution is

$$C \;=\; (K + \lambda N I)^{-1} Y,$$

with predictions given by $\hat{f}(x) = \sum_{i=1}^{N} k_{\mathrm{DM}}(x, x_i)\, C_i$.
For complex systems, rollout prediction is achieved by recursively applying $\hat{F}_\tau$ to propagate the state forward over time.
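The closed-form fit and the iterative rollout can be sketched as below. This is an illustrative toy, not the paper's code: out-of-sample evaluation of the DM kernel requires a Nyström-style extension, so a plain Gaussian kernel stands in for `kernel` here, and the system learned is a simple linear contraction.

```python
import numpy as np

def krr_fit(K, Y, lam):
    """Closed-form KRR coefficients C = (K + lam*N*I)^{-1} Y."""
    N = K.shape[0]
    return np.linalg.solve(K + lam * N * np.eye(N), Y)

def rollout(x0, X_train, C, kernel, steps, skip=True):
    """Iterate x -> x + f(x) (skip-connection) or x -> f(x) (direct)."""
    x, traj = x0, [x0]
    for _ in range(steps):
        k_row = kernel(x, X_train)       # k(x, x_i) against all training points
        fx = k_row @ C                   # predicted target at the current state
        x = x + fx if skip else fx
        traj.append(x)
    return np.array(traj)

# Toy usage: learn x_{t+1} = 0.9 x via skip targets y_i = x_{i+1} - x_i = -0.1 x_i.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
Y = -0.1 * X
gauss = lambda x, Xt: np.exp(-np.sum((Xt - x)**2, axis=1) / 0.5)  # stand-in kernel
K = np.exp(-((X[:, None, :] - X[None, :, :])**2).sum(-1) / 0.5)
C = krr_fit(K, Y, 1e-6)
traj = rollout(np.array([0.5, -0.5]), X, C, gauss, steps=10)
```

The skip-connection variant keeps the identity map as the zero-coefficient baseline, which is what stabilizes rollouts for slowly varying states.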
4. Algorithmic Workflow and Hyperparameter Selection
DM-KRR proceeds according to the following steps:
- Data Preparation: Targets may be set as $y_i = x_{i+1}$ (direct) or $y_i = x_{i+1} - x_i$ (skip-connection) to enhance stability for non-stiff systems.
- DM Kernel Computation: Sequentially build the unnormalized affinity, density-normalized kernel, Markov-normalized kernel, and symmetrize.
- (Optional) Spectral Expansion: Compute the leading eigenpairs $(\sigma_l, \varphi_l)_{l=1}^{m}$; define $k_m(x, y) = \sum_{l=1}^{m} \sigma_l\, \varphi_l(x)\, \varphi_l(y)$ with $m \ll N$.
- Gram Matrix Formation and Training: Construct $K$ (or its low-rank approximation $K_m$) and solve $C = (K + \lambda N I)^{-1} Y$.
- Prediction (Rollout): Generate long-term trajectories by successive applications of $\hat{F}_\tau$, either adding skip-connection outputs or substituting direct predictions.
- Hyperparameter Selection:
- Perform a random search over logarithmic ranges of $\epsilon$ (bandwidth), $\lambda$ (regularization), and $m$ (spectral rank, if used).
- For smooth manifolds, select by trajectory RMSE: $\mathrm{RMSE} = \big(\tfrac{1}{T}\sum_{t=1}^{T} \|\hat{x}_t - x_t\|^2\big)^{1/2}$.
- For chaotic attractors maximize valid prediction time (VPT), defined as the time until the normalized error exceeds a tolerance.
- Heuristic initialization involves estimating the intrinsic dimension and bandwidth from graph-integral scaling and setting $\epsilon$, $m$ accordingly.
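The random-search step can be sketched as follows. The search ranges, trial count, and the `score_fn` callback (e.g. validation trajectory RMSE, or negated VPT for chaotic systems) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def random_search(score_fn, n_trials=32, seed=0,
                  eps_range=(1e-3, 1e1), lam_range=(1e-10, 1e-2)):
    """Random search over log-spaced (eps, lam); returns (best_score, (eps, lam))."""
    rng = np.random.default_rng(seed)
    best = (np.inf, None)
    for _ in range(n_trials):
        # Sample each hyperparameter uniformly in log10 space.
        eps = 10 ** rng.uniform(np.log10(eps_range[0]), np.log10(eps_range[1]))
        lam = 10 ** rng.uniform(np.log10(lam_range[0]), np.log10(lam_range[1]))
        score = score_fn(eps, lam)   # lower is better (e.g. validation RMSE)
        if score < best[0]:
            best = (score, (eps, lam))
    return best
```

The spectral rank $m$, when used, would be sampled the same way over an integer log range.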
5. Theoretical Properties and Insights
The DM kernel converges to the heat kernel on the invariant set as sampling becomes dense and the kernel bandwidth vanishes. The RKHS associated with $k_{\mathrm{DM}}$ approximates the span of the leading Laplace–Beltrami eigenfunctions, up to the smoothing bandwidth. This results in superior targeting of manifold-constrained functions and mitigates extrapolation into unsupported ambient regions.
For any continuous kernel on a compact invariant set, the RKHS is dense in $L^2$, so expressivity alone does not distinguish kernels. The DM kernel weights the eigenmodes as $\sigma_l \approx e^{-\epsilon \mu_l}$ (with $\mu_l$ the Laplacian eigenvalues), leading to an advantageous bias–variance trade-off relative to generic kernels. The normalization within DM-KRR compensates for variable sampling density, providing robustness across heterogeneous datasets (Song et al., 19 Dec 2025).
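The bias–variance claim can be made concrete via the standard spectral-filtering form of the KRR fit; the notation below is illustrative ($\varphi_l$ the DM-kernel eigenvectors, $\sigma_l$ its eigenvalues, $\mu_l$ the Laplacian eigenvalues):

$$\hat{Y} \;=\; K\,(K + \lambda N I)^{-1} Y \;=\; \sum_{l=1}^{N} \frac{\sigma_l}{\sigma_l + \lambda N}\; \varphi_l\, \varphi_l^{\top} Y, \qquad \sigma_l \approx e^{-\epsilon \mu_l},$$

so each mode is shrunk by the factor $\sigma_l/(\sigma_l + \lambda N)$: rough modes (large $\mu_l$) are suppressed exponentially fast, while smooth modes pass through nearly unchanged.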
6. Empirical Evaluation and Benchmarks
DM-KRR was evaluated across several dynamical systems with varying intrinsic and ambient dimensions, sample sizes, and geometry (manifolds, chaotic attractors, high-dimensional flows):
| System | Ambient dim. | Samples N | Metric | DM-KRR Outcome | RBF-KRR Outcome | Other Baselines |
|---|---|---|---|---|---|---|
| Torus rotation | 3, 7, 15 | 1K–8K | RMSE | ≈10× lower error | Higher error | n/a |
| Lorenz-63 (chaotic) | 3 | 512–4096 | VPT (Lyap. time) | 11–14.1 (±0.5) | 8.9–13.5 (±0.9) | DeepSkip (RF): VPT ∼12 (50K points) |
| KS PDE (chaotic) | 64 | 2K–16K | VPT | 0.86–4.98 (±0.2) | 0.79–4.31 (±0.3) | n/a |
| KS (travelling wave) | 64 | 3K (skip) | RMSE | 5–6 orders of magnitude lower | Higher error | GMKRR, NODE, LDNet |
| Pitch–plunge flow | PCA | traj. | WNRMSE | 10% error, stable | Diverges | ResDMD: stable, 30% worse |
Key findings include:
- For Lorenz-63, DM-KRR surpasses random-feature methods trained on an order of magnitude more data.
- On travelling-wave dynamics, DM-KRR achieves RMSE reductions by $5$–$6$ orders of magnitude compared to neural operator and operator-valued kernel methods, even with specialized preprocessing by those baselines.
- On high-dimensional turbulent fluid flows (e.g., pitch–plunge), DM-KRR maintains stable error, while conventional KRR diverges.
7. Summary and Practical Implications
DM-KRR provides a methodology for learning solution operators that is robust to unknown geometry and heterogeneous sampling, without requiring explicit manifold reconstruction or attractor encoding. Its core advantages derive from treating the data's diffusion geometry as a first-class object: normalization steps correct sampling bias, and spectral alignment matches the heat semigroup on the underlying set. The method is implemented as a direct modification to KRR, retaining algorithmic simplicity while achieving major empirical improvements in stringent long-term prediction settings.
A plausible implication is that respecting the geometric constraints of data distributions may set a new standard for both forecasting skill and data efficiency, especially in complex, high-dimensional, or chaotic dynamical systems where prevailing operator-learning paradigms encounter substantial degradation (Song et al., 19 Dec 2025).