
Diffusion Maps Kernel Ridge Regression

Updated 27 December 2025
  • DM-KRR is a kernel method that integrates diffusion maps to encode intrinsic data geometry and correct sampling bias in high-dimensional dynamical systems.
  • It enhances long-term prediction by adapting the kernel to the manifold structure, thereby improving sample efficiency and forecasting stability.
  • Empirical evaluations show DM-KRR outperforms conventional methods in accuracy and robustness across nonlinear ODEs and chaotic PDEs.

Diffusion Maps Kernel Ridge Regression (DM-KRR) is a kernel-based framework for learning solution operators of high-dimensional dynamical systems by incorporating data-driven diffusion geometry into kernel ridge regression. DM-KRR is designed to address the geometric and sampling challenges encountered when learning long-term dynamics, particularly when system trajectories are constrained to an invariant set, such as a smooth manifold or fractal attractor. By utilizing a diffusion maps kernel that encodes the intrinsic geometry and local sampling density of the data, DM-KRR achieves both higher accuracy and sample efficiency relative to conventional isotropic kernels. Empirical evaluations across a range of nonlinear ODEs and PDEs demonstrate that DM-KRR consistently outperforms random-feature, neural-network, and operator-learning alternatives in long-horizon forecasting and solution operator approximation (Song et al., 19 Dec 2025).

1. Motivation and Methodological Overview

Many scientific and engineering systems governed by high-dimensional ordinary or partial differential equations exhibit dynamics that are concentrated on low-dimensional invariant sets. These sets may be smooth manifolds or fractal attractors and are typically unknown a priori. Standard KRR with isotropic radial basis function kernels does not account for this intrinsic geometry or nonuniform sampling density, resulting in inefficient learning and poor long-term predictions.

DM-KRR integrates the diffusion maps (DM) algorithm to construct a data-driven kernel aligned with the underlying geometry. This approach adapts to local density variations, removes sampling bias through normalization, and leverages eigenfunctions of the graph Laplacian to approximate the heat kernel on the invariant set. The fundamental regression map for approximating the time-$\Delta t$ solution operator $\varphi_{\Delta t}: x_n \mapsto x_{n+1}$ is parameterized as

$$f(x) = \sum_{i=1}^N \alpha_i K(x, x_i),$$

where $K$ is the DM kernel. KRR solves for the coefficients $\alpha$ by minimizing the regularized least-squares error. This can be used directly for one-step prediction (direct estimator) or on skip-connection targets (e.g., $y_i = x_{i+1} - x_i$), with long-term forecasting performed by iterative rollouts of $f$ (Song et al., 19 Dec 2025).

2. Construction of the Diffusion Maps Kernel

The DM kernel is constructed to respect the invariance and geometry of the data as follows:

  1. Affinity (Unnormalized): Compute the Gaussian affinity $\tilde{k}_\epsilon(x, y) = \exp(-\|x - y\|^2 / 4\epsilon)$ for all pairs in the dataset $X = \{x_i\}_{i=1}^N \subset \mathbb{R}^n$.
  2. Density Normalization: Estimate the local density $q_{\epsilon,N}(x_i) = \frac{1}{N} \sum_{j=1}^N \tilde{k}_\epsilon(x_i, x_j)$ and normalize to obtain $\hat{k}_{\epsilon,N}(x_i, x_j) = \tilde{k}_\epsilon(x_i, x_j) / [q_{\epsilon,N}(x_i)\, q_{\epsilon,N}(x_j)]$.
  3. Markov Normalization: Compute the row sums $\hat{q}_{\epsilon,N}(x_i) = \frac{1}{N} \sum_{j=1}^{N} \hat{k}_{\epsilon,N}(x_i, x_j)$ and define the reversible Markov kernel $k^{\text{DM}}_{\epsilon,N}(x_i, x_j) = \hat{k}_{\epsilon,N}(x_i, x_j) / \hat{q}_{\epsilon,N}(x_i)$.
  4. Symmetrization: The final DM kernel is

$$K_{\epsilon,N}(x_i, x_j) = [\hat{q}_{\epsilon,N}(x_i)\, \hat{q}_{\epsilon,N}(x_j)]^{-1/2}\, \hat{k}_{\epsilon,N}(x_i, x_j).$$

In the limit $N \to \infty$, $\epsilon \to 0$, this kernel converges to the heat kernel on the underlying manifold.
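The four normalization steps above can be sketched in a few lines of NumPy. This is a minimal illustration (the function name and the `(N, n)` data-matrix layout are assumptions, not the paper's implementation):

```python
import numpy as np

def dm_kernel(X, eps):
    """Steps 1-4: affinity, density normalization, Markov normalization, symmetrization."""
    # 1. Unnormalized Gaussian affinity with the exp(-d^2 / 4*eps) convention.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    k_tilde = np.exp(-d2 / (4.0 * eps))
    # 2. Divide out the kernel density estimate q_{eps,N} in both arguments.
    q = k_tilde.mean(axis=1)
    k_hat = k_tilde / np.outer(q, q)
    # 3. Row sums q_hat of the density-normalized kernel.
    q_hat = k_hat.mean(axis=1)
    # 4. Symmetric conjugation of the Markov kernel by q_hat^{1/2}.
    return k_hat / np.sqrt(np.outer(q_hat, q_hat))
```

The result is symmetric by construction (step 4 conjugates the row-normalized Markov kernel back into symmetric form), so standard KRR machinery applies directly.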

An optional spectral expansion is available, in which the kernel is expressed using only the leading $r$ eigenfunctions. This provides a low-rank approximation beneficial for scalability and the bias–variance trade-off.
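A small sketch of this rank-$r$ truncation, assuming the expansion is taken over the eigenpairs of the symmetric Gram matrix (the function name is illustrative):

```python
import numpy as np

def low_rank_kernel(K, r):
    """Rank-r spectral approximation K_r = sum_{i<=r} mu_i phi_i phi_i^T of a symmetric K."""
    mu, phi = np.linalg.eigh(K)                    # eigenvalues in ascending order
    mu, phi = mu[::-1][:r], phi[:, ::-1][:, :r]    # keep the leading r pairs
    return (phi * mu) @ phi.T, mu, phi
```

By the Eckart–Young theorem this is the best rank-$r$ approximation in spectral norm, with error equal to the $(r{+}1)$-th eigenvalue, which is what makes the truncation a controlled bias–variance knob.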

3. Kernel Ridge Regression Formulation

Given training pairs $(x_i, y_i)_{i=1}^N$, the KRR objective seeks $f \in \mathcal{H}_K$ minimizing

$$\min_{f \in \mathcal{H}_K} \sum_{i=1}^N \|f(x_i) - y_i\|^2 + \lambda \|f\|_{\mathcal{H}_K}^2.$$

By the representer theorem, $f$ reduces to a finite sum over the training points. Let $K$ denote the kernel Gram matrix $K_{ij} = K(x_i, x_j)$ and $Y \in \mathbb{R}^{m \times N}$ the matrix of $m$-dimensional targets. For each output dimension $j$, the closed-form solution is

$$\alpha^{(j)} = (K + \lambda I)^{-1} Y_j^T,$$

with predictions given by $f(x) = K(x, X)(K + \lambda I)^{-1} Y^T$.

For complex systems, rollout prediction is achieved by recursively applying $f$ to propagate the state forward over time.
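The closed-form fit and rollout can be sketched as follows. A plain Gaussian kernel stands in for the DM kernel, and the `(m, N)` target layout matches the formulation above; the function names and toy conventions are assumptions, not the paper's code:

```python
import numpy as np

def gauss_gram(A, B, eps):
    """Gram matrix exp(-||a - b||^2 / (4 eps)) between point sets A and B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (4.0 * eps))

def fit_krr(K, Y, lam):
    """Solve (K + lam*I) alpha = Y^T for all output dimensions at once."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), Y.T)  # shape (N, m)

def rollout(x0, X_train, alpha, eps, steps, skip=True):
    """Iterate f; with skip-connection targets the update is x <- x + f(x)."""
    x, traj = x0, [x0]
    for _ in range(steps):
        fx = (gauss_gram(x[None, :], X_train, eps) @ alpha)[0]
        x = x + fx if skip else fx
        traj.append(x)
    return np.array(traj)
```

For the direct estimator, `skip=False` replaces the state with the prediction at each step instead of incrementing it.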

4. Algorithmic Workflow and Hyperparameter Selection

DM-KRR proceeds according to the following steps:

  1. Data Preparation: Targets $y_i$ may be set as $x_{i+1}$ (direct) or $x_{i+1} - x_i$ (skip-connection) to enhance stability for non-stiff systems.
  2. DM Kernel Computation: Sequentially build the unnormalized affinity, density-normalized kernel, Markov-normalized kernel, and symmetrize.
  3. (Optional) Spectral Expansion: Compute the leading eigenpairs $(\mu_i, \phi_i)$; define $K_r(x, y) = \sum_{i=1}^r \mu_i \psi_i(x) \psi_i(y)$ with $\psi_i(x) = \sum_j \phi_i(j)\, k^{\text{DM}}_{\epsilon,N}(x, x_j)$.
  4. Gram Matrix Formation and Training: Construct $K$ (or its low-rank approximation $K_r$) and solve $(K + \lambda I)\alpha = Y^T$.
  5. Prediction (Rollout): Generate long-term trajectories by successive applications of $f$, either adding skip-connection outputs or substituting direct predictions.
  6. Hyperparameter Selection:
    • Perform a random search over logarithmic ranges of $\epsilon$ (bandwidth), $\lambda$ (regularization), and $r$ (spectral rank, if used).
    • For smooth manifolds, use the trajectory RMSE: $\text{RMSE} = \sqrt{\frac{1}{Tn} \sum_{t=1}^{T} \|\hat{x}_t - x_t\|^2}$.
    • For chaotic attractors, maximize the valid prediction time (VPT), defined as the time until the normalized error exceeds a tolerance.
    • Heuristic initialization involves estimating the intrinsic dimension and bandwidth from graph-integral scaling and setting $\epsilon^*$, $\lambda^*$ accordingly.
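A VPT score along the lines described above can be sketched as follows. The normalization by the trajectory's RMS magnitude and the default threshold are assumptions for illustration; the paper's exact convention may differ:

```python
import numpy as np

def vpt(pred, true, dt, theta=0.5):
    """Time until the normalized error first exceeds theta (else the full horizon).

    pred, true: arrays of shape (T, n); dt: time step between samples.
    """
    scale = np.sqrt(np.mean(np.sum(true ** 2, axis=1)))   # RMS trajectory magnitude
    err = np.linalg.norm(pred - true, axis=1) / scale
    bad = np.nonzero(err > theta)[0]
    return (bad[0] if bad.size else len(true)) * dt
```

For chaotic systems the returned time is typically reported in Lyapunov-time units, so larger VPT directly means more Lyapunov times of faithful forecasting.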

5. Theoretical Properties and Insights

The DM kernel converges to the heat kernel on the invariant set as sampling becomes dense and the kernel bandwidth vanishes. The RKHS associated with $K_\epsilon$ approximates the span of the Laplace–Beltrami eigenfunctions, up to the smoothing bandwidth. This results in superior targeting of manifold-constrained functions and mitigates extrapolation into unsupported ambient regions.

For any continuous kernel on a compact invariant set, the RKHS is dense in $L^2(\mu)$. The DM kernel optimally weights eigenvalues as $\mu_i = e^{-\epsilon \lambda_i}$ (with $\lambda_i$ the Laplacian eigenvalues), leading to an advantageous bias–variance trade-off relative to generic kernels. The normalization within DM-KRR compensates for variable sampling density, providing robustness across heterogeneous datasets (Song et al., 19 Dec 2025).
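In Mercer form, this eigenvalue weighting corresponds (in the heat-kernel limit, writing $\varphi_i$ for the Laplace–Beltrami eigenfunctions; a standard identity, stated here for intuition) to

```latex
K_\epsilon(x, y) = \sum_{i \ge 0} e^{-\epsilon \lambda_i}\, \varphi_i(x)\, \varphi_i(y),
\qquad
\|f\|_{\mathcal{H}_{K_\epsilon}}^2 = \sum_{i \ge 0} e^{\epsilon \lambda_i}\,
\langle f, \varphi_i \rangle_{L^2(\mu)}^2,
```

so the ridge penalty charges the component of $f$ along $\varphi_i$ at rate $e^{\epsilon \lambda_i}$: high-frequency components are strongly suppressed while smooth, manifold-aligned ones pass nearly untouched.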

6. Empirical Evaluation and Benchmarks

DM-KRR was evaluated across several dynamical systems with varying intrinsic and ambient dimensions, sample sizes, and geometry (manifolds, chaotic attractors, high-dimensional flows):

| System | Ambient dim. | Samples $N$ | Metric | DM-KRR outcome | RBF-KRR outcome | Other baselines |
|---|---|---|---|---|---|---|
| Torus rotation | 3, 7, 15 | 1K–8K | RMSE | $O(N^{-1/2})$, 10× lower error | Higher error | n/a |
| Lorenz-63 (chaotic) | 3 | 512–4096 | VPT (Lyapunov times) | 11–14.1 (±0.5) | 8.9–13.5 (±0.9) | DeepSkip (RF): VPT ≈ 12 (50K points) |
| KS PDE (chaotic) | 64 | 2K–16K | VPT | 0.86–4.98 (±0.2) | 0.79–4.31 (±0.3) | n/a |
| KS (travelling wave) | 64 | 3K (skip) | RMSE | $10^{-6}$ level | $10^{-5}$ level | GMKRR/RBF, NODE, LDNet: $10^{-1}$ to $1$ |
| Pitch–plunge flow | $1.8 \times 10^5 \to 81$ (PCA) | $12 \times 200$ traj. | WNRMSE | ~10% error, stable | Diverges | ResDMD: stable, 30% worse |

Key findings include:

  • For Lorenz-63, DM-KRR surpasses random-feature methods trained on an order of magnitude more data.
  • On travelling-wave dynamics, DM-KRR achieves RMSE reductions of $5$–$6$ orders of magnitude compared to neural-operator and operator-valued kernel methods, even when those baselines use specialized preprocessing.
  • On high-dimensional turbulent fluid flows (e.g., pitch–plunge), DM-KRR maintains stable error, while conventional KRR diverges.

7. Summary and Practical Implications

DM-KRR provides a methodology for learning solution operators that is robust to unknown geometry and heterogeneous sampling, without requiring explicit manifold reconstruction or attractor encoding. Its core advantages derive from treating the data's diffusion geometry as a first-class object: normalization steps correct sampling bias, and spectral alignment matches the heat semigroup on the underlying set. The method is implemented as a direct modification to KRR, retaining algorithmic simplicity while achieving major empirical improvements in stringent long-term prediction settings.

A plausible implication is that respecting the geometric constraints of data distributions may set a new standard for both forecasting skill and data efficiency, especially in complex, high-dimensional, or chaotic dynamical systems where prevailing operator-learning paradigms encounter substantial degradation (Song et al., 19 Dec 2025).
