
Gaussian Kernel Feature Space

Updated 28 January 2026
  • The feature space of a Gaussian kernel is the infinite-dimensional reproducing kernel Hilbert space (RKHS) induced by the kernel, enabling inner-product representations and universal approximation.
  • Various constructions including Mercer, Hermite, Fourier, and Segal–Bargmann expansions provide distinct analytical frameworks based on eigen-decompositions and spectral methods.
  • Practical techniques like random Fourier features and explicit finite-dimensional mappings offer efficient approximations and implementations in kernel-based statistical learning.

A Gaussian kernel is a symmetric, positive-definite function of the form

$$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right), \quad x, y \in \mathbb{R}^d$$

or, more generally, defined on a real separable Hilbert space $\mathcal{H}$. The feature space corresponding to a Gaussian kernel is the Hilbert space into which data are mapped such that $k(x, y)$ equals the inner product between the mapped points. This feature space is typically infinite-dimensional, and its structure underlies the expressive power, universality, and practical implementation of kernel methods in statistical learning, Gaussian processes, SVMs, and related fields. Several analytically distinct but isometrically equivalent constructions capture the geometry and analytic regularity of the Gaussian kernel's feature space across finite- and infinite-dimensional domains.

1. Reproducing Kernel Hilbert Space of the Gaussian Kernel

Let $\mathcal{H}$ be a real separable Hilbert space. The Gaussian kernel $k(x, y) = \exp(-\|x-y\|^2/(2\sigma^2))$ is positive definite on $\mathcal{H}$. By Aronszajn's theorem, there exists a unique reproducing kernel Hilbert space (RKHS) $\mathcal{H}_k$ consisting of functions $f:\mathcal{H}\to\mathbb{R}$ with the following properties (Guella, 2020):

  • For each $y\in\mathcal{H}$, $k(\cdot, y) \in \mathcal{H}_k$
  • The reproducing property: $\forall f \in \mathcal{H}_k$, $f(y) = \langle f, k(\cdot, y) \rangle_{\mathcal{H}_k}$
  • The span of $\{k(\cdot, y) : y \in \mathcal{H}\}$ is dense in $\mathcal{H}_k$

Every $f \in \mathcal{H}_k$ admits a representation $f(\cdot) = \sum_{i=1}^\infty c_i k(\cdot, x_i)$ (converging in $\mathcal{H}_k$), with norm $\|f\|_{\mathcal{H}_k}^2 = \sum_{i,j} c_i c_j k(x_i, x_j)$. The feature map $\Phi: x \mapsto k(\cdot, x)$ embeds $\mathcal{H}$ into $\mathcal{H}_k$ so that $k(x, y) = \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}_k}$.
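These properties can be checked numerically on a finite sample. The sketch below (sample size, dimension, and bandwidth are arbitrary illustrative choices) builds a Gaussian Gram matrix, confirms it is positive semi-definite, and evaluates the RKHS norm $\|f\|_{\mathcal{H}_k}^2 = c^\top K c$ of a finite kernel expansion:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gram matrix of k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = gaussian_kernel(X, X)

# Positive definiteness: every eigenvalue of the Gram matrix is >= 0
# (up to floating-point roundoff).
eigs = np.linalg.eigvalsh(K)
assert eigs.min() > -1e-10

# RKHS norm of f = sum_i c_i k(., x_i) is c^T K c, which is never negative.
c = rng.normal(size=50)
assert c @ K @ c >= 0
```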

2. Mercer and Hermite Expansions: Explicit Feature Maps

On $\mathbb{R}^d$, Mercer's theorem applies to the Gaussian kernel and gives an eigen-decomposition

$$k(x,y) = \sum_{n=0}^\infty \lambda_n \varphi_n(x)\, \varphi_n(y).$$

The expansion is explicitly realized in terms of Hermite polynomials. In the one-dimensional case, with standard normal weight $\mu_0(dx)$ and orthonormal Hermite functions $\psi_n(x)$, the kernel admits

$$\exp\left(-\sigma^2(x-y)^2\right) = \sum_{n=0}^\infty \lambda_n \varphi_n(x)\, \varphi_n(y)$$

where the $\varphi_n$ are scaled Hermite functions and $\lambda_n$ depends explicitly on $\sigma^2$ (Gnewuch et al., 2021). In higher dimensions, the eigenfunctions and eigenvalues tensorize across input coordinates.

The associated feature map is

$$\Phi(x) = \left(\sqrt{\lambda_\beta}\, \Phi_\beta(x)\right)_{\beta \in \mathbb{N}_0^s} \in \ell^2(\mathbb{N}_0^s)$$

where $\Phi_\beta(x)$ is the tensor product of univariate Hermite functions. For general (possibly infinite) $s$, the maximal domain consists of sequences $(x_j)$ with $\sum_j \sigma_j^2 x_j^2 < \infty$, and the RKHS is the incomplete tensor product of the univariate spaces (Gnewuch et al., 2021).
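The rapid eigenvalue decay behind this expansion is easy to observe empirically: eigendecomposing a Gaussian Gram matrix (a numerical stand-in for the Mercer operator, with arbitrary sample and bandwidth choices rather than the analytic Hermite construction) shows that a low-rank spectral truncation already reproduces the kernel to high accuracy.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 1))
K = np.exp(-((x - x.T) ** 2) / 2.0)      # sigma = 1, illustrative

lam, U = np.linalg.eigh(K)               # ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]           # sort descending

r = 30                                   # truncation order
Phi = U[:, :r] * np.sqrt(np.clip(lam[:r], 0.0, None))  # 200 x r features

# Rank-30 features reproduce the full Gram matrix almost exactly,
# reflecting the near-geometric decay of the Gaussian kernel's spectrum.
err = np.abs(Phi @ Phi.T - K).max()
assert err < 1e-6
```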

3. Fourier (Bochner) Representations and Random Features

For shift-invariant kernels, Bochner's theorem yields a spectral representation

$$k(x,y) = \int_{\mathbb{R}^d} e^{i \omega \cdot (x-y)}\, \mu(d\omega).$$

For the Gaussian kernel, $\mu$ is a Gaussian measure with density $\mu(d\omega) = \left(\frac{\sigma^2}{2\pi}\right)^{d/2} e^{-\sigma^2 \|\omega\|^2/2}\, d\omega$ (Jorgensen et al., 2017). The (complex-valued) feature map is $\phi(x,\omega) = e^{i \omega \cdot x}$, and $\phi(x, \cdot)$ belongs to $L^2(\mu)$.

In practical applications, random Fourier features (RFF) approximate the Gaussian kernel by finite-dimensional real-valued feature maps (Ton et al., 2017). Drawing $w_i \sim \mathcal{N}(0,\sigma^{-2} I_d)$ and $b_i \sim \mathrm{Uniform}[0, 2\pi]$,

$$\phi(x) = \sqrt{\frac{2}{D}} \left( \cos(w_1^\top x + b_1), \ldots, \cos(w_D^\top x + b_D) \right)^\top$$

The inner products of the finite feature vectors converge to $k(x,y)$, with the approximation error (e.g., of the Gram matrix in spectral norm) decaying at rate $O(1/\sqrt{D})$ as $D\to\infty$ (Ton et al., 2017).
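A compact NumPy sketch of this construction (sample size, dimension, bandwidth, and number of features are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D, sigma = 100, 3, 10_000, 1.5
X = rng.normal(size=(n, d))

# Spectral samples: w_i ~ N(0, sigma^{-2} I_d), b_i ~ Uniform[0, 2*pi]
W = rng.normal(scale=1.0 / sigma, size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)
Phi = np.sqrt(2.0 / D) * np.cos(X @ W.T + b)   # n x D real feature matrix

# Compare the RFF inner products against the exact Gaussian Gram matrix
K_exact = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / (2 * sigma**2))
err = np.abs(Phi @ Phi.T - K_exact).max()
assert err < 0.1   # Monte Carlo error shrinks as O(1/sqrt(D))
```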

4. Polynomial and Maclaurin Expansions; Localized Feature Maps

Equivalently, the exponential can be expanded via its Maclaurin series (written here with amplitude $\sigma^2$ and lengthscale $\ell$),

$$k(x,y) = \sigma^2 \exp\left(-\frac{\|x\|^2 + \|y\|^2}{2\ell^2}\right) \sum_{n=0}^\infty \frac{(x\cdot y)^n}{n!\, \ell^{2n}}$$

yielding the "polynomial sketch" feature map, in which monomials are approximated by random projections (polynomial sketches) (Wacker et al., 2022). Localization, i.e., centering the feature map around the test point, cures the pathologies of naive Maclaurin truncation, yielding accurate, finite-dimensional local approximations that are especially effective in high-frequency (short-lengthscale) data regimes.
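In one dimension (dropping the $\sigma^2$ amplitude for simplicity), the series yields explicit features $\phi_n(x) = e^{-x^2/(2\ell^2)}\, x^n / (\ell^n \sqrt{n!})$; the sketch below (truncation order and evaluation points chosen arbitrarily) checks that their truncated inner product recovers the kernel near the origin, which is where naive truncation is accurate:

```python
import numpy as np
from math import factorial

def maclaurin_features(x, ell=1.0, order=20):
    """Truncated Maclaurin features phi_n(x) = exp(-x^2/(2 ell^2)) x^n / (ell^n sqrt(n!))."""
    n = np.arange(order)
    norm = np.sqrt(np.array([factorial(k) for k in n], dtype=float))
    return np.exp(-x**2 / (2 * ell**2)) * x**n / (ell**n * norm)

x, y, ell = 0.7, -0.4, 1.0
approx = maclaurin_features(x, ell) @ maclaurin_features(y, ell)
exact = np.exp(-(x - y) ** 2 / (2 * ell**2))
assert abs(approx - exact) < 1e-12   # accurate for points near the origin
```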

5. Finite-Dimensional Explicit Feature Maps for Finite Sets

For a finite dataset $\{x_1, \ldots, x_N\}$, an explicit, exact $N$-dimensional feature map can be constructed (Ghiasi-Shirazi et al., 2024). Defining the kernel matrix $K \in \mathbb{R}^{N \times N}$, its inverse square root $K^{-1/2}$, and $k_z = (k(x_1, z), \ldots, k(x_N, z))^\top$, one defines

$$\phi(z) = K^{-1/2} k_z$$

This map reproduces the kernel exactly between any pair $x, y$ where at least one argument is a training point: $\phi(x)^\top \phi(y) = k(x, y)$. The explicit feature representation enables kernel PCA and other linear methods to be implemented directly in the primal space.
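A minimal NumPy sketch of this construction (dataset, bandwidth, and sizes are illustrative; in practice a small ridge is often added to $K$ before inverting):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
sigma = 1.0

def k(A, B):
    return np.exp(-((A[:, None] - B[None, :]) ** 2).sum(-1) / (2 * sigma**2))

K = k(X, X)
lam, U = np.linalg.eigh(K)
lam = np.clip(lam, 1e-12, None)          # guard against roundoff negatives
K_inv_sqrt = U @ np.diag(lam ** -0.5) @ U.T

def phi(Z):
    """Explicit N-dimensional feature map phi(z) = K^{-1/2} k_z."""
    return (K_inv_sqrt @ k(X, Z)).T

# Exact reproduction whenever one argument is a training point:
P = phi(X)
assert np.allclose(P @ P.T, K, atol=1e-6)
```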

6. Fock and Segal–Bargmann Space Constructions

The Gaussian feature space can be realized as a Fock (Segal–Bargmann) space of entire functions $F: \mathbb{C}^d \to \mathbb{C}$,

$$\|F\|_{H_\sigma}^2 = \left(\frac{2}{\pi \sigma^2}\right)^d \int_{\mathbb{C}^d} |F(z)|^2\, e^{-2|z|^2 / \sigma^2}\, dA(z) < \infty$$

with an orthonormal basis of monomials and explicit reproducing kernel $K_F(z,w) = \exp(a\, z \cdot \bar{w})$, where $a = 2/\sigma^2$ and $\bar{w}$ denotes complex conjugation (Alpay et al., 2022). The Segal–Bargmann transform provides an isometric isomorphism between $L^2(\mathbb{R}^d)$ and $H_\sigma$. The Gaussian kernel is realized as an inner product in this space,

$$k_\sigma(x, y) = \langle \Phi_\sigma(x), \Phi_\sigma(y) \rangle_{H_\sigma}$$

with explicit Hermite-basis expansion.
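In the one-dimensional case, writing $a = 2/\sigma^2$ and $\bar{w}$ for complex conjugation, the monomial basis and the kernel expansion can be made fully explicit (a standard Fock-space computation, sketched here for concreteness):

$$e_n(z) = \sqrt{\frac{a^n}{n!}}\, z^n, \qquad K_F(z, w) = \sum_{n=0}^{\infty} e_n(z)\, \overline{e_n(w)} = \sum_{n=0}^{\infty} \frac{a^n (z\bar{w})^n}{n!} = \exp(a\, z \bar{w}),$$

so the reproducing property $F(w) = \langle F, K_F(\cdot, w) \rangle_{H_\sigma}$ follows term by term from the orthonormality of the $e_n$.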

7. Universality, ISPD, and Domain Extensions

The Gaussian kernel is universal, integrally strictly positive definite (ISPD), and $C_0$-universal on Hilbert spaces; $\mathcal{H}_k$ is dense in $C_0(\mathcal{H})$, ensuring that the feature-map embedding is rich enough to approximate all continuous functions vanishing at infinity (Guella, 2020). These properties ensure strong consistency and flexibility for statistical learning. Analogous constructions exist for Gaussian-type kernels on non-Euclidean domains, including hyperbolic spaces via Schoenberg's theorem and generalizations to conditionally negative definite (CND) kernels.
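Universality can be illustrated numerically: a Gaussian-kernel interpolant drives the maximum error on a smooth target essentially to zero (the target function, grid, bandwidth, and ridge level below are arbitrary illustrative choices):

```python
import numpy as np

x = np.linspace(-3, 3, 200)[:, None]
f = np.sin(3 * x[:, 0]) + 0.5 * np.cos(x[:, 0])   # generic continuous target

def k(A, B, s=0.5):
    return np.exp(-((A[:, None] - B[None, :]) ** 2).sum(-1) / (2 * s**2))

# Kernel interpolation with a tiny ridge for numerical stability
K = k(x, x)
alpha = np.linalg.solve(K + 1e-8 * np.eye(len(x)), f)
f_hat = K @ alpha

assert np.abs(f_hat - f).max() < 1e-2   # near-uniform approximation
```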

Table: Major Constructions of Gaussian Kernel Feature Spaces

| Construction Type | Space / Domain | Feature Map / Expansion |
| --- | --- | --- |
| Hermite/Mercer expansion | $\mathbb{R}^d$, $\mathcal{H}$ | Hermite polynomials via Mercer, $\ell^2$ |
| Bochner/Fourier representation | $\mathbb{R}^d$ | $L^2(\mu)$ (Fourier basis, RFF) |
| Polynomial/Maclaurin | $\mathbb{R}^d$ | Monomials/polynomial sketches, localized |
| Fock/Segal–Bargmann space | $\mathbb{C}^d$ | Entire holomorphic functions, Hermite basis |
| Explicit finite-dimensional | $N$-point subset | $N$-dimensional exact construction |

Each construction realizes the RKHS with different analytical and computational properties: the Hermite expansion gives orthonormal bases, the Bochner representation yields spectral sampling, polynomial/Maclaurin expansions facilitate sparse approximations, Fock spaces connect to complex analysis and quantum mechanics, and finite-dimensional explicit maps enable primal implementations over finite data sets.

