
Scale-Mixture Representations for Isotropic Kernels

Updated 10 January 2026
  • Scale-mixture representations for isotropic kernels express positive-definite kernels as integrals over scale parameters, unifying classical RBFs with multiscale approaches.
  • The framework leverages Bochner’s and Schoenberg’s theorems to provide explicit constructions of reproducing kernel Hilbert spaces with minimal-decomposition norms.
  • Applications range from efficient random Fourier feature sampling in machine learning to multiscale image registration and neural network kernel limits.

A scale-mixture representation for isotropic kernels expresses a positive-definite (PD) kernel as an integral (or sum) over a parametric family of isotropic kernels, typically controlled by a scale parameter. This framework unifies classical descriptions of radial basis functions (RBFs), allows for multiscale modeling, and provides explicit constructions for both the kernel and the associated reproducing kernel Hilbert space (RKHS). Scale mixtures are central to topics ranging from machine learning via random Fourier features to image registration, and connect directly to foundational characterizations by Bochner and Schoenberg.

1. Fundamental Representation and Theoretical Framework

A function $k(\|x-y\|)$ on $\mathbb{R}^d \times \mathbb{R}^d$ is called an isotropic kernel if it depends only on the Euclidean distance between $x$ and $y$. Classical results (Bochner, Schoenberg) establish that any continuous, shift-invariant, positive-definite isotropic kernel is a scale mixture of basic kernel "atoms." Specifically, for a wide class of parameterized kernels $\varphi(r, s)$ and a finite nonnegative measure $\mu$ on $[0, \infty)$:

$$k(\|x-y\|) = \int_0^\infty \varphi(\|x-y\|, s)\, d\mu(s).$$

Typical choices include $\varphi(r, s) = \exp(-s r^2)$ (Gaussian atoms), yielding mixtures of Gaussians, and other forms yielding Matérn or compactly supported kernels (Hotz et al., 2012).
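As a concrete sanity check (a minimal sketch, not taken from the cited papers): taking the Gaussian atoms $\varphi(r,s) = \exp(-s r^2)$ and letting $\mu$ be a Gamma distribution with shape $\alpha$ and rate $2\alpha\ell^2$ reproduces the rational quadratic kernel, which can be verified by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, ell = 2.0, 1.5                     # rational-quadratic shape and lengthscale
r = np.array([0.0, 0.5, 1.0, 2.0, 4.0])  # distances at which to compare

# Atoms phi(r, s) = exp(-s r^2); mixing measure mu = Gamma(alpha, rate = 2*alpha*ell^2).
s = rng.gamma(shape=alpha, scale=1.0 / (2 * alpha * ell**2), size=200_000)
k_mixture = np.exp(-np.outer(r**2, s)).mean(axis=1)

# Closed form the mixture should reproduce: the rational quadratic kernel.
k_rq = (1 + r**2 / (2 * alpha * ell**2)) ** (-alpha)

print(np.max(np.abs(k_mixture - k_rq)))  # small Monte Carlo error
```

The same pattern works for any mixing measure: only the sampling law for `s` changes.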

These integrals produce kernels that are positive definite for any nonnegative measure $\mu$, since nonnegative linear combinations and integrals of PD kernels are themselves PD (Bruveris et al., 2011). The kernel $k$ is then the reproducing kernel of the image of a direct-integral Hilbert space of functions parameterized by $s$, with the RKHS norm given by a minimal-decomposition property:

$$\|f\|_H^2 = \inf \left\{ \int_0^\infty |a(s)|^2\, d\mu(s) \;:\; f(x) = \int_0^\infty a(s)\, \varphi(\|x\|, s)\, d\mu(s) \right\}.$$

2. Scale-Mixture Representations: Classical and Generalized Forms

Numerous kernel families admit scale-mixture representations as specific cases of the above framework:

  • Rational Quadratic: $(1 + r^2/(2\alpha \ell^2))^{-\alpha}$ is a scale mixture of Gaussians, with a mixing measure corresponding to an inverse-gamma distribution over the squared lengthscale (equivalently, a gamma distribution over the scale $s$ in $\exp(-s r^2)$) (Hotz et al., 2012).
  • Matérn Kernel: The Matérn family is expressible as

$$k(r) = \int_0^\infty \exp(-s r^2)\, w_{\mathrm{Mat}}(s)\, ds$$

with $w_{\mathrm{Mat}}(s)$ derived from the Bessel-function representation (Hotz et al., 2012; Langrené et al., 2024).

  • Generalized Cauchy and Exponential Power: For the generalized Cauchy kernel $(1 + \|u\|^\alpha)^{-\beta}$ and the exponential-power kernel $\exp(-\|u\|^p)$, the spectral (Bochner) densities also admit scale-mixture forms as integrals over Gaussians with a suitably chosen density $g(\tau)$ (Langrené et al., 2024).
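For $\nu = 1/2$ and $\ell = 1$ the Matérn kernel reduces to $\exp(-r)$, and in this special case the mixing density $w_{\mathrm{Mat}}$ has the explicit inverse-gamma form $w(s) = (2\sqrt{\pi})^{-1} s^{-3/2} e^{-1/(4s)}$. The identity can be checked by numerical quadrature (a sketch, with this special-case density hard-coded):

```python
import numpy as np
from scipy.integrate import quad

# For nu = 1/2, ell = 1 the Matern kernel is k(r) = exp(-r), and the Gaussian
# scale mixture k(r) = int_0^inf exp(-s r^2) w(s) ds holds with the
# inverse-gamma density w(s) = (2 sqrt(pi))^{-1} s^{-3/2} exp(-1/(4 s)).
def w_mat(s):
    return s ** (-1.5) * np.exp(-1.0 / (4.0 * s)) / (2.0 * np.sqrt(np.pi))

def k_mixture(r):
    val, _ = quad(lambda s: np.exp(-s * r**2) * w_mat(s), 0.0, np.inf)
    return val

for r in [0.25, 0.5, 1.0, 2.0]:
    print(r, k_mixture(r), np.exp(-r))  # the two columns agree
```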

Schoenberg’s theorem provides the most general characterization: any $O(d)$-invariant kernel on $\mathbb{R}^d$ can be written as an infinite series of radial kernels weighted by normalized Gegenbauer polynomials (zonal polynomials), with strictly positive-definite kernels characterized by conditions on the radial coefficients $\alpha_n^{(d)}$ (Benning et al., 27 Jun 2025). The scale-mixture form in the stationary case recovers classical “Gaussian mixtures,” with the kernel written as

$$k(\|x-y\|) = \int_0^\infty \Omega_d(s \|x-y\|)\, d\mu(s),$$

where $\Omega_d$ is a dimension-dependent Bessel-type function.
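Concretely, $\Omega_d$ can be identified (up to normalization) with the characteristic function of the uniform distribution on the unit sphere in $\mathbb{R}^d$, a standard fact recorded here for orientation rather than taken from the cited papers:

$$\Omega_d(r) = \Gamma\!\left(\tfrac{d}{2}\right)\left(\tfrac{2}{r}\right)^{(d-2)/2} J_{(d-2)/2}(r), \qquad \Omega_1(r) = \cos r, \quad \Omega_2(r) = J_0(r), \quad \Omega_3(r) = \frac{\sin r}{r}.$$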

3. Associated Hilbert Spaces and Minimal-Decomposition Norms

The scale-mixture construction induces an RKHS via a direct integral. For a discrete mixture with $n$ scales and kernels $k_{\sigma_i}$, the corresponding RKHS $H = H_1 + \dots + H_n$ is equipped with the norm

$$\|v\|_H^2 = \inf_{v = v_1 + \dots + v_n} \sum_{i=1}^n \|v_i\|_{H_i}^2,$$

where each $H_i$ is the RKHS associated with $k_{\sigma_i}$ (Bruveris et al., 2011). The reproducing kernel of $H$ is then the sum of the individual kernels:

$$k(x, y) = \sum_{i=1}^n k_{\sigma_i}(\|x-y\|).$$

In the continuous case, the direct-integral space consists of functions $u(s, \cdot)$ such that $\int \|u(s, \cdot)\|_{H_s}^2\, d\lambda(s) < \infty$, and the kernel is given by the integral over $s$ (Hotz et al., 2012). The RKHS norm is again a minimal-decomposition norm.
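In the discrete case the minimal decomposition can be computed explicitly for kernel interpolants: if $f$ interpolates data $(X, y)$ with coefficients $\alpha = (K_1 + K_2)^{-1} y$, then the components $v_i = \sum_j \alpha_j k_{\sigma_i}(\cdot, x_j)$ realize the infimum, and $\|v_1\|_{H_1}^2 + \|v_2\|_{H_2}^2 = \alpha^\top (K_1 + K_2)\, \alpha = \|f\|_H^2$. A minimal two-scale sketch (Gaussian kernels, illustrative scales and data):

```python
import numpy as np

def gauss_gram(X, sigma):
    """Gram matrix of the Gaussian kernel exp(-|x-y|^2 / (2 sigma^2)) on 1-d inputs."""
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

X = np.linspace(0.0, 3.0, 7)   # training inputs (illustrative)
y = np.sin(2 * X)              # targets

K1 = gauss_gram(X, 0.3)        # fine scale
K2 = gauss_gram(X, 1.5)        # coarse scale
K = K1 + K2                    # reproducing kernel of H = H1 + H2

alpha = np.linalg.solve(K, y)  # interpolant coefficients

# Canonical decomposition v = v1 + v2 with v_i = sum_j alpha_j k_{sigma_i}(., x_j):
# the squared H_i-norms are alpha^T K_i alpha, and they sum to the H-norm of v.
n1 = alpha @ K1 @ alpha
n2 = alpha @ K2 @ alpha
ntot = alpha @ K @ alpha

print(n1 + n2 - ntot)                 # 0 up to round-off
print(np.max(np.abs(K @ alpha - y)))  # interpolation residual, ~0
```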

This formalism generalizes to include Mercer expansions, integral-operator kernels, and various compactly supported kernels (e.g., Wendland kernels), providing a unified approach to many classical and modern kernel classes.

4. Applications in Learning and Geometry

Scale-mixture representations have significant practical and theoretical applications:

  • Random Fourier Features (RFF): For shift-invariant isotropic kernels whose spectral density can be written as a Gaussian scale mixture, RFF construction becomes a two-stage sampling procedure: instead of sampling from a fixed Gaussian, one draws a variance parameter $\tau$ from the mixing law and then samples $\omega \sim \mathcal{N}(0, (2\tau)^{-1} I_d)$. This enables RFF approximations for a wide range of kernels, including the Matérn, generalized Cauchy, exponential-power, Beta, Kummer, and Tricomi families (Langrené et al., 2024).
  • Kernel Ridge Regression, SVM, Gaussian Processes: Scale mixtures yield closed-form expressions for kernels suitable for low-rank approximation and efficient learning (Langrené et al., 2024).
  • Image Registration and LDDMM: In large-deformation diffeomorphic metric mapping (LDDMM), mixed-kernel RKHSs correspond to multiscale models for diffeomorphic flows. The equivalence between variational formulations using a single sum-kernel and joint multiscale optimization is established via Lagrange multipliers and relates to an iterated semidirect-product decomposition of diffeomorphism groups (Bruveris et al., 2011).
  • Inverse Problems and Integral Operators: Regularization strategies can be implemented in large direct-integral RKHSs, then pulled back to finite-rank expansions (Hotz et al., 2012).
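The two-stage RFF sampler described in the first bullet can be sketched in a few lines. Here the rational quadratic kernel is used as the target, with its gamma mixing law over the scale $s$ of the Gaussian atoms $\exp(-s\|u\|^2)$ (a minimal illustrative implementation; the parameter names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d, D = 2, 50_000       # input dimension, number of random features
alpha, ell = 1.5, 1.0  # rational-quadratic parameters

# Stage 1: draw the scale of each Gaussian atom exp(-s |u|^2) from the mixing law.
s = rng.gamma(shape=alpha, scale=1.0 / (2 * alpha * ell**2), size=D)
# Stage 2: each atom exp(-s |u|^2) has spectral law N(0, 2 s I_d).
omega = rng.standard_normal((D, d)) * np.sqrt(2 * s)[:, None]
b = rng.uniform(0.0, 2 * np.pi, size=D)

def features(x):
    return np.sqrt(2.0 / D) * np.cos(omega @ x + b)

def k_rq(u):
    return (1 + u @ u / (2 * alpha * ell**2)) ** (-alpha)

x = np.array([0.3, -0.2])
y = np.array([-0.5, 0.8])
print(features(x) @ features(y), k_rq(x - y))  # approximately equal
```

Swapping the gamma draw for another mixing law changes the target kernel while leaving the rest of the pipeline untouched.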

5. Spectral and Structural Characterizations

The scale-mixture view is underpinned by spectral theory. By Bochner’s theorem, any continuous, shift-invariant, PD kernel $K(u) = k(\|u\|)$ is the Fourier transform of a finite nonnegative measure $S$. Schoenberg’s extension ensures that for kernels that are isotropic and positive definite in every dimension, $k(r)$ is the Laplace transform of a positive measure on $[0, \infty)$ evaluated at $r^2$, which is equivalent to complete monotonicity of $r \mapsto k(\sqrt{r})$ (Langrené et al., 2024; Hotz et al., 2012; Benning et al., 27 Jun 2025).

In spectral mixture representations:

$$K(u) = \int_{\mathbb{R}^d} e^{i \omega^\top u}\, d\nu(\omega), \qquad S(\omega) = \int_0^{\infty} e^{-\tau \|\omega\|^2}\, g(\tau)\, d\tau,$$

where $g(\tau)$ is the mixture density determined by the kernel, allowing constructive sampling and explicit feature-map construction for a large set of RBF kernels (Langrené et al., 2024).

6. Connections to General Isotropic Kernels and Neural Network Limits

The most general $O(d)$-invariant (isotropic) kernels are parametrized not only by the distance but also by the dot product, reducing to scale mixtures in the stationary case and to Taylor expansions in the dot-product case. Continuous, $O(d)$-invariant, PD kernels admit expansions of the form

$$K(x, y) = \sum_{n=0}^\infty \alpha_n^{(d)}(\|x\|, \|y\|)\, \widetilde{P}_n^\lambda \left( \frac{\langle x, y \rangle}{\|x\| \|y\|} \right),$$

where the $\alpha_n^{(d)}$ are scale-mixture coefficients and the $\widetilde{P}_n^\lambda$ are normalized Gegenbauer polynomials. The stationary case corresponds to $\alpha_n^{(d)}$ given by explicit scale mixtures over radial functions, while dot-product kernels have $\alpha_n^{(d)}(r, s) = a_n r^n s^n$ (Benning et al., 27 Jun 2025).

Infinite-width limits of neural networks yield $O(d)$-invariant kernels in this class, with explicit Gegenbauer or Hermite expansions determined by the activation function (Benning et al., 27 Jun 2025).
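A standard instance (the order-1 arc-cosine kernel of Cho and Saul, used here as a generic illustration rather than an example from the cited paper): a one-hidden-layer ReLU network with i.i.d. Gaussian weights has the infinite-width kernel $\mathbb{E}_w[\mathrm{ReLU}(w^\top x)\,\mathrm{ReLU}(w^\top y)] = \frac{\|x\|\|y\|}{2\pi}\left(\sin\theta + (\pi - \theta)\cos\theta\right)$, where $\theta$ is the angle between $x$ and $y$; the kernel depends only on $\|x\|$, $\|y\|$, and $\langle x, y\rangle$, hence is $O(d)$-invariant:

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 3, 400_000

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.6, 0.8, 0.0])  # unit vectors, angle theta = arccos(0.6)

# Monte Carlo estimate of the infinite-width ReLU kernel E[relu(w.x) relu(w.y)].
W = rng.standard_normal((N, d))
k_mc = np.mean(np.maximum(W @ x, 0) * np.maximum(W @ y, 0))

# Closed form: order-1 arc-cosine kernel (Cho & Saul).
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
k_cs = (np.linalg.norm(x) * np.linalg.norm(y) / (2 * np.pi)) * (
    np.sin(theta) + (np.pi - theta) * np.cos(theta)
)

print(k_mc, k_cs)  # approximately equal
```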

7. Examples and Practical Construction

A comparative summary of prototypical isotropic kernel scale-mixtures:

| Kernel Class | Mixture Formulation | Mixing Measure / Density |
| --- | --- | --- |
| Gaussian | $k(r) = \exp(-r^2 / \sigma^2)$ | Dirac mass at $s = 1/\sigma^2$ |
| Rational Quadratic | $k(r) = (1 + r^2/(2\alpha\ell^2))^{-\alpha}$ | Gamma over the scale $s$ (equivalently inverse-gamma over the squared lengthscale) |
| Matérn | $k(r) = 2^{1-\nu} \Gamma(\nu)^{-1} (r/\ell)^\nu K_\nu(r/\ell)$ | Inverse-gamma: $\propto s^{-\nu-1} e^{-1/(4\ell^2 s)}\, ds$ |
| Generalized Cauchy | $k(r) = (1 + r^\alpha)^{-\beta}$ | $\propto \tau^{\beta-1} e^{-1/(4\tau)}\, d\tau$ |
| Wendland (compact support) | $k(r) = \int_0^{\infty} (1 - s r)_+^m\, p(s r)\, d\mu(s)$ | Any finite positive $\mu$ |

Explicit algorithms for random Fourier feature sampling draw the scale parameter from the kernel’s mixture density and then sample a Gaussian direction. No additional complexity is introduced compared to the basic RFF approach; the only change is the law of the scale parameter (Langrené et al., 2024).
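For example, for the Matérn kernel with $\nu = 1/2$ and $\ell = 1$ (i.e. $k(r) = e^{-r}$), the scale $s$ of the Gaussian atoms $\exp(-s\|u\|^2)$ follows an inverse-gamma law with shape $1/2$ and scale $1/4$, a standard identity hard-coded here for this special case (a sketch, not a definitive implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
d, D = 2, 200_000

# Mixing law for nu = 1/2, ell = 1 (k(r) = exp(-r)): s ~ InvGamma(1/2, 1/4),
# sampled as scale / Gamma(shape) draws.
s = 0.25 / rng.gamma(shape=0.5, scale=1.0, size=D)
# Each Gaussian atom exp(-s |u|^2) has spectral law N(0, 2 s I_d).
omega = rng.standard_normal((D, d)) * np.sqrt(2 * s)[:, None]

u = np.array([0.5, -0.5])          # a displacement x - y
print(np.mean(np.cos(omega @ u)))  # ~ exp(-|u|), the Matern-1/2 kernel
```

The heavy-tailed draws of `s` produce the heavy-tailed (Cauchy-type) spectral law of the exponential kernel, even though each conditional draw of `omega` is Gaussian.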
