
Scale-Mixture Representations for Isotropic Kernels

Updated 10 January 2026
  • Scale-mixture representations for isotropic kernels express positive-definite kernels as integrals over scale parameters, unifying classical RBFs with multiscale approaches.
  • The framework leverages Bochner’s and Schoenberg’s theorems to provide explicit constructions of reproducing kernel Hilbert spaces with minimal-decomposition norms.
  • Applications range from efficient random Fourier feature sampling in machine learning to multiscale image registration and neural network kernel limits.

A scale-mixture representation for isotropic kernels expresses a positive-definite (PD) kernel as an integral (or sum) over a parametric family of isotropic kernels, typically controlled by a scale parameter. This framework unifies classical descriptions of radial basis functions (RBFs), allows for multiscale modeling, and provides explicit constructions for both the kernel and the associated reproducing kernel Hilbert space (RKHS). Scale mixtures are central to topics ranging from machine learning via random Fourier features to image registration, and connect directly to foundational characterizations by Bochner and Schoenberg.

1. Fundamental Representation and Theoretical Framework

A function $k(\|x-y\|)$ on $\mathbb{R}^d \times \mathbb{R}^d$ is called an isotropic kernel if it depends only on the Euclidean distance between $x$ and $y$. Classical results (Bochner, Schoenberg) establish that any continuous, shift-invariant, positive-definite isotropic kernel is a scale mixture of basic kernel "atoms." Specifically, for a wide class of parameterized kernels $\varphi(r, s)$ and a finite nonnegative measure $\mu$ on $[0, \infty)$:

$$k(\|x-y\|) = \int_0^\infty \varphi(\|x-y\|, s)\, d\mu(s).$$

Typical choices include $\varphi(r, s) = \exp(-s r^2)$ (Gaussian atoms), yielding mixtures of Gaussians, and other forms yielding Matérn or compactly supported kernels (Hotz et al., 2012).
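As a concrete sanity check (a minimal sketch, not taken from the cited papers): taking the Gaussian atoms $\varphi(r,s) = \exp(-s r^2)$ and letting $\mu$ be a Gamma distribution with shape $\alpha$ and rate $2\alpha\ell^2$ reproduces the rational quadratic kernel, which can be verified by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, ell = 2.0, 1.5                     # rational-quadratic shape and lengthscale
r = np.array([0.0, 0.5, 1.0, 2.0, 4.0])  # distances at which to compare

# Atoms phi(r, s) = exp(-s r^2); mixing measure mu = Gamma(alpha, rate = 2*alpha*ell^2).
s = rng.gamma(shape=alpha, scale=1.0 / (2 * alpha * ell**2), size=200_000)
k_mixture = np.exp(-np.outer(r**2, s)).mean(axis=1)

# Closed form the mixture should reproduce: the rational quadratic kernel.
k_rq = (1 + r**2 / (2 * alpha * ell**2)) ** (-alpha)

print(np.max(np.abs(k_mixture - k_rq)))  # small Monte Carlo error
```

The same pattern works for any mixing measure: only the sampling law for `s` changes.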

These integrals produce kernels that are positive definite for any nonnegative measure $\mu$, since nonnegative linear combinations and integrals of PD kernels are themselves PD (Bruveris et al., 2011). The kernel $k$ is then the reproducing kernel of the image of a direct-integral Hilbert space of functions parameterized by $s$, with the RKHS norm given by a minimal-decomposition property:

$$\|f\|_H^2 = \inf \left\{ \int_0^\infty |a(s)|^2\, d\mu(s) \;:\; f(x) = \int_0^\infty a(s)\, \varphi(\|x\|, s)\, d\mu(s) \right\}.$$

2. Scale-Mixture Representations: Classical and Generalized Forms

Numerous kernel families admit scale-mixture representations as specific cases of the above framework:

  • Rational Quadratic: $(1 + r^2/(2\alpha \ell^2))^{-\alpha}$ is a scale mixture of Gaussians, with a mixing measure corresponding to an inverse-gamma distribution over the squared lengthscale (equivalently, a gamma distribution over the scale $s$ in $\exp(-s r^2)$) (Hotz et al., 2012).
  • Matérn Kernel: The Matérn family is expressible as

$$k(r) = \int_0^\infty \exp(-s r^2)\, w_{\mathrm{Mat}}(s)\, ds$$

with $w_{\mathrm{Mat}}(s)$ derived from the Bessel-function representation (Hotz et al., 2012; Langrené et al., 2024).

  • Generalized Cauchy and Exponential Power: For the generalized Cauchy kernel $(1 + \|u\|^\alpha)^{-\beta}$ and the exponential-power kernel $\exp(-\|u\|^p)$, the spectral (Bochner) densities also admit scale-mixture forms as integrals over Gaussians with a suitably chosen density $g(\tau)$ (Langrené et al., 2024).
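For $\nu = 1/2$ and $\ell = 1$ the Matérn kernel reduces to $\exp(-r)$, and in this special case the mixing density $w_{\mathrm{Mat}}$ has the explicit inverse-gamma form $w(s) = (2\sqrt{\pi})^{-1} s^{-3/2} e^{-1/(4s)}$. The identity can be checked by numerical quadrature (a sketch, with this special-case density hard-coded):

```python
import numpy as np
from scipy.integrate import quad

# For nu = 1/2, ell = 1 the Matern kernel is k(r) = exp(-r), and the Gaussian
# scale mixture k(r) = int_0^inf exp(-s r^2) w(s) ds holds with the
# inverse-gamma density w(s) = (2 sqrt(pi))^{-1} s^{-3/2} exp(-1/(4 s)).
def w_mat(s):
    return s ** (-1.5) * np.exp(-1.0 / (4.0 * s)) / (2.0 * np.sqrt(np.pi))

def k_mixture(r):
    val, _ = quad(lambda s: np.exp(-s * r**2) * w_mat(s), 0.0, np.inf)
    return val

for r in [0.25, 0.5, 1.0, 2.0]:
    print(r, k_mixture(r), np.exp(-r))  # the two columns agree
```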

Schoenberg’s theorem provides the most general characterization: any $O(d)$-invariant kernel on $\mathbb{R}^d$ can be written as an infinite series of radial kernels weighted by normalized Gegenbauer polynomials (zonal polynomials), with strictly positive-definite kernels characterized by conditions on the radial coefficients $\alpha_n^{(d)}$ (Benning et al., 27 Jun 2025). The scale-mixture form in the stationary case recovers classical “Gaussian mixtures,” with the kernel written as

$$k(\|x-y\|) = \int_0^\infty \Omega_d(s \|x-y\|)\, d\mu(s),$$

where $\Omega_d$ is a dimension-dependent Bessel-type function.
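Concretely, $\Omega_d$ can be identified (up to normalization) with the characteristic function of the uniform distribution on the unit sphere in $\mathbb{R}^d$, a standard fact recorded here for orientation rather than taken from the cited papers:

$$\Omega_d(r) = \Gamma\!\left(\tfrac{d}{2}\right)\left(\tfrac{2}{r}\right)^{(d-2)/2} J_{(d-2)/2}(r), \qquad \Omega_1(r) = \cos r, \quad \Omega_2(r) = J_0(r), \quad \Omega_3(r) = \frac{\sin r}{r}.$$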

3. Associated Hilbert Spaces and Minimal-Decomposition Norms

The scale-mixture construction induces an RKHS via a direct integral. For a discrete mixture with $n$ scales and kernels $k_{\sigma_i}$, the corresponding RKHS $H = H_1 + \dots + H_n$ is equipped with the norm

$$\|v\|_H^2 = \inf_{v = v_1 + \dots + v_n} \sum_{i=1}^n \|v_i\|_{H_i}^2,$$

where each $H_i$ is the RKHS associated with $k_{\sigma_i}$ (Bruveris et al., 2011). The reproducing kernel of $H$ is then the sum of the individual kernels:

$$k(x, y) = \sum_{i=1}^n k_{\sigma_i}(\|x-y\|).$$

In the continuous case, the direct-integral space consists of functions $u(s, \cdot)$ such that $\int \|u(s, \cdot)\|_{H_s}^2\, d\lambda(s) < \infty$, and the kernel is given by the integral over $s$ (Hotz et al., 2012). The RKHS norm is again a minimal-decomposition norm.
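In the discrete case the minimal decomposition can be computed explicitly for kernel interpolants: if $f$ interpolates data $(X, y)$ with coefficients $\alpha = (K_1 + K_2)^{-1} y$, then the components $v_i = \sum_j \alpha_j k_{\sigma_i}(\cdot, x_j)$ realize the infimum, and $\|v_1\|_{H_1}^2 + \|v_2\|_{H_2}^2 = \alpha^\top (K_1 + K_2)\, \alpha = \|f\|_H^2$. A minimal two-scale sketch (Gaussian kernels, illustrative scales and data):

```python
import numpy as np

def gauss_gram(X, sigma):
    """Gram matrix of the Gaussian kernel exp(-|x-y|^2 / (2 sigma^2)) on 1-d inputs."""
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

X = np.linspace(0.0, 3.0, 7)   # training inputs (illustrative)
y = np.sin(2 * X)              # targets

K1 = gauss_gram(X, 0.3)        # fine scale
K2 = gauss_gram(X, 1.5)        # coarse scale
K = K1 + K2                    # reproducing kernel of H = H1 + H2

alpha = np.linalg.solve(K, y)  # interpolant coefficients

# Canonical decomposition v = v1 + v2 with v_i = sum_j alpha_j k_{sigma_i}(., x_j):
# the squared H_i-norms are alpha^T K_i alpha, and they sum to the H-norm of v.
n1 = alpha @ K1 @ alpha
n2 = alpha @ K2 @ alpha
ntot = alpha @ K @ alpha

print(n1 + n2 - ntot)                 # 0 up to round-off
print(np.max(np.abs(K @ alpha - y)))  # interpolation residual, ~0
```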

This formalism generalizes to include Mercer expansions, integral-operator kernels, and various compactly supported kernels (e.g., Wendland kernels), providing a unified approach to many classical and modern kernel classes.

4. Applications in Learning and Geometry

Scale-mixture representations have significant practical and theoretical applications:

  • Random Fourier Features (RFF): For shift-invariant isotropic kernels whose spectral density can be written as a Gaussian scale mixture, RFF construction becomes a two-stage sampling procedure: instead of sampling from a fixed Gaussian, one draws a variance parameter $\tau$ from the mixing law and then samples $\omega \sim \mathcal{N}(0, (2\tau)^{-1} I_d)$. This enables RFF approximations for a wide range of kernels, including the Matérn, generalized Cauchy, exponential-power, Beta, Kummer, and Tricomi families (Langrené et al., 2024).
  • Kernel Ridge Regression, SVM, Gaussian Processes: Scale mixtures yield closed-form expressions for kernels suitable for low-rank approximation and efficient learning (Langrené et al., 2024).
  • Image Registration and LDDMM: In large-deformation diffeomorphic metric mapping (LDDMM), mixed-kernel RKHSs correspond to multiscale models for diffeomorphic flows. The equivalence between variational formulations using a single sum-kernel and joint multiscale optimization is established via Lagrange multipliers and relates to an iterated semidirect-product decomposition of diffeomorphism groups (Bruveris et al., 2011).
  • Inverse Problems and Integral Operators: Regularization strategies can be implemented in large direct-integral RKHSs, then pulled back to finite-rank expansions (Hotz et al., 2012).
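The two-stage RFF sampler described in the first bullet can be sketched in a few lines. Here the rational quadratic kernel is used as the target, with its gamma mixing law over the scale $s$ of the Gaussian atoms $\exp(-s\|u\|^2)$ (a minimal illustrative implementation; the parameter names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d, D = 2, 50_000       # input dimension, number of random features
alpha, ell = 1.5, 1.0  # rational-quadratic parameters

# Stage 1: draw the scale of each Gaussian atom exp(-s |u|^2) from the mixing law.
s = rng.gamma(shape=alpha, scale=1.0 / (2 * alpha * ell**2), size=D)
# Stage 2: each atom exp(-s |u|^2) has spectral law N(0, 2 s I_d).
omega = rng.standard_normal((D, d)) * np.sqrt(2 * s)[:, None]
b = rng.uniform(0.0, 2 * np.pi, size=D)

def features(x):
    return np.sqrt(2.0 / D) * np.cos(omega @ x + b)

def k_rq(u):
    return (1 + u @ u / (2 * alpha * ell**2)) ** (-alpha)

x = np.array([0.3, -0.2])
y = np.array([-0.5, 0.8])
print(features(x) @ features(y), k_rq(x - y))  # approximately equal
```

Swapping the gamma draw for another mixing law changes the target kernel while leaving the rest of the pipeline untouched.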

5. Spectral and Structural Characterizations

The scale-mixture view is underpinned by spectral theory. By Bochner’s theorem, any continuous, shift-invariant, PD kernel $K(u) = k(\|u\|)$ is the Fourier transform of a finite nonnegative measure $S$. Schoenberg’s extension ensures that for kernels that are isotropic and positive definite in every dimension, $k(r)$ is the Laplace transform of a positive measure on $[0, \infty)$ evaluated at $r^2$, which is equivalent to complete monotonicity of $r \mapsto k(\sqrt{r})$ (Langrené et al., 2024; Hotz et al., 2012; Benning et al., 27 Jun 2025).

In spectral mixture representations:

$$K(u) = \int_{\mathbb{R}^d} e^{i \omega^\top u}\, d\nu(\omega), \qquad S(\omega) = \int_0^{\infty} e^{-\tau \|\omega\|^2}\, g(\tau)\, d\tau,$$

where $g(\tau)$ is the mixture density determined by the kernel, allowing constructive sampling and explicit feature-map construction for a large set of RBF kernels (Langrené et al., 2024).

6. Connections to General Isotropic Kernels and Neural Network Limits

The most general $O(d)$-invariant (isotropic) kernels are parametrized not only by the distance but also by the dot product, reducing to scale mixtures in the stationary case and to Taylor expansions in the dot-product case. Continuous, $O(d)$-invariant, PD kernels admit expansions of the form

$$K(x, y) = \sum_{n=0}^\infty \alpha_n^{(d)}(\|x\|, \|y\|)\, \widetilde{P}_n^\lambda \left( \frac{\langle x, y \rangle}{\|x\| \|y\|} \right),$$

where the $\alpha_n^{(d)}$ are scale-mixture coefficients and the $\widetilde{P}_n^\lambda$ are normalized Gegenbauer polynomials. The stationary case corresponds to $\alpha_n^{(d)}$ given by explicit scale mixtures over radial functions, while dot-product kernels have $\alpha_n^{(d)}(r, s) = a_n r^n s^n$ (Benning et al., 27 Jun 2025).

Infinite-width limits of neural networks yield $O(d)$-invariant kernels in this class, with explicit Gegenbauer or Hermite expansions determined by the activation function (Benning et al., 27 Jun 2025).
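A standard instance (the order-1 arc-cosine kernel of Cho and Saul, used here as a generic illustration rather than an example from the cited paper): a one-hidden-layer ReLU network with i.i.d. Gaussian weights has the infinite-width kernel $\mathbb{E}_w[\mathrm{ReLU}(w^\top x)\,\mathrm{ReLU}(w^\top y)] = \frac{\|x\|\|y\|}{2\pi}\left(\sin\theta + (\pi - \theta)\cos\theta\right)$, where $\theta$ is the angle between $x$ and $y$; the kernel depends only on $\|x\|$, $\|y\|$, and $\langle x, y\rangle$, hence is $O(d)$-invariant:

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 3, 400_000

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.6, 0.8, 0.0])  # unit vectors, angle theta = arccos(0.6)

# Monte Carlo estimate of the infinite-width ReLU kernel E[relu(w.x) relu(w.y)].
W = rng.standard_normal((N, d))
k_mc = np.mean(np.maximum(W @ x, 0) * np.maximum(W @ y, 0))

# Closed form: order-1 arc-cosine kernel (Cho & Saul).
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
k_cs = (np.linalg.norm(x) * np.linalg.norm(y) / (2 * np.pi)) * (
    np.sin(theta) + (np.pi - theta) * np.cos(theta)
)

print(k_mc, k_cs)  # approximately equal
```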

7. Examples and Practical Construction

A comparative summary of prototypical isotropic kernel scale-mixtures:

| Kernel Class | Mixture Formulation | Mixing Measure / Density |
| --- | --- | --- |
| Gaussian | $k(r) = \exp(-r^2 / \sigma^2)$ | Dirac mass at $s = 1/\sigma^2$ |
| Rational Quadratic | $k(r) = (1 + r^2/(2\alpha\ell^2))^{-\alpha}$ | Gamma over the scale $s$ (equivalently inverse-gamma over the squared lengthscale) |
| Matérn | $k(r) = 2^{1-\nu} \Gamma(\nu)^{-1} (r/\ell)^\nu K_\nu(r/\ell)$ | Inverse-gamma: $\propto s^{-\nu-1} e^{-1/(4\ell^2 s)}\, ds$ |
| Generalized Cauchy | $k(r) = (1 + r^\alpha)^{-\beta}$ | $\propto \tau^{\beta-1} e^{-1/(4\tau)}\, d\tau$ |
| Wendland (compact support) | $k(r) = \int_0^{\infty} (1 - s r)_+^m\, p(s r)\, d\mu(s)$ | Any finite positive $\mu$ |

Explicit algorithms for random Fourier feature sampling draw the scale parameter from the kernel’s mixture density and then sample a Gaussian direction. No additional complexity is introduced compared to the basic RFF approach; the only change is the law of the scale parameter (Langrené et al., 2024).
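For example, for the Matérn kernel with $\nu = 1/2$ and $\ell = 1$ (i.e. $k(r) = e^{-r}$), the scale $s$ of the Gaussian atoms $\exp(-s\|u\|^2)$ follows an inverse-gamma law with shape $1/2$ and scale $1/4$, a standard identity hard-coded here for this special case (a sketch, not a definitive implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
d, D = 2, 200_000

# Mixing law for nu = 1/2, ell = 1 (k(r) = exp(-r)): s ~ InvGamma(1/2, 1/4),
# sampled as scale / Gamma(shape) draws.
s = 0.25 / rng.gamma(shape=0.5, scale=1.0, size=D)
# Each Gaussian atom exp(-s |u|^2) has spectral law N(0, 2 s I_d).
omega = rng.standard_normal((D, d)) * np.sqrt(2 * s)[:, None]

u = np.array([0.5, -0.5])          # a displacement x - y
print(np.mean(np.cos(omega @ u)))  # ~ exp(-|u|), the Matern-1/2 kernel
```

The heavy-tailed draws of `s` produce the heavy-tailed (Cauchy-type) spectral law of the exponential kernel, even though each conditional draw of `omega` is Gaussian.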
