Kolmogorov–Arnold Representation Theorem

Updated 16 February 2026

The Kolmogorov–Arnold theorem is a foundational result stating that every continuous multivariate function on the unit cube can be represented exactly as a finite superposition of continuous univariate functions and addition.
It underpins the design of Kolmogorov–Arnold Networks (KANs), which build models from learnable univariate basis expansions and are reported to offer better efficiency, interpretability, and parameter economy than traditional MLPs.
Physics-informed variants (PIKANs) combine learned univariate transformations with physics constraints and have been reported to solve PDEs and ODEs more accurately and robustly than MLP-based physics-informed networks.

The Kolmogorov–Arnold Representation Theorem is a foundational result in mathematical analysis. It underpins a new class of neural network architectures, Kolmogorov–Arnold Networks (KANs), which are increasingly used in physics-informed scientific machine learning. The theorem guarantees that any continuous multivariate function on the unit cube can be decomposed into a finite superposition of continuous univariate functions. Recent work in physics-informed machine learning uses this structure to design networks, typically called Physics-Informed Kolmogorov–Arnold Networks (PIKANs), that retain universal approximation properties and are reported to be more parsimonious and interpretable than traditional multilayer perceptrons (MLPs). The following sections describe the theorem, its mathematical formulation, the network architectures it inspired, and its implications for scientific computing.

1. The Kolmogorov–Arnold Representation Theorem

The Kolmogorov–Arnold theorem (1957) asserts that every continuous function $f: [0,1]^d \to \mathbb{R}$ can be represented as a finite sum of compositions of continuous univariate functions. Explicitly, for any such $f$, there exist continuous univariate functions $\psi_{ij}$ (inner) and $\Phi_i$ (outer) such that

$f(x_1, \ldots, x_d) = \sum_{i=1}^{2d+1} \Phi_i\Bigg( \sum_{j=1}^d \psi_{ij}(x_j) \Bigg)$

This decomposition shows that any continuous function of $d$ variables can be written exactly as a sum of $2d+1$ terms, each applying an outer univariate nonlinearity $\Phi_i$ to an inner sum of $d$ univariate nonlinearities $\psi_{ij}$, one per input coordinate $x_j$. All functions in the representation are continuous, but those produced by the proof can be highly non-smooth, so practical networks use the formula as an architectural template rather than implementing it literally. It directly motivates splitting multivariate function approximation into univariate subproblems (Pérez-Bernal et al., 12 Dec 2025, Patra et al., 2024, Toscano et al., 2024).
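To make the structure concrete, the product $f(x, y) = xy$ (so $d = 2$) can be written exactly in the two-level form using the identity $xy = \tfrac{1}{4}\big((x+y)^2 - (x-y)^2\big)$. This is a hand-picked instance of the form, not the general construction used in the proof. The Python sketch below evaluates $\sum_i \Phi_i\big(\sum_j \psi_{ij}(x_j)\big)$ for user-supplied univariate maps; the helper name `ka_superposition` is hypothetical.

```python
from typing import Callable, Sequence

# A univariate map R -> R.
Univariate = Callable[[float], float]


def ka_superposition(
    x: Sequence[float],
    outer: Sequence[Univariate],
    inner: Sequence[Sequence[Univariate]],
) -> float:
    """Evaluate sum_i Phi_i( sum_j psi_ij(x_j) ) for given univariate maps."""
    d = len(x)
    assert len(outer) == 2 * d + 1, "the theorem uses 2d+1 outer terms"
    assert all(len(row) == d for row in inner), "one inner map per coordinate"
    total = 0.0
    for phi_i, psi_row in zip(outer, inner):
        inner_sum = sum(psi_ij(x_j) for psi_ij, x_j in zip(psi_row, x))
        total += phi_i(inner_sum)
    return total


# Instance for f(x, y) = x * y: only two of the 2d+1 = 5 outer terms are
# needed; the remaining three outer maps are identically zero.
zero: Univariate = lambda s: 0.0
ident: Univariate = lambda t: t
neg: Univariate = lambda t: -t

outer = [lambda s: s * s / 4, lambda s: -s * s / 4, zero, zero, zero]
inner = [[ident, ident], [ident, neg], [ident, ident], [ident, ident], [ident, ident]]

print(ka_superposition([0.3, -0.7], outer, inner))  # approximately -0.21
```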

2. Mathematical and Network Formulation

The theorem's functional structure directly informs the design of Kolmogorov–Arnold Networks. Where a conventional MLP applies linear weights on edges and fixed activations at nodes, a KAN places a learnable univariate function on every edge, and each node simply sums its incoming values. For a layer with input width $n_l$ and output width $n_{l+1}$, the update is

$x_{l+1,j} = \sum_{i=1}^{n_l} \phi_{l,j,i}(x_{l,i}), \qquad j = 1, \ldots, n_{l+1}$

where each $\phi_{l,j,i}$ is a trainable univariate function (often parameterized as a spline, a polynomial expansion, or a small neural net) (Pérez-Bernal et al., 12 Dec 2025, Shuai et al., 2024). The two-level formula of Section 1 is the special case of two stacked layers with widths $d \to 2d+1 \to 1$.
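A single KAN layer can be sketched in NumPy. For brevity, each edge function here is a linear combination of degree-1 B-splines (hat functions) on a uniform grid; this parameterization, the class name `KANLayer`, and the helper `hat_basis` are illustrative assumptions, not the implementation used in the cited works.

```python
import numpy as np


def hat_basis(x: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Degree-1 B-splines (hat functions) on a uniform knot grid.

    x: (batch, n_in) -> returns (batch, n_in, K) with K = len(grid).
    Hat k equals 1 at knot k and is supported on [grid[k] - h, grid[k] + h].
    """
    h = grid[1] - grid[0]  # assumes uniform spacing
    dist = np.abs(x[..., None] - grid) / h
    return np.clip(1.0 - dist, 0.0, None)


class KANLayer:
    """One KAN layer: x_next[j] = sum_i phi[j, i](x[i]).

    Hypothetical minimal parameterization: each edge function is
    phi[j, i](t) = sum_k coef[j, i, k] * B_k(t) with hat functions B_k.
    """

    def __init__(self, n_in: int, n_out: int, grid: np.ndarray, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.grid = grid
        self.coef = 0.1 * rng.standard_normal((n_out, n_in, len(grid)))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        basis = hat_basis(x, self.grid)  # (batch, n_in, K)
        # Evaluate all n_out * n_in edge functions, then sum over inputs i.
        return np.einsum("bik,oik->bo", basis, self.coef)


# Sanity check: with coef[j, i, k] = grid[k], every edge function interpolates
# the identity, so each output equals the sum of the inputs (inside the grid).
grid = np.linspace(-1.0, 1.0, 9)
layer = KANLayer(n_in=3, n_out=2, grid=grid)
layer.coef[:] = grid
x = np.array([[0.1, -0.5, 0.9], [0.0, 0.25, -1.0]])
print(layer(x))  # each row repeats x.sum(axis=1)
```

Stacking such layers gives a deep KAN. Practical implementations typically use higher-order (for example cubic) B-splines, which trade this sketch's simplicity for smoother edge functions.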

Physics-informed versions (PIKANs) express the surrogate solution $\hat{u}$ of a PDE in $d$ independent variables as

$\hat{u}(x_1, \ldots, x_d) = \sum_{i=1}^{2d+1} \Phi_i\Bigg( \sum_{j=1}^d \psi_{ij}(x_j) \Bigg)$

with both the outer ($\Phi_i$) and inner ($\psi_{ij}$) univariate maps parameterized and learned from data and/or physics constraints. The trainable functions can be B-splines (Pérez-Bernal et al., 12 Dec 2025, Shuai et al., 2024), Chebyshev polynomials (Toscano et al., 2024), wavelets (Patra et al., 2024, Heravifard et al., 12 Dec 2025), or, in specialized variants, sinc functions or Jacobi polynomials (Yu et al., 2024, Kashefi et al., 8 Apr 2025).
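The two-level ansatz can be sketched with Chebyshev-series univariate maps, one of the bases named above. The class name `ChebKA`, the random initialization, and the `tanh` squashing of the inner sums (so that each outer series is evaluated inside $[-1, 1]$, where Chebyshev polynomials are bounded) are assumptions of this sketch rather than details taken from the cited works. Inputs are assumed to be already scaled to $[-1, 1]$.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


class ChebKA:
    """Two-level surrogate u(x) = sum_i Phi_i( sum_j psi_ij(x_j) ), i = 1..2d+1.

    Every inner psi_ij and outer Phi_i is a truncated Chebyshev series
    sum_k c_k T_k(t). Inner sums pass through tanh (an assumption) so the
    outer series is evaluated on [-1, 1].
    """

    def __init__(self, d: int, degree: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        m = 2 * d + 1
        self.inner = 0.1 * rng.standard_normal((m, d, degree + 1))  # psi_ij coefficients
        self.outer = 0.1 * rng.standard_normal((m, degree + 1))     # Phi_i coefficients

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """x has shape (batch, d), scaled to [-1, 1]; returns shape (batch,)."""
        m, d, _ = self.inner.shape
        out = np.zeros(x.shape[0])
        for i in range(m):
            s = sum(C.chebval(x[:, j], self.inner[i, j]) for j in range(d))
            out += C.chebval(np.tanh(s), self.outer[i])
        return out


rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(4, 3))  # 4 points in d = 3 dimensions
model = ChebKA(d=3, degree=5)
print(model(x).shape)  # (4,)
```

In a physics-informed setting, the coefficient arrays `inner` and `outer` are the trainable parameters that the loss in the next section would adjust.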

3. Physics-Informed Applications and Loss Construction

PIKANs are typically employed as solution ansätze for partial differential equations (PDEs) or ordinary differential equations (ODEs), with the network parameters $\theta$ optimized to minimize a composite loss functional. For PDE surrogate modeling, the canonical loss takes the form

$\mathcal{L}(\theta) = \mathcal{L}_{\text{PDE}}(\theta) + \mathcal{L}_{\text{data}}(\theta) + \mathcal{L}_{\text{BC}}(\theta)$

where

$\mathcal{L}_{\text{PDE}}$ is the mean squared physics (PDE) residual, computed via automatic differentiation at collocation points,
$\mathcal{L}_{\text{data}}$ enforces agreement with observations at selected measurement points (if available),
$\mathcal{L}_{\text{BC}}$ penalizes violations of boundary or initial conditions (Pérez-Bernal et al., 12 Dec 2025, Patra et al., 2024, Toscano et al., 2024).
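The composite loss can be sketched for a 1D Poisson problem $u''(x) = f(x)$ on $[-1, 1]$ with Dirichlet boundary values. As simplifying assumptions, the surrogate is a single Chebyshev series rather than a full PIKAN, the physics residual, boundary, and data terms are left unweighted, and the exact second derivative from NumPy's `chebder` stands in for automatic differentiation. `composite_loss` is a hypothetical helper.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def composite_loss(coef, x_col, f_col, x_bc, g_bc, x_obs=None, u_obs=None):
    """Physics residual + data mismatch + boundary mismatch for u''(x) = f(x).

    The surrogate is u(x) = sum_k coef[k] T_k(x). Its second derivative is
    taken exactly with chebder, standing in for automatic differentiation.
    """
    u_xx = C.chebval(x_col, C.chebder(coef, m=2))
    loss_pde = np.mean((u_xx - f_col) ** 2)                  # residual at collocation points
    loss_bc = np.mean((C.chebval(x_bc, coef) - g_bc) ** 2)   # boundary conditions
    loss_data = 0.0
    if x_obs is not None:                                     # optional observations
        loss_data = np.mean((C.chebval(x_obs, coef) - u_obs) ** 2)
    return loss_pde + loss_data + loss_bc


# Test problem: u'' = -2 on [-1, 1], u(-1) = u(1) = 0, exact u = 1 - x**2.
x_col = np.linspace(-1.0, 1.0, 50)
f_col = np.full_like(x_col, -2.0)
x_bc, g_bc = np.array([-1.0, 1.0]), np.zeros(2)
exact = np.array([0.5, 0.0, -0.5])  # 1 - x**2 = 0.5*T0 - 0.5*T2
print(composite_loss(exact, x_col, f_col, x_bc, g_bc))  # ~0 up to rounding
```

Adding a $T_1$ component to `exact` leaves the residual unchanged (its second derivative is zero) but breaks the boundary conditions, so only the boundary term grows.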

Sampling strategies depend on the problem domain. In unbounded domains, collocation points are drawn from exponential or Gaussian distributions to concentrate them in the region of interest and avoid wasting computation on the nearly trivial far field (Pérez-Bernal et al., 12 Dec 2025).
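Such sampling can be sketched with NumPy's random generator. The function name `sample_collocation` and the length scale `scale` (standing for the size of the region of interest) are hypothetical; a radial coordinate on $[0, \infty)$ is drawn from an exponential distribution and a coordinate on $(-\infty, \infty)$ from a zero-mean Gaussian.

```python
import numpy as np


def sample_collocation(n: int, scale: float = 1.0, seed: int = 0):
    """Collocation points for problems on unbounded domains.

    r: radial points on [0, inf) from an exponential distribution with mean
       `scale`, so most points fall within a few multiples of `scale`.
    x: points on (-inf, inf) from a zero-mean Gaussian with std `scale`.
    """
    rng = np.random.default_rng(seed)
    r = rng.exponential(scale=scale, size=n)
    x = rng.normal(loc=0.0, scale=scale, size=n)
    return r, x


r, x = sample_collocation(10_000, scale=2.0)
# For an exponential with mean `scale`, P(r < 3 * scale) = 1 - exp(-3), about 95%.
print((r < 6.0).mean())
```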

4. Advantages and Limitations of KAN/PIKAN Architectures

KAN- and PIKAN-based architectures have been reported to offer several advantages over classical (MLP-based) PINNs:

Universal Approximation and Parsimony: The Kolmogorov–Arnold structure provides a foundation for universal approximation, and KANs have been reported to reach a given accuracy with far fewer parameters than comparable MLPs, particularly in low- to moderate-dimensional settings (Pérez-Bernal et al., 12 Dec 2025, Patra et al., 2024, Shuai et al., 2024).
Improved Interpretability: Because all nonlinearities are learned and univariate, one can directly visualize and interpret each learned transformation, providing an avenue for scientific insight and explainability (Pérez-Bernal et al., 12 Dec 2025, Tekbıyık et al., 7 Oct 2025).
Spectral Bias Mitigation: By separating the multivariate problem into univariate branches, KANs/PIKANs have been found less susceptible to spectral bias (the tendency of networks to fit low-frequency components before high-frequency ones), especially when they use basis functions such as splines, Chebyshev polynomials, or wavelets (Faroughi et al., 9 Jun 2025, Jacob et al., 2024, Heravifard et al., 12 Dec 2025).
Parameter Efficiency: PIKANs can attain similar or better accuracy than MLP-based PINNs with significantly fewer parameters, as demonstrated in power system dynamics (Shuai et al., 2024), elasticity problems (Gong et al., 23 Aug 2025), and channel modeling (Tekbıyık et al., 7 Oct 2025).
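Parameter claims depend on network size, and a rough count helps interpret them. Assuming each KAN edge stores `grid_size + order` B-spline coefficients (the number of order-`order` B-splines on `grid_size` intervals) and ignoring any extra per-edge weights some implementations add, a KAN layer costs more per edge than a dense layer of the same width. Any savings must therefore come from KANs that are much narrower or shallower than the MLPs they replace. The function names are hypothetical.

```python
def mlp_layer_params(n_in: int, n_out: int) -> int:
    """Dense layer: one weight per edge plus one bias per output."""
    return n_in * n_out + n_out


def kan_layer_params(n_in: int, n_out: int, grid_size: int, order: int = 3) -> int:
    """KAN layer: one B-spline per edge with grid_size + order coefficients.

    Assumption: only spline coefficients are counted.
    """
    return n_in * n_out * (grid_size + order)


def total_params(widths, layer_fn, **kw) -> int:
    """Sum per-layer counts over consecutive width pairs."""
    return sum(layer_fn(a, b, **kw) for a, b in zip(widths[:-1], widths[1:]))


# Same widths: the KAN pays (grid_size + order) coefficients per edge ...
print(total_params([2, 20, 20, 1], mlp_layer_params))                # 501
print(total_params([2, 20, 20, 1], kan_layer_params, grid_size=5))   # 3680
# ... so a parameter saving requires a much smaller KAN.
print(total_params([2, 5, 1], kan_layer_params, grid_size=5))        # 120
```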

However, limitations include:

Training Overhead: PIKANs incur a higher per-epoch computational cost because every edge requires evaluating and differentiating a univariate basis expansion, which also makes efficient GPU implementation harder (Pérez-Bernal et al., 12 Dec 2025).
Scaling with Dimension: The number of univariate functions grows with input dimension ($2d+1$ outer and $d(2d+1)$ inner functions in the two-level form), and naive implementations suffer from the curse of dimensionality; recent variants such as SPIKANs (separable PIKANs) address this with separable, tensor-product decompositions (Jacob et al., 2024).
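Numerical Instabilities: Evaluating inputs outside the range on which the basis functions are defined (for example, outside a spline grid, or outside $[-1, 1]$ for Chebyshev polynomials), or using excessive network depth, can cause numerical instabilities or vanishing gradients (Pérez-Bernal et al., 12 Dec 2025, Rigas et al., 27 Oct 2025).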
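For polynomial bases, the extrapolation problem is easy to see. Chebyshev polynomials satisfy $|T_k(x)| \le 1$ on $[-1, 1]$, but for $x > 1$ they equal $\cosh(k\,\mathrm{arccosh}\,x)$ and grow exponentially with the degree $k$. The NumPy sketch below compares $T_{10}$ inside and just outside the interval; it illustrates the mechanism only and is not taken from the cited works.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

degree = 10
coef = np.zeros(degree + 1)
coef[degree] = 1.0  # the single basis function T_10

inside = np.linspace(-1.0, 1.0, 201)
outside = np.array([1.1, 1.5, 2.0])

# Inside [-1, 1], |T_k(x)| <= 1 for every k ...
print(np.abs(C.chebval(inside, coef)).max())
# ... but outside, T_k(x) = cosh(k * arccosh(x)) grows exponentially in k.
print(C.chebval(outside, coef))
print(np.cosh(degree * np.arccosh(outside)))
```

A learned Chebyshev expansion with even moderately sized high-degree coefficients therefore produces very large outputs, and very large derivatives, as soon as inputs drift outside $[-1, 1]$.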

5. Benchmarks, Variants, and Hybrid Designs

PIKANs have been benchmarked on a wide variety of forward and inverse ODE and PDE problems; reported results include sub-percent relative errors, or sub-millimeter absolute errors, obtained with 1–2 orders of magnitude fewer parameters or training epochs than PINNs (Patra et al., 2024, Shuai et al., 2024, Gong et al., 23 Aug 2025). Key developments and variants include:

Wavelet- and Hybrid-Basis PIKANs: Multiresolution and localized features are incorporated via wavelet basis functions (WAV-KAN, HWF-PIKAN), resulting in rapid convergence for problems with sharp gradients or discontinuities (Patra et al., 2024, Heravifard et al., 12 Dec 2025).
Adaptive and Grid-Dependent PIKANs: Networks dynamically adapt their basis grids to error-prone regions, combined with residual-based attention and an adaptive state transition for the optimizer's momentum (Rigas et al., 2024).
Hybrid Architectures: MLP–KAN convex combinations and domain decomposition strategies allow networks to capture both global and local structure, adapting between low- and high-frequency regimes through trainable weights (Huang et al., 14 Nov 2025).
Tensor Product (SPIKAN): High-dimensional scalability is attained by modeling each input coordinate with its own KAN block and summing outer products, reducing both memory and computational costs (Jacob et al., 2024).
Multifidelity PIKANs: Low-fidelity surrogates are coupled with KAN-based corrections to address data scarcity or multi-resolution scientific computing, delivering order-of-magnitude accuracy improvements with minimal added data (Howard et al., 2024).
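The separable idea behind SPIKANs can be illustrated by how the solution is assembled on a tensor grid. Each coordinate gets its own one-dimensional model producing $R$ features, and the 2D field is the sum of $R$ outer products, which is a single matrix product. In this sketch, fixed functions stand in for the trained 1D KAN blocks, and the rank $R = 2$ and the name `separable_eval` are assumptions. The point is that $N_x + N_y$ one-dimensional evaluations yield all $N_x N_y$ grid values.

```python
import numpy as np


def separable_eval(f_x: np.ndarray, g_y: np.ndarray) -> np.ndarray:
    """u(x_a, y_b) = sum_r f_r(x_a) * g_r(y_b) on the full tensor grid.

    f_x: (N_x, R) outputs of a per-coordinate model at N_x points.
    g_y: (N_y, R) outputs of a second per-coordinate model at N_y points.
    Returns the (N_x, N_y) grid of values: a sum of R outer products.
    """
    return f_x @ g_y.T


x = np.linspace(0.0, 1.0, 64)
y = np.linspace(0.0, 2.0, 128)
# Hypothetical stand-ins for two trained 1D KAN blocks, rank R = 2.
f_x = np.stack([np.sin(np.pi * x), x], axis=1)
g_y = np.stack([np.cos(y), np.ones_like(y)], axis=1)

u = separable_eval(f_x, g_y)  # 64 * 128 = 8192 values from 64 + 128 evaluations
print(u.shape)  # (64, 128)
```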

6. Optimization, Training Strategies, and Theoretical Insights

Training strategies for PIKANs parallel those for PINNs, but their training dynamics also reflect the kernel structure induced by the Kolmogorov–Arnold decomposition:

Optimization Algorithms: Adam is commonly used for an initial training phase, after which quasi-Newton methods such as L-BFGS and, notably, self-scaled Broyden variants have been reported to yield order-of-magnitude improvements in convergence and final error (Kiyani et al., 22 Jan 2025).
Neural Tangent Kernel (NTK) Analysis: NTK analyses report a much flatter eigenvalue spectrum (slower eigenvalue decay) for PIKANs/cPIKANs than for PINNs, which is used to explain their improved convergence on high-frequency modes and their robustness to local minima (Faroughi et al., 9 Jun 2025).
Domain Scaling and Initialization: Chebyshev-based PIKANs are stabilized by scaling input domains to $[-1, 1]$ and employing Glorot-like initialization to preserve signal variance through deep architectures (Mostajeran et al., 6 Jan 2025, Rigas et al., 27 Oct 2025).
Information Bottleneck and Training Dynamics: Information-bottleneck analyses indicate that PIKAN training passes through fitting, diffusion, and diffusion-equilibrium phases as model complexity and the gradient signal-to-noise ratio (SNR) evolve; deep cPIKANs require careful initialization or gating to avoid stagnation in the diffusion phase (Rigas et al., 27 Oct 2025, Yang et al., 26 Jul 2025).
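Domain scaling amounts to an affine change of variables from a physical interval $[\text{lo}, \text{hi}]$ onto $[-1, 1]$. One detail matters for physics-informed losses: by the chain rule, each derivative with respect to the physical variable picks up a factor $2/(\text{hi} - \text{lo})$. The sketch below shows the map, its inverse, and that factor applied through the `scl` argument of NumPy's `chebder`. The function names are hypothetical, and the Glorot-like initialization is not shown.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def to_cheb_domain(x, lo, hi):
    """Affine map from the physical interval [lo, hi] onto [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def from_cheb_domain(z, lo, hi):
    """Inverse map from [-1, 1] back to [lo, hi]."""
    return lo + (z + 1.0) * (hi - lo) / 2.0


# Derivatives in physical units: each d/dt contributes dz/dt = 2 / (hi - lo).
# chebder multiplies each differentiation by `scl` (scl**m for m derivatives).
lo, hi = 0.0, 10.0
t = np.linspace(lo, hi, 5)
z = to_cheb_domain(t, lo, hi)
coef = np.array([0.5, 0.0, 0.5])  # u = 0.5*T0 + 0.5*T2 = z**2
du_dt = C.chebval(z, C.chebder(coef, m=1, scl=2.0 / (hi - lo)))
print(z)      # endpoints map to -1 and 1
print(du_dt)  # equals 2 * z * 2 / (hi - lo)
```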

7. Practical Guidelines and Application Domains

Current evidence indicates PIKANs are particularly advantageous when:

The problem dimension $d$ is moderate, so the $2d+1$ branches do not cause intractable parameter growth (Pérez-Bernal et al., 12 Dec 2025, Shuai et al., 2024).
Increased interpretability of learned function structure is required (e.g., in scientific inference, model reduction, or symbolic extraction) (Tekbıyık et al., 7 Oct 2025, Shuai et al., 2024).
Material discontinuities, multi-material, or multi-geometry problems are present, benefiting from local adaptivity provided by spline and polynomial bases (Gong et al., 23 Aug 2025, Kashefi et al., 8 Apr 2025).
Data scarcity motivates multifidelity surrogates or wafer-scale parameter sharing (Howard et al., 2024, Kashefi et al., 8 Apr 2025).
Efficiency and memory constraints are paramount, and compact network structures are desired (Tekbıyık et al., 7 Oct 2025).
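The dimension guideline can be made concrete by counting univariate functions in the two-level form of Section 1: $2d+1$ outer maps plus $d(2d+1)$ inner maps, or $(2d+1)(d+1)$ in total, which grows quadratically in $d$. Assuming, hypothetically, that every map uses the same number of basis coefficients, the sketch below tabulates the resulting parameter counts; the function names are illustrative.

```python
def ka_function_count(d: int) -> int:
    """Univariate maps in the two-level form: 2d+1 outer plus d*(2d+1) inner."""
    m = 2 * d + 1
    return m + m * d  # = (2d + 1)(d + 1)


def ka_param_count(d: int, coeffs_per_function: int) -> int:
    """Trainable coefficients if every map uses the same basis size (an assumption)."""
    return ka_function_count(d) * coeffs_per_function


for d in (2, 3, 10, 100):
    print(d, ka_function_count(d), ka_param_count(d, coeffs_per_function=8))
# 2 15 120
# 3 28 224
# 10 231 1848
# 100 20301 162408
```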

Possible extensions include mixed MLP–KAN hybrids, advanced basis functions (wavelets, Chebyshev, sinc), adaptive collocation and basis refinement, and domain or parameter decomposition for high-dimensional or multi-scale PDEs (Huang et al., 14 Nov 2025, Heravifard et al., 12 Dec 2025, Jacob et al., 2024).

PIKAN-based methods have been applied with reported success to problems ranging from electronic packaging mechanics (Gong et al., 23 Aug 2025) and power system dynamics (Shuai et al., 2024) to deep reinforcement learning in finance (Thoi et al., 1 Feb 2026) and explainable wireless channel modeling (Tekbıyık et al., 7 Oct 2025), suggesting the broad applicability of the Kolmogorov–Arnold decomposition paradigm in computational science and engineering.