Fourier-Kolmogorov-Arnold Networks

Updated 17 January 2026

Fourier-KANs are neural architectures that combine the Kolmogorov-Arnold representation theorem with truncated Fourier series, enabling adaptive spectral expressivity.
They employ innovative techniques such as matrix association and learnable random Fourier features to reduce parameter complexity while capturing high-frequency signal components.
Their versatile applications in vision, language, audio, and scientific modeling demonstrate empirical gains in accuracy, efficiency, and interpretability.

Fourier Kolmogorov-Arnold Networks (Fourier-KANs) are a class of neural architectures that integrate the Kolmogorov–Arnold representation theorem with Fourier series or Random Fourier Features (RFF) to achieve adaptive spectral expressivity, parameter efficiency, and interpretability. They were developed to overcome key limitations of vanilla Kolmogorov–Arnold Networks (KANs), such as parameter explosion and the inability to capture high-frequency features in high-dimensional learning tasks. Fourier-KANs have been applied in vision, language, audio signal representation, time-series anomaly detection, implicit neural representations, graph learning, scientific machine learning, and differentiable operator learning.

1. Mathematical Foundations and Theoretical Guarantees

Fourier-KANs merge the Kolmogorov–Arnold superposition principle, which states that any continuous multivariate function on a bounded domain, $f:[0,1]^d\to\mathbb{R}$, can be written as a finite composition of sums and continuous univariate functions, with Fourier series, which provide a spectral basis for function approximation. A canonical Fourier-KAN layer replaces the standard piecewise or spline basis functions with truncated Fourier expansions: $f(x) = \sum_{i=1}^{d} \sum_{k=1}^{K} \left(a_{i,k} \cos(k x_i) + b_{i,k} \sin(k x_i)\right),$ where $K$ is the spectral order and $\{a_{i,k},b_{i,k}\}$ are learnable coefficients for each input dimension $i$ (Xu et al., 2024, Li et al., 2024). For high-dimensional applications, Fourier-KAN blocks add further optimizations such as learnable RFF whose bandwidths and phases are initialized following kernel-theoretic principles, e.g., $\omega_{ij} \sim \mathcal{N}(0, \sigma^2/d)$ and $b_i \sim \text{Uniform}[0,2\pi]$ (Zhang et al., 9 Feb 2025).
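The truncated expansion above can be sketched in a few lines of plain Python. This is a minimal illustration rather than any paper's implementation: the class name `FourierKANLayer`, the extra output index `j` (one coefficient set per output unit), and the $1/\sqrt{d_{in} K}$ initialization scale are assumptions made for the example.

```python
import math
import random


class FourierKANLayer:
    """Maps d_in inputs to d_out outputs using per-edge truncated Fourier series."""

    def __init__(self, d_in, d_out, K, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.K = d_in, d_out, K
        # Assumed heuristic: small Gaussian init scaled by 1/sqrt(d_in * K).
        scale = 1.0 / math.sqrt(d_in * K)
        # a[j][i][k-1] multiplies cos(k * x_i); b[j][i][k-1] multiplies sin(k * x_i).
        self.a = [[[rng.gauss(0.0, scale) for _ in range(K)]
                   for _ in range(d_in)] for _ in range(d_out)]
        self.b = [[[rng.gauss(0.0, scale) for _ in range(K)]
                   for _ in range(d_in)] for _ in range(d_out)]

    def forward(self, x):
        """x: list of d_in floats. Returns a list of d_out floats."""
        out = []
        for j in range(self.d_out):
            total = 0.0
            for i, xi in enumerate(x):
                for k in range(1, self.K + 1):
                    total += (self.a[j][i][k - 1] * math.cos(k * xi)
                              + self.b[j][i][k - 1] * math.sin(k * xi))
            out.append(total)
        return out


layer = FourierKANLayer(d_in=3, d_out=2, K=4)
y = layer.forward([0.1, -0.5, 2.0])
print(len(y))  # 2
```

Because every basis function has integer frequency, the layer is $2\pi$-periodic in each input, which is why inputs are usually rescaled to a bounded interval before such a layer is applied.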

The universality of Fourier-KANs is established for $L^2([0,2\pi]^n)$, matching deep spline-KANs; smooth multivariate functions can be approximated arbitrarily well by a finite-depth network with Fourier-edge expansions (Li et al., 2024). Polynomial bounds on the parameter complexity are demonstrated for dual-domain Fourier-Kolmogorov–Arnold neural operators (KANO), showing a strict advantage over pure spectral models for dense or position-dependent operators (Lee et al., 20 Sep 2025).

2. Architectural Innovations and Parameter Efficiency

Fourier-KAN architectures address the parameter explosion in standard KANs via matrix association, low-rank spectral projections, and hybrid activation schemes. In Kolmogorov-Arnold-Fourier Networks (KAF) (Zhang et al., 9 Feb 2025):

The dual-matrix structure of classic KAN layers ($W_A \cdot \phi(x) + W_B \cdot \psi(x)$) is collapsed via matrix association into $(W_A + W_B) \cdot (a \cdot \phi(x) + b \cdot \psi(x))$ with element-wise learnable scaling. This reduces the per-layer parameter cost relative to the $d_{in} \times d_{out} \times (G+K+3) + d_{out}$ parameters of the classic layer.
Trainable RFF layers enable adaptive spectral embeddings, with analytically differentiable parameters (gradients for the frequencies $\omega$ and phases $b$ are computed via the chain rule).
Adaptive hybrid activations that mix GELU with the RFF branch begin training with a low-frequency bias (small RFF scaling) and shift toward high-frequency coverage as that scaling grows, modulating the response spectrum during optimization.
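A hybrid activation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the KAF reference code: the class name, the projection matrix `V` that maps the RFF features back to the input width, and the names `alpha`, `beta`, and `beta_init` are hypothetical. The feature map uses the standard RFF form $\sqrt{2/m}\,\cos(\omega^\top x + b)$ with the initialization quoted in Section 1.

```python
import math
import random


def gelu(v):
    """Exact GELU via the Gaussian error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


class HybridRFFActivation:
    """alpha * GELU(x) + beta * (V @ RFF(x)), with a small initial beta."""

    def __init__(self, d, m, sigma=1.0, beta_init=0.01, seed=0):
        rng = random.Random(seed)
        self.d, self.m = d, m
        # omega_ij ~ N(0, sigma^2 / d), i.e. standard deviation sigma / sqrt(d).
        std = sigma / math.sqrt(d)
        self.omega = [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(m)]
        # Phases b_i ~ Uniform[0, 2*pi].
        self.phase = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(m)]
        # Hypothetical projection from m RFF features back to d channels.
        self.V = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(m)]
                  for _ in range(d)]
        # Element-wise scales; a small beta gives the low-frequency start.
        self.alpha = [1.0] * d
        self.beta = [beta_init] * d

    def rff(self, x):
        """Random Fourier features sqrt(2/m) * cos(omega @ x + phase)."""
        c = math.sqrt(2.0 / self.m)
        return [c * math.cos(sum(w * xi for w, xi in zip(row, x)) + p)
                for row, p in zip(self.omega, self.phase)]

    def forward(self, x):
        z = self.rff(x)
        out = []
        for i in range(self.d):
            spectral = sum(v * zj for v, zj in zip(self.V[i], z))
            out.append(self.alpha[i] * gelu(x[i]) + self.beta[i] * spectral)
        return out


act = HybridRFFActivation(d=3, m=8)
print(act.forward([0.5, -1.0, 2.0]))
```

In a trainable version, `omega`, `phase`, `alpha`, and `beta` would all receive gradients. With `beta` near zero the output is essentially GELU, and the spectral branch contributes more as `beta` grows.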

Projective-KANs (P-KANs) further compress KANs by entropy-driven projection to Fourier bases, using sparsity-inducing penalties and gravitational regularization to encourage edge functions to converge to low-dimensional Fourier expansions (Poole et al., 24 Sep 2025). Empirically, this achieves up to 80% parameter reduction per edge while maintaining representational capacity, with stable training under noise.
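One plausible form of an entropy-style sparsity signal is the Shannon entropy of the normalized per-mode power of an edge's Fourier coefficients. The sketch below assumes that form; the function name `spectral_entropy` and the `eps` stabilizer are hypothetical, and the actual P-KAN penalty and its gravitational term may differ.

```python
import math


def spectral_entropy(a, b, eps=1e-12):
    """Shannon entropy of the normalized power a_k^2 + b_k^2 over modes k."""
    power = [ak * ak + bk * bk for ak, bk in zip(a, b)]
    total = sum(power) + eps  # eps avoids division by zero for an all-zero edge
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# An edge dominated by one mode has entropy near 0 ...
print(spectral_entropy([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
# ... while a flat spectrum over 4 modes has entropy near log(4).
print(spectral_entropy([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
```

Adding such a term to the training loss would push each edge toward a few dominant modes, which is the behaviour the projection step relies on.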

3. Spectral Adaptivity and Expressivity

Fourier-KANs directly learn the activation frequency content necessary for each task. Unlike fixed positional encodings or global periodic activation networks, Fourier-KANs use learnable or data-adaptive spectral representations:

In KAF, learnable RFF frequencies and phases allow the network to capture the exact bandwidth needed for the signal (Zhang et al., 9 Feb 2025).
In Implicit Neural Representation contexts, first-layer activations have fully adaptive Fourier coefficients, serving as a spectral filter bank that can allocate power to arbitrary bands based on target reconstruction error (Mehrabian et al., 2024).
For audio and time series, frequency-adaptive learning strategies such as an inverted-pyramid mode assignment (layerwise decreasing spectral capacity) and frequency-aware weight initialization promote rapid convergence across all frequency bands and mitigate spectral bias (Li et al., 10 Jan 2026, Zhou et al., 2024). In these studies, Fourier-KANs handle both high-frequency and low-frequency content with little hyperparameter tuning.
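The inverted-pyramid idea, in which earlier layers get more Fourier modes than later ones, can be illustrated with a simple schedule. The linear interpolation and the function name `inverted_pyramid_modes` are assumptions for this sketch; the cited works may use a different decay rule.

```python
def inverted_pyramid_modes(num_layers, k_first, k_last=1):
    """Return a non-increasing list of mode counts, one per layer."""
    if num_layers < 1 or k_last < 1 or k_first < k_last:
        raise ValueError("need num_layers >= 1 and k_first >= k_last >= 1")
    if num_layers == 1:
        return [k_first]
    # Assumed rule: interpolate linearly from k_first down to k_last.
    step = (k_first - k_last) / (num_layers - 1)
    return [max(k_last, round(k_first - step * layer))
            for layer in range(num_layers)]


print(inverted_pyramid_modes(4, 16, 2))  # [16, 11, 7, 2]
```

The first layer then carries most of the spectral capacity, while deeper layers refine the representation with fewer modes and correspondingly fewer coefficients.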

Global support of Fourier bases enhances expressivity for smooth and periodic functions; however, Gibbs phenomena can affect approximation near discontinuities, which is addressed via hybridization or alternative local basis functions (Noorizadegan et al., 28 Oct 2025).
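The Gibbs effect is easy to reproduce numerically. The sketch below uses the classical Fourier series of the unit square wave $\operatorname{sign}(\sin x)$, a textbook example chosen here for illustration rather than taken from the cited work, and measures the peak of the truncated sum next to the jump at $x=0$.

```python
import math


def square_wave_partial_sum(x, K):
    """Truncated Fourier series of sign(sin x) using odd harmonics k <= K."""
    return (4.0 / math.pi) * sum(math.sin(k * x) / k
                                 for k in range(1, K + 1, 2))


def peak_near_jump(K, samples=4000):
    """Maximum of the partial sum on (0, pi/2], just right of the jump."""
    xs = [math.pi / 2 * (n + 1) / samples for n in range(samples)]
    return max(square_wave_partial_sum(x, K) for x in xs)


for K in (11, 51, 201):
    print(K, round(peak_near_jump(K), 4))
```

The peak stays close to 1.18 however large $K$ becomes, an overshoot of roughly 9% of the jump height. Adding modes narrows the oscillation but does not remove it, which is the motivation for hybrid or local bases near discontinuities.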

4. Applications Across Domains

Fourier-KANs have been applied in a wide array of domains, consistently achieving empirical gains over standard MLPs, spline-based KANs, and pure spectral models:

Computer Vision: KAF outperforms MLP, KAN, and kernel-based alternatives on MNIST, CIFAR, and ImageNet under tight parameter budgets (e.g., on CIFAR-10 at a matched parameter count, KAF reaches 91.8% accuracy vs. 91.2% for MLP, the strongest baseline, with FAN also behind) (Zhang et al., 9 Feb 2025).
Language and Audio: For NLP tasks (CoLA, AG_NEWS) and audio processing (SpeechCommand, UrbanSound8K), Fourier-KANs deliver 1–3 point improvements in accuracy and faster convergence (Zhang et al., 9 Feb 2025). In audio, they offer comparable SNR to carefully tuned positional encoding MLPs but without hyperparameter sensitivity (Li et al., 10 Jan 2026).
Implicit Representations and Signal Modeling: In INR for images and 3D shapes, learnable Fourier-layer activations in FKAN improve PSNR/SSIM and IoU over state-of-the-art baselines, converging to fine textures and boundaries more rapidly (Mehrabian et al., 2024).
Graph and Molecular Learning: Fourier-KAN modules improve representation power and trainability in collaborative filtering (FourierKAN-GCF: Recall@20 improvement from 0.3307 to 0.3564 on MOOC (Xu et al., 2024)) and molecular property prediction (KA-GNN: ROC-AUC, e.g., BACE 0.890 vs. 0.873 for SMPT (Li et al., 2024)).
Operator Learning and Scientific Modeling: Dual-domain KANO architectures remain expressive over generic position-dependent PDE operators, outperforming Fourier Neural Operator (FNO) approaches and reconstructing symbolic Hamiltonians in quantum mechanics to four-decimal precision (Lee et al., 20 Sep 2025).
Time Series Anomaly Detection: KAN-AD and Fourier-KAN-Mamba exploit global Fourier bases for robust, lightweight, and fast anomaly detection, with very small parameter counts and Event F1 improvements over prior art (e.g., KAN-AD reaches an Event F1 of 0.5335 vs. 0.4120 for KAN on UCR with 274 parameters) (Zhou et al., 2024, Wang et al., 19 Nov 2025).
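To make the role of a global Fourier basis in anomaly detection concrete, the toy sketch below fits a low-order Fourier series to one window by direct projection and flags points with large residuals. It is a deliberately simplified stand-in, not KAN-AD or Fourier-KAN-Mamba: the function names, the fixed threshold, and the assumption that the window covers exactly one period on a uniform grid are all choices made for the example.

```python
import math


def fourier_reconstruct(y, K):
    """Project uniformly sampled y (one full period) onto modes 0..K."""
    n = len(y)
    if n <= 2 * K:
        raise ValueError("need more than 2*K samples")
    xs = [2.0 * math.pi * t / n for t in range(n)]
    mean = sum(y) / n
    # Discrete projections onto cos(k x) and sin(k x).
    a = [2.0 / n * sum(v * math.cos(k * x) for v, x in zip(y, xs))
         for k in range(1, K + 1)]
    b = [2.0 / n * sum(v * math.sin(k * x) for v, x in zip(y, xs))
         for k in range(1, K + 1)]
    return [mean + sum(a[k - 1] * math.cos(k * x) + b[k - 1] * math.sin(k * x)
                       for k in range(1, K + 1))
            for x in xs]


def flag_anomalies(y, K, threshold):
    """Indices whose residual from the smooth Fourier fit exceeds threshold."""
    fit = fourier_reconstruct(y, K)
    return [t for t, (v, f) in enumerate(zip(y, fit)) if abs(v - f) > threshold]


n = 64
signal = [math.sin(2 * math.pi * t / n) + 0.5 * math.cos(4 * math.pi * t / n)
          for t in range(n)]
signal[20] += 3.0  # inject a point anomaly
print(flag_anomalies(signal, K=3, threshold=1.0))  # [20]
```

A smooth low-order fit absorbs the periodic pattern but cannot follow an isolated spike, so the spike shows up as a large residual while normal points stay close to the fit.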

5. Training Procedures, Regularization, and Best Practices

Fourier-KANs employ standard optimization protocols—typically Adam with layer-wise learning rate schedules and early stopping. Regularization schemes span:

Spectral norm penalties or controlled initialization for RFF matrices to prevent overfitting or spectral leakage (Zhang et al., 9 Feb 2025).
Explicit entropy minimization and gravitational alignment to favor sparse, interpretable spectral representations (P-KANs) (Poole et al., 24 Sep 2025).
Batch-wise variance normalization of Fourier expansion coefficients and adaptive learning rate decay. For time series and audio, frequency-adaptive strategies stabilize convergence and improve generalization (Li et al., 10 Jan 2026, Zhou et al., 2024).

Ablation studies across these works indicate that both hybrid activations (e.g., GELU+RFF) and theoretically guided spectral initializations contribute substantially to accuracy and generalization. In practice, Fourier-KANs are reported to be far less sensitive to hyperparameters than coordinate-based positional encoding MLPs.

6. Interpretability, Limitations, and Future Directions

Fourier-KANs naturally enable interpretable functional discovery. Learned edge functions are directly analyzable: the magnitude of each Fourier coefficient pair identifies how strongly the corresponding frequency contributes. In industrial applications (e.g., automated fiber placement), the model automatically discovered that low-frequency modes explained over 95% of the observed behavior (Poole et al., 24 Sep 2025). Symbolic freezing in KANO architectures recovers exact PDE operator coefficients to high precision with minimal parameterization (Lee et al., 20 Sep 2025).
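Reading dominant frequencies off a learned edge function amounts to comparing per-mode amplitudes $\sqrt{a_k^2 + b_k^2}$. The sketch below shows this on made-up coefficients; the numbers and helper names are illustrative and do not come from any cited experiment.

```python
import math


def mode_amplitudes(a, b):
    """Amplitude sqrt(a_k^2 + b_k^2) of each mode k = 1..K."""
    return [math.hypot(ak, bk) for ak, bk in zip(a, b)]


def low_frequency_energy_fraction(a, b, cutoff):
    """Share of total spectral energy carried by modes k <= cutoff."""
    energy = [ak * ak + bk * bk for ak, bk in zip(a, b)]
    total = sum(energy)
    return sum(energy[:cutoff]) / total if total > 0.0 else 0.0


# Hypothetical learned coefficients for one edge with K = 4 modes.
a = [0.9, 0.1, 0.0, 0.05]
b = [0.3, 0.0, 0.1, 0.0]

amps = mode_amplitudes(a, b)
dominant_k = max(range(len(amps)), key=amps.__getitem__) + 1
print(dominant_k)                                         # 1
print(round(low_frequency_energy_fraction(a, b, 1), 3))   # 0.976
```

Here the first mode carries about 98% of the energy, the kind of summary behind statements such as "low-frequency modes explained over 95% of behavior".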

Remaining limitations include poor approximation of highly discontinuous functions (the Gibbs effect), the computational overhead of entropy-driven projection, and the difficulty of handling massive graphs or high-order spectral expansions efficiently. Ongoing research targets hybrid basis integration, sparse and multi-resolution spectral schemes, inverse problem settings, and domain-specific Fourier-KAN compositions (Noorizadegan et al., 28 Oct 2025, Mehrabian et al., 2024).

7. Representative Empirical Results

Model	Domain	Param Count	Main Metric	Best Baseline	Fourier-KAN Value	Δ (improvement)
KAF (CIFAR-10)	Vision	--	Accuracy	MLP (91.2%)	91.8%	+0.6 pts
FKAN (Kodak)	INR (Images)	--	PSNR, SSIM	INCODE (34.81, 0.889)	37.91, 0.939	+8.91% PSNR (relative)
FourierKAN-GCF	Recommender	--	Recall@20	LightGCN (0.3307)	0.3564	+7.7% (relative)
KAN-AD (UCR)	Time Series	274	Event F1	KAN (0.4120)	0.5335	+12.15 pts
KA-GNN (BACE)	Molecular	44,000	ROC-AUC	SMPT (0.873)	0.890	+0.017
Fourier-KAN-Mamba	TS Anomaly	--	F1 (MSL)	Linear (90.99%)	93.27%	+2.28 pts

These results confirm strong empirical performance in both standard regression/classification tasks and scientific applications (Zhang et al., 9 Feb 2025, Mehrabian et al., 2024, Zhou et al., 2024, Li et al., 2024, Wang et al., 19 Nov 2025).

Fourier-Kolmogorov-Arnold Networks constitute a general family of neural architectures that combine the theoretical foundation of functional superpositions with adaptive spectral representation, providing a widely validated framework for efficient, interpretable, and spectrally expressive learning in high-dimensional, complex signal domains.