Combinations of Fast Activation and Trigonometric Functions in Kolmogorov-Arnold Networks

Published 16 Aug 2025 in cs.LG | (2508.11876v1)

Abstract: For years, many neural networks have been developed based on the Kolmogorov-Arnold Representation Theorem (KART), which was created to address Hilbert's 13th problem. Recently, relying on KART, Kolmogorov-Arnold Networks (KANs) have attracted attention from the research community, stimulating the use of polynomial functions such as B-splines and RBFs. However, these functions are not fully supported by GPU devices and are still considered less popular. In this paper, we propose the use of fast computational functions, such as ReLU and trigonometric functions (e.g., ReLU, sin, cos, arctan), as basis components in Kolmogorov-Arnold Networks (KANs). By integrating these function combinations into the network structure, we aim to enhance computational efficiency. Experimental results show that these combinations maintain competitive performance while offering potential improvements in training time and generalization.

Abstract PDF Upgrade to Chat

Summary

The paper introduces a novel approach combining ReLU and trigonometric functions in KANs to accelerate training without sacrificing performance.
It replaces computationally expensive polynomial activations with element-wise operations, reducing training time significantly as shown on MNIST benchmarks.
Experimental results indicate competitive accuracy and improved efficiency over traditional B-spline and RBF functions, highlighting scalability potential.

Combinations of Fast Activation and Trigonometric Functions in Kolmogorov-Arnold Networks

Introduction

The paper introduces novel function combinations within Kolmogorov-Arnold Networks (KANs), leveraging fast activation and trigonometric functions to enhance computational efficiency. KANs, inspired by the Kolmogorov-Arnold Representation Theorem (KART), have traditionally incorporated polynomial functions like B-splines and Radial Basis Functions (RBFs). However, these functions often suffer from limited GPU support and extended training times. This work proposes integrating widely used functions such as ReLU, sine, cosine, and arctan to overcome these limitations while maintaining competitive performance.

Kolmogorov-Arnold Networks and KART

Kolmogorov-Arnold Representation Theorem (KART) asserts that any continuous multivariate function can be decomposed into a superposition of one-dimensional functions and additions. KANs embody this theorem, providing a versatile neural architecture where layers consist of function matrices instead of traditional fixed activation functions.

Figure 1: Left: the architecture of KAN(2,3,1). Right: A showcase of computing $\phi_{1,1,1}$ using control points and B-splines.

KAN models aim to extend the capabilities of MLPs by enhancing both width and depth through the application of functions $\Phi_q$ and $\phi_{q,p}$ , thereby capturing complex data patterns more effectively.

Methodology

The proposed approach introduces simple combinations of fast functions, focusing on activation functions like ReLU and trigonometric functions such as sine, cosine, and arctangent. These functions are chosen for their computational efficiency and are combined via element-wise operations, such as sum and product, to form FC-KAN models.

This design diverges from more complex polynomial representations, avoiding longer training times and promoting scalability. The use of these functions facilitates faster computation, as demonstrated by substantial reductions in execution time compared to B-splines and other traditional functions.

Figure 2: Average execution time ( $\mu$ s -- microseconds) for processing 1,000,000 inputs per function.

Experiments

The experimental evaluation comprises training FC-KAN on MNIST and Fashion-MNIST datasets, comparing performance metrics across diverse function combinations. The results highlight significant reductions in training time alongside competitive validation accuracy and F1 scores relative to B-Spline-based models and existing KAN configurations.

Implications and Future Directions

This work offers practical improvements in computational efficiency, reducing training time without sacrificing model accuracy. The findings suggest potential extensions to real-world datasets and deeper architectures. Future research could further optimize function combinations and explore additional trigonometric or non-trigonometric functions for FC-KAN, promising enhanced performance in varied domains.

Conclusion

By integrating fast functions into KAN, the paper presents a viable path toward more efficient and scalable neural networks. The approach achieves performance comparable to MLPs and existing KAN variants while operating with a similar parameter count, marking a step forward in the application of theoretical mathematical constructs in practical AI models. However, opportunities remain for further refinement and extension, particularly in addressing lingering scalability and computational constraints within fast function combinations.

Markdown Report Issue