LeanKAN: A Parameter-Lean Kolmogorov-Arnold Network Layer with Improved Memory Efficiency and Convergence Behavior

Published 25 Feb 2025 in cs.LG and cs.NE | arXiv:2502.17844v2

Abstract: The recently proposed Kolmogorov-Arnold network (KAN) is a promising alternative to multi-layer perceptrons (MLPs) for data-driven modeling. While original KAN layers were only capable of representing the addition operator, the recently-proposed MultKAN layer combines addition and multiplication subnodes in an effort to improve representation performance. Here, we find that MultKAN layers suffer from a few key drawbacks including limited applicability in output layers, bulky parameterizations with extraneous activations, and the inclusion of complex hyperparameters. To address these issues, we propose LeanKANs, a direct and modular replacement for MultKAN and traditional AddKAN layers. LeanKANs address these three drawbacks of MultKAN through general applicability as output layers, significantly reduced parameter counts for a given network structure, and a smaller set of hyperparameters. As a one-to-one layer replacement for standard AddKAN and MultKAN layers, LeanKAN is able to provide these benefits to traditional KAN learning problems as well as augmented KAN structures in which it serves as the backbone, such as KAN Ordinary Differential Equations (KAN-ODEs) or Deep Operator KANs (DeepOKAN). We demonstrate LeanKAN's simplicity and efficiency in a series of demonstrations carried out across a standard KAN toy problem as well as ordinary and partial differential equations learned via KAN-ODEs, where we find that its sparser parameterization and compact structure serve to increase its expressivity and learning capability, leading it to outperform similar and even much larger MultKANs in various tasks.

Summary

  • The paper demonstrates that LeanKAN improves memory efficiency and convergence by embedding multiplication directly within the layer.
  • It retains the parameter count of traditional AddKAN layers while delivering lower error rates in function approximation, ODE, and PDE tasks.
  • Its single hyperparameter design reduces tuning complexity, making it a practical choice for diverse scientific machine learning applications.

Introduction

The paper introduces LeanKAN, a refined architecture for Kolmogorov-Arnold Networks (KANs), designed to improve memory efficiency and convergence behavior by offering a direct and modular replacement for previously used AddKAN and MultKAN layers. While MultKAN layers incorporate multiplication to improve representation capacity, they are often encumbered by bulky parameterizations and complex hyperparameters. LeanKAN reduces these complexities, positioning itself as a parameter-efficient alternative that is applicable to output layers of data-driven modeling networks.

Kolmogorov-Arnold Networks and LeanKAN

KANs utilize the Kolmogorov-Arnold theorem, which states that any multivariate continuous function can be decomposed into univariate functions, making them suitable for high-dimensional functional approximations. Traditional AddKAN layers solely facilitate addition, limiting their expressivity in representing multiplicative functions. MultKAN layers augment this by facilitating multiplication, but at the cost of increased complexity due to additional nodes and hyperparameters. LeanKAN optimizes this structure by directly embedding multiplication within the layer’s computation, avoiding parameter inflation and ensuring all nodes can represent both addition and multiplication.
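The decomposition underlying KANs can be written out explicitly. The Kolmogorov-Arnold representation theorem states:

```latex
% Kolmogorov-Arnold representation: any continuous f on [0,1]^n
% decomposes into sums and compositions of univariate functions.
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

KAN layers generalize this structure by learning the univariate functions on each edge and stacking layers freely, rather than fixing the two-level form of the theorem.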

Structural and Efficiency Improvements

LeanKAN employs a single hyperparameter, $n^{\text{mu}}$, defining the number of input nodes engaged in multiplication, and incorporates multiplication directly within the layer’s functionality. This design choice avoids the need for the additional subnodes seen in MultKAN, leading to enhanced parameter efficiency; LeanKAN retains the same number of parameters as AddKAN, despite supporting multiplicative learning operations. This simplification also reduces the need for complex hyperparameter tuning, facilitating ease of implementation in augmented KAN frameworks such as KAN-ODEs and DeepOKANs (Figure 1).

Figure 1: Visualization of the three KAN layers referenced in this work, highlighting LeanKAN’s efficient integration of multiplication within its structure.
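To make the layer structure concrete, the forward pass can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the edge activations here are plain polynomials standing in for KAN spline activations, and the rule that each output node multiplies the first `n_mu` activated inputs and adds the rest is our reading of the layer description above.

```python
import numpy as np

def edge_activation(x, c):
    # Simple learnable univariate function: a low-order polynomial
    # standing in for the spline activations used in real KAN layers.
    return sum(ck * x**k for k, ck in enumerate(c))

def leankan_layer(x, coefs, n_mu):
    """Sketch of a LeanKAN-style layer (illustrative assumption, not the
    paper's exact formula): each output node takes the product of the
    activations of the first n_mu inputs plus the sum of the remaining
    activations, so multiplication lives inside the layer with no extra
    subnodes and no extra parameters beyond the AddKAN edge count."""
    n_out, n_in, _ = coefs.shape
    y = np.empty(n_out)
    for j in range(n_out):
        acts = np.array([edge_activation(x[i], coefs[j, i]) for i in range(n_in)])
        y[j] = np.prod(acts[:n_mu]) + np.sum(acts[n_mu:])
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=3)
coefs = rng.normal(size=(2, 3, 4))  # 2 outputs, 3 inputs, cubic edge activations
y = leankan_layer(x, coefs, n_mu=2)
```

Note that `coefs` has exactly the same shape an AddKAN layer of the same width would use; only the combination rule changes.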

Case Studies and Performance Evaluation

Multidimensional Function Approximation

In testing against MultKAN, LeanKAN demonstrated superior memory efficiency and interpretability while maintaining higher or comparable accuracy across various function reconstruction tasks. Specifically, LeanKAN used fewer parameters, owing to its lean architecture, and achieved lower error rates, showcasing its improved computational efficacy for function approximation tasks (Figure 2).

Figure 2: Comparison between MultKAN and LeanKAN for a single-layer multiplication training example, illustrating improved memory efficiency with LeanKAN.
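The parameter-count advantage can be illustrated with a back-of-the-envelope tally. This is a sketch under our own assumptions about layer shapes (per-edge coefficient counts and MultKAN's extra multiplication subnodes), not the paper's exact accounting:

```python
def kan_layer_params(n_in, n_out, n_coef):
    # Each edge carries one learnable univariate activation with n_coef
    # coefficients; by design, AddKAN and LeanKAN share this same count.
    return n_in * n_out * n_coef

def multkan_layer_params(n_in, n_out, n_mult, n_coef):
    # Illustrative assumption: MultKAN widens the layer with n_mult extra
    # multiplication subnodes, each adding a full column of activated edges.
    return n_in * (n_out + n_mult) * n_coef

print(kan_layer_params(2, 5, 8))         # 80
print(multkan_layer_params(2, 5, 3, 8))  # 128
```

Under these assumptions, the MultKAN overhead grows with the number of multiplication subnodes, while a LeanKAN layer of the same width stays at the AddKAN count regardless of how many inputs are routed through multiplication.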

Ordinary Differential Equations

The paper applied LeanKAN within the framework of KAN-ODEs to learn dynamic systems under the Lotka-Volterra predator-prey model. LeanKAN exhibited accelerated convergence and enhanced accuracy, reducing both training and testing loss by an order of magnitude compared to larger MultKAN networks. This was attributed to its lean parameterization, which mitigates the negative impact of redundant activations, thereby supporting faster and more accurate model training (Figure 3).

Figure 3: Loss metrics comparison between LeanKAN and MultKAN after extensive training, highlighting LeanKAN's lower training and testing losses.
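In a KAN-ODE, the right-hand side of the ODE is replaced by a KAN and trained through a differentiable solver. The forward rollout can be sketched as below; the integrator (forward Euler) and the Lotka-Volterra coefficients here are illustrative choices of ours, not the paper's setup, and `f` stands in for what would be a trained LeanKAN network:

```python
import numpy as np

def euler_rollout(f, u0, t_span, dt):
    # Simple forward-Euler integration of du/dt = f(u); a KAN-ODE would
    # use f = KAN(u) and a more accurate, differentiable integrator.
    t0, t1 = t_span
    n_steps = int(round((t1 - t0) / dt))
    u = np.asarray(u0, dtype=float)
    traj = [u.copy()]
    for _ in range(n_steps):
        u = u + dt * f(u)
        traj.append(u.copy())
    return np.array(traj)

def lotka_volterra(u, alpha=1.5, beta=1.0, gamma=3.0, delta=1.0):
    # Predator-prey dynamics with illustrative coefficients (hypothetical,
    # not the values used in the paper's case study).
    x, y = u
    return np.array([alpha * x - beta * x * y, -gamma * y + delta * x * y])

traj = euler_rollout(lotka_volterra, [1.0, 1.0], (0.0, 1.0), 0.01)
```

Training then amounts to matching `traj` against observed trajectories and backpropagating the loss into the network parameters inside `f`.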

Partial Differential Equations

LeanKAN was further examined within PDE contexts, specifically in solving Schrödinger-type equations. Here, LeanKAN again demonstrated improved performance and efficiency, presenting a marked reduction in MSE despite a sparse setup. The simpler hyperparameter configuration facilitated easier adaptability and implementation compared to traditional KAN structures (Figure 4).

Figure 4: Schrödinger Equation case study results, demonstrating LeanKAN's predictive accuracy and lower error margins compared to AddKAN.
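For context, a common Schrödinger-type benchmark in the scientific machine learning literature is the one-dimensional nonlinear Schrödinger equation; the paper's exact formulation may differ:

```latex
% Nonlinear Schrodinger equation for complex field psi(x, t);
% a standard PDE benchmark, shown here only as a representative form.
i\,\frac{\partial \psi}{\partial t}
  + \frac{1}{2}\frac{\partial^2 \psi}{\partial x^2}
  + |\psi|^2 \psi = 0
```

Equations of this type couple dispersion with a multiplicative nonlinearity, which is precisely the kind of structure that benefits from multiplication embedded directly in the layer.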

Conclusion

LeanKAN emerges as a viable, memory-efficient replacement for MultKAN and AddKAN layers, especially in augmented network models like KAN-ODEs or DeepOKANs. Its design strategically embeds multiplication without redundant parameters, ensuring simplicity in hyperparameter tuning and accelerating both convergence rates and overall accuracy. The paper suggests widespread applicability for LeanKAN across diverse scientific machine learning applications, where efficient, high-order learning operations are critical.
