Group Rational KANs (GR-KANs)

Updated 11 November 2025
  • Group Rational KANs are neural architectures that tie rational-function activations across input groups, reducing computational overhead while preserving expressivity.
  • They integrate equivariant linear layers and gated rational activations to systematically enforce symmetry priors in both feed-forward and Transformer models.
  • GR-KANs deliver state-of-the-art performance in symmetry-critical tasks and medical imaging by achieving high data efficiency and parameter economy.

Group Rational Kolmogorov–Arnold Networks (GR-KANs) are a class of neural architectures that combine rational-function-based Kolmogorov–Arnold activation units with groupwise parameter sharing to deliver high expressivity, parameter efficiency, and systematic incorporation of symmetry priors in both conventional and Transformer-based deep learning models. GR-KANs generalize classical KANs by grouping input channels and tying the underlying nonlinear functions across each group, offering computational and statistical benefits relative to vanilla per-edge KAN parameterizations. By incorporating equivariant linear layers, gated rational activations, and explicit group decomposition, they provide a principled mechanism for encoding arbitrary matrix group symmetries within dense neural architectures.

1. Mathematical Foundations: From KANs to Group Rational KANs

Kolmogorov–Arnold Networks (KANs) realize the classical superposition theorem by learning a direct sum of univariate nonlinearities along each input-output “edge” in the feed-forward network. In their rational-function instantiation, each univariate function \varphi(x) is parameterized as

\varphi(x) = w \cdot F(x) = w \cdot \frac{P(x)}{1 + |Q(x)|}

where P(x) = a_0 + a_1 x + \dots + a_m x^m, Q(x) = b_1 x + \dots + b_n x^n (typically m=3, n=4), and w \in \mathbb{R} is a learnable scalar weight.
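For concreteness, this rational unit can be sketched in a few lines of NumPy (an illustrative implementation, not reference code; the coefficient values in the test usage are arbitrary):

```python
import numpy as np

def rational(x, a, b, w=1.0):
    """Safe rational unit: w * P(x) / (1 + |Q(x)|).

    a : (m+1,) numerator coefficients a_0..a_m (low degree first)
    b : (n,)   denominator coefficients b_1..b_n (no constant term)
    """
    P = np.polyval(a[::-1], x)                            # a_0 + a_1 x + ... + a_m x^m
    Q = np.polyval(np.concatenate(([0.0], b))[::-1], x)   # b_1 x + ... + b_n x^n
    return w * P / (1.0 + np.abs(Q))
```

Because the denominator 1 + |Q(x)| is bounded below by 1, the unit has no poles, which is the point of the “safe” Padé form.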

In vanilla KANs, each input-to-output connection (or “edge”) maintains its bespoke rational map F_{k,i}, resulting in O(d_{\text{in}} \cdot d_{\text{out}}) such maps per layer. GR-KANs introduce a grouping scheme: partitioning the d_{\text{in}} input channels into G groups (typically G \ll d_{\text{in}} \cdot d_{\text{out}}) and tying the rational basis function per group. For input x \in \mathbb{R}^{d_{\text{in}}}, a GR-KAN layer computes output z \in \mathbb{R}^{d_{\text{out}}} as

z_k = \sum_{i=1}^{d_{\text{in}}} w_{k,i}\, F_{g(i)}(x_i)

where g(i) \in \{1, \ldots, G\} indexes the assigned group for channel i and F_{g(i)} is the rational base for that group.

This mechanism preserves the universal approximation property of rational KANs while substantially reducing the number of unique nonlinear activations and their associated computational overhead. It also serves as an inductive bias whereby channels deemed a priori similar—based on spatial, semantic, or learned criteria—share their nonlinear characteristics.
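The grouped layer can be made concrete with a short NumPy sketch (hypothetical code illustrating the equation above, with block-wise group assignment; not the published implementation):

```python
import numpy as np

def grkan_layer(x, W, A, B, group_of):
    """GR-KAN layer: z_k = sum_i W[k, i] * F_{g(i)}(x_i).

    x        : (d_in,)        input vector
    W        : (d_out, d_in)  per-edge scalar weights
    A        : (G, m+1)       shared numerator coefficients per group (a_0..a_m)
    B        : (G, n)         shared denominator coefficients per group (b_1..b_n)
    group_of : (d_in,)        group index g(i) for each input channel
    """
    # Evaluate each channel through its group's shared rational base.
    P = np.array([np.polyval(A[g][::-1], xi) for g, xi in zip(group_of, x)])
    Q = np.array([np.polyval(np.r_[B[g][::-1], 0.0], xi) for g, xi in zip(group_of, x)])
    return W @ (P / (1.0 + np.abs(Q)))
```

Only G sets of rational coefficients exist, while the linear weight matrix W keeps full per-edge resolution.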

2. Grouping Strategies and Parameter Efficiency

The grouping mechanism in GR-KANs reduces both parameter count and inference-time floating-point operations. The standard approach partitions the d_{\text{in}} input dimensions into G groups of size s = d_{\text{in}}/G, assigning channels 1, \dots, s to group 1, s+1, \dots, 2s to group 2, and so on. Each group possesses a single set of rational coefficients \{a_j, b_j\}, typically initialized with small Gaussians. Each “edge” from input to output retains its own scalar weight w_{k,i}.

The resulting parameter allocation is as follows:

Component            Vanilla KAN                       GR-KAN (G groups)
Rational params      d_in · d_out · (m + 1 + n)        G · (m + 1 + n)
Linear weights (W)   d_in · d_out                      d_in · d_out
Total                O(d_in d_out (m + n))             O(d_in d_out + G (m + n))

At inference, the rational base function F_j is evaluated s times per group (once per member channel) rather than once per edge, giving O(d_{\text{in}}(m+n)) rational operations per forward pass versus O(d_{\text{in}} d_{\text{out}}(m+n)) for the vanilla KAN.
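This bookkeeping can be captured in a small counting helper (illustrative only; the function name and returned fields are our own):

```python
def kan_costs(d_in, d_out, m=3, n=4, G=8):
    """Compare per-layer rational-parameter and rational-evaluation counts.

    Vanilla KAN: one rational map per edge (d_in * d_out of them).
    GR-KAN: one rational map per group (G of them), evaluated once per channel.
    """
    per_fn = (m + 1) + n  # a_0..a_m numerator plus b_1..b_n denominator coeffs
    vanilla = {"rational_params": d_in * d_out * per_fn,
               "linear_params": d_in * d_out,
               "rational_evals": d_in * d_out}
    grouped = {"rational_params": G * per_fn,
               "linear_params": d_in * d_out,
               "rational_evals": d_in}
    return vanilla, grouped
```

For d_in = d_out = 256 with m = 3, n = 4, G = 8, this gives 524,288 vanilla rational parameters versus 64 grouped ones, while the shared linear weight matrix is unchanged.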

The grouping both regularizes the model and allows for practical deployment of rational-KAN activations in deep architectures such as Transformers, where the full cost of per-edge rational activations would be prohibitive.

3. Equivariance and Symmetry: Group-Equivariant Linear Layers

To encode invariance or equivariance to an arbitrary matrix group G, GR-KANs extend the feed-forward structure by:

  • Using gated rational activation functions that are scalar-valued and hence commute trivially with the group action.
  • Replacing standard linear layers with G-equivariant maps. Let (U_i, \rho_i) and (U_o, \rho_o) be real representation spaces of G. A linear operator W: U_i \rightarrow U_o is G-equivariant if

W \rho_i(g) = \rho_o(g) W, \quad \forall g \in G.

These constraints can be encoded as a homogeneous linear system using the Kronecker product, whose null space yields an orthonormal basis \{Q_{:m}\} of intertwining operators; the most general equivariant W is a linear combination of these.

  • Introducing a lift layer to map raw data (in a trivial representation) into a direct sum of scalar and tensor representations of GG, so that subsequent layers operate in a symmetry-aware space.

The final architecture consists of stacked blocks, each comprising a group-gated rational activation and an equivariant linear transformation, with outputs projected to the desired target representation (such as an invariant scalar or equivariant tensor).
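The null-space construction above can be illustrated on a toy example. The sketch below uses our own conventions (row-major vectorization, SVD-based null space) and a hypothetical helper name; the test case is the two-element channel-swap group acting on R^2 inputs with a trivial output representation:

```python
import numpy as np

def equivariant_basis(gens_in, gens_out, tol=1e-10):
    """Orthonormal basis of {W : W rho_i(g) = rho_o(g) W} for paired generators.

    With row-major vec, vec(A X B) = (A kron B^T) vec(X), so the constraint
    W rho_i - rho_o W = 0 becomes a homogeneous linear system whose null
    space (computed via SVD) spans all intertwining operators.
    """
    d_i, d_o = gens_in[0].shape[0], gens_out[0].shape[0]
    rows = [np.kron(np.eye(d_o), ri.T) - np.kron(ro, np.eye(d_i))
            for ri, ro in zip(gens_in, gens_out)]
    C = np.vstack(rows)
    _, s, Vt = np.linalg.svd(C)
    rank = int(np.sum(s > tol))
    return [v.reshape(d_o, d_i) for v in Vt[rank:]]  # null-space rows as matrices
```

For the swap group the basis is one-dimensional and proportional to (1, 1): the only invariant linear functionals on R^2 are symmetric sums, as expected.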

4. Implementation in Transformer Architectures

GR-KANs are deployed beyond feed-forward settings by integrating into self-attention architectures. In the UKAST design (Sapkota et al., 6 Nov 2025), the core enhancements occur in the Swin Transformer encoder, where the standard two-layer MLP (feed-forward network, FFN) is replaced by a GR-KAN block. The modified block workflow is:

  1. Residual convolution projection (3×3 Conv + BN + ReLU).
  2. Windowed-MSA (multi-head self-attention), layer-norm.
  3. GR-KAN (shared-group rational activation + linear), layer-norm, skip connection.
  4. Shifted-window MSA, layer-norm.
  5. Second GR-KAN, layer-norm, skip connection.
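Schematically, the sub-block ordering above can be written as a composition. The sketch below uses placeholder callables for the attention and GR-KAN maps and omits the residual convolution of step 1; the exact normalization and residual placement in UKAST may differ:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (channel) dimension.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def swin_grkan_block(x, msa, shifted_msa, grkan1, grkan2):
    """Schematic ordering of one modified Swin block: each MSA or GR-KAN
    sub-block is followed by layer-norm and a residual (skip) connection."""
    x = x + layer_norm(msa(x))          # windowed MSA sub-block
    x = x + layer_norm(grkan1(x))       # first GR-KAN replaces the FFN/MLP
    x = x + layer_norm(shifted_msa(x))  # shifted-window MSA sub-block
    x = x + layer_norm(grkan2(x))       # second GR-KAN
    return x
```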

Group count G=8 is typical (higher values yield marginal returns beyond G=4). Polynomial orders in the rational basis are m=3, n=4, with a Safe Padé Activation Unit implementation.

Training employs AdamW optimizer, batch size 24, cosine annealing learning rate schedule, and standard data augmentation (random crop, flips, rotations, Gaussian noise). Inference uses overlapping patch-based windows, making the design scalable for 2D/3D medical image segmentation.

5. Empirical Performance and Data Efficiency

Empirical analyses on scientific tasks with strong symmetry priors and on medical image segmentation corroborate the theoretical advantages of GR-KANs.

  • Particle scattering (O(1,3) Lorentz invariance): Test MSE \approx 3.8 \times 10^{-6} with \sim435K params (EKAN-O(1,3)), outperforming EMLP (test MSE 2.1 \times 10^{-4}, 450K params) by two orders of magnitude on 10^4 samples.
  • Three-body dynamics (O(2) equivariance): Test MSE 4.8 \times 10^{-4} at 11K params, compared to 4.2 \times 10^{-3} for an MLP (100K params) and similar MSE for EMLP with 5\times the parameters.
  • Top-quark tagging (O(1,3) invariance): EKAN-SO^+(1,3) reaches \sim76.9% accuracy at 34K params, vs EMLP’s 77.1% (133K) and MLP’s 69.3% (83K).
  • On four 2D/3D medical image segmentation benchmarks, UKAST with GR-KAN achieves or exceeds state-of-the-art Dice accuracy without increasing parameter count over SwinUNETR.
  • Replacing the MLP with GR-KAN (without the residual-convolution projection): GFLOPs 1.2500 \rightarrow 1.2467 (+0.0008 parameters).
  • Data efficiency: Under 10%/25%/50%/100% of training data, UKAST consistently outperforms SwinUNETR and KAN-based variants (UKAT, U-KAN).
  • Ablation studies: ViT + GR-KAN yields gains of +3.5% (2D) and +5.0% (3D) mean Dice over ViT + MLP; SwinT + GR-KAN yields +2.1%/+0.8% gains.

Collectively, these results demonstrate that GR-KANs enable state-of-the-art performance in both symmetry-aware and data-scarce regimes with minimal computational penalty.

6. Analysis, Limitations, and Prospects

GR-KANs inherit the expressivity of rational-function activations, capable of approximating sharp transitions and mild singularities more efficiently than polynomial or spline basis alone. By tying nonlinear functions within groups, they reduce overfitting risk and enhance data efficiency, especially in scenarios where learning a full matrix of rational functions is unnecessary or impractical.

The computational benefit, an O(1/d_{\text{out}}) reduction in rational-op count relative to vanilla KANs, makes them suitable for large-scale, multi-head, or deep architectures. In practice, group assignments are static (block-wise); a plausible implication is that input-dependent or learned grouping could allocate capacity more efficiently, suggesting a direction for future work.

Current designs typically fix rational orders (m=3, n=4); adaptive per-group order selection could further enrich the representable function classes. Combining group rational bases with attention mechanisms may advance the synergy between grouping and learned gating.

7. Summary and Outlook

Group Rational KANs provide a systematic architecture merging the compositional nonlinearity of rational-based KANs with parameter and computational efficiency via channel grouping, alongside rigorous enforcement of group-theoretic equivariance. Their demonstrated impact on scientific data analysis and medical imaging, resource savings, and formal symmetry guarantees suggest substantial utility for future equivariant models across data modalities and architectures. Extensions to dynamic grouping, adaptive basis order, and hybrid attention are promising vectors for ongoing research.
