Group Rational KANs (GR-KANs)
- Group Rational KANs are neural architectures that tie rational-function activations across input groups, reducing computational overhead while preserving expressivity.
- They integrate equivariant linear layers and gated rational activations to systematically enforce symmetry priors in both feed-forward and Transformer models.
- GR-KANs deliver state-of-the-art performance in symmetry-critical tasks and medical imaging by achieving high data efficiency and parameter economy.
Group Rational Kolmogorov–Arnold Networks (GR-KANs) are a class of neural architectures that combine rational-function-based Kolmogorov–Arnold activation units with groupwise parameter sharing to deliver high expressivity, parameter efficiency, and systematic incorporation of symmetry priors in both conventional and Transformer-based deep learning models. GR-KANs generalize classical KANs by grouping input channels and tying the underlying nonlinear functions across each group, offering computational and statistical benefits relative to vanilla per-edge KAN parameterizations. By incorporating equivariant linear layers, gated rational activations, and explicit group decomposition, they provide a principled mechanism for encoding arbitrary matrix group symmetries within dense neural architectures.
1. Mathematical Foundations: From KANs to Group Rational KANs
Kolmogorov–Arnold Networks (KANs) realize the classical superposition theorem by learning a separate univariate nonlinearity along each input–output "edge" of the feed-forward network and summing the results at each output. In their rational-function instantiation, each univariate function is parameterized as

$$\phi(x) = w \cdot \frac{P(x)}{Q(x)} = w \cdot \frac{a_0 + a_1 x + \cdots + a_m x^m}{1 + \left| b_1 x + \cdots + b_n x^n \right|},$$

where $a_0, \dots, a_m$ and $b_1, \dots, b_n$ are learnable rational coefficients (typically $m = 5$, $n = 4$), and $w$ is a learnable scalar weight. The absolute value in the denominator (the Safe Padé form) keeps $Q(x) \geq 1$ and hence pole-free.
In vanilla KANs, each input-to-output connection (or "edge") maintains its bespoke rational map $\phi_{ij}$, resulting in $d_{\mathrm{in}} d_{\mathrm{out}}$ such maps per layer. GR-KANs introduce a grouping scheme: the input channels are partitioned into $G$ (typically $G \ll d_{\mathrm{in}}$) groups, and the rational basis function is tied within each group. For input $x \in \mathbb{R}^{d_{\mathrm{in}}}$, GR-KAN computes the layer output as

$$y_j = \sum_{i=1}^{d_{\mathrm{in}}} w_{ij}\, \phi_{g(i)}(x_i), \qquad j = 1, \dots, d_{\mathrm{out}},$$

where $g(i)$ indexes the assigned group for channel $i$ and $\phi_{g(i)}$ is the shared rational base for that group.
This mechanism preserves the universal approximation property of rational KANs while substantially reducing the number of unique nonlinear activations and their associated computational overhead. It also serves as an inductive bias whereby channels deemed a priori similar—based on spatial, semantic, or learned criteria—share their nonlinear characteristics.
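The groupwise computation above can be made concrete with a minimal NumPy sketch (hypothetical code, not the reference implementation; `safe_pade` and `grkan_forward` are illustrative names, and groups are assumed to be contiguous channel blocks):

```python
import numpy as np

def safe_pade(x, a, b):
    """Safe Pade rational activation: P(x) / (1 + |b_1 x + ... + b_n x^n|).

    a: (m+1,) numerator coefficients a_0..a_m (ascending powers)
    b: (n,)   denominator coefficients b_1..b_n
    """
    num = np.polyval(a[::-1], x)  # a_0 + a_1 x + ... + a_m x^m
    den = 1.0 + np.abs(np.polyval(np.concatenate(([0.0], b))[::-1], x))
    return num / den

def grkan_forward(x, coeffs_a, coeffs_b, W):
    """One GR-KAN layer: shared rational base per group, then linear mixing.

    x: (d_in,) input; coeffs_a: (G, m+1); coeffs_b: (G, n); W: (d_out, d_in)
    """
    d_in = x.shape[0]
    G = coeffs_a.shape[0]
    size = d_in // G  # contiguous blocks of equal size
    act = np.empty_like(x)
    for g in range(G):
        sl = slice(g * size, (g + 1) * size)
        act[sl] = safe_pade(x[sl], coeffs_a[g], coeffs_b[g])
    return W @ act  # each edge keeps its own scalar weight w_ij
```

With the numerator set to the identity polynomial and zero denominator coefficients, each group's rational base reduces to $\phi(x) = x$, which is a convenient sanity check.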
2. Grouping Strategies and Parameter Efficiency
The grouping mechanism in GR-KANs reduces both parameter count and inference-time floating-point operations. The standard approach partitions the $d_{\mathrm{in}}$ input dimensions into $G$ contiguous groups of size $d_{\mathrm{in}}/G$, assigning channels $1, \dots, d_{\mathrm{in}}/G$ to group 1, the next $d_{\mathrm{in}}/G$ to group 2, and so on. Each group possesses a single set of rational coefficients $\{a_0, \dots, a_m, b_1, \dots, b_n\}$, typically initialized with small Gaussian noise. Each "edge" from input $i$ to output $j$ retains its own scalar weight $w_{ij}$.
The resulting parameter allocation is as follows:
| Component | Vanilla KAN | GR-KAN ($G$ groups) |
|---|---|---|
| Rational params | $d_{\mathrm{in}} d_{\mathrm{out}} (m + n + 1)$ | $G (m + n + 1)$ |
| Linear weights ($w_{ij}$) | $d_{\mathrm{in}} d_{\mathrm{out}}$ | $d_{\mathrm{in}} d_{\mathrm{out}}$ |
| Total | $d_{\mathrm{in}} d_{\mathrm{out}} (m + n + 2)$ | $d_{\mathrm{in}} d_{\mathrm{out}} + G (m + n + 1)$ |
At inference, each group's rational base function is evaluated $d_{\mathrm{in}}/G$ times (once per channel in the group), rather than once per edge, giving $O(d_{\mathrm{in}})$ rational operations per forward pass versus $O(d_{\mathrm{in}} d_{\mathrm{out}})$ for the vanilla KAN.
The grouping both regularizes the model and allows for practical deployment of rational-KAN activations in deep architectures such as Transformers, where the full cost of per-edge rational activations would be prohibitive.
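The parameter accounting in the table can be sketched as a small helper (hypothetical code; `kan_param_counts` is an illustrative name, and the counts follow the per-edge vs. per-group allocation described above):

```python
def kan_param_counts(d_in, d_out, G, m=5, n=4):
    """Per-layer parameter counts for vanilla KAN vs GR-KAN.

    Each rational unit has (m+1) numerator + n denominator coefficients,
    i.e. m+n+1 parameters; every edge additionally carries one scalar w_ij.
    """
    rational = m + n + 1
    vanilla = d_in * d_out * (rational + 1)   # bespoke rational + weight per edge
    grouped = G * rational + d_in * d_out     # G shared rationals + edge weights
    return vanilla, grouped

# Example: a 256 -> 256 layer with 8 groups
v, g = kan_param_counts(256, 256, 8)  # grouped is roughly 11x smaller here
```

For this example the vanilla layer carries 720,896 parameters versus 65,616 for the grouped layer, illustrating why per-edge rational activations are prohibitive at Transformer scale.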
3. Equivariance and Symmetry: Group-Equivariant Linear Layers
To encode invariance or equivariance to arbitrary matrix groups , GR-KANs extend the feed-forward structure by:
- Using gated rational-spline activation functions that are scalar-valued and hence commute trivially with group action.
- Replacing standard linear layers with $G$-equivariant maps. Let $V_{\mathrm{in}}$ and $V_{\mathrm{out}}$ be real representation spaces of $G$ with representations $\rho_{\mathrm{in}}$ and $\rho_{\mathrm{out}}$. A linear operator $W : V_{\mathrm{in}} \to V_{\mathrm{out}}$ is $G$-equivariant if $\rho_{\mathrm{out}}(g)\, W = W\, \rho_{\mathrm{in}}(g)$ for all $g \in G$.
These constraints can be encoded as a homogeneous linear system via the Kronecker-product identity $\mathrm{vec}(\rho_{\mathrm{out}}(g)\, W\, \rho_{\mathrm{in}}(g)^{-1}) = (\rho_{\mathrm{in}}(g)^{-\top} \otimes \rho_{\mathrm{out}}(g))\, \mathrm{vec}(W)$; the null space of the stacked system yields an orthonormal basis of intertwining operators, and the most general equivariant $W$ is a linear combination of these basis elements.
- Introducing a lift layer to map raw data (in a trivial representation) into a direct sum of scalar and tensor representations of , so that subsequent layers operate in a symmetry-aware space.
The final architecture consists of stacked blocks, each comprising a group-gated rational activation and an equivariant linear transformation, with outputs projected to the desired target representation (such as an invariant scalar or equivariant tensor).
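The Kronecker null-space construction for the equivariant linear layer can be sketched as follows (hypothetical NumPy code; `equivariant_basis` is an illustrative name, and for a finite group it suffices to impose the constraint on a set of generators). The example uses the cyclic-shift representation of $C_4$ on $\mathbb{R}^4$, whose equivariant maps are exactly the circulant matrices:

```python
import numpy as np

def equivariant_basis(gens_in, gens_out, tol=1e-10):
    """Basis of linear maps W with rho_out(g) W = W rho_in(g) for all generators.

    Encodes each constraint via vec(rho_out W rho_in^{-1}) =
    (rho_in^{-T} kron rho_out) vec(W), stacks them, and takes the null space.
    """
    d_in = gens_in[0].shape[0]
    d_out = gens_out[0].shape[0]
    I = np.eye(d_in * d_out)
    rows = []
    for gi, go in zip(gens_in, gens_out):
        rows.append(np.kron(np.linalg.inv(gi).T, go) - I)
    M = np.vstack(rows)
    _, s, Vt = np.linalg.svd(M)
    null = Vt[np.sum(s > tol):]  # orthonormal rows spanning the null space
    # un-vec (column-major) each null vector back into a (d_out, d_in) matrix
    return [v.reshape(d_in, d_out).T for v in null]

# Cyclic shift on R^4: the 4-dimensional space of circulant matrices results
S = np.roll(np.eye(4), 1, axis=0)
basis = equivariant_basis([S], [S])
```

Each returned basis element commutes with the generator, so any linear combination is an admissible equivariant layer weight.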
4. Implementation in Transformer Architectures
GR-KANs are deployed beyond feed-forward settings by integrating into self-attention architectures. In the UKAST design (Sapkota et al., 6 Nov 2025), the core enhancements occur in the Swin Transformer encoder, where the standard two-layer MLP (feed-forward network, FFN) is replaced by a GR-KAN block. The modified block workflow is:
- Residual convolution projection (3×3 Conv + BN + ReLU).
- Windowed-MSA (multi-head self-attention), layer-norm.
- GR-KAN (shared-group rational activation + linear), layer-norm, skip connection.
- Shifted-window MSA, layer-norm.
- Second GR-KAN, layer-norm, skip connection.
A small group count $G$ is typical, as higher values yield only marginal returns. Polynomial orders in the rational basis are $m = 5$ and $n = 4$, implemented as a Safe Padé Activation Unit.
Training employs AdamW optimizer, batch size 24, cosine annealing learning rate schedule, and standard data augmentation (random crop, flips, rotations, Gaussian noise). Inference uses overlapping patch-based windows, making the design scalable for 2D/3D medical image segmentation.
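The overlapping patch-based inference can be sketched as a simple averaging scheme (hypothetical code; `sliding_window_predict` is an illustrative name, shown single-channel 2D for brevity, whereas real pipelines run batched 2D/3D model calls):

```python
import numpy as np

def sliding_window_predict(image, model, window, stride):
    """Average per-pixel model outputs over overlapping 2D windows.

    image:  (H, W) array with H, W >= window
    model:  callable mapping a (window, window) patch to per-pixel scores
    """
    H, W = image.shape
    out = np.zeros((H, W))
    counts = np.zeros((H, W))
    ys = list(range(0, H - window + 1, stride))
    xs = list(range(0, W - window + 1, stride))
    # ensure the bottom/right borders are always covered
    if ys[-1] != H - window:
        ys.append(H - window)
    if xs[-1] != W - window:
        xs.append(W - window)
    for y in ys:
        for x in xs:
            out[y:y + window, x:x + window] += model(image[y:y + window, x:x + window])
            counts[y:y + window, x:x + window] += 1
    return out / counts  # each pixel averaged over the windows covering it
```

Averaging overlapping predictions smooths seam artifacts at window boundaries, which is the usual motivation for overlap in segmentation inference.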
5. Empirical Performance and Data Efficiency
Empirical analyses on scientific tasks with strong symmetry priors and on medical image segmentation corroborate the theoretical advantages of GR-KANs.
Symmetry-critical scientific domains (Hu et al., 2024):
- Particle scattering (Lorentz invariance): EKAN attains a test MSE roughly two orders of magnitude below EMLP at comparable model size (435K vs 450K parameters).
- Three-body dynamics (equivariant prediction): EKAN achieves lower test MSE at 11K parameters than an MLP with 100K parameters, and matches EMLP models with roughly 5× as many parameters.
- Top-quark tagging (invariant classification): EKAN reaches 76.9% accuracy at 34K parameters, vs. EMLP's 77.1% (133K) and the MLP's 69.3% (83K).
Swin Transformer integration (Sapkota et al., 6 Nov 2025):
- On four 2D/3D medical image segmentation benchmarks, UKAST with GR-KAN achieves or exceeds state-of-the-art Dice accuracy without increasing parameter count over SwinUNETR.
- Replacing the MLP with GR-KAN (without the residual convolution) does not increase GFLOPs or parameter count over the baseline.
- Data efficiency: Under 10%/25%/50%/100% of training data, UKAST consistently outperforms SwinUNETR and KAN-based variants (UKAT, U-KAN).
- Ablation studies: ViT + GR-KAN improves mean Dice over ViT + MLP on both 2D and 3D benchmarks, and SwinT + GR-KAN shows corresponding gains.
Collectively, these results demonstrate that GR-KANs enable state-of-the-art performance in both symmetry-aware and data-scarce regimes with minimal computational penalty.
6. Analysis, Limitations, and Prospects
GR-KANs inherit the expressivity of rational-function activations, capable of approximating sharp transitions and mild singularities more efficiently than polynomial or spline basis alone. By tying nonlinear functions within groups, they reduce overfitting risk and enhance data efficiency, especially in scenarios where learning a full matrix of rational functions is unnecessary or impractical.
The computational benefits (a factor-$d_{\mathrm{out}}$ reduction in rational-function evaluations compared to vanilla KANs) make them suitable for large-scale, multi-head, or deep architectures. In practice, group assignments are static (block-wise); a plausible implication is that input-dependent or learned grouping could allocate capacity more efficiently, suggesting a direction for future work.
Current designs typically fix rational orders (, ); adaptive per-group order selection could further enrich the representable function classes. Combining group rational bases with attention mechanisms may advance the synergy between grouping and learned gating.
7. Summary and Outlook
Group Rational KANs provide a systematic architecture merging the compositional nonlinearity of rational-based KANs with parameter and computational efficiency via channel grouping, alongside rigorous enforcement of group-theoretic equivariance. Their demonstrated impact on scientific data analysis and medical imaging, resource savings, and formal symmetry guarantees suggest substantial utility for future equivariant models across data modalities and architectures. Extensions to dynamic grouping, adaptive basis order, and hybrid attention are promising vectors for ongoing research.