DecoKAN: Interpretable Crypto Forecasting
- DecoKAN is an interpretable multivariate time series forecasting framework designed for cryptocurrency markets, integrating frequency decoupling and nonlinear modeling.
- The framework employs multi-level discrete wavelet transforms for precise frequency separation and Kolmogorov-Arnold Network mixers for hierarchical spline-based function approximation.
- Its symbolic analysis pipeline prunes model complexity and extracts explicit formulas, achieving state-of-the-art accuracy and transparent, human-readable output.
DecoKAN is an interpretable multivariate time series forecasting framework tailored for the unique dynamical properties of cryptocurrency markets. The methodology leverages a multi-level Discrete Wavelet Transform (DWT) for nonparametric frequency decoupling and Kolmogorov-Arnold Network (KAN) mixers for hierarchical, spline-based nonlinear function approximation. The architecture is augmented by a symbolic analysis pipeline for output sparsification, pruning, and analytical formula extraction, producing transparent, human-readable representations of learned patterns. DecoKAN demonstrably closes the gap between predictive performance and interpretability, achieving state-of-the-art accuracy (lowest MSE across BTC, ETH, XMR) while generating explicit symbolic forecasts, thereby supporting trustworthy decision-making in volatile financial contexts (Gao et al., 23 Dec 2025).
1. Multi-Level Discrete Wavelet Transform: Mathematical Formulation and Frequency Separation
DecoKAN begins with a multi-level DWT decomposition of the input time series. Let $X \in \mathbb{R}^{L \times C}$ denote the normalized multivariate input over a look-back window of length $L$. A compactly supported orthogonal wavelet (Daubechies 4) is selected; the mother scaling function $\phi$ and corresponding wavelet $\psi$ satisfy the two-scale equations
$$\phi(t) = \sqrt{2}\sum_k h_k\,\phi(2t-k), \qquad \psi(t) = \sqrt{2}\sum_k g_k\,\phi(2t-k),$$
where $h_k$ and $g_k$ are the db4-specific low-pass and high-pass filter coefficients.
One DWT level on a 1D series $x[n]$ yields approximation coefficients $a_1[n] = \sum_k h_k\,x[2n-k]$ and detail coefficients $d_1[n] = \sum_k g_k\,x[2n-k]$. A $J$-level DWT recursively applies this procedure to the approximations, yielding the coefficient set $\{a_J, d_J, \dots, d_1\}$, as described in Eq. (3), with the level-$j$ coefficient series of length $L/2^j$ per channel. The inverse DWT reconstructs predictions via upsampling and convolution with the transposed filters, $\hat{X} = \mathrm{IDWT}(\hat{a}_J, \hat{d}_J, \dots, \hat{d}_1)$ (cf. Eq. (14)).
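The decomposition and perfect-reconstruction property above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: for brevity it uses the two-tap Haar filters in place of db4, and the function names are our own.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def dwt_level(x):
    """One DWT level with the Haar filters (the paper uses db4);
    returns (approximation, detail) coefficients at half length."""
    pairs = x.reshape(-1, 2)
    a = (pairs[:, 0] + pairs[:, 1]) / SQRT2   # low-pass
    d = (pairs[:, 0] - pairs[:, 1]) / SQRT2   # high-pass
    return a, d

def dwt(x, levels):
    """Multi-level DWT: recursively re-decompose the approximation."""
    details = []
    a = x
    for _ in range(levels):
        a, d = dwt_level(a)
        details.append(d)
    return a, details  # a_J and [d_1, ..., d_J]

def idwt(a, details):
    """Inverse DWT: upsample and recombine, coarsest level first."""
    for d in reversed(details):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / SQRT2
        out[1::2] = (a - d) / SQRT2
        a = out
    return a

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.1 * rng.standard_normal(64)
aJ, ds = dwt(x, levels=3)          # lengths: aJ 8, details 32/16/8
x_rec = idwt(aJ, ds)
assert np.allclose(x, x_rec)       # perfect reconstruction
```

Orthogonal filter banks make the inverse exact, which is what lets DecoKAN forecast in coefficient space and map back to the time domain losslessly.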
Frequency separation is structurally enforced: the $J$-th approximation $a_J$ models slow-varying, low-frequency socio-economic trends, while the details $d_1, \dots, d_J$ capture high-frequency market oscillations (speculative volatility) at progressively coarser scales. Figure 1 illustrates this orthogonal decomposition.
2. Kolmogorov-Arnold Network Mixers: Spline-Based Modeling and Hierarchical Resolution
Each decomposed coefficient series is routed to a dedicated "Resolution Branch" comprising KAN Mixer blocks. The KANLinear module employs learnable univariate activations of the form
$$\phi(x) = w_b\,b(x) + w_s \sum_i c_i\,B_i(x),$$
as per Eq. (11); $b(x)$ is a fixed base function (SiLU), the $B_i$ are B-spline basis functions of order $k$ across a grid of size $G$, and $w_b$, $w_s$, and the $c_i$ are optimized parameters. This parametrization is a Kolmogorov-Arnold universal approximator with intrinsic interpretability due to its explicit spline expansion.
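A minimal numpy sketch of such a spline activation follows, assuming a uniform knot vector extended by $k$ cells on each side (the standard KAN convention, which gives $G + k$ basis functions); the function names and grid values are illustrative, not the paper's code.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def bspline_basis(x, knots, k):
    """Cox-de Boor recursion: evaluate all order-k B-spline basis
    functions on the knot vector at scalar x."""
    B = np.array([1.0 if knots[i] <= x < knots[i + 1] else 0.0
                  for i in range(len(knots) - 1)])
    for d in range(1, k + 1):
        nxt = np.zeros(len(knots) - 1 - d)
        for i in range(len(nxt)):
            left = right = 0.0
            if knots[i + d] > knots[i]:
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * B[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = ((knots[i + d + 1] - x)
                         / (knots[i + d + 1] - knots[i + 1]) * B[i + 1])
            nxt[i] = left + right
        B = nxt
    return B

def kan_activation(x, w_base, w_spline, coeffs, knots, k=3):
    """phi(x) = w_b * SiLU(x) + w_s * sum_i c_i B_i(x)."""
    return w_base * silu(x) + w_spline * (coeffs @ bspline_basis(x, knots, k))

# Grid of G=8 intervals on [-2, 2], cubic splines (k=3) -> G + k = 11 bases.
k = 3
knots = np.linspace(-2 - k * 0.5, 2 + k * 0.5, 8 + 2 * k + 1)
rng = np.random.default_rng(0)
coeffs = rng.standard_normal(len(knots) - k - 1)
y = kan_activation(0.3, w_base=1.0, w_spline=0.5, coeffs=coeffs, knots=knots, k=k)
```

Because each edge's function is an explicit spline expansion, its shape can later be read off, pruned, or fit by a symbolic formula.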
Each Resolution Branch embeds its input with overlapping patches and applies two KAN Mixer blocks (Eqs. (4)-(6)). A standard block consists of:
- a temporal KAN mixing along the patch axis,
- a feature KAN mixing along the embedding axis, each with LayerNorm and residual connections (Eqs. (7)-(10)).

Crucially, per-branch modeling circumvents cross-frequency attention, thereby eliminating spectral entanglement and facilitating transparent decomposition.
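The two-axis mixing pattern can be sketched at the shape level as follows; plain linear maps stand in for the KANLinear layers, and the dimensions are illustrative rather than the paper's settings.

```python
import numpy as np

def layernorm(z, eps=1e-5):
    """Normalize over the last (embedding) axis."""
    mu = z.mean(-1, keepdims=True)
    var = z.var(-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

def mixer_block(z, W_time, W_feat):
    """One Mixer block: patch-axis (temporal) mixing, then
    embedding-axis (feature) mixing, each with LayerNorm + residual.
    Linear maps stand in for KANLinear here."""
    z = z + W_time @ layernorm(z)        # temporal: W_time is (P, P)
    z = z + layernorm(z) @ W_feat.T      # feature:  W_feat is (D, D)
    return z

P, D = 16, 32                            # patches x embedding dim (illustrative)
rng = np.random.default_rng(0)
z = rng.standard_normal((P, D))
out = mixer_block(z,
                  0.1 * rng.standard_normal((P, P)),
                  0.1 * rng.standard_normal((D, D)))
assert out.shape == (P, D)
```

Because each frequency band gets its own stack of such blocks, no operation ever mixes coefficients across bands.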
3. Symbolic Analysis Pipeline: Sparsification, Pruning, Symbolization
Symbolic interpretability is realized during and after training. The regularization loss (Eq. (12)) enforces coefficient sparsity and entropy minimization:
$$\mathcal{L}_{\mathrm{reg}} = \mu_1 \sum_{l} \lVert \Phi_l \rVert_1 + \mu_2 \sum_{l} S(\Phi_l),$$
where $\lVert \Phi_l \rVert_1$ is the mean $L_1$ magnitude of layer $l$'s spline activations and $S(\Phi_l) = -\sum_i p_i \log p_i$ with $p_i = \lVert \phi_i \rVert_1 / \lVert \Phi_l \rVert_1$. This bias towards low-entropy spline weights results in concise activations.
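A sketch of this KAN-style regularizer, assuming per-edge mean-absolute-activation scores as input (the weights `lam_l1`, `lam_ent` are illustrative):

```python
import numpy as np

def kan_regularizer(act_l1, lam_l1=1.0, lam_ent=1.0):
    """Sparsity + entropy regularizer over per-edge mean |activation|
    magnitudes act_l1 (one score per spline edge):
        L_reg = lam_l1 * sum_i a_i + lam_ent * H(p),  p_i = a_i / sum_j a_j.
    Low entropy concentrates mass on few edges -> concise activations."""
    total = act_l1.sum()
    p = act_l1 / total
    entropy = -(p * np.log(p + 1e-12)).sum()
    return lam_l1 * total + lam_ent * entropy

a = np.array([2.0, 0.01, 0.01, 0.01])  # one dominant edge
b = np.full(4, a.sum() / 4)            # same L1 mass, spread evenly
assert kan_regularizer(a) < kan_regularizer(b)  # concentration is cheaper
```

With equal $L_1$ mass, the entropy term alone decides: the concentrated layout `a` pays roughly 0.09 nats versus $\log 4 \approx 1.39$ for the uniform layout `b`, which is exactly the pressure that leaves few, strong edges for the pruning phase.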
Edge pruning (Algorithm 1, Phase 2) removes connections whose activation $L_1$-norm falls below a threshold $\theta$. Surviving edges are symbolized via regression over a function library (polynomial, sinusoidal, and other elementary forms), producing closed-form expressions. Typical examples include quartic polynomials and sinusoidal patterns, facilitating explicit explanation of high-frequency phenomena.
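The prune-then-symbolize step can be illustrated as below; the threshold value and the single quartic fit are stand-ins for the full library regression, and all names are ours.

```python
import numpy as np

def prune_edges(edge_acts, theta=0.05):
    """Keep edges whose mean |activation| over samples meets the
    threshold theta; also report the pruned fraction."""
    l1 = np.abs(edge_acts).mean(axis=1)   # one L1 score per edge
    keep = l1 >= theta
    return keep, 1.0 - keep.mean()

def symbolize_quartic(x, y):
    """Fit a degree-4 polynomial to a surviving edge's activation,
    a stand-in for regression over the full symbolic library."""
    return np.polyfit(x, y, deg=4)        # highest-degree coeff first

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
strong = 0.5 * x**4 - x**2 + 0.3                 # edge with real structure
weak = 1e-3 * rng.standard_normal((3, 200))      # three near-dead edges
acts = np.vstack([strong, weak])

keep, frac_pruned = prune_edges(acts)            # keeps only the strong edge
coefs = symbolize_quartic(x, acts[0])            # recovers [0.5, 0, -1, 0, 0.3]
```

The recovered coefficient vector is exactly the kind of human-readable artifact DecoKAN reports per surviving edge.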
The pruning statistics reveal structural separation: for ETH, the approximation branch pruned just 4.79% of edges, versus 76.28% in the detail branch (Table IV).
4. Model Architecture and Training Procedure
The overall computational flow (cf. Fig. 2) is:
- RevIN normalization of the input
- Multi-level DWT
- Parallel KAN Resolution Branches (two Mixer blocks/branch)
- Head linear layers produce predicted coefficients
- IDWT and RevIN denormalization yield the final forecast
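The five steps above can be tied together in a shape-level sketch; this is an illustration under simplifying assumptions (single-level Haar DWT instead of multi-level db4, plain linear heads instead of KAN branches, RevIN without its learned affine terms).

```python
import numpy as np

S2 = np.sqrt(2.0)

def haar_dwt(x):                  # one level; Haar stands in for db4
    p = x.reshape(-1, 2)
    return (p[:, 0] + p[:, 1]) / S2, (p[:, 0] - p[:, 1]) / S2

def haar_idwt(a, d):
    out = np.empty(2 * len(a))
    out[0::2], out[1::2] = (a + d) / S2, (a - d) / S2
    return out

def forecast(x, W_a, W_d):
    """RevIN norm -> DWT -> per-branch heads -> IDWT -> denorm."""
    mu, sd = x.mean(), x.std() + 1e-8
    z = (x - mu) / sd                 # RevIN normalization
    a, d = haar_dwt(z)                # frequency decoupling
    a_hat, d_hat = W_a @ a, W_d @ d   # branch heads predict coefficients
    y = haar_idwt(a_hat, d_hat)       # reconstruct in the time domain
    return y * sd + mu                # RevIN denormalization

L, H = 32, 16                         # look-back and horizon (illustrative)
rng = np.random.default_rng(0)
x = rng.standard_normal(L)
W_a = 0.1 * rng.standard_normal((H // 2, L // 2))
W_d = 0.1 * rng.standard_normal((H // 2, L // 2))
y = forecast(x, W_a, W_d)
assert y.shape == (H,)
```

Note that the heads operate entirely in coefficient space: each branch maps $L/2$ input coefficients to $H/2$ predicted coefficients, and the IDWT returns a length-$H$ forecast.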
Loss minimization employs the composite objective
$$\mathcal{L} = \mathcal{L}_{\mathrm{MSE}} + \mu\,\mathcal{L}_{\mathrm{reg}},$$
with the regularization weight $\mu$ set as in Table II. Optimization uses Adam over a hyperparameter grid of learning rates, with 30 training epochs (see Table II).
Hyperparameters (see Table II) include the look-back length, the number of DWT levels, the KAN spline grid size and order, the patch size, and the embedding dimension, yielding 0.11–0.18M parameters and 0.0073 GFLOPs per forecast. Training takes 12.6 s/epoch and inference ≈5 s; B-spline operations induce training overhead, while inference remains real-time capable.
5. Experimental Evaluation: Quantitative Results, Ablations
Extensive benchmarking on BTC, ETH, and XMR datasets across multiple forecasting horizons demonstrates DecoKAN's superior mean squared error (MSE) and mean absolute error (MAE) metrics (Table III). Representative average MSE results:
| Dataset | DecoKAN MSE | WPMixer MSE | TimeFilter MSE |
|---|---|---|---|
| BTC | 0.136 | 0.160 | 0.146 |
| ETH | 0.100 | 0.118 | 0.160 |
| XMR | 0.219 | 0.235 | 0.233 |
DecoKAN ranks first (lowest error) in 27/32 crypto cases ("1stCount"); paired t-tests yield statistical significance versus WPMixer and TimeFilter. A per-branch error ablation (not tabulated) indicates the Detail Branch accounts for the dominant share of forecasting variance during volatility spikes.
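The paired t-test used for these significance claims reduces to a one-line statistic on per-window error differences; the data below is synthetic, purely to show the mechanics.

```python
import numpy as np

def paired_t(errors_a, errors_b):
    """Paired t statistic on per-window error differences:
        t = mean(d) / (std(d, ddof=1) / sqrt(n)),  d = errors_a - errors_b.
    A strongly negative t means model A's errors are systematically
    lower than model B's on the same windows."""
    d = np.asarray(errors_a) - np.asarray(errors_b)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

rng = np.random.default_rng(0)
mse_a = rng.uniform(0.1, 0.3, size=100)                  # synthetic per-window MSEs
mse_b = mse_a + 0.02 + 0.005 * rng.standard_normal(100)  # B consistently worse
t = paired_t(mse_a, mse_b)
assert t < 0   # A beats B on paired windows
```

Pairing by window removes the shared market-regime variance, which is why even a small consistent gap yields a large-magnitude t statistic.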
6. Interpretability: Symbolic Case Studies and Market Mapping
A case study on ETH (Fig. 5) traces the lifecycle from learned splines to pruned symbolic formulas, with high-frequency detail branches governing rapid price swings and approximation branches preserving global trend structure.
Symbolic regression outputs exhibit financial domain relevance: quartic polynomials encode momentum decay, sinusoids reproduce weekly/daily cycles, and a saturating form captures rally saturation effects (Table V). The resulting formula set forms an auditable analytic toolkit for market practitioners.
This suggests that DecoKAN's separation and symbolization pipeline materially assists in post-hoc logic auditing, bridging the interpretability-performance gap typical of conventional black-box deep learning time series models. Its utility for risk management and systematic trading is substantiated by explicit mapping from learned formulas to identifiable market patterns.
DecoKAN synthesizes multi-level DWT-based frequency decoupling and interpretable KAN-based nonlinear modeling, augmented by rigorous symbolic regression, to enable not only predictive state-of-the-art performance on challenging cryptocurrency data but also unprecedented transparency in the model's internal logic and decision pathways (Gao et al., 23 Dec 2025).