DecoKAN: Interpretable Crypto Forecasting

Updated 30 December 2025
  • DecoKAN is an interpretable multivariate time series forecasting framework designed for cryptocurrency markets, integrating frequency decoupling and nonlinear modeling.
  • The framework employs multi-level discrete wavelet transforms for precise frequency separation and Kolmogorov-Arnold Network mixers for hierarchical spline-based function approximation.
  • Its symbolic analysis pipeline prunes model complexity and extracts explicit formulas, achieving state-of-the-art accuracy and transparent, human-readable output.

DecoKAN is an interpretable multivariate time series forecasting framework tailored for the unique dynamical properties of cryptocurrency markets. The methodology leverages a multi-level Discrete Wavelet Transform (DWT) for nonparametric frequency decoupling and Kolmogorov-Arnold Network (KAN) mixers for hierarchical, spline-based nonlinear function approximation. The architecture is augmented by a symbolic analysis pipeline for output sparsification, pruning, and analytical formula extraction, producing transparent, human-readable representations of learned patterns. DecoKAN demonstrably closes the gap between predictive performance and interpretability, achieving state-of-the-art accuracy (lowest MSE across BTC, ETH, XMR) while generating explicit symbolic forecasts, thereby supporting trustworthy decision-making in volatile financial contexts (Gao et al., 23 Dec 2025).

1. Multi-Level Discrete Wavelet Transform: Mathematical Formulation and Frequency Separation

DecoKAN begins with a multi-level DWT decomposition of the input time series. Let $X_L \in \mathbb{R}^{C \times L}$ denote the normalized multivariate input over a look-back window $L$. A compactly supported orthogonal wavelet (Daubechies 4) is selected; the scaling function $\phi(t)$ and corresponding wavelet $\psi(t)$ satisfy the two-scale equations

$$\phi(t) = \sqrt{2}\sum_{n} h[n]\,\phi(2t - n), \qquad \psi(t) = \sqrt{2}\sum_{n} g[n]\,\phi(2t - n),$$

where $h[n]$ and $g[n]$ are the db4-specific low-pass and high-pass filter coefficients.

One DWT level on a 1D series $x$ yields approximation coefficients $a_k = \sum_n h[n]\, x[2k - n]$ and detail coefficients $d_k = \sum_n g[n]\, x[2k - n]$. A multi-level ($m$) DWT recursively applies this procedure to the approximations, yielding $(X_{A_m}, X_{D_m}, \dots, X_{D_1}) = \mathrm{DWT}(X_L, \psi, m)$, as described in Eq. (3), with each coefficient series of dimension $C \times L_i$. The inverse DWT reconstructs predictions via upsampling and convolution with the transposed filters: $Y = \mathrm{IDWT}(Y_{A_m}, Y_{D_m}, \dots, Y_{D_1})$ (cf. Eq. (14)).
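As a minimal sketch of this decomposition/reconstruction round trip, the recursion can be written in plain Python using the Haar wavelet in place of db4 (shorter filters and no boundary handling; the multi-level recursion is identical). Function names and the simplified index convention are illustrative, not taken from the paper's code:

```python
# Multi-level DWT/IDWT round trip, Haar wavelet for brevity (not db4).
import math

SQRT2 = math.sqrt(2.0)

def dwt_level(x):
    """One DWT level: approximation a_k and detail d_k (Haar filters)."""
    a = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(len(x) // 2)]
    d = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(len(x) // 2)]
    return a, d

def idwt_level(a, d):
    """Inverse of one Haar DWT level (upsample + synthesis filters)."""
    x = []
    for ak, dk in zip(a, d):
        x.append((ak + dk) / SQRT2)
        x.append((ak - dk) / SQRT2)
    return x

def dwt(x, m):
    """m-level DWT: returns (A_m, D_m, ..., D_1), mirroring Eq. (3)."""
    details = []
    a = list(x)
    for _ in range(m):
        a, d = dwt_level(a)
        details.append(d)
    return (a, *reversed(details))

def idwt(coeffs):
    """Reconstruct the series from (A_m, D_m, ..., D_1), cf. Eq. (14)."""
    a, *details = coeffs
    for d in details:          # coarsest detail first
        a = idwt_level(a, d)
    return a

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
coeffs = dwt(x, m=2)           # (A_2, D_2, D_1)
x_rec = idwt(coeffs)
print(max(abs(u - v) for u, v in zip(x, x_rec)))  # ~0: perfect reconstruction
```

The orthogonality of the filter bank is what makes the round trip lossless, which is why DecoKAN can forecast in coefficient space and invert back without structural error.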

Frequency separation is structurally enforced: the $m$-th approximation $X_{A_m}$ models slow-varying, low-frequency socio-economic trends, while the details $X_{D_1}, \dots, X_{D_m}$ capture high-frequency market oscillations (speculative volatility) at progressively coarser scales. Figure 1 illustrates this orthogonal decomposition.

2. Kolmogorov-Arnold Network Mixers: Spline-Based Modeling and Hierarchical Resolution

Each decomposed coefficient series $X_{w_i}$ is routed to a dedicated "Resolution Branch" comprising KAN Mixer blocks. The KANLinear module employs learnable univariate activations

$$\phi(x) = w_b\, b(x) + w_s \sum_{j=0}^{G+k-1} c_j B_j(x)$$

as per Eq. (11); $b(x)$ is a fixed base function (SiLU), $B_j(x)$ are B-spline basis functions of order $k$ over a grid of size $G$, and $w_b$, $w_s$, $c_j$ are optimized parameters. This parametrization is a Kolmogorov-Arnold universal approximator with intrinsic interpretability due to its explicit spline expansion.
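A minimal sketch of one such learnable activation, evaluating the B-spline bases with the standard Cox-de Boor recursion; the grid, weights, and function names are illustrative placeholders, not the trained model's values:

```python
# phi(x) = w_b * SiLU(x) + w_s * sum_j c_j * B_j(x), cf. Eq. (11).
import math

def silu(x):
    return x / (1.0 + math.exp(-x))

def bspline_basis(j, k, t, x):
    """B-spline basis B_{j,k}(x) on knot vector t (Cox-de Boor recursion)."""
    if k == 0:
        return 1.0 if t[j] <= x < t[j + 1] else 0.0
    left = 0.0 if t[j + k] == t[j] else \
        (x - t[j]) / (t[j + k] - t[j]) * bspline_basis(j, k - 1, t, x)
    right = 0.0 if t[j + k + 1] == t[j + 1] else \
        (t[j + k + 1] - x) / (t[j + k + 1] - t[j + 1]) * bspline_basis(j + 1, k - 1, t, x)
    return left + right

def kan_activation(x, w_b, w_s, c, knots, k=3):
    """One learnable KAN edge: fixed SiLU base plus weighted spline expansion."""
    spline = sum(cj * bspline_basis(j, k, knots, x) for j, cj in enumerate(c))
    return w_b * silu(x) + w_s * spline

# Grid of size G=5 on [-1, 1] with order k=3 -> G+k = 8 basis functions,
# requiring G + 2k + 1 = 12 knots (uniform, extended past the interval).
G, k = 5, 3
knots = [-1 + (2 / G) * (i - k) for i in range(G + 2 * k + 1)]
c = [0.1 * j for j in range(G + k)]          # illustrative spline weights
print(kan_activation(0.3, w_b=1.0, w_s=0.5, c=c, knots=knots, k=k))
```

Because the spline weights $c_j$ enter linearly, each edge's learned function can later be read off and regression-fitted directly, which is what the symbolization stage in Section 3 exploits.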

Each Resolution Branch embeds its input with overlapping patches and applies two KAN Mixer blocks (Eqs. (4)-(6)). A standard block consists of:

  1. a temporal KAN along the patch axis ($N_i$), and
  2. a feature KAN along the embedding axis ($d$),

each with LayerNorm and residual connections (Eqs. (7)-(10)). Crucially, per-branch modeling circumvents cross-frequency attention, thereby eliminating spectral entanglement and facilitating transparent decomposition.
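The two mixing steps can be traced at the shape level with a small sketch; `kan_stub` is a stand-in (a plain tanh map) for the actual KANLinear layers, so only the axis handling, normalization, and residuals are shown:

```python
# Shape-level sketch of one KAN Mixer block: temporal mixing along the patch
# axis N_i, then feature mixing along the embedding axis d, with LayerNorm
# and residual connections. kan_stub is a placeholder, not a real KANLinear.
import math

def layer_norm(v, eps=1e-5):
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]

def kan_stub(v):
    # Placeholder for a KANLinear map over vector v (spline machinery omitted).
    return [math.tanh(x) for x in v]

def transpose(m):
    return [list(row) for row in zip(*m)]

def mixer_block(tokens):
    """tokens: N_i x d matrix (patches x embedding channels)."""
    # 1) temporal KAN: mix across the N_i patches, per embedding channel
    t = transpose(tokens)                         # d x N_i
    t = [kan_stub(layer_norm(col)) for col in t]
    tokens = [[a + b for a, b in zip(r1, r2)]     # residual connection
              for r1, r2 in zip(tokens, transpose(t))]
    # 2) feature KAN: mix across the d channels, per patch
    f = [kan_stub(layer_norm(row)) for row in tokens]
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(tokens, f)]

out = mixer_block([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])   # N_i=2, d=3
print(len(out), len(out[0]))                             # 2 3
```

Since each branch only ever sees one frequency band, the transposition pattern above never mixes coefficients across bands, which is the structural source of the "no spectral entanglement" property.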

3. Symbolic Analysis Pipeline: Sparsification, Pruning, Symbolization

Symbolic interpretability is realized during and after training. A regularization loss (Eq. (12)) enforces coefficient sparsity and entropy minimization:

$$L_{\mathrm{reg}} = \lambda_1 \sum_{\text{edges}} \lVert \phi \rVert_1 + \lambda_2 \sum_{\text{edges}} S(\phi), \qquad S(\phi) = -\sum_j p_j \log p_j, \quad p_j = \frac{|c_j|}{\sum_k |c_k|}.$$

This bias toward low-entropy spline weights results in concise activations.
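A sketch of this per-edge regularizer; the $\lambda$ values and the example weight vectors are illustrative, chosen only to show that the entropy term prefers a single dominant spline weight over an evenly spread one:

```python
# Per-edge sparsity/entropy regularizer in the spirit of Eq. (12).
import math

def edge_regularizer(c, lam1=1.0, lam2=1.0, eps=1e-12):
    """L1 norm of spline weights plus entropy of their normalized magnitudes."""
    l1 = sum(abs(cj) for cj in c)
    p = [abs(cj) / (l1 + eps) for cj in c]
    entropy = -sum(pj * math.log(pj + eps) for pj in p)
    return lam1 * l1 + lam2 * entropy

sparse = [0.0, 0.0, 1.0, 0.0]      # one dominant weight  -> low entropy
dense  = [0.25, 0.25, 0.25, 0.25]  # evenly spread weights -> high entropy
# Same L1 mass, so only the entropy term separates them:
print(edge_regularizer(sparse) < edge_regularizer(dense))  # True
```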

Edge pruning (Algorithm 1, Phase 2) removes connections whose activation $L_2$-norm falls below a threshold $\tau$. Surviving edges are symbolized via regression over a function library (polynomial, sinusoidal, and $\tanh$ forms), selecting the closed form $f(x) \approx \phi(x)$ with maximal $R^2$. Typical examples include quartic polynomials (e.g., $f(x) = -0.076x^4 + 0.216x^3 + 0.252x^2 - 1.370x - 0.116$, $R^2 = 0.9968$) and sinusoidal patterns (e.g., $0.923\sin(1.348x + 0.695) + 0.801\cos(2.624x)$), facilitating explicit explanation of high-frequency phenomena.
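The polynomial branch of this symbolization step can be illustrated with a least-squares fit and an $R^2$ score; the quartic from the example above is used here as a synthetic stand-in for a learned spline $\phi(x)$, and the fitting code is a generic sketch, not the paper's pipeline:

```python
# Fit a degree-4 polynomial to samples of a target function and score R^2.
import math

def polyfit(xs, ys, deg):
    """Least-squares polynomial fit via normal equations + Gaussian elimination."""
    n = deg + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):                        # elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):              # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef                                  # [c0, c1, ..., c_deg]

def r_squared(xs, ys, coef):
    pred = [sum(c * x ** i for i, c in enumerate(coef)) for x in xs]
    mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic target: the quartic reported in the text, sampled on [-2, 2].
phi = lambda x: -0.076*x**4 + 0.216*x**3 + 0.252*x**2 - 1.370*x - 0.116
xs = [i / 10 - 2.0 for i in range(41)]
ys = [phi(x) for x in xs]
coef = polyfit(xs, ys, 4)
print(round(r_squared(xs, ys, coef), 4))        # 1.0 for a noiseless quartic
```

In the actual pipeline the candidate library also contains sinusoidal and $\tanh$ forms, and the form with the highest $R^2$ against the spline's samples is the one reported.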

The pruning statistics reveal structural separation: for ETH, the approximation branch pruned just 4.79% of edges, versus 76.28% in the detail branch (Table IV).

4. Model Architecture and Training Procedure

The overall computational flow (cf. Fig. 2) is:

  • RevIN normalization of $X_L$
  • Multi-level DWT ($\rightarrow \{X_{A_m}, X_{D_m}, \dots, X_{D_1}\}$)
  • Parallel KAN Resolution Branches (two Mixer blocks per branch)
  • Linear head layers produce the predicted coefficients
  • IDWT and RevIN denormalization yield the final forecast $X_T$
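The flow above can be traced end to end with a stub pipeline (assumptions: Haar wavelet, a single DWT level, and identity resolution branches so the round trip is checkable; all names are illustrative, not from the released code):

```python
# Stub of the Fig. 2 data path: normalize -> DWT -> branches -> IDWT -> denorm.
import math

def revin_norm(x):
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x)) or 1.0
    return [(v - mu) / sigma for v in x], (mu, sigma)

def revin_denorm(x, stats):
    mu, sigma = stats
    return [v * sigma + mu for v in x]

def haar_dwt(x):                 # one DWT level, as in Section 1
    a = [(x[2*k] + x[2*k+1]) / math.sqrt(2) for k in range(len(x)//2)]
    d = [(x[2*k] - x[2*k+1]) / math.sqrt(2) for k in range(len(x)//2)]
    return a, d

def haar_idwt(a, d):
    out = []
    for ak, dk in zip(a, d):
        out += [(ak + dk) / math.sqrt(2), (ak - dk) / math.sqrt(2)]
    return out

def branch(coeffs):
    # Placeholder for a KAN Resolution Branch + linear head; identity here,
    # so the pipeline's invertibility can be checked end to end.
    return coeffs

def forecast(x_l):
    z, stats = revin_norm(x_l)           # RevIN normalization
    a, d = haar_dwt(z)                   # DWT (m = 1 in this sketch)
    y_a, y_d = branch(a), branch(d)      # parallel resolution branches
    y = haar_idwt(y_a, y_d)              # IDWT
    return revin_denorm(y, stats)        # RevIN denormalization

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
print(max(abs(u - v) for u, v in zip(x, forecast(x))) < 1e-9)  # True
```

With identity branches the pipeline reproduces its input exactly, confirming that all forecasting capacity lives in the per-band branches rather than in the (lossless) transform scaffolding around them.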

Loss minimization employs

$$L_{\mathrm{total}} = L_{\mathrm{forecast}} + \gamma L_{\mathrm{reg}}, \qquad L_{\mathrm{forecast}} = \mathrm{MSE}(X_T, X_{\mathrm{true}})$$

with $\gamma = 10^{-5}$. Optimization uses Adam over a parameter grid (learning rates in $[10^{-4}, 5 \times 10^{-4}]$), with 30 training epochs (see Table II).

Hyperparameters (see Table II) include look-back window $L = 96$, DWT levels $m \in \{1, 2\}$, KAN spline grid $G = 5$, order $k = 3$, patch size $P \in \{8, 16\}$, and embedding dimension $d \in \{64, 128, 256\}$, resulting in 0.11–0.18M parameters and 0.0073 GFLOPs at $T = 96$. Training takes 12.6 s/epoch and inference ≈5 s; B-spline operations add training overhead, while inference remains real-time capable.

5. Experimental Evaluation: Quantitative Results, Ablations

Extensive benchmarking on BTC, ETH, and XMR datasets across forecasting horizons $T \in \{24, 48, 96, 168\}$ demonstrates DecoKAN's superior mean squared error (MSE) and mean absolute error (MAE) (Table III). Representative MSE results:

Dataset   DecoKAN MSE   WPMixer MSE   TimeFilter MSE
BTC       0.136         0.160         0.146
ETH       0.100         0.118         0.160
XMR       0.219         0.235         0.233

DecoKAN achieves the lowest error in 27/32 crypto cases ("1stCount"); paired t-tests yield significance at $p < 0.05$ versus WPMixer and TimeFilter. Per-branch error ablation (not tabulated) indicates the Detail Branch contributes $\sim 70\%$ of forecasting variance during volatility spikes.

6. Interpretability: Symbolic Case Studies and Market Mapping

A case study on ETH (Fig. 5) traces the lifecycle from learned spline $\phi$ to pruned formula $f$, with high-frequency detail branches governing rapid price swings and approximation branches preserving the global trend structure.

Symbolic regression outputs exhibit financial domain relevance: quartic polynomials encode momentum decay, sinusoids reproduce weekly/daily cycles, and $\tanh$ captures rally saturation effects (Table V). The resulting formula set forms an auditable analytic toolkit for market practitioners.

This suggests that DecoKAN's separation and symbolization pipeline materially assists in post-hoc logic auditing, bridging the interpretability-performance gap typical of conventional black-box deep learning time series models. Its utility for risk management and systematic trading is substantiated by explicit mapping from learned formulas to identifiable market patterns.


DecoKAN synthesizes multi-level DWT-based frequency decoupling and interpretable KAN-based nonlinear modeling, augmented by rigorous symbolic regression, to enable not only predictive state-of-the-art performance on challenging cryptocurrency data but also unprecedented transparency in the model's internal logic and decision pathways (Gao et al., 23 Dec 2025).
