DecoKAN: Interpretable Crypto Forecasting

Updated 30 December 2025
  • DecoKAN is an interpretable multivariate time series forecasting framework designed for cryptocurrency markets, integrating frequency decoupling and nonlinear modeling.
  • The framework employs multi-level discrete wavelet transforms for precise frequency separation and Kolmogorov-Arnold Network mixers for hierarchical spline-based function approximation.
  • Its symbolic analysis pipeline prunes model complexity and extracts explicit formulas, achieving state-of-the-art accuracy and transparent, human-readable output.

DecoKAN is an interpretable multivariate time series forecasting framework tailored for the unique dynamical properties of cryptocurrency markets. The methodology leverages a multi-level Discrete Wavelet Transform (DWT) for nonparametric frequency decoupling and Kolmogorov-Arnold Network (KAN) mixers for hierarchical, spline-based nonlinear function approximation. The architecture is augmented by a symbolic analysis pipeline for output sparsification, pruning, and analytical formula extraction, producing transparent, human-readable representations of learned patterns. DecoKAN demonstrably closes the gap between predictive performance and interpretability, achieving state-of-the-art accuracy (lowest MSE across BTC, ETH, XMR) while generating explicit symbolic forecasts, thereby supporting trustworthy decision-making in volatile financial contexts (Gao et al., 23 Dec 2025).

1. Multi-Level Discrete Wavelet Transform: Mathematical Formulation and Frequency Separation

DecoKAN begins with a multi-level DWT decomposition of the input time series. Let $X_L \in \mathbb{R}^{C \times L}$ denote the normalized multivariate input over a look-back window $L$. A compactly supported orthogonal wavelet (Daubechies 4) is selected; the scaling function $\phi(t)$ and corresponding wavelet $\psi(t)$ satisfy the two-scale equations

$$\phi(t) = \sqrt{2}\sum_{n} h[n]\,\phi(2t - n), \qquad \psi(t) = \sqrt{2}\sum_{n} g[n]\,\phi(2t - n),$$

where $h[n]$ and $g[n]$ are the db4-specific low-pass and high-pass filter coefficients.

One DWT level on a 1D series $x$ yields approximation coefficients $a_k = \sum_n h[n]\, x[2k - n]$ and detail coefficients $d_k = \sum_n g[n]\, x[2k - n]$. A multi-level ($m$) DWT recursively applies this procedure to the approximations, yielding $(X_{A_m}, X_{D_m}, \dots, X_{D_1}) = \mathrm{DWT}(X_L, \psi, m)$, as described in Eq. (3), with each coefficient series of dimension $C \times L_i$. The inverse DWT reconstructs predictions via upsampling and convolution with the transposed filters: $Y = \mathrm{IDWT}(Y_{A_m}, Y_{D_m}, \dots, Y_{D_1})$ (cf. Eq. (14)).
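As a minimal sketch of this decomposition/reconstruction round trip, the recursion can be written in plain Python using the Haar wavelet in place of db4 (shorter filters and no boundary handling; the multi-level recursion is identical). Function names and the simplified index convention are illustrative, not taken from the paper's code:

```python
# Multi-level DWT/IDWT round trip, Haar wavelet for brevity (not db4).
import math

SQRT2 = math.sqrt(2.0)

def dwt_level(x):
    """One DWT level: approximation a_k and detail d_k (Haar filters)."""
    a = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(len(x) // 2)]
    d = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(len(x) // 2)]
    return a, d

def idwt_level(a, d):
    """Inverse of one Haar DWT level (upsample + synthesis filters)."""
    x = []
    for ak, dk in zip(a, d):
        x.append((ak + dk) / SQRT2)
        x.append((ak - dk) / SQRT2)
    return x

def dwt(x, m):
    """m-level DWT: returns (A_m, D_m, ..., D_1), mirroring Eq. (3)."""
    details = []
    a = list(x)
    for _ in range(m):
        a, d = dwt_level(a)
        details.append(d)
    return (a, *reversed(details))

def idwt(coeffs):
    """Reconstruct the series from (A_m, D_m, ..., D_1), cf. Eq. (14)."""
    a, *details = coeffs
    for d in details:          # coarsest detail first
        a = idwt_level(a, d)
    return a

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
coeffs = dwt(x, m=2)           # (A_2, D_2, D_1)
x_rec = idwt(coeffs)
print(max(abs(u - v) for u, v in zip(x, x_rec)))  # ~0: perfect reconstruction
```

The orthogonality of the filter bank is what makes the round trip lossless, which is why DecoKAN can forecast in coefficient space and invert back without structural error.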

Frequency separation is structurally enforced: the $m$-th approximation $X_{A_m}$ models slow-varying, low-frequency socio-economic trends, while the details $X_{D_1}, \dots, X_{D_m}$ capture high-frequency market oscillations (speculative volatility) at progressively coarser scales. Figure 1 illustrates this orthogonal decomposition.

2. Kolmogorov-Arnold Network Mixers: Spline-Based Modeling and Hierarchical Resolution

Each decomposed coefficient series $X_{w_i}$ is routed to a dedicated "Resolution Branch" comprising KAN Mixer blocks. The KANLinear module employs learnable univariate activations

$$\phi(x) = w_b\, b(x) + w_s \sum_{j=0}^{G+k-1} c_j B_j(x)$$

as per Eq. (11); $b(x)$ is a fixed base function (SiLU), $B_j(x)$ are B-spline basis functions of order $k$ over a grid of size $G$, and $w_b$, $w_s$, $c_j$ are optimized parameters. This parametrization is a Kolmogorov-Arnold universal approximator with intrinsic interpretability due to its explicit spline expansion.
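A minimal sketch of one such learnable activation, evaluating the B-spline bases with the standard Cox-de Boor recursion; the grid, weights, and function names are illustrative placeholders, not the trained model's values:

```python
# phi(x) = w_b * SiLU(x) + w_s * sum_j c_j * B_j(x), cf. Eq. (11).
import math

def silu(x):
    return x / (1.0 + math.exp(-x))

def bspline_basis(j, k, t, x):
    """B-spline basis B_{j,k}(x) on knot vector t (Cox-de Boor recursion)."""
    if k == 0:
        return 1.0 if t[j] <= x < t[j + 1] else 0.0
    left = 0.0 if t[j + k] == t[j] else \
        (x - t[j]) / (t[j + k] - t[j]) * bspline_basis(j, k - 1, t, x)
    right = 0.0 if t[j + k + 1] == t[j + 1] else \
        (t[j + k + 1] - x) / (t[j + k + 1] - t[j + 1]) * bspline_basis(j + 1, k - 1, t, x)
    return left + right

def kan_activation(x, w_b, w_s, c, knots, k=3):
    """One learnable KAN edge: fixed SiLU base plus weighted spline expansion."""
    spline = sum(cj * bspline_basis(j, k, knots, x) for j, cj in enumerate(c))
    return w_b * silu(x) + w_s * spline

# Grid of size G=5 on [-1, 1] with order k=3 -> G+k = 8 basis functions,
# requiring G + 2k + 1 = 12 knots (uniform, extended past the interval).
G, k = 5, 3
knots = [-1 + (2 / G) * (i - k) for i in range(G + 2 * k + 1)]
c = [0.1 * j for j in range(G + k)]          # illustrative spline weights
print(kan_activation(0.3, w_b=1.0, w_s=0.5, c=c, knots=knots, k=k))
```

Because the spline weights $c_j$ enter linearly, each edge's learned function can later be read off and regression-fitted directly, which is what the symbolization stage in Section 3 exploits.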

Each Resolution Branch embeds its input with overlapping patches and applies two KAN Mixer blocks (Eqs. (4)-(6)). A standard block consists of:

  1. a temporal KAN along the patch axis ($N_i$), and
  2. a feature KAN along the embedding axis ($d$),

each with LayerNorm and residual connections (Eqs. (7)-(10)). Crucially, per-branch modeling circumvents cross-frequency attention, thereby eliminating spectral entanglement and facilitating transparent decomposition.
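The two mixing steps can be traced at the shape level with a small sketch; `kan_stub` is a stand-in (a plain tanh map) for the actual KANLinear layers, so only the axis handling, normalization, and residuals are shown:

```python
# Shape-level sketch of one KAN Mixer block: temporal mixing along the patch
# axis N_i, then feature mixing along the embedding axis d, with LayerNorm
# and residual connections. kan_stub is a placeholder, not a real KANLinear.
import math

def layer_norm(v, eps=1e-5):
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]

def kan_stub(v):
    # Placeholder for a KANLinear map over vector v (spline machinery omitted).
    return [math.tanh(x) for x in v]

def transpose(m):
    return [list(row) for row in zip(*m)]

def mixer_block(tokens):
    """tokens: N_i x d matrix (patches x embedding channels)."""
    # 1) temporal KAN: mix across the N_i patches, per embedding channel
    t = transpose(tokens)                         # d x N_i
    t = [kan_stub(layer_norm(col)) for col in t]
    tokens = [[a + b for a, b in zip(r1, r2)]     # residual connection
              for r1, r2 in zip(tokens, transpose(t))]
    # 2) feature KAN: mix across the d channels, per patch
    f = [kan_stub(layer_norm(row)) for row in tokens]
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(tokens, f)]

out = mixer_block([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])   # N_i=2, d=3
print(len(out), len(out[0]))                             # 2 3
```

Since each branch only ever sees one frequency band, the transposition pattern above never mixes coefficients across bands, which is the structural source of the "no spectral entanglement" property.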

3. Symbolic Analysis Pipeline: Sparsification, Pruning, Symbolization

Symbolic interpretability is realized during and after training. A regularization loss (Eq. (12)) enforces coefficient sparsity and entropy minimization:

$$L_{\mathrm{reg}} = \lambda_1 \sum_{\text{edges}} \lVert \phi \rVert_1 + \lambda_2 \sum_{\text{edges}} S(\phi), \qquad S(\phi) = -\sum_j p_j \log p_j, \quad p_j = \frac{|c_j|}{\sum_k |c_k|}.$$

This bias toward low-entropy spline weights results in concise activations.
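A sketch of this per-edge regularizer; the $\lambda$ values and the example weight vectors are illustrative, chosen only to show that the entropy term prefers a single dominant spline weight over an evenly spread one:

```python
# Per-edge sparsity/entropy regularizer in the spirit of Eq. (12).
import math

def edge_regularizer(c, lam1=1.0, lam2=1.0, eps=1e-12):
    """L1 norm of spline weights plus entropy of their normalized magnitudes."""
    l1 = sum(abs(cj) for cj in c)
    p = [abs(cj) / (l1 + eps) for cj in c]
    entropy = -sum(pj * math.log(pj + eps) for pj in p)
    return lam1 * l1 + lam2 * entropy

sparse = [0.0, 0.0, 1.0, 0.0]      # one dominant weight  -> low entropy
dense  = [0.25, 0.25, 0.25, 0.25]  # evenly spread weights -> high entropy
# Same L1 mass, so only the entropy term separates them:
print(edge_regularizer(sparse) < edge_regularizer(dense))  # True
```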

Edge pruning (Algorithm 1, Phase 2) removes connections whose activation $L_2$-norm falls below a threshold $\tau$. Surviving edges are symbolized via regression over a function library (polynomial, sinusoidal, and $\tanh$ forms), selecting the closed form $f(x) \approx \phi(x)$ with maximal $R^2$. Typical examples include quartic polynomials (e.g., $f(x) = -0.076x^4 + 0.216x^3 + 0.252x^2 - 1.370x - 0.116$, $R^2 = 0.9968$) and sinusoidal patterns (e.g., $0.923\sin(1.348x + 0.695) + 0.801\cos(2.624x)$), facilitating explicit explanation of high-frequency phenomena.
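The polynomial branch of this symbolization step can be illustrated with a least-squares fit and an $R^2$ score; the quartic from the example above is used here as a synthetic stand-in for a learned spline $\phi(x)$, and the fitting code is a generic sketch, not the paper's pipeline:

```python
# Fit a degree-4 polynomial to samples of a target function and score R^2.
import math

def polyfit(xs, ys, deg):
    """Least-squares polynomial fit via normal equations + Gaussian elimination."""
    n = deg + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):                        # elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):              # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef                                  # [c0, c1, ..., c_deg]

def r_squared(xs, ys, coef):
    pred = [sum(c * x ** i for i, c in enumerate(coef)) for x in xs]
    mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic target: the quartic reported in the text, sampled on [-2, 2].
phi = lambda x: -0.076*x**4 + 0.216*x**3 + 0.252*x**2 - 1.370*x - 0.116
xs = [i / 10 - 2.0 for i in range(41)]
ys = [phi(x) for x in xs]
coef = polyfit(xs, ys, 4)
print(round(r_squared(xs, ys, coef), 4))        # 1.0 for a noiseless quartic
```

In the actual pipeline the candidate library also contains sinusoidal and $\tanh$ forms, and the form with the highest $R^2$ against the spline's samples is the one reported.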

The pruning statistics reveal structural separation: for ETH, the approximation branch pruned just 4.79% of edges, versus 76.28% in the detail branch (Table IV).

4. Model Architecture and Training Procedure

The overall computational flow (cf. Fig. 2) is:

  • RevIN normalization of $X_L$
  • Multi-level DWT ($\rightarrow \{X_{A_m}, X_{D_m}, \dots, X_{D_1}\}$)
  • Parallel KAN Resolution Branches (two Mixer blocks per branch)
  • Linear head layers produce the predicted coefficients
  • IDWT and RevIN denormalization yield the final forecast $X_T$
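The flow above can be traced end to end with a stub pipeline (assumptions: Haar wavelet, a single DWT level, and identity resolution branches so the round trip is checkable; all names are illustrative, not from the released code):

```python
# Stub of the Fig. 2 data path: normalize -> DWT -> branches -> IDWT -> denorm.
import math

def revin_norm(x):
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x)) or 1.0
    return [(v - mu) / sigma for v in x], (mu, sigma)

def revin_denorm(x, stats):
    mu, sigma = stats
    return [v * sigma + mu for v in x]

def haar_dwt(x):                 # one DWT level, as in Section 1
    a = [(x[2*k] + x[2*k+1]) / math.sqrt(2) for k in range(len(x)//2)]
    d = [(x[2*k] - x[2*k+1]) / math.sqrt(2) for k in range(len(x)//2)]
    return a, d

def haar_idwt(a, d):
    out = []
    for ak, dk in zip(a, d):
        out += [(ak + dk) / math.sqrt(2), (ak - dk) / math.sqrt(2)]
    return out

def branch(coeffs):
    # Placeholder for a KAN Resolution Branch + linear head; identity here,
    # so the pipeline's invertibility can be checked end to end.
    return coeffs

def forecast(x_l):
    z, stats = revin_norm(x_l)           # RevIN normalization
    a, d = haar_dwt(z)                   # DWT (m = 1 in this sketch)
    y_a, y_d = branch(a), branch(d)      # parallel resolution branches
    y = haar_idwt(y_a, y_d)              # IDWT
    return revin_denorm(y, stats)        # RevIN denormalization

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
print(max(abs(u - v) for u, v in zip(x, forecast(x))) < 1e-9)  # True
```

With identity branches the pipeline reproduces its input exactly, confirming that all forecasting capacity lives in the per-band branches rather than in the (lossless) transform scaffolding around them.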

Loss minimization employs

$$L_{\mathrm{total}} = L_{\mathrm{forecast}} + \gamma L_{\mathrm{reg}}, \qquad L_{\mathrm{forecast}} = \mathrm{MSE}(X_T, X_{\mathrm{true}})$$

with $\gamma = 10^{-5}$. Optimization uses Adam over a parameter grid (learning rates in $[10^{-4}, 5 \times 10^{-4}]$), with 30 training epochs (see Table II).

Hyperparameters (see Table II) include look-back window $L = 96$, DWT levels $m \in \{1, 2\}$, KAN spline grid $G = 5$, order $k = 3$, patch size $P \in \{8, 16\}$, and embedding dimension $d \in \{64, 128, 256\}$, resulting in 0.11–0.18M parameters and 0.0073 GFLOPs at $T = 96$. Training takes 12.6 s/epoch and inference ≈5 s; B-spline operations add training overhead, while inference remains real-time capable.

5. Experimental Evaluation: Quantitative Results, Ablations

Extensive benchmarking on BTC, ETH, and XMR datasets across forecasting horizons $T \in \{24, 48, 96, 168\}$ demonstrates DecoKAN's superior mean squared error (MSE) and mean absolute error (MAE) (Table III). Representative MSE results:

Dataset   DecoKAN MSE   WPMixer MSE   TimeFilter MSE
BTC       0.136         0.160         0.146
ETH       0.100         0.118         0.160
XMR       0.219         0.235         0.233

DecoKAN achieves the lowest error in 27/32 crypto cases ("1stCount"); paired t-tests yield significance at $p < 0.05$ versus WPMixer and TimeFilter. Per-branch error ablation (not tabulated) indicates the Detail Branch contributes $\sim 70\%$ of forecasting variance during volatility spikes.

6. Interpretability: Symbolic Case Studies and Market Mapping

A case study on ETH (Fig. 5) traces the lifecycle from learned spline $\phi$ to pruned formula $f$, with high-frequency detail branches governing rapid price swings and approximation branches preserving the global trend structure.

Symbolic regression outputs exhibit financial domain relevance: quartic polynomials encode momentum decay, sinusoids reproduce weekly/daily cycles, and $\tanh$ captures rally saturation effects (Table V). The resulting formula set forms an auditable analytic toolkit for market practitioners.

This suggests that DecoKAN's separation and symbolization pipeline materially assists in post-hoc logic auditing, bridging the interpretability-performance gap typical of conventional black-box deep learning time series models. Its utility for risk management and systematic trading is substantiated by explicit mapping from learned formulas to identifiable market patterns.


DecoKAN synthesizes multi-level DWT-based frequency decoupling and interpretable KAN-based nonlinear modeling, augmented by rigorous symbolic regression, to enable not only predictive state-of-the-art performance on challenging cryptocurrency data but also unprecedented transparency in the model's internal logic and decision pathways (Gao et al., 23 Dec 2025).
