
Continuous Spline Tokens

Updated 18 January 2026
  • Continuous spline tokens are an optimization-based representation that reconstructs explicit splines from noisy data and extracts local kinematic coefficients: position, velocity, acceleration, and jerk.
  • They use a variational smoothing approach to balance data fidelity and smoothness, employing cubic or quartic splines with regularization to mitigate measurement noise.
  • Integration with transformer architectures using dual anchoring, global normalization, and LoRA modules enables robust, risk-sensitive decision policies with superior risk-adjusted returns.

Continuous spline tokens are an optimization-based continuous-time representation for noisy time series, developed to address the limitations of discrete tokenizations in transformer architectures. Instead of discretizing observed data directly, this approach reconstructs an explicit spline from noisy measurements and extracts local kinematic coefficients—position, velocity, acceleration, and jerk—as tokens. These tokens provide a denoised, physically interpretable embedding for decision policies, particularly within environments with asymmetric objectives and pronounced noise, as exemplified in financial time series (Kearney, 15 Jan 2026).

1. Spline Reconstruction: Mathematical and Variational Formulation

Given noisy time-series observations $\{(t_k, y_k)\}_{k=0}^{N}$ of an underlying latent process $x(t)$, continuous spline tokenization reconstructs $x(t)$ as a piecewise-polynomial spline of degree $d$ via a variational smoothing problem. For log-price, $d=3$ (cubic spline) is used; for aggregate log-volume, $d=4$ (quartic spline) with integral constraints.

The cubic spline for log-price is determined as the solution to

$$\min_{x,v,w} \;\; \frac{1}{2}\int_{t_0}^{t_N} v(t)^2\,dt + \frac{\alpha^2}{2}\sum_{k=0}^{N} w_k^2$$

subject to

$$\ddot x(t) = v(t), \qquad y_k = x(t_k) + w_k,$$

where $v(t)$ reflects process noise (acceleration) and $w_k$ measurement noise. The hyperparameter $\alpha^2 = \sigma_p^2 / \sigma_m^2$ balances smoothness against data fit. By the calculus of variations, the optimal $x^{*}(t)$ on the interval $[t_k, t_{k+1}]$ is

$$x^{*}(t) = c_{0,k} + c_{1,k}(t-t_k) + \frac{c_{2,k}}{2}(t-t_k)^2 + \frac{c_{3,k}}{6}(t-t_k)^3.$$

Analogously, for log-volume, a quartic spline is obtained from

$$\min_{x,v,w} \frac{1}{2}\int v(t)^2\,dt + \frac{\alpha^2}{2}\sum_k w_k^2$$

subject to

$$\ddot x(t) = v(t), \qquad \tilde y_k = \int_{t_{k-1}}^{t_k} x(t)\,dt + w_k,$$

with the spline on [tk,tk+1][t_k, t_{k+1}] given by

$$x^{*}(t) = \tilde c_{0,k} + \tilde c_{1,k}(t-t_k) + \frac{\tilde c_{2,k}}{2}(t-t_k)^2 + \frac{\tilde c_{3,k}}{6}(t-t_k)^3 + \frac{\tilde c_{4,k}}{24}(t-t_k)^4.$$

The penalty $\int v^2$ enforces smoothness, while $\sum w_k^2$ ensures fidelity to observed data. This reconstruction regularizes against noise, yielding an explicit algebraic description of the latent trajectory over time.
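For the cubic (log-price) case, the fit and coefficient extraction can be sketched with SciPy's smoothing spline. This is an illustration, not the paper's implementation: the synthetic sine path and the value of `lam` are assumptions, and SciPy places the regularization weight `lam` on the smoothness term, so it plays the role of $1/\alpha^2$.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 64)
latent = np.sin(2.0 * np.pi * t)                   # hypothetical latent log-price path
y = latent + 0.05 * rng.standard_normal(t.size)    # noisy observations y_k = x(t_k) + w_k

# Cubic smoothing spline: SciPy minimizes
#   sum_k (y_k - x(t_k))^2 + lam * integral x''(t)^2 dt,
# the same fidelity/smoothness trade-off as the variational problem above.
spl = make_smoothing_spline(t, y, lam=1e-6)

# Kinematic token per knot: [position, velocity, acceleration, jerk] = [c0, c1, c2, c3],
# since c_{j,k} = x^{(j)}(t_k) in the local Taylor-form expansion above.
tokens = np.stack([spl(t), spl.derivative(1)(t),
                   spl.derivative(2)(t), spl.derivative(3)(t)], axis=1)
print(tokens.shape)   # (64, 4)
```

The denoised spline and its derivatives are evaluated at the knots, so each row of `tokens` is one 4D kinematic token.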

2. Formation and Normalization of Kinematic Tokens

Kinematic tokens are defined as the local spline coefficients within each sampling interval. For log-price, the four coefficients $[c_{0,k},\, c_{1,k},\, c_{2,k},\, c_{3,k}]$ correspond to position, velocity, acceleration, and jerk, forming a 4D token per window:

$$\mathbf t_k = [c_{0,k},\, c_{1,k},\, c_{2,k},\, c_{3,k}]^{\top} \in \mathbb{R}^4.$$

For log-volume, a 5D token $\tilde{\mathbf t}_k = [\tilde c_{0,k},\, \tilde c_{1,k},\, \tilde c_{2,k},\, \tilde c_{3,k},\, \tilde c_{4,k}]^{\top} \in \mathbb{R}^5$ is constructed. The two channels are concatenated into a joint 9D token $[\mathbf t_k; \tilde{\mathbf t}_k]$.

To address non-stationarity and heteroscedasticity, dual anchoring and global normalization are applied:

  • Window anchoring: subtract the first position coefficient in each window, i.e., $c'_{0,k} = c_{0,k} - c_{0,0}$ and $\tilde c'_{0,k} = \tilde c_{0,k} - \tilde c_{0,0}$.
  • Z-score normalization: apply global standardization (with statistics from the training data) to all higher-order coefficients $\{c_1, c_2, c_3, \tilde c_1, \ldots, \tilde c_4\}$.

This representation encodes explicit local dynamics, denoised from measurement artifacts, enabling stable learning in downstream models.
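A minimal sketch of this normalization step; the channel layout $[c_0, \ldots, c_3, \tilde c_0, \ldots, \tilde c_4]$ and the function name are assumptions for illustration.

```python
import numpy as np

def normalize_window(tokens, mu, sigma):
    """Dual anchoring + global z-scoring for one context window.

    tokens : (T, 9) array of joint price/volume spline coefficients
             (assumed layout: [c0, c1, c2, c3, c~0, c~1, c~2, c~3, c~4])
    mu, sigma : (7,) training-set statistics for the higher-order channels
    """
    out = np.asarray(tokens, dtype=float).copy()
    out[:, 0] -= out[0, 0]            # anchor price position to window start
    out[:, 4] -= out[0, 4]            # anchor volume position likewise
    higher = [1, 2, 3, 5, 6, 7, 8]    # c1..c3 and c~1..c~4
    out[:, higher] = (out[:, higher] - mu) / sigma
    return out
```

Only the two position channels are anchored per window; the seven derivative channels use global (training-set) statistics, which keeps their scale stable across windows.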

3. Integration with Transformer Architectures

The continuous spline tokens are embedded within a transformer-based policy learning pipeline as follows:

  1. Raw price and volume are log-transformed.
  2. On each rolling context window $[t-T, t]$, spline optimization is performed to extract the corresponding sequence of 9D tokens.
  3. Dual anchoring and global Z-scoring are applied to standardize each windowed sequence.
  4. The resulting (batch $\times$ $T$ $\times$ 9) tensor is fed into a decoder-only causal transformer (4 layers, 8 heads, $d_{\text{model}}=512$, $d_{\text{ff}}=2048$), augmented with RoPE positional encodings.

No specific architectural modifications are required aside from the channel dimension. LoRA modules are included for adaptation in the first two layers (features) and the fourth layer (decision output). The pipeline leverages the semantic alignment of kinematic coefficients with time-local process dynamics, offering a denoised and physically meaningful alternative to raw value or finite difference input.
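A structural PyTorch sketch of the policy network with the stated dimensions. This is illustrative only: RoPE and the LoRA adapters described above are omitted, and standard learned attention layers are used instead.

```python
import torch
import torch.nn as nn

class SplinePolicy(nn.Module):
    """Decoder-only causal policy over 9D spline tokens (dims from the text).

    Structural sketch: RoPE and LoRA are omitted; plain PyTorch layers only.
    """
    def __init__(self, d_in=9, d_model=512, n_layers=4, n_heads=8,
                 d_ff=2048, n_actions=3):
        super().__init__()
        self.proj = nn.Linear(d_in, d_model)          # 9D token -> model width
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=d_ff,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)     # Buy / Sell / Hold logits

    def forward(self, x):                             # x: (batch, T, 9)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.blocks(self.proj(x), mask=mask)      # causal self-attention
        return self.head(h)                           # (batch, T, 3)
```

As the text notes, only the input channel dimension (9) is specific to spline tokens; the rest is a standard causal transformer.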

4. Training Objectives and Evaluation Protocol

Training is conducted using a risk-averse, contextually abstinent objective. The ternary label for each window is defined by the realized return:

$$g_t = \begin{cases} 0\;(\text{Buy}) & \text{if } r_{t+1} > \tau, \\ 1\;(\text{Sell}) & \text{if } r_{t+1} < -\tau, \\ 2\;(\text{Hold}) & \text{otherwise,} \end{cases}$$

where $r_{t+1} = c_{0,t+1} - c_{0,t}$ and $\tau = 1\%$. The asymmetric weighted cross-entropy loss

$$\mathcal{L} = -\sum_{t}\sum_{j=0}^{2} w_j\,\mathbb{I}[g_t=j]\,\log\hat{p}_{t,j}, \qquad \mathbf{w} = [2.0,\; 10.0,\; 1.0]$$

assigns the highest penalty to errors on "Sell"-labeled windows, encouraging the model to prefer "Hold" when predictive confidence is low.
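The labeling rule and loss can be sketched directly from the formulas above (function names are illustrative):

```python
import numpy as np

TAU = 0.01
W = np.array([2.0, 10.0, 1.0])      # class weights for Buy, Sell, Hold

def label(r_next, tau=TAU):
    """Ternary label g_t from the next-step realized return r_{t+1}."""
    if r_next > tau:
        return 0                    # Buy
    if r_next < -tau:
        return 1                    # Sell
    return 2                        # Hold

def asymmetric_ce(probs, labels):
    """L = -sum_t w_{g_t} * log p_{t, g_t} (weighted by the true class).

    probs : (T, 3) predicted class probabilities; labels : (T,) ints in {0,1,2}
    """
    labels = np.asarray(labels)
    p = np.clip(probs[np.arange(len(labels)), labels], 1e-12, 1.0)
    return float(np.sum(W[labels] * -np.log(p)))
```

Because the indicator picks out the true class, the weight 10.0 makes mistakes on Sell-labeled windows ten times as costly as mistakes on Hold-labeled ones.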

Backtesting uses rolling-window inference to avoid look-ahead bias, with a deterministic finite-state machine mapping predictions $g_t$ to portfolio allocations $S_t \in \{\text{Cash}, \text{Long}\}$. No leverage or shorting is employed. Metrics computed include total return, maximum drawdown (MaxDD), and the Sharpe and Sortino ratios. This design tests model calibration and robustness under abstention incentives.
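The allocation state machine admits a direct sketch (the action encoding follows the labels above; starting in Cash is an assumption):

```python
def run_fsm(preds):
    """Deterministic allocation FSM over prediction sequence preds.

    Buy (0) -> Long, Sell (1) -> Cash, Hold (2) -> keep current state.
    Starts flat (Cash); no leverage or shorting, matching the text.
    """
    state, path = "Cash", []
    for g in preds:
        if g == 0:
            state = "Long"
        elif g == 1:
            state = "Cash"
        path.append(state)
    return path
```

Hold simply carries the previous allocation forward, which is what makes abstention a meaningful action under the asymmetric objective.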

5. Empirical Results and Diagnostic Insights

Under the risk-averse asymmetric objective, discrete baselines—such as raw value discretization, finite differences, and PatchTST—collapse to a "Liquidation Equilibrium" (100% Sell), resulting in 0% return and zero drawdown, with undefined Sharpe ratios. In contrast, transformer policies supplied with continuous spline tokens ("SplineGPT") sustain substantive, non-trivial action rates (30–80%) and achieve stable, calibrated decision distributions across assets.

Performance highlights include:

  • INTC: +276.3% total return, MaxDD −37.1%, Sharpe 1.38
  • NVDA: +544.3% return, MaxDD −35.6%, Sharpe 2.28
  • TSLA: +642.2% return, MaxDD −33.5%, Sharpe 2.15

Behavioral diagnostics reveal that discrete baselines yield degenerate (100% Sell) output regimes, whereas SplineGPT maintains balanced Buy/Sell/Hold distributions and sharply calibrated probability estimates (empirical calibration curves near the diagonal). Sensitivity sweeps confirm robustness with respect to both the abstention threshold $\tau$ (across $0.25\%$ to $2\%$) and transaction costs up to 30 bps, where discrete baselines persist in abstention.

High-frequency case analyses, e.g., XOM false-breakout events, demonstrate that continuous kinematic tokens enable the model to respond to rapid changes in acceleration ($c_2$) and thus avoid price collapses, an ability lacking in discrete-only approaches.

6. Significance and Methodological Implications

By reconstructing and tokenizing explicit continuous splines from raw, noisy measurements, continuous spline tokenization offers several advantages:

  1. Physically grounded, denoised representation of underlying dynamics.
  2. Overcoming low signal-to-noise limitations inherent in direct discretizations.
  3. Enabling stable, calibrated, risk-sensitive policies that escape the absorption effects induced by asymmetrically weighted objectives.
  4. Achieving superior risk-adjusted returns and robust real-world policy learning in noisy, abstention-prone time series domains.

These findings establish spline-based continuous tokenization as a principled alternative to conventional tokenization for transformer-based learning on noisy, continuous-time processes (Kearney, 15 Jan 2026).
