Continuous Spline Tokens
- Continuous spline tokens are a continuous representation that reconstructs explicit splines from noisy data to extract local kinematic coefficients like position, velocity, acceleration, and jerk.
- They use a variational smoothing approach to balance data fidelity and smoothness, employing cubic or quartic splines with regularization to mitigate measurement noise.
- Integration with transformer architectures using dual anchoring, global normalization, and LoRA modules enables robust, risk-sensitive decision policies with superior risk-adjusted returns.
Continuous spline tokens are an optimization-based continuous-time representation for noisy time series, developed to address the limitations of discrete tokenizations in transformer architectures. Instead of discretizing observed data directly, this approach reconstructs an explicit spline from noisy measurements and extracts local kinematic coefficients—position, velocity, acceleration, and jerk—as tokens. These tokens provide a denoised, physically interpretable embedding for decision policies, particularly within environments with asymmetric objectives and pronounced noise, as exemplified in financial time series (Kearney, 15 Jan 2026).
1. Spline Reconstruction: Mathematical and Variational Formulation
Given noisy time series observations $y_i = x(t_i) + \varepsilon_i$ of an underlying latent process $x(t)$, continuous spline tokenization reconstructs $x$ as a piecewise polynomial spline $S(t)$ of degree $n$ via a variational smoothing problem. For log-price, $n = 3$ (cubic spline) is used; for aggregate log-volume, $n = 4$ (quartic spline) with integral constraints.
The cubic spline for log-price is determined as the solution to

$$\min_{S}\ \int \big(S''(t)\big)^2\,dt$$

subject to

$$\sum_{i}\big(S(t_i) - y_i\big)^2 \le C,$$

or, in penalized (Lagrangian) form, $\min_S \lambda \int (S''(t))^2\,dt + \sum_i (S(t_i) - y_i)^2$, where $\sigma_p^2$ reflects process noise (acceleration) and $\sigma_m^2$ measurement noise. The hyperparameter $\lambda$, governed by the ratio of these variances, balances smoothness and data fit. By calculus of variations, the optimal $S$ on the interval $[t_i, t_{i+1}]$ is the cubic

$$S(t) = a_i + b_i(t - t_i) + c_i(t - t_i)^2 + d_i(t - t_i)^3.$$
Analogously, for log-volume, a quartic spline is obtained from

$$\min_{S}\ \int \big(S''(t)\big)^2\,dt$$

subject to the integral (aggregation) constraints

$$\int_{t_i}^{t_{i+1}} S(t)\,dt = V_i,$$

where $V_i$ denotes the observed aggregate log-volume over $[t_i, t_{i+1}]$, with the spline on $[t_i, t_{i+1}]$ given by

$$S(t) = a_i + b_i(t - t_i) + c_i(t - t_i)^2 + d_i(t - t_i)^3 + e_i(t - t_i)^4.$$
The derivative penalty enforces smoothness, while the data-fit term (or constraint) ensures fidelity to observed data. This reconstruction regularizes against noise, yielding an explicit algebraic description of the latent trajectory over time.
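The variational problem has a simple discrete analogue (Whittaker-style smoothing) that conveys the idea; the sketch below is illustrative and is not the paper's solver. It penalizes squared second differences of the fitted series, the discrete counterpart of $\int (S'')^2\,dt$:

```python
import numpy as np

def whittaker_smooth(y, lam):
    """Discrete analogue of the cubic variational problem:
    minimize lam * ||D2 s||^2 + ||s - y||^2, where D2 is the
    second-difference operator (penalizing 'acceleration')."""
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)          # (n-2, n) second differences
    A = np.eye(n) + lam * D2.T @ D2               # normal equations
    return np.linalg.solve(A, y)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
latent = np.sin(2 * np.pi * t)                    # smooth latent trajectory
y = latent + 0.1 * rng.standard_normal(200)       # noisy observations

s = whittaker_smooth(y, lam=100.0)
# The smoothed series should track the latent signal more closely than raw data.
print(np.abs(s - latent).mean() < np.abs(y - latent).mean())
```

Larger `lam` corresponds to trusting the smoothness prior over the measurements, mirroring the role of the noise-variance ratio in the continuous formulation.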
2. Formation and Normalization of Kinematic Tokens
Kinematic tokens are defined as the local spline coefficients within each sampling interval. For log-price, the four cubic coefficients $(a_i, b_i, c_i, d_i)$ correspond to position, velocity, acceleration, and jerk, forming a 4D token per interval; for log-volume, a 5D token $(a_i, b_i, c_i, d_i, e_i)$ is constructed from the quartic coefficients. The two channels are concatenated into a joint 9D token.
To address non-stationarity and heteroscedasticity, dual anchoring and global normalization are applied:
- Window anchoring: Subtracting the first-position coefficient in each window, i.e., $\tilde{a}_i = a_i - a_1$, so that every window starts at zero position.
- Z-score normalization: Apply global standardization (with statistics estimated on training data) to all higher-order coefficients $(b_i, c_i, d_i, e_i)$.
This representation encodes explicit local dynamics, denoised from measurement artifacts, enabling stable learning in downstream models.
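As a minimal sketch of token formation, SciPy's `CubicSpline` on an already-smoothed series can stand in for the variational spline; the coefficient ordering and the normalization helpers below are illustrative, not the paper's implementation:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def kinematic_tokens(t, s):
    """Per-interval cubic coefficients as (position, velocity,
    acceleration, jerk) tokens. `s` is assumed already smoothed."""
    cs = CubicSpline(t, s)
    # cs.c[k, i] multiplies (t - t_i)^(3-k); reorder to low -> high degree.
    a, b, c, d = cs.c[3], cs.c[2], cs.c[1], cs.c[0]
    return np.stack([a, b, c, d], axis=1)          # (n-1, 4)

def anchor_and_normalize(tokens, mu, sigma):
    """Window anchoring on the position channel, then global z-scoring
    of the higher-order channels with training-set statistics mu/sigma."""
    out = tokens.copy()
    out[:, 0] -= tokens[0, 0]                      # subtract first position
    out[:, 1:] = (out[:, 1:] - mu) / sigma         # z-score b, c, d
    return out

t = np.linspace(0, 1, 64)
s = np.log1p(np.exp(np.sin(4 * t)))                # stand-in smoothed log-price
tok = kinematic_tokens(t, s)
mu, sigma = tok[:, 1:].mean(axis=0), tok[:, 1:].std(axis=0)
norm = anchor_and_normalize(tok, mu, sigma)
print(norm.shape)   # (63, 4); the first token's position entry is 0
```

In practice the statistics `mu`/`sigma` would come from the training split only, and the log-volume channel would contribute the remaining five token dimensions.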
3. Integration with Transformer Architectures
The continuous spline tokens are embedded within a transformer-based policy learning pipeline as follows:
- Raw price and volume are log-transformed.
- On each rolling context window of length $T$, spline optimization is performed to extract the corresponding sequence of 9D tokens.
- Dual anchoring and global Z-scoring are applied to standardize each windowed sequence.
- The resulting (batch, $T$, 9) tensor is fed into a decoder-only causal transformer (4 layers, 8 heads), augmented with RoPE positional encodings.
No specific architectural modifications are required aside from the channel dimension. LoRA modules are included for adaptation in the first two layers (features) and the fourth layer (decision output). The pipeline leverages the semantic alignment of kinematic coefficients with time-local process dynamics, offering a denoised and physically meaningful alternative to raw value or finite difference input.
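The windowing step of the pipeline can be sketched as follows; `rolling_windows` is a hypothetical helper, with NumPy standing in for the actual training framework:

```python
import numpy as np

def rolling_windows(tokens, T):
    """Slice an (N, 9) token sequence into overlapping context windows,
    producing the (batch, T, 9) tensor fed to the causal transformer."""
    N = tokens.shape[0]
    idx = np.arange(T)[None, :] + np.arange(N - T + 1)[:, None]
    return tokens[idx]                             # (N - T + 1, T, 9)

tokens = np.random.default_rng(1).standard_normal((100, 9))
batch = rolling_windows(tokens, T=32)
print(batch.shape)   # (69, 32, 9)
```

Each window is anchored and z-scored independently before batching, so the transformer only ever sees standardized local dynamics.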
4. Training Objectives and Evaluation Protocol
Training is conducted using a risk-averse, contextually abstinent objective. The ternary label for each window is defined by the realized forward return $r$: Buy if $r > \theta$, Sell if $r < -\theta$, and Hold otherwise, where $\theta$ is the abstention threshold. The asymmetric weighted cross-entropy loss

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} w_{y_i}\,\log p(y_i \mid x_i)$$

assigns a high penalty to erroneous "Sell" predictions, so the model prefers "Hold" when predictive confidence is low.
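As an illustrative sketch of a class-weighted cross-entropy (the specific weights below are invented for demonstration, not taken from the source):

```python
import numpy as np

# Hypothetical class weights: a heavy penalty on the "Sell" class biases
# the policy toward "Hold" under low confidence.
WEIGHTS = {"Buy": 1.0, "Hold": 1.0, "Sell": 3.0}
CLASSES = ["Buy", "Hold", "Sell"]

def asymmetric_ce(probs, labels):
    """Weighted cross-entropy: L = -(1/N) * sum_i w_{y_i} * log p_i[y_i]."""
    w = np.array([WEIGHTS[c] for c in CLASSES])
    idx = np.array([CLASSES.index(y) for y in labels])
    return -np.mean(w[idx] * np.log(probs[np.arange(len(idx)), idx]))

probs = np.array([[0.7, 0.2, 0.1],    # confident Buy
                  [0.1, 0.3, 0.6]])   # moderately confident Sell
loss = asymmetric_ce(probs, ["Buy", "Sell"])
print(loss)
```

A weight matrix over (true, predicted) pairs would express the "erroneous Sell" penalty even more directly; the per-class form above is the simplest common variant.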
Backtesting uses rolling-window inference to avoid look-ahead bias, with a deterministic finite-state machine mapping predictions to portfolio allocations; no leverage or shorting is employed. Metrics computed include total return, maximum drawdown (MaxDD), and the Sharpe and Sortino ratios. This design tests model calibration and robustness under abstention incentives.
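A minimal sketch of such a finite-state machine and one evaluation metric, under the assumption (not stated explicitly in the source) that allocations are binary, fully long or in cash, and that "Hold" keeps the previous position:

```python
import numpy as np

def run_fsm(signals):
    """Deterministic FSM: 'Buy' -> fully invested (1.0), 'Sell' -> cash (0.0),
    'Hold' -> keep previous allocation. No leverage, no shorting."""
    alloc, out = 0.0, []
    for sig in signals:
        if sig == "Buy":
            alloc = 1.0
        elif sig == "Sell":
            alloc = 0.0
        out.append(alloc)
    return np.array(out)

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a per-period strategy return series."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std()

signals = ["Buy", "Hold", "Hold", "Sell", "Hold", "Buy"]
alloc = run_fsm(signals)
print(alloc)   # [1. 1. 1. 0. 0. 1.]
```

Strategy returns are then the element-wise product of the (lagged) allocation with asset returns, from which total return, MaxDD, Sharpe, and Sortino follow.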
5. Empirical Results and Diagnostic Insights
Under the risk-averse asymmetric objective, discrete baselines—such as raw value discretization, finite differences, and PatchTST—collapse to a "Liquidation Equilibrium" (100% Sell), resulting in 0% return and zero drawdown, with undefined Sharpe ratios. In contrast, transformer policies supplied with continuous spline tokens ("SplineGPT") sustain substantive, non-trivial action rates (30–80%) and achieve stable, calibrated decision distributions across assets.
Performance highlights include:
- INTC: +276.3% total return, MaxDD −37.1%, Sharpe 1.38
- NVDA: +544.3% return, MaxDD −35.6%, Sharpe 2.28
- TSLA: +642.2% return, MaxDD −33.5%, Sharpe 2.15
Behavioral diagnostics reveal that discrete baselines yield degenerate (100% Sell) output regimes, whereas SplineGPT maintains balanced Buy/Sell/Hold distributions and sharply calibrated probability estimates (empirical calibration curves near the diagonal). Sensitivity sweeps confirm robustness with respect to both the abstention threshold (across a swept range of values) and transaction costs up to 30 bps, regimes in which the discrete baselines persist in abstention.
High-frequency case analyses, e.g., XOM false-breakout events, demonstrate that continuous kinematic tokens enable the model to respond to rapid changes in the acceleration coefficient and thus avoid price collapses, an ability lacking in discrete-only approaches.
6. Significance and Methodological Implications
By reconstructing and tokenizing explicit continuous splines from raw, noisy measurements, continuous spline tokenization offers several advantages:
- Physically grounded, denoised representation of underlying dynamics.
- Overcoming low signal-to-noise limitations inherent in direct discretizations.
- Enabling stable, calibrated, risk-sensitive policies that escape the absorption effects induced by asymmetrically weighted objectives.
- Achieving superior risk-adjusted returns and robust real-world policy learning in noisy, abstention-prone time series domains.
These findings establish spline-based continuous tokenization as a principled alternative to conventional tokenization for transformer-based learning on noisy, continuous-time processes (Kearney, 15 Jan 2026).