Temporal Attention augmented Bilinear Network for Financial Time-Series Data Analysis

Published 4 Dec 2017 in cs.CE, cs.LG, and q-fin.CP | (1712.00975v1)

Abstract: Financial time-series forecasting has long been a challenging problem because of the inherently noisy and stochastic nature of the market. In High-Frequency Trading (HFT), forecasting for trading purposes is an even more challenging task, since an automated inference system is required to be both accurate and fast. In this paper, we propose a neural network layer architecture that incorporates the idea of bilinear projection as well as an attention mechanism that enables the layer to detect and focus on crucial temporal information. The resulting network is highly interpretable, given its ability to highlight the importance and contribution of each temporal instance, thus allowing further analysis of the time instances of interest. Our experiments on a large-scale Limit Order Book (LOB) dataset show that a two-hidden-layer network utilizing our proposed layer outperforms by a large margin all existing state-of-the-art results from much deeper architectures while requiring far fewer computations.

Citations (191)

Summary

  • The paper presents a novel Temporal Attention Augmented Bilinear Layer for efficient financial time-series forecasting.
  • It combines bilinear projections with a temporal attention mechanism, keeping computational cost low while focusing the model on influential time steps.
  • Experimental results on Limit Order Book data demonstrate improved F1 scores and clearer interpretability of temporal market dynamics.

Temporal Attention Augmented Bilinear Network for Financial Time-Series Data Analysis

Overview and Motivation

Financial time-series analysis presents acute challenges due to nonstationarity, noise, and the need for rapid inference, especially in high-frequency trading scenarios. The paper proposes the Temporal Attention Augmented Bilinear Layer (TABL) as a neural network layer for efficient and interpretable modeling of multivariate time-series data, specifically tailored to financial contexts such as Limit Order Book (LOB) prediction. By integrating bilinear projections with a differentiable temporal attention mechanism, this network architecture achieves superior predictive performance while maintaining low computational complexity and high interpretability.

Bilinear Projections and Temporal Attention Mechanism

The proposed architecture leverages the intrinsic tensor structure of multivariate time-series data, representing each sample as a matrix $\mathbf{X} \in \mathbb{R}^{D \times T}$, where $D$ denotes the feature dimension and $T$ the temporal depth. Standard deep learning models, including MLPs and CNNs, incur high parameterization costs and often fail to exploit separable dependencies across modes. Bilinear projection addresses this by separately learning feature-wise and temporal mappings via two weight matrices $\mathbf{W}_1$ and $\mathbf{W}_2$.
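As an illustration, a bilinear projection can be sketched in a few lines of NumPy. The shapes below (40 features, 10 time steps) mirror the LOB setting, but the output sizes and random weights are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D, T = 40, 10          # input: features x time steps
D_out, T_out = 120, 5  # output modes (illustrative sizes)

X = rng.standard_normal((D, T))
W1 = rng.standard_normal((D_out, D))  # feature-mode weights
W2 = rng.standard_normal((T, T_out))  # temporal-mode weights

# Bilinear projection: each mode is transformed separately
Y = W1 @ X @ W2
print(Y.shape)  # (120, 5)

# Weight count (excluding bias): D*D_out + T*T_out = 4850, versus
# D*T*D_out*T_out = 240000 for a dense layer on the flattened input
print(D * D_out + T * T_out, D * T * D_out * T_out)
```

The separable parameterization is what keeps the layer cheap: each mode gets its own small weight matrix instead of one large joint mapping.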

The novelty arises from augmenting the bilinear projection with a temporal attention mechanism, parameterized by a matrix $\mathbf{W}$ and a scalar $\lambda$, that enables the layer to dynamically attend to particular time steps, sharpening predictions and enhancing interpretability. The attention process comprises five steps: feature-mode transformation, attention weight computation, softmax normalization, soft-attention masking, and temporal-mode projection. The attention mechanism operates independently for each feature, inducing competition among neurons along the temporal dimension.
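The five steps can be sketched as a minimal NumPy forward pass. This is a simplified sketch: training-time details such as the fixed-diagonal constraint on $\mathbf{W}$ and the clipping of $\lambda$ to $[0, 1]$ are omitted, and all shapes and random weights are illustrative:

```python
import numpy as np

def tabl_forward(X, W1, W, W2, lam):
    """Minimal TABL-style forward pass following the five steps in the text.

    X: (D, T) input;  W1: (D_out, D);  W: (T, T);  W2: (T, T_out)
    lam: scalar in [0, 1] mixing attended and unattended activations.
    """
    Xbar = W1 @ X                        # 1. feature-mode transformation
    E = Xbar @ W                         # 2. attention energies over time
    E = E - E.max(axis=1, keepdims=True)                   # numerical stability
    A = np.exp(E) / np.exp(E).sum(axis=1, keepdims=True)   # 3. softmax per feature
    Xtil = lam * (Xbar * A) + (1.0 - lam) * Xbar           # 4. soft attention mask
    return Xtil @ W2                     # 5. temporal-mode projection

rng = np.random.default_rng(1)
D, T, D_out, T_out = 40, 10, 120, 1
Y = tabl_forward(rng.standard_normal((D, T)),
                 rng.standard_normal((D_out, D)),
                 rng.standard_normal((T, T)),
                 rng.standard_normal((T, T_out)),
                 lam=0.5)
print(Y.shape)  # (120, 1)
```

The softmax is taken along the time axis separately for each feature row, which is what produces the per-feature competition among time instances described above.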

Computational Complexity

A salient property of TABL is its efficient scaling. For input dimension $D \times T$ and output $D' \times T'$, the parameter count is $\mathcal{O}(DD' + TT' + D'T')$ for the bilinear layer (BL) and $\mathcal{O}(DD' + TT' + D'T' + T^2)$ for TABL, with corresponding computational complexity $\mathcal{O}(D'DT + D'TT' + 2D'T')$ for BL plus an additional $\mathcal{O}(D'T^2 + 3D'T)$ for TABL. This is markedly lower than attention-based recurrent models (Seq-RNNs), which require $\mathcal{O}(3D'D + 11D'^2 + 11D')$ parameters and $\mathcal{O}(11TD'^2 + 20TD' + 4T^2D' + 3TD'D + T^2)$ computations per sample. Empirical timing confirms that TABL achieves training and inference speeds orders of magnitude faster than CNN and LSTM baselines on LOB data, rendering it viable for deployment in latency-constrained HFT environments.
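The parameter-count formulas above can be checked numerically. The sizes below are hypothetical but of the same order as the LOB setting:

```python
def bl_params(D, T, Dp, Tp):
    # W1 (Dp x D) + W2 (T x Tp) + bias (Dp x Tp)
    return Dp * D + T * Tp + Dp * Tp

def tabl_params(D, T, Dp, Tp):
    # adds the attention matrix W (T x T) and the scalar lambda
    return bl_params(D, T, Dp, Tp) + T * T + 1

D, T, Dp, Tp = 40, 10, 120, 5
print(bl_params(D, T, Dp, Tp))    # 5450
print(tabl_params(D, T, Dp, Tp))  # 5551
```

Note that the attention overhead grows only with $T^2$, which stays small for the short windows typical of LOB snapshots.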

Experimental Results on Limit Order Book Prediction

Extensive experiments were conducted on the FI-2010 dataset, comprising over 4 million limit order events across five Finnish stocks with z-score normalization, under two evaluation protocols: Setup 1 (day-based forward splits, as used by classical baselines) and Setup 2 (7 days for training, the remaining 3 days for testing, as used by deep learning baselines).
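A minimal sketch of the z-score normalization step (the key point being that the statistics come from the training split only, to avoid leaking test-set information):

```python
import numpy as np

def zscore_fit_apply(train, test):
    """Standardize features using statistics of the training split only."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (train - mu) / sigma, (test - mu) / sigma

rng = np.random.default_rng(2)
tr = rng.normal(5.0, 2.0, (1000, 40))  # synthetic stand-ins for LOB features
te = rng.normal(5.0, 2.0, (200, 40))
tr_n, te_n = zscore_fit_apply(tr, te)
print(np.allclose(tr_n.mean(axis=0), 0.0, atol=1e-8))  # True
```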

The TABL architecture, even with only two hidden layers, consistently yields superior F1 scores across all prediction horizons ($H = \{10, 20, 50, 100\}$ events) compared to state-of-the-art baselines including Ridge Regression, ARIMA, MLPs, CNNs, LSTMs, SVMs, and bag-of-features models. In Setup 2, TABL not only outperforms LSTM networks (e.g., exceeding previously reported F1 scores by up to $25\%$ for $H = 10$) but does so with significantly reduced model depth and training/inference latency.

These results establish that bilinear architectures augmented with temporal attention excel at extracting discriminative global temporal cues in financial time-series data, facilitating high performance without sacrificing computational practicality.

Interpretability and Temporal Analysis

A major advantage of TABL is its interpretability. Analysis of the learned attention mask $\mathbf{A}$ and scalar $\lambda$ reveals meaningful temporal focus; for example, the model attends preferentially to recent events immediately preceding a mid-price movement, and its attention patterns differ distinctly across the stationary, increase, and decrease classes. The mixing coefficient $\lambda$ yields soft selection in early training epochs, with behavior stabilizing toward hard attention as training converges.
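The role of $\lambda$ can be seen in isolation: given a hypothetical attention row $A$, it interpolates between keeping all time steps ($\lambda = 0$) and a near-hard selection of the dominant step ($\lambda = 1$). The values below are made up for illustration:

```python
import numpy as np

A = np.array([0.05, 0.05, 0.10, 0.10, 0.70])  # hypothetical attention weights
Xbar = np.ones(5)                             # unit activations for clarity

for lam in (0.0, 0.5, 1.0):
    Xtil = lam * (Xbar * A) + (1.0 - lam) * Xbar
    print(lam, np.round(Xtil, 3))
# lam = 0.0 -> every time step passes through unchanged
# lam = 1.0 -> only the dominant step retains most of its activation
```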

This interpretability enables deeper post-hoc analysis of market microstructure and causality, opening avenues for research into identifying pseudo-periods, volatility bursts, and market regime shifts.

Implications and Speculative Future Directions

Practically, the TABL architecture enables scalable and interpretable forecasting in high-frequency finance, supporting rapid decision-making and facilitating transparency for risk analysis or regulatory compliance. Theoretically, its success demonstrates the utility of multilinear parameterizations joined with learned attention in time-series tasks, suggesting broader applicability in other domains with complex temporal dependencies (e.g., IoT sensor fusion, medical telemetry).

Future research may extend this paradigm by exploring hierarchical attention across multiple modes, adaptive prediction horizons, integration with reinforcement learning pipelines, and real-time anomaly detection. Further, advances in efficient attention computation and robust regularization could unlock even deeper architectures without sacrificing speed or interpretability.

Conclusion

The Temporal Attention Augmented Bilinear Layer presents a technically robust and practically efficient solution for financial time-series analysis. By combining bilinear projections with an explicitly modeled temporal attention mechanism, the architecture achieves both superior predictive performance and interpretability without incurring the computational overhead typical of deep recurrent models. Its empirical successes and theoretical contributions pave the way for further investigations into multilinear and attention-based modeling in complex sequential data domains (1712.00975).
