- The paper's main contribution is demonstrating that vanilla LSTMs outperform transformer models in both one-day-ahead and ten-day-ahead stock-price forecasts.
- It employs a unified experimental framework comparing LSTMs, attention-enhanced variants, and transformer-based models using Yahoo Finance data.
- Results indicate that LSTMs offer more stable trading performance and lower prediction errors under data-constrained conditions.
Introduction
The paper "StockBot 2.0: Vanilla LSTMs Outperform Transformer-based Forecasting for Stock Prices" (2601.00197) presents an empirical study focusing on the efficacy of various deep learning models in predicting financial time-series data, emphasizing stock prices. It challenges the prevailing assumption that modern, attention-based architectures such as transformers invariably surpass traditional methods by examining their performance against vanilla Long Short-Term Memory (LSTM) networks within a unified experimental framework.
Background and Motivation
Predicting stock prices is particularly challenging because financial markets are non-linear, volatile, and highly stochastic. Traditional models, which often rely on linear assumptions, fail to capture these dynamics adequately. Recent advances in deep learning, particularly RNNs and LSTMs, have shown improved ability to model such complexity. Despite the proven success of attention mechanisms and transformer architectures in other domains such as NLP, the paper argues that a simple LSTM can outperform these more complex models for stock price forecasting in data-constrained environments.
Methodology
The study evaluates a suite of models, including baseline LSTMs, attention-enhanced LSTMs, temporal convolutional networks (TCNs), Informer variants, and the Temporal Fusion Transformer (TFT). Each model was trained on Yahoo Finance data for stocks listed on the NYSE and NASDAQ, covering 2010 to 2020. Each dataset was split chronologically into 80% for training and 20% for testing, with z-score normalization computed from training-set statistics.
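The split-and-normalize step described above can be sketched as follows. This is an illustrative implementation, not the paper's code; the function name and return shape are assumptions. The key detail is that the mean and standard deviation come from the training portion only, so no test-period information leaks into the scaling.

```python
import numpy as np

def split_and_normalize(prices, train_frac=0.8):
    """Chronological 80/20 split with z-score normalization.

    Statistics (mean, std) are computed on the training portion only,
    then applied to both splits, avoiding look-ahead leakage.
    """
    split = int(len(prices) * train_frac)
    train, test = prices[:split], prices[split:]
    mu, sigma = train.mean(), train.std()
    return (train - mu) / sigma, (test - mu) / sigma, (mu, sigma)

# Synthetic example (the paper uses Yahoo Finance closing prices)
prices = np.linspace(100.0, 200.0, 1000) + np.random.default_rng(0).normal(0, 5, 1000)
train_z, test_z, stats = split_and_normalize(prices)
```

Keeping `(mu, sigma)` around is necessary to invert the transform later, so that predictions can be reported in price units rather than z-scores.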
Model training used the Adam optimizer, with hyperparameters such as the lookback window (60 days of past history), batch size, and dropout held uniform across all models. This allowed direct comparison of architectures under identical training conditions, keeping the focus on architectural differences rather than hyperparameter tuning.
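The 60-day lookback turns the price series into a supervised learning problem: each sample is a window of 60 past values, and the target is the next day's value. A minimal sketch of that windowing, with illustrative names (the paper does not publish this helper):

```python
import numpy as np

def make_windows(series, history=60):
    """Build (num_samples, history) input windows and next-day targets,
    matching the paper's 60-day past-history setting.

    For a series of length n this yields n - history samples, where
    sample i covers series[i : i+history] and its target is series[i+history].
    """
    X = np.stack([series[i:i + history] for i in range(len(series) - history)])
    y = series[history:]
    return X, y
```

Any of the compared models (LSTM, TCN, Informer, TFT) can then be trained on `(X, y)` pairs; only the architecture consuming the window differs.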
Key Observations
- One-Day-Ahead Forecasting: LSTMs maintained consistent accuracy and remained stable in autoregressive forecasting settings, in contrast to more expressive models such as transformers, whose performance degraded when applied recursively.
- Ten-Day-Ahead Forecasting: In longer-horizon predictions, LSTMs likewise proved robust and maintained lower RMSE values than their more complex counterparts, particularly in autoregressive mode.
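The autoregressive setting referenced in both observations can be sketched model-agnostically: each one-step prediction is appended to the input window and fed back to produce the next step, so errors compound over the horizon. This is an illustrative sketch; `predict_one` stands in for any of the compared models.

```python
import numpy as np

def autoregressive_forecast(predict_one, window, horizon=10):
    """Recursive multi-step forecasting.

    predict_one: callable mapping a history window (1-D array) to the
    next value. Each prediction is fed back into the window, so any
    one-step error compounds over the horizon -- the regime in which
    the paper reports transformers degrading more than LSTMs.
    """
    window = list(window)
    out = []
    for _ in range(horizon):
        nxt = predict_one(np.asarray(window))
        out.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward one step
    return np.asarray(out)
```

With a toy one-step model like `lambda w: w[-1] + 1.0`, a window `[1.0, 2.0, 3.0]` rolls forward to `[4, 5, ..., 13]`, making the feedback loop easy to verify.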
The study also evaluated the practical robustness of the predictions through a buy/sell decision-making bot, termed StockBot, which traded on a daily closing-price strategy. LSTM-driven predictions yielded more consistent portfolio growth with reduced volatility, highlighting the advantage of the LSTM's inductive biases on limited datasets.
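A toy version of such a daily closing-price strategy is sketched below. The paper does not spell out StockBot's exact decision rule here, so this assumes a simple long/flat policy: hold the stock when the model predicts tomorrow's close above today's, otherwise hold cash.

```python
def run_bot(closes, predicted_next, cash=1000.0):
    """Toy daily strategy in the spirit of StockBot (illustrative only;
    the paper's actual rule may differ).

    closes:         observed daily closing prices
    predicted_next: model's prediction of the *next* day's close,
                    aligned with `closes`
    Returns the final portfolio value, marked to the last close.
    """
    shares = 0.0
    for today, pred in zip(closes, predicted_next):
        if pred > today and shares == 0.0:
            shares, cash = cash / today, 0.0      # buy at today's close
        elif pred <= today and shares > 0.0:
            cash, shares = shares * today, 0.0    # sell at today's close
    return cash + shares * closes[-1]
```

Running the same predictions from different models through a fixed rule like this isolates the model's contribution: a forecaster whose errors are smaller and more stable produces smoother portfolio growth, which is the comparison the paper reports.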
Implications
The findings underscore the importance of inductive bias and architectural simplicity, especially in the data-limited regimes typical of financial forecasting. The advantage of transformer models in other domains does not readily translate to stock prediction: without elaborate hyperparameter tuning, their potential is not fully realized on constrained financial time series.
The observation that vanilla LSTMs can outperform transformers may prompt practitioners to re-evaluate complex architectures for stock prediction, to tailor hyperparameter tuning strategies more carefully, or to explore hybrid approaches that combine recurrent and attention mechanisms.
Conclusions
The paper concludes that vanilla LSTMs are particularly effective for financial forecasting when judged on direct predictive accuracy and decision-making stability. Because financial data differ from NLP corpora in both volume and noise characteristics, the simplicity and robustness of LSTMs make them a practical default, particularly when resources for extensive tuning are limited. Future work may explore adaptive strategies and hybrid architectures that harness the strengths of both recurrent and attention-based models, with environment-specific optimization to enhance forecasting capability.