
StockGPT: A GenAI Model for Stock Prediction and Trading

Published 7 Apr 2024 in q-fin.CP, cs.AI, q-fin.PM, q-fin.PR, and q-fin.ST | (2404.05101v3)

Abstract: This paper introduces StockGPT, an autoregressive "number" model trained and tested on 70 million daily U.S. stock returns over nearly 100 years. Treating each return series as a sequence of tokens, StockGPT automatically learns the hidden patterns predictive of future returns via its attention mechanism. On a held-out test sample from 2001 to 2023, daily and monthly rebalanced long-short portfolios formed from StockGPT predictions yield strong performance. The StockGPT-based portfolios span momentum and long-/short-term reversals, eliminating the need for manually crafted price-based strategies, and yield highly significant alphas against leading stock market factors, suggesting a novel AI pricing effect. This highlights the immense promise of generative AI in surpassing human in making complex financial investment decisions.


Summary

  • The paper introduces StockGPT, a transformer-based model that tokenizes continuous stock return data for prediction and trading.
  • It employs a decoder-only architecture with four attention blocks on 256-day sequences, trained on historic returns to achieve a 119% annual return and Sharpe ratio of 6.5.
  • Its integration with benchmark factors and potential scalability to high-frequency data highlight significant improvements over traditional trading models.

"StockGPT: A GenAI Model for Stock Prediction and Trading" (2404.05101)

Introduction

The paper presents StockGPT, an autoregressive generative AI model developed explicitly for stock return prediction and trading. StockGPT leverages the transformer architecture commonly used in NLP but adapts it for continuous numerical stock return data. By parsing historical stock returns into tokenized sequences, StockGPT captures temporal dependencies through its attention mechanisms, supplementing traditional price-based trading strategies with AI-driven insights.

Model Architecture

StockGPT is built on a decoder-only transformer architecture, similar in form to the one used by ChatGPT. The choice reflects the natural fit between autoregressive transformers and time-series data, where each future value is predicted from previously observed values.

Figure 1: StockGPT Architecture

The model adopts a light configuration of four attention blocks, each containing four self-attention heads, resulting in approximately one million parameters. Each input sequence consists of 256 consecutive daily returns for a single stock, mirroring the fixed context length used in language models. The model is trained to predict the next day's return from the preceding sequence, so trading signals are learned end to end from the numeric data rather than crafted by hand.
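As a rough check on the "approximately one million parameters" figure, the sketch below counts the weights of a standard GPT-style decoder with four blocks, four heads, and a 256-step context. The embedding width (`n_embd = 128`), the vocabulary size (`vocab_size = 402`), the 4x feed-forward expansion, and the untied output head are assumptions made for illustration; this summary does not state them.

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    n_layer: int = 4       # from the summary: four attention blocks
    n_head: int = 4        # from the summary: four heads per block
    block_size: int = 256  # from the summary: 256-day input sequences
    n_embd: int = 128      # ASSUMPTION: embedding width, not given in the summary
    vocab_size: int = 402  # ASSUMPTION: number of return tokens, not given in the summary


def count_parameters(cfg: GPTConfig) -> int:
    """Count weights and biases of a generic GPT-style decoder (illustrative layout)."""
    d = cfg.n_embd
    token_emb = cfg.vocab_size * d          # token embedding table
    pos_emb = cfg.block_size * d            # learned positional embeddings
    attn = (3 * d * d + 3 * d) + (d * d + d)            # QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)         # two linear layers, 4x expansion
    layer_norms = 2 * (2 * d)               # two LayerNorms per block (scale + shift)
    per_block = attn + mlp + layer_norms
    final_ln = 2 * d
    lm_head = d * cfg.vocab_size + cfg.vocab_size       # untied output projection
    return token_emb + pos_emb + cfg.n_layer * per_block + final_ln + lm_head


cfg = GPTConfig()
assert cfg.n_embd % cfg.n_head == 0  # each head gets n_embd / n_head dimensions
total_params = count_parameters(cfg)
print(f"{total_params:,} parameters")
```

Under these assumed sizes the count lands a little under one million, which is consistent with the figure quoted above. Most of the parameters sit in the attention and feed-forward layers rather than in the embeddings.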

Implementation and Training

Stock returns from 1926 to 2000 form the training dataset, covering a wide range of market conditions. The continuous returns are discretized into intervals, each of which is mapped to a token that the transformer can process. This discretization retains the essential market movements while presenting the data in a format similar to text.
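The discretization step can be sketched as follows. The bin width (`BIN_WIDTH_BPS`) and clipping range (`MAX_ABS_BPS`) are hypothetical values chosen for illustration; this summary does not specify the paper's binning scheme.

```python
import math

BIN_WIDTH_BPS = 50    # ASSUMPTION: bin width in basis points (illustrative)
MAX_ABS_BPS = 10_000  # ASSUMPTION: returns are clipped to [-100%, +100%)
N_BINS = 2 * MAX_ABS_BPS // BIN_WIDTH_BPS  # 400 tokens under these assumptions


def return_to_token(r: float) -> int:
    """Map a simple return (0.0123 means +1.23%) to an integer token id."""
    bps = r * 10_000
    idx = math.floor((bps + MAX_ABS_BPS) / BIN_WIDTH_BPS)
    return min(N_BINS - 1, max(0, idx))  # clip extreme returns into the end bins


def token_to_midpoint(token: int) -> float:
    """Map a token id back to the midpoint return of its bin."""
    return (-MAX_ABS_BPS + (token + 0.5) * BIN_WIDTH_BPS) / 10_000


def tokenize(returns: list[float]) -> list[int]:
    """Turn a daily return series into a token sequence."""
    return [return_to_token(r) for r in returns]


tokens = tokenize([0.0123, -0.004, 0.0, 2.0, -1.0])
print(tokens)
```

Mapping tokens back to bin midpoints is lossy: every return inside a bin collapses to the same value, which is the price paid for giving the model a finite vocabulary.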

Training minimizes the cross-entropy between the predicted and actual return bins over the training dataset. The model is then evaluated on a test period spanning 2001 to 2023, where it displays substantial predictive accuracy.
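The cross-entropy objective over return bins can be illustrated with a minimal, framework-free sketch. The function names and the toy four-bin vocabulary are hypothetical; a real training loop would compute the same quantity in batches inside a deep-learning library.

```python
import math


def cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-probability of the target bin under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mean_next_token_loss(logits_per_step: list[list[float]], targets: list[int]) -> float:
    """Average loss, where logits_per_step[t] scores the bin observed on day t + 1."""
    losses = [cross_entropy(lg, tgt) for lg, tgt in zip(logits_per_step, targets)]
    return sum(losses) / len(losses)


# Toy example with a four-bin vocabulary (illustrative numbers only).
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target=2)
confident_loss = cross_entropy([0.0, 0.0, 10.0, 0.0], target=2)
avg_loss = mean_next_token_loss([[0.0] * 4, [0.0] * 4], targets=[1, 3])
print(uniform_loss, confident_loss, avg_loss)
```

A model that spreads probability evenly over the bins incurs a loss equal to the log of the vocabulary size, so any trained model should score below that baseline.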

Results and Performance Metrics

On unseen data from 2001 to 2023, a daily rebalanced long-short portfolio formed from StockGPT's predictions yields an annual return of 119% with a Sharpe ratio of 6.5. This significantly exceeds the performance of strategies driven by language models. The high return persists even under stringent constraints, such as excluding micro-cap stocks to reduce transaction costs.

Figure 2: Daily Cumulative Returns

Figure 3: Monthly Cumulative Returns
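The long-short construction and the Sharpe ratio reported in this section can be sketched as below. The equal weighting, the sorting fraction, the 252-trading-day annualization, and the zero risk-free adjustment (the portfolio is self-financing) are assumptions for illustration, and the tickers and numbers are made up.

```python
import math
import statistics


def long_short_return(predicted: dict[str, float],
                      realized: dict[str, float],
                      fraction: float = 0.1) -> float:
    """One-period return of going long the top and short the bottom `fraction` of stocks.

    Stocks are ranked by predicted return and equally weighted within each leg.
    `fraction` should be at most 0.5 so the two legs do not overlap.
    """
    ranked = sorted(predicted, key=predicted.get)  # lowest prediction first
    k = max(1, int(len(ranked) * fraction))
    shorts, longs = ranked[:k], ranked[-k:]
    long_ret = statistics.mean(realized[s] for s in longs)
    short_ret = statistics.mean(realized[s] for s in shorts)
    return long_ret - short_ret


def annualized_sharpe(period_returns: list[float], periods_per_year: int = 252) -> float:
    """Mean over sample standard deviation, scaled by sqrt(periods per year)."""
    mu = statistics.mean(period_returns)
    sigma = statistics.stdev(period_returns)
    return mu / sigma * math.sqrt(periods_per_year)


# Hypothetical one-day example with four stocks.
predicted = {"A": 0.02, "B": 0.01, "C": -0.01, "D": -0.02}
realized = {"A": 0.03, "B": 0.00, "C": 0.00, "D": -0.01}
day_return = long_short_return(predicted, realized, fraction=0.25)
sharpe = annualized_sharpe([0.01, 0.02, 0.03])
print(day_return, sharpe)
```

The toy Sharpe ratio here is far higher than anything realistic because the three sample returns are all positive and tightly clustered; it only demonstrates the arithmetic.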

Further analysis shows that adding StockGPT's predictions to benchmark factors such as momentum, value, and reversal significantly improves overall portfolio performance, indicating that the model encompasses and, in some cases, surpasses conventional market factors.

Comparison with Traditional Models

Unlike LLM-based approaches that rely on sentiment analysis or contextual embeddings of financial text, StockGPT learns the latent dynamics of numerical return sequences directly. Because it is trained only once, it needs no regular retraining, which keeps costs low, and its performance remains robust over a test period of more than two decades.

Implications and Future Directions

StockGPT's ability to learn autonomously from numeric stock returns invites further work on scaling its architecture. Increasing the parameter count, for example through larger embedding dimensions or more attention layers, could allow the model to capture more complex patterns. Applying StockGPT to high-frequency data, such as intraday or minute-level prices, would also test its adaptability to short-term, high-volatility trading environments.

Conclusion

StockGPT shows that generative transformers can be applied beyond traditional NLP domains, learning trading strategies directly from numeric stock returns. Its results bear on the debate over market efficiency and suggest possible adjustments to empirical finance and quantitative trading methodologies. Scaling up the model and applying it to higher-frequency data are natural next steps.
