
tsGT: Stochastic Time Series Modeling With Transformer

Published 8 Mar 2024 in cs.LG (arXiv:2403.05713v3)

Abstract: Time series methods are of fundamental importance in virtually any field of science that deals with temporally structured data. Recently, there has been a surge of deterministic transformer models with time series-specific architectural biases. In this paper, we go in a different direction by introducing tsGT, a stochastic time series model built on a general-purpose transformer architecture. We focus on a well-known and theoretically justified rolling-window backtesting and evaluation protocol. We show that, on four commonly used datasets, tsGT outperforms state-of-the-art models on mean absolute deviation (MAD) and root mean squared error (RMSE), and surpasses its stochastic peers on quantile loss (QL) and continuous ranked probability score (CRPS). We complement these results with a detailed analysis of tsGT's ability to model the data distribution and predict marginal quantile values.
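
The abstract names a rolling-window backtesting protocol and four scoring rules: MAD, RMSE, QL, and CRPS. The paper's exact protocol is not reproduced here; the following is a minimal sketch of how such a backtest can score a stochastic forecaster from Monte Carlo samples. The `sample_forecasts(context, horizon, n_samples)` hook is a hypothetical stand-in for the model's sampling procedure, and aggregation details (stride, retraining, quantile levels) will differ from the paper; the pinball-loss and sample-based CRPS formulas are the standard ones.

```python
import numpy as np

def quantile_loss(y, y_hat_q, q):
    """Pinball loss for quantile level q: mean of max(q*(y-yhat), (q-1)*(y-yhat))."""
    diff = y - y_hat_q
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

def crps_from_samples(samples, y):
    """Sample-based CRPS estimator: E|X - y| - 0.5 * E|X - X'|,
    with X, X' drawn independently from the forecast distribution."""
    samples = np.asarray(samples)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

def rolling_window_backtest(series, sample_forecasts, context_len, horizon, n_samples=100):
    """Slide a fixed-length context window over the series; at each step,
    draw Monte Carlo forecasts for the next `horizon` points and score them.
    `sample_forecasts(context, horizon, n_samples)` is a hypothetical hook
    returning an array of shape (n_samples, horizon)."""
    scores = {"MAD": [], "RMSE": [], "QL@0.9": [], "CRPS": []}
    for t in range(context_len, len(series) - horizon + 1):
        context = series[t - context_len:t]
        target = series[t:t + horizon]                           # realized future values
        samples = sample_forecasts(context, horizon, n_samples)  # (n_samples, horizon)
        point = np.median(samples, axis=0)                       # point forecast from samples
        scores["MAD"].append(np.mean(np.abs(point - target)))
        scores["RMSE"].append(np.sqrt(np.mean((point - target) ** 2)))
        q90 = np.quantile(samples, 0.9, axis=0)                  # marginal 0.9-quantile per step
        scores["QL@0.9"].append(quantile_loss(target, q90, 0.9))
        scores["CRPS"].append(np.mean([crps_from_samples(samples[:, h], target[h])
                                       for h in range(horizon)]))
    return {k: float(np.mean(v)) for k, v in scores.items()}
```

For a quick smoke test, `sample_forecasts` can be any sampler, e.g. a Gaussian random walk fitted to the context. Taking the per-step median of the samples as the point forecast is a common convention when scoring a stochastic model with MAD-style metrics; it is not necessarily the paper's choice.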
