tsGT: Stochastic Time Series Modeling With Transformer
Abstract: Time series methods are of fundamental importance in virtually any field of science that deals with temporally structured data. Recently, there has been a surge of deterministic transformer models with time-series-specific architectural biases. In this paper, we go in a different direction by introducing tsGT, a stochastic time series model built on a general-purpose transformer architecture. We focus on a well-known and theoretically justified rolling-window backtesting and evaluation protocol. We show that tsGT outperforms state-of-the-art models on MAD and RMSE, and surpasses its stochastic peers on QL and CRPS, on four commonly used datasets. We complement these results with a detailed analysis of tsGT's ability to model the data distribution and predict marginal quantile values.
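The rolling-window backtesting protocol mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual evaluation code: the names `rolling_window_backtest` and `naive_forecast` are hypothetical, and a trained model such as tsGT would replace the naive last-value forecaster used here as a stand-in.

```python
import math

def rolling_window_backtest(series, context_len, horizon, forecast_fn):
    """Slide a context window over the series, forecast the next
    `horizon` points at each step, and accumulate point-forecast errors."""
    abs_errs, sq_errs = [], []
    for start in range(0, len(series) - context_len - horizon + 1, horizon):
        context = series[start:start + context_len]
        target = series[start + context_len:start + context_len + horizon]
        preds = forecast_fn(context, horizon)
        for p, t in zip(preds, target):
            abs_errs.append(abs(p - t))
            sq_errs.append((p - t) ** 2)
    mad = sum(abs_errs) / len(abs_errs)             # mean absolute deviation
    rmse = math.sqrt(sum(sq_errs) / len(sq_errs))   # root mean squared error
    return mad, rmse

def naive_forecast(context, horizon):
    # Stand-in forecaster: repeat the last observed value.
    return [context[-1]] * horizon

# Synthetic periodic series for demonstration.
series = [float(i % 7) for i in range(100)]
mad, rmse = rolling_window_backtest(series, context_len=14, horizon=7,
                                    forecast_fn=naive_forecast)
```

A stochastic model would instead return sampled trajectories per window, from which distributional metrics such as QL and CRPS can be computed alongside MAD and RMSE.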