Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain
Abstract: Time series has been left behind in the era of pre-training and transfer learning. While research in the fields of natural language processing and computer vision is enjoying progressively larger datasets to train massive models, the most popular time series datasets consist of only tens of thousands of time steps, limiting our ability to study the effectiveness of pre-training and scaling. Recent studies have also cast doubt on the need for expressive models and scale. To alleviate these issues, we introduce three large-scale time series forecasting datasets from the cloud operations (CloudOps) domain, the largest having billions of observations, enabling further study into pre-training and scaling of time series models. We build the empirical groundwork for studying pre-training and scaling of time series models and pave the way for future research by identifying a promising candidate architecture. We show that it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size. Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method, which achieves a 27% reduction in error on the largest dataset. Code and datasets can be found at https://github.com/SalesforceAIResearch/pretrain-time-series-cloudops.