CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning
Abstract: Deep learning (e.g., Transformers) has been widely and successfully applied to multivariate time series forecasting (MTSF). Unlike existing methods that train models on a single modality of time series input, LLM-based MTSF methods with cross-modal text and time series input have recently shown clear superiority, especially with limited temporal data. However, current LLM-based MTSF methods usually focus on adapting and fine-tuning LLMs while neglecting the distribution discrepancy between textual and temporal input tokens, which leads to sub-optimal performance. To address this issue, we propose CALF, a novel Cross-Modal LLM Fine-Tuning framework for MTSF that reduces the distribution discrepancy between textual and temporal data. It consists mainly of a temporal target branch that takes temporal input and a textual source branch that takes aligned textual input. To reduce the distribution discrepancy, we first develop a cross-modal match module to align the cross-modal input distributions. To further minimize the modality gap in both the feature and output spaces, a feature regularization loss aligns the intermediate features of the two branches for better weight updates, while an output consistency loss makes the output representations of the two branches correspond effectively. Thanks to this modality alignment, CALF establishes state-of-the-art performance on both long-term and short-term forecasting tasks with low computational complexity, and exhibits favorable few-shot and zero-shot abilities similar to those of LLMs. Code is available at https://github.com/Hank0626/LLaTA.
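Concretely, the abstract describes three training signals: a supervised forecasting loss on the temporal target branch, a feature regularization loss between the intermediate features of the two branches, and an output consistency loss between their output representations. The PyTorch sketch below shows one plausible way to combine them; the specific distance functions, the stop-gradient on the textual (source) branch, and the loss weights `lambda_feat` and `lambda_out` are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def calf_training_loss(temporal_feats, textual_feats,
                       temporal_out, textual_out, target,
                       lambda_feat=0.01, lambda_out=1.0):
    """Minimal sketch of a CALF-style objective (hypothetical weights/distances).

    temporal_feats / textual_feats: lists of per-layer intermediate features
    from the temporal target branch and the textual source branch.
    temporal_out / textual_out: output representations of the two branches.
    target: ground-truth future values for the temporal branch.
    """
    # Supervised forecasting loss on the temporal target branch.
    forecast_loss = F.l1_loss(temporal_out, target)

    # Feature regularization: align intermediate features layer by layer.
    # detach() treats the textual branch as the (fixed) source signal;
    # whether the paper stops gradients here is an assumption.
    feat_loss = sum(
        F.l1_loss(t_feat, x_feat.detach())
        for t_feat, x_feat in zip(temporal_feats, textual_feats)
    )

    # Output consistency: keep the two branches' outputs in correspondence.
    out_loss = F.mse_loss(temporal_out, textual_out.detach())

    return forecast_loss + lambda_feat * feat_loss + lambda_out * out_loss
```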