Large-scale pre-training and zero-shot forecasting for Seg-MoE

Investigate large-scale pre-training strategies for Seg-MoE, a segment-wise Mixture-of-Experts architecture, and evaluate its zero-shot forecasting performance on long-term multivariate time-series benchmarks. The goal is to determine whether a pre-trained Seg-MoE model can forecast accurately in a zero-shot setting, without any task-specific fine-tuning.
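
As a concrete illustration of the evaluation half of this task, the sketch below shows a zero-shot evaluation loop in which a pre-trained forecaster is scored on an unseen benchmark using forward passes only. It is a minimal PyTorch sketch under assumed interfaces: `PretrainedSegMoE`, `zero_shot_eval`, and the tensor shapes are hypothetical placeholders, not the paper's API.

```python
# Hedged sketch: zero-shot evaluation of a pre-trained forecaster (no fine-tuning).
# `PretrainedSegMoE` is a stand-in, not the paper's implementation.
import torch
import torch.nn as nn


class PretrainedSegMoE(nn.Module):
    """Placeholder for a pre-trained Seg-MoE forecaster (architecture details omitted)."""

    def __init__(self, context_len: int, horizon: int):
        super().__init__()
        # Placeholder head; the real model would stack segment-wise MoE Transformer blocks.
        self.head = nn.Linear(context_len, horizon)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, n_vars, context_len) -> forecast: (batch, n_vars, horizon)
        return self.head(context)


@torch.no_grad()
def zero_shot_eval(model: nn.Module, loader) -> dict:
    """Score a benchmark with forward passes only; no gradient updates are made."""
    model.eval()
    se = ae = 0.0
    n = 0
    for context, target in loader:  # target: (batch, n_vars, horizon)
        pred = model(context)
        se += (pred - target).pow(2).sum().item()
        ae += (pred - target).abs().sum().item()
        n += target.numel()
    return {"mse": se / n, "mae": ae / n}
```
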

Background

The paper benchmarks Seg-MoE against several large time-series foundation models (e.g., Timer-XL, Time-MoE, Moirai) that leverage large-scale pre-training and often support zero-shot forecasting. In contrast, the reported Seg-MoE results are obtained without any pre-training.

Despite strong from-scratch performance, the authors explicitly note that extending Seg-MoE with large-scale pre-training and assessing its zero-shot forecasting ability is left for future work. This highlights the need to design a suitable pre-training regime for the segment-wise Mixture-of-Experts architecture and to evaluate its zero-shot capabilities on held-out benchmarks.
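
The paper does not specify a pre-training objective. A common recipe for forecasting foundation models is next-segment (next-patch) prediction combined with an auxiliary load-balancing loss that keeps MoE experts evenly utilized; the sketch below shows this assumed formulation in PyTorch. The model's return signature (`pred, router_probs, expert_index`), the `pretraining_step` helper, and the `aux_weight` value are hypothetical illustrations, not part of Seg-MoE.

```python
# Hedged sketch of one possible pre-training objective for a segment-wise MoE forecaster:
# next-segment prediction plus a Switch-Transformer-style load-balancing auxiliary loss.
import torch
import torch.nn.functional as F


def load_balancing_loss(router_probs: torch.Tensor, expert_index: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss encouraging uniform expert usage.

    router_probs:  (tokens, n_experts) softmax outputs of the router.
    expert_index:  (tokens,) hard top-1 expert assignment per token/segment.
    """
    n_experts = router_probs.size(-1)
    dispatch_frac = F.one_hot(expert_index, n_experts).float().mean(dim=0)  # fraction routed to each expert
    prob_frac = router_probs.mean(dim=0)                                    # mean routing probability per expert
    return n_experts * torch.sum(dispatch_frac * prob_frac)


def pretraining_step(model, batch, optimizer, aux_weight: float = 0.01) -> float:
    """One step of next-segment-prediction pre-training (illustrative only)."""
    context, future = batch
    # Hypothetical return signature: forecast plus the router statistics needed for the aux loss.
    pred, router_probs, expert_index = model(context)
    loss = F.mse_loss(pred, future) + aux_weight * load_balancing_loss(router_probs, expert_index)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
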

References

We note that all results are obtained with no pre-training. Investigating large-scale pre-training and zero-shot forecasting for Seg-MoE is a promising direction, but we leave it for future work.

Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers (2601.21641 - Ortigossa et al., 29 Jan 2026) in Appendix: Additional Experimental Results, Subsection "Additional Baselines"