TSLib: Time Series Analysis Toolkit
- Time Series Library (TSLib) is a software toolkit designed for the end-to-end analysis of temporal data, featuring standardized pipelines and modular designs.
- TSLib integrates diverse functionalities including data organization, deep learning modeling, robust preprocessing, and experimental reproducibility.
- TSLib supports multiple paradigms across R and Python to address time series challenges such as forecasting, anomaly detection, and imputation.
A Time Series Library (TSLib) is a software toolkit designed to facilitate the end-to-end analysis, modeling, and benchmarking of temporal or sequential data. Such libraries have emerged to address the intrinsic complexity, heterogeneity, and reproducibility challenges in time series research, spanning data organization, deep learning modeling, feature engineering, and pipeline orchestration. Library variants under the TSLib label have been implemented for different environments and purposes: as a benchmarking and modeling platform for deep time series models, as a "tidy" data container and manipulation infrastructure (notably in the R ecosystem), and as feature extraction frameworks. The following survey provides a technical overview and comparison of these paradigms, focusing on their design, underlying principles, component models, and integration strategies.
1. Motivation, Scope, and Historical Context
Time series data is characterized by sequences of measurements ordered by time, often displaying non-stationarity, regime changes, missingness, and heterogeneous domain structure (e.g., energy, healthcare, finance). Traditional toolkits suffer from non-unified data formats, opacity in time handling, and limited reproducibility. TSLib addresses these barriers by providing:
- Unified experiment pipelines for benchmarking deep learning models across classification, forecasting, imputation, anomaly detection, and (prospectively) clustering tasks (Wang et al., 2024).
- Rigorous data semantics via explicit encoding of time indices, keys, and relational attributes, notably in the tsibble design within the R ecosystem (1901.10257).
- Flexible, modular APIs for plugging various model architectures, forecasting strategies, and preprocessing steps, as in Python-based TSLib and related libraries (Wang et al., 2024, Kostromina et al., 19 Sep 2025).
- Community-driven extension, reproducibility, and standardization, addressing subtle pipeline variants that previously compromised result comparability (Wang et al., 2024).
2. Design Architectures: Data Abstraction, Processing Layers, and Pipeline Management
tsibble-based TSLib (R)
The tsibble package embodies TSLib principles via a three-pillar design:
- Explicit Indexing: The time index is a visible, named column in data frames, ensuring that temporal semantics are carried throughout all transformations, in contrast to hidden attributes in base-R or wide matrix formats.
- Declarative Keying: One or more key columns uniquely identify each unit-trajectory over time, supporting both univariate and panel/multivariate structures.
- Interval Inference: For regularly spaced data, the interval is inferred by greatest common divisor computation on index differences; for irregular data, operations proceed with regularity flags (1901.10257).
The formal representation is the tuple $(D, x, K, \Delta)$, where $D$ is the data (tibble), $x$ is the index, $K$ is the compound key, and $\Delta$ (optional) is the interval. Injectivity, strict time ordering, and (optional) regularity are enforced as library invariants.
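The greatest-common-divisor rule for interval inference can be sketched as follows. This is a hypothetical helper on an integer-coded index, not tsibble's actual implementation; Python is used for illustration throughout this survey.

```python
from math import gcd
from functools import reduce

def infer_interval(index):
    """Infer the regular interval of an integer-coded time index as the
    GCD of consecutive differences; return None for a singleton index."""
    diffs = [b - a for a, b in zip(index, index[1:])]
    if not diffs:
        return None
    return reduce(gcd, diffs)

# Regular monthly data coded as months-since-epoch: interval 1
print(infer_interval([0, 1, 2, 3]))   # 1
# Quarterly observations in a monthly coding: interval 3
print(infer_interval([0, 3, 6, 9]))   # 3
# Mixed spacing still yields the GCD of the gaps
print(infer_interval([0, 2, 6, 8]))   # 2
```

Irregular series would additionally carry a regularity flag, as noted above, rather than a single inferred interval.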
Python TSLib for Deep Learning
The Python-based TSLib (Wang et al., 2024) features four modular layers:
- Data Loader: Handles multiple raw formats (.csv, .npz, .txt), executes dataset-specific splits (train/val/test), and performs sliding-window segmentation per experimental protocol.
- Preprocessing: Implements stationarization (e.g., Reversible Instance Norm), time and feature windowing, decomposition into trend/seasonal blocks, and basis/FFT expansions.
- Model Interface: All models subclass a `BaseModel`, standardizing the `forward`, `loss_fn`, and training interfaces.
- Training & Evaluation: An experiment orchestrator manages early stopping, logging, checkpointing, and metrics computation, with transparent GPU or CPU parallelism (Wang et al., 2024).
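The stationarization step named in the Preprocessing layer (Reversible Instance Norm) can be sketched in a few lines of NumPy. This is a minimal illustration of the idea — remove each instance's own statistics before the model and restore them afterwards — not TSLib's actual module.

```python
import numpy as np

class ReversibleInstanceNorm:
    """Per-instance normalization: subtract each series' own mean and
    divide by its std before modeling, then invert on the outputs."""
    def __init__(self, eps=1e-5):
        self.eps = eps

    def normalize(self, x):
        # x: (batch, time, features); statistics per instance and feature
        self.mean = x.mean(axis=1, keepdims=True)
        self.std = x.std(axis=1, keepdims=True) + self.eps
        return (x - self.mean) / self.std

    def denormalize(self, y):
        # Re-apply the stored statistics to model outputs
        return y * self.std + self.mean

x = np.random.randn(8, 96, 7) * 5 + 100    # non-zero-mean batch
rin = ReversibleInstanceNorm()
z = rin.normalize(x)
assert abs(z.mean()) < 1e-6                # per-instance means removed
x_back = rin.denormalize(z)
assert np.allclose(x, x_back, atol=1e-4)   # round trip recovers input
```

For forecasting, the same stored statistics are applied to the model's predicted horizon, which is what makes the normalization "reversible".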
A high-level dependency graph is:
Data Source → Preprocessor → Model (with TaskHead) → Trainer → Evaluator
This enables reproducible, plug-and-play model evaluation across domains.
3. Core Models, Features, and Analytical Tasks
TSLib instantiates a representative suite of deep architectures, spanning:
- MLP-based: N-BEATS (iterative backcast/forecast decomposition), DLinear (direct input-to-output mapping), TSMixer, FITS (Fourier domain scaling/phase shift).
- RNN-based: LSTNet, DA-RNN, DeepAR, DSSM, Mamba (state-space recurrence).
- CNN-based: WaveNet, TCN, SCINet, TimesNet (FFT and Inception for periodicity detection).
- GNN-based: DCRNN, STGCN, Graph WaveNet, MTGNN for spatio-temporal graphs.
- Transformer-based: Informer (ProbSparse attention), Autoformer (series decomposition + autocorrelation), FEDformer, Pyraformer, PatchTST (channel-independent patch embeddings), iTransformer.
Tasks supported include:
| Task Type | Description | Example Datasets |
|---|---|---|
| Forecasting | Map an observed history window to future values | ETT-h1/h2, ILI |
| Imputation | Recover missing values within partially observed series | SMD, MSL |
| Classification | Assign a class label to an entire series | Heartbeat, SCP1/2 |
| Anomaly Detection | Assign an anomaly score per time step | SMD, SWaT |
These models and tasks are instantiated via factory design patterns, further enabling extension to new architectures or domains (Wang et al., 2024).
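The factory pattern mentioned above can be sketched as a string-keyed registry; the names `MODEL_REGISTRY`, `register_model`, and `build_model` are illustrative, not TSLib's exact API.

```python
# Minimal model factory: architectures register under a string name,
# and experiments instantiate them from a config dict.
MODEL_REGISTRY = {}

def register_model(name):
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

@register_model("dlinear")
class DLinear:
    """Toy stand-in for a registered architecture."""
    def __init__(self, seq_len, pred_len):
        self.seq_len, self.pred_len = seq_len, pred_len

def build_model(config):
    """Instantiate a registered model from a config dict."""
    cls = MODEL_REGISTRY[config["model"]]
    return cls(**{k: v for k, v in config.items() if k != "model"})

model = build_model({"model": "dlinear", "seq_len": 96, "pred_len": 24})
assert isinstance(model, DLinear) and model.pred_len == 24
```

Adding a new architecture then requires only a subclass and a registration decorator, leaving the experiment orchestration untouched.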
4. Forecasting Strategies and Pipeline Operators
TSLib implementations, especially in the Python ecosystem (e.g., Tsururu), offer a taxonomy of multi-step forecasting strategies (Kostromina et al., 19 Sep 2025):
- Recursive: Train a single one-step-ahead predictor and roll it forward recursively, feeding each prediction back as input for the next step.
- Direct: Train an independent predictor for each of the horizon steps.
- MIMO: One model predicts the entire output vector per input.
- Rec-MIMO: Hybrid of the two; multi-step blocks are predicted and rolled out recursively in batches.
- FlatWideMIMO: Single-step prediction with horizon index as explicit input.
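The contrast between the recursive and direct strategies can be made concrete with a toy one-step predictor; `MeanModel` and the two functions are illustrative stand-ins, not Tsururu's API.

```python
import numpy as np

def recursive_forecast(model, history, horizon):
    """Recursive strategy: a one-step model rolled forward, feeding its
    own predictions back in as inputs."""
    window = list(history)
    preds = []
    for _ in range(horizon):
        y = model.predict(np.array(window[-len(history):]))
        preds.append(y)
        window.append(y)
    return preds

def direct_forecast(models, history, horizon):
    """Direct strategy: one independently trained model per step."""
    return [models[h].predict(np.array(history)) for h in range(horizon)]

class MeanModel:
    """Toy stand-in for any fitted one-step predictor."""
    def predict(self, window):
        return float(window.mean())

hist = [1.0, 2.0, 3.0, 4.0]
print(recursive_forecast(MeanModel(), hist, 2))   # [2.5, 2.875]
print(direct_forecast([MeanModel(), MeanModel()], hist, 2))  # [2.5, 2.5]
```

The difference in outputs shows the key trade-off: recursive forecasts compound their own predictions (and their errors), while direct forecasts stay anchored to the observed history at every step.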
Pipeline operators in tsibble TSLib (R) or similar frameworks include time-gap management (has_gaps, fill_gaps), row-wise filtering, feature and time-based aggregation (index_by, group_by_key), reshaping (pivot_longer, pivot_wider), rolling/sliding window operations (slide, tile, stretch), and integration with visualization/modeling APIs (1901.10257). Tsururu adds normalization strategies (e.g., LastKnownNormalizer, StandardScaler) with empirical evidence that certain normalizations significantly impact validation and test set performance (Kostromina et al., 19 Sep 2025).
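A last-value normalizer of the kind mentioned above can be sketched as follows. The exact semantics of Tsururu's LastKnownNormalizer are an assumption here; this sketch simply scales each input window by its final observed value so the model works with relative changes.

```python
import numpy as np

class LastKnownNormalizer:
    """Sketch of a last-value normalizer (assumed semantics): divide
    each window by its last observed value; invert on predictions."""
    def fit_transform(self, windows):
        # windows: (n_windows, window_len)
        self.last = windows[:, -1:].copy()
        return windows / self.last

    def inverse_transform(self, preds):
        return preds * self.last

w = np.array([[100.0, 110.0, 120.0],
              [10.0, 9.0, 8.0]])
norm = LastKnownNormalizer()
z = norm.fit_transform(w)
assert np.allclose(z[:, -1], 1.0)                # last value maps to 1
assert np.allclose(norm.inverse_transform(z), w) # exact round trip
```

Because the scaling statistic is the most recent observation, this style of normalization keeps forecasts anchored to the latest level of each series, which is one plausible reason such choices measurably shift validation and test performance.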
5. Evaluation Protocols, Metrics, and Benchmarking
TSLib emphasizes reproducibility and comparability by enforcing standardized pipelines for data splits, preprocessing, model training, and metric computation (Wang et al., 2024):
- Metrics: MAE, MSE, SMAPE, MASE for regression; Accuracy, F1 for classification/anomaly.
- Benchmarking: Evaluations across up to 30 datasets spanning energy, economics, health, and anomaly logs. Direct comparisons are made under identical hyperparameter settings.
- Empirical Findings: MLP-based DLinear and N-BEATS excel in long-term forecasting but are less effective for classification or anomaly detection. CNN-based (TimesNet, SCINet) and Transformer-based (PatchTST, Autoformer) models lead in their respective domains under fair evaluation (Wang et al., 2024).
- Best Practices: Always apply normalization, leverage stationarization, and test both unified and task-specific hyperparameter configurations.
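Two of the listed regression metrics, SMAPE and MASE, are less standard than MAE/MSE and worth spelling out; the following is a minimal sketch of their common definitions (with MASE scaled by the in-sample naive forecast error).

```python
import numpy as np

def smape(y, yhat):
    """Symmetric mean absolute percentage error, in percent."""
    return 100 * np.mean(2 * np.abs(yhat - y) / (np.abs(y) + np.abs(yhat)))

def mase(y, yhat, y_train, m=1):
    """MAE scaled by the in-sample naive (seasonal, period m) error."""
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y - yhat)) / scale

y_train = np.array([1.0, 2.0, 3.0, 4.0])
y, yhat = np.array([5.0, 6.0]), np.array([5.5, 6.5])
print(mase(y, yhat, y_train))   # 0.5: half the naive one-step error
```

MASE below 1 means the model beats the naive forecast on the training scale, which makes it comparable across series of different magnitudes.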
6. Extensibility, Integration, and Industry Application
TSLib frameworks (both R and Python) are designed for modular extension:
- New Models and Datasets: Simple inheritance from `BaseModel`/`BaseDataset` and registration with `ModelFactory`/`DatasetFactory` (Python), or functional extension via the tidyverse in R (Wang et al., 2024, 1901.10257).
- Feature Extraction Integration: Libraries like FATS specialize in extracting curated sets of more than 60 time series features, from basic statistics to advanced autocorrelation and Lomb–Scargle periodogram metrics (Nun et al., 2015). These features can be integrated into machine learning pipelines (e.g., scikit-learn) or downstream TSLib workflows.
- Interoperability: Tsururu demonstrates fit/predict wrappers accommodating external models (ARIMA, Prophet, LSTM, XGBoost) under a unified strategy and preprocessing interface (Kostromina et al., 19 Sep 2025).
- Production Readiness: Automated backtesting, rolling validation, and support for non-aligned, heterogeneous series are realized in Python-based pipelines (Kostromina et al., 19 Sep 2025).
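The automated backtesting with rolling validation mentioned above can be sketched as an expanding-origin loop; `rolling_backtest` and the naive forecaster are illustrative, not any library's actual API.

```python
import numpy as np

def rolling_backtest(fit_predict, series, window, horizon, step):
    """Expanding-origin backtest: repeatedly fit on history up to the
    origin, forecast `horizon` steps, and collect out-of-sample MAE."""
    errors = []
    origin = window
    while origin + horizon <= len(series):
        train, test = series[:origin], series[origin:origin + horizon]
        preds = fit_predict(train, horizon)
        errors.append(float(np.mean(np.abs(np.asarray(preds) - test))))
        origin += step
    return errors

# Toy forecaster: repeat the last observed value for the whole horizon
naive = lambda train, h: [train[-1]] * h
series = np.arange(12, dtype=float)     # perfectly linear series
errs = rolling_backtest(naive, series, window=6, horizon=2, step=2)
print(errs)   # [1.5, 1.5, 1.5]
```

Each fold's error comes from data strictly after the training origin, which is what makes this protocol a faithful simulation of production forecasting.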
7. Comparison, Limitations, and Recommendations
A comparison of different TSLib paradigms reveals:
| Library | Primary Focus | Data Handling | Models Included | Extensibility |
|---|---|---|---|---|
| tsibble (R) | Tidying, manipulation | Explicit index/key | Integrates with fable/mable | High via verbs, subclassing |
| TSLib (Python) | Deep model benchmarking | Unified API, config | 24+ deep models | Model/dataset subclassing |
| Tsururu (Py) | Forecasting strategies | Modular pipelining | Plug-in models | Custom strat/models |
| FATS (Python) | Feature extraction | Mag, time, errors | 60+ feature functions | Feature plugin, scikit-learn |
- TSLib frameworks enforce explicitness and preserve temporal semantics, mitigating common sources of pipeline error such as hidden time indices or discarded keys (1901.10257).
- Direct support for panel, irregular, and event-driven data, as well as built-in unit testing and validation, increases reproducibility.
- For forecasting, ablation over global/multivariate fitting, normalization, and strategy is recommended; empirical benchmarks show significant effects depending on data alignment, horizon, and architecture (Kostromina et al., 19 Sep 2025).
- Use cases extend from energy prediction and healthcare event detection to astronomical light curve analysis and financial volatility indexing.
A plausible implication is that future TSLib developments will further unify processing across hybrid tasks and leverage ever-more expressive model architectures, while continuing to foreground transparency and reproducibility.