
TS-TCC: Temporal & Contextual Contrast in Time-Series

Updated 21 January 2026
  • The paper demonstrates that combining cross-view temporal prediction with global contextual discrimination yields robust and transferable time-series features, outperforming previous methods.
  • It details a dual-branch architecture using weak and strong augmentations to capture fine-grained temporal dynamics and preserve global context.
  • Empirical evaluations on HAR, Sleep-EDF, and Epilepsy datasets show improvements in accuracy and transferability, even in few-shot scenarios.

Time-Series Representation Learning via Temporal and Contextual Contrasting (TS-TCC) is a self-supervised framework for learning expressive features from unlabeled time-series data by combining cross-view temporal prediction and global contextual discrimination. TS-TCC is specifically tailored to temporal signals, addressing the limitations of prior contrastive paradigms developed for spatially-structured domains, such as images, by integrating both robust temporal modeling and context-aware contrastive objectives (Eldele et al., 2021, Eldele et al., 2022).

1. Conceptual Motivation and Framework Overview

The core motivation for TS-TCC lies in the challenge of extracting discriminative representations from unlabeled time-series, where complex temporal dependencies coexist with high labeling costs. While established contrastive learning methods (e.g., SimCLR, MoCo, CPC) succeed in spatial domains, they insufficiently preserve temporal structure when naively applied to sequence data, especially under standard augmentations.

TS-TCC introduces a dual-branch paradigm: each raw sequence is transformed into two correlated views via distinct augmentation pipelines (a “weak” and a “strong” view), designed to perturb amplitude, order, and fine-grained temporal relations. The self-supervised signal is established by two modules:

  • A temporal contrasting (TC) module that enforces cross-view future prediction to capture invariant, temporally aware features;
  • A contextual contrasting (CC) module that aligns global context vectors from different views of the same sequence while discriminating across sequences.

This structure yields a representation that is both temporally robust and contextually discriminative.

2. Time-Series-Specific Augmentation Strategies

TS-TCC employs augmentation techniques specifically devised for temporal signals, in contrast to the image-centric operations typical in prior contrastive SSL.

Weak augmentations ($\mathcal{T}_w$) maintain global shape and primary dynamics:

  • Jitter: Additive small-variance Gaussian noise per channel, applied after min–max normalization ($\sigma \in [0, 0.1]$).
  • Scaling: Multiply the sequence by a global random factor sampled from $[1/s, s]$ (typically $s=2$).
  • Time-shift: (Extension in (Eldele et al., 2022)) Circularly shift the sequence by up to $\pm p\%$.

Strong augmentations ($\mathcal{T}_s$) introduce substantial distortion:

  • Permutation: Segmentation of $x$ into $M$ contiguous parts ($M$ is dataset-dependent; e.g., $M=10$ for HAR, $M=20$ for Sleep-EDF), followed by random reordering.
  • High-variance jitter: Additive Gaussian noise with $\sigma \in [0.1, 1]$.
  • Compositional perturbations: e.g., simultaneous permutation and strong jitter.

Each input $x$ is transformed into two views:

$$x^w \sim \mathcal{T}_w(x), \quad x^s \sim \mathcal{T}_s(x)$$

These augmentations enable TS-TCC to capture invariance to amplitude changes and time order distortions while preserving temporal coherence.
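The weak and strong pipelines described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names, the uniform sampling of the scale factor, the equal-length segment split, the order in which operations are composed, and the specific $\sigma$ values (picked from inside the stated ranges) are all assumptions.

```python
import numpy as np

def jitter(x, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, s, rng):
    """Multiply the whole sequence by one factor drawn from [1/s, s].
    Uniform sampling is an assumption; the text only gives the range."""
    return x * rng.uniform(1.0 / s, s)

def permute(x, num_segments, rng):
    """Split the time axis into contiguous segments and shuffle their order.
    Near-equal segment lengths are an assumption."""
    segments = np.array_split(x, num_segments, axis=-1)
    order = rng.permutation(num_segments)
    return np.concatenate([segments[i] for i in order], axis=-1)

def weak_view(x, rng, sigma=0.05, s=2.0):
    # Low-variance jitter followed by global scaling (assumed order).
    return scale(jitter(x, sigma, rng), s, rng)

def strong_view(x, rng, sigma=0.5, num_segments=10):
    # Segment permutation followed by high-variance jitter (assumed order).
    return jitter(permute(x, num_segments, rng), sigma, rng)

rng = np.random.default_rng(0)
# Toy input with shape (channels, time): three copies of a sine wave.
x = np.sin(np.linspace(0, 8 * np.pi, 128))[None, :].repeat(3, axis=0)
x_w, x_s = weak_view(x, rng), strong_view(x, rng)
```

Both views keep the input shape, so they can be fed to the same encoder.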

3. Temporal Contrasting Module

The temporal contrasting module is designed to encourage representations that encode temporal dependencies robust to augmentation. The flow is as follows:

  • Both $x^w$ and $x^s$ are processed by a shared encoder $f_{\textrm{enc}}$ (a 3-block 1D CNN yielding per-timestep embeddings $z_t \in \mathbb{R}^d$), optionally followed by an MLP.
  • Each sequence of embeddings is passed through an autoregressive Transformer $f_{ar}$ (4 layers, $h=100$, 4 heads, pre-norm, dropout 0.1), with a learned “context token” prepended.
  • At each time $t$, the output $\mathbf{c}_t\in\mathbb{R}^h$ serves as a summary of $z_{\leq t}$.

Cross-View Future Prediction:

At a given time $t$ and offset $k$ ($1\leq k \leq K$, with $K \sim 0.4T$), the context vector from one view is used to predict the future latent of the other view. The prediction is parameterized by linear projections $\{W_k\}$:

$$\hat{y} = W_k c_t^s \approx z_{t+k}^w$$

and vice versa. The prediction is optimized with an InfoNCE-style loss contrasting the pair $\left( W_k c_t^s,\, z_{t+k}^w\right)$ against negatives $z_n^w$ from other sequences:

$$\mathcal{L}_{TC}^s = -\frac{1}{K}\sum_{k=1}^K \log \frac{ \exp(\mathrm{sim}(W_k c_t^s, z_{t+k}^w)/\tau) }{ \sum_{n \in \mathcal{N}_{t,k}} \exp(\mathrm{sim}(W_k c_t^s, z_n^w)/\tau) }$$

The total temporal contrasting loss is $\mathcal{L}_{temp} = \mathcal{L}_{TC}^s + \mathcal{L}_{TC}^w$.
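A minimal NumPy sketch of one direction of this loss may help make the indexing concrete. Several choices here are assumptions rather than details from the text: `sim` is taken to be cosine similarity, the candidate set for each anchor is the step-$(t+k)$ latents of every sequence in the batch (the positive plus the in-batch negatives), the loss is averaged over the batch, and the names `temporal_contrast`, `c_t`, `z_future`, and `W` are hypothetical.

```python
import numpy as np

def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def temporal_contrast(c_t, z_future, W, tau=0.2):
    """One direction of the temporal contrasting loss.

    c_t      : (B, h)    context at step t from one view
    z_future : (B, K, d) latents z_{t+1..t+K} from the other view
    W        : (K, d, h) one linear predictor W_k per offset k
    """
    K = z_future.shape[1]
    loss = 0.0
    for k in range(K):
        pred = l2_normalize(c_t @ W[k].T)        # (B, d) predictions W_k c_t
        target = l2_normalize(z_future[:, k])    # (B, d) true future latents
        logits = pred @ target.T / tau           # (B, B); diagonal holds positives
        loss -= np.mean(np.diag(log_softmax(logits, axis=1)))
    return loss / K

rng = np.random.default_rng(0)
B, K, d, h = 8, 4, 128, 100
W = rng.normal(scale=0.1, size=(K, d, h))
c_s, c_w = rng.normal(size=(B, h)), rng.normal(size=(B, h))
z_w, z_s = rng.normal(size=(B, K, d)), rng.normal(size=(B, K, d))
# Cross-view: strong context predicts weak futures, and vice versa.
loss_temp = temporal_contrast(c_s, z_w, W) + temporal_contrast(c_w, z_s, W)
```

In a real training loop the arrays would come from the encoder and Transformer, and the loss would be computed in an autodiff framework so that gradients reach $W_k$ and the networks.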

This enforces not only invariance to augmentation but also promotes modeling of sequence evolution across perturbations.

4. Contextual Contrasting Module

Following temporal contrasting, TS-TCC extracts a global context vector for each view (usually the output of the “CLS” token of the Transformer at the final position). These vectors, $c_T^w$ and $c_T^s$, are further processed by an MLP projection head to produce context embeddings $u_i$.

Given a batch of $N$ sequences, the set of $2N$ context vectors is used in an instance-wise InfoNCE contrastive task. For each sequence, the two views $(u_i, u_{i^+})$ form a positive pair ($i$ indexes one view, $i^+$ the alternative view). All other vectors in the batch serve as negatives.

The contextual contrasting loss is:

$$\ell(i,i^+) = -\log\frac{ \exp(\mathrm{sim}(u_i,u_{i^+})/\tau) }{ \sum_{m=1}^{2N} \mathbf{1}_{[m \ne i]} \exp(\mathrm{sim}(u_i,u_m)/\tau) }$$

$$\mathcal{L}_{CC} = \frac{1}{2N}\sum_{k=1}^N [\ell(2k-1,2k) + \ell(2k,2k-1)]$$

This loss maximizes agreement between global summaries of the two views of each sample, thereby enhancing sample-level discrimination.
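The contextual contrasting loss can be sketched in NumPy as below. The sketch assumes cosine similarity for `sim` and interleaves the two views so that rows $2k-1$ and $2k$ (1-indexed) belong to sample $k$, matching the indexing in $\mathcal{L}_{CC}$; the function name and argument names are hypothetical.

```python
import numpy as np

def contextual_contrast(u_w, u_s, tau=0.2):
    """Instance-wise InfoNCE over the 2N context embeddings of a batch.

    u_w, u_s : (N, p) projected context vectors of the weak and strong views
    """
    N = u_w.shape[0]
    u = np.empty((2 * N, u_w.shape[1]))
    u[0::2], u[1::2] = u_w, u_s                      # interleave the two views
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    sim = u @ u.T / tau                              # (2N, 2N) scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                   # indicator 1[m != i]
    partner = np.arange(2 * N) ^ 1                   # index of the other view (0<->1, 2<->3, ...)
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * N), partner] - log_denom
    return -log_prob.mean()                          # average of ell(i, i+) over all 2N anchors

rng = np.random.default_rng(0)
u_w = rng.normal(size=(16, 32))
u_s = u_w + 0.01 * rng.normal(size=u_w.shape)        # toy "second view" close to the first
loss_cc = contextual_contrast(u_w, u_s)
```

Averaging over all $2N$ anchors is the same as the $\frac{1}{2N}\sum_k[\ell(2k-1,2k)+\ell(2k,2k-1)]$ form, since each sample contributes both directions.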

5. Joint Objective and Network Architecture

The complete TS-TCC loss is a weighted sum:

$$\mathcal{L}_{\textrm{unsup}} = \lambda_1(\mathcal{L}_{TC}^s + \mathcal{L}_{TC}^w) + \lambda_2 \mathcal{L}_{CC}$$

Empirically, $\lambda_1=1$, $\lambda_2=0.7$ yield stable results.

Network architecture:

  • Encoder: 3-block 1D CNN (Conv–BatchNorm–ReLU–Dropout–MaxPool), $d=128$ per-timestep features.
  • Autoregressive head: 4-layer Transformer ($h=100$ for most datasets).
  • Projection head: two-layer MLP for CC/SCC.
  • Optimization: Adam, lr $=3\times10^{-4}$, weight decay $=3\times10^{-4}$, $\beta_1=0.9$, $\beta_2=0.99$. Batch size: 128.
  • Augmentation settings: permutation segments $M=10$ (UCI HAR), $M=12$ (Epilepsy), $M=20$ (Sleep-EDF); scale ratio $=2$; jitter $\sigma$ as above; temperature $\tau=0.2$; $K\approx0.4T$ for future step range.

This modular design is broadly compatible with univariate and multivariate time-series and is dataset-agnostic aside from augmentation tuning.
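For reference, the hyperparameters listed above can be gathered into a single configuration object. The sketch below is illustrative only: the class, field, and method names are hypothetical, and rounding $0.4T$ to the nearest integer is an assumption.

```python
from dataclasses import dataclass

# Permutation segment counts M per dataset, as listed above.
PERMUTATION_SEGMENTS = {"HAR": 10, "Epilepsy": 12, "Sleep-EDF": 20}

@dataclass(frozen=True)
class TSTCCConfig:
    feature_dim: int = 128          # d, encoder output channels per timestep
    hidden_dim: int = 100           # h, Transformer width
    transformer_layers: int = 4
    attention_heads: int = 4
    transformer_dropout: float = 0.1
    lr: float = 3e-4
    weight_decay: float = 3e-4
    betas: tuple = (0.9, 0.99)      # Adam beta_1, beta_2
    batch_size: int = 128
    temperature: float = 0.2        # tau
    scale_ratio: float = 2.0        # s in the scaling augmentation
    lambda_tc: float = 1.0          # lambda_1
    lambda_cc: float = 0.7          # lambda_2
    future_fraction: float = 0.4    # K is roughly 0.4 * T

    def future_steps(self, num_steps):
        """Number of future offsets K for a feature sequence of length T."""
        return max(1, round(self.future_fraction * num_steps))

    def total_loss(self, tc_s, tc_w, cc):
        """Weighted sum lambda_1 * (L_TC^s + L_TC^w) + lambda_2 * L_CC."""
        return self.lambda_tc * (tc_s + tc_w) + self.lambda_cc * cc

cfg = TSTCCConfig()
K = cfg.future_steps(50)                    # offsets to predict for T = 50
loss = cfg.total_loss(tc_s=1.0, tc_w=2.0, cc=3.0)
```

Keeping the dataset-dependent segment count outside the dataclass reflects the point that only augmentation settings change between datasets.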

6. Empirical Evaluation and Performance

TS-TCC was evaluated on multiple real-world datasets: UCI HAR (9-axis motion, 6 classes), Sleep-EDF EEG (single channel, 5 sleep stages), Epileptic Seizure Recognition (single channel, binary), as well as a fault diagnosis transfer setting and UCR benchmark datasets (Eldele et al., 2021, Eldele et al., 2022).

Linear evaluation (encoder frozen):

| Dataset (accuracy, %) | Random | SSL-ECG | CPC | SimCLR | TS-TCC | Supervised |
|---|---|---|---|---|---|---|
| HAR | 57.9 | 65.3 | 83.8 | 81.0 | 90.4 | 90.1 |
| Sleep | 35.6 | 74.6 | 82.8 | 78.9 | 83.0 | 83.4 |
| Epilepsy | 90.3 | 93.7 | 96.6 | 96.1 | 97.2 | 96.7 |

Few-shot or semi-supervised fine-tuning:

  • With 1% labels, TS-TCC achieves approximately 70% (HAR) and 90% (Epilepsy) MF1, significantly exceeding supervised training (which drops below 50%).
  • With 10% labeled data, TS-TCC performance is within 2% of full-supervision on all datasets.
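The few-shot results above are reported as MF1, taken here to mean the macro-averaged F1 score: per-class F1 averaged with equal weight per class. A small NumPy sketch (the function name is hypothetical) shows how it differs from accuracy when classes are imbalanced.

```python
import numpy as np

def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

# A classifier that always predicts class 0 on an imbalanced toy set:
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))  # 0.75
mf1 = macro_f1(y_true, y_pred, num_classes=2)                        # 3/7, about 0.43
```

Because the missed minority class contributes an F1 of zero, MF1 penalises this classifier much more than accuracy does.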

Transfer learning:

On the four-domain fault-diagnosis dataset (12 domain pairs):

  • Supervised pretrain + fine-tune: 63.8% accuracy
  • TS-TCC pretrain + fine-tune: 67.8% (+4.0% absolute gain)

This suggests that TS-TCC representations possess strong domain transferability even with minimal downstream labels.

7. Methodological Variants, Ablations, and Extensions

Ablation studies demonstrate:

  • TC only: Same-view prediction yields significantly lower accuracy (e.g., HAR ACC $\sim$82.8%).
  • Adding cross-aug prediction: Both strong$\to$weak and weak$\to$strong future prediction improve accuracy (HAR $\sim$87.9%).
  • Full TS-TCC (TC + CC): Further improvement (HAR $\sim$90.4%). Single-augmentation variants show a sharp decline, especially on HAR and Sleep.

Sensitivity Analysis:

Setting $K$ to 40% of the sequence length balances context diversity and temporal difficulty. Performance is robust to moderate perturbations of the loss weights around $\lambda_1=1$, $\lambda_2=0.7$.

Semi-supervised extension (CA-TCC) (Eldele et al., 2022):

With limited labels, CA-TCC leverages pseudo-labels after self-supervised pretraining. In place of the unsupervised contextual contrastive loss, it introduces a class-aware (supervised) contrastive loss:

  • For batch index $i$ and pseudo-labels $\{\hat{y}_i\}$, positives are all other batch members with $\hat{y}_p=\hat{y}_i$, negatives are the rest.
  • Empirically, with 1% labels, CA-TCC attains 77.8% accuracy and 72.6% MF1 (10 datasets), besting baselines such as MeanTeacher or FixMatch.
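The class-aware positive/negative assignment can be sketched as a supervised contrastive loss over pseudo-labels. This is an assumption-laden illustration, not the CA-TCC implementation: it uses the common formulation that averages the log-probability over each anchor's positives, cosine similarity, and hypothetical names throughout.

```python
import numpy as np

def class_aware_contrast(u, pseudo_labels, tau=0.2):
    """Supervised-contrastive style loss driven by pseudo-labels.

    u             : (n, p) context embeddings for all views in the batch
    pseudo_labels : (n,)   pseudo-label of each embedding
    """
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    labels = np.asarray(pseudo_labels)
    n = len(labels)
    sim = u @ u.T / tau
    np.fill_diagonal(sim, -np.inf)                       # an anchor never contrasts with itself
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # Positives: other batch members that share the anchor's pseudo-label.
    positives = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    has_pos = positives.any(axis=1)
    if not has_pos.any():
        return 0.0                                       # no anchor has a positive in this batch
    pos_sum = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_sum[has_pos] / positives.sum(axis=1)[has_pos]
    return float(per_anchor.mean())

rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
centers = np.array([[1.0, 0.0], [-1.0, 0.0]])
u = centers[labels] + 0.05 * rng.normal(size=(8, 2))    # two well-separated toy clusters
loss_scc = class_aware_contrast(u, labels)
```

When the pseudo-labels agree with the cluster structure of the embeddings the loss is small; noisy pseudo-labels pull unrelated embeddings together, which is why pseudo-label quality matters for this stage.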

Limitations:

Current TS-TCC evaluations are restricted to single-modality (univariate or multivariate) time-series. Augmentation strategies are fixed rather than learned or adaptive, and the autoregressive modeling capacity is limited to moderate-sized Transformers.

8. Conclusion and Significance

TS-TCC defines a general approach for self-supervised sequence representation learning tailored to time-series, exploiting both cross-view temporal prediction and global contextual contrasting. In linear evaluation, it comes within about half a percentage point of fully supervised accuracy on HAR, Sleep-EDF, and Epilepsy (slightly above on two, slightly below on one), excels in few-shot and transfer settings, and serves as a foundation for extensible semi-supervised variants (CA-TCC). The empirical results underscore the efficacy of combining temporal and contextual contrasting for robust, transferable time-series representations (Eldele et al., 2021, Eldele et al., 2022).
