2D Pixel Time Series Representations

Updated 14 January 2026

Two-dimensional pixel time series representations transform time-ordered data into spatially structured grids, unveiling both intra- and inter-period patterns.
Methods such as period folding, recurrence plots, and raster cube approaches enable tailored encoding for frequency analysis, spatial locality, and multimodal signals.
These representations enhance forecasting, classification, and event detection while providing efficient compression and support for interactive visual analytics.

A two-dimensional pixel time series representation encodes time series data, including both multivariate and imaging sources, into spatially organized 2D (or multi-channel) pixel grids. These representations facilitate downstream analysis and learning by 2D convolutional neural networks (CNNs), exploit spatial inductive biases, enable interactive and information-dense visualization, and support domain-specific tasks via transformation, folding, or alignment with raw and derived signals. A wide spectrum of pixel time series methods exists, encompassing dense visual analytics (Schlegel et al., 2024), periodic and derivative decomposition (Nematirad et al., 31 Mar 2025), recurrence and return plots (Stival et al., 7 Jan 2026, Hellermann et al., 2021), matrix folding and coding (Lian et al., 2022), SVD-based tensorization (Khoshrou et al., 2018), raster cube approaches (Cruces et al., 2019), disentangled image time series encoders (Sanchez et al., 2019), and conversion of conventional time series plots into images for CNNs (Rodrigues et al., 2021).

1. Construction Principles and Canonical Mappings

The foundational step is converting a time series or time-ordered sequence to a two-dimensional array suitable for pixel encoding. For univariate periodic processes, reshaping is often performed via period folding and matrix construction:

For frequency and period decomposition, methods such as the FFT identify the dominant cycle lengths $\{p_i\}$ and fold each window into $(p_i \times f_i)$ matrices, where the row index runs over positions within a period (intra-period structure) and the column index runs over successive periods (inter-period patterns) (Nematirad et al., 31 Mar 2025, Khoshrou et al., 2018).
In raster domains, the 2D spatial slices (e.g., imagery or gridded geosensor readings) at contiguous time points $t$ are stacked to form a cube $R[x, y, t]$, which is then reordered via Morton codes for spatial locality and compressed with succinct $k^3$-trees (Cruces et al., 2019).
For dense pixel visual analytic displays (e.g., DAVOTS), each timeseries sample $x^{(i)}$ is rendered as a row of contiguous pixel blocks, with each block encoding raw values, histograms, activations, attribution scores, and class predictions side-by-side (Schlegel et al., 2024).
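The period-folding step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of any cited method: the function names `dominant_periods` and `fold`, the choice to drop trailing samples that do not fill a complete cycle, and the use of the raw amplitude spectrum to rank frequencies are all assumptions made for the example.

```python
import numpy as np

def dominant_periods(x, k=2):
    """Return up to k candidate period lengths from the FFT amplitude spectrum."""
    x = np.asarray(x, dtype=float)
    n = x.size
    amp = np.abs(np.fft.rfft(x - x.mean()))
    amp[0] = 0.0                        # ignore the DC component
    top = np.argsort(amp)[::-1][:k]     # strongest frequency bins first
    top = top[top > 0]                  # bin 0 has no finite period
    return [int(round(n / f)) for f in top]

def fold(x, period):
    """Fold a 1D series into a (period, n_cycles) matrix.

    Row index = position within a cycle, column index = cycle number.
    Trailing samples that do not fill a whole cycle are dropped (assumption).
    """
    x = np.asarray(x, dtype=float)
    n_cycles = x.size // period
    return x[: n_cycles * period].reshape(n_cycles, period).T

# Ten clean cycles of length 24 fold into a 24 x 10 matrix
t = np.arange(240)
series = np.sin(2 * np.pi * t / 24)
p = dominant_periods(series, k=1)[0]
image = fold(series, p)
```

For a strictly periodic input every column of `image` is identical, so deviations between columns make period-to-period changes directly visible.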

In multivariate or multimodal settings, folding and channelization schemes are adopted:

Data Folding and Hyperspace Coding (DFHC) takes $X \in \mathbb{R}^{T \times D}$, flattens it into a vector, and then reshapes that vector into an $H \times W \times C$ pixel grid via explicit folding functions. Gray and RGB codings preserve channel identity and intra/inter-channel interactions (Lian et al., 2022).
Pixel-wise contrastive multimodal learning encodes per-pixel time series as normalized distance or recurrence plots in multi-channel images, aligning spatial and temporal characteristics for downstream multimodal representation learning (Stival et al., 7 Jan 2026).
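The flatten-and-reshape idea behind folding a multivariate series into a pixel grid can be sketched as follows. This is only an illustration of the general mechanism, not the DFHC coding of Lian et al.: the function name `fold_to_grid`, the row-major flattening order, the min-max scaling to 8-bit values, and the zero padding at the end of the grid are all assumptions.

```python
import numpy as np

def fold_to_grid(X, height, width, channels=1):
    """Flatten a (T, D) array and fold it into a (height, width, channels) uint8 grid."""
    X = np.asarray(X, dtype=float)
    flat = X.reshape(-1)                 # row-major: all variables at t=0, then t=1, ...
    size = height * width * channels
    if flat.size > size:
        raise ValueError("grid too small for the series")
    lo, hi = flat.min(), flat.max()
    scale = hi - lo if hi > lo else 1.0  # avoid division by zero for constant input
    pixels = np.round(255.0 * (flat - lo) / scale)
    padded = np.zeros(size)              # assumed: unused cells are zero-padded
    padded[: flat.size] = pixels
    return padded.reshape(height, width, channels).astype(np.uint8)

# A 6-step, 2-variable series folded into a 2 x 3 image with 2 channels
grid = fold_to_grid(np.arange(12).reshape(6, 2), height=2, width=3, channels=2)
```

With `channels=1` the result corresponds to a grayscale image and with `channels=3` to an RGB-style image; which samples end up as spatial neighbours depends entirely on the chosen flattening order.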

2. Types of Pixel Time Series Representations

A summary of common pixel time series encoding families:

Encoding Scheme	Data Structure	Principal Use Cases
Dense Pixel Rows	$N \times W$ images; blocks for raw, attribution, prediction (Schlegel et al., 2024)	XAI, model attribution
Period Folded Matrix	$(p_i \times f_i)$ matrices, one per detected period $p_i$ (Khoshrou et al., 2018, Nematirad et al., 31 Mar 2025)	Forecasting, decomposition
Gray & RGB Folding	$H \times W \times C$ grids via vectorization (Lian et al., 2022)	Classification, action recognition
Recurrence/Return Plot	Multi-channel images (Stival et al., 7 Jan 2026, Hellermann et al., 2021)	Feature extraction, contrastive learning, generative modeling
Raster Cube (Morton)	3D binary grid, $k^3$-tree (Cruces et al., 2019)	Geospatial, weather, compact indexing

Each method’s encoding is tuned to specific temporal and/or spatial regularities, such as diurnal cycles, local anomalies (handled via derivative heatmaps), or event-based recurrence.
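As an illustration of the recurrence-plot family, the sketch below builds a normalized distance plot and a thresholded recurrence plot for a single series, then stacks several series into a multi-channel image. It uses the textbook definition without time-delay embedding; the function names, the threshold `eps`, and the max-normalization are assumptions and may differ from the constructions in the cited papers.

```python
import numpy as np

def distance_plot(x):
    """Pairwise absolute distances |x_i - x_j|, scaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])
    peak = d.max()
    return d / peak if peak > 0 else d

def recurrence_plot(x, eps=0.1):
    """Binary plot: 1 where two samples lie within eps on the normalized scale."""
    return (distance_plot(x) <= eps).astype(np.uint8)

def multi_channel_plot(series_list, eps=0.1):
    """Stack one recurrence plot per series along the last (channel) axis."""
    return np.stack([recurrence_plot(s, eps) for s in series_list], axis=-1)

# Two equal-length series give a T x T x 2 image
image = multi_channel_plot([[0, 1, 0, 1], [1, 2, 3, 4]])
```

Each channel is symmetric with ones on the diagonal; off-diagonal ones mark pairs of time steps at which the series returns to a similar value.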

3. Feature Extraction and Learning Approaches

Two-dimensional pixel representations enable application of mature image-modeling frameworks and facilitate hybrid or multimodal encoders:

Convolutional Neural Networks (CNNs) are leveraged for 2D convolutions over pixel matrices. In Times2D, simultaneous intra-period and inter-period convolution enhances capture of short- and long-term phenomena (Nematirad et al., 31 Mar 2025). Recurrence plots and extended intertemporal return plots allow direct processing by image-based encoders and generative adversarial networks (GANs) (Hellermann et al., 2021, Stival et al., 7 Jan 2026).
Disentangled representation learning is implemented by encoder pairs (shared and exclusive), where temporal and static image content are decoupled and used for segmentation, retrieval, and change detection (Sanchez et al., 2019).
Data folding approaches (DFHC) exploit 2D CNN inductive bias with coding variations (e.g., transform-based, step, RGB, gray), optimizing feature extraction depending on channel structure and signal frequency (Lian et al., 2022).

Contrastive learning on multimodal pixel representations is reported to achieve state-of-the-art transfer and feature extraction performance on both pixel-level and image-level tasks (Stival et al., 7 Jan 2026).

4. Visualization, Interaction, and Explainability

Dense pixel representations support high-density analytics and exploratory interfaces:

DAVOTS implements interactive visual analytics where per-row samples display raw series, activations, and model attributions in contiguous pixel blocks. Features such as clustering-based ordering, histogram overlays, row-brush selection, and tooltips facilitate in-depth data/model exploration (Schlegel et al., 2024).
Hierarchical clustering over pixel rows, using Euclidean, normalized Euclidean, or Pearson-based distances, enables pattern and motif recognition in large time series corpora, enhancing subgroup identification and pattern ordering.
In SVD-based pixel time series visualization, low-rank approximations (via Frobenius norm minimization and SSIM evaluation) yield cleaner displays and focus attention on interpretable granularities (Khoshrou et al., 2018).
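The low-rank step mentioned above can be sketched with a truncated SVD, which by the Eckart-Young theorem gives the best rank-$r$ approximation in the Frobenius norm. The function names and the relative-error helper are assumptions for illustration, and the SSIM evaluation used in the cited work is not reproduced here.

```python
import numpy as np

def low_rank(matrix, rank):
    """Best rank-`rank` approximation of a matrix in the Frobenius norm."""
    u, s, vt = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    # Keep only the leading singular triplets
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def relative_error(matrix, approx):
    """Frobenius-norm error of the approximation, relative to the original."""
    return np.linalg.norm(matrix - approx) / np.linalg.norm(matrix)

# Error shrinks as more singular components are kept
rng = np.random.default_rng(0)
folded = rng.normal(size=(24, 10))   # stand-in for a period-folded matrix
errors = [relative_error(folded, low_rank(folded, r)) for r in (1, 5, 10)]
```

In practice the rank would be chosen by inspecting how the error (or a perceptual score such as SSIM) changes as components are added.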

Guidelines prioritize normalization, color mapping (diverging for raw/attribution, sequential for histogram), and immediate display of distributional context via block-histograms (Schlegel et al., 2024).
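A diverging color mapping of the kind recommended for raw values and attributions can be sketched as below. The blue-white-red ramp, the symmetric scaling around zero, and the function name `diverging_rgb` are assumptions chosen for illustration; the source does not prescribe a specific palette.

```python
import numpy as np

def diverging_rgb(values):
    """Map signed values to RGB in [0, 1]: negative -> blue, zero -> white, positive -> red."""
    v = np.asarray(values, dtype=float)
    peak = np.abs(v).max()
    z = v / peak if peak > 0 else v      # symmetric scaling keeps zero at the midpoint
    rgb = np.ones(v.shape + (3,))        # start from white
    neg, pos = z < 0, z > 0
    rgb[neg, 0] = 1 + z[neg]             # fade red and green toward pure blue
    rgb[neg, 1] = 1 + z[neg]
    rgb[pos, 1] = 1 - z[pos]             # fade green and blue toward pure red
    rgb[pos, 2] = 1 - z[pos]
    return rgb

# One pixel row of attribution scores
row = diverging_rgb([-1.0, -0.5, 0.0, 0.5, 1.0])
```

Scaling by the largest absolute value, rather than by the minimum and maximum, ensures that positive and negative attributions of equal magnitude receive equally saturated colors.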

5. Compression, Indexing, and Efficient Query

Spatial and temporal locality in pixel time series representations can be exploited for high compression and rapid queries:

The raster time series $k^3$-tree compresses space by clustering 1-bits along the pixel, value, and time axes. Compression ratios of up to 70% over baseline tree-based approaches are reported for high-frequency datasets (Cruces et al., 2019).
Query primitives (point lookup, window, spatio-temporal range, value-constrained) are performed via Morton-order quadbox decomposition and $k^3$-tree traversal, attaining low-microsecond to sub-millisecond latencies (output-size dependent).
DFHC and periodic folding methods are computationally efficient (folding is a reshape whose cost is at most linear in the number of samples; transform codes are quasi-linear), and support augmentation via sliding windows or super-resolution (Lian et al., 2022).

Temporal resolution and spatial clustering directly impact compression; high locality yields optimal storage and performance.
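The locality argument can be made concrete with a small Morton (Z-order) encoder. The sketch below interleaves the bits of $(x, y, t)$, which mirrors how a $k^3$-tree with $k = 2$ recursively splits a cube into octants; the function names, the bit order (x in the lowest position), and the choice to interleave all three axes are assumptions for illustration and not necessarily the layout used by Cruces et al.

```python
def morton3(x, y, t, bits=10):
    """Interleave the low `bits` bits of x, y and t into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)        # x occupies bit positions 0, 3, 6, ...
        code |= ((y >> i) & 1) << (3 * i + 1)    # y occupies bit positions 1, 4, 7, ...
        code |= ((t >> i) & 1) << (3 * i + 2)    # t occupies bit positions 2, 5, 8, ...
    return code

def demorton3(code, bits=10):
    """Invert morton3, recovering the (x, y, t) coordinates."""
    x = y = t = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        t |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, t

# Cells of a 2 x 2 x 2 block receive the eight consecutive codes 0..7
block = sorted(morton3(x, y, t) for x in (0, 1) for y in (0, 1) for t in (0, 1))
```

Because every aligned 2 x 2 x 2 block maps to a contiguous run of codes, cells that are close in space and time tend to be close in the linear order, which is what tree-based compression and range queries exploit.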

6. Benchmarking and Empirical Performance

Several studies benchmark the effectiveness of pixel time series representations:

CNN classifiers on line plot images outperform advanced 1D and time series-specific models on multiple UCR datasets and custom benchmarks (OPTOX, FISIO), achieving up to 0.9765 median test accuracy (Rodrigues et al., 2021).
DFHC encoding yields 100% accuracy on Parkinson’s diagnosis, 92.86% on bearing fault detection, and 99.7% on gymnastics recognition, surpassing classical time series methods (Lian et al., 2022). Step and transform-coded variants enable adaptation to signal frequency.
Contrastive learning on pixel-wise recurrence plots (NDVI/EVI/SAVI) results in 97.44% accuracy on EuroSAT and exceeds 1D CNN, XGBoost, and MOMENT on pixel-level tasks (Stival et al., 7 Jan 2026).
SVD visualizations optimize low-rank approximations for visual clarity; selection of the truncation rank balances SSIM and representation fidelity (Khoshrou et al., 2018).
GAN-based image models (WGAN-GP) on XIRP representations produce synthetic time series with reduced forecasting error (e.g., 3.66% MAPE, 0.0216 CRPS), outperforming TimeGAN and other supervised generative approaches (Hellermann et al., 2021).

7. Extensions, Limitations, and Future Directions

Key extensions and domain adaptations:

Multi-scale folding and channel mosaicking accommodate higher-dimensional time series or heterogeneous sensor clusters (Lian et al., 2022).
Learnable folding, super-resolution enhancement, and differentiable mapping functions offer the prospect of data-driven spatial arrangement and refinement (Lian et al., 2022).
Times2D fusion of multiple periods and derivatives in unified pixel space enables CNN pattern-matching across frequencies and highlights events and anomalies (Nematirad et al., 31 Mar 2025).
Application domains include remote sensing, planetary observation, geospatial analysis, smart energy, fault detection, biomedical waveform generation, and synthetic augmentation (Sanchez et al., 2019, Hellermann et al., 2021, Stival et al., 7 Jan 2026).

Known limitations include dependency on regular sampling, the impact of folding and agglomeration choices, sensitivity to the choice of colormap, and the need for adaptation to irregular or event-driven time series. A plausible implication is that ongoing research will target learnable and domain-adaptive folding, as well as the development of efficient indexing structures and rendering toolboxes tailored for dense pixel time series representations.

In sum, two-dimensional pixel time series representations supply an expressive, computationally tractable, and visually interpretable substrate for temporal process analysis, leveraging advances in image modeling, interactive analytics, and spatial indexing across diverse scientific and engineering domains (Schlegel et al., 2024, Nematirad et al., 31 Mar 2025, Lian et al., 2022, Cruces et al., 2019, Hellermann et al., 2021, Stival et al., 7 Jan 2026, Khoshrou et al., 2018, Sanchez et al., 2019, Rodrigues et al., 2021).