
Information-Weighted SAX

Updated 17 February 2026
  • Information-Weighted SAX is an advanced extension of SAX that integrates segment information measures and data-driven discretization for non-Gaussian, heterogeneous time series.
  • It utilizes methods like Particle Swarm Optimization and kernel density estimation to optimize segment weights and establish adaptive breakpoints.
  • These enhancements lead to improved lower-bound tightness, reduced reconstruction error, and superior performance in classification, clustering, and anomaly detection.

Information-Weighted SAX (Symbolic Aggregate approXimation) refers to a family of extensions and modifications of the original SAX algorithm. These variants optimize the process of converting real-valued time series into symbolic sequences by directly addressing the information content of each segment and/or adapting the discretization boundaries to reflect empirical, potentially non-Gaussian data distributions. Two principal research directions have established this concept: (1) explicit segment weighting based on information content, often optimized via metaheuristics such as Particle Swarm Optimization (PSO); and (2) nonparametric, data-driven determination of discretization intervals using kernel density estimation or related quantization techniques, sometimes called information-driven or distribution-agnostic SAX. Both directions preserve the lower-bounding property essential for time series indexing and search.

1. Motivation and Limitations of Classical SAX

The original SAX algorithm encodes a normalized time series $X$ of length $N$ into a word of length $M$ using the following steps: (i) z-normalization to zero mean and unit variance; (ii) segmentation via Piecewise Aggregate Approximation (PAA) into $M$ non-overlapping blocks, each summarized by its sample mean; (iii) discretization of PAA values using breakpoints computed under the Gaussian $\mathcal{N}(0,1)$ assumption to yield equiprobable symbols from an alphabet of size $\kappa$ (Bountrogiannis et al., 2021, Kloska et al., 2022, Fuad, 2013). This yields symbolic words suitable for fast comparison with the MINDIST metric.
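The three steps above can be sketched in a few lines of Python (an illustrative sketch, not a reference implementation; `scipy.stats.norm.ppf` supplies the equiprobable $\mathcal{N}(0,1)$ breakpoints, and the series length is assumed divisible by $M$):

```python
import numpy as np
from scipy.stats import norm

def sax(x, M, kappa):
    """Classical SAX: z-normalize, PAA into M segments, Gaussian breakpoints."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()                # (i) z-normalization
    paa = x.reshape(M, -1).mean(axis=1)         # (ii) PAA; assumes len(x) % M == 0
    # (iii) equiprobable N(0,1) breakpoints: the kappa-quantiles of the standard normal
    breakpoints = norm.ppf(np.arange(1, kappa) / kappa)
    return np.searchsorted(breakpoints, paa)    # symbol indices 0..kappa-1

word = sax(np.sin(np.linspace(0, 4 * np.pi, 128)), M=8, kappa=4)
```

Symbols are returned as integer indices rather than letters; mapping them to an alphabet is a cosmetic final step.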

However, two key limitations have been identified:

  • The Gaussian/equiprobable symbol assumption is often violated in real-world or high-correlation data, leading to suboptimal discretization and reduced lower-bound tightness.
  • The uniform importance assumed for each segment during distance computation ignores local information content; many time series are heterogeneous in informativeness across time.

Information-Weighted SAX variants address these problems by either optimizing segment weights according to information content or by deriving discretization intervals that are "information-aware" in a distribution-agnostic manner.

2. Segment Information Content and Weighted SAX Distances

Explicit information-weighting of SAX segments begins by defining a per-segment measure of informativeness (Fuad, 2013):

  • Variance-based content: For segment $i$ of $T = [t_1, \dots, t_n]$ (normalized and divided into $N$ segments), compute

$$\mathrm{Var}_i = \frac{1}{n/N} \sum_{j=1}^{n/N} \left(t_{(i-1)n/N+j} - \mu_i\right)^2, \quad \text{where} \quad \mu_i = \frac{1}{n/N} \sum_{j=1}^{n/N} t_{(i-1)n/N+j}$$

  • Entropy-based content: Discretize segment values into $B$ bins, estimate probabilities $p_{i,k}$, then

$$H_i = -\sum_{k=1}^{B} p_{i,k} \log_2 p_{i,k}$$

A normalized weight vector $\tilde{w}_i = I_i \big/ \sum_{j=1}^{N} I_j$ is obtained, where $I_i$ is either $\mathrm{Var}_i$ or $H_i$.
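Both information measures and the normalization can be sketched as follows (a sketch only; the function name, the `mode` switch, and the default of $B = 8$ entropy bins are illustrative choices, not from the cited papers):

```python
import numpy as np

def segment_weights(t, N, mode="var", bins=8):
    """Normalized per-segment information weights (variance- or entropy-based)."""
    segs = np.asarray(t, dtype=float).reshape(N, -1)   # assumes len(t) % N == 0
    if mode == "var":
        info = segs.var(axis=1)                        # Var_i per segment
    else:                                              # entropy H_i over `bins` bins
        info = np.empty(N)
        for i, s in enumerate(segs):
            counts, _ = np.histogram(s, bins=bins)
            p = counts[counts > 0] / counts.sum()      # empirical probabilities p_{i,k}
            info[i] = -(p * np.log2(p)).sum()
    return info / info.sum()                           # normalized weights w~_i

# Flat first half, oscillating second half: weight should concentrate on the latter
t = np.r_[np.zeros(32), np.sin(np.linspace(0, 8 * np.pi, 32))]
w = segment_weights(t, N=4)
```

Constant segments receive zero weight under the variance measure, so they contribute nothing to the weighted distance below.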

The classic MINDIST symbolic distance is replaced by the Weighted Minimum Distance (WMD):

$$\mathrm{WMD}(S,R) = \sqrt{\frac{n}{N} \sum_{i=1}^{N} w_i \, [\mathrm{dist}(s_i, r_i)]^2}$$

where $w_i \in [0,1]$ (Fuad, 2013). This preserves the lower-bounding property: $\mathrm{WMD} \le \mathrm{MINDIST} \le$ Euclidean distance.
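A minimal sketch of WMD, assuming symbols are 0-based integer indices and using the standard MINDIST cell rule (distance 0 for identical or adjacent symbols, otherwise the gap between the enclosing breakpoints):

```python
import numpy as np
from scipy.stats import norm

def wmd(s, r, w, n, kappa):
    """Weighted Minimum Distance between SAX words s, r (0-based symbol indices)."""
    beta = norm.ppf(np.arange(1, kappa) / kappa)   # N(0,1) breakpoints
    N = len(s)
    # MINDIST symbol distance: 0 for adjacent/equal symbols, else breakpoint gap
    d = np.array([0.0 if abs(a - b) <= 1 else beta[max(a, b) - 1] - beta[min(a, b)]
                  for a, b in zip(s, r)])
    return np.sqrt((n / N) * np.sum(np.asarray(w) * d ** 2))

dist = wmd([0, 3, 1], [1, 0, 1], w=[0.5, 0.3, 0.2], n=30, kappa=4)
```

Because every $w_i \le 1$, the weighted distance can never exceed the unweighted MINDIST obtained with $w_i \equiv 1$, which is how the chain $\mathrm{WMD} \le \mathrm{MINDIST}$ is preserved.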

3. Optimizing Segment Weights via Particle Swarm Optimization

The setting of segment weights can be directly optimized for improved downstream task performance (e.g., classification error) using Particle Swarm Optimization (PSO) (Fuad, 2013). The PSO process is as follows:

  • Each PSO particle represents a candidate weight vector $\mathbf{x}_p = [w_{p,1}, \dots, w_{p,N}] \in [0,1]^N$.
  • Position and velocity vectors are iteratively updated using cognitive and social learning rules:

$$\mathbf{v}_p^{k+1} = \omega^k \mathbf{v}_p^k + c_1 r_1 (\mathbf{l}_p^k - \mathbf{x}_p^k) + c_2 r_2 (\mathbf{g}^k - \mathbf{x}_p^k)$$

$$\mathbf{x}_p^{k+1} = \mathbf{x}_p^k + \mathbf{v}_p^{k+1}$$

  • The fitness function is the leave-one-out 1-NN classification error using WMD.
  • Typical parameters: swarm size $16$, $20$ iterations, $c_1 = c_2 = 2.0$, inertia weight $\omega^k = 1 - k/\mathrm{nItr}$.
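The update rules and parameter choices above can be sketched as a compact PSO loop. This is a sketch under stated assumptions: a toy quadratic fitness stands in for the leave-one-out 1-NN/WMD error (which needs a labeled corpus), and positions are clipped to $[0,1]^N$ to keep weights valid:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_minimize(fitness, dim, n_particles=16, n_iter=20, c1=2.0, c2=2.0):
    """Minimal PSO over [0,1]^dim implementing the velocity/position updates above."""
    x = rng.random((n_particles, dim))         # positions x_p
    v = np.zeros((n_particles, dim))           # velocities v_p
    f = np.array([fitness(p) for p in x])
    l, lf = x.copy(), f.copy()                 # personal bests l_p and their fitness
    g = x[f.argmin()].copy()                   # global best g
    for k in range(n_iter):
        omega = 1.0 - k / n_iter               # inertia omega^k = 1 - k/nItr
        r1, r2 = rng.random((2, n_particles, dim))
        v = omega * v + c1 * r1 * (l - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, 0.0, 1.0)           # keep weights in [0, 1]
        f = np.array([fitness(p) for p in x])
        better = f < lf
        l[better], lf[better] = x[better], f[better]
        g = l[lf.argmin()].copy()
    return g

# Toy fitness standing in for the leave-one-out 1-NN classification error with WMD
target = np.array([0.1, 0.7, 0.2])
best = pso_minimize(lambda w: np.sum((w - target) ** 2), dim=3)
```

In the real setting, `fitness` would re-run 1-NN classification with WMD for each candidate weight vector, which is exactly where the method's extra computational cost comes from.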

Empirical evaluations on UCR datasets show that PSO-optimized weights systematically reduce the classification error compared to unweighted MINDIST, particularly for series with pronounced local features (Fuad, 2013).

4. Distribution-Agnostic and Information-Driven Discretization

A related family of information-weighted SAX approaches eliminates the need for parametric (Gaussian) breakpoint computation by fitting a kernel density estimate (KDE) to the empirical distribution of PAA coefficients (Bountrogiannis et al., 2021, Kloska et al., 2022). The process is:

  • Compute the PAA vector $Y = [y_1, \dots, y_M]$ and re-normalize to unit variance to counteract PAA-induced shrinkage.
  • Fit KDE:

$$\hat{f}_{h,K}(y) = \frac{1}{Mh} \sum_{i=1}^{M} K\!\left(\frac{y - y_i}{h}\right)$$

with kernel $K$ (commonly Epanechnikov or Gaussian) and bandwidth $h$ (Silverman's rule or the Improved Sheather–Jones rule).

  • Use Lloyd–Max quantization to compute optimal breakpoints $\{b_j\}$ and centroids $\{c_j\}$ that minimize

$$J = \mathbb{E}\left[(Y - Q(Y))^2\right] = \sum_{j=1}^{\kappa} \int_{b_{j-1}}^{b_j} (y - c_j)^2 \, \hat{f}(y) \, dy$$

  • Assign each $y_i$ to the symbol $j$ for which $b_{j-1} \leq y_i < b_j$.
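The KDE-plus-Lloyd–Max pipeline can be sketched as below. Assumptions to note: `scipy.stats.gaussian_kde` uses a Gaussian kernel with Scott's bandwidth rule rather than Epanechnikov/ISJ, the density is discretized on a fixed grid, and centroids are initialized from empirical quantiles; none of these choices are prescribed by the cited papers:

```python
import numpy as np
from scipy.stats import gaussian_kde

def lloyd_max_breakpoints(paa, kappa, n_iter=50, grid_size=1024):
    """Data-driven breakpoints/centroids via KDE + Lloyd-Max iterations (a sketch)."""
    kde = gaussian_kde(paa)                            # Gaussian kernel, Scott bandwidth
    y = np.linspace(paa.min() - 1, paa.max() + 1, grid_size)
    f = kde(y)
    f /= f.sum()                                       # discrete density on the grid
    c = np.quantile(paa, (np.arange(kappa) + 0.5) / kappa)  # initial centroids
    for _ in range(n_iter):
        b = (c[:-1] + c[1:]) / 2                       # nearest-centroid breakpoints
        idx = np.searchsorted(b, y)                    # assign grid points to cells
        # Centroid update: conditional mean of the density over each cell
        c = np.array([np.average(y[idx == j], weights=f[idx == j])
                      for j in range(kappa)])
    return b, c

# Bimodal data, where the Gaussian-breakpoint assumption fails badly
rng = np.random.default_rng(1)
paa = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])
b, c = lloyd_max_breakpoints(paa, kappa=4)
```

On bimodal input like this, the learned breakpoints track the two modes instead of crowding around zero as the fixed $\mathcal{N}(0,1)$ breakpoints would.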

This KDE-based discretization ensures that symbol cells reflect the data's actual distribution (equal-probability cells for edwSAX, MSE-optimal cells for pSAX), which reduces the information-loss penalty (KL divergence) incurred by falsely assuming Gaussianity (Bountrogiannis et al., 2021, Kloska et al., 2022).

5. Algorithms and Implementation Details

A summary table illustrates the major algorithmic steps for the key Information-Weighted SAX methods:

| Method | Key Step(s) | Optimization/Adaptation |
|---|---|---|
| Weighted SAX (WMD) | Compute info weights ($\mathrm{Var}_i$, $H_i$); WMD distance | Segment weights learned by PSO ($\mathbf{x}_p$) |
| pSAX | KDE on variance-renormalized PAA; Lloyd–Max breakpoints/centroids | Data-driven quantization; batch |
| edwSAX | KDE on PAA (ISJ/Epanechnikov); CDF-inverse breakpoints | Numerically solved equal-mass splits |

pSAX and edwSAX output both the symbolic word over $M$ positions and the discretization codebook (breakpoints, centroids), supporting both symbolic distances and RMSE-based reconstruction/analysis (Bountrogiannis et al., 2021, Kloska et al., 2022).

The computational complexity of these methods is dominated by the KDE and quantization phases: for $M$ PAA points, $O(M^2)$ naively, with $O(M \cdot H)$ feasible in practice for 1D KDE evaluated at $H$ points; each Lloyd–Max iteration costs $O(\kappa \cdot B)$, with $B$ the histogram bin count (Bountrogiannis et al., 2021).

6. Evaluation: Lower-Bounding, Reconstruction, and Empirical Findings

Experimental results reported in multiple studies demonstrate:

  • Tightness of Lower Bound (TLB): pSAX and edwSAX achieve TLB values up to 0.98 for alphabet size $\alpha = 100$, substantially exceeding classical SAX, with most gains realized by $\alpha \approx 30$ (Kloska et al., 2022, Bountrogiannis et al., 2021).
  • Reconstruction Error: RMSE for information-weighted methods is lower than for classical SAX. For example, edwSAX achieves RMSE $\approx 0.20$ at $\alpha = 10$ compared to $\approx 0.24$ for SAX (Kloska et al., 2022). pSAX likewise minimizes MSE by construction of its quantizer (Bountrogiannis et al., 2021).
  • Classification Performance: PSO-weighted WMD (information-weighted distance) consistently improves 1-NN accuracy on UCR datasets across varying $\alpha$, with learned weights concentrating on salient or high-variance regions (Fuad, 2013).
  • Anomaly/Discord Discovery: Information-weighted SAX variants aid in faster discovery and more robust anomaly flagging compared to uniform SAX, as shown in HOT-SAX settings (Bountrogiannis et al., 2021).
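The TLB metric used in these comparisons is simply the average ratio of the symbolic lower bound to the true Euclidean distance over all pairs. A self-contained sketch for classical SAX (0-based symbols, standard MINDIST cell rule; the dataset here is synthetic noise, purely to exercise the metric):

```python
import numpy as np
from scipy.stats import norm

def tlb(X, M, kappa):
    """Average tightness of lower bound: mean over pairs of MINDIST / Euclidean."""
    beta = norm.ppf(np.arange(1, kappa) / kappa)
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    paa = Z.reshape(len(X), M, -1).mean(axis=2)
    words = np.searchsorted(beta, paa)
    n = X.shape[1]
    ratios = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            ed = np.linalg.norm(Z[i] - Z[j])
            # MINDIST cell distances: 0 for adjacent/equal symbols, else breakpoint gap
            d = np.where(np.abs(words[i] - words[j]) <= 1, 0.0,
                         beta[np.maximum(words[i], words[j]) - 1]
                         - beta[np.minimum(words[i], words[j])])
            md = np.sqrt((n / M) * np.sum(d ** 2))
            ratios.append(md / ed)
    return float(np.mean(ratios))

rng = np.random.default_rng(2)
score = tlb(rng.standard_normal((20, 64)), M=8, kappa=8)
```

By the lower-bounding property the score always lies in $[0, 1]$; the reported gains of pSAX/edwSAX correspond to pushing this ratio closer to 1 with the same alphabet size.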

7. Practical Considerations, Strengths, and Limitations

Information-weighted SAX approaches are modular: they can be integrated into existing SAX-based pipelines either by replacing the Gaussian breakpoints with KDE-derived (or Lloyd–Max-optimal) codebooks, or by replacing MINDIST with a segment-weighted WMD, often learned via PSO (Bountrogiannis et al., 2021, Kloska et al., 2022, Fuad, 2013).

Key strengths include:

  • Data-adaptive discretization for robust symbol partitioning under non-Gaussian, multimodal, or skewed distributions;
  • Task-specific optimization of segment contributions, improving classification, clustering, and anomaly detection performance;
  • Preservation of the lower-bounding guarantee necessary for subsequence matching and similarity search.

Limitations and considerations:

  • Learning and maintaining segment weights (especially via PSO) requires labeled data and additional computational overhead, with hyperparameter sensitivity and possible overfitting.
  • Data-driven discretization (KDE, Lloyd–Max) entails higher computational cost than parametric breakpoints, especially for streaming or online updating, although efficient 1D KDE and change-detection-based retraining schemes can mitigate this (Kloska et al., 2022, Bountrogiannis et al., 2021).
  • If the data are well approximated by $\mathcal{N}(0,1)$, the extra complexity of information-weighted SAX offers minimal benefit.

In summary, Information-Weighted SAX encompasses methodologies that enhance the informativeness and discriminative power of symbolic time series representations by directly modeling empirical distributions and/or optimizing segment-level importance, thereby achieving superior lower-bound tightness, reduced information loss, and more effective symbolic analysis across a wide range of time series mining tasks (Bountrogiannis et al., 2021, Kloska et al., 2022, Fuad, 2013).
