
Selective Temporal Hamming Distance (STH)

Updated 8 December 2025
  • Selective Temporal Hamming Distance (STH) is a metric for discrete event systems that compares state-transition event timeseries while emphasizing specific states and preserving temporal details.
  • It computes similarity by weighting intervals where at least one series is in a state of interest, thereby generalizing classical Hamming and Jaccard metrics to continuous time.
  • Empirical evaluations demonstrate that STH achieves significant speedups and clearer clustering in real-world scenarios such as weather event analysis and sleep-stage annotation.

Selective Temporal Hamming Distance (STH) is a metric designed to compare time series generated by discrete event systems, with a focus on state transitions and event timing. Unlike standard methods that either reduce temporal information via costly resampling or treat event sequences without accounting for state durations, STH operates directly on state-transition event timeseries (STE-ts) and enables selective emphasis on subsets of states while avoiding distortion and inefficiency.

1. Foundations and Notation

STH is formulated for discrete event systems (DES) where the state space $S = \{s_1, \ldots, s_{|S|}\}$ defines all possible system states, and $T \subseteq S \times S$ enumerates allowed state-changing transitions. A state-transition event timeseries (STE-ts) is represented as $S = ((t_0, o_0), (t_1, o_1), \ldots, (t_n, o_n))$, where each $o_k \in S$ is maintained over the interval $[t_k, t_{k+1})$ and transitions are instantaneous, with the condition $o_k \ne o_{k+1}$. When comparing two STE-ts $S_i, S_j$ spanning $[t_0, t_{end}]$, the merged set of change-points produces a partition of consecutive, disjoint intervals $J_{ij} = \{I_1, I_2, \ldots, I_M\}$ with $I_k = [\tau_k, \tau_{k+1})$, duration $d_k = \tau_{k+1} - \tau_k$, and states $s_i^k, s_j^k$ active over $I_k$ for $S_i, S_j$ respectively.
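The merged partition can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names `state_at` and `merged_intervals`, the list-of-tuples representation, and the example states `"idle"` and `"busy"` are assumptions, and both series are assumed to start at the same $t_0$.

```python
from bisect import bisect_right

def state_at(series, t):
    """Return the state of an STE-ts (sorted list of (time, state)) at time t."""
    times = [t_k for t_k, _ in series]
    # The active state is the one set by the last event at or before t.
    return series[bisect_right(times, t) - 1][1]

def merged_intervals(series_i, series_j, t_end):
    """Return (start, end, state_i, state_j) for each interval of the merged partition."""
    # Union of the change-points of both series, closed by the end of the span.
    cuts = sorted({t for t, _ in series_i} | {t for t, _ in series_j} | {t_end})
    cuts = [t for t in cuts if t <= t_end]
    # States are constant inside each interval, so evaluating at its start suffices.
    return [(a, b, state_at(series_i, a), state_at(series_j, a))
            for a, b in zip(cuts, cuts[1:])]

# Hypothetical two-state example over the span [0, 10].
series_i = [(0, "idle"), (4, "busy")]
series_j = [(0, "idle"), (2, "busy"), (6, "idle")]
for interval in merged_intervals(series_i, series_j, t_end=10):
    print(interval)
```

For these hypothetical inputs the partition has four intervals, $[0,2)$, $[2,4)$, $[4,6)$ and $[6,10)$, each carrying one state per series.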

States are further partitioned as follows: $A$ ("states of interest"), $O$ ("other" states), and $E$ ("excluded"/ambiguous states). A state-similarity function $\mathrm{sim}$ (defaulting to identity) is used for interval-wise comparison.

2. Formal Definition and Properties

2.1 STH Similarity and Distance

STH restricts attention to intervals where at least one series is in the set of states of interest $A$ and neither is in the excluded set $E$. For each interval $I_k$ with duration $d_k$ and active states $s_i^k, s_j^k$:

  • $w_k = d_k$ if ($s_i^k \in A$ or $s_j^k \in A$) and $s_i^k \notin E$ and $s_j^k \notin E$, else $w_k = 0$.
  • $m_k = \mathrm{sim}(s_i^k, s_j^k)$.

Similarity is calculated as:

$$\mathrm{STH}(S_i, S_j) = \frac{\sum_{k=1}^{M} w_k \, m_k}{\sum_{k=1}^{M} w_k}$$

If $\sum_{k} w_k = 0$, STH is undefined or set to a fixed value by application-dependent convention.

The associated distance is:

$$\mathrm{STHD}(S_i, S_j) = 1 - \mathrm{STH}(S_i, S_j)$$
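A minimal sketch of the similarity and distance, assuming the interval-wise definition described above: an interval counts only if at least one series is in a state of interest and neither is in an excluded state, and the default similarity is 1 for equal states and 0 otherwise. The function and parameter names (`sth_similarity`, `interest`, `excluded`, `empty_value`) are hypothetical, and the input is a pre-merged list of `(duration, state_i, state_j)` tuples.

```python
def sth_similarity(intervals, interest, excluded=frozenset(), sim=None, empty_value=None):
    """Duration-weighted similarity over qualifying intervals.

    intervals: iterable of (duration, state_i, state_j) from the merged partition.
    empty_value: returned when no interval qualifies (convention is application-dependent).
    """
    if sim is None:
        # Identity similarity: 1 for equal states, 0 otherwise.
        sim = lambda a, b: 1.0 if a == b else 0.0
    numerator = denominator = 0.0
    for duration, s_i, s_j in intervals:
        if s_i in excluded or s_j in excluded:
            continue  # an excluded state in either series removes the interval
        if s_i not in interest and s_j not in interest:
            continue  # neither series is in a state of interest
        denominator += duration
        numerator += duration * sim(s_i, s_j)
    return empty_value if denominator == 0 else numerator / denominator

def sth_distance(intervals, interest, excluded=frozenset(), sim=None, empty_value=None):
    """Distance as one minus the similarity; None-like empty_value is passed through."""
    s = sth_similarity(intervals, interest, excluded, sim, empty_value)
    return s if s is None else 1.0 - s

# Hypothetical merged partition: (duration, state_i, state_j).
intervals = [(2, "idle", "idle"), (2, "idle", "busy"), (2, "busy", "busy"), (4, "busy", "idle")]
print(sth_similarity(intervals, interest={"idle", "busy"}))  # all states of interest
print(sth_similarity(intervals, interest={"busy"}))          # only "busy" of interest
```

Passing the no-qualifying-interval value as a parameter keeps the application-dependent convention explicit instead of hard-coding one choice.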

2.2 Relationship to Hamming and Jaccard Metrics

Setting $A = S$ and $E = \emptyset$, STHD reduces to the normalized temporal Hamming distance (nTHD). In binary state systems with $S = \{0, 1\}$, $A = \{1\}$, $E = \emptyset$ and $\mathrm{sim}$ as the identity, STH coincides with temporal Jaccard similarity/distance, generalizing the static Jaccard index to continuous time (Marié et al., 1 Dec 2025).
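The binary case can be spelled out as a short derivation. The notation is an assumption made for illustration: $d_k$ is the duration of merged interval $I_k$, $s_i^k, s_j^k$ are the states of the two series on $I_k$, $[\cdot]$ is the indicator bracket, $\mathrm{on}_i$ is the set of times at which $S_i$ is in state 1, and $|\cdot|$ is total duration. With state 1 as the only state of interest and nothing excluded, an interval qualifies when at least one series is in state 1, and it matches only when both are:

```latex
\begin{aligned}
\mathrm{STH}(S_i, S_j)
  &= \frac{\sum_k d_k \,\bigl[\, s_i^k = 1 \wedge s_j^k = 1 \,\bigr]}
          {\sum_k d_k \,\bigl[\, s_i^k = 1 \vee s_j^k = 1 \,\bigr]}
   = \frac{|\mathrm{on}_i \cap \mathrm{on}_j|}{|\mathrm{on}_i \cup \mathrm{on}_j|},
\\[4pt]
\mathrm{STHD}(S_i, S_j)
  &= 1 - \frac{|\mathrm{on}_i \cap \mathrm{on}_j|}{|\mathrm{on}_i \cup \mathrm{on}_j|}.
\end{aligned}
```

This is the Jaccard index with set cardinality replaced by duration.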

3. Algorithmic Structure and Computational Complexity

The algorithm for STH iterates over all intervals induced by the union of change-points in $S_i$ and $S_j$, computing the numerator and denominator accumulators according to the definitions above. Key steps:

  • Merge change-points, sort, and produce $J_{ij}$.
  • For each $I_k \in J_{ij}$, determine $s_i^k$, $s_j^k$ for the current interval, then update the numerator and denominator subject to the $A$/$E$ filters.

This yields an overall time complexity of $O(n_i + n_j)$, where $n_i, n_j$ are the event counts in $S_i, S_j$. In comparison, uniform resampling methods operate at $O(L \cdot f)$ ($L$ = span, $f$ = frequency), incurring a rate-dependent precision/speed trade-off and often distortion (Marié et al., 1 Dec 2025).
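The single pass over merged change-points can be sketched with two cursors, one per series. This is a hedged sketch, not the reference implementation: `sth_linear` and its parameters are hypothetical names, both inputs are assumed to be time-sorted lists of `(time, state)` starting at the same instant, and the similarity is fixed to identity for brevity.

```python
def sth_linear(series_i, series_j, t_end, interest, excluded=frozenset()):
    """Similarity of two time-sorted STE-ts in one pass over their events.

    Returns None when no interval qualifies (the convention is left to the caller).
    """
    i = j = 0
    t = series_i[0][0]  # assumption: both series start at the same time
    numerator = denominator = 0.0
    while t < t_end:
        # Next change-point of each series, or the end of the span if none remains.
        next_i = series_i[i + 1][0] if i + 1 < len(series_i) else t_end
        next_j = series_j[j + 1][0] if j + 1 < len(series_j) else t_end
        t_next = min(next_i, next_j, t_end)
        s_i, s_j = series_i[i][1], series_j[j][1]
        qualifies = (s_i not in excluded and s_j not in excluded
                     and (s_i in interest or s_j in interest))
        if qualifies:
            denominator += t_next - t
            if s_i == s_j:
                numerator += t_next - t
        # Advance whichever cursor(s) change state at t_next.
        if i + 1 < len(series_i) and next_i == t_next:
            i += 1
        if j + 1 < len(series_j) and next_j == t_next:
            j += 1
        t = t_next
    return None if denominator == 0 else numerator / denominator

series_i = [(0, "idle"), (4, "busy")]
series_j = [(0, "idle"), (2, "busy"), (6, "idle")]
print(sth_linear(series_i, series_j, 10, interest={"idle", "busy"}))
print(sth_linear(series_i, series_j, 10, interest={"busy"}))
```

Each loop iteration advances at least one cursor or reaches the end of the span, so the work is proportional to the total number of events rather than to the span length or a sampling rate.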

4. Theoretical Properties

STH shows desirable metric qualities under mild conditions:

  • Non-negativity: $\mathrm{STHD}(S_i, S_j) \ge 0$ since $\mathrm{STH}(S_i, S_j) \le 1$.
  • Symmetry: $\mathrm{STHD}(S_i, S_j) = \mathrm{STHD}(S_j, S_i)$.
  • Identity of indiscernibles: $\mathrm{STHD}(S_i, S_j) = 0 \iff$ all qualifying intervals have equal states.
  • Triangle inequality: Satisfied under the same conditions (proof in [36]).

STH generalizes classical Hamming and Jaccard: with $A = S$ (all states of interest), STHD is the continuous-time analogue of normalized Hamming distance in the limit of infinitesimal resampling; for binary systems and restricted $A$, it recovers temporal Jaccard distance.

5. Practical Application and Empirical Validation

Empirical results demonstrate substantial advantages of STH for pattern mining and clustering in large-scale, non-uniform time series data from diverse domains (Marié et al., 1 Dec 2025).

5.1 Computational Performance

STH achieves substantial speedups over 5-minute resampled normalized Hamming, and the speedup grows further as the resampling period decreases (30-day random binary series). This is directly attributed to the linear complexity and avoidance of temporal discretization.

5.2 Avoidance of Temporal Distortion

On periodic series with phase shift, STHD returns the exact normalized distance, whereas resampled metrics may exhibit severe bias, including failure to detect distortion at certain rates.
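The effect can be reproduced on a toy setup. The following sketch is an assumed illustration, not the experiment from the source: two binary square waves of period 4 (on for the first half of each period), one shifted by a quarter period, compared over the span $[0, 8]$ with all states of interest. `square_wave`, `exact_distance`, and `resampled_distance` are hypothetical helper names.

```python
def square_wave(t, shift, period=4.0):
    """Binary state at time t: 1 during the first half of each shifted period."""
    return 1 if ((t - shift) % period) < period / 2 else 0

def exact_distance(shift_i, shift_j, t_end, period=4.0):
    """Mismatch duration over span length, computed on the merged change-points."""
    half = period / 2
    cuts = {0.0, t_end}
    for shift in (shift_i, shift_j):
        t = shift % half  # first change-point at or after 0
        while t < t_end:
            cuts.add(t)
            t += half
    cuts = sorted(cuts)
    mismatch = sum(b - a for a, b in zip(cuts, cuts[1:])
                   if square_wave(a, shift_i) != square_wave(a, shift_j))
    return mismatch / t_end

def resampled_distance(shift_i, shift_j, t_end, step, offset=0.0):
    """Normalized Hamming distance on samples taken every `step` time units."""
    samples = []
    t = offset
    while t < t_end:
        samples.append(t)
        t += step
    mismatches = sum(square_wave(t, shift_i) != square_wave(t, shift_j) for t in samples)
    return mismatches / len(samples)

print(exact_distance(0.0, 1.0, 8.0))                        # 0.5
print(resampled_distance(0.0, 1.0, 8.0, step=0.5))          # 0.5
print(resampled_distance(0.0, 1.0, 8.0, step=2.0))          # 1.0
print(resampled_distance(0.0, 1.0, 8.0, step=2.0, offset=1.0))  # 0.0
```

In this toy case a fine sampling step recovers the exact value, while a step of 2 reports either complete mismatch or no mismatch at all depending only on where sampling starts.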

5.3 Clustering and State Selection

Applications include weather event clustering (US dataset, 8 states) and sleep-stage annotation (W,1,2,3,R,?,E):

  • STH with $A = S$ (all states of interest) yields large, undifferentiated clusters.
  • Excluding "Normal" clarifies geographical variation in abnormal events.
  • Restricting $A$ to winter-related event states isolates winter-heavy regions.
  • For sleep-stage, STH enables cleaner patient clustering by ignoring ambiguous intervals (ambiguous annotations placed in the excluded set), outperforming temporal Jaccard in handling missing data.

6. Example Computation

Consider two STE-ts $S_i$ and $S_j$ over a common span, each given as a sequence of (time, state) events with its resulting intervals. Merged change-points produce the common interval partition.

Case A (all states of interest, $A = S$):

  • Matches on the intervals where both series hold the same state (both in $A$); mismatch elsewhere.
  • STH = matched duration divided by total duration, STHD = 1 − STH.

Case B (restricted $A$):

  • Only intervals with a state of $A$ in at least one series are considered.
  • STH = matched duration within those intervals divided by their total duration, STHD = 1 − STH.
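As a complementary illustration with hypothetical values (not the figures of the example above), take two states $a, b$ and the span $[0, 10]$, with $S_i = ((0, a), (4, b))$ and $S_j = ((0, a), (2, b), (6, a))$. The merged partition is $[0,2)$ with states $(a, a)$, $[2,4)$ with $(a, b)$, $[4,6)$ with $(b, b)$, and $[6,10)$ with $(b, a)$. Assuming identity similarity and no excluded states:

```latex
\begin{aligned}
\text{All states of interest } (A = \{a, b\}):\quad
  &\mathrm{STH} = \frac{2 + 2}{2 + 2 + 2 + 4} = 0.4,
  &&\mathrm{STHD} = 1 - 0.4 = 0.6,
\\[4pt]
\text{Only } b \text{ of interest } (A = \{b\}):\quad
  &\mathrm{STH} = \frac{2}{2 + 2 + 4} = 0.25,
  &&\mathrm{STHD} = 1 - 0.25 = 0.75.
\end{aligned}
```

In the second case the interval $[0,2)$ drops out of both sums because neither series is in state $b$ there, so agreement on the uninteresting state $a$ no longer raises the similarity.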

7. Utility and Broader Significance

Selective Temporal Hamming Distance supports mathematically rigorous comparison of state-transition event timeseries without resampling, integrating full state durations and selective focus on relevant states or exclusion criteria. Its generalization of the classical Hamming and Jaccard metrics to the continuous-time domain, together with its proven metric properties, enables direct use in clustering, kernel methods, and scalable nearest-neighbor search. Empirical validation confirms improved precision and efficiency, facilitating large-scale analysis across domains such as weather events and clinical annotations (Marié et al., 1 Dec 2025).
