
Hierarchical Scattering Transform Module

Updated 4 February 2026
  • HSTM is a structured, multi-layered framework designed to extract stable and invariant representations from signals using learnable filter banks.
  • It constructs hierarchical scattering paths that integrate fixed wavelet convolutions with parameterized filters to capture multi-scale and higher-order features.
  • Empirical studies show HSTM enhances performance in forecasting and classification by ensuring translation invariance and robustness to deformations.

A Hierarchical Scattering Transform Module (HSTM) is a structured, multi-layered mathematical framework designed to extract stable, invariant, and highly informative representations from input signals. Originally developed within the context of scattering networks and later adapted for time series forecasting and unsupervised learning on images and graphs, HSTM generalizes classical wavelet scattering transforms by introducing learnable filter banks, flexible cascade depths, and, in some formulations, unsupervised combinatorial pairing. Characterized by its provable translation invariance, stability to deformations, and explicit multi-scale feature construction, HSTM forms a foundational component in advanced forecasting and classification pipelines (Li, 28 Jan 2026, Cheng et al., 2015).

1. Core Principles and Motivation

HSTM is motivated by the need to analyze signals—temporal, spatial, or on graphs—in a manner that robustly encodes both fine-grained (high-frequency) and slow (low-frequency) patterns while being resilient to local time-warp or permutation and invariant to translation or position. For forecasting, this enables predictive models to recognize recurring motifs independent of their absolute location and to maintain stability under small deformations, crucial for nonstationary real-world time series (Li, 28 Jan 2026).

Standard wavelet scattering transforms accomplish these objectives using cascades of fixed wavelet convolutions, modulus nonlinearities, and low-pass averaging, yielding representations with provable invariance and Lipschitz stability [Mallat’12]. HSTM extends these constructions in two principal directions:

  • Building hierarchical (multi-order) scattering paths to capture higher-order, scale-interactive features.
  • Replacing fixed (analytic) wavelet filters with learnable, parameterized filter banks to adapt representations to dataset-specific structures.

2. Mathematical Framework

For a one-dimensional time series $x(t)$ or a $d$-dimensional vector $x$, HSTM proceeds through cascaded operations parameterized by the maximum scale $J$ and the order depth $M$. The primary operations are:

  • Low-Pass Averaging (Order-0):

$$S^{(0)}[x](t) = (x * \varphi_J)(t)$$

where $\varphi_J$ is a low-pass filter with support $2^J$.

  • First-Order Wavelet Modulus and Averaging:

$$U^{(1)}[x](j_1, t) = \left| (x * \psi_{j_1,\theta})(t) \right|$$

$$S^{(1)}[x](j_1, t) = \left( U^{(1)}[x](j_1, \cdot) * \varphi_J \right)(t)$$

with $\psi_{j,\theta} = \psi_j * g_\theta$, where $g_\theta$ is a learnable convolutional residual.

  • Higher-Order Recursion:

$$U^{(m)}[x](j_1 \ldots j_m, t) = \left| U^{(m-1)}[x](j_1 \ldots j_{m-1}, \cdot) * \psi_{j_m,\theta} \right|(t)$$

$$S^{(m)}[x](j_1 \ldots j_m, t) = \left( U^{(m)}[x](j_1 \ldots j_m, \cdot) * \varphi_J \right)(t)$$

The collection $S[x] = \{ S^{(0)}[x],\, S^{(1)}[x](j_1),\, \ldots,\, S^{(M)}[x](j_1 \ldots j_M) \}$ comprises the full hierarchical scattering representation (Li, 28 Jan 2026).
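The cascade above can be sketched in NumPy. The Gaussian low-pass and Morlet-like band-pass filters below are illustrative choices, not the paper's exact filter bank, and paths are restricted to increasing scale indices as is standard for scattering transforms:

```python
import numpy as np

def lowpass(J):
    # Gaussian low-pass filter phi_J with support ~2^J (illustrative choice)
    width = 2 ** J
    t = np.arange(-width, width + 1)
    phi = np.exp(-0.5 * (t / (width / 2)) ** 2)
    return phi / phi.sum()

def morlet(j):
    # Crude Morlet-like band-pass wavelet at dyadic scale 2^j (illustrative)
    width = 2 ** j
    t = np.arange(-4 * width, 4 * width + 1)
    psi = np.exp(-0.5 * (t / width) ** 2) * np.cos(2 * np.pi * t / width)
    return psi - psi.mean()  # enforce zero mean (band-pass)

def scattering(x, J=3, M=2):
    """Hierarchical scattering up to order M; returns {path: S coefficients}."""
    phi = lowpass(J)
    S = {(): np.convolve(x, phi, mode="same")}   # order 0: low-pass averaging
    U = {(): x}
    for m in range(1, M + 1):
        U_next = {}
        for path, u in U.items():
            last = path[-1] if path else 0
            for j in range(last + 1, J + 1):     # frequency-decreasing paths
                u_j = np.abs(np.convolve(u, morlet(j), mode="same"))  # modulus
                U_next[path + (j,)] = u_j
                S[path + (j,)] = np.convolve(u_j, phi, mode="same")   # averaging
        U = U_next
    return S
```

For $J = 3$ and $M = 2$ this yields one order-0 path, three order-1 paths, and three order-2 paths, all at the input's temporal length before any subsampling.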

In discrete settings, as in Haar-based HSTM architectures, hierarchical layers are created by recursive pairwise addition and absolute difference, with unsupervised data-driven pairings optimized to maximize feature variance or minimize within-pair variation (Cheng et al., 2015).
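In the discrete Haar formulation, a single layer reduces to pairwise sums and absolute differences. A minimal sketch, with the pair assignment assumed precomputed (e.g. by an unsupervised matching):

```python
import numpy as np

def haar_layer(x, pairs):
    """One Haar scattering layer: for each disjoint index pair (i, k),
    emit the sum x[i] + x[k] and the absolute difference |x[i] - x[k]|."""
    sums = np.array([x[i] + x[k] for i, k in pairs])
    diffs = np.array([abs(x[i] - x[k]) for i, k in pairs])
    return np.concatenate([sums, diffs])
```

Iterating this layer on its own output (with fresh pairings per layer) produces the hierarchical Haar scattering coefficients.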

3. Architectural Implementation

The practical instantiation of HSTM depends on the application domain:

  • Wavelet-Based HSTM (for Time Series):
    • Uses Morlet-like band-pass filters at dyadic scales $2^j$ for $j = 1, \ldots, J$.
    • Learnable residual filters $g_\theta$ enrich each wavelet, with filter length $K \ll 2^j$.
    • Depth is typically set to $M = 2$, so both first- and second-order scattering paths are realized.
    • Channel-wise batch normalization is applied before modulus; no additional pooling beyond the final averaging and optional subsampling.
    • The outputs are feature maps $H_0, H_{j_1}, H_{j_1,j_2}$ of reduced and variable temporal resolution, concatenated into a multi-scale tensor for downstream analysis (Li, 28 Jan 2026).
  • Haar-Based HSTM (for Images and Graphs):
    • Input $x$ is partitioned into disjoint pairs at each layer, producing local sums and absolute differences.
    • Pair assignment is optimized unsupervised (via min-weight perfect matching) to maximize spread or minimize sparsity of output coefficients.
    • The mapping is provably contractive, and for graph data, pairings are restricted to topologically-adjacent clusters, yielding permutation-invariant cluster descriptors.
    • The depth $J$ and scale of neighbor pairings are domain-adapted (Cheng et al., 2015).
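The enrichment $\psi_{j,\theta} = \psi_j * g_\theta$ can be sketched as follows; the near-delta initialization is an assumption for illustration (it makes the enriched filter start close to the fixed wavelet), and in practice $g_\theta$ would be trained end-to-end:

```python
import numpy as np

def enriched_wavelet(psi_j, K=5, rng=np.random.default_rng(0)):
    """Convolve a fixed wavelet psi_j with a short residual filter g_theta,
    K << len(psi_j), giving the learnable wavelet psi_{j,theta}."""
    g_theta = np.zeros(K)
    g_theta[K // 2] = 1.0                       # delta: identity mapping
    g_theta += 0.01 * rng.standard_normal(K)    # stands in for learned values
    return np.convolve(psi_j, g_theta, mode="same")
```

Because `mode="same"` keeps the filter length fixed, the enriched wavelet drops into the scattering cascade without changing output shapes.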

4. Invariance and Stability Properties

HSTM constructions guarantee:

  • Translation Invariance: Large-support averaging imparts invariance to translations up to $O(2^{-J})$ error.
  • Stability to Time-Warping: Lipschitz continuity of the scattering cascade preserves stability under small deformations $x_\tau(t) = x(t - \tau(t))$:

$$\| S[x] - S[x_\tau] \|_2 \leq C \cdot \sup_t |\tau'(t)| \cdot \|x\|_2$$

  • Permutation Invariance (Graphs): For input signals on graphs, structured pairing provides permutation-invariant descriptors of each connected cluster at every layer (Cheng et al., 2015).
  • Contractivity: Each layer contracts distances by no more than a constant factor (typically $\sqrt{2}$ in the Haar case), ensuring robustness to perturbations.
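The translation-invariance property can be checked numerically. The sketch below uses illustrative first-order filters (a Gaussian low-pass of support $2^J$ and a Gabor-like band-pass), not the paper's exact filter bank, and verifies that a shift much smaller than the averaging window barely changes the output:

```python
import numpy as np

def first_order_scatter(x, J=5):
    # Gaussian low-pass phi_J (support ~2^J) and a Gabor-like band-pass psi
    n = np.arange(-2 ** J, 2 ** J + 1)
    phi = np.exp(-0.5 * (n / 2 ** (J - 1)) ** 2)
    phi /= phi.sum()
    psi = np.exp(-0.5 * (n / 8.0) ** 2) * np.cos(np.pi * n / 4)
    psi -= psi.mean()
    u = np.abs(np.convolve(x, psi, mode="same"))   # wavelet modulus
    return np.convolve(u, phi, mode="same")        # low-pass averaging

x = np.cos(np.pi * np.arange(512) / 4)             # tone in the wavelet's band
s0 = first_order_scatter(x)
s1 = first_order_scatter(np.roll(x, 3))            # shift by 3 << 2^J = 32
rel_change = np.linalg.norm(s0 - s1) / np.linalg.norm(s0)
# rel_change is small: the representation is nearly shift-invariant
```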

5. Integration with Forecasting and Attention Frameworks

In ScatterFusion, HSTM serves as the core representation learning block. The multi-scale features $H_j$ produced by HSTM are subsequently processed by the Scale-Adaptive Feature Enhancement (SAFE) module, which applies per-scale soft attention weighting. Scale-specific context vectors $h_j$ are formed, and their weighted sum is forecast-horizon-adapted before entering the Multi-Resolution Temporal Attention (MRTA) mechanism. MRTA further processes these features by learning dependencies at multiple time horizons and merging information across temporal resolutions (Li, 28 Jan 2026).

This modular pipeline enables end-to-end learning where the HSTM is fully differentiable thanks to its learnable filters. The decomposition into hierarchical scattering paths, followed by adaptive scale selection and multi-resolution attention, is crucial for handling both short-term and global dynamics in forecasting tasks.
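A minimal sketch of per-scale soft attention weighting in the spirit of SAFE; the single scoring vector `w` and equal-length context vectors $h_j$ are simplifying assumptions here, and the paper's exact module may differ:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def scale_attention(contexts, w):
    """Soft attention over scales: score each per-scale context vector h_j
    with w, normalize the scores, and return the attention-weighted sum."""
    H = np.stack(contexts)   # (num_scales, d)
    alpha = softmax(H @ w)   # one attention weight per scale
    return alpha @ H         # weighted combination across scales
```

The weighted combination is what would then be horizon-adapted and passed on to a downstream attention mechanism such as MRTA.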

6. Empirical Effectiveness and Ablation Results

Ablation studies in ScatterFusion demonstrate that replacing the full hierarchical, learnable HSTM with a standard (fixed, no second-order) wavelet scattering module leads to marked degradation in mean squared error on canonical long-term forecasting benchmarks. For example, the increase in MSE is reported at +5.5% (ECL@96), +7.6% (Weather@96), and +7.1% (ETTh1@336) when HSTM is ablated (Li, 28 Jan 2026). This establishes the hierarchical and learnable aspects as primary contributors to improved performance for time series forecasting.

7. Extensions: Unsupervised Learning and Graph Domains

HSTM generalizes to domains beyond temporal sequences:

  • On images, HSTM implements a hierarchy of local nonlinear contractions (sum and absolute difference), driven by approximate unsupervised optimal pairings to enhance discriminative features and suppress noise (Cheng et al., 2015).
  • On graphs, the structured dyadic pairing of node clusters results in descriptors invariant to permutation of vertices within connected components, with theoretical guarantees on stability and invariance extending from the Euclidean case to arbitrary graph topologies.

Structured pseudocode is provided for implementing HSTM on both regular grids and general graphs, with explicit mention of computational complexity (perfect matching via Edmonds' algorithm in $O(d^3)$, $d$ being the input dimensionality). A greedy $O(d^2)$ pairing offers a practical approximation.
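The greedy $O(d^2)$ approximation can be sketched as follows; the nearest-partner heuristic and tie-breaking order are illustrative assumptions, whereas Edmonds' blossom algorithm would compute the exact min-weight perfect matching in $O(d^3)$:

```python
import numpy as np

def greedy_pairing(cost):
    """Approximate min-weight perfect matching: repeatedly take the next
    unmatched index and pair it with its cheapest remaining partner.
    Each step scans O(d) candidates, giving O(d^2) overall."""
    d = cost.shape[0]
    assert d % 2 == 0, "perfect matching needs an even number of items"
    unmatched = list(range(d))
    pairs = []
    while unmatched:
        i = unmatched.pop(0)
        k = min(unmatched, key=lambda j: cost[i, j])  # cheapest partner for i
        unmatched.remove(k)
        pairs.append((i, k))
    return pairs
```

On well-separated clusters the greedy result coincides with the exact matching; in adversarial cost matrices it can be suboptimal, which is the price of the cheaper runtime.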


References:

  • ScatterFusion: A Hierarchical Scattering Transform Framework for Enhanced Time Series Forecasting (Li, 28 Jan 2026)
  • Deep Haar Scattering Networks (Cheng et al., 2015)