Horizon Activation Mapping for Forecasting
- Horizon Activation Mapping (HAM) is a dual-method approach combining gradient-based interpretability and activation tuning to optimize neural forecasting.
- HAM uses gradient norm curves and activation curvature metrics to reveal model bias toward short- or long-range dependencies in forecast subseries.
- HAM facilitates architecture selection and hyperparameter tuning by mapping activation curvature and state entropy to extend forecast horizon persistence.
Horizon Activation Mapping (HAM) is both a quantitative interpretability framework for neural forecasting models and a principled algorithmic tool for optimizing nonlinear architectures in time series prediction. The term encompasses two distinct but thematically unified methodological streams: (1) a gradient-based interpretability approach for delineating which forecast horizon subseries most influence parameter updates in general neural forecasting architectures (Krupakar et al., 5 Jan 2026), and (2) a systematic recipe for tuning node activation functions in reservoir computers to maximize the time interval over which accurate predictions are possible, via curvature and entropy metrics (Hurley et al., 2023). Both approaches resolve the key challenge of understanding and shaping how neural models distribute computational “effort” over prediction time, facilitating both architecture selection and hyperparameter optimization.
1. Defining Horizon Activation Mapping in Forecasting
In model-agnostic interpretability, HAM formalizes the attribution of parameter-update magnitudes to temporal forecast subseries, replacing the spatial attention maps of Grad-CAM with temporal "horizon subseries" (Krupakar et al., 5 Jan 2026). In each mode (causal or anti-causal), HAM computes, for every possible subseries cutoff, the aggregate gradient norm of the masked loss with respect to the parameters. The result is a curve, $G_c(h)$ or $G_a(h)$, which expresses how much each early (causal) or late (anti-causal) segment of the horizon influences effective training. Comparing these curves to a line of proportionality (ideal uniform contribution across the horizon) exposes model bias toward short- or long-range dependencies.
In reservoir computing, HAM refers to the workflow of choosing and tuning activation functions (e.g., Swish(β), Shifted tanh(b), or a library of 16 canonical choices) to optimize the forecast horizon (FH)—the dynamical timescale for which reservoir predictions retain fidelity on chaotic benchmarks such as the Lorenz attractor. The method quantifies how activation curvature and state entropy (ASE) jointly influence FH, enabling principled exploration of activation parameter space to maximize predictive stability (Hurley et al., 2023).
2. Mathematical Formulation and Methodology
A. HAM in General Neural Forecasting
Let $f_\theta$ be a forecasting model with horizon $H$ and per-timestep loss $\ell_t(\theta)$, so that
$$\mathcal{L}(\theta) = \sum_{t=1}^{H} \ell_t(\theta).$$
Define binary masks $m^c_h(t) = \mathbf{1}[t \le h]$ and $m^a_h(t) = \mathbf{1}[t > H - h]$ for the causal and anti-causal modes. The masked subseries loss is
$$\mathcal{L}^c_h(\theta) = \sum_{t=1}^{H} m^c_h(t)\,\ell_t(\theta),$$
and the key metric is the mean gradient norm over $N$ examples,
$$G_c(h) = \frac{1}{N}\sum_{i=1}^{N} \left\lVert \nabla_\theta \mathcal{L}^c_h(\theta; x_i) \right\rVert_2,$$
with $G_a(h)$ defined analogously. The "line of proportionality" is a scaled baseline,
$$P(h) = \frac{h}{H}\, G_{\max},$$
where $G_{\max}$ is the maximal observed gradient norm. Deviations between $G_c(h)$ (or $G_a(h)$) and $P(h)$ reflect architecture- or training-regime bias.
Auxiliary analyses include the "gradient-equivariant point" $h^{*}$ where $G_c(h^{*}) = G_a(h^{*})$ (demarcating equal investment in early vs. late subseries) and the signed, normalized difference plot
$$D(h) = \frac{G_c(h) - G_a(h)}{G_c(h) + G_a(h)},$$
which emphasizes short- vs. long-horizon dominance.
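These quantities can be computed directly once per-example gradients are available. The following is a minimal NumPy sketch, using a toy linear forecaster so the masked-loss gradients have a closed form; all names, shapes, and sizes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a linear forecaster y_hat = W x mapping an input window of
# length L to a horizon of length H, with squared error per horizon step t.
L, H, N = 8, 12, 64
W = rng.normal(size=(H, L)) * 0.1
X = rng.normal(size=(N, L))
Y = rng.normal(size=(N, H))

def grad_norm_masked(mask):
    """Mean L2 norm over examples of the gradient of the masked loss."""
    norms = []
    for x, y in zip(X, Y):
        err = (W @ x - y) * mask          # residuals, zeroed outside the subseries
        g = 2.0 * np.outer(err, x)        # d/dW of sum_t mask_t * (W x - y)_t^2
        norms.append(np.linalg.norm(g))
    return float(np.mean(norms))

h_vals = np.arange(1, H + 1)
G_c = np.array([grad_norm_masked((np.arange(H) < h).astype(float)) for h in h_vals])
G_a = np.array([grad_norm_masked((np.arange(H) >= H - h).astype(float)) for h in h_vals])

# Line of proportionality and diagnostics.
G_max = max(G_c.max(), G_a.max())
P = h_vals / H * G_max                    # ideal uniform-contribution baseline
D = (G_c - G_a) / (G_c + G_a)             # signed, normalized difference plot

# Approximate gradient-equivariant point (excluding the trivial full-mask
# cutoff h = H, where the two curves coincide by construction).
h_star = h_vals[np.argmin(np.abs(G_c[:-1] - G_a[:-1]))]
print(G_c.round(2), G_a.round(2), h_star)
```

For a real network, `grad_norm_masked` would instead run one masked backward pass per cutoff and mode through an autograd framework.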
B. HAM in Reservoir Computer Optimization
The reservoir forecast horizon is defined as
$$\mathrm{FH} = \frac{1}{T_\lambda}\, \min\left\{\, t : \lVert \hat{y}(t) - y(t) \rVert > \epsilon \,\right\},$$
with $T_\lambda$ the Lyapunov time and $\epsilon$ typically set to $5$ (roughly 15% of the Lorenz attractor's coordinate range). For each activation function $f$, the weighted curvature is
$$\bar{\kappa}_f = \int \kappa_f(x)\, p(x)\, dx,$$
with
$$\kappa_f(x) = \frac{|f''(x)|}{\left(1 + f'(x)^2\right)^{3/2}}$$
and $p(x)$ the empirical node-input distribution. Average State Entropy (ASE) is computed by averaging the instantaneous entropy of the reservoir state distribution (via Gaussian kernel density estimates) over all time steps preceding the FH error crossing.
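The three quantities admit straightforward numerical estimates. Below is a hedged sketch: the function names, the finite-difference curvature, the plain Gaussian KDE, and the synthetic stand-in data are all illustrative assumptions rather than the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)

def swish(x, beta):
    """Swish activation x * sigmoid(beta * x)."""
    return x / (1.0 + np.exp(-beta * x))

def weighted_curvature(f, samples, eps=1e-3):
    """kappa_f(x) = |f''| / (1 + f'^2)^(3/2), averaged over empirical node inputs."""
    f0 = f(samples)
    fp = (f(samples + eps) - f(samples - eps)) / (2 * eps)       # central first derivative
    fpp = (f(samples + eps) - 2 * f0 + f(samples - eps)) / eps**2  # second derivative
    kappa = np.abs(fpp) / (1.0 + fp**2) ** 1.5
    return float(kappa.mean())          # Monte Carlo estimate of the integral against p(x)

def state_entropy(states, bandwidth=0.3):
    """Differential entropy of one reservoir-state snapshot via a Gaussian KDE."""
    d2 = (states[:, None] - states[None, :]) ** 2
    dens = np.exp(-d2 / (2 * bandwidth**2)).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    return float(-np.mean(np.log(dens + 1e-12)))

def forecast_horizon(pred, truth, eps, lyapunov_time):
    """First time the prediction error exceeds eps, in Lyapunov-time units."""
    err = np.linalg.norm(pred - truth, axis=-1)
    crossings = np.flatnonzero(err > eps)
    t_cross = crossings[0] if crossings.size else len(err)
    return t_cross / lyapunov_time

# Synthetic stand-ins for the empirical node inputs, state snapshots, and trajectories.
node_inputs = rng.normal(size=5000)
kappa_bar = weighted_curvature(lambda x: swish(x, 1.0), node_inputs)
ase = np.mean([state_entropy(rng.normal(size=100)) for _ in range(10)])
fh = forecast_horizon(rng.normal(size=(200, 3)), np.zeros((200, 3)), eps=5.0, lyapunov_time=1.1)
print(kappa_bar, ase, fh)
```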
3. Empirical Outcomes and Benchmark Comparisons
A. Gradient-based HAM Across Neural Families
HAM was applied to a diverse suite of modern multivariate forecasters (MLP-based CycleNet, N-Linear, and N-HITS; self-attention-based FEDformer and Pyraformer; SSM-based SpaceTime; diffusion-based Multi-Resolution DDPM) on the ETTm2 dataset across a range of forecast horizons (Krupakar et al., 5 Jan 2026).
Observed phenomena include:
- Dropout in N-HITS significantly amplifies overall and late-horizon gradient norms.
- Larger batch sizes increase gradient-norm magnitudes, realign the gradient curves toward the line of proportionality, and shift attention bias toward early subseries.
- Early-stopped models show overall dampened gradients and loss of strong horizon bias, with the equivariant point close to the midpoint $H/2$.
- Normalization in N-Linear modulates early- vs late-horizon gradient allocation.
- In SpaceTime, longer forecast horizons cause $G_c(h)$ to transition from linear to exponential growth, reflecting state-space model dynamics.
B. Activation-Driven HAM in Reservoir Computers
A systematic survey of 16 activation functions (7 non-monotonic, 9 monotonic) found notable FH differences:
| Function | Type | FH ($N=300$) |
|---|---|---|
| Logish | Non-monotonic | |
| Swish($\beta$) | Non-monotonic | |
| Shifted tanh($b$) | Monotonic | |
| Hard-tanh | Monotonic | |
| Hard-Sigmoid | Monotonic | |
FH increases monotonically with activation curvature for Swish(β) and generally across the library. Maximum FH is found at intermediate ASE; excessively high or low state entropy leads to shorter horizons (Hurley et al., 2023).
4. Interpretation and Diagnostic Utilities
HAM visualizations provide several crucial diagnostics:
- Curves above the proportional baseline $P(h)$ indicate subseries of heightened parameter-update sensitivity; curves below it indicate underweighting.
- The signed area between $G_c(h)$ and $G_a(h)$ quantifies net attention bias toward short or long subseries.
- The difference plot $D(h)$ enables scale-free comparisons, e.g., for cross-family benchmarking in model selection.
- The equivariant point $h^{*}$ immediately signals the horizon at which model focus transitions from early to late subseries.
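As a concrete illustration of these diagnostics, the sketch below evaluates them on synthetic gradient curves; the curve shapes and names are invented for illustration only.

```python
import numpy as np

H = 24
h = np.arange(1, H + 1)

# Synthetic gradient-norm curves: a short-horizon-biased model whose causal
# curve saturates early while the anti-causal curve only ramps up late.
G_c = 1.0 - np.exp(-h / 4.0)
G_a = (h / H) ** 2

P = h / H * max(G_c.max(), G_a.max())    # line of proportionality
D = (G_c - G_a) / (G_c + G_a)            # scale-free difference plot, in [-1, 1]

# Discrete signed area between the curves: > 0 means net early-horizon bias.
signed_area = float((G_c - G_a).sum())
above_baseline = h[G_c > P]              # subseries with heightened update sensitivity

print(round(signed_area, 3), above_baseline)
```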
In reservoir networks, rapid quantitative mapping of FH over (activation parameter, reservoir size) space exposes narrow “sweet spots” where chosen activation and entropy/curvature metrics lead to substantial predictive persistence.
5. Practical Application and Optimization Workflow
A. Interpretability and Selection
HAM facilitates model-agnostic selection on the validation set by identifying which architectures exhibit the desired gradient attenuation or persistence across the forecast horizon. For long-term planning tasks, models whose gradient curves remain persistent beyond the equivariant point $h^{*}$ are preferred (Krupakar et al., 5 Jan 2026).
Batch size and early-stopping can be tuned using HAM to avoid regimes where all gradient curves merely scale or collapse, which would indicate diminished learning signal differentiation across the horizon.
B. Reservoir Computer Optimization
The HAM guideline for activation tuning is:
- Set target FH and select system (e.g., Lorenz).
- Choose activation family or fixed function library.
- For each candidate function and parameter setting, train the reservoir and assess FH, $\bar{\kappa}$, and ASE.
- Map FH over the (activation parameter, reservoir size) space and locate maximal ridges.
- Verify that $\bar{\kappa}$ falls in the favorable range for the benchmark system (e.g., Lorenz) and that ASE lies in the intermediate regime.
- Fix optimal activation, re-tune other hyperparameters if desired.
This approach reduces the high-dimensional search over activation functions to a low-dimensional manifold corresponding to empirically verified “sweet spots” of forecast persistence (Hurley et al., 2023).
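The scan at the heart of this workflow can be sketched with a small echo state network. Everything below is an illustrative assumption, not the paper's setup: the reservoir size, spectral radius, ridge penalty, error threshold $\epsilon = 5$, and the approximate Lorenz Lyapunov time ($\approx 1.1$ time units) are placeholder choices, and the forced Lorenz integration uses a simple Euler step.

```python
import numpy as np

def lorenz_series(n, dt=0.01):
    """Euler-integrated Lorenz trajectory (sigma=10, rho=28, beta=8/3)."""
    s = np.array([1.0, 1.0, 1.0])
    out = np.empty((n, 3))
    for i in range(n):
        x, y, z = s
        s = s + dt * np.array([10 * (y - x), x * (28 - z) - y, x * y - 8 / 3 * z])
        out[i] = s
    return out

def swish(x, beta):
    return x / (1.0 + np.exp(-np.clip(beta * x, -50, 50)))

def run_esn(beta, data, n_res=200, washout=100):
    """Train a one-step ESN with Swish(beta) nodes, then free-run and return FH."""
    rng = np.random.default_rng(7)       # fixed seed: identical matrices across beta
    W_in = rng.uniform(-0.5, 0.5, size=(n_res, 3))
    W = rng.normal(size=(n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9
    train, test = data[:1500], data[1500:]
    r, states = np.zeros(n_res), []
    for u in train[:-1]:                 # teacher-forced state collection
        r = swish(W @ r + W_in @ u, beta)
        states.append(r)
    S = np.array(states[washout:])
    target = train[washout + 1:]
    # Ridge-regression readout.
    W_out = np.linalg.lstsq(S.T @ S + 1e-6 * np.eye(n_res), S.T @ target, rcond=None)[0].T
    # Free-run: feed the reservoir its own predictions until the error crossing.
    u, horizon = train[-1], 0
    for t in range(len(test)):
        r = swish(W @ r + W_in @ u, beta)
        u = W_out @ r
        if not np.linalg.norm(u - test[t]) <= 5.0:    # also stops on NaN/overflow
            break
        horizon = t + 1
    return horizon * 0.01 / 1.1          # steps -> time -> Lyapunov-time units

data = lorenz_series(2000)
fh_by_beta = {b: run_esn(b, data) for b in (0.5, 1.0, 2.0)}
print(fh_by_beta)
```

In a full study this loop would repeat over random reservoir seeds and a finer $\beta$ grid, recording $\bar{\kappa}$ and ASE alongside FH for each cell of the map.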
6. Limitations and Extensions
HAM is restricted to aggregate gradient-norm information and does not resolve the contributions of specific parameters or internal layers, although layer-wise extensions are plausible. Computational cost scales with the number of masked backpropagation passes (one per subseries cutoff and mode) per epoch, implying a need for efficient approximations or subsampling for very large datasets or horizons. In reservoir computing, the reported results are benchmark-specific (e.g., the Lorenz system), but the methodology is transferable given calibration (Hurley et al., 2023). For general neural models, future directions include layer-wise HAM, integration with probabilistic forecast diagnostics, and combination with internal attention-based interpretations.
A plausible implication is that by combining HAM’s diagnostic metrics with traditional validation losses, practitioners can achieve more robust architecture design and tuning for tasks with variable forecast horizon requirements, and cross-family benchmarking is enhanced by the scale-free normalization afforded by difference plots.
7. Significance in Time Series Forecasting Research
Horizon Activation Mapping synthesizes architectural interpretability and principled hyperparameter tuning into a unified analytic toolkit. It underpins both model-agnostic comparative evaluation and activation function selection—core processes in long-horizon and multivariate time-series forecasting. Both variants of HAM have demonstrated capability to extract actionable insight from models as diverse as MLPs, SSMs, attention models, and diffusion forecasters, and to expose nontrivial dependencies of predictive longevity on both low-level nonlinearities and high-level architectural design (Krupakar et al., 5 Jan 2026, Hurley et al., 2023). The formalization of horizon-oriented mapping thus represents a significant advance in the analysis and optimization of temporal learning systems.