Dynamic Dual-Prototype Bank (DDP)
- DDP is a module that leverages dual prototype banks to disentangle and represent both common trends and rare anomalies in time series data.
- It uses a dual-path context-aware routing mechanism with Pearson correlation for selecting and aggregating prototypes effectively.
- By integrating specialized loss functions for separation, rarity, and diversity, DDP enhances forecasting accuracy and model adaptivity.
The Dynamic Dual-Prototype Bank (DDP) is a module introduced to enable context-aware pattern disentanglement and adaptive representation for time series forecasting. It operates by maintaining two learnable banks of prototypes: one specialized in capturing common, recurrent patterns (such as trends or seasonalities), and another engineered to dynamically store rare, critical events. By leveraging a dual-path context-aware routing mechanism and a tailored disentanglement-guided loss, DDP equips backbone forecasting models with the capacity to distinguish, recall, and utilize both prevailing and infrequent temporal behaviors for improved predictive performance (Yang et al., 23 Jan 2026).
1. Architecture and Bank Construction
The DDP comprises two distinct learnable sets:
- Common Pattern Bank (B_c): Contains N_c prototypes intended to represent stable, frequently occurring modes of the data, including trends and periodic behaviors.
- Rare Pattern Bank (B_r): Contains N_r prototypes designated to encode irregular, low-frequency, or anomalous events.
Each prototype is a d-dimensional vector in a latent space. Initialization synthesizes seed sequences as kernel mixtures,
s_i(t) = α_lin·k_lin(t) + α_rbf·k_rbf(t) + α_per·k_per(t),
where k_lin, k_rbf, and k_per are linear, RBF (radial basis function), and periodic kernels, with the α terms as mixing weights.
These sequences are mapped to embeddings p_i = Embed(s_i), forming the banks B_c = {p_1^c, …, p_{N_c}^c} and B_r = {p_1^r, …, p_{N_r}^r}.
Both banks are updated end-to-end via gradient descent according to the total loss function; common prototypes specialize toward stable patterns, rare prototypes adapt to infrequent events.
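As a concrete illustration, the kernel-mixture initialization can be sketched as follows. The bank sizes, sequence length, Dirichlet-sampled mixing weights, and the random linear map standing in for the learned embedding are all assumptions for the sketch, not details from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 48, 16        # sequence length and prototype dimension (assumed sizes)
N_c, N_r = 8, 4      # common / rare bank sizes (hypothetical)

t = np.arange(L, dtype=float)

def kernel_sequence(alphas):
    """Mix linear, RBF, and periodic kernel shapes into one seed sequence."""
    lin = t / t.max()                                               # linear trend
    rbf = np.exp(-((t - t.mean()) ** 2) / (2 * (0.2 * L) ** 2))     # localized bump
    per = np.sin(2 * np.pi * t / 12.0)                              # periodic component
    return alphas[0] * lin + alphas[1] * rbf + alphas[2] * per

# Stand-in for the learned embedding map (a trainable module in the real model)
W = rng.normal(size=(L, d)) / np.sqrt(L)

def init_bank(n):
    """Draw n kernel mixtures and embed each into a d-dimensional prototype."""
    seqs = [kernel_sequence(rng.dirichlet(np.ones(3))) for _ in range(n)]
    return np.stack([s @ W for s in seqs])

B_c = init_bank(N_c)   # common pattern bank, shape (N_c, d)
B_r = init_bank(N_r)   # rare pattern bank, shape (N_r, d)
```

In the full model both banks would be registered as trainable parameters so that gradient descent can move prototypes away from their kernel-based starting points.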
2. Dual-Path Context-Aware Routing (DPC) Mechanism
DPC is the retrieval and routing component operating at inference and training time. It is responsible for context-selective integration of prototypes with backbone model representations. Given a time series input X and its latent encoding h = f(X):
2.1 Similarity Computation
Pearson correlation is computed between X and each prototype p in both banks:
ρ(X, p) = Σ_t (X_t − X̄)(p_t − p̄) / ( √(Σ_t (X_t − X̄)²) · √(Σ_t (p_t − p̄)²) ),
yielding similarity vectors ρ_c over B_c and ρ_r over B_r.
2.2 Prototype Selection
- Common Path: Selects the top-K most similar common prototypes, I_c = TopK(ρ_c, K).
- Rare Path: Picks the single most similar rare prototype if its similarity exceeds a threshold ε, i.e. I_r = argmax_j ρ_r[j] if max_j ρ_r[j] > ε, else ∅.
2.3 Weighting and Aggregation
Weighted sum for the common path (ω_c = softmax(ρ_c[I_c]/τ), with temperature τ) and one-hot for the rare path (ω_r = one_hot(I_r)):
Contribution vectors:
z_c = Σ_{k∈I_c} ω_c[k]·B_c[k],  z_r = Σ_{j∈I_r} ω_r[j]·B_r[j].
Fusion: Ŷ = W_o([h ∥ z_c ∥ z_r]), with W_o a trainable projection.
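The routing steps above can be sketched as follows. For Pearson correlation against X to be well-defined, prototypes here are taken to live in the input space; the K, ε, and τ values are illustrative, not values from the source:

```python
import numpy as np

def pearson(x, p):
    """Pearson correlation between an input window and a prototype."""
    xc, pc = x - x.mean(), p - p.mean()
    return float(xc @ pc / (np.linalg.norm(xc) * np.linalg.norm(pc) + 1e-8))

def route(x, B_c, B_r, K=2, eps=0.5, tau=0.1):
    """Dual-path routing: top-K softmax over the common bank, gated one-hot over the rare bank."""
    rho_c = np.array([pearson(x, p) for p in B_c])
    rho_r = np.array([pearson(x, p) for p in B_r])
    I_c = np.argsort(rho_c)[-K:]                 # indices of the top-K common prototypes
    w = np.exp(rho_c[I_c] / tau)
    w /= w.sum()                                 # temperature-softmax weights
    z_c = (w[:, None] * B_c[I_c]).sum(axis=0)    # weighted common contribution
    j = int(np.argmax(rho_r))
    z_r = B_r[j] if rho_r[j] > eps else np.zeros_like(z_c)  # rare path gated by threshold
    return z_c, z_r
```

When the gate does not fire, the rare path contributes a zero vector, so the fused projection degrades gracefully in the absence of rare events.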
3. Disentanglement-Guided Loss (DGLoss)
Supervision is enforced using a composite loss L = L_MSE + λ_sep·L_sep + λ_rare·L_rare + λ_div·L_div, whose terms are:
- Mean Squared Error (L_MSE): Forecast regression signal.
- Separation Loss (L_sep): Encourages dissimilarity between the top matches of the common and rare banks, facilitating role separation. It is computed from the similarities ρ_c and ρ_r, an empirical pattern-frequency weight ω_batch, and a margin m.
- Rarity Preservation Loss (L_rare): Promotes distinctiveness of rare prototypes via a log-softmax over the activated similarities ρ_r[I_r].
- Common Diversity Loss (L_div): Ensures diversity among the common prototypes in B_c.
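The source's exact functional forms are not reproduced here. As one plausible instantiation of the diversity term, the sketch below penalizes squared pairwise cosine similarity between common prototypes; this particular form is an assumption:

```python
import numpy as np

def diversity_loss(B_c):
    """Hypothetical L_div: mean squared off-diagonal cosine similarity in the common bank."""
    P = B_c / (np.linalg.norm(B_c, axis=1, keepdims=True) + 1e-8)  # unit-normalize prototypes
    S = P @ P.T                     # pairwise cosine similarities
    n = len(B_c)
    off = S - np.eye(n)             # discard self-similarity on the diagonal
    return float((off ** 2).sum() / (n * (n - 1)))
```

Identical prototypes drive this loss toward 1 and mutually orthogonal ones toward 0, pushing the common bank to cover distinct modes of the data.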
4. Algorithmic Flow
The following succinctly summarizes the training loop:
```text
Input: Historical series X, target Y, backbone f, banks B_c, B_r, hyperparams (K, τ, ε,…)
Output: Trained backbone + banks
for each training batch (X, Y):
    h ← f(X)
    ρ_c ← PearsonCorr(X, B_c)
    ρ_r ← PearsonCorr(X, B_r)
    I_c ← TopK(ρ_c, K)
    I_r ← argmax_j ρ_r[j] if max(ρ_r)>ε else ∅
    ω_c ← softmax(ρ_c[I_c]/τ)
    ω_r ← one_hot(I_r, length=N_r)
    z_c ← Σ_{k∈I_c} ω_c[k]·B_c[k]
    z_r ← Σ_{j∈I_r} ω_r[j]·B_r[j]
    Ŷ ← W_o([h ∥ z_c ∥ z_r])
    L_MSE ← mean((Ŷ−Y)²)
    L_sep ← compute_separation_loss(ρ_c, ρ_r, ω_batch, m)
    L_rare← compute_rarity_loss(ρ_r, I_r)
    L_div ← compute_diversity_loss(B_c)
    L_DGL ← λ_sep·L_sep + λ_rare·L_rare + λ_div·L_div
    L ← L_MSE + L_DGL
    Backpropagate L, update f, W_o, B_c, B_r
```
5. Operational Example
Consider a univariate input series exhibiting a smooth trend and an end-of-series spike:
- Common Path: The DPC aligns the trend portion of X with prototypes in B_c, selecting those that best resemble the backbone trend, weighted by shape similarity, to form z_c.
- Rare Path: The abrupt spike produces high correlation with a rare prototype in B_r; if that similarity exceeds the threshold ε, the prototype is activated and z_r encodes the anomaly.
- The concatenation [h ∥ z_c ∥ z_r] is projected to the final forecast, integrating both smooth and abrupt motifs for adaptive, reliable output.
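A minimal numerical sketch of this behavior, using hand-crafted stand-ins for the learned prototypes (the prototype shapes and any specific gate value are assumptions):

```python
import numpy as np

def pearson(x, p):
    """Pearson correlation between two equal-length sequences."""
    xc, pc = x - x.mean(), p - p.mean()
    return float(xc @ pc / (np.linalg.norm(xc) * np.linalg.norm(pc) + 1e-8))

L = 48
t = np.arange(L, dtype=float)
trend = 0.02 * t                        # smooth trend component
x = trend.copy()
x[-3:] += np.array([0.5, 2.0, 0.5])     # end-of-series spike

trend_proto = t / t.max()               # stand-in common prototype: pure trend
spike_proto = np.zeros(L)
spike_proto[-2] = 1.0                   # stand-in rare prototype: terminal spike

rho_common = pearson(x, trend_proto)    # high: the trend shape dominates the window
rho_rare = pearson(x, spike_proto)      # elevated only because the spike is present
```

With the spike removed (`pearson(trend, spike_proto)`), the rare-path correlation falls sharply, so a threshold gate would leave the rare prototype inactive on smooth inputs.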
6. Theoretical and Practical Significance
DDP is model-agnostic and auxiliary; it does not require modification of the backbone forecasting architecture. It enables “pattern disentanglement and context-aware adaptation,” equipping models with the ability to represent both dataset-specific stable behaviors and infrequent, critical events. The end-to-end learnability, explicit specialization of banks, and use of dedicated loss terms ensure robust coverage of diverse temporal regimes. Empirically, this structure has demonstrated consistent improvements in predictive accuracy and reliability across various real-world benchmarks, reflecting enhanced utilization of both frequent and rare structural cues in sequential data (Yang et al., 23 Jan 2026).