Dynamic Dual-Prototype Bank (DDP)

Updated 27 January 2026
  • DDP is a module that leverages dual prototype banks to disentangle and represent both common trends and rare anomalies in time series data.
  • It uses a dual-path context-aware routing mechanism with Pearson correlation for selecting and aggregating prototypes effectively.
  • By integrating specialized loss functions for separation, rarity, and diversity, DDP enhances forecasting accuracy and model adaptivity.

The Dynamic Dual-Prototype Bank (DDP) is a module introduced to enable context-aware pattern disentanglement and adaptive representation for time series forecasting. It maintains two learnable banks of prototypes: one specialized in capturing common, recurrent patterns (such as trends or seasonalities), and another engineered to dynamically store rare, critical events. By leveraging a dual-path context-aware routing mechanism and a tailored disentanglement-guided loss, DDP equips backbone forecasting models with the capacity to distinguish, recall, and utilize both prevailing and infrequent temporal behaviors for improved predictive performance (Yang et al., 23 Jan 2026).

1. Architecture and Bank Construction

The DDP comprises two distinct learnable sets:

  • Common Pattern Bank ($\mathcal{B}_c$): Contains $M$ prototypes intended to represent stable, high-frequency modes of the data, including trends and periodic behaviors.
  • Rare Pattern Bank ($\mathcal{B}_r$): Contains $N$ prototypes designated to encode irregular, low-frequency, or anomalous events.

Each prototype is a $D$-dimensional vector in a latent space. Initialization is performed as follows:

$$\begin{aligned} \mathbf{s}_c^i &\sim \mathcal{GP}\left(0,\,\lambda_{\ell} K_{\ell} + \lambda_{r} K_{r} + \lambda_{p} K_{p}\right) \quad (i = 1, \ldots, M), \\ \mathbf{s}_r^j &\sim \mathcal{N}(0, \sigma^2 I) \quad (j = 1, \ldots, N) \end{aligned}$$

where $K_{\ell}$, $K_{r}$, $K_{p}$ are linear, RBF (radial basis function), and periodic kernels, and $\lambda_{\ell}$, $\lambda_{r}$, $\lambda_{p}$ are the corresponding mixing weights.

These sequences are mapped to embeddings:

$$\mathbf{p}_c^i = \mathrm{Proj}_c(\mathbf{s}_c^i) \in \mathbb{R}^D, \quad \mathbf{p}_r^j = \mathrm{Proj}_r(\mathbf{s}_r^j) \in \mathbb{R}^D$$

forming the banks:

$$\mathcal{B}_c = \{\mathbf{p}_c^1, \ldots, \mathbf{p}_c^M\}, \quad \mathcal{B}_r = \{\mathbf{p}_r^1, \ldots, \mathbf{p}_r^N\}$$

Both banks are updated end-to-end via gradient descent according to the total loss function; common prototypes specialize toward stable patterns, rare prototypes adapt to infrequent events.
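The kernel-mixture initialization above can be sketched in NumPy. All hyperparameters here (sequence length, period, length scale, mixing weights) are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gp_common_init(M, L, lams=(1.0, 1.0, 1.0), period=24.0, length=8.0):
    """Sample M length-L common-prototype sequences from a zero-mean GP
    whose covariance mixes linear, RBF, and periodic kernels."""
    t = np.arange(L, dtype=float)[:, None]     # time grid, shape (L, 1)
    d = t - t.T                                # pairwise time differences
    K_lin = t @ t.T                            # linear kernel
    K_rbf = np.exp(-d**2 / (2 * length**2))    # RBF kernel
    K_per = np.exp(-2 * np.sin(np.pi * d / period)**2 / length**2)  # periodic
    K = lams[0] * K_lin + lams[1] * K_rbf + lams[2] * K_per
    K += 1e-6 * np.eye(L)                      # jitter for numerical stability
    return np.random.multivariate_normal(np.zeros(L), K, size=M)

def gaussian_rare_init(N, L, sigma=1.0):
    """Rare prototypes start as isotropic Gaussian noise."""
    return sigma * np.random.randn(N, L)

S_c = gp_common_init(M=8, L=32)     # smooth, structured common sequences
S_r = gaussian_rare_init(N=4, L=32) # unstructured rare sequences
```

The GP draw gives the common bank a smooth, trend/periodicity-biased starting point, while the rare bank starts unbiased; both are then shaped end-to-end by the loss.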

2. Dual-Path Context-Aware Routing (DPC) Mechanism

DPC is the retrieval and routing component operating at both training and inference time. It is responsible for context-selective integration of prototypes with backbone model representations. Given a time series input $\mathbf{X} \in \mathbb{R}^{t \times C}$ and its latent encoding $\mathbf{h} \in \mathbb{R}^D$:

2.1 Similarity Computation

Pearson correlation is computed between $\mathbf{X}$ and each prototype in both banks:

$$\begin{aligned} \boldsymbol{\rho}_c &= s(\mathbf{X}, \mathcal{B}_c) \in \mathbb{R}^M, \\ \boldsymbol{\rho}_r &= s(\mathbf{X}, \mathcal{B}_r) \in \mathbb{R}^N, \\ s(\mathbf{X}, \mathbf{p}) &= \frac{(\mathbf{X} - \overline{\mathbf{X}}) \cdot (\mathbf{p} - \overline{\mathbf{p}})}{\|\mathbf{X} - \overline{\mathbf{X}}\|\; \|\mathbf{p} - \overline{\mathbf{p}}\|} \end{aligned}$$
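This centered, normalized dot product can be written as a small vectorized helper; the function name and epsilon guard are illustrative:

```python
import numpy as np

def pearson_sim(x, bank):
    """Pearson correlation between a series x of shape (L,) and each
    prototype row of bank, shape (P, L); returns P correlations in [-1, 1]."""
    xc = x - x.mean()                               # center the query
    bc = bank - bank.mean(axis=1, keepdims=True)    # center each prototype
    num = bc @ xc                                   # row-wise dot products
    den = np.linalg.norm(xc) * np.linalg.norm(bc, axis=1)
    return num / (den + 1e-12)                      # epsilon guards flat rows
```

Because Pearson correlation is invariant to shift and scale, routing responds to the *shape* of the window rather than its absolute level.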

2.2 Prototype Selection

  • Common Path: Selects the top-$K$ most similar common prototypes,

$$\mathcal{I}_c = \mathrm{TopK}(\boldsymbol{\rho}_c; K)$$

  • Rare Path: Picks the single most similar rare prototype if its similarity exceeds a threshold $\varepsilon$,

$$\mathcal{I}_r = \begin{cases} \arg\max_j \rho_{r,j}, & \max_j \rho_{r,j} > \varepsilon \\ \varnothing, & \text{otherwise} \end{cases}$$

2.3 Weighting and Aggregation

Aggregation uses a softmax-weighted sum for the common path ($\omega_c$) and a one-hot weight for the rare path ($\omega_r$):

$$\omega_c = \mathrm{Softmax}\left(\tfrac{\boldsymbol{\rho}_c[\mathcal{I}_c]}{\tau}\right), \quad \omega_r \in \{0, 1\}^N \text{ with } 1 \text{ at } \mathcal{I}_r$$

The contribution vectors are

$$\begin{aligned} \mathbf{z}_c &= \sum_{k \in \mathcal{I}_c} \omega_c^k \, \mathbf{p}_c^k, \\ \mathbf{z}_r &= \sum_{j \in \mathcal{I}_r} \omega_r^j \, \mathbf{p}_r^j \end{aligned}$$

Fusion: $\widehat{\mathbf{Y}} = \mathbf{W}_o[\mathbf{h} \,\|\, \mathbf{z}_c \,\|\, \mathbf{z}_r]$, where $\mathbf{W}_o$ is a trainable projection.
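The full routing path (selection, weighting, aggregation, fusion) can be sketched as follows. For simplicity this sketch uses the same prototype vectors for both similarity and aggregation, and flattens the input to one dimension; all names and default hyperparameters ($K$, $\tau$, $\varepsilon$) are illustrative:

```python
import numpy as np

def route_and_fuse(x, h, B_c, B_r, W_o, K=2, tau=0.5, eps=0.6):
    """DPC sketch: top-K softmax routing over common prototypes,
    thresholded argmax over rare prototypes, then concatenate and project."""
    def pearson(bank):
        xc = x - x.mean()
        bc = bank - bank.mean(axis=1, keepdims=True)
        return (bc @ xc) / (np.linalg.norm(xc)
                            * np.linalg.norm(bc, axis=1) + 1e-12)

    rho_c, rho_r = pearson(B_c), pearson(B_r)

    # Common path: top-K indices, temperature-scaled softmax weights.
    idx = np.argsort(rho_c)[-K:]
    w = np.exp(rho_c[idx] / tau)
    w /= w.sum()
    z_c = w @ B_c[idx]                  # weighted common contribution

    # Rare path: single prototype, gated by the similarity threshold.
    j = int(np.argmax(rho_r))
    z_r = B_r[j] if rho_r[j] > eps else np.zeros_like(B_r[0])

    # Fusion: project the concatenation [h || z_c || z_r].
    return W_o @ np.concatenate([h, z_c, z_r])
```

The zero-vector fallback on the rare path means the fusion layer receives no rare signal when nothing in $\mathcal{B}_r$ matches, rather than being forced to explain away a spurious prototype.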

3. Disentanglement-Guided Loss (DGLoss)

Supervision is enforced using a composite loss:

$$\mathcal{L} = \mathcal{L}_{\mathrm{MSE}} + \lambda_{\mathrm{sep}} \mathcal{L}_{\mathrm{sep}} + \lambda_{\mathrm{rare}} \mathcal{L}_{\mathrm{rare}} + \lambda_{\mathrm{div}} \mathcal{L}_{\mathrm{div}}$$

  • Mean Squared Error ($\mathcal{L}_{\mathrm{MSE}}$): Forecast regression signal.
  • Separation Loss ($\mathcal{L}_{\mathrm{sep}}$): Encourages dissimilarity between the top matches of the common and rare banks, facilitating role separation. For $\Delta\rho = \rho_c^{\max} - \rho_r^{\max}$, empirical pattern-frequency weight $\omega$, and margin $m$:

$$\mathcal{L}_{\mathrm{sep}} = \mathbb{E}\left[\omega \max(0, m - \Delta\rho) + (1 - \omega) \max(0, m + \Delta\rho)\right]$$

  • Rarity Preservation Loss ($\mathcal{L}_{\mathrm{rare}}$): Promotes distinctiveness of rare prototypes via a log-softmax over activated similarities:

$$\mathcal{L}_{\mathrm{rare}} = -\frac{1}{|\mathcal{A}|} \sum_{k \in \mathcal{A}} \log\frac{\exp(s_{kk}/\tau)}{\sum_{j=1}^N \exp(s_{kj}/\tau)}$$

  • Diversity Loss ($\mathcal{L}_{\mathrm{div}}$): Penalizes pairwise cosine similarity among common prototypes so that they cover distinct patterns:

$$\mathcal{L}_{\mathrm{div}} = \frac{1}{M(M-1)} \sum_{i=1}^M \sum_{j \neq i} \left( \frac{{\mathbf{p}_c^i}^\top \mathbf{p}_c^j}{\|\mathbf{p}_c^i\| \|\mathbf{p}_c^j\|} \right)^2$$
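The three auxiliary terms can be sketched directly from the formulas above; function names, the margin default, and the interpretation of the similarity matrix `S` are illustrative assumptions:

```python
import numpy as np

def separation_loss(rho_c, rho_r, omega, m=0.3):
    """Hinge-style separation on the similarity gap d = max(rho_c) - max(rho_r):
    common-dominant windows (omega near 1) push d above the margin m,
    rare-dominant windows (omega near 0) push d below -m."""
    d = rho_c.max() - rho_r.max()
    return omega * max(0.0, m - d) + (1 - omega) * max(0.0, m + d)

def rarity_loss(S, active, tau=0.5):
    """Log-softmax over rare-prototype similarities S, shape (N, N),
    averaged over the activated index set `active`: each activated
    prototype should be most similar to itself."""
    logits = S / tau
    logZ = np.log(np.exp(logits).sum(axis=1))
    return -np.mean([logits[k, k] - logZ[k] for k in active])

def diversity_loss(B_c):
    """Mean squared off-diagonal cosine similarity among common prototypes."""
    U = B_c / np.linalg.norm(B_c, axis=1, keepdims=True)
    G = U @ U.T                     # cosine similarity matrix
    M = len(B_c)
    off = G - np.eye(M)             # drop the diagonal (self-similarity)
    return (off**2).sum() / (M * (M - 1))
```

Note how the weight $\omega$ makes the separation loss two-sided: it does not simply favor the common bank, but pushes whichever bank should dominate a given window ahead of the other by the margin $m$.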

4. Algorithmic Flow

The following succinctly summarizes the training loop:

Input: Historical series X, target Y, backbone f, banks B_c, B_r, hyperparams (K, τ, ε,…)
Output: Trained backbone + banks

for each training batch (X, Y):
  h ← f(X)
  ρ_c ← PearsonCorr(X, B_c)
  ρ_r ← PearsonCorr(X, B_r)
  I_c ← TopK(ρ_c, K)
  I_r ← argmax_j ρ_r[j] if max(ρ_r)>ε else ∅
  ω_c ← softmax(ρ_c[I_c]/τ)
  ω_r ← one_hot(I_r, length=N)
  z_c ← Σ_{k∈I_c} ω_c[k]·B_c[k]
  z_r ← Σ_{j∈I_r} ω_r[j]·B_r[j]
  Ŷ ← W_o([h ∥ z_c ∥ z_r])
  L_MSE ← mean((Ŷ−Y)²)
  L_sep ← compute_separation_loss(ρ_c, ρ_r, ω_batch, m)
  L_rare← compute_rarity_loss(ρ_r, I_r)
  L_div ← compute_diversity_loss(B_c)
  L_DGL ← λ_sep·L_sep + λ_rare·L_rare + λ_div·L_div
  L     ← L_MSE + L_DGL
  Backpropagate L, update f, W_o, B_c, B_r

5. Operational Example

Consider a univariate input series $X$ exhibiting a smooth trend and an end-of-series spike:

  • Common Path: The DPC aligns the trend portion of $X$ to $\mathcal{B}_c$, selecting prototypes that best resemble the backbone trend, weighted by shape similarity, forming $\mathbf{z}_c$.
  • Rare Path: The abrupt spike produces high correlation with a rare prototype in $\mathcal{B}_r$; if above threshold, that prototype is activated, and $\mathbf{z}_r$ encodes this anomaly.
  • The concatenation $[\mathbf{h}; \mathbf{z}_c; \mathbf{z}_r]$ is projected to the final forecast, integrating smooth and abrupt event motifs for adaptivity and reliability in output.

6. Theoretical and Practical Significance

DDP is model-agnostic and auxiliary; it does not require modification of the backbone forecasting architecture. It enables “pattern disentanglement and context-aware adaptation,” equipping models with the ability to represent both dataset-specific stable behaviors and infrequent, critical events. The end-to-end learnability, explicit specialization of banks, and use of dedicated loss terms ensure robust coverage of diverse temporal regimes. Empirically, this structure has demonstrated consistent improvements in predictive accuracy and reliability across various real-world benchmarks, reflecting enhanced utilization of both frequent and rare structural cues in sequential data (Yang et al., 23 Jan 2026).
