Dynamic Dual-Prototype Bank (DDP)

Updated 27 January 2026
  • DDP is a module that leverages dual prototype banks to disentangle and represent both common trends and rare anomalies in time series data.
  • It uses a dual-path context-aware routing mechanism with Pearson correlation for selecting and aggregating prototypes effectively.
  • By integrating specialized loss functions for separation, rarity, and diversity, DDP enhances forecasting accuracy and model adaptivity.

The Dynamic Dual-Prototype Bank (DDP) is a module introduced to enable context-aware pattern disentanglement and adaptive representation for time series forecasting. It maintains two learnable banks of prototypes: one specialized in capturing common, recurrent patterns (such as trends or seasonalities), and another engineered to dynamically store rare, critical events. By leveraging a dual-path context-aware routing mechanism and a tailored disentanglement-guided loss, DDP equips backbone forecasting models with the capacity to distinguish, recall, and utilize both prevailing and infrequent temporal behaviors for improved predictive performance (Yang et al., 23 Jan 2026).

1. Architecture and Bank Construction

The DDP comprises two distinct learnable sets:

  • Common Pattern Bank ($\mathcal{B}_c$): Contains $M$ prototypes intended to represent stable, high-frequency modes of the data, including trends and periodic behaviors.
  • Rare Pattern Bank ($\mathcal{B}_r$): Contains $N$ prototypes designated to encode irregular, low-frequency, or anomalous events.

Each prototype is a $D$-dimensional vector in a latent space. Initialization is performed as follows:

$$\begin{aligned} \mathbf{s}_c^i &\sim \mathcal{GP}\left(0,\,\lambda_{\ell} K_{\ell} + \lambda_{r} K_{r} + \lambda_{p} K_{p}\right) \quad (i = 1, \ldots, M), \\ \mathbf{s}_r^j &\sim \mathcal{N}(0, \sigma^2 I) \quad (j = 1, \ldots, N) \end{aligned}$$

where $K_{\ell}$, $K_{r}$, $K_{p}$ are linear, RBF (radial basis function), and periodic kernels, and $\lambda_{\ell}$, $\lambda_{r}$, $\lambda_{p}$ are the corresponding mixing weights.

These sequences are mapped to embeddings:

$$\mathbf{p}_c^i = \mathrm{Proj}_c(\mathbf{s}_c^i) \in \mathbb{R}^D, \quad \mathbf{p}_r^j = \mathrm{Proj}_r(\mathbf{s}_r^j) \in \mathbb{R}^D$$

forming the banks:

$$\mathcal{B}_c = \{\mathbf{p}_c^1, \ldots, \mathbf{p}_c^M\}, \quad \mathcal{B}_r = \{\mathbf{p}_r^1, \ldots, \mathbf{p}_r^N\}$$

Both banks are updated end-to-end via gradient descent according to the total loss function; common prototypes specialize toward stable patterns, rare prototypes adapt to infrequent events.
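The kernel-mixture initialization above can be sketched in NumPy. All hyperparameters here (sequence length, period, length scale, mixing weights) are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gp_common_init(M, L, lams=(1.0, 1.0, 1.0), period=24.0, length=8.0):
    """Sample M length-L common-prototype sequences from a zero-mean GP
    whose covariance mixes linear, RBF, and periodic kernels."""
    t = np.arange(L, dtype=float)[:, None]     # time grid, shape (L, 1)
    d = t - t.T                                # pairwise time differences
    K_lin = t @ t.T                            # linear kernel
    K_rbf = np.exp(-d**2 / (2 * length**2))    # RBF kernel
    K_per = np.exp(-2 * np.sin(np.pi * d / period)**2 / length**2)  # periodic
    K = lams[0] * K_lin + lams[1] * K_rbf + lams[2] * K_per
    K += 1e-6 * np.eye(L)                      # jitter for numerical stability
    return np.random.multivariate_normal(np.zeros(L), K, size=M)

def gaussian_rare_init(N, L, sigma=1.0):
    """Rare prototypes start as isotropic Gaussian noise."""
    return sigma * np.random.randn(N, L)

S_c = gp_common_init(M=8, L=32)     # smooth, structured common sequences
S_r = gaussian_rare_init(N=4, L=32) # unstructured rare sequences
```

The GP draw gives the common bank a smooth, trend/periodicity-biased starting point, while the rare bank starts unbiased; both are then shaped end-to-end by the loss.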

2. Dual-Path Context-Aware Routing (DPC) Mechanism

DPC is the retrieval and routing component operating at both training and inference time. It is responsible for context-selective integration of prototypes with backbone model representations. Given a time series input $\mathbf{X} \in \mathbb{R}^{t \times C}$ and its latent encoding $\mathbf{h} \in \mathbb{R}^D$:

2.1 Similarity Computation

Pearson correlation is computed between $\mathbf{X}$ and each prototype in both banks:

$$\begin{aligned} \boldsymbol{\rho}_c &= s(\mathbf{X}, \mathcal{B}_c) \in \mathbb{R}^M, \\ \boldsymbol{\rho}_r &= s(\mathbf{X}, \mathcal{B}_r) \in \mathbb{R}^N, \\ s(\mathbf{X}, \mathbf{p}) &= \frac{(\mathbf{X} - \overline{\mathbf{X}}) \cdot (\mathbf{p} - \overline{\mathbf{p}})}{\|\mathbf{X} - \overline{\mathbf{X}}\|\; \|\mathbf{p} - \overline{\mathbf{p}}\|} \end{aligned}$$
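This centered, normalized dot product can be written as a small vectorized helper; the function name and epsilon guard are illustrative:

```python
import numpy as np

def pearson_sim(x, bank):
    """Pearson correlation between a series x of shape (L,) and each
    prototype row of bank, shape (P, L); returns P correlations in [-1, 1]."""
    xc = x - x.mean()                               # center the query
    bc = bank - bank.mean(axis=1, keepdims=True)    # center each prototype
    num = bc @ xc                                   # row-wise dot products
    den = np.linalg.norm(xc) * np.linalg.norm(bc, axis=1)
    return num / (den + 1e-12)                      # epsilon guards flat rows
```

Because Pearson correlation is invariant to shift and scale, routing responds to the *shape* of the window rather than its absolute level.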

2.2 Prototype Selection

  • Common Path: Selects the top-$K$ most similar common prototypes,

$$\mathcal{I}_c = \mathrm{TopK}(\boldsymbol{\rho}_c; K)$$

  • Rare Path: Picks the single most similar rare prototype if its similarity exceeds a threshold $\varepsilon$,

$$\mathcal{I}_r = \begin{cases} \arg\max_j \rho_{r,j}, & \max_j \rho_{r,j} > \varepsilon \\ \varnothing, & \text{otherwise} \end{cases}$$

2.3 Weighting and Aggregation

Aggregation uses a softmax-weighted sum for the common path ($\omega_c$) and a one-hot weight for the rare path ($\omega_r$):

$$\omega_c = \mathrm{Softmax}\left(\tfrac{\boldsymbol{\rho}_c[\mathcal{I}_c]}{\tau}\right), \quad \omega_r \in \{0, 1\}^N \text{ with } 1 \text{ at } \mathcal{I}_r$$

The contribution vectors are

$$\begin{aligned} \mathbf{z}_c &= \sum_{k \in \mathcal{I}_c} \omega_c^k \, \mathbf{p}_c^k, \\ \mathbf{z}_r &= \sum_{j \in \mathcal{I}_r} \omega_r^j \, \mathbf{p}_r^j \end{aligned}$$

Fusion: $\widehat{\mathbf{Y}} = \mathbf{W}_o[\mathbf{h} \,\|\, \mathbf{z}_c \,\|\, \mathbf{z}_r]$, where $\mathbf{W}_o$ is a trainable projection.
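The full routing path (selection, weighting, aggregation, fusion) can be sketched as follows. For simplicity this sketch uses the same prototype vectors for both similarity and aggregation, and flattens the input to one dimension; all names and default hyperparameters ($K$, $\tau$, $\varepsilon$) are illustrative:

```python
import numpy as np

def route_and_fuse(x, h, B_c, B_r, W_o, K=2, tau=0.5, eps=0.6):
    """DPC sketch: top-K softmax routing over common prototypes,
    thresholded argmax over rare prototypes, then concatenate and project."""
    def pearson(bank):
        xc = x - x.mean()
        bc = bank - bank.mean(axis=1, keepdims=True)
        return (bc @ xc) / (np.linalg.norm(xc)
                            * np.linalg.norm(bc, axis=1) + 1e-12)

    rho_c, rho_r = pearson(B_c), pearson(B_r)

    # Common path: top-K indices, temperature-scaled softmax weights.
    idx = np.argsort(rho_c)[-K:]
    w = np.exp(rho_c[idx] / tau)
    w /= w.sum()
    z_c = w @ B_c[idx]                  # weighted common contribution

    # Rare path: single prototype, gated by the similarity threshold.
    j = int(np.argmax(rho_r))
    z_r = B_r[j] if rho_r[j] > eps else np.zeros_like(B_r[0])

    # Fusion: project the concatenation [h || z_c || z_r].
    return W_o @ np.concatenate([h, z_c, z_r])
```

The zero-vector fallback on the rare path means the fusion layer receives no rare signal when nothing in $\mathcal{B}_r$ matches, rather than being forced to explain away a spurious prototype.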

3. Disentanglement-Guided Loss (DGLoss)

Supervision is enforced using a composite loss:

$$\mathcal{L} = \mathcal{L}_{\mathrm{MSE}} + \lambda_{\mathrm{sep}} \mathcal{L}_{\mathrm{sep}} + \lambda_{\mathrm{rare}} \mathcal{L}_{\mathrm{rare}} + \lambda_{\mathrm{div}} \mathcal{L}_{\mathrm{div}}$$

  • Mean Squared Error ($\mathcal{L}_{\mathrm{MSE}}$): Forecast regression signal.
  • Separation Loss ($\mathcal{L}_{\mathrm{sep}}$): Encourages dissimilarity between the top matches of the common and rare banks, facilitating role separation. For $\Delta\rho = \rho_c^{\max} - \rho_r^{\max}$, empirical pattern-frequency weight $\omega$, and margin $m$:

$$\mathcal{L}_{\mathrm{sep}} = \mathbb{E}\left[\omega \max(0, m - \Delta\rho) + (1 - \omega) \max(0, m + \Delta\rho)\right]$$

  • Rarity Preservation Loss ($\mathcal{L}_{\mathrm{rare}}$): Promotes distinctiveness of rare prototypes via a log-softmax over activated similarities:

$$\mathcal{L}_{\mathrm{rare}} = -\frac{1}{|\mathcal{A}|} \sum_{k \in \mathcal{A}} \log\frac{\exp(s_{kk}/\tau)}{\sum_{j=1}^N \exp(s_{kj}/\tau)}$$

  • Diversity Loss ($\mathcal{L}_{\mathrm{div}}$): Penalizes pairwise cosine similarity among common prototypes so that they cover distinct patterns:

$$\mathcal{L}_{\mathrm{div}} = \frac{1}{M(M-1)} \sum_{i=1}^M \sum_{j \neq i} \left( \frac{{\mathbf{p}_c^i}^\top \mathbf{p}_c^j}{\|\mathbf{p}_c^i\| \|\mathbf{p}_c^j\|} \right)^2$$
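The three auxiliary terms can be sketched directly from the formulas above; function names, the margin default, and the interpretation of the similarity matrix `S` are illustrative assumptions:

```python
import numpy as np

def separation_loss(rho_c, rho_r, omega, m=0.3):
    """Hinge-style separation on the similarity gap d = max(rho_c) - max(rho_r):
    common-dominant windows (omega near 1) push d above the margin m,
    rare-dominant windows (omega near 0) push d below -m."""
    d = rho_c.max() - rho_r.max()
    return omega * max(0.0, m - d) + (1 - omega) * max(0.0, m + d)

def rarity_loss(S, active, tau=0.5):
    """Log-softmax over rare-prototype similarities S, shape (N, N),
    averaged over the activated index set `active`: each activated
    prototype should be most similar to itself."""
    logits = S / tau
    logZ = np.log(np.exp(logits).sum(axis=1))
    return -np.mean([logits[k, k] - logZ[k] for k in active])

def diversity_loss(B_c):
    """Mean squared off-diagonal cosine similarity among common prototypes."""
    U = B_c / np.linalg.norm(B_c, axis=1, keepdims=True)
    G = U @ U.T                     # cosine similarity matrix
    M = len(B_c)
    off = G - np.eye(M)             # drop the diagonal (self-similarity)
    return (off**2).sum() / (M * (M - 1))
```

Note how the weight $\omega$ makes the separation loss two-sided: it does not simply favor the common bank, but pushes whichever bank should dominate a given window ahead of the other by the margin $m$.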

4. Algorithmic Flow

The following succinctly summarizes the training loop:

Input: Historical series X, target Y, backbone f, banks B_c, B_r, hyperparams (K, τ, ε,…)
Output: Trained backbone + banks

for each training batch (X, Y):
  h ← f(X)
  ρ_c ← PearsonCorr(X, B_c)
  ρ_r ← PearsonCorr(X, B_r)
  I_c ← TopK(ρ_c, K)
  I_r ← argmax_j ρ_r[j] if max(ρ_r)>ε else ∅
  ω_c ← softmax(ρ_c[I_c]/τ)
  ω_r ← one_hot(I_r, length=N)
  z_c ← Σ_{k∈I_c} ω_c[k]·B_c[k]
  z_r ← Σ_{j∈I_r} ω_r[j]·B_r[j]
  Ŷ ← W_o([h ∥ z_c ∥ z_r])
  L_MSE ← mean((Ŷ−Y)²)
  L_sep ← compute_separation_loss(ρ_c, ρ_r, ω_batch, m)
  L_rare← compute_rarity_loss(ρ_r, I_r)
  L_div ← compute_diversity_loss(B_c)
  L_DGL ← λ_sep·L_sep + λ_rare·L_rare + λ_div·L_div
  L     ← L_MSE + L_DGL
  Backpropagate L, update f, W_o, B_c, B_r

5. Operational Example

Consider a univariate input series $X$ exhibiting a smooth trend and an end-of-series spike:

  • Common Path: The DPC aligns the trend portion of $X$ to $\mathcal{B}_c$, selecting prototypes that best resemble the backbone trend, weighted by shape similarity, forming $\mathbf{z}_c$.
  • Rare Path: The abrupt spike produces high correlation with a rare prototype in $\mathcal{B}_r$; if above threshold, that prototype is activated, and $\mathbf{z}_r$ encodes this anomaly.
  • The concatenation $[\mathbf{h}; \mathbf{z}_c; \mathbf{z}_r]$ is projected to the final forecast, integrating smooth and abrupt event motifs for adaptivity and reliability in output.

6. Theoretical and Practical Significance

DDP is model-agnostic and auxiliary; it does not require modification of the backbone forecasting architecture. It enables “pattern disentanglement and context-aware adaptation,” equipping models with the ability to represent both dataset-specific stable behaviors and infrequent, critical events. The end-to-end learnability, explicit specialization of banks, and use of dedicated loss terms ensure robust coverage of diverse temporal regimes. Empirically, this structure has demonstrated consistent improvements in predictive accuracy and reliability across various real-world benchmarks, reflecting enhanced utilization of both frequent and rare structural cues in sequential data (Yang et al., 23 Jan 2026).
