BDM: Bi-LSTM Embedding Denoising Autoencoder

Updated 19 January 2026
  • BDM is a hybrid model integrating Bi-LSTM embedding, denoising autoencoder, and Transformer forecasting to improve multistep time-series predictions.
  • Its denoising autoencoder reduces noise and extracts robust signal components, enhancing prediction accuracy across varying temporal horizons.
  • Empirical evaluations on EV charging loads show MAE reductions of roughly 33–46% relative to a standalone Transformer at forecast horizons of 48 hours and longer; at the 24-hour horizon the Transformer remains better.

The Bi-LSTM Embedding Denoising Autoencoder Model (BDM) is a hybrid deep-learning architecture tailored for multistep time-series forecasting. Developed for applications such as short-term electric vehicle (EV) charging load prediction, BDM integrates three core components: a bidirectional LSTM-based embedding layer, a denoising autoencoder, and a Transformer encoder–decoder module. By sequentially combining local feature learning, noise suppression, and long-range dependency modeling, BDM yields robust and accurate forecasts across a variety of temporal horizons (Koohfar et al., 21 Sep 2025).

1. Model Architecture and Workflow

BDM consists of three distinct stages, each contributing specialized representations and processing to the time series forecasting pipeline.

Input Construction:

At each time step $t$, the model aggregates the raw multivariate input $\mathbf{x}_t$ (e.g., EV charging loads, timestamps, weather features) with a normalized time vector $\mathrm{tv}_t = [t; \text{norm}(t)]$, producing the joint input $\mathbf{z}_t = [\mathbf{x}_t; \mathrm{tv}_t]$.
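
The input construction above can be sketched in a few lines of plain Python. The source does not say how $\text{norm}(t)$ is computed, so min-max scaling of the step index to $[0, 1]$ is an assumption here, and the function name `build_joint_inputs` and the two example features are hypothetical.

```python
def build_joint_inputs(x_seq):
    """Append [t, norm(t)] to each feature vector x_t, giving z_t = [x_t; t; norm(t)]."""
    T = len(x_seq)
    z_seq = []
    for t, x_t in enumerate(x_seq, start=1):
        # Assumption: norm(t) is min-max scaling of the step index to [0, 1].
        norm_t = (t - 1) / (T - 1) if T > 1 else 0.0
        z_seq.append(list(x_t) + [float(t), norm_t])
    return z_seq


# Three hourly steps with two hypothetical features (load, temperature).
x_seq = [[12.0, 4.5], [15.5, 4.1], [9.8, 3.9]]
z_seq = build_joint_inputs(x_seq)
print(z_seq[0])   # [12.0, 4.5, 1.0, 0.0]
print(z_seq[-1])  # [9.8, 3.9, 3.0, 1.0]
```

Each $\mathbf{z}_t$ is two components longer than $\mathbf{x}_t$, which sets the input width of the Bi-LSTM in Stage I.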

Stage I – Bi-LSTM Embedding:

The sequence $\{\mathbf{z}_t\}_{t=1}^T$ passes through a two-layer bidirectional LSTM. Outputs from the forward and backward passes are concatenated and projected through a dense layer to form time-step embeddings $\mathbf{e}_t$. This process encapsulates both past and future temporal context.

Stage II – Denoising Autoencoder:

Gaussian noise or dropout-style masking is applied to each $\mathbf{e}_t$ to form $\tilde{\mathbf{e}}_t$. A neural encoder $h_{encoder}$ compresses the noisy embedding into a lower-dimensional code $h_t$, which the decoder $g_{decoder}$ reconstructs to yield $\hat{\mathbf{e}}_t$. The denoising autoencoder is trained to minimize

$$\mathcal{L}_{AE} = \frac{1}{T} \sum_{t=1}^{T} \|\mathbf{e}_t - g_{decoder}(h_{encoder}(\tilde{\mathbf{e}}_t))\|_2^2.$$

This explicitly learns a denoised representation emphasizing signal over noise.
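
As a rough illustration of the corruption step and the reconstruction objective, the sketch below implements both noise variants and the loss $\mathcal{L}_{AE}$ in plain Python. The helper names, the example embeddings, and the masking probability are hypothetical. No encoder or decoder is trained here: an identity mapping stands in for the autoencoder to show what the loss measures.

```python
import random


def add_gaussian_noise(e, sigma, rng):
    """Additive corruption: e_tilde = e + n, with n ~ N(0, sigma^2) per component."""
    return [v + rng.gauss(0.0, sigma) for v in e]


def mask_components(e, drop_prob, rng):
    """Dropout-style corruption: zero each component with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in e]


def reconstruction_loss(clean_seq, recon_seq):
    """L_AE: squared L2 distance to the clean embedding, averaged over T steps."""
    total = 0.0
    for e, e_hat in zip(clean_seq, recon_seq):
        total += sum((a - b) ** 2 for a, b in zip(e, e_hat))
    return total / len(clean_seq)


rng = random.Random(0)
clean = [[0.5, -0.2, 0.1], [0.4, -0.1, 0.0]]   # hypothetical embeddings e_t
noisy = [add_gaussian_noise(e, 0.1, rng) for e in clean]

# An identity "autoencoder" passes the noisy input through unchanged, so its
# loss equals the injected noise energy. A perfect reconstruction scores zero.
identity_loss = reconstruction_loss(clean, noisy)
perfect_loss = reconstruction_loss(clean, clean)
print(identity_loss, perfect_loss)
```

Note that the target in the loss is the clean embedding, not the corrupted one. This is what distinguishes a denoising autoencoder from a plain autoencoder.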

Stage III – Transformer Forecasting:

The denoised embeddings $\hat{\mathbf{e}}_t$ are linearly projected into queries, keys, and values, then processed by a multi-layer, multi-head self-attention Transformer encoder–decoder. The Transformer outputs the $H$-step forecast sequence $\{\hat{y}_{t+1}, \ldots, \hat{y}_{t+H}\}$; the forecasting objective is standard MSE:

$$\mathcal{L}_{forecast} = \frac{1}{N}\sum_{i=1}^N (y_i - \hat{y}_i)^2.$$
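
The query/key/value projection and attention step in Stage III can be illustrated with a single-head, plain-Python sketch of scaled dot-product self-attention. This is a simplification under stated assumptions, not the source's implementation: it omits multiple heads, positional encodings, layer stacking, and the decoder, and the identity projection matrices and example embeddings are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [v / s for v in exps]


def self_attention(e_hat, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence of embeddings."""
    q, k, v = matmul(e_hat, w_q), matmul(e_hat, w_k), matmul(e_hat, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]               # transpose of K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]         # one distribution per step
    return matmul(weights, v), weights


# Hypothetical 3-step sequence of 2-dimensional denoised embeddings.
identity = [[1.0, 0.0], [0.0, 1.0]]
e_hat = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = self_attention(e_hat, identity, identity, identity)
print(weights[0])
```

Every time step receives a weighted mix of all other steps in one operation, which is why attention can relate distant positions without passing information through intermediate recurrent states.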

2. Mathematical Formulation

The primary operations performed by BDM can be summarized as follows:

  • Bi-LSTM Embedding:

$$\overrightarrow{h}_t = \mathrm{LSTM}_{\text{fwd}}(\mathbf{z}_t, \overrightarrow{h}_{t-1}), \quad \overleftarrow{h}_t = \mathrm{LSTM}_{\text{bwd}}(\mathbf{z}_t, \overleftarrow{h}_{t+1}),$$

$$\mathbf{e}_t = f_{embed}(\mathbf{z}_t) = \text{dense}([\overrightarrow{h}_t ; \overleftarrow{h}_t]).$$

  • Noise Injection:

$$\tilde{\mathbf{e}}_t = \mathbf{e}_t + \mathbf{n}_t, \quad \mathbf{n}_t \sim \mathcal{N}(\mathbf{0}, \sigma^2 I) \text{ or masking}.$$

  • Denoising Autoencoder Loss:

$$\mathcal{L}_{AE} = \frac{1}{T} \sum_{t=1}^T \|\mathbf{e}_t - g_{decoder}(h_{encoder}(\tilde{\mathbf{e}}_t))\|_2^2.$$

  • Forecasting Loss:

$$\mathcal{L}_{forecast} = \frac{1}{N}\sum_{i=1}^N (y_i - \hat{y}_i)^2.$$

3. Hyper-Parameterization and Training Regimen

BDM’s reported results rely on the following hyper-parameters and optimization protocol.

| Subsystem | Key Hyper-Parameters |
|---|---|
| Bi-LSTM | Hidden size: 64 (per direction); layers: 2 |
| Denoising Autoencoder | Latent dim: 32; noise $\sigma \approx 0.1$; ReLU |
| Transformer | Model dim: 64; 8 heads; 3 layers; dropout: 0.1 |

Training uses the Adam optimizer with an initial learning rate of $10^{-3}$, a scheduled warmup over the first 10% of steps, and inverse square-root decay thereafter. Mini-batches contain 32 samples, and early stopping on validation loss is applied within the 100-epoch training window.
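
One possible reading of this schedule is sketched below. The source gives only the learning rate, the 10% warmup fraction, and the decay type, so two choices here are assumptions: the warmup is linear, and $10^{-3}$ is treated as the peak reached at the end of warmup. The function name `learning_rate` is hypothetical.

```python
def learning_rate(step, total_steps, peak_lr=1e-3, warmup_frac=0.1):
    """Linear warmup to peak_lr, then decay proportional to 1/sqrt(step)."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step <= warmup_steps:
        # Assumption: linear ramp from near zero up to the peak.
        return peak_lr * step / warmup_steps
    # Inverse square-root decay, continuous with the peak at step == warmup_steps.
    return peak_lr * (warmup_steps / step) ** 0.5


total = 1000
for step in (1, 50, 100, 400, 1000):
    print(step, learning_rate(step, total))
```

With 1,000 total steps, the rate peaks at step 100 and has halved by step 400, since $\sqrt{100/400} = 0.5$.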

4. Empirical Evaluation and Quantitative Comparison

BDM is evaluated on a dataset comprising hourly EV charging loads from approximately 6,800 charging sessions (spanning December 2018–January 2020). The data is partitioned with an 80%/10%/10% train/validation/test split. Forecast horizons cover 24, 48, 72, 96, and 120 hours. Performance metrics include RMSE and MAE, averaged over 5 experimental runs. Benchmarked against standalone Transformer, CNN, RNN, LSTM, and GRU baselines, BDM achieves superior results on four out of five horizons.
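
The evaluation protocol can be approximated with the sketch below, which performs an 80%/10%/10% split and computes MAE and RMSE. The source does not state whether the split is chronological or shuffled, so an unshuffled chronological split is assumed here, as is common for forecasting. The series is synthetic and the function names are hypothetical.

```python
import math


def chronological_split(series, train_frac=0.8, val_frac=0.1):
    """Split without shuffling so validation and test data follow training data in time."""
    n = len(series)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return series[:train_end], series[train_end:val_end], series[val_end:]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


hourly_load = [float(i % 24) for i in range(1000)]   # synthetic stand-in series
train, val, test = chronological_split(hourly_load)
print(len(train), len(val), len(test))               # 800 100 100

y_true, y_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

RMSE is never smaller than MAE on the same errors, so a large gap between the two indicates that a few large misses dominate.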

| Horizon (h) | BDM MAE | Transformer MAE | Relative MAE Reduction |
|---|---|---|---|
| 24 | 0.085 | 0.060 | Transformer better |
| 48 | 0.066 | 0.103 | −36% |
| 72 | 0.069 | 0.103 | −33% |
| 96 | 0.060 | 0.110 | −46% |
| 120 | 0.089 | 0.139 | −36% |

On the 24-hour horizon, the Transformer outperforms BDM (MAE 0.060 vs. 0.085); at all longer horizons, BDM reduces MAE by 33–46% relative to the Transformer (Koohfar et al., 21 Sep 2025).
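
The relative changes can be recomputed from the tabulated MAE values as (BDM − Transformer) / Transformer. The short check below uses only the rounded numbers reported above; the variable and function names are hypothetical.

```python
# Horizon (hours) -> (BDM MAE, Transformer MAE), copied from the reported results.
mae_table = {
    24: (0.085, 0.060),
    48: (0.066, 0.103),
    72: (0.069, 0.103),
    96: (0.060, 0.110),
    120: (0.089, 0.139),
}


def relative_change(bdm, transformer):
    """Percentage change in MAE of BDM relative to the Transformer baseline."""
    return 100.0 * (bdm - transformer) / transformer


changes = {h: relative_change(*pair) for h, pair in mae_table.items()}
for horizon, change in changes.items():
    print(f"{horizon:>3} h: {change:+.1f}%")
```

Negative values favor BDM. With these rounded inputs the 96-hour figure comes out near −45.5%, within a percentage point of the reported −46%. The small gap may reflect rounding of the published MAE values. At 24 hours the change is positive, meaning BDM's MAE is higher than the Transformer's.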

5. Design Rationale and Theoretical Motivation

The BDM architecture leverages the complementary strengths of each component:

  • Bi-LSTM Embedding: Enables the integration of both forward and backward temporal context, encapsulating richer local features than unidirectional or feedforward models.
  • Denoising Autoencoder: Explicitly reduces sensitivity to sensor noise and outliers by learning a concise manifold of underlying dynamics and prioritizing dominant signal components.
  • Transformer Encoder–Decoder: Excels at modeling long-range dependencies via self-attention, overcoming the vanishing-gradient limitations of RNN-based predictors.

Sequentially uniting these paradigms, BDM achieves robustness to spurious fluctuations, improved local pattern extraction, and enhanced capacity for capturing both short-term and long-term dependencies.

6. Application Scope and Broader Context

BDM was originally implemented for short-term EV charging load prediction but generalizes to any multivariate time series context with non-negligible noise and both local and global temporal structure. This hybridization paradigm aligns with broader trends in time series research, wherein targeted integration of recurrent, autoencoding, and attention mechanisms yields advances over monolithic architectures (Koohfar et al., 21 Sep 2025). A plausible implication is that domain-specific adaptation of the denoising scheme or embedding size could further enhance performance across application verticals.

7. Limitations and Prospects

While BDM establishes a new empirical baseline for multi-horizon EV charging forecasts, its underperformance at the shortest horizon (24-hour MAE of 0.085 vs. 0.060 for the Transformer) suggests that architecture selection should be horizon-dependent. The model’s complexity and staged training requirements may also entail increased computational overhead relative to standalone models. Future research may investigate architectural unification, adaptive noise schemes, or domain-aware regularization to further optimize multistep sequence forecasting systems.
