BDM: Bi-LSTM Embedding Denoising Autoencoder
- BDM is a hybrid model integrating Bi-LSTM embedding, denoising autoencoder, and Transformer forecasting to improve multistep time-series predictions.
- Its denoising autoencoder reduces noise and extracts robust signal components, enhancing prediction accuracy across varying temporal horizons.
- Empirical evaluations on EV charging loads show MAE reductions of roughly one-third or more compared with a standalone Transformer at horizons of 48 hours and beyond.
The Bi-LSTM Embedding Denoising Autoencoder Model (BDM) is a hybrid deep-learning architecture for multistep time-series forecasting. Developed for applications such as short-term electric vehicle (EV) charging load prediction, BDM chains three core components: a bidirectional LSTM (Bi-LSTM) embedding layer, a denoising autoencoder, and a Transformer encoder–decoder module. By combining local feature learning, noise suppression, and long-range dependency modeling in sequence, BDM reduces forecast error relative to a standalone Transformer at most of the evaluated horizons (Koohfar et al., 21 Sep 2025).
1. Model Architecture and Workflow
BDM processes its input in three sequential stages: embedding, denoising, and forecasting. Each stage builds a more specialized representation for the next.
Input Construction:
At each time step $t$, the model concatenates the raw multivariate input $x_t$ (e.g., EV charging loads, timestamps, weather features) with a normalized time vector $\tau_t$, producing the joint input $u_t = [x_t; \tau_t]$.
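A minimal sketch of the input construction follows. It assumes that $\tau_t$ consists of hour-of-day and day-of-week scaled to [0, 1] and that the combination is a plain concatenation; `time_features`, `joint_input`, and the feature values are hypothetical.

```python
from datetime import datetime
from typing import List


def time_features(ts: datetime) -> List[float]:
    """Hypothetical normalized time vector tau_t: hour-of-day and day-of-week in [0, 1]."""
    return [ts.hour / 23.0, ts.weekday() / 6.0]


def joint_input(x_t: List[float], ts: datetime) -> List[float]:
    """Joint input u_t = [x_t ; tau_t], formed by concatenating the two vectors."""
    return list(x_t) + time_features(ts)


# Illustrative raw features for one hour: charging load and air temperature.
x_t = [12.5, 7.0]
u_t = joint_input(x_t, datetime(2019, 3, 4, 18))  # a Monday, 18:00
print(u_t)  # 4 values: the two raw features followed by the two time features
```

In practice the raw features would typically also be scaled before concatenation so that no single feature dominates.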
Stage I – Bi-LSTM Embedding:
The input sequence $u_1, \dots, u_T$ passes through a two-layer bidirectional LSTM. At each step, the forward and backward hidden states are concatenated and projected by a dense layer to form the time-step embedding $e_t$. Because the backward pass reads the window in reverse, each $e_t$ reflects both earlier and later steps within the input window.
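The following pure-Python sketch shows the data flow of Stage I: a stacked bidirectional LSTM whose per-step forward and backward states are concatenated and projected by a dense layer. Weights are random and untrained, biases are omitted, and the default embedding size `emb_dim = 64` is an assumption (chosen to match the Transformer model dimension); a production version would use a deep-learning framework.

```python
import math
import random
from typing import List

Vec = List[float]
Mat = List[Vec]


def matvec(W: Mat, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rand_mat(rows: int, cols: int, rng: random.Random) -> Mat:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM (biases omitted); each gate acts on [x_t ; h_{t-1}]."""

    def __init__(self, in_dim: int, hidden: int, rng: random.Random):
        self.hidden = hidden
        # One weight matrix per gate: input (i), forget (f), candidate (g), output (o).
        self.W = {k: rand_mat(hidden, in_dim + hidden, rng) for k in "ifgo"}

    def run(self, seq: List[Vec]) -> List[Vec]:
        h, c = [0.0] * self.hidden, [0.0] * self.hidden
        outputs = []
        for x in seq:
            xh = x + h  # list concatenation [x_t ; h_{t-1}]
            i = [sigmoid(z) for z in matvec(self.W["i"], xh)]
            f = [sigmoid(z) for z in matvec(self.W["f"], xh)]
            g = [math.tanh(z) for z in matvec(self.W["g"], xh)]
            o = [sigmoid(z) for z in matvec(self.W["o"], xh)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def bilstm_layer(seq: List[Vec], fwd: LSTMCell, bwd: LSTMCell) -> List[Vec]:
    """Forward pass over the window, backward pass over its reverse, concatenated per step."""
    h_fwd = fwd.run(seq)
    h_bwd = bwd.run(seq[::-1])[::-1]  # re-align backward states with time order
    return [hf + hb for hf, hb in zip(h_fwd, h_bwd)]


def bilstm_embed(seq: List[Vec], hidden: int = 64, layers: int = 2,
                 emb_dim: int = 64, seed: int = 0) -> List[Vec]:
    """Stacked Bi-LSTM, then a dense projection e_t = W_e [h_fwd ; h_bwd] (bias omitted)."""
    rng = random.Random(seed)
    x = seq
    for _ in range(layers):
        in_dim = len(x[0])
        x = bilstm_layer(x, LSTMCell(in_dim, hidden, rng), LSTMCell(in_dim, hidden, rng))
    W_e = rand_mat(emb_dim, 2 * hidden, rng)
    return [matvec(W_e, h) for h in x]


window = [[0.1 * t, 0.5, 0.0, 1.0] for t in range(6)]  # 6 steps, 4 features (illustrative)
emb = bilstm_embed(window, hidden=8, emb_dim=16)        # small sizes for a quick run
print(len(emb), len(emb[0]))  # 6 16
```

Because the backward LSTM reads the window in reverse, changing the last input step also changes the first embedding; a forward-only LSTM would not have this property.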
Stage II – Denoising Autoencoder:
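Gaussian noise or dropout-style masking is applied to each embedding $e_t$ to form a corrupted embedding $\tilde{e}_t$. An encoder compresses $\tilde{e}_t$ into a lower-dimensional code $z_t = f_{\mathrm{enc}}(\tilde{e}_t)$, from which the decoder reconstructs $\hat{e}_t = f_{\mathrm{dec}}(z_t)$. Over an input window of $T$ steps, the denoising autoencoder is trained to minimize the reconstruction error with respect to the clean embeddings:
$$\mathcal{L}_{\mathrm{DAE}} = \frac{1}{T}\sum_{t=1}^{T} \lVert e_t - \hat{e}_t \rVert_2^2$$
Because the reconstruction target is the clean $e_t$ rather than the corrupted $\tilde{e}_t$, the autoencoder cannot simply copy its input; it must learn a representation that retains the underlying signal and discards the injected noise.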
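A minimal sketch of Stage II, assuming a single linear layer with ReLU for the encoder and a single linear layer for the decoder (the layer structure is not specified here). The latent size 32 comes from the hyper-parameter table; the embedding size 64 and the noise level `sigma = 0.1` are hypothetical, and the weights are random rather than trained.

```python
import random
from typing import List

Vec = List[float]


def add_gaussian_noise(e: Vec, sigma: float, rng: random.Random) -> Vec:
    """Corrupt e_t with additive noise: e~_t = e_t + eps, eps ~ N(0, sigma^2 I)."""
    return [x + rng.gauss(0.0, sigma) for x in e]


def mask_noise(e: Vec, p: float, rng: random.Random) -> Vec:
    """Dropout-style masking: zero each coordinate with probability p (no rescaling)."""
    return [0.0 if rng.random() < p else x for x in e]


class DenoisingAutoencoder:
    """One-layer encoder (ReLU) and one-layer linear decoder; biases omitted."""

    def __init__(self, dim: int, latent: int, seed: int = 0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(latent)]
        self.dec = [[rng.uniform(-0.1, 0.1) for _ in range(latent)] for _ in range(dim)]

    def encode(self, e: Vec) -> Vec:
        # z_t = ReLU(W_enc e~_t)
        return [max(0.0, sum(w * x for w, x in zip(row, e))) for row in self.enc]

    def decode(self, z: Vec) -> Vec:
        # e^_t = W_dec z_t
        return [sum(w * x for w, x in zip(row, z)) for row in self.dec]


def dae_loss(model: DenoisingAutoencoder, clean: List[Vec], noisy: List[Vec]) -> float:
    """L_DAE = (1/T) sum_t ||e_t - dec(enc(e~_t))||^2; the target is the CLEAN embedding."""
    total = 0.0
    for e, e_noisy in zip(clean, noisy):
        e_hat = model.decode(model.encode(e_noisy))
        total += sum((a - b) ** 2 for a, b in zip(e, e_hat))
    return total / len(clean)


rng = random.Random(1)
clean = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(10)]  # T = 10 embeddings
noisy = [add_gaussian_noise(e, sigma=0.1, rng=rng) for e in clean]
model = DenoisingAutoencoder(dim=64, latent=32)
print(dae_loss(model, clean, noisy))
```

Note that the noisy embeddings are the encoder input while the clean ones are the loss target; swapping them would turn this into a plain (non-denoising) autoencoder objective.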
Stage III – Transformer Forecasting:
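The denoised embeddings $\hat{e}_t$ are stacked into a matrix $\hat{E}$ and linearly projected into queries $Q = \hat{E}W_Q$, keys $K = \hat{E}W_K$, and values $V = \hat{E}W_V$, which feed a multi-layer, multi-head self-attention Transformer encoder–decoder. For an input window ending at step $T$, the Transformer outputs the $H$-step forecast $\hat{y}_{T+1}, \dots, \hat{y}_{T+H}$ and is trained with the standard MSE objective:
$$\mathcal{L}_{\mathrm{forecast}} = \frac{1}{H}\sum_{k=1}^{H} \left(y_{T+k} - \hat{y}_{T+k}\right)^2$$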
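The sketch below illustrates one multi-head self-attention sublayer over the denoised embeddings, plus the forecasting MSE. It assumes `d_model = 64` and 8 heads as in the hyper-parameter table, uses random projection weights, and omits residual connections, layer normalization, feed-forward sublayers, and decoder cross-attention, so it is not a full Transformer encoder–decoder.

```python
import math
import random
from typing import List

Mat = List[List[float]]


def matmul(A: Mat, B: Mat) -> Mat:
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A: Mat) -> Mat:
    return [list(col) for col in zip(*A)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: Mat, K: Mat, V: Mat):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


def multi_head_self_attention(E_hat: Mat, n_heads: int = 8, seed: int = 0) -> Mat:
    """Project E_hat into per-head Q, K, V; attend; concatenate heads; apply W_O."""
    d_model = len(E_hat[0])
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads
    rng = random.Random(seed)

    def proj(rows: int, cols: int) -> Mat:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    heads = []
    for _ in range(n_heads):
        W_q, W_k, W_v = proj(d_model, d_head), proj(d_model, d_head), proj(d_model, d_head)
        out, _ = attention(matmul(E_hat, W_q), matmul(E_hat, W_k), matmul(E_hat, W_v))
        heads.append(out)
    concat = [[x for head in heads for x in head[t]] for t in range(len(E_hat))]
    return matmul(concat, proj(d_model, d_model))


def forecast_mse(y_true: List[float], y_pred: List[float]) -> float:
    """L_forecast = (1/H) * sum_k (y_{T+k} - y^_{T+k})^2 over the H-step horizon."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(3)
E_hat = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(5)]  # T = 5 denoised embeddings
out = multi_head_self_attention(E_hat, n_heads=8)
print(len(out), len(out[0]))  # 5 64
print(forecast_mse([1.0, 2.0], [1.0, 4.0]))  # 2.0
```

Splitting `d_model = 64` across 8 heads gives each head an 8-dimensional subspace, so the total projection cost matches a single 64-dimensional head while letting heads attend to different patterns.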
2. Mathematical Formulation
The main operations of BDM are summarized below.
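- Bi-LSTM Embedding (one layer shown; the second layer takes the concatenated first-layer states as input), with joint input $u_t = [x_t; \tau_t]$: $\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(u_t, \overrightarrow{h}_{t-1})$, $\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(u_t, \overleftarrow{h}_{t+1})$, $e_t = W_e[\overrightarrow{h}_t; \overleftarrow{h}_t] + b_e$
- Noise Injection: $\tilde{e}_t = e_t + \epsilon_t$ with $\epsilon_t \sim \mathcal{N}(0, \sigma^2 I)$, or $\tilde{e}_t = m_t \odot e_t$ with a random binary mask $m_t$ for dropout-style masking
- Denoising Autoencoder Loss: $z_t = f_{\mathrm{enc}}(\tilde{e}_t)$, $\hat{e}_t = f_{\mathrm{dec}}(z_t)$, $\mathcal{L}_{\mathrm{DAE}} = \frac{1}{T}\sum_{t=1}^{T} \lVert e_t - \hat{e}_t \rVert_2^2$ over an input window of $T$ steps
- Forecasting Loss: $\mathcal{L}_{\mathrm{forecast}} = \frac{1}{H}\sum_{k=1}^{H} \left(y_{T+k} - \hat{y}_{T+k}\right)^2$ over a horizon of $H$ steps following the window, averaged over training windows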
3. Hyper-Parameterization and Training Regimen
BDM uses the following hyper-parameters and training protocol.
| Subsystem | Key Hyper-Parameters |
|---|---|
| Bi-LSTM | Hidden size: 64 per direction; layers: 2 |
| Denoising Autoencoder | Latent dimension: 32; noise injection; activation: ReLU |
| Transformer | Model dimension: 64; heads: 8; layers: 3; dropout: 0.1 |
Training uses the Adam optimizer with a base learning rate $\eta_0$, warmed up over the first 10% of training steps and decayed with the inverse square root of the step count thereafter. Mini-batches contain 32 samples, and training runs for at most 100 epochs with early stopping on validation loss.
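A sketch of the training configuration and schedule. The hyper-parameter values come from the table and text above; the linear warmup shape, the base learning rate `1e-3`, the step count, and the early-stopping `patience` are hypothetical placeholders, since they are not specified here.

```python
import math
from dataclasses import dataclass


@dataclass
class BDMConfig:
    """Hyper-parameters from the table; `patience` is a hypothetical placeholder."""
    lstm_hidden: int = 64       # per direction
    lstm_layers: int = 2
    dae_latent: int = 32
    d_model: int = 64
    n_heads: int = 8
    transformer_layers: int = 3
    dropout: float = 0.1
    batch_size: int = 32
    max_epochs: int = 100
    warmup_frac: float = 0.10   # warmup over the first 10% of steps
    patience: int = 10          # hypothetical early-stopping patience


def learning_rate(step: int, total_steps: int, base_lr: float, warmup_frac: float = 0.10) -> float:
    """Linear warmup to base_lr (assumed shape), then decay proportional to 1/sqrt(step)."""
    warmup = max(1, int(warmup_frac * total_steps))
    if step < warmup:
        return base_lr * (step + 1) / warmup
    return base_lr * math.sqrt(warmup / (step + 1))


class EarlyStopping:
    """Stop once validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


cfg = BDMConfig()
# 1e-3 is a hypothetical base learning rate; 1000 is a hypothetical total step count.
schedule = [learning_rate(s, 1000, base_lr=1e-3, warmup_frac=cfg.warmup_frac) for s in range(1000)]
```

The schedule peaks at `base_lr` on the last warmup step and then decays smoothly, which keeps early Adam updates small while moment estimates are still noisy.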
4. Empirical Evaluation and Quantitative Comparison
BDM is evaluated on a dataset comprising hourly EV charging loads from approximately 6,800 charging sessions (spanning December 2018–January 2020). The data is partitioned with an 80%/10%/10% train/validation/test split. Forecast horizons cover 24, 48, 72, 96, and 120 hours. Performance metrics include RMSE and MAE, averaged over 5 experimental runs. Benchmarked against standalone Transformer, CNN, RNN, LSTM, and GRU baselines, BDM achieves superior results on four out of five horizons.
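The split, multistep windowing, and metrics can be sketched as follows. This assumes a chronological split (common for time series, but not confirmed by the source) and a hypothetical 24-hour lookback window; the series is synthetic. Scores from the 5 runs would then simply be averaged.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float], train: float = 0.8, val: float = 0.1):
    """80/10/10 split in time order, so the test set follows the training data."""
    n = len(series)
    n_train, n_val = int(n * train), int(n * val)
    return (list(series[:n_train]),
            list(series[n_train:n_train + n_val]),
            list(series[n_train + n_val:]))


def make_windows(series: Sequence[float], lookback: int,
                 horizon: int) -> List[Tuple[List[float], List[float]]]:
    """(input window, next `horizon` values) pairs for direct multistep forecasting."""
    last_start = len(series) - lookback - horizon
    return [(list(series[i:i + lookback]), list(series[i + lookback:i + lookback + horizon]))
            for i in range(last_start + 1)]


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


hourly_load = [float(i % 24) for i in range(24 * 30)]  # synthetic stand-in: 30 days
train, val, test = chronological_split(hourly_load)
pairs = make_windows(test, lookback=24, horizon=24)  # hypothetical 24 h lookback
```

Each horizon (24 to 120 hours) would use its own `horizon` value, so longer horizons yield fewer windows from the same test segment.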
| Horizon (h) | BDM MAE | Transformer MAE | Relative MAE Change (BDM vs. Transformer) |
|---|---|---|---|
| 24 | 0.085 | 0.060 | +42% (Transformer better) |
| 48 | 0.066 | 0.103 | −36% |
| 72 | 0.069 | 0.103 | −33% |
| 96 | 0.060 | 0.110 | −45% |
| 120 | 0.089 | 0.139 | −36% |
On the 24-hour horizon, the standalone Transformer outperforms BDM (MAE 0.060 vs. 0.085); at every longer horizon, BDM reduces MAE by roughly one-third or more (Koohfar et al., 21 Sep 2025).
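The relative-change column can be recomputed from the reported MAE values as $(\mathrm{MAE}_{\mathrm{BDM}} - \mathrm{MAE}_{\mathrm{Transformer}}) / \mathrm{MAE}_{\mathrm{Transformer}}$; negative values mean BDM has lower error. A short check:

```python
# Reported MAE values (BDM, standalone Transformer) by horizon in hours.
reported_mae = {
    24: (0.085, 0.060),
    48: (0.066, 0.103),
    72: (0.069, 0.103),
    96: (0.060, 0.110),
    120: (0.089, 0.139),
}


def relative_change(model_mae: float, baseline_mae: float) -> float:
    """Percent change of model MAE relative to the baseline; negative means lower error."""
    return 100.0 * (model_mae - baseline_mae) / baseline_mae


changes = {h: relative_change(bdm, tf) for h, (bdm, tf) in reported_mae.items()}
for h, pct in changes.items():
    print(f"{h:>3} h: {pct:+.1f}%")
# Output:
#  24 h: +41.7%
#  48 h: -35.9%
#  72 h: -33.0%
#  96 h: -45.5%
# 120 h: -36.0%
```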
5. Design Rationale and Theoretical Motivation
The BDM architecture leverages the complementary strengths of each component:
- Bi-LSTM Embedding: Combines forward and backward context within the input window, capturing richer local features than unidirectional or feedforward encoders.
- Denoising Autoencoder: Reduces sensitivity to sensor noise and outliers by learning a compact, low-dimensional representation of the underlying dynamics that retains the dominant signal components.
- Transformer Encoder–Decoder: Models long-range dependencies via self-attention, which connects any two time steps directly and so avoids the long recurrent gradient paths that cause vanishing gradients in RNN-based predictors.
By chaining these components, BDM aims to be robust to spurious fluctuations, extract local patterns effectively, and capture both short-term and long-term dependencies.
6. Application Scope and Broader Context
BDM was developed for short-term EV charging load prediction, but its design is not specific to that domain: it could be applied to other multivariate time series that are noisy and exhibit both local and global temporal structure, although the reported evaluation covers EV data only. This hybrid design follows a broader trend in time-series research of combining recurrent, autoencoding, and attention mechanisms, which can outperform monolithic architectures (Koohfar et al., 21 Sep 2025). Adapting the denoising scheme or embedding size to a specific domain may further improve performance.
7. Limitations and Prospects
While BDM sets a strong empirical baseline for multi-horizon EV charging forecasts, its clear underperformance at the shortest (24-hour) horizon suggests that model selection should depend on the forecast horizon. Its complexity and staged training may also add computational overhead relative to standalone models. Future work could explore a unified architecture, adaptive noise schemes, or domain-aware regularization for multistep sequence forecasting.