FaultDiffusion: Few-Shot Fault Generation
- FaultDiffusion is a generative modeling framework that synthesizes fault time-series by leveraging abundant normal data and a few fault examples.
- It uses a two-stage training process with a diffusion backbone pre-trained on normal data followed by fine-tuning a positive–negative difference adapter for fault adaptation.
- The framework incorporates a diversity regularizer to prevent mode collapse, achieving strong, benchmarked performance on industrial fault datasets.
FaultDiffusion is a generative modeling framework for few-shot fault time-series generation developed for scenarios with abundant normal (healthy) multivariate sensor data and scarce annotated fault (anomalous) data. Addressing the challenge of generating diverse, realistic synthetic fault samples despite limited faulty examples, FaultDiffusion leverages a diffusion model backbone pre-trained on normal series, then fine-tunes a lightweight positive-negative difference adapter on a small set of fault traces while preventing mode collapse with a novel diversity regularizer. The approach is distinguished by its two-stage training, architectural innovations for domain adaptation, and strong empirical performance across industrial benchmark datasets (Xu et al., 19 Nov 2025).
1. Problem Formulation and Motivating Context
The fundamental problem addressed by FaultDiffusion is the reliable synthesis of fault time-series under an extreme data imbalance regime, typical in industrial equipment monitoring. The multivariate time-series is denoted $x \in \mathbb{R}^{T \times d}$, with a large normal dataset $\mathcal{D}_N$ and a small $K$-shot fault set $\mathcal{D}_F$, where $|\mathcal{D}_F| = K \ll |\mathcal{D}_N|$. The objective is to train a generator whose output distribution $p_\theta(x)$ closely matches the true fault distribution $p_F(x)$. This is formalized as
$$p_F(x) = p_N(x) + \Delta(x),$$
where $p_N(x)$ is the normal data distribution and $\Delta(x)$ is a learned correction capturing the domain shift (Xu et al., 19 Nov 2025).
2. Diffusion Model Backbone
FaultDiffusion builds on the Denoising Diffusion Probabilistic Model (DDPM) framework, which comprises a forward noising process and a learned reverse process. The forward process evolves as
$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\big),$$
with $t = 1, \dots, T$ and variance schedule $\{\beta_t\}_{t=1}^{T}$. The joint noising process admits the closed form
$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t)\, I\big),$$
where $\bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)$. The reverse process is parameterized as
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big).$$
Training proceeds by minimizing a noise-prediction loss derived from the ELBO:
$$\mathcal{L}_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\left[\big\|\epsilon - \epsilon_\theta(x_t, t)\big\|^2\right],$$
where $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ and $\epsilon \sim \mathcal{N}(0, I)$ (Xu et al., 19 Nov 2025).
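The forward noising and the simple noise-prediction objective above can be sketched in a few lines of NumPy. The linear beta schedule and the zero-returning stand-in for $\epsilon_\theta$ are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear variance schedule (an assumption, not the paper's choice).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # bar{alpha}_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def eps_theta(x_t, t):
    # Placeholder noise predictor; a real model is a neural net eps_theta(x_t, t).
    return np.zeros_like(x_t)

# One training step of the noise-prediction objective.
x0 = rng.standard_normal((24, 8))        # one series: length 24, 8 channels
t = int(rng.integers(0, T))
eps = rng.standard_normal(x0.shape)
x_t = q_sample(x0, t, eps)
loss = np.mean((eps - eps_theta(x_t, t)) ** 2)
```

Because the stand-in predictor returns zeros, the loss here reduces to the mean squared magnitude of the injected noise; a trained network would drive this toward the irreducible denoising error.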
3. Positive–Negative Difference Adapter
To enable few-shot adaptation from the normal to the fault domain, FaultDiffusion introduces a positive–negative difference adapter. During fine-tuning (on $\mathcal{D}_F$), only the adapter's parameters are updated while the diffusion backbone remains frozen. At each network layer $l$, the adapter receives as input the backbone hidden activations $h_l$ together with the accumulated outputs of the preceding adapters:
$$a_l = \mathrm{Adapter}_l\Big(h_l + \sum_{j<l} a_j\Big).$$
The adapter mechanism employs sliding-window multi-head attention, conferring sensitivity to local temporal anomalies. Its output is fused with the backbone activation by a residual connection:
$$h_l' = h_l + \lambda_l\, a_l,$$
where $\lambda_l$ is a learnable scaling factor. This enables effective modeling of the fault–normal difference distribution, conditioning the generation process on learned fault signatures (Xu et al., 19 Nov 2025).
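A minimal sketch of the adapter data flow described above. A single linear map stands in for the sliding-window attention, and all names (`DifferenceAdapter`, `frozen_backbone_layer`) and the accumulation scheme are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # hidden width (illustrative)

class DifferenceAdapter:
    """Stand-in for the positive-negative difference adapter.

    The real adapter uses sliding-window multi-head attention; a linear
    map plays that role here so the data flow stays visible.
    """
    def __init__(self, d):
        self.W = 0.01 * rng.standard_normal((d, d))  # trainable during fine-tuning
        self.scale = 0.1                             # learnable residual scale lambda_l

    def __call__(self, h, acc):
        # Adapter sees the backbone activation plus accumulated prior outputs.
        return (h + acc) @ self.W

def frozen_backbone_layer(h, W_frozen):
    # Backbone parameters stay fixed during fine-tuning.
    return np.tanh(h @ W_frozen)

# Stack of L layers: backbone output fused with adapter output residually.
L = 3
W_frozen = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L)]
adapters = [DifferenceAdapter(d) for _ in range(L)]

h = rng.standard_normal((1, d))
acc = np.zeros_like(h)                   # running sum of adapter outputs
for l in range(L):
    h_back = frozen_backbone_layer(h, W_frozen[l])
    a_l = adapters[l](h_back, acc)
    h = h_back + adapters[l].scale * a_l  # residual fusion: h' = h + lambda * a
    acc = acc + a_l
```

In practice only `W` and `scale` would receive gradients, which is what keeps the fine-tuning footprint small relative to the frozen backbone.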
4. Diversity Regularization and Training Procedure
To prevent mode collapse in the low-data regime, FaultDiffusion utilizes an inter-sample diversity loss:
$$\mathcal{L}_{\text{div}} = -\,\mathbb{E}\left[\big\|\epsilon_\theta^{(i)}(x_t, t) - \epsilon_\theta^{(j)}(x_t, t)\big\|^2\right],$$
where $\epsilon_\theta^{(i)}$ and $\epsilon_\theta^{(j)}$ are independent noise-prediction samples from the network for the same input. The final fine-tuning loss is
$$\mathcal{L} = \mathcal{L}_{\text{simple}} + \lambda_{\text{div}}\, \mathcal{L}_{\text{div}} + \lambda_{\text{reg}}\, \mathcal{L}_{\text{reg}},$$
with $\mathcal{L}_{\text{reg}}$ optionally enforcing an $\ell_2$ penalty on the adapter output, and $\lambda_{\text{div}}, \lambda_{\text{reg}}$ balancing the terms.
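One way the diversity term and the combined fine-tuning loss could be realized; the pairwise formulation and the weight values are assumptions for illustration, not values from the paper:

```python
import numpy as np

def diversity_loss(preds):
    """Negative mean pairwise squared distance between noise predictions.

    `preds` has shape (n, ...): n stochastic predictions for the same
    input (e.g. with dropout active). Minimizing this pushes them apart.
    """
    n = preds.shape[0]
    flat = preds.reshape(n, -1)
    dists = ((flat[:, None, :] - flat[None, :, :]) ** 2).mean(axis=-1)
    # Average over the n*(n-1) off-diagonal pairs (diagonal is zero).
    return -dists.sum() / (n * (n - 1))

def total_loss(eps, preds, adapter_out, lam_div=0.1, lam_reg=1e-3):
    l_noise = ((eps - preds.mean(axis=0)) ** 2).mean()  # denoising term
    l_div = diversity_loss(preds)                       # anti-collapse term
    l_reg = (adapter_out ** 2).mean()                   # L2 penalty on adapter output
    return l_noise + lam_div * l_div + lam_reg * l_reg
```

Note the diversity term is zero exactly when all predictions coincide, and strictly negative otherwise, so it only lowers the loss when the network's samples actually differ.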
The algorithm consists of:
- Fine-tuning: sampling mini-batches from $\mathcal{D}_F$, adding noise, passing through the frozen backbone and trainable adapter, evaluating losses, and updating adapter parameters.
- Sampling: iterative denoising from $x_T \sim \mathcal{N}(0, I)$, with the adapter correcting backbone outputs at each step (Xu et al., 19 Nov 2025).
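The sampling step follows standard DDPM ancestral sampling; a sketch with a placeholder noise predictor standing in for the backbone-plus-adapter network (schedule and shapes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_theta_adapted(x_t, t):
    # Stand-in for the frozen backbone corrected by the fine-tuned adapter.
    return np.zeros_like(x_t)

def sample(shape):
    """Ancestral DDPM sampling from x_T ~ N(0, I) down to x_0."""
    x = rng.standard_normal(shape)
    for t in range(T - 1, -1, -1):
        eps_hat = eps_theta_adapted(x, t)
        # Posterior mean: (x - beta_t / sqrt(1 - bar{alpha}_t) * eps) / sqrt(alpha_t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

x0 = sample((24, 8))  # one synthetic fault series: length 24, 8 channels
```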
5. Experimental Protocols and Evaluation Metrics
FaultDiffusion is evaluated on a custom industrial dataset (15 fault types), the Tennessee Eastman Process (TEP; 6 faults), and DAMADICS (4 valve-fault types), following a strict few-shot protocol with only a small number $K$ of fault examples per class. Baselines include TimeGAN, TimeVAE, Cot-GAN, and Diffusion-TS (the latter trained jointly on normal and fault examples).
The evaluation framework comprises:
- Context-FID: Local contextual Fréchet distance measuring authenticity.
- Correlational Score: Error on cross-correlation matrices.
- Discriminative Score: AUC of real vs synthetic classifier.
- Predictive Score: TSTR (train-synthetic, test-real) forecast error.
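A toy version of the TSTR predictive score from the list above, with a linear least-squares forecaster standing in for the small sequence model typically used; the function name, shapes, and the MAE choice are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def tstr_score(synthetic, real, horizon=1):
    """Train-on-Synthetic, Test-on-Real predictive score (MAE).

    Both arrays have shape (n_series, length, channels). A forecaster is
    fit on synthetic series only, then evaluated on held-out real series.
    """
    def make_xy(data):
        X = data[:, :-horizon, :].reshape(len(data), -1)  # history features
        y = data[:, -horizon:, :].reshape(len(data), -1)  # final steps to predict
        return X, y

    Xs, ys = make_xy(synthetic)
    Xr, yr = make_xy(real)
    W, *_ = np.linalg.lstsq(Xs, ys, rcond=None)  # fit on synthetic data only
    pred = Xr @ W
    return np.abs(pred - yr).mean()              # error measured on real data

synthetic = rng.standard_normal((64, 24, 4))
real = rng.standard_normal((32, 24, 4))
score = tstr_score(synthetic, real)
```

A low score indicates the synthetic data carries enough of the real dynamics for a model trained on it to forecast real series well.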
The following table summarizes results on the industrial dataset (sequence length 24, lower is better):
| Method | context-FID | Corr. Score | Disc. Score | Pred. Score |
|---|---|---|---|---|
| Cot-GAN | 6.336 | 142.72 | 0.436 | 0.133 |
| TimeGAN | 7.025 | 137.12 | 0.438 | 0.137 |
| TimeVAE | 5.990 | 134.21 | 0.438 | 0.115 |
| Diffusion-TS | 6.728 | 117.44 | 0.465 | 0.135 |
| Ours | 6.081 | 127.37 | 0.415 | 0.131 |
On public TEP/DAMADICS datasets, FaultDiffusion achieves the lowest context-FID and predictive errors in 8/10 fault settings and provides competitive correlational and discriminative scores (Xu et al., 19 Nov 2025).
6. Ablation Studies and Downstream Utility
Ablation results on the industrial set indicate that both the adapter and the diversity loss are critical: removing either component degrades context-FID relative to the full model, and removing both degrades it further. For downstream time-series classification, training a 15-way classifier on synthetic data yields 0.8933 accuracy, versus 0.7413 for Diffusion-TS (Xu et al., 19 Nov 2025).
This suggests FaultDiffusion’s generative outputs are both diverse and task-relevant for subsequent fault diagnosis pipelines.
7. Significance and Architectural Insights
FaultDiffusion demonstrates that pre-training a diffusion backbone on abundant normal data, then fine-tuning a compact, residual adapter to model the fault-normal domain gap, is highly effective in few-shot settings. The lightweight fine-tuning is particularly advantageous when fault annotations are rare and costly, which is common in industrial contexts. The explicit diversity regularizer addresses a major limitation of conventional generative methods when data is scarce, promoting better intra-class variability in generated samples.
A plausible implication is that such adapter-based approaches, combined with self-supervised pretraining on normal data and explicit diversity constraints, may generalize to other rare-event synthesis tasks beyond fault diagnosis (Xu et al., 19 Nov 2025).