FaultDiffusion: Few-Shot Fault Generation

Updated 26 December 2025
  • FaultDiffusion is a generative modeling framework that synthesizes fault time-series by leveraging abundant normal data and a few fault examples.
  • It uses a two-stage training process with a diffusion backbone pre-trained on normal data followed by fine-tuning a positive–negative difference adapter for fault adaptation.
  • The framework incorporates a diversity regularizer to prevent mode collapse, achieving strong, benchmarked performance on industrial fault datasets.

FaultDiffusion is a generative modeling framework for few-shot fault time-series generation developed for scenarios with abundant normal (healthy) multivariate sensor data and scarce annotated fault (anomalous) data. Addressing the challenge of generating diverse, realistic synthetic fault samples despite limited faulty examples, FaultDiffusion leverages a diffusion model backbone pre-trained on normal series, then fine-tunes a lightweight positive-negative difference adapter on a small set of fault traces while preventing mode collapse with a novel diversity regularizer. The approach is distinguished by its two-stage training, architectural innovations for domain adaptation, and strong empirical performance across industrial benchmark datasets (Xu et al., 19 Nov 2025).

1. Problem Formulation and Motivating Context

The fundamental problem addressed by FaultDiffusion is the reliable synthesis of fault time-series under an extreme data imbalance regime, typical in industrial equipment monitoring. The multivariate time-series is denoted $X_{1:\tau} = (x_1, \ldots, x_\tau) \in \mathbb{R}^{\tau \times d}$, with a large normal dataset $\mathcal{D}_N = \{ X^n_{1:\tau,i} \}_{i=1}^{N_n}$ and a small $K$-shot fault set $\mathcal{D}_F = \{ X^f_{1:\tau,j} \}_{j=1}^{K}$, where $K \ll N_n$. The objective is to train a generator $G$ such that its output distribution $\hat{p}_f(x)$ closely matches the true fault distribution $p_f(x)$. This is formalized as

$$p_f(x) = p_n(x) + \Delta_\theta(x),$$

where $p_n$ is the normal data distribution and $\Delta_\theta(x)$ is a learned correction capturing the domain shift (Xu et al., 19 Nov 2025).
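
For concreteness, the following is a minimal sketch of this data regime; the channel count and dataset sizes are illustrative assumptions (only the window length of 24 matches the benchmark setting reported later), not values prescribed by the paper.

```python
# Minimal sketch of the few-shot data regime: abundant normal windows D_N and a
# K-shot fault set D_F. Shapes are illustrative assumptions, not the paper's.
import numpy as np

tau, d = 24, 8          # window length and number of sensor channels (d assumed)
N_n, K = 10_000, 5      # abundant normal windows vs. a K-shot fault set

D_N = np.random.randn(N_n, tau, d)   # normal (healthy) multivariate windows
D_F = np.random.randn(K, tau, d)     # scarce annotated fault windows, K << N_n
```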

2. Diffusion Model Backbone

FaultDiffusion builds on the Denoising Diffusion Probabilistic Model (DDPM) framework, which comprises a forward noising process and a learned reverse process. The forward process evolves as

$$q(x_t \mid x_{t-1}) = \mathcal{N}\bigl(x_t;\, \sqrt{1 - \beta_t}\, x_{t-1},\, \beta_t I\bigr),$$

with $t = 1, \ldots, T$ and variance schedule $\{\beta_t\}$. Marginalizing over the intermediate steps gives the closed-form noising distribution

$$q(x_t \mid x_0) = \mathcal{N}\bigl(x_t;\, \sqrt{\bar\alpha_t}\, x_0,\, (1-\bar\alpha_t) I\bigr),$$

where $\bar\alpha_t = \prod_{s=1}^{t} (1 - \beta_s)$. The reverse process is parameterized as

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\bigl(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t)\bigr).$$

Training proceeds by minimizing a noise prediction loss derived from the ELBO:

$$\mathcal{L}_{\text{diff}} = \mathbb{E}_{t, x_0, \epsilon}\bigl[\,\| \epsilon - \epsilon_\theta(x_t, t) \|_2^2\,\bigr],$$

where $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ (Xu et al., 19 Nov 2025).
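
A minimal PyTorch-style sketch of this noise-prediction objective is given below. The linear variance schedule, $T = 1000$, and the callable `eps_model` are assumptions standing in for the paper's actual backbone and schedule.

```python
# Sketch of the DDPM noise-prediction loss L_diff, assuming eps_model(x_t, t)
# returns the predicted noise for a batch of time-series windows.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear variance schedule (assumed)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t = prod_s (1 - beta_s)

def diffusion_loss(eps_model, x0):
    """L_diff = E_{t, x0, eps} || eps - eps_theta(x_t, t) ||_2^2."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)        # uniform random timestep
    eps = torch.randn_like(x0)                             # target noise
    a_bar = alpha_bars.to(x0.device)[t].view(b, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # forward noising q(x_t | x_0)
    return ((eps - eps_model(x_t, t)) ** 2).mean()         # noise-prediction MSE
```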

3. Positive–Negative Difference Adapter

To enable few-shot adaptation from normal to fault domains, FaultDiffusion introduces a positive–negative difference adapter. During fine-tuning on $\mathcal{D}_F$, only the adapter’s parameters are updated while the diffusion backbone remains frozen. At each network layer $t$, the adapter receives as input the backbone hidden activations $h^{(t)}_{\mathrm{back}}$ and an accumulation of prior adapter outputs:

$$h^{(t)}_{\mathrm{in}} = h^{(t)}_{\mathrm{back}} + \sum_{k=1}^{t-1} h^{(k)}_{\mathrm{loc}}.$$

The adapter mechanism employs sliding-window multi-head attention, conferring sensitivity to local temporal anomalies. Its output $h^{(t)}_{\mathrm{loc}}$ is fused through a residual connection:

$$h^{(t+1)}_{\mathrm{back}} = h^{(t)}_{\mathrm{back}} + \alpha\, h^{(t)}_{\mathrm{loc}},$$

where $\alpha$ is a learnable scaling factor. This enables effective modeling of $\Delta_\theta(x) \approx p_f(x) - p_n(x)$, conditioning the generation process on learned fault signatures (Xu et al., 19 Nov 2025).
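
The following is a minimal sketch of one such adapter layer, assuming backbone hidden states of shape (batch, $\tau$, hidden); the window size, head count, and exact fusion points are assumptions rather than the authors' implementation.

```python
# Sketch of a positive-negative difference adapter layer: sliding-window
# multi-head attention over the accumulated adapter stream, fused back into the
# frozen backbone stream via a learnable residual scale alpha.
import torch
import torch.nn as nn

class DifferenceAdapterLayer(nn.Module):
    def __init__(self, hidden: int, window: int = 5, n_heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.alpha = nn.Parameter(torch.zeros(1))  # learnable residual scaling factor

    def _local_mask(self, tau: int, device) -> torch.Tensor:
        # Boolean mask: position i may only attend to positions within +/- window.
        idx = torch.arange(tau, device=device)
        return (idx[None, :] - idx[:, None]).abs() > self.window

    def forward(self, h_back: torch.Tensor, h_loc_sum: torch.Tensor):
        # h_in^(t) = h_back^(t) + sum_{k<t} h_loc^(k)   (accumulated adapter outputs)
        h_in = h_back + h_loc_sum
        mask = self._local_mask(h_in.shape[1], h_in.device)
        h_loc, _ = self.attn(h_in, h_in, h_in, attn_mask=mask)
        # h_back^(t+1) = h_back^(t) + alpha * h_loc^(t)   (residual fusion)
        return h_back + self.alpha * h_loc, h_loc
```

During fine-tuning, only the parameters of such adapter layers (including $\alpha$) would be trainable, with the backbone weights kept frozen.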

4. Diversity Regularization and Training Procedure

To prevent mode collapse in the low-data regime, FaultDiffusion utilizes an inter-sample diversity loss:

$$\mathcal{L}_{\mathrm{div}} = \mathbb{E}\bigl[\,\| s_1 - s_2 \|_2^2\,\bigr],$$

where $s_1, s_2$ are independent noise-prediction samples from the network for the same input. The final fine-tuning loss is

$$\mathcal{L} = \underbrace{\mathcal{L}_{\mathrm{diff}}}_{\text{denoising}} + \lambda_{\mathrm{adv}}\, \underbrace{\mathcal{L}_{\mathrm{adapter}}}_{\text{(optional) direct supervision of } \Delta} + \lambda_{\mathrm{div}}\, \underbrace{\mathcal{L}_{\mathrm{div}}}_{\text{diversity}},$$

with $\mathcal{L}_{\mathrm{adapter}}$ optionally enforcing an $\ell_2$ penalty on the adapter output, and $\lambda_{\mathrm{adv}}, \lambda_{\mathrm{div}}$ balancing the terms.
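
A minimal sketch of the diversity term and the combined fine-tuning objective follows, assuming the pair $(s_1, s_2)$ comes from two independent forward-noisings of the same clean window; the exact pair construction and the weights $\lambda_{\mathrm{adv}}, \lambda_{\mathrm{div}}$ are assumptions, not the paper's settings.

```python
# Sketch of the inter-sample diversity term and the combined fine-tuning loss.
import torch

def diversity_term(eps_model, x0, t, a_bar):
    # Two independent noise draws for the same clean window x0 (an assumption
    # about how (s1, s2) are formed; the paper's construction may differ).
    eps1, eps2 = torch.randn_like(x0), torch.randn_like(x0)
    s1 = eps_model(a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps1, t)
    s2 = eps_model(a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps2, t)
    return ((s1 - s2) ** 2).mean()  # E[ || s1 - s2 ||_2^2 ]

def finetune_loss(l_diff, l_adapter, l_div, lam_adv=0.1, lam_div=0.01):
    # L = L_diff + lambda_adv * L_adapter + lambda_div * L_div
    # (the lambda values here are placeholders, not the paper's settings)
    return l_diff + lam_adv * l_adapter + lam_div * l_div
```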

The algorithm consists of:

  • Fine-tuning: sampling mini-batches from $\mathcal{D}_F$, adding noise, passing through the frozen backbone and trainable adapter, evaluating the losses, and updating the adapter parameters.
  • Sampling: iterative denoising from $x_T \sim \mathcal{N}(0, I)$, with the adapter correcting the backbone outputs at each step (Xu et al., 19 Nov 2025).
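
A minimal sketch of the sampling step is shown below, using standard DDPM ancestral sampling with the adapter-corrected noise predictor; the choice of posterior variance ($\beta_t$) is an assumption.

```python
# Sketch of iterative denoising from x_T ~ N(0, I); eps_model is the frozen
# backbone with the trained adapter correcting its outputs at each step.
import torch

@torch.no_grad()
def sample(eps_model, shape, betas, device="cpu"):
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)                  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = eps_model(x, t_batch)                        # adapter-corrected noise estimate
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise                 # posterior variance beta_t (assumed)
    return x
```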

5. Experimental Protocols and Evaluation Metrics

FaultDiffusion is evaluated on a custom industrial dataset (15 fault types), the Tennessee Eastman Process (TEP; 6 faults), and DAMADICS (4 valve-fault types), following a strict few-shot protocol with $K \in \{1, 5\}$. Baselines include TimeGAN, TimeVAE, Cot-GAN, and Diffusion-TS (the latter trained jointly on normal and fault examples).

The evaluation framework comprises:

  • Context-FID: Local contextual Fréchet distance measuring authenticity.
  • Correlational Score: Error on cross-correlation matrices.
  • Discriminative Score: AUC of a classifier trained to distinguish real from synthetic samples (a minimal sketch follows this list).
  • Predictive Score: TSTR (train-on-synthetic, test-on-real) forecasting error.
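
As referenced in the list above, the following is a generic sketch of a discriminative score, assuming flattened windows and a logistic-regression discriminator; the paper's actual classifier and any score normalization are not reproduced here.

```python
# Sketch of a real-vs-synthetic discriminative score reported as an AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def discriminative_score(real: np.ndarray, synth: np.ndarray) -> float:
    # Label real windows 1 and synthetic windows 0; flatten each window to a vector.
    X = np.concatenate([real, synth]).reshape(len(real) + len(synth), -1)
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synth))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```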

The following table summarizes results on the industrial dataset (sequence length 24, lower is better):

| Method | Context-FID | Corr. Score | Disc. Score | Pred. Score |
|---|---|---|---|---|
| Cot-GAN | 6.336 | 142.72 | 0.436 | 0.133 |
| TimeGAN | 7.025 | 137.12 | 0.438 | 0.137 |
| TimeVAE | 5.990 | 134.21 | 0.438 | 0.115 |
| Diffusion-TS | 6.728 | 117.44 | 0.465 | 0.135 |
| FaultDiffusion (Ours) | 6.081 | 127.37 | 0.415 | 0.131 |

On public TEP/DAMADICS datasets, FaultDiffusion achieves the lowest context-FID and predictive errors in 8/10 fault settings and provides competitive correlational and discriminative scores (Xu et al., 19 Nov 2025).

6. Ablation Studies and Downstream Utility

Ablation results on the industrial set indicate the criticality of both the adapter and the diversity loss. Removing both yields a context-FID of $10.47$; the adapter only, $7.35$; the diversity loss only, $8.21$; and the full model, $5.12$. For downstream time-series classification, training a 15-way classifier on synthetic data gives $0.8933$ accuracy ($\sim 20\%$ higher than Diffusion-TS at $0.7413$) (Xu et al., 19 Nov 2025).

This suggests FaultDiffusion’s generative outputs are both diverse and task-relevant for subsequent fault diagnosis pipelines.

7. Significance and Architectural Insights

FaultDiffusion demonstrates that pre-training a diffusion backbone on abundant normal data, then fine-tuning a compact, residual adapter to model the fault-normal domain gap, is highly effective in few-shot settings. The lightweight fine-tuning is particularly advantageous when fault annotations are rare and costly, which is common in industrial contexts. The explicit diversity regularizer addresses a major limitation of conventional generative methods when data is scarce, promoting better intra-class variability in generated samples.

A plausible implication is that such adapter-based approaches, combined with self-supervised pretraining on normal data and explicit diversity constraints, may generalize to other rare-event synthesis tasks beyond fault diagnosis (Xu et al., 19 Nov 2025).
