
Physics-Guided Tiny-Mamba Transformer

Updated 5 February 2026
  • The paper introduces PG-TMT, a compact tri-branch encoder that integrates physics-guided spectral mapping and EVT-calibrated thresholds to enhance early fault detection in rotating machinery.
  • It fuses depthwise convolution, state-space modeling, and local transformer attention to capture micro-transients, long-range dynamics, and cross-channel resonances with high precision.
  • Experimental evaluations show robust PR-AUC and ROC AUC, low latency, and reliable transfer across domains under severe nonstationary conditions and class imbalance.

The Physics-Guided Tiny-Mamba Transformer (PG-TMT) is a compact, tri-branch encoder architecture designed for reliability-aware early fault warning in rotating machinery under nonstationary conditions, domain shifts, and severe class imbalance. PG-TMT integrates physics-guided priors, namely explicit temporal-to-spectral mappings aligned with mechanical defect frequencies, into a fusion of depthwise-separable convolution, state-space modeling, and attention-based resonance capture. Decision reliability comes from thresholds calibrated with extreme-value theory (EVT) and hysteretic alarm logic. Evaluation across public and industrial datasets demonstrates competitive precision-recall metrics, timeliness, robust transfer, and deployment feasibility (Li et al., 29 Jan 2026).

1. Tri-Branch Encoder Architecture

PG-TMT processes online windows of multichannel vibration signals, $\mathbf{x}_t\in\mathbb{R}^{C\times L}$, to produce a calibrated anomaly score $s_t\in[0,1]$ at each time $t$ (hop $h\ll L$, batch size 1). The encoder is organized into three complementary branches:

  • Depthwise-Separable Convolutional Stem (Micro-Transients):

A cascade of causal 1D depthwise convolutions (kernel size $k$, optional dilation $\delta$) is followed by pointwise ($1\times1$) convolutions that mix channels. At each layer $\ell$, for input $\mathbf{z}^{(\ell)}\in\mathbb{R}^{C\times L}$,

$$\tilde{\mathbf{z}}^{(\ell)}_{c,*} = \mathrm{Conv1D}\bigl(\mathbf{z}^{(\ell)}_{c,*};\,k,\delta\bigr),\qquad \mathbf{z}^{(\ell+1)}_{f,*} = \sum_{c=1}^{C} w^{(\ell)}_{f,c}\,\tilde{\mathbf{z}}^{(\ell)}_{c,*}.$$

The stem's receptive field is tuned to sub-millisecond, impact-like transients, and its output is the micro-transient feature map passed on to fusion.
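A minimal, dependency-free Python sketch of one stem layer and of the receptive-field arithmetic follows. It is an illustration under stated assumptions, not the authors' code: the function names, kernel values, the dilation schedule (1, 2, 4) and the 48 kHz sampling rate are hypothetical. For a stride-1 stack of dilated causal convolutions the receptive field is $1+\sum_\ell (k-1)\delta_\ell$ samples, which is how $k$ and $\delta$ can be chosen to cover sub-millisecond transients.

```python
from typing import List, Sequence

Signal = List[float]


def causal_depthwise_conv1d(x: Signal, kernel: Sequence[float], dilation: int = 1) -> Signal:
    """Causal dilated convolution of a single channel with zero left-padding.

    y[t] = sum_j kernel[j] * x[t - j * dilation]; samples before t = 0 count as
    zero, so y[t] never depends on future inputs.
    """
    out: Signal = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            i = t - j * dilation
            if i >= 0:
                acc += w * x[i]
        out.append(acc)
    return out


def depthwise_separable_layer(
    z: List[Signal],
    kernels: List[Sequence[float]],
    pointwise: List[List[float]],
    dilation: int = 1,
) -> List[Signal]:
    """One stem layer: per-channel causal conv, then 1x1 (pointwise) channel mixing.

    z is C x L, kernels[c] filters channel c only, and pointwise[f][c] plays the
    role of w_{f,c}, mixing filtered channel c into output feature f.
    """
    z_tilde = [causal_depthwise_conv1d(zc, kc, dilation) for zc, kc in zip(z, kernels)]
    length = len(z[0])
    return [
        [sum(w_fc * z_tilde[c][t] for c, w_fc in enumerate(row)) for t in range(length)]
        for row in pointwise
    ]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (samples) of a stride-1 stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def receptive_field_ms(kernel_size: int, dilations: Sequence[int], fs_hz: float) -> float:
    """The same receptive field expressed in milliseconds at sampling rate fs_hz."""
    return 1000.0 * receptive_field(kernel_size, dilations) / fs_hz


# Hypothetical configuration: k = 3, dilations 1, 2, 4, sampled at 48 kHz.
print(receptive_field(3, [1, 2, 4]))               # 15 samples
print(receptive_field_ms(3, [1, 2, 4], 48_000.0))  # 0.3125 ms
```

Doubling the dilation at each layer grows the receptive field geometrically while keeping the per-layer kernel, and therefore the parameter count, small.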

  • Tiny-Mamba State-Space Branch (Long-Range Dynamics):

A gated linear state-space model captures near-linear degradation over hundreds to thousands of timesteps. It takes a channel-reduced input, propagates a latent state through a linear recurrence, and modulates the update with learned gates. Stability is enforced by constraining the continuous-time state matrix and discretizing it so that the discrete-time transition remains contractive. The branch output carries these long-range dynamics to fusion.
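The recurrence below sketches one common way to build such a branch, assuming a diagonal state matrix parameterized as $A=-\exp(a)$ and zero-order-hold discretization $\bar A=\exp(\Delta A)$ with a softplus step $\Delta>0$, similar to the parameterization popularized by S4D and Mamba. The exact gating and parameterization in PG-TMT may differ, and all names here are hypothetical. With a strictly negative $A$ and positive $\Delta$, every discrete pole lies in $(0,1)$, which is the contractive behavior the stability constraint is meant to guarantee.

```python
import math
from typing import List, Sequence, Tuple


def softplus(v: float) -> float:
    """Numerically stable log(1 + exp(v)); keeps the step size positive."""
    return math.log1p(math.exp(-abs(v))) + max(v, 0.0)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def discretize(a_raw: Sequence[float], b: Sequence[float], delta_raw: float) -> Tuple[List[float], List[float]]:
    """Zero-order-hold discretization of a diagonal SSM with A = -exp(a_raw).

    Every continuous pole is strictly negative and the step is positive, so each
    discrete pole exp(delta * a) lies in (0, 1) and the recurrence is stable.
    """
    delta = softplus(delta_raw)
    a_bar, b_bar = [], []
    for ar, bn in zip(a_raw, b):
        a = -math.exp(ar)                  # continuous-time pole, always < 0
        ab = math.exp(delta * a)           # discrete-time pole in (0, 1)
        a_bar.append(ab)
        b_bar.append((ab - 1.0) / a * bn)  # exact ZOH input gain for diagonal A
    return a_bar, b_bar


def gated_ssm_scan(
    u: Sequence[float],
    a_raw: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
    delta_raw: float,
    gate_w: float,
    gate_b: float,
) -> List[float]:
    """Run h_t = A_bar * h_{t-1} + B_bar * u_t and y_t = sigmoid(gate) * (C . h_t)."""
    a_bar, b_bar = discretize(a_raw, b, delta_raw)
    h = [0.0] * len(a_bar)
    ys = []
    for u_t in u:
        h = [ab * hn + bb * u_t for ab, hn, bb in zip(a_bar, h, b_bar)]
        y = sum(cn * hn for cn, hn in zip(c, h))
        ys.append(sigmoid(gate_w * u_t + gate_b) * y)
    return ys
```

Under a constant input each state settles at $-b/a$ whatever the step size, so $\Delta$ controls how quickly the branch forgets rather than where it settles.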

  • Local Transformer (Cross-Channel Resonances):

Within each head, self-attention is restricted to a causal local window, so each position attends only to itself and a bounded span of preceding positions. The resulting branch output encodes cross-channel resonances.
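A single-head, pure-Python sketch of causal local attention is given below. The window size and toy inputs are hypothetical, and how PG-TMT forms queries, keys and values from the channels is not specified here. Each row of the returned attention matrix is a distribution that is zero outside the causal window.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def local_causal_attention(
    q: Sequence[Sequence[float]],
    k: Sequence[Sequence[float]],
    v: Sequence[Sequence[float]],
    window: int,
) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention with a causal local window.

    Position t attends only to positions max(0, t - window + 1) .. t. Returns the
    outputs and the full attention matrix (zeros outside the window).
    """
    d = len(q[0])
    n = len(q)
    outputs: Matrix = []
    weights: Matrix = []
    for t in range(n):
        lo = max(0, t - window + 1)
        span = range(lo, t + 1)
        scores = [sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d) for s in span]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(sc - peak) for sc in scores]
        total = sum(exps)
        row = [0.0] * n
        for s, e in zip(span, exps):
            row[s] = e / total
        weights.append(row)
        outputs.append([sum(row[s] * v[s][j] for s in span) for j in range(len(v[0]))])
    return outputs, weights
```

Restricting attention to a window keeps the cost linear in sequence length for a fixed window, which matters for the batch-size-1 streaming setting.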

The three branch outputs are concatenated and fused by a gated residual connection. A local attention distribution, a Jensen–Shannon discrepancy term, and a final score $s_t\in[0,1]$ that incorporates both anomaly evidence and this discrepancy complete the inference pipeline.
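One possible realization of the fusion and scoring steps is sketched below. The gated-residual form $\mathbf{h}+\sigma(W_g\mathbf{h}+b_g)\odot\tanh(W_u\mathbf{h}+b_u)$, the hypothetical weights `alpha`, `beta` and `bias`, and the choice of which two distributions enter the Jensen–Shannon term are assumptions for illustration. Only the use of a gated residual, a JS discrepancy and a score in $[0,1]$ comes from the description above.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gated_residual_fusion(
    branches: Sequence[Sequence[float]],
    w_gate: Sequence[Sequence[float]],
    b_gate: Sequence[float],
    w_upd: Sequence[Sequence[float]],
    b_upd: Sequence[float],
) -> List[float]:
    """Concatenate branch features h, then return h + sigmoid(Wg h + bg) * tanh(Wu h + bu)."""
    h = [x for branch in branches for x in branch]
    gate = [sigmoid(z + b) for z, b in zip(matvec(w_gate, h), b_gate)]
    update = [math.tanh(z + b) for z, b in zip(matvec(w_upd, h), b_upd)]
    return [hi + gi * ui for hi, gi, ui in zip(h, gate, update)]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in nats: 0 for identical inputs, ln 2 for disjoint supports."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0.0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def anomaly_score(evidence: float, discrepancy: float, alpha: float = 1.0, beta: float = 1.0, bias: float = 0.0) -> float:
    """Hypothetical score in (0, 1) combining an evidence logit with the JS discrepancy."""
    return sigmoid(alpha * evidence + beta * discrepancy + bias)
```

The residual path means that, with the gate closed, the fused vector simply equals the concatenated branch features, so an untrained gate cannot erase branch information.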

2. Physically Guided Temporal–Spectral Alignment

PG-TMT imposes an explicit temporal-to-spectral mapping that analytically connects learned temporal attention to classical fault-order bands, that is, frequencies determined by bearing geometry and shaft speed.

  • Spectral Attention:

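Let the sampling rate and the window length $L$ define the discrete frequency grid of each window. Spectral attention is then computed by mapping the learned temporal attention onto this grid.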
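The exact mapping is not spelled out here, so the sketch below uses a hypothetical stand-in: weight the window by its temporal attention, take a one-sided DFT on the grid $f_k = k f_s/L$ (with $f_s$ the sampling rate), and normalize the power into a distribution over bins. It only illustrates how a temporal attention vector can become a spectral distribution that is comparable with fault-order bands.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def spectral_attention(x: Sequence[float], attn: Sequence[float], fs_hz: float) -> Tuple[List[float], List[float]]:
    """Hypothetical temporal-to-spectral mapping.

    Weights the window by the temporal attention, takes a one-sided DFT on the
    grid f_k = k * fs / L (k = 0 .. L // 2) and normalizes the power spectrum to
    a probability distribution over frequency bins.
    """
    n = len(x)
    y = [a * v for a, v in zip(attn, x)]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coef = sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs_hz / n)
        power.append(abs(coef) ** 2)
    total = sum(power)
    if total == 0.0:  # silent window: fall back to a uniform distribution
        return freqs, [1.0 / len(power)] * len(power)
    return freqs, [p / total for p in power]
```

The naive $O(L^2)$ DFT keeps the sketch self-contained; an FFT would be used in practice.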

  • Fault Orders and Band Mask:

Classical bearing defect frequencies:

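$$\begin{aligned} f_{\mathrm{FTF}} &= \tfrac{f_r}{2}\Bigl(1-\tfrac{d}{D}\cos\phi\Bigr), & f_{\mathrm{BPFO}} &= \tfrac{N_b f_r}{2}\Bigl(1-\tfrac{d}{D}\cos\phi\Bigr),\\ f_{\mathrm{BPFI}} &= \tfrac{N_b f_r}{2}\Bigl(1+\tfrac{d}{D}\cos\phi\Bigr), & f_{\mathrm{BSF}} &= \tfrac{D f_r}{2d}\Bigl(1-\bigl(\tfrac{d}{D}\cos\phi\bigr)^2\Bigr), \end{aligned}$$

where $f_r$ is the shaft rotation frequency, $N_b$ the number of rolling elements, $d$ the rolling-element diameter, $D$ the pitch diameter, and $\phi$ the contact angle.

A Gaussian mixture built from the primary fault orders, their side-bands, and the windowing parameters forms a band mask that highlights the frequencies of interest.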
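The classical formulas and a Gaussian-mixture band mask can be sketched as follows. The bearing numbers in the test are illustrative, and the mask design (harmonics 1 to 3, side-bands at $\pm f_r$, a 2 Hz Gaussian width) and the function names are hypothetical choices, not the paper's windowing parameters.

```python
import math
from typing import Dict, List, Sequence


def bearing_defect_frequencies(
    fr_hz: float, n_balls: int, d_ball: float, d_pitch: float, contact_angle_rad: float = 0.0
) -> Dict[str, float]:
    """Classical kinematic defect frequencies (Hz) for a shaft rotating at fr_hz."""
    ratio = (d_ball / d_pitch) * math.cos(contact_angle_rad)
    return {
        "FTF": 0.5 * fr_hz * (1.0 - ratio),
        "BPFO": 0.5 * n_balls * fr_hz * (1.0 - ratio),
        "BPFI": 0.5 * n_balls * fr_hz * (1.0 + ratio),
        "BSF": 0.5 * fr_hz * (d_pitch / d_ball) * (1.0 - ratio ** 2),
    }


def gaussian_band_mask(
    freqs: Sequence[float],
    orders_hz: Sequence[float],
    sideband_hz: float,
    n_harmonics: int = 3,
    width_hz: float = 2.0,
) -> List[float]:
    """Hypothetical mask: Gaussians at each harmonic of each order and at
    +/- sideband_hz around it, normalized to a distribution over freqs."""
    centers = [
        m * f0 + off
        for f0 in orders_hz
        for m in range(1, n_harmonics + 1)
        for off in (0.0, -sideband_hz, sideband_hz)
    ]
    mask = [sum(math.exp(-0.5 * ((f - c) / width_hz) ** 2) for c in centers) for f in freqs]
    total = sum(mask)
    return [v / total for v in mask]
```

Because all four orders scale linearly with $f_r$, recomputing them from a tachometer reading keeps the mask aligned under speed drift.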

Smoothing the spectral-attention distribution and the mask distribution yields a physics-based alignment penalty between the two, together with a band-alignment score that quantifies the physics-grounded plausibility of the model's attention.
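Because the penalty and score formulas are not given explicitly here, the sketch assumes moving-average smoothing with a small floor, a Jensen–Shannon divergence as the alignment penalty, and the score $1 - \mathrm{JS}/\ln 2 \in [0,1]$. All three are hypothetical choices that only show how a penalty and a bounded plausibility score can be derived from the two smoothed distributions.

```python
import math
from typing import List, Sequence, Tuple


def smooth(p: Sequence[float], radius: int = 1, eps: float = 1e-6) -> List[float]:
    """Moving-average smoothing plus an eps floor, renormalized to sum to 1."""
    n = len(p)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(p[lo:hi]) / (hi - lo) + eps)
    total = sum(out)
    return [v / total for v in out]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in nats, bounded above by ln 2."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0.0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def band_alignment(spectral: Sequence[float], mask: Sequence[float]) -> Tuple[float, float]:
    """Return (penalty, score) with penalty = JS(P~, Q~) and score = 1 - penalty / ln 2."""
    p, q = smooth(spectral), smooth(mask)
    penalty = js_divergence(p, q)
    return penalty, 1.0 - penalty / math.log(2.0)
```

The eps floor keeps both distributions strictly positive, so the divergence stays finite even when attention falls entirely outside the fault bands.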

3. EVT-Calibrated Reliability-Aware Decision Logic

PG-TMT translates raw anomaly scores into calibrated alarms with a controlled false-alarm intensity, using EVT-based modeling of exceedances of healthy-state scores.

  • Peaks-Over-Threshold Extreme-Value Modeling:

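On healthy calibration segments, scores above a high quantile $u$ are modeled by a generalized Pareto distribution (GPD) with shape $\xi$ and scale $\sigma$, and exceedance times are approximated by a Poisson process of rate $\lambda_u$. The on-threshold $\tau_{\mathrm{on}}$ that meets a target false-alarm intensity $\lambda^\star$ is

$$\tau_{\mathrm{on}} = u + \frac{\sigma}{\xi}\left[\left(\frac{\lambda_u}{\lambda^\star}\right)^{\xi} - 1\right],$$

with the limiting case $\xi\to 0$ yielding the logarithmic form $\tau_{\mathrm{on}} = u + \sigma\ln(\lambda_u/\lambda^\star)$.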
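The threshold formula can be checked numerically. The sketch below, with hypothetical function names, inverts the Poisson–GPD exceedance rate $\lambda_u\,(1+\xi(z-u)/\sigma)^{-1/\xi}$ at the target $\lambda^\star$. For the fit it uses a simple method-of-moments GPD estimator (valid for $\xi<1/2$) in place of whatever estimator the paper uses; a maximum-likelihood fit such as SciPy's `genpareto.fit` is a common alternative.

```python
import math
from typing import Sequence, Tuple


def fit_gpd_moments(excesses: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments GPD fit (xi, sigma) to excesses over u; valid for xi < 1/2."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((e - mean) ** 2 for e in excesses) / (n - 1)
    ratio = mean * mean / var
    return 0.5 * (1.0 - ratio), 0.5 * mean * (1.0 + ratio)


def evt_on_threshold(u: float, sigma: float, xi: float, lam_u: float, lam_star: float, tol: float = 1e-9) -> float:
    """Score level whose Poisson exceedance rate equals lam_star (requires lam_star < lam_u)."""
    r = lam_u / lam_star
    if abs(xi) < tol:  # xi -> 0 limit: exponential tail, logarithmic form
        return u + sigma * math.log(r)
    return u + (sigma / xi) * (r ** xi - 1.0)


def exceedance_rate(z: float, u: float, sigma: float, xi: float, lam_u: float, tol: float = 1e-9) -> float:
    """Rate of scores above z >= u implied by the GPD tail and the base rate lam_u."""
    if abs(xi) < tol:
        return lam_u * math.exp(-(z - u) / sigma)
    base = 1.0 + xi * (z - u) / sigma
    return lam_u * base ** (-1.0 / xi) if base > 0.0 else 0.0
```

Here $\lambda_u$ would be estimated as the number of calibration exceedances of $u$ divided by the calibration duration (for example in events per hour), so that $\lambda^\star$ is expressed in the same unit.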

  • Dual-Threshold Hysteresis and Hold Time:

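To suppress spurious frame-level alarms, the decision logic applies dual-threshold hysteresis: an alarm switches on when the score crosses the on-threshold and switches off only when it drops below a lower off-threshold. Episodes must also last a minimum hold time, and episodes separated by less than a merge gap are combined. The resulting alarm logic produces episodes whose empirical rate tracks the prescribed false-alarm intensity, including under speed drift when the threshold is RPM-adapted.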
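The hysteresis, hold and merge steps can be sketched on a score sequence as follows. Thresholds, hold time and merge gap are expressed in frames and are hypothetical, and merging before applying the hold time is one reasonable ordering, not necessarily the paper's.

```python
from typing import List, Sequence, Tuple

Episode = Tuple[int, int]  # half-open [start, end) frame indices


def hysteresis_episodes(scores: Sequence[float], tau_on: float, tau_off: float) -> List[Episode]:
    """Alarm turns on at score >= tau_on and turns off only once score < tau_off."""
    if tau_off > tau_on:
        raise ValueError("tau_off must not exceed tau_on")
    episodes: List[Episode] = []
    start = None
    for t, s in enumerate(scores):
        if start is None and s >= tau_on:
            start = t
        elif start is not None and s < tau_off:
            episodes.append((start, t))
            start = None
    if start is not None:  # episode still open at the end of the stream
        episodes.append((start, len(scores)))
    return episodes


def merge_and_hold(episodes: Sequence[Episode], merge_gap: int, min_hold: int) -> List[Episode]:
    """Merge episodes closer than merge_gap frames, then drop those shorter than min_hold."""
    merged: List[Episode] = []
    for st, en in episodes:
        if merged and st - merged[-1][1] < merge_gap:
            merged[-1] = (merged[-1][0], en)
        else:
            merged.append((st, en))
    return [(st, en) for st, en in merged if en - st >= min_hold]


def alarm_intensity_per_hour(episodes: Sequence[Episode], n_frames: int, hop_seconds: float) -> float:
    """Episodes per hour of monitored signal."""
    return len(episodes) / (n_frames * hop_seconds / 3600.0)


scores = [0.1, 0.9, 0.6, 0.9, 0.2, 0.1, 0.95, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1]
raw = hysteresis_episodes(scores, tau_on=0.8, tau_off=0.5)  # [(1, 4), (6, 7), (10, 13)]
print(merge_and_hold(raw, merge_gap=3, min_hold=2))         # [(1, 7), (10, 13)]
```

In this example the 0.6 frame between the two thresholds keeps the first episode open instead of splitting it, which is the chatter-suppression effect of hysteresis.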

4. Experimental Design and Evaluation Protocols

Evaluation follows strict leakage-free, right-censored streaming protocols emphasizing reliable, domain-robust deployment.

  • Streaming Protocol:

Sliding windows of length $L$ and hop $h$ are processed with batch size 1, and a burn-in period initializes the model state. Data splits are disjoint at the machine, load, speed, and sensor level, and no window crosses a split boundary. Per-channel normalization statistics are fitted on the training split only.
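A sketch of the streaming loop is shown below, assuming that the burn-in is counted in windows and that normalization uses a per-channel mean and standard deviation. The function names and both of these choices are hypothetical.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Stats = List[Tuple[float, float]]


def fit_channel_stats(train: Sequence[Sequence[float]]) -> Stats:
    """Per-channel mean and std, computed from the training split only."""
    stats = []
    for ch in train:
        mean = sum(ch) / len(ch)
        # a constant channel has std 0; fall back to 1.0 to avoid dividing by zero
        std = math.sqrt(sum((v - mean) ** 2 for v in ch) / len(ch)) or 1.0
        stats.append((mean, std))
    return stats


def stream_windows(
    x: Sequence[Sequence[float]], length: int, hop: int, burn_in: int, stats: Stats
) -> Iterator[Tuple[int, List[List[float]], bool]]:
    """Yield (end_index, normalized C x L window, in_burn_in), one window at a time."""
    n = len(x[0])
    for count, end in enumerate(range(length, n + 1, hop)):
        window = [
            [(v - mean) / std for v in ch[end - length:end]]
            for ch, (mean, std) in zip(x, stats)
        ]
        yield end, window, count < burn_in
```

Windows flagged as burn-in would update the model state but be excluded from scoring, and the statistics passed in must come from `fit_channel_stats` on training data so that no test information leaks into normalization.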

  • Timeliness and Right-Censoring:

Detection time is right-censored if no alarm occurs before the end of a run. Timeliness is estimated with the Kaplan–Meier estimator, and mean and median MTTD are reported with confidence intervals.
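A compact Kaplan–Meier estimator for detection times is sketched below, with runs that end without an alarm entering as right-censored observations. It reports the median time to detection (the first time the survival curve drops to 0.5 or below); confidence intervals, for example via Greenwood's formula, are omitted for brevity, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], detected: Sequence[bool]) -> List[Tuple[float, float]]:
    """Survival curve S(t) = P(no detection by t) as (t, S(t)) steps at detection times.

    detected[i] is False when run i ended without an alarm (right-censored).
    """
    data = sorted(zip(times, detected))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk
            curve.append((t, surv))
        at_risk -= events + censored  # censored runs leave the risk set after time t
    return curve


def km_median(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Median time to detection, or None if S(t) never reaches 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

Treating undetected runs as censored rather than dropping them avoids the optimistic bias of averaging only over runs that happened to raise an alarm.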

  • Datasets:
    • CWRU bearing data (speeds, loads, rigs)
    • Paderborn University (seeded faults, speed and torque conditions, cross-rig)
    • XJTU-SY run-to-failure (chronological splits)
    • Industrial pilot (in-service rotating machinery)
  • Metrics:
    • Precision–Recall AUC (PR-AUC) under severe class imbalance
    • ROC AUC
    • Mean time-to-detect (MTTD) at matched false-alarm intensity
    • Alarm intensity (episodes/hour, hysteresis+merge logic)
    • Cross-domain transfer: AUC and MTTD retention under directed shifts, and gains from few-shot adaptation
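For reference, step-wise PR-AUC (average precision) can be computed as below. This is the standard definition $\sum_k (R_k - R_{k-1})P_k$, shown as a self-contained sketch rather than the paper's evaluation code; the function name is hypothetical, and tied scores are broken by input order here, which real evaluation code may handle differently.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise PR-AUC: mean of the precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            ap += tp / rank  # recall rises by 1 / n_pos here, weighted by precision
    return ap / n_pos
```

Unlike ROC AUC, this metric is dominated by how well the rare fault windows are ranked, which is why it is the headline metric under severe class imbalance.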

5. Key Results and Ablation Findings

  • Detection Performance:

    • PR-AUC of approximately 0.94–0.96 and ROC AUC of 0.97–0.99 across CWRU, Paderborn, and XJTU-SY, with graceful degradation down to 0 dB SNR.
    • Mean MTTD of about 28–33 s on clean signals, increasing to 49–61 s at 0 dB SNR, at the matched target alarm intensity (events/hour).
    • Empirical false-alarm intensity stays within a narrow margin (in events/hour) of the target and remains stable under RPM drift.
  • Transfer Across Domains:
    • AUC retention of around 0.95 for cross-load and cross-speed shifts and MTTD retention of around 0.9; transfer across sensors, rigs, and datasets is also robust.
    • Few-shot adaptation (1–5% labels) recovers nearly oracle performance.
  • Ablation and Latency:
    • Removing any encoder branch or physics prior degrades PR-AUC, increases FAR, or worsens MTTD.
    • Excluding EVT/hysteresis disrupts intensity matching and increases chatter.
    • Latency: roughly 10 ms at the median (p50) and roughly 12 ms at p90/p99 on CPU/Jetson; the model has 0.8M parameters and requires 0.28 GFLOPs.

6. Significance, Applications, and Interpretation

PG-TMT combines physically aligned representation learning with calibrated, interpretable, and operationally robust early fault warnings for reliability-centric prognostics and health management. Its fusion of transient detection, slow-trend modeling, cross-channel resonance capture, and analytic attention-band alignment is directly interpretable in terms of vibrational fault physics. The EVT-calibrated, hysteretic alarm logic gives explicit, calibrated control over false-alarm intensity and episode integrity under nonstationary and imbalanced conditions. Demonstrated performance across public benchmarks and real-world pilots, together with robustness to domain shifts and low-SNR conditions, establishes PG-TMT as a deployment-ready solution for industrial rotating machinery monitoring (Li et al., 29 Jan 2026).

