Physics-Guided Tiny-Mamba Transformer

Updated 5 February 2026

The paper introduces PG-TMT, a compact tri-branch encoder that integrates physics-guided spectral mapping and EVT-calibrated thresholds to enhance early fault detection in rotating machinery.
It fuses depthwise convolution, state-space modeling, and local transformer attention to capture micro-transients, long-range dynamics, and cross-channel resonances with high precision.
Experimental evaluations report high PR-AUC and ROC AUC, low latency, and reliable cross-domain transfer under nonstationary conditions and severe class imbalance.

The Physics-Guided Tiny-Mamba Transformer (PG-TMT) is a compact, tri-branch encoder architecture designed for reliability-aware early fault warning in rotating machinery under nonstationary conditions, domain shifts, and severe class imbalance. PG-TMT integrates physically guided priors—explicit temporal-to-spectral mappings aligned with mechanical defect frequencies—into a fusion of depthwise-separable convolution, state-space modeling, and attention-based resonance capture. Decision reliability is ensured through extreme-value theory (EVT) calibrated thresholds and hysteretic alarm logic. Evaluation across public and industrial datasets demonstrates competitive precision-recall metrics, timeliness, robust transfer, and deployment feasibility (Li et al., 29 Jan 2026).

1. Tri-Branch Encoder Architecture

PG-TMT processes online windows of multichannel vibration signals, $\mathbf{x}_t\in\mathbb{R}^{C\times L}$, to produce a calibrated anomaly score $s_t\in[0,1]$ at each time $t$ (hop $h\ll L$, batch size 1). The encoder is organized into three complementary branches:

Depthwise-Separable Convolutional Stem (Micro-Transients):

A cascade of causal 1D depthwise convolutions (kernel size $k$, optional dilation $\delta$) is followed by pointwise ($1\times1$) convolutions that mix channels. At each layer $\ell$, for input $\mathbf{z}^{(\ell)}\in\mathbb{R}^{C\times L}$,

$\tilde{\mathbf{z}}^{(\ell)}_{c,*} = \mathrm{Conv1D}\bigl(\mathbf{z}^{(\ell)}_{c,*};\,k,\delta\bigr),\quad \mathbf{z}^{(\ell+1)}_{f,*} = \sum_{c=1}^C w^{(\ell)}_{f,c}\,\tilde{\mathbf{z}}^{(\ell)}_{c,*}.$

The receptive field is tuned for sub-millisecond impact-like transients. The branch output is a feature sequence that encodes these micro-transients.
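As an illustration of the depthwise-then-pointwise structure in the layer equation, the following is a minimal pure-Python sketch, not the paper's implementation. The function names, the left zero-padding, and the receptive-field helper (which uses the standard formula for stacked dilated convolutions with a shared kernel size) are assumptions made for this example.

```python
def causal_depthwise_conv(x, kernels, dilation=1):
    """Filter each channel with its own kernel (no channel mixing).

    x: list of C channels, each a list of L samples.
    kernels: one length-k kernel per channel.
    Implicit left zero-padding keeps the output causal and of length L.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        y = []
        for t in range(len(channel)):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = t - j * dilation  # current and past samples only
                if idx >= 0:
                    acc += w * channel[idx]
            y.append(acc)
        out.append(y)
    return out


def pointwise_conv(x, weights):
    """1x1 convolution: weights[f][c] mixes C input channels into F outputs."""
    length = len(x[0])
    return [
        [sum(w_fc * x[c][t] for c, w_fc in enumerate(row)) for t in range(length)]
        for row in weights
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Two channels, each carrying a unit impulse at a different time step.
x = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
depthwise = causal_depthwise_conv(x, [[1.0, 0.5], [1.0, 0.5]])
mixed = pointwise_conv(depthwise, [[1.0, 1.0]])
```

Dividing the receptive field in samples by the sampling rate gives its duration, which is the quantity to compare against the expected length of an impact transient.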

Tiny-Mamba State-Space Branch (Long-Range Dynamics):

A gated, linear state-space model captures near-linear degradation over hundreds or thousands of timesteps. It takes a channel-reduced input, maintains a latent state through a linear recurrence, and modulates the update with learned gates.

Stability is enforced by constraining the state dynamics before discretization, which keeps the discretized recurrence stable. The branch output is a feature sequence that encodes long-range dynamics.
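One common way to obtain a stable gated linear recurrence is sketched below. It is an assumption-laden illustration rather than the paper's parameterization: the negative-softplus constraint, the zero-order-hold discretization, the scalar sigmoid input gate, and the mean readout are all choices made for this example, and every name is hypothetical.

```python
import math


def softplus(v):
    return math.log1p(math.exp(v))


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def discretize(raw_a, step):
    """Map an unconstrained parameter to a stable discrete-time pole.

    a = -softplus(raw_a) is strictly negative, so a_bar = exp(step * a)
    lies in (0, 1) for any positive step size.
    """
    a = -softplus(raw_a)
    a_bar = math.exp(step * a)
    b_bar = (a_bar - 1.0) / a  # zero-order-hold input gain for unit input weight
    return a_bar, b_bar


def gated_ssm_scan(u, raw_a, step, gate_weight, gate_bias):
    """Diagonal linear recurrence over a 1D input sequence u.

    h_t = a_bar * h_{t-1} + b_bar * g_t * u_t, g_t = sigmoid(w * u_t + b).
    """
    poles = [discretize(r, step) for r in raw_a]
    h = [0.0] * len(raw_a)
    outputs = []
    for u_t in u:
        g_t = sigmoid(gate_weight * u_t + gate_bias)
        h = [a_bar * h_i + b_bar * g_t * u_t for (a_bar, b_bar), h_i in zip(poles, h)]
        outputs.append(sum(h) / len(h))  # simple readout: mean over states
    return outputs


# Impulse response: the state decays geometrically because every pole is in (0, 1).
impulse_response = gated_ssm_scan([1.0] + [0.0] * 5, [-1.0, 0.0, 1.0], 0.1, 1.0, 0.0)
```

Because each pole is strictly inside the unit interval, the response to a single impulse decays monotonically, which is the behavior a stability constraint is meant to guarantee.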

Local Transformer (Cross-Channel Resonances):

Self-attention is restricted to a causal local window for each head, so a position attends only to itself and a bounded number of preceding positions. The branch output encodes cross-channel resonances.
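Causal windowed attention for a single head can be sketched as follows. This is a generic scaled dot-product formulation under the assumption that the window counts the current position plus the preceding `window - 1` positions; the function name and the returned weight structure are hypothetical.

```python
import math


def local_causal_attention(q, k, v, window):
    """Single-head attention where position t sees max(0, t-window+1) .. t.

    q, k, v: lists of T vectors (lists of floats).
    Returns the outputs and, per position, (window_start, attention_weights).
    """
    d = len(q[0])
    outputs, weights = [], []
    for t in range(len(q)):
        start = max(0, t - window + 1)
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(start, t + 1)
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(sc - top) for sc in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        weights.append((start, alpha))
        outputs.append([
            sum(a * v[start + j][c] for j, a in enumerate(alpha))
            for c in range(len(v[0]))
        ])
    return outputs, weights


# Five positions, two features each, window of two positions.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 2.0]]
attn_out, attn_weights = local_causal_attention(seq, seq, seq, window=2)
```

The per-position weights form a probability distribution over the window, which is the kind of temporal attention distribution that a later stage can map to the frequency domain.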

The three branch outputs are concatenated and fused by a gated residual connection.

A local attention distribution, a Jensen–Shannon discrepancy term, and a final score $s_t\in[0,1]$ that incorporates evidence and discrepancy complete the inference pipeline.
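For readers unfamiliar with the Jensen–Shannon discrepancy, a minimal sketch of the divergence between two discrete distributions follows. Which pair of distributions the model compares is not specified here, so the inputs are arbitrary examples and the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by ln 2."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


identical = js_divergence([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

Unlike the Kullback-Leibler divergence, this quantity stays finite when the two distributions have disjoint support, which makes it convenient as a bounded discrepancy term.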

2. Physically Guided Temporal–Spectral Alignment

PG-TMT imposes explicit temporal-to-spectral mapping by analytically connecting learned temporal attention to classical fault-order bands—frequencies determined by bearing geometry and shaft speed.

Spectral Attention:

Given the sampling rate, the learned temporal attention over a window is mapped to a spectral attention distribution over frequency.

Fault Orders and Band Mask:

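Classical bearing defect frequencies, for shaft rotation frequency $f_r$, $n$ rolling elements of diameter $d$, pitch diameter $D$, and contact angle $\phi$:

$f_{\mathrm{BPFO}} = \frac{n}{2} f_r\left(1-\frac{d}{D}\cos\phi\right),\quad f_{\mathrm{BPFI}} = \frac{n}{2} f_r\left(1+\frac{d}{D}\cos\phi\right),\quad f_{\mathrm{BSF}} = \frac{D}{2d} f_r\left(1-\frac{d^2}{D^2}\cos^2\phi\right),\quad f_{\mathrm{FTF}} = \frac{1}{2} f_r\left(1-\frac{d}{D}\cos\phi\right).$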

For each primary fault order, together with its side-bands and windowing parameters, a Gaussian-mixture mask is placed over the frequencies of interest.
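The classical defect-frequency formulas (outer race fixed, inner race rotating with the shaft) and a Gaussian band mask can be sketched as below. The geometry values in the usage lines are illustrative only, and the mask construction (equal-weight Gaussians with a shared width, normalized to sum to one) is an assumption for this example rather than the paper's exact mask.

```python
import math


def bearing_defect_frequencies(shaft_hz, n_elements, ball_d, pitch_d, contact_deg=0.0):
    """Classical characteristic frequencies in Hz for a rolling-element bearing."""
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    return {
        "BPFO": 0.5 * n_elements * shaft_hz * (1.0 - ratio),  # outer-race defect
        "BPFI": 0.5 * n_elements * shaft_hz * (1.0 + ratio),  # inner-race defect
        "BSF": 0.5 * (pitch_d / ball_d) * shaft_hz * (1.0 - ratio ** 2),  # ball spin
        "FTF": 0.5 * shaft_hz * (1.0 - ratio),  # cage (fundamental train)
    }


def gaussian_band_mask(freqs, centers, width_hz):
    """Equal-weight Gaussian mixture over a frequency grid, normalized to sum to 1."""
    raw = [
        sum(math.exp(-0.5 * ((f - c) / width_hz) ** 2) for c in centers)
        for f in freqs
    ]
    total = sum(raw)
    return [r / total for r in raw]


# Illustrative geometry (not taken from the source): 9 elements, d/D about 0.2.
orders = bearing_defect_frequencies(shaft_hz=30.0, n_elements=9, ball_d=8.0, pitch_d=39.0)
grid = [float(f) for f in range(0, 400)]  # 1 Hz frequency grid
mask = gaussian_band_mask(grid, centers=orders.values(), width_hz=2.0)
```

Because every frequency scales linearly with shaft speed, recomputing the centers from the current RPM is enough to keep such a mask aligned under speed drift.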

Alignment Loss and Band-Alignment Score:

Smoothed spectral-attention and band-mask distributions are compared to yield a physics-based alignment penalty, used as a training loss, and a band-alignment score that quantifies the physics-grounded plausibility of the model's attention.

3. EVT-Calibrated Reliability-Aware Decision Logic

PG-TMT translates raw anomaly scores into calibrated alarms with a controlled false-alarm rate, using EVT-based modeling of healthy-score exceedances.

Peaks-Over-Threshold Extreme-Value Modeling:

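On calibration segments, scores above a high quantile $u$ are modeled via the generalized Pareto distribution (GPD) with shape $\xi$ and scale $\sigma$. Exceedance times approximate a Poisson process of rate $\lambda_u$. The on-threshold $\tau_{\mathrm{on}}$ meeting a target false-alarm intensity $\lambda_0$ is

$\tau_{\mathrm{on}} = u + \frac{\sigma}{\xi}\left[\left(\frac{\lambda_u}{\lambda_0}\right)^{\xi} - 1\right],$

with the limiting case $\xi\to 0$ yielding the logarithmic form $\tau_{\mathrm{on}} = u + \sigma\ln(\lambda_u/\lambda_0)$.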
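The peaks-over-threshold calculation can be sketched with the standard GPD return-level formula. The parameter names and numeric values below are hypothetical, and fitting the GPD shape and scale to calibration data is assumed to have been done elsewhere.

```python
import math


def gpd_survival(excess, sigma, xi):
    """P(score - u > excess | score > u) under a generalized Pareto model."""
    if abs(xi) < 1e-9:
        return math.exp(-excess / sigma)  # exponential limit as xi -> 0
    return max(0.0, 1.0 + xi * excess / sigma) ** (-1.0 / xi)


def pot_threshold(u, sigma, xi, rate_u, target_rate):
    """Threshold whose exceedance rate equals target_rate.

    u: base threshold (high quantile of healthy scores).
    rate_u: observed rate of exceedances of u (events per unit time).
    target_rate: desired false-alarm intensity in the same units.
    """
    ratio = rate_u / target_rate
    if abs(xi) < 1e-9:
        return u + sigma * math.log(ratio)  # logarithmic form
    return u + (sigma / xi) * (ratio ** xi - 1.0)


# Hypothetical calibration values: 60 exceedances/hour of u, target 1 alarm/hour.
tau_on = pot_threshold(u=0.80, sigma=0.02, xi=0.1, rate_u=60.0, target_rate=1.0)
implied_rate = 60.0 * gpd_survival(tau_on - 0.80, sigma=0.02, xi=0.1)
```

Multiplying the exceedance rate of the base threshold by the GPD survival probability at the new threshold recovers the target rate, which is a convenient sanity check on any fitted calibration.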

Dual-Threshold Hysteresis and Hold Time:

To suppress spurious frame-level alarms, an alarm turns on when the score exceeds an upper (on) threshold and turns off only once it falls below a lower (off) threshold, with a minimum episode duration and merging of episodes separated by less than a set gap. The resulting alarm logic produces episodes whose empirical rate tracks the prescribed false-alarm intensity, including under speed drift with RPM adaptation.
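The dual-threshold logic can be sketched as a small state machine over frame-level scores. The function and parameter names are hypothetical, and applying the merge step before the minimum-duration filter is an assumption, since the order of the two steps is not stated.

```python
def alarm_episodes(scores, tau_on, tau_off, min_len, merge_gap):
    """Return alarm episodes as (start, end) frame indices, end exclusive.

    An episode opens when a score reaches tau_on and closes at the first
    later frame whose score drops below tau_off (hysteresis).
    """
    if tau_off > tau_on:
        raise ValueError("tau_off must not exceed tau_on")

    raw, start = [], None
    for t, s in enumerate(scores):
        if start is None and s >= tau_on:
            start = t
        elif start is not None and s < tau_off:
            raw.append((start, t))
            start = None
    if start is not None:  # episode still open at the end of the stream
        raw.append((start, len(scores)))

    merged = []
    for ep in raw:
        if merged and ep[0] - merged[-1][1] < merge_gap:
            merged[-1] = (merged[-1][0], ep[1])  # bridge a short gap
        else:
            merged.append(ep)

    return [ep for ep in merged if ep[1] - ep[0] >= min_len]


scores = [0.1, 0.9, 0.7, 0.4, 0.9, 0.8, 0.2, 0.1, 0.1, 0.1, 0.95, 0.1]
episodes = alarm_episodes(scores, tau_on=0.85, tau_off=0.5, min_len=2, merge_gap=2)
# episodes == [(1, 6)]
```

In this example the two nearby episodes are merged into one, and the isolated single-frame spike near the end is dropped by the minimum-duration filter, so the episode count rather than the frame count drives the alarm rate.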

4. Experimental Design and Evaluation Protocols

Evaluation follows strict leakage-free, right-censored streaming protocols emphasizing reliable, domain-robust deployment.

Streaming Protocol:

Sliding windows of length $L$ with hop size $h$ are processed at batch size 1. A burn-in period initializes the state. Data splits are disjoint at the machine, load, speed, and sensor level, and no window crosses a split boundary. Per-channel normalization statistics are fit on the training split only.

Timeliness and Right-Censoring:

Detection time is right-censored if no alarm occurs before the end of a run. Timeliness is computed using Kaplan–Meier estimators, reporting mean and median MTTD with confidence intervals.
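A minimal product-limit (Kaplan–Meier) estimator for right-censored detection times is sketched below. The function names and the example data are hypothetical, and confidence intervals are omitted for brevity.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier curve for time-to-detection.

    times: detection time, or run length if no alarm fired (censored).
    observed: True where an alarm fired, False where the run was censored.
    Returns [(t, S(t))] at each distinct detection time, where S(t) is the
    estimated probability that detection has not yet occurred by time t.
    """
    pairs = sorted(zip(times, observed))
    at_risk = len(pairs)
    survival, curve, i = 1.0, [], 0
    while i < len(pairs):
        t = pairs[i][0]
        events = removed = 0
        while i < len(pairs) and pairs[i][0] == t:
            events += 1 if pairs[i][1] else 0
            removed += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both detections and censored runs leave the risk set
    return curve


def median_time(curve):
    """First time at which S(t) <= 0.5, or None if censoring prevents it."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Five runs: three detections and two runs that ended without an alarm.
curve = kaplan_meier([10, 20, 20, 30, 40], [True, True, False, True, False])
median = median_time(curve)
```

Treating undetected runs as censored rather than discarding them avoids biasing the time-to-detect estimate toward the runs where detection happened to be fast.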

Datasets:
- CWRU bearing data (speeds, loads, rigs)
- Paderborn University (seeded faults, speed $\times$ torque, cross-rig)
- XJTU-SY run-to-failure (chronological splits)
- Industrial pilot (in-service rotating machinery)

Metrics:
- Precision–Recall AUC (PR-AUC) under severe class imbalance
- ROC AUC
- Mean time-to-detect (MTTD) at matched false-alarm intensity
- Alarm intensity (episodes/hour, after hysteresis and merge logic)
- Cross-domain transfer: AUC and MTTD retention and gain under directed shifts and few-shot adaptation

5. Key Results and Ablation Findings

Detection Performance:
- PR-AUC approximately 0.96–0.94 and ROC AUC 0.99–0.97 across CWRU/Paderborn/XJTU-SY (graceful degradation to 0 dB SNR).
- Mean MTTD of about 28–33 s (clean), increasing to 49–61 s at SNR = 0 dB, at a matched false-alarm intensity in events/hour.
- Empirical false-alarm intensity stays close to the target (in events/hour) and is stable under RPM drift.

Transfer Across Domains:
- AUC retention of about 0.95 for cross-load/speed; MTTD retention of about 0.9; transfer across sensor/rig/dataset is robust.
- Few-shot adaptation (1–5% labels) recovers nearly oracle performance.

Ablation and Latency:
- Removing any encoder branch or the physics prior degrades PR-AUC, increases the false-alarm rate (FAR), or worsens MTTD.
- Excluding EVT/hysteresis disrupts intensity matching and increases chatter.
- Latency: inference of about 10 ms (p50) and about 12 ms (p90/p99) on CPU/Jetson; model size 0.8M parameters, 0.28 GFLOPs.

6. Significance, Applications, and Interpretation

PG-TMT combines physically aligned representation learning with calibrated, interpretable, and operationally robust early fault warnings for reliability-centric prognostics and health management. Its fusion of transient detection, slow-trend modeling, cross-channel resonance capture, and analytic attention-band alignment is directly interpretable in terms of vibrational fault physics. The EVT-calibrated, hysteretic alarm logic provides explicit guarantees on false-alarm rates and episode integrity under nonstationary and imbalanced conditions. Demonstrated performance across public benchmarks and real-world pilots, together with robustness to domain shifts and low-SNR conditions, establishes PG-TMT as a deployment-ready solution for industrial rotating machinery monitoring (Li et al., 29 Jan 2026).

References (1)

Physics-Guided Tiny-Mamba Transformer for Reliability-Aware Early Fault Warning (2026)
