Trajectory-Aware State Block (TASB)
- TASB is a neural network module that models dynamic propagation of target-induced feature perturbations along spatial trajectories for infrared small target detection.
- It leverages velocity-constrained diffusion and semantic fusion to integrate local geometric cues with global context, enhancing target localization in cluttered backgrounds.
- Empirical results show that incorporating TASB in TAPM-Net significantly improves IoU and reduces false alarms compared to conventional detection methods.
The Trajectory-Aware State Block (TASB) is a neural network module central to the TAPM-Net architecture for infrared small target detection (ISTD). Its core function is to model the dynamic propagation of target-induced feature perturbations along spatial trajectories extracted from feature maps. Unlike conventional attention mechanisms, TASB enables anisotropic, context-sensitive state transitions guided explicitly by the underlying geometry of local disturbances, while maintaining computational efficiency and global coherence. TASB is situated downstream of the Perturbation-guided Path Module (PGM), consuming gradient-following feature trajectories and integrating context from multiple semantic levels for enhanced discrimination between target signals and structured background noise (Xie et al., 9 Jan 2026).
1. Motivation and Conceptual Foundations
The need for TASB arises from deficiencies in standard CNN- and Transformer-based approaches, which lack the capacity to express how small, low-contrast targets generate spatially directional, layer-wise feature perturbations. Small IR targets, often occupying only a few pixels, are easily confounded with background texture when modeled by isotropic, saliency-oriented methods. Physically, these targets act as point sources, triggering diffusion-like, directionally structured energy flows in feature space. By explicitly tracing these flows and modeling their evolution, the network captures essential localization and context cues that are contrast-agnostic and robust to clutter.
TASB is designed as a Mamba-based state-space unit, leveraging recent advances in efficient sequential modeling to operate over variable-length, trajectory-based input feature sequences. It supports both velocity-constrained diffusion (i.e., propagation regulated by the local feature gradient’s intensity) and semantically aligned fusion from distinct embedding levels (e.g., word- and sentence-level context).
2. Architectural Placement and Data Flow
Within TAPM-Net, the workflow proceeds as follows:
- At each encoder stage $i$, a stack of Mamba blocks produces a feature map $F_i \in \mathbb{R}^{C \times H \times W}$.
- The PGM extracts a scalar energy map $E_i$ and generates $K$ sets of trajectory coordinates $P_k = \{p_{k,t}\}_{t=1}^{T_k}$ via gradient-following on $E_i$.
- Along each trajectory, channel-wise feature vectors are bilinearly interpolated to produce input sequences $X_k$.
- TASB receives $\{X_k\}$, dynamically models state evolution along each spatial path, and produces a set of enhanced state outputs $\{H_k\}$.
- Each output $h_{k,t}$ is projected back to its corresponding spatial location (splatting), aggregated over all visited coordinates, and fused (with learnable scalar $\alpha$) into the enhanced feature map $\hat{F}_i$.
- $\hat{F}_i$ is added residually to $F_i$, forming $\tilde{F}_i$ for the next stage or skip-connection.
This design allows the network to exploit both local geometric propagation cues (from the trajectories) and global semantic context, while enforcing trajectory-specific state consistency.
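The trajectory-extraction step of this workflow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the source only says paths are obtained by "gradient-following", so the step direction (`sign`), the central-difference gradient, the nearest-pixel gradient lookup, and the stop-on-flat-region rule are all assumptions. The names `energy_gradient` and `follow_gradient` are hypothetical, and plain Python lists are used to keep the sketch dependency-free.

```python
import math

def energy_gradient(E, y, x):
    """Central-difference gradient (gy, gx) of a 2-D list E at integer pixel (y, x)."""
    H, W = len(E), len(E[0])
    y0, y1 = max(y - 1, 0), min(y + 1, H - 1)
    x0, x1 = max(x - 1, 0), min(x + 1, W - 1)
    gy = (E[y1][x] - E[y0][x]) / max(y1 - y0, 1)
    gx = (E[y][x1] - E[y][x0]) / max(x1 - x0, 1)
    return gy, gx

def follow_gradient(E, seed, step=1.0, max_len=8, sign=1.0, eps=1e-8):
    """Trace float (y, x) coordinates from `seed` along sign * grad(E).

    sign=+1 climbs the energy map, sign=-1 descends it (assumed convention).
    """
    H, W = len(E), len(E[0])
    y, x = float(seed[0]), float(seed[1])
    path = [(y, x)]
    for _ in range(max_len - 1):
        # Look up the gradient at the nearest pixel (a simplification).
        iy = min(max(int(round(y)), 0), H - 1)
        ix = min(max(int(round(x)), 0), W - 1)
        gy, gx = energy_gradient(E, iy, ix)
        norm = math.hypot(gy, gx)
        if norm < eps:  # flat region: no direction to follow, stop early
            break
        # Take a fixed-length step along the unit gradient, clamped to the map.
        y = min(max(y + sign * step * gy / norm, 0.0), H - 1.0)
        x = min(max(x + sign * step * gx / norm, 0.0), W - 1.0)
        path.append((y, x))
    return path

# Toy energy map: a Gaussian bump centred at (4, 4) on a 9x9 grid.
E = [[math.exp(-((r - 4) ** 2 + (c - 4) ** 2) / 8.0) for c in range(9)]
     for r in range(9)]
path = follow_gradient(E, seed=(1, 1), step=1.0, max_len=8, sign=1.0)
```

In this toy case the path climbs from the seed toward the bump and stops once the local gradient vanishes, so it is shorter than `max_len`. A real PGM would additionally select seeds (for example by non-maximum suppression) and truncate paths by energy thresholds, which this sketch omits.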
3. Mathematical Formulation
Formally, each trajectory path $P_k$ comprises an ordered set of spatial points $\{p_{k,t}\}_{t=1}^{T_k}$, with feature tokens sampled as
$$x_{k,t} = F_i(p_{k,t}),$$
where interpolation is bilinear over $F_i$. Each sequence $X_k = (x_{k,1}, \dots, x_{k,T_k})$ of length $T_k$ is processed by TASB, modeled as a Mamba-based stateful unit, producing state outputs reflecting context-sensitive, velocity-modulated propagation:
$$H_k = (h_{k,1}, \dots, h_{k,T_k}) = \mathrm{TASB}(X_k).$$
Outputs $h_{k,t}$ are "splatted" back onto a zero-initialized map $S_i$ at coordinates $p_{k,t}$ (using accumulation and averaging when multiple trajectories visit the same spatial point). Final fusion is expressed as:
$$\tilde{F}_i = F_i + \alpha\, S_i,$$
with $\alpha$ a learned weighting parameter.
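The sampling, splatting, and fusion steps can be illustrated with a small dependency-free sketch. Several choices here are assumptions rather than details from the source: outputs are splatted to the nearest pixel (the paper may distribute them bilinearly), the TASB itself is replaced by an identity stand-in, and the function names `bilinear_sample` and `splat_and_fuse` are hypothetical. Feature maps are nested lists indexed as `F[channel][row][col]`.

```python
import math

def bilinear_sample(F, y, x):
    """Sample a C-vector from feature map F[c][row][col] at float (y, x)."""
    H, W = len(F[0]), len(F[0][0])
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return [
        (1 - wy) * ((1 - wx) * ch[y0][x0] + wx * ch[y0][x1])
        + wy * ((1 - wx) * ch[y1][x0] + wx * ch[y1][x1])
        for ch in F
    ]

def splat_and_fuse(F, paths, outputs, alpha=0.5):
    """Average per-step outputs onto nearest pixels, then return F + alpha * S."""
    C, H, W = len(F), len(F[0]), len(F[0][0])
    S = [[[0.0] * W for _ in range(H)] for _ in range(C)]  # zero-initialized map
    count = [[0] * W for _ in range(H)]                    # visits per pixel
    for path, outs in zip(paths, outputs):
        for (y, x), h in zip(path, outs):
            iy = min(max(int(round(y)), 0), H - 1)
            ix = min(max(int(round(x)), 0), W - 1)
            count[iy][ix] += 1
            for c in range(C):
                S[c][iy][ix] += h[c]
    # Divide accumulated values by visit counts; unvisited pixels stay at zero.
    return [
        [[F[c][r][w] + alpha * (S[c][r][w] / count[r][w] if count[r][w] else 0.0)
          for w in range(W)] for r in range(H)]
        for c in range(C)
    ]

# Toy example: one channel whose value equals the column index.
F = [[[float(col) for col in range(4)] for _ in range(4)]]
paths = [[(1.0, 0.4), (1.0, 1.0)], [(1.2, 1.1)]]
X = [[bilinear_sample(F, y, x) for (y, x) in p] for p in paths]
H_out = X  # identity stand-in for the TASB outputs (assumption)
F_new = splat_and_fuse(F, paths, H_out, alpha=0.5)
```

Pixel (1, 1) is visited by both paths, so its two outputs are averaged before being scaled by `alpha` and added to the original feature value. Pixels that no trajectory visits keep their original values.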
4. Velocity-Constrained Diffusion and Semantic Fusion
A defining feature of TASB is its explicit modeling of velocity-constrained diffusion along the feature trajectory. Propagation is guided by the local gradient magnitude of the energy field, which encodes the diffusion strength at each step. This ensures that state updates are aligned with the physical intuition of disturbance propagation—stronger dynamics near target origin points, more diffusive behavior as signals dissipate.
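One way to picture velocity-constrained propagation is a per-channel linear state update whose step size grows with the local gradient magnitude. The sketch below is an illustrative assumption, not the formulation from the paper: it uses the zero-order-hold discretization of dh/ds = -a (h - x), sets the step size to `scale * grad_mag[t]`, and the name `velocity_gated_scan` is hypothetical. With a large gradient the state moves quickly toward the current token; with a gradient near zero the state is simply carried forward.

```python
import math

def velocity_gated_scan(X, grad_mag, a=1.0, scale=1.0):
    """Scan a token sequence X (list of C-vectors) along one trajectory.

    Update rule per channel:
        h_t = exp(-a * d_t) * h_{t-1} + (1 - exp(-a * d_t)) * x_t
    with step size d_t = scale * grad_mag[t] (assumed gating rule).
    """
    C = len(X[0])
    h = [0.0] * C
    outputs = []
    for x_t, g in zip(X, grad_mag):
        decay = math.exp(-a * scale * g)  # small gradient -> decay near 1
        h = [decay * h_c + (1.0 - decay) * x_c for h_c, x_c in zip(h, x_t)]
        outputs.append(list(h))
    return outputs

# Toy sequence: strong gradients near the origin point, fading to zero.
X = [[1.0], [1.0], [0.0], [0.0]]
grad_mag = [2.0, 1.0, 0.1, 0.0]
H_out = velocity_gated_scan(X, grad_mag)
```

In the toy sequence the last step has zero gradient magnitude, so the final state equals the previous one. A Mamba block would learn its step size and state matrices from the input rather than using a fixed rule like this one.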
Additionally, TASB incorporates semantic fusion by conditioning state updates on both word-level (local) and sentence-level (contextual/global) embeddings derived from multi-scale features. This two-level embedding structure enables TASB to reconcile fine-grained, path-local evidence with broader scene context, enhancing discrimination between authentic targets and structured noise.
5. Distinction from Conventional Attention and Computational Considerations
While conventional attention modules compute global, context-agnostic token interactions, TASB restricts information flow to the precomputed, anisotropic spatial trajectories. This reduces computational burden and focuses modeling capacity on physically plausible paths of perturbation propagation. For $K$ trajectories of maximum length $T$, TASB operates in $O(KT)$ time per stage, with $K$ and $T$ both constrained in practice (e.g., only the top few maxima per map serve as seeds, and $T$ is capped at 8–16 steps). Empirically, the end-to-end overhead of PGM+TASB is under 5% of total encoder runtime on modern GPUs (Xie et al., 9 Jan 2026).
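A rough count makes the scaling difference concrete. The sketch assumes that trajectory processing cost grows linearly with the total number of trajectory tokens and compares it with the number of token pairs a dense self-attention layer would score. The specific values (8 trajectories of length 16 on a 64x64 map) are illustrative assumptions, not settings reported in the paper, and both function names are hypothetical.

```python
def trajectory_token_count(num_paths, max_len):
    """Upper bound on tokens processed when each path has at most max_len steps."""
    return num_paths * max_len

def dense_attention_pairs(height, width):
    """Number of query-key pairs scored by full self-attention over all pixels."""
    n = height * width
    return n * n

K, T, H, W = 8, 16, 64, 64  # illustrative values only
traj_tokens = trajectory_token_count(K, T)
attn_pairs = dense_attention_pairs(H, W)
ratio = attn_pairs / traj_tokens
```

Under these assumed values the trajectory branch touches 128 tokens, while dense attention would score about 16.8 million pairs. This ignores channel dimensions and constant factors, so it indicates an order of magnitude, not a runtime measurement.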
6. Empirical Impact and Ablation Study
Experimental evidence supports the efficacy of TASB in conjunction with PGM. On NUAA-SIRST, integrating PGM alone into a plain U-Net raises IoU from 63.12% to 72.45%. With energy-map and gradient-path ablations, performance drops to 50.44% and 74.78% (respectively), confirming the importance of explicit trajectory-aware modeling. Full TAPM-Net, with both PGM and TASB, yields an IoU of 81.94% and false alarm rates below 2% (Tables 1, 2, and 3 in Xie et al., 9 Jan 2026), establishing a new state of the art for ISTD.
7. Supervision, Hyper-parameters, and Loss Integration
TASB’s outputs contribute both to the main segmentation head and, via PGM, to an auxiliary perturbation response map $R$. This response map is supervised using binary cross-entropy loss against the ground truth, encouraging concentration of trajectory activity in true target regions. The overall training loss is:
$$\mathcal{L} = \mathcal{L}_{\text{seg}} + \lambda\, \mathcal{L}_{\text{aux}},$$
where $\mathcal{L}_{\text{seg}}$ is the main segmentation loss, $\mathcal{L}_{\text{aux}}$ is the binary cross-entropy on $R$, and $\lambda$ balances main and auxiliary objectives. Key hyper-parameters for TASB integration include the trajectory step size $\delta$ (typically in [0.5, 2.0]), maximum trajectory length $T$ (8–16), non-maximum suppression window for seed selection, energy thresholds for path truncation, the fusion weight $\alpha$ (initialized to 0.5 and learned), and the loss scale $\lambda$.
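The combined objective can be sketched as a main loss plus a weighted auxiliary binary cross-entropy term. In this minimal example the main segmentation loss is passed in as a precomputed number because the source does not specify its form here, and the weight `lam=0.1` is an illustrative assumption, not a value from the paper. The names `binary_cross_entropy` and `total_loss` are hypothetical.

```python
import math

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean BCE between flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)

def total_loss(seg_loss, response, target, lam=0.1):
    """Main loss plus lam times the auxiliary BCE on the response map."""
    return seg_loss + lam * binary_cross_entropy(response, target)

# Toy flattened response map and ground-truth mask.
response = [0.9, 0.1, 0.2, 0.8]
target = [1, 0, 0, 1]
aux = binary_cross_entropy(response, target)
loss = total_loss(seg_loss=0.30, response=response, target=target, lam=0.1)
```

A response map that concentrates high values on true target pixels yields a small auxiliary term, so the total stays close to the main loss. Misplaced trajectory activity raises it in proportion to the weight.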
For further technical details, implementation, and evaluation benchmarks, see "TAPM-Net: Trajectory-Aware Perturbation Modeling for Infrared Small Target Detection" (Xie et al., 9 Jan 2026).