Dual-Dynamic Tracking (UAV-Anti-UAV)
- Dual-Dynamic Tracking is a framework that addresses the challenge of real-time localization and pursuit when both the observer and target UAVs are dynamically maneuvering.
- The approach incorporates transformer-based spatial-temporal modeling, evidential detection, and hybrid hardware strategies to manage occlusion, rapid motion, and distractor-rich environments.
- Extensive datasets and benchmarks have been developed to assess tracking performance, highlighting current limitations and guiding future research in UAV anti-drone defense systems.
Dual-Dynamic Tracking (UAV-Anti-UAV) seeks to address the challenge of robust, real-time localization and pursuit of adversarial aerial targets when both the observing platform (pursuer UAV) and the target (evasive UAV) are dynamically maneuvering. Unlike classical ground-based anti-UAV settings or standard single-dynamic tracking, dual-dynamic tracking requires algorithms and systems to withstand complex, non-stationary backgrounds, abrupt viewpoint and scale changes, rapid motion, occlusion, and distractor-rich scenarios. Recent research formalizes the UAV-Anti-UAV problem, benchmarks its unique challenges via large-scale datasets, and proposes architectural innovations—for both learning-based and embedded hardware approaches—that jointly reason about temporal, spatial, and semantic context under adversarial aerial dynamics (Zhang et al., 8 Dec 2025, Zhu et al., 2023, Lu et al., 12 Dec 2025).
1. Formal Problem Definition and Unique Challenges
Let $\{I_t\}_{t=1}^{T}$ be a video sequence acquired from a pursuing UAV, and let $b_1$ denote the initial ground-truth bounding box of the adversarial target UAV. The task is to estimate, for each frame $t$, a bounding box $\hat{b}_t$ that maintains tight spatial overlap (high IoU) with the ground truth as both observer and target undergo rapid, unpredictable 3D motion.
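The overlap objective can be made concrete with a standard IoU computation (a minimal sketch; the `(x, y, w, h)` box format is an assumption for illustration, not necessarily the datasets' native format):

```python
def iou(box_a, box_b):
    """Intersection-over-Union for axis-aligned boxes in (x, y, w, h) format."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap extents, clamped at zero when boxes are disjoint
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```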
Distinguishing characteristics:
- Dual Agent Dynamics: Both the sensing UAV and the adversarial UAV exhibit non-linear, unconstrained flight paths, inducing severe compounded ego-motion, multidimensional parallax, and scene contextual drift.
- Small, Fast Targets: The target UAV typically occupies <1% of the frame in 30% of samples and exhibits extreme velocity (relative speed μ=0.79 image diagonals/frame).
- Frequent Occlusions and Out-of-View: Full/partial occlusion, target reappearance, and viewpoint changes are common; re-detection with no fixed template frame is compulsory.
- Clutter, Distractors, and Unaligned Modalities: Backgrounds vary from urban to rural, with similar distractors and multi-modal sensor streams (RGB, IR, language).
- Real-Time, Resource-Constrained Requirements: Tracking must operate at ≥30fps, typically without per-frame template re-initialization or excessive compute.
Formally, tracking is cast as spatio-temporal sequence estimation in which both the input and target-state distributions are non-stationary:

$$\hat{b}_t = f_{\theta}\left(I_t, M_{t-1}, \ell\right),$$

where $M_{t-1}$ is the accumulated temporal memory (motion/appearance), $\ell$ is the sequence-level language prompt describing the scenario, and $\theta$ are the network parameters (Zhang et al., 8 Dec 2025).
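Under this formulation, inference reduces to an autoregressive loop over frames. The sketch below assumes a hypothetical `tracker` object exposing `init`/`step` and carrying temporal memory internally; the actual model interfaces from the cited papers are not reproduced:

```python
def track_sequence(frames, init_box, prompt, tracker):
    """Autoregressive dual-dynamic tracking loop.

    `tracker` is a hypothetical stand-in for the learned model:
    it encodes the template and language prompt on init, then
    updates its internal temporal memory on every step.
    """
    tracker.init(frames[0], init_box, prompt)  # encode template + prompt
    boxes = [init_box]
    for frame in frames[1:]:
        box = tracker.step(frame)  # uses memory from previous frames
        boxes.append(box)
    return boxes
```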
2. Datasets and Task Benchmarking
UAV-Anti-UAV Dataset (Zhang et al., 8 Dec 2025)
- 1,810 video sequences (≈1.05 million frames; 9.85 hours)
- RGB frames with contiguous bounding box annotations
- One sequence-level language prompt per video
- 15 per-sequence/per-frame attributes: Camera Motion (CM), Fast Motion (FM), Small Object (SO), Similar Distractor (SD), Out-of-View (OV), Occlusion (PO/FO), Illumination Variation (IV), Scale/Aspect Ratio Variation (SV/ARV), Motion Blur (MB), etc.
- High diversity: average sequence length 578 frames (max 17,740); the overall distribution skews toward fast, small, and partially or fully occluded targets
AntiUAV600 (Zhu et al., 2023)
- 600 thermal IR videos (640×512, 25fps, >723,000 annotated frames)
- Per-frame challenge tags: Out-of-View, Occlusion, Fast Motion, Scale-Variation, etc.
- No fixed appearance template; random target absence/reappearance
Anti-UAV (Jiang et al., 2021)
- 318 RGB/IR video pairs at 25fps; 585,900 boxes
- Binary frame/sequence attributes; highly challenging for robust generalization
Benchmarking Metrics
- IoU-based AUC (area under the IoU-threshold success curve) and mean accuracy (mAcc)
- Precision/Success: Center-error thresholds, area-under-curve
- State Accuracy (SA) and "Acc" for dual-dynamic anti-UAV: combine IoU on present frames with credit for correctly predicted target absence, while penalizing missed detections
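An SA-style score can be sketched as a per-frame average that credits overlap when the target is visible and correct "empty" predictions when it is not (a simplified sketch; the exact weighting and penalty terms vary between benchmarks):

```python
def state_accuracy(ious, gt_visible, pred_empty):
    """Mean per-frame score: IoU when the target is present,
    full credit for correctly predicted absence, zero otherwise."""
    total = 0.0
    for iou_t, visible, empty in zip(ious, gt_visible, pred_empty):
        if visible:
            total += iou_t                    # overlap on present frames
        else:
            total += 1.0 if empty else 0.0    # "correct absence" credit
    return total / len(ious)
```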
3. Dual-Dynamic Tracking Architectures
Spatial-Temporal-Semantic Modeling (MambaSTS) (Zhang et al., 8 Dec 2025)
- Visual Backbone: Transformer-based model (HiViT) hierarchically encodes both template and search regions
- Temporal Modeling: A discrete linear time-invariant state-space model (SSM) in the Mamba style evolves a hidden state vector, $h_t = \bar{A} h_{t-1} + \bar{B} x_t$, with readout $y_t = C h_t$
- Language Integration: Mamba-encoded representations from language prompts propagate scenario-level semantic priors through SSM scanning
- Temporal Token Propagation: Past search features serialized via unidirectional scan inform per-frame context in an autoregressive manner
- Multi-Head Tracking Head: Outputs class, offset, and size branches with a composite loss (focal classification, $\ell_1$ regression, and generalized IoU)
- No online fine-tuning in inference; purely “in-distribution” tracking
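The SSM recurrence at the core of the temporal model can be illustrated with a plain unidirectional scan (a generic LTI sketch; MambaSTS's selective, input-dependent parameterization is not reproduced here):

```python
def ssm_scan(A, B, C, xs, h0=None):
    """Unidirectional scan of a discrete LTI state-space model:
    h_t = A @ h_{t-1} + B @ x_t ;  y_t = C @ h_t  (plain-list matmuls)."""
    def matvec(M, v):
        return [sum(m * vj for m, vj in zip(row, v)) for row in M]

    h = h0 if h0 is not None else [0.0] * len(A)
    ys = []
    for x in xs:
        # State update: decayed previous state plus projected input
        h = [a + b for a, b in zip(matvec(A, h), matvec(B, x))]
        ys.append(matvec(C, h))  # per-step readout
    return ys
```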
Evidential Detection and Tracking Collaboration (EDTC) (Zhu et al., 2023)
- Detection-Tracking Alternation: YOLOv5s global detector runs on every frame; on positive hypothesis, a local transformer tracker is initialized
- Tracker: Siamese-style backbone with Relevance Decoupling Modules (RDM); cross- and self-attention fuse instance and appearance context
- Uncertainty Quantification (“Evidential Head”): After tracking, evidence is pooled; Dirichlet-distributed probabilities compute per-frame target/bkg likelihood and global uncertainty
- Evidential Loop: Tracking continues while target confidence stays high and evidential uncertainty stays low (both judged against preset thresholds); otherwise the system reverts to global detection
- Dempster–Shafer Belief Formalism: Detection and tracking operate as independent evidence sources, enabling robust recovery under drift, occlusion, and re-entry
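The Dirichlet parameterization behind the evidential head follows the standard evidential deep learning recipe, where non-negative per-class evidence yields both class probabilities and a global uncertainty mass (a sketch under that assumption; EDTC's exact evidence pooling is omitted):

```python
def dirichlet_belief(evidence):
    """Map non-negative per-class evidence e_k to Dirichlet parameters
    alpha_k = e_k + 1, expected class probabilities alpha_k / S, and a
    global uncertainty mass u = K / S (high when total evidence is low)."""
    K = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    S = sum(alpha)
    probs = [a / S for a in alpha]
    uncertainty = K / S
    return probs, uncertainty
```

With no evidence at all, the distribution is uniform and uncertainty is maximal, which is exactly the condition under which the evidential loop hands control back to the global detector.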
Hybrid Embedded/Hardware Tracking (Lu et al., 12 Dec 2025)
- Frame-Based and Event-Driven Modes: Adaptive switching on Region Proposal (RP) area and velocity; frame mode for periodic broad detection, event mode for low-latency fast targets.
- Fast Object Tracking Unit (FOTU): Parallelized trajectory monitors for high-velocity targets, with bounding-box updates that adapt to the target's measured speed
- Neural Processing Unit (NPU): Custom 16×16 PE array, zero-skipping MACs, dual support for image patch and trajectory inference
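The frame/event switching policy can be sketched as a threshold rule on region-proposal (RP) area and velocity (the threshold values below are illustrative placeholders, not the chip's calibrated settings):

```python
# Hypothetical thresholds; the system's calibrated values are not public.
AREA_MIN_PX = 64           # below this, the RP is too small for frame mode
SPEED_MAX_PX_PER_MS = 2.0  # above this, frame-rate latency is too high

def select_mode(rp_area_px, rp_speed_px_per_ms):
    """Return 'event' for small or fast targets needing low-latency updates,
    'frame' for periodic broad detection of larger, slower targets."""
    if rp_area_px < AREA_MIN_PX or rp_speed_px_per_ms > SPEED_MAX_PX_PER_MS:
        return "event"
    return "frame"
```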
4. Algorithmic and System Comparisons
| Architecture | Modalities | Tracking Model | AUC / mAcc | Notable Strengths |
|---|---|---|---|---|
| MambaSTS (Zhang et al., 8 Dec 2025) | RGB + language | Transformer + SSM + prompt | AUC=0.437, mAcc=0.443 | Long-sequence modeling, semantic context |
| EDTC (Zhu et al., 2023) | Thermal IR | YOLOv5s + transformer | Acc=0.486 (AntiUAV600) | Uncertainty-based switching, re-detection |
| Hybrid Hardware (Lu et al., 12 Dec 2025) | AER events + grayscale | Frame/event-adaptive tracking | Prec@IoU≥0.5=94.8% | Ultra low-power, high-speed, energy efficiency |
MambaSTS delivers +6.6 percentage point mean accuracy gain on UAV-Anti-UAV over the next best baseline on dual-dynamic tracking, but even the best trackers only reach ≈44% AUC, highlighting the severity of compounded motion and complex scene factors (Zhang et al., 8 Dec 2025). EDTC demonstrates state-of-the-art performance on absence/reappearance sequences by integrating confidence-driven mode switching; ablations reveal the criticality of evidential heads for robust adaptation (Zhu et al., 2023). Hybrid hardware approaches achieve Pareto-optimal energy/tracking tradeoffs necessary for embedded UAV platforms and extremely fast-moving targets (Lu et al., 12 Dec 2025).
5. Semantic and Multimodal Consistency Mechanisms
Recent advances utilize semantic “flows” and multi-level modulation to mitigate appearance shift and distractor confusion:
- Dual-Flow Semantic Consistency (DFSC) (Jiang et al., 2021): Class-level modulation enforces inter-sequence UAV-category consistency (via cross-sequence ROI features), while instance-level modulation sharpens same-sequence discrimination. Composite loss combines RPN and Fast-RCNN objectives.
- Multi-Modal Sensing: IR, RGB, language, and—prospectively—radar/LiDAR features are integrated as independent evidence sources, consistent with Dempster–Shafer fusion (Zhu et al., 2023, Zhang et al., 8 Dec 2025).
- Dynamic Radar Networks (Guerra et al., 2020): Teams of UAV-based active sensors fuse range, bearing, Doppler, and velocity information via decentralized EKFs, optimizing 3D formation to maximize D-optimality of tracking information, and achieving sub-meter accuracy in environments intractable to fixed ground radars.
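The D-optimality criterion used for formation optimization maximizes the determinant (equivalently, the log-determinant) of the fused information matrix. A minimal sketch for 3×3 position-information matrices, with a hypothetical `d_optimal` selector over candidate formations:

```python
import math

def log_det_3x3(F):
    """log det of a 3x3 information matrix (the D-optimality objective)."""
    a, b, c = F[0]
    d, e, f = F[1]
    g, h, i = F[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return math.log(det)

def d_optimal(candidate_infos):
    """Pick the formation whose summed information matrix has maximal log-det.
    `candidate_infos`: list of formations, each a list of per-UAV 3x3 matrices."""
    def fuse(Fs):  # information from independent sensors adds
        return [[sum(F[r][c] for F in Fs) for c in range(3)] for r in range(3)]
    return max(range(len(candidate_infos)),
               key=lambda k: log_det_3x3(fuse(candidate_infos[k])))
```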
6. Performance Evaluations and Limitations
Quantitative performance in dual-dynamic scenarios remains limited:
- MambaSTS (UAV-Anti-UAV test set): AUC=0.437; the average mAcc over 50 modern deep tracking algorithms is 0.272 (Zhang et al., 8 Dec 2025).
- EDTC (AntiUAV600 test set): Acc=0.486; YOLOv5s-only detection yields Acc≈0.392; SOTA trackers without evidential switching achieve Acc between 0.14 and 0.33.
- Hardware Hybrid System: 94.8% average tracking precision at 0.096 nJ/frame/pixel, up to 400 m and 80 px/s speed (Lu et al., 12 Dec 2025).
Limitations are pronounced under full occlusion, extreme illumination variation, or repeated out-of-view transitions; no current method can reliably re-detect under these compounded conditions. Multi-object and multi-modal extensions remain underexplored (Zhang et al., 8 Dec 2025, Zhu et al., 2023).
7. Open Research Directions
Challenges explicitly identified include:
- Multi-sensor and multi-object extensions: True fusion of RGB, IR, RF, and language cues for drone swarm tracking or coordinated defense
- Online adaptation and domain transfer: Rapid model updating to accommodate adversarial maneuvers and unseen environments
- Nonlinear and bidirectional state-space modeling: Richer representations to encode context spanning extended temporal windows and abrupt regime changes
- Hardware-efficiency: End-to-end quantization, energy-matched model design for embedded platforms
- Advanced attention and re-detection: Learned mechanisms to recover from drift, occlusion, and out-of-view via dynamic attention or memory-based detection
- Cooperative active sensing: Autonomous reconfiguration of sensor geometry (e.g., via distributed D-optimality) for maximally informative 3D tracking
The UAV-Anti-UAV framework and its associated datasets, architectural baselines, and metrics now form the empirical foundation for the next generation of robust aerial anti-UAV perception and pursuit systems, both in civilian safety and defense contexts (Zhang et al., 8 Dec 2025, Zhu et al., 2023, Lu et al., 12 Dec 2025, Jiang et al., 2021, Guerra et al., 2020).