Discrete Dynamic Graph Neural Network (DDGNN)
- Discrete Dynamic Graph Neural Network (DDGNN) is an architecture that models sparse, variable-sized radar point clouds using star-graph representations for accurate human activity recognition.
- It employs multi-layer Weisfeiler–Lehman GraphConv for spatial encoding and a Bi-LSTM for capturing temporal dependencies without resampling or zero-padding.
- Empirical results demonstrate that DDGNN outperforms conventional point cloud networks on embedded hardware through efficient, real-time inference and high accuracy.
A Discrete Dynamic Graph Neural Network (DDGNN) is an architecture developed for extracting spatial–temporal representations from sparse and variable-size point cloud data, with particular emphasis on millimeter-wave (mmWave) radar-based human activity recognition (HAR) in privacy-sensitive scenarios. It addresses the limitations of conventional vision-based approaches and standard point cloud networks when applied to the unique sparsity, variable cardinality, and noise characteristics of radar-generated data. DDGNN systematically encodes per-frame geometric relations via dynamic star-graph representations and couples this with sequential modeling for temporal context, providing significant performance improvements over prior methods (Gao et al., 12 Dec 2025).
1. Problem Setting and Motivations
mmWave radar-based HAR represents each frame as a sparse point set in $\mathbb{R}^3$ with $N_t$ points per frame ($N_t$ is typically small after preprocessing), where $N_t$ varies dynamically from frame to frame. Standard deep networks, such as those used for dense vision point clouds, struggle due to extreme input sparsity and frame-to-frame variability, often requiring nontrivial point selection, zero-padding, or frame aggregation (Cui et al., 2023, Tunau et al., 14 Aug 2025).
DDGNN emerged as a solution to:
- Accommodate variable-size, sparse point clouds without resampling or padding.
- Maintain spatial geometric structure despite low point density.
- Capture inter-frame temporal dependencies for nuanced activity discrimination.
- Remain lightweight enough for real-time inference on resource-constrained platforms, such as Raspberry Pi-class ARM CPUs.
2. Star-Graph Representation of Sparse Point Clouds
Fully connected or $k$-NN graphs yield poorly defined or degenerate edge structures when $N_t$ is small; instead of building them, DDGNN introduces a per-frame star graph. Each frame is represented as
$$G_t = (V_t, E_t), \qquad V_t = \{c\} \cup \{p_1, \dots, p_{N_t}\}, \qquad E_t = \{(c, p_i) : i = 1, \dots, N_t\},$$
where $c$ is a static "center point" shared across frames. Directed edges connect the center to each radar-detected point. The adjacency matrix for each frame is thus
$$(A_t)_{0i} = 1 \ \text{for } i = 1, \dots, N_t, \qquad (A_t)_{ij} = 0 \ \text{otherwise.}$$
This construction ensures:
- Each frame forms a valid, non-empty graph for any $N_t$, since the center node is always present.
- Geometric encoding is consistent and robust to missing or ghost points, focusing on body-part relations relative to a canonical center.
- It avoids the need for explicit inter-frame edges; temporal dependencies are later modeled by sequential networks (Gao et al., 12 Dec 2025).
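The star-graph construction above can be sketched in a few lines of numpy (the function name `star_graph_adjacency` and the node ordering are illustrative, not from the paper):

```python
import numpy as np

def star_graph_adjacency(n_points: int) -> np.ndarray:
    """Directed star-graph adjacency for one radar frame.

    Node 0 is the static center c; nodes 1..n_points are radar returns.
    Edges point from the center to every detected point, so the graph
    is valid (non-empty) for any number of points, including zero.
    """
    n = n_points + 1                      # +1 for the center node
    A = np.zeros((n, n), dtype=np.int8)
    A[0, 1:] = 1                          # directed edges c -> p_i
    return A

# Frames with different point counts all yield well-formed graphs.
print(star_graph_adjacency(3))
```

Because only the first row of the adjacency is ever populated, the construction is O($N_t$) per frame and never degenerates, regardless of how many points the radar returns.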
Empirical ablations confirmed that directed star graphs outperform $k$-NN, radius-based, and fully connected graphs in the radar domain, with undirected variants exhibiting significant drops in accuracy.
3. DDGNN Architecture and Processing Pipeline
The DDGNN pipeline integrates three components: point cloud preprocessing, spatial graph encoding, and temporal sequence modeling.
a) Preprocessing
- Axis-range thresholding removes ghost and out-of-range points.
- DBSCAN (with $\varepsilon$ and minPts set for maximal denoising in the radar domain) retains only the largest non-noise cluster per frame, producing the filtered point set $\{p_1, \dots, p_{N_t}\}$.
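A compact numpy sketch of this per-frame filter follows; as a simplification, a plain $\varepsilon$-connected-component pass stands in for full DBSCAN (minPts handling is omitted), and the function name and thresholds are illustrative only:

```python
import numpy as np

def preprocess_frame(points, axis_min, axis_max, eps=0.2):
    """Simplified per-frame filter: axis-range thresholding, then keep the
    largest eps-connected cluster (a stand-in for the paper's DBSCAN step)."""
    pts = np.asarray(points, dtype=float)
    # 1) Axis-range thresholding: drop ghost / out-of-range returns.
    keep = np.all((pts >= axis_min) & (pts <= axis_max), axis=1)
    pts = pts[keep]
    if len(pts) == 0:
        return pts
    # 2) Largest cluster: connected components of the eps-neighborhood graph.
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    adj = d <= eps
    labels = -np.ones(len(pts), dtype=int)
    cur = 0
    for i in range(len(pts)):
        if labels[i] >= 0:
            continue
        stack = [i]
        labels[i] = cur
        while stack:                      # flood-fill one component
            j = stack.pop()
            for k in np.where(adj[j] & (labels < 0))[0]:
                labels[k] = cur
                stack.append(k)
        cur += 1
    biggest = np.argmax(np.bincount(labels))
    return pts[labels == biggest]
```

In practice a library implementation (e.g., scikit-learn's DBSCAN) would replace the hand-rolled clustering; the point of the sketch is that the output is a variable-size set, which the star graph then consumes without padding.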
b) Spatial Encoding via GraphConv
- Node features: each vertex receives an initial embedding from its 3-D coordinates (for the radar points $p_i$) or the fixed center coordinates (for $c$).
- A shared fully connected layer projects all node features into a $d$-dimensional hidden space.
- Spatial structure is encoded by two layers of Weisfeiler–Lehman-style GraphConv, yielding updated per-node representations.
- Global mean pooling over all nodes collapses the variable-size per-frame graph to a fixed-length vector $z_t$.
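A minimal numpy sketch of this spatial stage, with placeholder dimensions and untrained random weights (the real model uses learned parameters and a library GraphConv layer):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # hidden width (placeholder value, not from the paper)
# Shared random placeholder weights; a trained model would learn these.
W = {
    "proj":  rng.standard_normal((3, D)) * 0.1,
    "self1": rng.standard_normal((D, D)) * 0.1,
    "nbr1":  rng.standard_normal((D, D)) * 0.1,
    "self2": rng.standard_normal((D, D)) * 0.1,
    "nbr2":  rng.standard_normal((D, D)) * 0.1,
}

def graph_conv(X, A, W_self, W_nbr):
    """WL-style GraphConv: h_i' = ReLU(x_i W_self + (sum of in-neighbors) W_nbr).
    A[j, i] = 1 encodes a directed edge j -> i, so A.T @ X sums in-neighbors."""
    return np.maximum(X @ W_self + (A.T @ X) @ W_nbr, 0.0)

def encode_frame(points):
    """Star graph -> shared FC projection -> 2 GraphConv layers ->
    global mean pooling: a fixed-length vector for any point count."""
    n = len(points)
    X = np.vstack([np.zeros((1, 3)), np.asarray(points, float)])  # node 0 = center
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1                            # directed edges c -> p_i
    H = X @ W["proj"]                       # shared FC into hidden space
    H = graph_conv(H, A, W["self1"], W["nbr1"])
    H = graph_conv(H, A, W["self2"], W["nbr2"])
    return H.mean(axis=0)                   # collapses variable-size graph
```

Note that `encode_frame` returns a vector of the same length whether the frame holds 3 points or 30, which is exactly what lets the temporal model downstream consume the sequence without resampling.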
c) Temporal Modeling
- The sequence of per-frame encodings $(z_1, \dots, z_T)$ is input to a 2-layer Bi-LSTM, which produces a video-level embedding.
- A final linear layer plus softmax outputs activity class probabilities.
This approach allows the model to process arbitrarily long and variably-sized frame sequences, directly modeling temporal dynamics while remaining robust to insertions or deletions of detected points at the spatial level (Gao et al., 12 Dec 2025).
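The temporal stage can be sketched with a bare-bones numpy Bi-LSTM; this is a single layer with random untrained weights, purely to show the shape flow (the paper's model is a trained 2-layer Bi-LSTM):

```python
import numpy as np

def lstm_pass(Z, Wx, Wh, b):
    """Single-direction LSTM over per-frame vectors Z of shape (T, d_in).
    Returns the final hidden state. Gate order in the stacked weights: i, f, g, o."""
    H = Wh.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    for t in range(Z.shape[0]):
        gates = Z[t] @ Wx + h @ Wh + b      # shape (4H,)
        i, f, g, o = np.split(gates, 4)
        i, f, o = sig(i), sig(f), sig(o)
        c = f * c + i * np.tanh(g)          # cell-state update
        h = o * np.tanh(c)                  # hidden-state update
    return h

def bilstm_embed(Z, params_fwd, params_bwd):
    """Bi-LSTM sequence embedding: run forward and backward passes and
    concatenate the final hidden states into one sequence-level vector."""
    h_f = lstm_pass(Z, *params_fwd)
    h_b = lstm_pass(Z[::-1], *params_bwd)   # reversed time order
    return np.concatenate([h_f, h_b])

rng = np.random.default_rng(1)
D, H = 16, 8                                # placeholder sizes
make = lambda: (rng.standard_normal((D, 4 * H)) * 0.1,
                rng.standard_normal((H, 4 * H)) * 0.1,
                np.zeros(4 * H))
emb = bilstm_embed(rng.standard_normal((25, D)), make(), make())
```

Because the recurrence runs over whatever sequence length it is given, a 25-frame and a 7-frame recording both yield an embedding of size $2H$, ready for the final linear-plus-softmax classifier.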
4. Comparative Performance and Ablation Studies
Comprehensive experiments demonstrated that star graph + DDGNN achieves significant gains:
- Test accuracy: 94.27% (star graph, 2-layer GCN + 2-layer Bi-LSTM), approaching the 97.25% upper bound of vision-based skeletons.
- Inference time on Raspberry Pi 4: 25.8 ms (graph construction) + 137.6 ms (2-layer GCN + BiLSTM), sustaining ~6 fps with 93.99% accuracy.
- Baseline point cloud networks (e.g., PointNet++, PointMLP, DGCNN, PointTransformer), when appended with comparable LSTM heads or operated in spatial-only modes, achieved at best 93.73% (PointMLP+LSTM), with substantially higher memory or compute demand.
Ablations clarified architectural choices:
| Ablation Variant | Accuracy (%) |
|---|---|
| FC only (no GCN) | 83.27 |
| 1-layer GCN | 89.02 |
| 2-layer GCN | 94.27 |
| No LSTM (spatial only) | 66.54 |
| 1-layer LSTM | 81.18 |
| 2-layer Bi-LSTM | 94.27 |
| KNN-graph+DDGNN | 90.76 |
| Radius-graph+DDGNN | 86.92 |
| Empty-graph/no edges | 88.03 |
| Fully-connected-graph | 56.00 |
Full results showed a specific advantage for star-graph DDGNN on subtle or ambiguous motions (e.g., horizontal arm swings, mixed limb activities), scenarios in which $k$-NN or non-graph approaches degraded.
5. Implementation and Application on Resource-Constrained Hardware
DDGNN was validated both in simulation and on embedded platforms:
- Deployed on a Raspberry Pi 4 (no GPU) with batch size 1, the model achieved real-time inference (≤ 163.4 ms per frame at 93.99% accuracy).
- It avoids the overhead of conventional skeleton extraction and heavy graph construction, and has a smaller memory footprint than competing LSTM or 3D-CNN pipelines (Gao et al., 12 Dec 2025).
- No requirement for resampling, zero-padding, or frame aggregation, providing architectural simplicity and efficiency critical for edge computing HAR systems.
6. Relation to Broader Radar HAR Architectures
While alternative radar HAR frameworks employ voxelization plus 2D/3D CNNs (Yan et al., 12 Nov 2025), lightweight PointNet-based backbones (Gu et al., 2024), or explicit multi-person tracking and domain adaptation pipelines (Alam et al., 2021), DDGNN's combination of star-graph construction and dynamic spatial-temporal graph learning is unique in directly accommodating data sparsity and variable cardinality. In particular:
- OG-PCL (Yan et al., 12 Nov 2025) relies on occupancy-gated 2D CNN branches feeding into Bi-LSTM, with carefully designed compensation mechanisms for voxel sparsity, but does not model spatial relationships at the graph level.
- RobHAR (Gu et al., 2024) employs a modified PointNet backbone with BiLiLSTM and HMM-CTC for transition smoothing, but addresses variable point counts via global pooling rather than graph relational modeling.
- Multi-inhabitant systems such as PALMAR (Alam et al., 2021) utilize cluster-tracking and voxel-based CNNs with HMM tracking, relying on more extensive signal-processing but without the graph-based spatial abstraction.
A plausible implication is that DDGNN's star-graph strategy is especially suited for single-actor, sparse, near real-time HAR, where skeletal annotation is unavailable or unreliable, while its modularity may inform future extensions to multi-person or collaborative sensing.
7. Key Insights and Outlook
DDGNN achieves robust and interpretable spatial–temporal representation from highly sparse radar point clouds by:
- Embedding geometric relations as a directed star from a canonical center to radar returns, decoupling performance from point cloud cardinality.
- Using multi-layer Weisfeiler–Lehman GraphConv for effective spatial feature propagation.
- Applying temporal sequence models (Bi-LSTM) for cross-frame context without requiring explicit inter-frame graph construction.
Empirical results validate these design choices, with practical advantages in both accuracy and computational efficiency for embedded HAR systems (Gao et al., 12 Dec 2025). Future directions might include extension to multiple actor scenarios via per-cluster star-graph construction, integration with more advanced temporal models, or fusion with complementary modalities for improved robustness under adverse radar conditions.