mmWave Radar Point Cloud HAR System
- mmWave radar point cloud HAR systems generate sparse 3D point clouds via FMCW radar and apply sophisticated signal processing to infer human movement.
- They employ voxel, point set, and graph representations combined with temporal deep learning models to overcome sparsity, noise, and variable sampling challenges.
- State-of-the-art methods integrate clustering, tracking, and adaptive temporal aggregation to achieve real-time, privacy-preserving human activity recognition on edge devices.
A millimeter-wave (mmWave) radar point cloud-based human activity recognition (HAR) system utilizes mmWave radar sensors to generate sparse 3D point clouds of the environment, from which algorithms infer human movement and classify activities. These systems are distinguished by their privacy-preserving properties, resilience to ambient lighting, and robustness in the absence of visual cues. Unlike vision-based HAR, mmWave radar HAR must overcome challenges of severe point sparsity, noise, variable sampling, and often limited annotated data. Recent advances span voxel-based CNNs with sparsity-aware operators, point-based neural architectures, clustering and tracking for multi-inhabitant settings, and graph-based modeling, each addressing key obstacles unique to radar point clouds.
1. Signal Processing and Point Cloud Generation
mmWave radar HAR systems typically leverage frequency-modulated continuous-wave (FMCW) radar chips such as TI IWR1443BOOST or IWR6843ISK, operating in the 60–81 GHz band with multi-antenna virtual MIMO arrays (Yan et al., 12 Nov 2025, Gao et al., 12 Dec 2025, Gu et al., 2024, Cui et al., 2023). The raw analog-to-digital converter (ADC) data undergoes:
- Range FFT to extract distance (range) profiles;
- Doppler FFT for per-chirp velocity estimation;
- Angle-of-arrival (AoA) estimation via digital beamforming to obtain azimuth and elevation;
- CFAR detection and clutter removal to isolate meaningful reflections (yielding 3D (x,y,z) points and intensity/velocity v).
A canonical formulation converts each detection (range $r$, azimuth $\theta$, elevation $\varphi$, Doppler shift $f_D$) to Cartesian coordinates and radial velocity:

$$x = r\cos\varphi\sin\theta, \qquad y = r\cos\varphi\cos\theta, \qquad z = r\sin\varphi, \qquad v = \frac{\lambda f_D}{2},$$

with wavelength $\lambda$ (Yan et al., 12 Nov 2025). Output per frame is a variable-size set of points whose cardinality ranges up to roughly $100$ depending on setting.
Preprocessing may include noise filtering (CFAR thresholding, explicit out-of-range removal), segmentation, (optional) Doppler-based clutter rejection, and down- or up-sampling to fixed cardinality for deep models (Gao et al., 12 Dec 2025, Cui et al., 2023, Gu et al., 2024).
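The detection-to-point conversion described above can be sketched as follows; the axis conventions (y along boresight) and the 77 GHz default wavelength are illustrative assumptions, not fixed by the sources:

```python
import numpy as np

def radar_detections_to_points(r, az, el, f_d, wavelength=3e8 / 77e9):
    """Convert CFAR detections (range r [m], azimuth az [rad],
    elevation el [rad], Doppler shift f_d [Hz]) into Cartesian
    points with a radial-velocity channel."""
    x = r * np.cos(el) * np.sin(az)   # cross-range
    y = r * np.cos(el) * np.cos(az)   # along boresight
    z = r * np.sin(el)                # height
    v = wavelength * f_d / 2.0        # radial velocity [m/s]
    return np.stack([x, y, z, v], axis=-1)
```

Each frame then yields an (N, 4) array with variable N, which the preprocessing steps below reduce to a fixed-size input for deep models.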
2. Representation Strategies: Voxel, Point Set, and Graph
mmWave radar point clouds are characteristically sparse and vary in size/structure, requiring tailored representations.
Voxelization discretizes 3D space into a fixed grid, with occupancy counts per cell (Yan et al., 12 Nov 2025) or small feature vectors (count, depth, “AXOR” signatures) per voxel (Alam et al., 2021). Temporal stacking over frames forms a 4D spatio-temporal tensor, supporting convolutions but potentially introducing sparsity and loss of fine spatial detail.
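A minimal occupancy-count voxelization can be sketched as follows; the grid resolution and spatial bounds are hypothetical values chosen for illustration:

```python
import numpy as np

def voxelize(points, grid=(16, 16, 16), bounds=((-2, 2), (0, 4), (-1, 1))):
    """Map each (x, y, z) point into a fixed occupancy-count grid;
    points outside the bounds are dropped."""
    occ = np.zeros(grid, dtype=np.int32)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    # Normalize coordinates into voxel indices.
    idx = ((points[:, :3] - lo) / (hi - lo) * np.array(grid)).astype(int)
    keep = np.all((idx >= 0) & (idx < np.array(grid)), axis=1)
    # Accumulate counts, handling repeated indices correctly.
    np.add.at(occ, tuple(idx[keep].T), 1)
    return occ
```

Stacking the per-frame grids along a new leading axis yields the 4D tensor consumed by 3D/temporal convolutions.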
Point set-based methods (e.g., PointNet, PointNet++, DGCNN) directly process raw coordinates, often after zero-padding or random sampling to a fixed cardinality (Cui et al., 2023, Gu et al., 2024). The “Light-PointNet” (LPN) variant applies a shared MLP and pooling to extract global embeddings for each frame (Gu et al., 2024).
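Resampling a variable-size frame to a fixed cardinality might look like the following sketch (random down-sampling when there are too many points, resampling with replacement to pad when there are too few; a nonempty frame is assumed):

```python
import numpy as np

def to_fixed_cardinality(points, n=64, seed=None):
    """Return exactly n points: random subset if the frame has more,
    padded by resampling existing points if it has fewer."""
    rng = np.random.default_rng(seed)
    m = len(points)
    if m >= n:
        idx = rng.choice(m, n, replace=False)        # down-sample
    else:
        pad = rng.choice(m, n - m, replace=True)     # up-sample to pad
        idx = np.concatenate([np.arange(m), pad])
    return points[idx]
```

Zero-padding is the common alternative to resampling; either way, the fixed cardinality lets frames be batched for PointNet-style backbones.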
Graph-based encodings bypass voxelization and resampling. The star-graph approach augments each frame with a static center and links all points to this center, forming a variable-size graph per frame; these are processed by discrete dynamic GNNs (DDGNN) with temporal heads (Gao et al., 12 Dec 2025). This captures spatial relationships naturally under high sparsity and avoids introducing artificial structure.
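Constructing the per-frame star graph is straightforward; in this sketch the static center node is placed at the origin (a hypothetical choice, since the sources do not fix its location), and every measured point is connected to it:

```python
import numpy as np

def star_graph(points):
    """Build a star graph for one frame: append a static center node
    and connect every measured point to it. Returns (nodes, edges)
    with edges as (point_index, center_index) pairs."""
    center = np.zeros((1, points.shape[1]))
    nodes = np.concatenate([points, center], axis=0)
    c = len(points)                              # index of the center node
    edges = np.array([[i, c] for i in range(c)])
    return nodes, edges
```

Because the graph grows with the number of detections, no padding or resampling is needed; the GNN simply aggregates over however many edges exist that frame.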
3. Core Algorithms: Clustering, Association, and Deep Temporal Models
Clustering and Tracking
Multi-person scenarios require robust cluster formation from sparse, overlapping point clouds. DBSCAN is commonly used because it identifies arbitrarily shaped clusters without requiring the number of clusters in advance (Tunau et al., 14 Aug 2025, Alam et al., 2021, Gao et al., 12 Dec 2025). A modified metric de-emphasizes elevation, e.g., a weighted Euclidean distance

$$d(p, q) = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2 + \alpha (p_z - q_z)^2}, \qquad \alpha < 1,$$

(Alam et al., 2021, Tunau et al., 14 Aug 2025). Two-stage clustering (e.g., DBSCAN + BIRCH) further improves cluster stability for group settings (Alam et al., 2021).
Assignment of clusters to tracks over time uses the Hungarian algorithm for bipartite assignment based on centroid proximity (Tunau et al., 14 Aug 2025) or temporal association via Adaptive Order HMM (AO-HMM), which dynamically adapts order based on current inter-cluster proximity (Alam et al., 2021). Kalman filtering is sometimes applied for smoothing, but over-smoothing can damage discriminative shape features critical for contemporary DL architectures (Tunau et al., 14 Aug 2025).
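The elevation-weighted metric and centroid-based Hungarian assignment can be sketched together; `alpha=0.25` is a hypothetical weight, and `scipy.optimize.linear_sum_assignment` supplies the standard Hungarian solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def elevation_weighted_dist(p, q, alpha=0.25):
    """Modified Euclidean distance that down-weights the z axis,
    since a person's reflections spread vertically."""
    d = p - q
    return np.sqrt(d[0]**2 + d[1]**2 + alpha * d[2]**2)

def associate(prev_centroids, new_centroids):
    """Hungarian assignment of new cluster centroids to existing
    tracks based on centroid proximity."""
    cost = np.array([[elevation_weighted_dist(a, b) for b in new_centroids]
                     for a in prev_centroids])
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]
```

The same weighted metric can be passed to a DBSCAN implementation as a custom distance function before the association step.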
Temporal and Spatial-Temporal Deep Learning
Leading systems combine spatial feature encoding with explicit temporal modeling:
- Tri-view CNNs applied to projected voxel grids from top, front, and side, as in OG-PCL, utilize sparsity-aware convolutions (OGConv), batch normalization, and global pooling. Subsequent temporal modeling employs Bi-LSTM to process the sequence of frame-level feature embeddings, supporting efficient and accurate activity recognition (Yan et al., 12 Nov 2025).
- Point-based models (e.g., LPN-BiLiLSTM in RobHAR) apply an LPN spatial backbone with a bidirectional LiteLSTM temporal head (Gu et al., 2024).
- Graph-based pipelines build frame-wise star graphs or k-NN graphs and extract spatial representations via a 2-layer GCN (with sigmoid activation), which are processed by a bidirectional LSTM for temporal structure (Gao et al., 12 Dec 2025).
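The shared-MLP spatial encoding at the heart of the point-based backbones above can be sketched as follows; the two weight matrices stand in for trained parameters, and the symmetric max-pool is what makes the frame embedding invariant to point order:

```python
import numpy as np

def frame_embedding(points, W1, W2):
    """PointNet/LPN-style encoder: apply the same two-layer MLP to
    every point, then max-pool into one permutation-invariant
    frame embedding."""
    h = np.maximum(points @ W1, 0)   # shared layer 1, ReLU
    h = np.maximum(h @ W2, 0)        # shared layer 2, ReLU
    return h.max(axis=0)             # order-invariant pooling
```

The sequence of per-frame embeddings is then fed to the Bi-LSTM (or BiLiLSTM) temporal head.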
For continuous HAR and sequence segmentation, transition post-processing via HMMs and Connectionist Temporal Classification (CTC) further boosts temporal regularity and robustness (Gu et al., 2024).
4. System Performance and Evaluation
Benchmark datasets include RadHAR, MiliPoint, MMActivity, and custom multi-person sets (Yan et al., 12 Nov 2025, Cui et al., 2023, Gu et al., 2024, Alam et al., 2021). Typical metrics are overall/class-wise classification accuracy, macro precision/recall/F1, and mean localization error for keypoint estimation (Yan et al., 12 Nov 2025, Cui et al., 2023, Gu et al., 2024).
Table: Representative Results from Recent Systems
| System | Architecture | Accuracy (%) | Key Technical Features |
|---|---|---|---|
| OG-PCL | Tri-view CNN, BiLSTM | 91.75 | OGConv, multi-view fusion (Yan et al., 12 Nov 2025) |
| PALMAR | Voxels+CNN/HMM+VAE | 96.0 | DBSCAN+BIRCH, AO-HMM, VAE domain adapt. (Alam et al., 2021) |
| RobHAR | LPN-BiLiLSTM | 95.3–95.4 | LPN, BiLiLSTM, HMM+CTC (Gu et al., 2024) |
| DDGNN-Star | Graph, BiLSTM | 94.27 | Star graph, 2-layer GCN, BiLSTM (Gao et al., 12 Dec 2025) |
| MiliPoint Baselines | PointNet++/DGCNN | ≤34.5 | Basic pointwise models (Cui et al., 2023) |
| Tunau et al. | DBSCAN/Hungarian/KF+DL | 98.0 (KM/DS only) | Clustering+DL, low-latency (Tunau et al., 14 Aug 2025) |
A key observation is that voxel- and point-based approaches require careful tuning to handle sparsity, including custom convolution blocks (OGConv) or MLPs; DBSCAN or star-graph approaches natively adapt to data structure. Multi-view fusion (+5% accuracy) and sparsity compensation (K/D scaling) in convolutions further improve performance (Yan et al., 12 Nov 2025).
5. Hardware, Latency, and Deployment
Leading mmWave radar HAR systems achieve real-time inference (20–30 Hz, latency ~25–200 ms/frame) on edge devices such as ARM CPUs (Raspberry Pi 4, Jetson Nano) and mid-range GPUs (Yan et al., 12 Nov 2025, Gu et al., 2024, Gao et al., 12 Dec 2025, Alam et al., 2021). Model sizes range from ~80K parameters (RobHAR LPN) to 0.83M (OG-PCL), supporting lightweight embedded deployment (Yan et al., 12 Nov 2025, Gu et al., 2024). Graph-based models (DDGNN) require minimal preprocessing and run at ~7 Hz on Pi 4 (Gao et al., 12 Dec 2025).
Design recommendations reflect the trade-off between accuracy, computational cost, and latency:
- Real-time/resource-constrained: DBSCAN-only or star-GCN+BiLSTM offers high accuracy and sub-10 ms inference (Tunau et al., 14 Aug 2025, Gao et al., 12 Dec 2025).
- Server-side/high-accuracy: Full clustering (DBSCAN+Hungarian+KF) plus 3D-CNN or GNN yields maximal robustness but incurs 50× higher latency (Tunau et al., 14 Aug 2025).
- Adaptive tuning: Clustering radius, minPts, and model parameters are best set via cross-validation or Bayesian optimization on held-out sets (Tunau et al., 14 Aug 2025, Alam et al., 2021).
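The adaptive-tuning recommendation can be illustrated with a minimal exhaustive search over clustering hyperparameters; `score_fn`, the candidate grids, and their values are placeholders for a held-out cross-validation score and problem-specific ranges:

```python
import itertools

def grid_search(score_fn, eps_grid=(0.3, 0.5, 0.7), minpts_grid=(5, 10, 20)):
    """Pick the (eps, minPts) pair maximizing a user-supplied
    validation score, e.g., cross-validated HAR accuracy."""
    return max(itertools.product(eps_grid, minpts_grid),
               key=lambda p: score_fn(*p))
```

Bayesian optimization replaces the exhaustive product with a surrogate-guided search when the grid, or the cost of each evaluation, grows large.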
Robustness to scene diversity is bolstered by architectural domain adaptation (dual-VAE in PALMAR (Alam et al., 2021)), data augmentation, and parameterized preprocessing (Gu et al., 2024, Cui et al., 2023).
6. Challenges, Limitations, and Future Directions
Current limitations are imposed by hardware field of view, occlusion, point sparsity and noise, small and often homogeneous datasets (e.g., limited participant diversity), and single-radar geometry (Gao et al., 12 Dec 2025, Alam et al., 2021, Cui et al., 2023, Gu et al., 2024). Occlusion and multi-person tracking remain active problems; PALMAR addresses cross-over events via AO-HMM and a Crossover Path Disambiguation Algorithm, reducing error rates by 57% over the state of the art (Alam et al., 2021).
Planned directions include:
- Extension to health assessment tasks (e.g., breathing/respiration, fall detection) (Alam et al., 2021, Gu et al., 2024);
- Multi-radar deployment for wider area and occlusion handling (Cui et al., 2023, Gu et al., 2024);
- Semi-supervised/self-supervised pretraining to expand label coverage (Gao et al., 12 Dec 2025);
- Sensor/domain adaptation without manual re-annotation (Alam et al., 2021, Gu et al., 2024);
- Integration of mmWave with other modalities (IMU/cameras) for sensor fusion (Cui et al., 2023).
A plausible implication is that graph-based spatial encoding combined with lightweight temporal aggregation (e.g., star-GCN+BiLSTM) will remain prominent in edge HAR, with voxel-based and hybrid methods supplemented by domain adaptation for transferability.
7. Summary and Comparative Analysis
mmWave radar point cloud-based HAR is a maturing field uniting signal processing, clustering/tracking, and advanced deep learning under strong constraints of sparsity, privacy, and real-time response. OG-PCL exemplifies robust, compact processing via occupancy-gated convolutions and multi-view temporal modeling (91.75% accuracy on RadHAR), whereas PALMAR advances multi-inhabitant tracking and adaptive cross-domain HAR (96% in multi-person test) (Yan et al., 12 Nov 2025, Alam et al., 2021). Graph-based designs and pointwise MLP/GCN pipelines set new benchmarks for efficiency and accuracy with limited/flexible radar data (Gao et al., 12 Dec 2025, Gu et al., 2024). Future work will need to emphasize multi-modal robustness, adaptive deployment, and continual learning in diverse and dynamic indoor environments.