SSCATeR: Sparse Scatter Temporal Convolution
- The paper presents an incremental convolution framework (SSCATeR) that leverages temporal change detection to achieve up to 6.61× speedup in LiDAR-based 3D object detection.
- SSCATeR reduces redundant computation by updating only changed spatial regions, cutting processing by approximately 72–84% without compromising accuracy.
- The method utilizes sliding window mechanisms, paired feature maps, and specialized memory architectures to maintain real-time sub-10 ms latency in embedded GPU systems.
The Sparse Scatter-Based Convolution Algorithm with Temporal Data Recycling (SSCATeR) is an incremental convolution framework engineered for efficient, real-time 3D object detection in LiDAR point clouds. Distinct from traditional full-frame or static sparse convolution approaches, SSCATeR leverages the intrinsic streaming nature of LiDAR acquisition and the temporal coherency between consecutive sweeps, concentrating computational effort exclusively on spatial regions that have changed since the last evaluation. By maintaining per-site convolutional feature state and exploiting fine-grained temporal data recycling, SSCATeR achieves substantial reductions in redundant operations, facilitating high-throughput, real-time processing while producing outputs bitwise-identical to those of standard sparse convolution techniques (Dow et al., 9 Dec 2025).
1. Temporal Change Detection and Sliding Window Mechanism
SSCATeR treats the LiDAR data stream as a temporally evolving sequence in which each point carries a precise timestamp. The core of the temporal mechanism is a sliding time window of length $T = 100$ ms, advanced in strides of $10$ ms. With each stride to current time $t$, new points with timestamps in $[t - T, t]$ are incorporated, while points with timestamps less than $t - T$ are pruned.
Current window points are quantized into a 2D grid of pillars (e.g., with $0.16$ m granularity). SSCATeR maintains a Boolean "change map" $C$: a pillar $(i, j)$ is marked changed (i.e., $C[i, j] = 1$) if at least one point enters or leaves it during the stride. Only pillars marked as changed are propagated for downstream convolutional updates; all others are excluded from further processing.
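The window-advance and change-map logic can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: the grid size, function names, and the point layout (rows of `x, y, t`) are our assumptions; the 0.16 m pillar granularity and the enter/leave change criterion are from the text.

```python
import numpy as np

CELL = 0.16          # pillar edge length in metres (from the text)
GRID = (32, 32)      # illustrative grid size; the real grid is larger

def pillar_index(points):
    """Quantize point x/y coordinates to integer pillar indices."""
    ij = np.floor(points[:, :2] / CELL).astype(int)
    return np.clip(ij, 0, np.array(GRID) - 1)

def occupancy(points):
    """Per-pillar point counts for the points currently in the window."""
    occ = np.zeros(GRID, dtype=int)
    for i, j in pillar_index(points):
        occ[i, j] += 1
    return occ

def advance_window(points, t_now, window=0.100, stride=0.010, new_points=None):
    """Slide the window by one stride: drop expired points, ingest new ones,
    and mark a pillar changed if any point entered or left it."""
    occ_prev = occupancy(points)
    t_now += stride
    kept = points[points[:, 2] >= t_now - window]      # prune expired points
    if new_points is not None:
        kept = np.vstack([kept, new_points])           # ingest new sweep
    occ_curr = occupancy(kept)
    change_map = occ_prev != occ_curr                  # Boolean change map C
    return kept, t_now, change_map
```

Downstream, only pillars where `change_map` is `True` would be handed to the convolutional backbone.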
2. Temporal Data Recycling in Convolutional Layers
At every convolutional layer, SSCATeR maintains two feature maps, $I_{\mathrm{prev}}$ and $I_{\mathrm{curr}}$, corresponding to the previous and current windows, held in GPU memory. Instead of recomputing the entire layer output from scratch, SSCATeR calculates the difference
$$\Delta = I_{\mathrm{curr}} - I_{\mathrm{prev}}$$
and applies a localized update at only the changed sites.
Each changed input at coordinate $(i, j)$ contributes incrementally to the output feature map $O$ via a “scatter deconv+conv” operation:
$$O[i - m + \lfloor K/2 \rfloor,\; j - n + \lfloor K/2 \rfloor] \mathrel{+}= \Delta[i, j] \cdot W[m, n], \quad m, n \in \{0, \dots, K-1\}.$$
This operation simultaneously removes the old contribution from $I_{\mathrm{prev}}$ and adds the new contribution from $I_{\mathrm{curr}}$. After the update, $I_{\mathrm{prev}}$ is overwritten with $I_{\mathrm{curr}}$ for the next stride.
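Because convolution is linear, updating with the difference of the two windows reproduces a full recompute exactly. A minimal numeric check (the `conv2d` helper and all names are illustrative, not from the paper):

```python
import numpy as np

def conv2d(x, w):
    """Dense 'same'-padded single-channel 2D convolution (correlation form),
    used only to verify the incremental identity below."""
    K = w.shape[0]
    p = K // 2
    xp = np.pad(x, p)
    out = np.zeros_like(x, dtype=float)
    for m in range(K):
        for n in range(K):
            out += w[m, n] * xp[m:m + x.shape[0], n:n + x.shape[1]]
    return out

rng = np.random.default_rng(0)
I_prev = rng.standard_normal((8, 8))
W = rng.standard_normal((3, 3))
I_curr = I_prev.copy()
I_curr[2, 3] += 1.5            # two pillars changed between windows
I_curr[5, 5] -= 0.7

O_prev = conv2d(I_prev, W)
delta = I_curr - I_prev        # non-zero only at the two changed sites
O_curr = O_prev + conv2d(delta, W)             # incremental delta update
assert np.allclose(O_curr, conv2d(I_curr, W))  # matches a full recompute
```

In SSCATeR the `conv2d(delta, W)` term is not a dense pass: it is scattered from only the changed sites, which is where the savings come from.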
3. Scatter-Based Convolution with Data Reuse
For each convolutional layer, the pipeline preserves:
- Previous feature tensor, $I_{\mathrm{prev}}$
- Previous change map, $C_{\mathrm{prev}}$
- Kernel weights, $W$
The following pseudocode outlines the forward pass with data reuse:
initialize Icurr by scattering only new pillar features into the pseudo-image
compute C by marking all input-layer positions that have changed
for each site (i, j) where C[i, j] == 1:
    Δ = Icurr[i, j] - Iprev[i, j]
    for m in 0…K-1, n in 0…K-1:
        out_i = i - m + floor(K/2)
        out_j = j - n + floor(K/2)
        O[out_i, out_j] += Δ * W[m, n]
        if the post-ReLU value of O[out_i, out_j] changes (including zero ↔ non-zero transitions):
            Ccurr[out_i, out_j] = 1
Iprev ← Icurr
Cprev ← Ccurr
Only sites flagged in the change map are iterated. The output change map is updated post-ReLU to propagate site-level change information through the network stack. All other spatial regions are skipped entirely, resulting in extreme computational sparsity.
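The pseudocode above can be made runnable as a single-channel sketch. Function and variable names are ours (following the pseudocode), boundary handling by clipping is our assumption, and the equivalence check uses floating-point `allclose` rather than the bitwise equality the paper reports:

```python
import numpy as np

def scatter_update(I_prev, I_curr, C, W, O):
    """Scatter the delta of every changed site into O in place; return the
    output-layer change map for the next block in the stack."""
    K = W.shape[0]
    H, Wd = I_curr.shape
    C_next = np.zeros_like(C)
    for i, j in zip(*np.nonzero(C)):
        d = I_curr[i, j] - I_prev[i, j]
        if d == 0:
            continue
        for m in range(K):
            for n in range(K):
                oi, oj = i - m + K // 2, j - n + K // 2
                if 0 <= oi < H and 0 <= oj < Wd:         # clip at borders
                    old = max(O[oi, oj], 0.0)            # pre-update ReLU value
                    O[oi, oj] += d * W[m, n]
                    if max(O[oi, oj], 0.0) != old:       # post-ReLU value changed
                        C_next[oi, oj] = True
    return C_next

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3))
I1 = rng.standard_normal((6, 6))
O = np.zeros((6, 6))
# Bootstrap: a "full" pass is just a scatter pass where every site changed.
scatter_update(np.zeros((6, 6)), I1, np.ones((6, 6), bool), W, O)

I2 = I1.copy()
I2[3, 3] += 2.0                    # one pillar changed in the next stride
C = I2 != I1
scatter_update(I1, I2, C, W, O)    # incremental pass touches one site

O_ref = np.zeros((6, 6))
scatter_update(np.zeros((6, 6)), I2, np.ones((6, 6), bool), W, O_ref)
assert np.allclose(O, O_ref)       # incremental result matches full pass
```

The inner double loop over the kernel is what a CUDA implementation would parallelize; the `d == 0` early-out mirrors the change-map skip.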
4. Complexity Analysis and Empirical Speedup
Let $N_a$ denote the total number of active sites (non-empty pillars) in a frame and $N_c$ the number of changed sites per stride. For standard sparse convolution the per-layer complexity is $O(N_a \, K^2 \, C_{\mathrm{in}} C_{\mathrm{out}})$; in SSCATeR, only $O(N_c \, K^2 \, C_{\mathrm{in}} C_{\mathrm{out}})$ computation is needed. In representative datasets, $N_c$ is a small fraction of $N_a$ (a $72$–$84\%$ reduction in processed sites), predicting a theoretical speedup of $N_a / N_c$.
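A back-of-envelope check of what the reported 72–84% reduction in processed sites implies (the function name is ours; this is arithmetic on the ratio, not a measured result):

```python
def theoretical_speedup(reduction):
    """If a fraction `reduction` of active sites is skipped per stride,
    per-layer work scales by (1 - reduction); speedup is its reciprocal."""
    return 1.0 / (1.0 - reduction)

for r in (0.72, 0.84):
    print(f"{r:.0%} reduction -> {theoretical_speedup(r):.2f}x speedup")
# 72% reduction -> 3.57x speedup
# 84% reduction -> 6.25x speedup
```

The upper end of this range is consistent in magnitude with the empirically observed per-layer speedups.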
Empirical evaluation reports per-layer speedup of up to $6.61\times$ for a 64-channel convolution layer on the NVIDIA Jetson AGX Orin, with substantial mean speedups across the full three-block backbone on both the Orin and the Xavier. End-to-end convolutional latency frequently falls below $10$ ms, supporting real-time operation at the $10$ ms stride. The observed feature maps and object detection outputs are bitwise-identical to standard sparse or scatter-based convolution implementations (Dow et al., 9 Dec 2025).
5. Data Structures and Memory Architecture
Efficient support for temporal data recycling demands specialized memory structures:
- Each pillar in the grid holds a small FIFO (circular queue) for tracking points with up to 100 ms retention; this enables efficient expiry/removal and centroid update.
- At every convolutional layer, a Boolean change map of matching spatial resolution tracks the sites requiring update.
- Full-precision feature tensors per layer are cached in GPU memory and updated in place each stride.
- Output feature tensors are also held in GPU memory and incrementally modified, with no need to recompute unaffected regions.
This design reduces both the memory bandwidth and computational footprint by ensuring only spatial-temporal change propagates through the backbone.
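The per-pillar FIFO can be sketched as below. The class, field names, and the O(1) running-sum centroid are our illustrative choices; the 100 ms retention and the expiry-triggers-change behaviour are from the text.

```python
from collections import deque

class PillarQueue:
    """FIFO of timestamped points with time-based expiry, plus a running
    coordinate sum so the pillar centroid updates in O(1) per event."""

    def __init__(self, retention=0.100):        # 100 ms retention window
        self.retention = retention
        self.q = deque()                        # (t, x, y, z), oldest first
        self.sum = [0.0, 0.0, 0.0]

    def push(self, t, x, y, z):
        """Insert a newly arrived point."""
        self.q.append((t, x, y, z))
        for k, v in enumerate((x, y, z)):
            self.sum[k] += v

    def expire(self, t_now):
        """Pop points older than the retention window; return True if the
        pillar contents changed (i.e. it must be flagged in the change map)."""
        changed = False
        while self.q and self.q[0][0] < t_now - self.retention:
            _, x, y, z = self.q.popleft()
            for k, v in enumerate((x, y, z)):
                self.sum[k] -= v
            changed = True
        return changed

    def centroid(self):
        """Mean point position, or None for an empty pillar."""
        n = len(self.q)
        return None if n == 0 else [s / n for s in self.sum]
```

Because points arrive in timestamp order, expiry only ever pops from the head, which is what makes a plain FIFO (or circular buffer on GPU) sufficient.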
6. Experimental Evaluation and Quantitative Results
Evaluation employed two custom drone LiDAR datasets (Newcastle bridge and Hamburg lock), using a $100$ ms window with $5$–$6$k points per window. The network backbone used a PointPillars-style ("Aerial-PointPillars") architecture: a pillar feature network (PFN) with $1$D convolution, followed by the SSCATeR-based scatter backbone and an SSD-style detection head. Only the backbone was replaced; all other network modules were left unchanged.
Tests ran on NVIDIA Jetson AGX Xavier (512 CUDA cores) and AGX Orin (2,048 CUDA cores), both at maximum power. Results demonstrate:
- Precision and recall remain unchanged relative to the baseline on the Newcastle split, with feature maps and outputs bitwise-identical to baselines.
- The PFN operation count drops substantially; PFN latency on Orin is approximately $0.2$ ms.
- For a 64-channel convolution layer, per-layer acceleration reached up to $6.61\times$ on Orin.
- The entire backbone yields consistent average acceleration on both Orin and Xavier.
- Real-time object detection at sub-$10$ ms backbone latency is feasible for $10$ ms stride input increments.
7. Context, Significance, and Implications
SSCATeR demonstrates that leveraging temporal continuity in LiDAR streams can deliver extreme computational sparsity and large reductions in runtime without accuracy compromise. By acting strictly on altered spatial regions and maintaining activation state at every backbone layer, SSCATeR avoids $72$–$84\%$ of redundant convolution operations relative to standard approaches, while ensuring detection fidelity is not impacted. A plausible implication is that similar incremental sparse update paradigms may be applicable to other temporally-dense domains or sensor modalities with localized change patterns, subject to the availability of high-resolution timestamps and appropriate memory management.
The algorithm situates itself within the scatter-based convolutional literature, extending prior methods to include explicit data reuse and temporal change tracking. Unlike mask-based or classical sparse-convolution frameworks, SSCATeR uniquely fuses sliding-window change detection with incremental scatter convolution, achieving both architectural generality and practical deployment advantages, especially for embedded GPU applications (Dow et al., 9 Dec 2025).