
SSCATeR: Sparse Scatter Temporal Convolution

Updated 16 December 2025
  • The paper presents an incremental convolution framework (SSCATeR) that leverages temporal change detection to achieve up to 6.61× speedup in LiDAR-based 3D object detection.
  • SSCATeR reduces redundant computation by updating only changed spatial regions, cutting processing by approximately 72–84% without compromising accuracy.
  • The method combines a sliding time window, paired per-layer feature maps, and specialized memory structures to sustain real-time sub-10 ms latency on embedded GPUs.

The Sparse Scatter-Based Convolution Algorithm with Temporal Data Recycling (SSCATeR) is an incremental convolution framework engineered for efficient, real-time 3D object detection in LiDAR point clouds. Distinct from traditional full-frame or static sparse convolution approaches, SSCATeR leverages the intrinsic streaming nature of LiDAR acquisition and temporal coherency between consecutive sweeps, concentrating computational effort exclusively on spatial regions that have changed since the last evaluation. By maintaining per-site convolutional feature state and exploiting fine-grained temporal data recycling, SSCATeR achieves substantial reductions in redundant operations, facilitating high-throughput, real-time processing while preserving bitwise-identical precision relative to standard sparse convolution techniques (Dow et al., 9 Dec 2025).

1. Temporal Change Detection and Sliding Window Mechanism

SSCATeR treats the LiDAR data stream as a temporally evolving sequence, where each point is annotated with a precise timestamp. The core of the temporal mechanism is a sliding time window of length $T = 100$ ms, advanced in increments of $\Delta t = 10$ ms. With each stride, new points within $[t, t + \Delta t)$ are incorporated, while points with timestamps less than $t - T$ are pruned.

Current window points are quantized into a 2D grid of pillars (e.g., a $504 \times 504$ grid with $0.16$ m granularity). SSCATeR maintains a Boolean "change map" $C$: a pillar is marked changed (i.e., $C[i, j] \leftarrow 1$) if at least one point enters or leaves it during the stride. Only pillars marked as changed are propagated for downstream convolutional updates; all others are excluded from further processing.
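As a concrete sketch, the pillar quantization and change-map construction described above might look as follows (illustrative Python only: the 8×8 grid stands in for the paper's 504×504 grid, and `pillar_index` / `change_map` are invented helper names, not the authors' API):

```python
import numpy as np

GRID, CELL = 8, 0.16  # grid side (pillars) and pillar size in metres (toy grid size)

def pillar_index(points):
    """Quantize (x, y) point coordinates into integer pillar indices."""
    idx = np.floor(points[:, :2] / CELL).astype(int)
    # keep only points that fall inside the grid
    return idx[(idx >= 0).all(axis=1) & (idx < GRID).all(axis=1)]

def change_map(entered, left):
    """Mark a pillar changed if any point entered or left it this stride."""
    C = np.zeros((GRID, GRID), dtype=bool)
    for pts in (entered, left):
        for i, j in pillar_index(pts):
            C[i, j] = True
    return C

# One stride: one point enters pillar (1, 2), one expires from pillar (5, 5).
new_pts = np.array([[0.20, 0.35, 0.1]])   # lands in pillar (1, 2)
old_pts = np.array([[0.85, 0.90, 0.0]])   # leaves pillar (5, 5)
C = change_map(new_pts, old_pts)          # exactly two pillars flagged
```

Downstream layers would then iterate only over the `True` entries of `C`.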

2. Temporal Data Recycling in Convolutional Layers

At every convolutional layer, SSCATeR maintains two feature maps, $I_{\mathrm{prev}}$ and $I_{\mathrm{curr}}$, corresponding to the previous and current windows, held in GPU memory. Instead of recomputing the entire layer output from scratch, SSCATeR calculates the difference:

$$\Delta I = I_{\mathrm{curr}} - I_{\mathrm{prev}}$$

and applies a localized update at only the changed sites.

Each changed input at coordinate $(i, j)$ contributes incrementally to the output feature map $O$ via a "scatter deconv+conv" operation:

$$\text{for all } m, n \in \{0, \ldots, K-1\}: \quad O[i - m + \lfloor K/2 \rfloor,\; j - n + \lfloor K/2 \rfloor] \mathrel{+}= \Delta I[i, j] \cdot W[m, n]$$

This operation both removes the old contribution from $I_{\mathrm{prev}}$ and adds the new contribution from $I_{\mathrm{curr}}$. After the update, $I_{\mathrm{prev}}$ is overwritten with $I_{\mathrm{curr}}$ for the next stride.
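Because convolution is linear in its input, scattering only $\Delta I$ into the previous output reproduces a full convolution of the new input. A toy numerical check of that identity (plain NumPy, not the authors' implementation; sizes and the `full_conv` helper are illustrative):

```python
import numpy as np

K, H, W_ = 3, 6, 6
rng = np.random.default_rng(0)
W = rng.normal(size=(K, K))          # kernel weights
I_prev = rng.normal(size=(H, W_))
I_curr = I_prev.copy()
I_curr[2, 3] += 1.5                  # two changed sites between strides
I_curr[4, 1] -= 0.7

def full_conv(I):
    """Plain 'same'-size convolution using the scatter indexing above."""
    O = np.zeros_like(I)
    for i in range(H):
        for j in range(W_):
            for m in range(K):
                for n in range(K):
                    oi, oj = i - m + K // 2, j - n + K // 2
                    if 0 <= oi < H and 0 <= oj < W_:
                        O[oi, oj] += I[i, j] * W[m, n]
    return O

O = full_conv(I_prev)                 # output state carried from last stride
delta = I_curr - I_prev
for i, j in zip(*np.nonzero(delta)):  # visit only the changed sites
    for m in range(K):
        for n in range(K):
            oi, oj = i - m + K // 2, j - n + K // 2
            if 0 <= oi < H and 0 <= oj < W_:
                O[oi, oj] += delta[i, j] * W[m, n]

# Incremental result matches a full convolution of the new window.
assert np.allclose(O, full_conv(I_curr))
```

Only the two changed sites triggered any arithmetic, yet the output equals a full recomputation, which is the source of the paper's bitwise-identical claim.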

3. Scatter-Based Convolution with Data Reuse

For each convolutional layer, the pipeline preserves:

  • Previous feature tensor, $I_{\mathrm{prev}}$
  • Previous change map, $C_{\mathrm{prev}}$
  • Kernel weights, $W$

The following pseudocode outlines the forward pass with data reuse:

initialize Icurr by scattering only new pillar features into the pseudo-image
compute C by marking all input-layer positions that have changed
for each site (i, j) where C[i, j] == 1:
    Δ = Icurr[i, j] - Iprev[i, j]
    for m in 0…K-1, n in 0…K-1:
        out_i = i - m + floor(K/2)
        out_j = j - n + floor(K/2)
        O[out_i, out_j] += Δ * W[m, n]
        if ReLU(O[out_i, out_j]) changed value:
            Ccurr[out_i, out_j] = 1
Iprev ← Icurr
Cprev ← Ccurr

Only sites flagged in the change map $C$ are iterated. The output change map $C_{\mathrm{curr}}$ is updated post-ReLU to propagate site-level change information through the network stack. All other spatial regions are skipped entirely, resulting in extreme computational sparsity.
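The post-ReLU change propagation can be sketched as follows (a minimal NumPy illustration; `scatter_update` is a hypothetical helper, and the whole-map ReLU comparison here is for clarity, where a real implementation would check only the touched sites):

```python
import numpy as np

def scatter_update(O_pre, delta, Wk):
    """Apply the scatter deconv+conv update to pre-activation outputs and
    flag output sites whose post-ReLU value changed, so that deeper
    layers only revisit those sites."""
    K = Wk.shape[0]
    H, W_ = O_pre.shape
    relu_before = np.maximum(O_pre, 0.0)   # snapshot of old activations
    for (i, j), d in np.ndenumerate(delta):
        if d == 0.0:
            continue                       # unchanged site: no work at all
        for m in range(K):
            for n in range(K):
                oi, oj = i - m + K // 2, j - n + K // 2
                if 0 <= oi < H and 0 <= oj < W_:
                    O_pre[oi, oj] += d * Wk[m, n]
    # change map for the next layer: any site whose ReLU output moved
    C_curr = np.maximum(O_pre, 0.0) != relu_before
    return O_pre, C_curr

rng = np.random.default_rng(1)
O = rng.normal(size=(6, 6))
delta = np.zeros((6, 6))
delta[3, 3] = 2.0                          # a single changed input site
Wk = rng.normal(size=(3, 3))
O, C = scatter_update(O, delta, Wk)        # C flags at most a K×K patch
```

A single changed input can mark at most a $K \times K$ output neighborhood, which is why sparsity tends to be preserved as changes flow through the stack.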

4. Complexity Analysis and Empirical Speedup

Let $N$ denote the total number of active sites (non-empty pillars) in a frame and $M$ the number of changed sites per stride. For standard sparse convolution the per-layer complexity is $\mathcal{O}(N K^2)$; in SSCATeR, only $\mathcal{O}(M K^2)$ computation is needed. In representative datasets, $M/N \approx 0.27$ (a $72.8\%$ reduction), predicting a theoretical speedup of $N/M \approx 3.7\times$.
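The arithmetic behind these figures (the $M/N$ ratio is taken from the paper; the script itself is only illustrative):

```python
# With M/N changed sites per stride, per-layer work drops from N*K^2 to
# M*K^2 multiply-accumulates, so the predicted speedup is N/M,
# independent of the kernel size K.
ratio = 0.272                  # M/N reported for representative datasets
reduction = 1.0 - ratio        # fraction of per-layer MACs skipped
speedup = 1.0 / ratio          # theoretical per-layer speedup N/M
print(f"{reduction:.1%} fewer MACs, {speedup:.2f}x predicted speedup")
```

This predicts roughly $3.7\times$; the larger empirical per-layer gains (up to $6.61\times$) reflect implementation effects beyond raw operation counts.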

Empirical evaluation reports per-layer speedup of up to $6.61\times$ for a $3 \times 3$ convolution layer (64 channels) on the NVIDIA Jetson AGX Orin. Across the full three-block backbone, mean speedup was $3.8\times$ (Orin) and $4.1\times$ (Xavier). End-to-end convolutional latency frequently falls below $10$ ms, supporting real-time operation at the $10$ ms stride. The resulting feature maps and object detection outputs are bitwise-identical to those of standard sparse or scatter-based convolution implementations (Dow et al., 9 Dec 2025).

5. Data Structures and Memory Architecture

Efficient support for temporal data recycling demands specialized memory structures:

  • Each pillar in the grid holds a small FIFO (circular queue) tracking points with up to $100$ ms retention; this enables efficient expiry/removal and centroid updates.
  • At every convolutional layer $\ell$, a Boolean change map $C_\ell$ of matching spatial resolution tracks sites requiring update.
  • Full-precision feature tensors $I_{\mathrm{prev},\ell}$ per layer are cached in GPU memory and updated in place each stride.
  • Output feature tensors $O_\ell$ are also held in GPU memory and incrementally modified, with no need to recompute unaffected regions.

This design reduces both the memory bandwidth and computational footprint by ensuring only spatial-temporal change propagates through the backbone.
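A minimal sketch of the per-pillar FIFO described above, assuming millisecond timestamps and a Python `deque` in place of the paper's GPU-resident circular queues (`Pillar` and its methods are invented names):

```python
from collections import deque

T_MS = 100  # retention window length in milliseconds

class Pillar:
    """One grid cell's point queue, ordered oldest-first by timestamp."""

    def __init__(self):
        self.points = deque()  # entries: (timestamp_ms, x, y, z)

    def push(self, t, x, y, z):
        """Insert a newly arrived point; the pillar is now 'changed'."""
        self.points.append((t, x, y, z))
        return True

    def expire(self, now):
        """Drop points older than now - T_MS; True if any were removed,
        which also marks the pillar 'changed' for this stride."""
        removed = False
        while self.points and self.points[0][0] < now - T_MS:
            self.points.popleft()
            removed = True
        return removed

p = Pillar()
p.push(0, 0.1, 0.2, 0.0)
p.push(50, 0.1, 0.2, 0.1)
changed = p.expire(now=120)   # the t=0 point falls outside [20, 120)
```

Because points arrive in timestamp order, expiry only ever pops from the front, so both insertion and removal are O(1) per point.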

6. Experimental Evaluation and Quantitative Results

Evaluation employed two custom drone LiDAR datasets (Newcastle bridge and Hamburg lock; $10{,}000$ frames each, $T = 100$ ms window, $5$–$6$k points per window). The network backbone used a PointPillars ("Aerial-PointPillars") architecture with a $100$ ms PFN, $1$D convolution, followed by the SSCATeR-based scatter backbone and an SSD-style detection head. Only the backbone was replaced; all other network modules were left unchanged.

Tests ran on NVIDIA Jetson AGX Xavier (512 CUDA cores) and AGX Orin (2,048 CUDA cores), both at maximum power. Results demonstrate:

  • Precision and recall remain constant (mAP $\approx 89.74\%$, recall $\approx 91.28\%$ on the Newcastle split), with feature maps and outputs bitwise-identical to baselines.
  • The PFN operation count drops by $70.97\%$; PFN latency on Orin is approximately $0.2$ ms.
  • For a $3 \times 3$ convolution (64 channels), per-layer acceleration reached up to $6.61\times$ (mean $3.83\times$) on Orin.
  • The entire backbone yields average acceleration of $3.48\times$ (Orin) and $4.13\times$ (Xavier).
  • Real-time object detection at sub-$10$ ms backbone latency is feasible for $10$ ms stride input increments.

7. Context, Significance, and Implications

SSCATeR demonstrates that leveraging temporal continuity in LiDAR streams can deliver extreme computational sparsity and large reductions in runtime without compromising accuracy. By acting strictly on altered spatial regions and maintaining activation state at every backbone layer, SSCATeR avoids $72$–$84\%$ of redundant convolution operations relative to standard approaches, while ensuring detection fidelity is not impacted. A plausible implication is that similar incremental sparse update paradigms may apply to other temporally dense domains or sensor modalities with localized change patterns, subject to the availability of high-resolution timestamps and appropriate memory management.

The algorithm situates itself within the scatter-based convolutional literature, extending prior methods to include explicit data reuse and temporal change tracking. Unlike mask-based or classical sparse-convolution frameworks, SSCATeR uniquely fuses sliding-window change detection with incremental scatter convolution, achieving both architectural generality and practical deployment advantages, especially for embedded GPU applications (Dow et al., 9 Dec 2025).
