Superquadric-to-Voxel Splatting Scheme

Updated 29 January 2026

The superquadric-to-voxel splatting scheme converts sparse superquadric scene representations into dense voxel grids for 3D occupancy and semantic mapping.
It evaluates analytic superquadric occupancy fields at voxel centers, fuses overlapping contributions with a probabilistic union, and limits computation through spatial binning and primitive pruning.
Reported gains for SuperOcc include a 79% reduction in splatting latency and a 19% reduction in training memory, alongside state-of-the-art results on benchmarks such as nuScenes, SurroundOcc, and Occ3D.

The superquadric-to-voxel splatting scheme is a computational framework for converting a sparse and expressive superquadric scene representation into dense voxel grids for 3D occupancy prediction and semantic mapping, with applications in autonomous driving and scene understanding. Superquadrics, parametrically defined geometric primitives encompassing ellipsoids, cuboids, cylinders, and their intermediates, provide a compact, shape-adaptive basis for spatial modeling, overcoming the rigidity and inefficiency of purely voxel- or Gaussian-based scene descriptors. The splatting procedure involves evaluating analytic superquadric occupancy fields at voxel centers, fusing overlapping contributions through a probabilistic union, and optimizing computations via spatial binning and primitive pruning or splitting. The principle has been developed and refined in works such as QuadricFormer (Zuo et al., 12 Jun 2025) and SuperOcc (Yu et al., 22 Jan 2026), where it achieves state-of-the-art efficiency for high-resolution 3D scene perception.

1. Superquadric Primitives and Occupancy Fields

A superquadric primitive is defined by a center $\mathbf{m}\in\mathbb{R}^3$, orientation $\mathbf{R}\in SO(3)$, scale axes $\mathbf{s}=(s_x,s_y,s_z)$, squareness exponents $\epsilon_1$, $\epsilon_2$, and an optional learned opacity parameter (denoted $a$ or $\sigma$). The canonical implicit surface equation,

$f(\mathbf{u}) = \left(\left|u_x/s_x\right|^{2/\epsilon_2} + \left|u_y/s_y\right|^{2/\epsilon_2}\right)^{\epsilon_2/\epsilon_1} + \left|u_z/s_z\right|^{2/\epsilon_1},$

characterizes the superquadric in its local coordinate frame: the surface is the level set $f(\mathbf{u}) = 1$, with $f(\mathbf{u}) < 1$ inside and $f(\mathbf{u}) > 1$ outside. A world-space point $\mathbf{x}$ is mapped into this frame via $\mathbf{x}_Q = \mathbf{R} (\mathbf{x} - \mathbf{m}).$ A smooth occupancy function is obtained by applying an exponential decay: $p_i(\mathbf{x}) = \exp\left(-\lambda f(\mathbf{x}_Q)\right)\, a_i,$ where $\lambda$ is a global temperature (decay) parameter and $a_i$ is the per-primitive opacity. This produces a soft, differentiable field suitable for probabilistic occupancy estimation at any spatial location.
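As a rough illustration of the occupancy field defined above, the following NumPy sketch evaluates $f$ and the exponential decay for a batch of points. The function names `superquadric_f` and `occupancy`, the default values of `a` and `lam`, and the example primitive are assumptions made for illustration, not taken from either paper.

```python
import numpy as np

def superquadric_f(u, s, eps1, eps2):
    """Inside-outside function f(u) for local-frame points u of shape (P, 3)."""
    q = np.abs(u / s)  # |u_x/s_x|, |u_y/s_y|, |u_z/s_z|
    xy = q[:, 0] ** (2.0 / eps2) + q[:, 1] ** (2.0 / eps2)
    return xy ** (eps2 / eps1) + q[:, 2] ** (2.0 / eps1)

def occupancy(x, m, R, s, eps1, eps2, a=1.0, lam=1.0):
    """Soft occupancy a * exp(-lam * f(x_Q)) for world points x of shape (P, 3)."""
    x_q = (x - m) @ R.T  # row-vector form of x_Q = R (x - m)
    return a * np.exp(-lam * superquadric_f(x_q, s, eps1, eps2))

# Hypothetical example: an axis-aligned ellipsoid (eps1 = eps2 = 1).
pts = np.array([[0.0, 0.0, 0.0],   # center: f = 0, so p = a
                [2.0, 0.0, 0.0]])  # on the surface: f = 1, so p = a * exp(-lam)
p = occupancy(pts, m=np.zeros(3), R=np.eye(3), s=np.array([2.0, 1.0, 1.0]),
              eps1=1.0, eps2=1.0, a=0.5, lam=1.0)
```

Lowering `eps1` and `eps2` toward zero makes the level sets progressively more box-like, which is the shape flexibility the primitive is chosen for.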

2. Splatting: From Superquadrics to Voxel Grids

The core objective of splatting is to estimate for each voxel center $\mathbf{v}$ the aggregated occupancy $p_{\text{occ}}(\mathbf{v})$ arising from all $N$ scene superquadrics. Employing the assumption of primitive independence, the probabilistic union ("union-of-experts") formulation is used: $p_{\text{occ}}(\mathbf{v}) = 1 - \prod_{i=1}^N \bigl[1 - p_i(\mathbf{v})\bigr].$ For semantic voxel labeling, each superquadric carries a class logit vector $\mathbf{c}_i\in\mathbb{R}^C$. The per-voxel semantic prediction is a mixture weighted by local primitive influence: $p_{\mathbf{c}}(\mathbf{v}) = \frac{\sum_{i=1}^N p_i(\mathbf{v}) a_i \mathbf{c}_i}{\sum_{i=1}^N p_i(\mathbf{v}) a_i}.$ These equations are evaluated at all voxel centers in the 3D grid, producing dense occupancy and semantic maps.
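The two fusion rules can be sketched in a few lines of NumPy. This follows the expressions above literally, including the weights $p_i(\mathbf{v})\,a_i$ in the semantic mixture; the function name `fuse`, the array layout, and the small stabilizer `tiny` in the denominator are assumptions for illustration.

```python
import numpy as np

def fuse(p, a, c, tiny=1e-8):
    """Fuse per-primitive fields into per-voxel occupancy and semantics.

    p: (N, V) occupancies p_i(v); a: (N,) opacities; c: (N, C) class vectors.
    """
    p_occ = 1.0 - np.prod(1.0 - p, axis=0)             # probabilistic union, (V,)
    w = p * a[:, None]                                 # weights p_i(v) * a_i, (N, V)
    sem = (w.T @ c) / (w.sum(axis=0)[:, None] + tiny)  # weighted mixture, (V, C)
    return p_occ, sem

# Two primitives, two voxels, two classes (made-up numbers).
p = np.array([[0.5, 0.0],
              [0.5, 0.2]])
a = np.array([1.0, 1.0])
c = np.array([[1.0, 0.0],
              [0.0, 1.0]])
p_occ, sem = fuse(p, a, c)
```

In the first voxel both primitives contribute equally, so the union gives 0.75 and the semantics split evenly; in the second voxel only the second primitive has support, so its class dominates.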

3. Computational Pipeline and GPU-Efficient Splatting

A naïve approach, evaluating all $N$ superquadrics at all $V$ voxels, incurs $O(NV)$ computational cost. SuperOcc (Yu et al., 22 Jan 2026) introduces an optimization by spatially binning superquadrics into cubic tiles (e.g., $4^3$ voxels per tile). Each tile maintains a list of contributing superquadrics, determined by their spatial support (bounding boxes scaled by a fixed radius factor).

The splatting process consists of:

Tile-level binning: Assign each superquadric to overlapping tile bins in parallel.
Per-tile CUDA kernel: Load relevant superquadric parameters into on-chip shared memory for the tile.
Per-voxel evaluation: Each thread computes $p_i(\mathbf{v})$ for all local superquadrics and fuses results using the expressions above.

This approach reduces memory bandwidth requirements and ensures that voxels are only influenced by nearby primitives, substantially improving both speed and scalability.
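The tile-level binning step can be illustrated on the CPU with a short sketch. This is not the SuperOcc CUDA kernel: the function name `bin_primitives`, the choice of a rotation-independent cubic bound of half-width `radius_factor * max(s)`, and the default `radius_factor=3.0` are all assumptions made to keep the example small.

```python
import numpy as np
from collections import defaultdict

def bin_primitives(centers, scales, grid_min, voxel_size, grid_shape,
                   tile=4, radius_factor=3.0):
    """Map each tile index (tx, ty, tz) to ids of primitives whose bound overlaps it."""
    tile_size = tile * voxel_size
    n_tiles = np.ceil(np.asarray(grid_shape) / tile).astype(int)
    bins = defaultdict(list)
    for i, (m, s) in enumerate(zip(centers, scales)):
        r = radius_factor * np.max(s)  # conservative, rotation-independent bound
        lo = np.floor((m - r - grid_min) / tile_size).astype(int)
        hi = np.floor((m + r - grid_min) / tile_size).astype(int)
        lo = np.maximum(lo, 0)
        hi = np.minimum(hi, n_tiles - 1)
        if np.any(lo > hi):
            continue  # bound lies entirely outside the grid
        for tx in range(lo[0], hi[0] + 1):
            for ty in range(lo[1], hi[1] + 1):
                for tz in range(lo[2], hi[2] + 1):
                    bins[(tx, ty, tz)].append(i)
    return bins

# An 8x8x8 grid of unit voxels gives 2x2x2 tiles of 4^3 voxels each.
centers = np.array([[1.0, 1.0, 1.0], [4.0, 4.0, 4.0]])
scales = np.array([[0.2, 0.2, 0.2], [0.5, 0.5, 0.5]])
bins = bin_primitives(centers, scales, grid_min=np.zeros(3), voxel_size=1.0,
                      grid_shape=(8, 8, 8))
```

Here the small primitive lands in a single tile while the one at the grid center touches all eight, so a voxel in tile `(1, 1, 1)` would only evaluate one primitive instead of two. In a GPU implementation each tile's list would be what gets staged in shared memory.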

Metric	Effect (SuperOcc, Tab. 7)	Change
Training time	Reduced by efficient splatting pipeline	$82\,\text{h}\to20\,\text{h}$ ($-76\%$)
Training memory	Lower by improved data reuse	$21.4\,\text{GB}\to17.3\,\text{GB}$ ($-19\%$)
Splatting latency	Significantly reduced at inference	$6.2\,\text{ms}\to1.3\,\text{ms}$ ($-79\%$)
Inference FPS	Increased throughput	$25.2\to30.3$ ($+20\%$)

This enables superquadric-to-voxel splatting that is suitable for real-time, high-resolution applications.

4. Differentiability and Backpropagation

The per-primitive occupancy evaluation is analytic and differentiable, and so is the fusion step that follows it. For the union-of-experts fusion, gradients propagate using: $\frac{\partial p_{\text{occ}}(\mathbf{v})}{\partial p_i(\mathbf{v})} = \prod_{j\ne i} \bigl[1 - p_j(\mathbf{v})\bigr].$ Chain-rule derivatives with respect to the parameters (e.g., $\mathbf{s}$, $\mathbf{m}$, $\mathbf{R}$, $\epsilon_1$, $\epsilon_2$) are computed for each superquadric. Tile-level binning is implemented using a detached bound and does not receive gradients, but this does not affect the learning of the superquadric parameters themselves.
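The leave-one-out product form of the union gradient can be checked numerically. The sketch below compares it against central finite differences for a single voxel; the helper names `union` and `union_grad` and the sample values are assumptions for illustration.

```python
import numpy as np

def union(p):
    """Probabilistic union 1 - prod_i (1 - p_i) for one voxel."""
    return 1.0 - np.prod(1.0 - p)

def union_grad(p):
    """Gradient of the union w.r.t. each p_i: product of (1 - p_j) over j != i."""
    q = 1.0 - p
    return np.array([np.prod(np.delete(q, i)) for i in range(len(p))])

p = np.array([0.2, 0.5, 0.9])
analytic = union_grad(p)

# Central finite differences along each coordinate direction.
h = 1e-6
numeric = np.array([(union(p + h * e) - union(p - h * e)) / (2 * h)
                    for e in np.eye(len(p))])
```

The gradient for a primitive shrinks as other primitives already explain the voxel: the third entry is the largest here because the other two occupancies are the smallest.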

All occupancy and semantic predictions are supervised at the voxel level via cross-entropy and Lovász-Softmax losses (Zuo et al., 12 Jun 2025), ensuring that primitive placement and shape are learned end-to-end with respect to ground-truth 3D occupancy and semantics.

5. Pruning, Splitting, and Adaptive Scene Coverage

To optimize the representational budget of primitives and computational efficiency, both QuadricFormer (Zuo et al., 12 Jun 2025) and SuperOcc (Yu et al., 22 Jan 2026) apply adaptive schemes:

Pruning: Primitives with vanishing scales ($\|\mathbf{s}_i\|\to 0$), typically located in unoccupied space, are removed after training epochs.
Splitting: Overly large primitives that cover broad, complex structures are split into multiple children with offset positions and reinitialized shapes. The total number of active primitives $N$ remains fixed, but their allocation is increasingly concentrated in locally occupied or geometrically complex regions.

This strategy focuses model capacity and compute on salient scene regions, improving both accuracy and runtime performance.
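A fixed-budget prune-and-split pass might look like the following sketch. The threshold `prune_thresh`, the rule of splitting the largest survivors, the random child offsets, and the halving of parent and child scales are all assumptions chosen for illustration; the papers' actual criteria and reinitialization may differ.

```python
import numpy as np

def prune_and_split(centers, scales, prune_thresh=0.05, rng=None):
    """Drop near-zero-scale primitives and refill the budget by splitting the largest."""
    rng = np.random.default_rng(0) if rng is None else rng
    size = np.linalg.norm(scales, axis=1)
    keep = size > prune_thresh
    n_free = int(np.count_nonzero(~keep))
    # Boolean indexing copies, so the caller's arrays are left untouched.
    centers, scales, size = centers[keep], scales[keep], size[keep]
    if n_free == 0 or len(centers) == 0:
        return centers, scales
    # Largest survivors become parents; cycle if there are more slots than survivors.
    order = np.argsort(-size)
    parents = order[np.arange(n_free) % len(order)]
    offsets = rng.uniform(-1.0, 1.0, size=(n_free, 3)) * scales[parents]
    child_centers = centers[parents] + offsets
    child_scales = 0.5 * scales[parents]
    scales[parents] *= 0.5  # shrink each parent once (assumed reinitialization)
    return np.vstack([centers, child_centers]), np.vstack([scales, child_scales])

centers = np.zeros((4, 3))
scales = np.array([[1.0, 1.0, 1.0],
                   [0.01, 0.01, 0.01],  # vanishing scale: pruned
                   [3.0, 3.0, 3.0],     # largest: split
                   [0.5, 0.5, 0.5]])
new_centers, new_scales = prune_and_split(centers, scales)
```

The primitive count stays at four: the slot freed by pruning is reused for a child of the largest primitive, which moves capacity from empty space toward a region that a single primitive covered too coarsely.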

6. Comparison with Dense and Gaussian-Based Approaches

Traditional dense voxel occupancy models, while simple, are computationally inefficient due to the redundancy inherent in dense grids for spatially sparse real-world scenes. Gaussian mixture models, previously adopted for sparse object-centric occupancy, are limited by their ellipsoidal prior, often requiring many overlapping Gaussians to model non-ellipsoidal structures, leading to inefficiency.

Superquadrics provide a far richer geometric prior, enabling efficient modeling of cuboids, cylinders, and in-between forms with fewer primitives (Zuo et al., 12 Jun 2025). The closed-form occupancy functions and splatting scheme result in superior performance and speed, as supported by empirical results on nuScenes, SurroundOcc, and Occ3D benchmarks (Zuo et al., 12 Jun 2025, Yu et al., 22 Jan 2026).

A plausible implication is that the adoption of superquadric-based splatting fundamentally alters the efficiency/expressiveness trade-off in 3D dense scene modeling.

7. Future Directions and Impact

The superquadric-to-voxel splatting paradigm has demonstrated strong results in large-scale, high-throughput semantic occupancy prediction for autonomous perception. The GPU-aligned pipeline and analytic differentiability make it attractive for real-time systems and scalable learning. Key avenues for future research include tighter temporal integration for dynamic scenes (Yu et al., 22 Jan 2026), hierarchical multi-scale superquadric sets, and additional acceleration for ultra-dense voxel resolutions. The impact is especially pronounced in perception pipelines for autonomous vehicles, where real-time performance and compactness are mission critical.

References

Zuo et al. (12 Jun 2025). QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction.

Yu et al. (22 Jan 2026). SuperOcc: Toward Cohesive Temporal Modeling for Superquadric-based Occupancy Prediction.
