
Slice Pooling Layer in 3D Point Cloud Segmentation

Updated 28 January 2026
  • The slice pooling layer is a neural network module that converts unordered point cloud features into spatially ordered slice-level representations for sequence modeling.
  • It uses max pooling within slices to robustly aggregate point features, allowing bidirectional RNNs to capture local dependencies effectively.
  • Empirical results on the S3DIS dataset indicate that optimal performance is achieved with a slice thickness of 2 cm, demonstrating its practical impact on segmentation accuracy.

A slice pooling layer is a neural network module developed for processing unordered and unstructured point clouds, enabling the imposition of the partial order required by sequence-based models such as Recurrent Neural Networks (RNNs). It serves as the entry point of the local dependency module in the RSNet framework, facilitating efficient and accurate 3D segmentation by transforming point-wise features into spatially ordered slice-level representations (Huang et al., 2018).

1. Purpose and Role in 3D Point Cloud Segmentation

The primary function of the slice pooling layer is to project unordered point-wise features from the input point cloud into an ordered sequence of slice-level features. This process enables the application of conventional sequence models, specifically bidirectional RNNs, for capturing local dependencies across spatial neighborhoods. In the RSNet pipeline, the slice pooling layer constitutes the first stage of a three-step local dependency module, followed by RNN layers for context exchange among adjacent slices and a slice unpooling operation, which maps slice-level output features back to the original points for further network processing (Huang et al., 2018).

2. Mathematical Formulation and Pipeline

Given a point cloud

P = \{ (x_i, f_i) \}_{i=1}^n,

where x_i \in \mathbb{R}^3 are the 3D coordinates and f_i \in \mathbb{R}^{d^{in}} are features, slice pooling is performed as follows:

  • Fix an axis (e.g., z) and a slice thickness r.
  • Compute the bounds:
    • z_{min} = \min_i x_{i,z},
    • z_{max} = \max_i x_{i,z},
    • N = \lceil (z_{max} - z_{min}) / r \rceil.
  • Assign each point to a slice:
    • k_i = \lfloor (x_{i,z} - z_{min}) / r \rfloor, with 0 \leq k_i < N.
    • Define slice sets: S_k = \{ i \mid k_i = k \} for k = 0, \dots, N-1.
  • Aggregate point features for each slice by max-pooling:
    • f_k^s = \max_{i \in S_k} f_i.
  • Produce the ordered output:
    • F^s = [f_0^s, f_1^s, \dots, f_{N-1}^s] \in \mathbb{R}^{N \times d^{in}}.

This transformation restructures the input into a sequence of length N, suitable for bidirectional RNN processing (Huang et al., 2018).
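As a concrete check of the assignment rule above, consider a small worked example (the values are invented for illustration, not taken from the paper):

```python
import math

# Toy example: five points' z-coordinates (in metres) with a 2 cm slice thickness.
z = [0.000, 0.011, 0.025, 0.034, 0.049]
r = 0.02

z_min, z_max = min(z), max(z)
N = math.ceil((z_max - z_min) / r)                    # number of slices
k = [min(int((zi - z_min) // r), N - 1) for zi in z]  # slice index per point

print(N)  # 3
print(k)  # [0, 0, 1, 1, 2]
```

Three slices of thickness 2 cm cover the 4.9 cm extent, and each point lands in the slice containing its z-coordinate.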

3. Pooling Operation, Tensor Shapes, and Efficiency

Within each slice, an element-wise max-pooling operator is applied over the d^{in} feature channels, yielding a slice feature vector that summarizes its local context. The number of points per slice varies with the spatial distribution, but the pooling introduces no learnable parameters: no explicit normalization or per-slice weighting is used, and pure max-pooling is chosen for its robustness to varying point counts.
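Because the element-wise max is permutation-invariant and insensitive to point count, slices with different numbers of points can still yield comparable summaries; a toy illustration (all values invented):

```python
import numpy as np

# Two slices with different point counts but identical channel-wise maxima.
slice_a = np.array([[0.1, 0.9],
                    [0.7, 0.2]])              # 2 points, 2 feature channels
slice_b = np.array([[0.1, 0.9],
                    [0.7, 0.2],
                    [0.3, 0.5]])              # 3 points: one extra interior point

print(slice_a.max(axis=0))  # [0.7 0.9]
print(slice_b.max(axis=0))  # [0.7 0.9] -- the extra point does not change the slice feature
```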

The tensors at each stage adopt the following shapes:

  • Input: F^{in} \in \mathbb{R}^{B \times n \times d^{in}} (batch size B, n points, d^{in} features).
  • After slice pooling: F^s \in \mathbb{R}^{B \times N \times d^{in}}.
  • After bidirectional RNN: F^r \in \mathbb{R}^{B \times N \times d^r}.
  • After slice unpooling: F^{su} \in \mathbb{R}^{B \times n \times d^r}.

The pooling and unpooling are performed in O(Bnd) time, independent of the slice resolution parameters (i.e., O(1) with respect to r), and require no neighbor search or hierarchical data structure construction (Huang et al., 2018).
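This shape flow can be sketched in a few lines of NumPy; here the bidirectional RNN is stubbed out with a random linear map, `np.maximum.at` performs the scatter-max, and all sizes are toy values of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
B, n, d_in, N, d_r = 2, 1024, 64, 50, 128   # toy sizes

F_in = rng.random((B, n, d_in))             # (B, n, d_in) point features
K = rng.integers(0, N, size=(B, n))         # slice index of each point

# Slice pooling: scatter-max points into their slices -> (B, N, d_in).
F_s = np.full((B, N, d_in), -np.inf)
np.maximum.at(F_s, (np.arange(B)[:, None], K), F_in)

# Stand-in for the bidirectional RNN: (B, N, d_in) -> (B, N, d_r).
F_r = F_s @ rng.random((d_in, d_r))

# Slice unpooling: each point reads back its slice's feature -> (B, n, d_r).
F_su = F_r[np.arange(B)[:, None], K]

print(F_s.shape, F_r.shape, F_su.shape)  # (2, 50, 64) (2, 50, 128) (2, 1024, 128)
```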

4. Hyperparameterization and Empirical Observations

The slice pooling layer introduces slice thickness parameters (r_x, r_y, r_z), which control the granularity of local context aggregation. Adjusting r balances the trade-off between fine local detail (smaller r, longer sequences) and computational tractability (larger r, coarser context). Empirical evaluation on the S3DIS dataset demonstrates that r_z = 2 cm achieves the best results, with a mean Intersection over Union (mIoU) of 51.93 and mean accuracy (mAcc) of 59.42. Setting r_z to 1 cm or 5 cm yields lower or similar performance, indicating sensitivity to this hyperparameter. A block size of 1 m × 1 m with a non-overlapping stride is typically used to keep sequence lengths moderate (approximately 50 slices in the x or y direction), which ensures RNN effectiveness while maintaining context (Huang et al., 2018).
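The reported block size directly determines the sequence length the RNN must process; working in centimetres keeps the arithmetic exact:

```python
import math

block_cm = 100  # 1 m x 1 m blocks
for r_cm in (1, 2, 5):  # candidate slice thicknesses (cm)
    N = math.ceil(block_cm / r_cm)
    print(f"r = {r_cm} cm -> N = {N} slices per block")
# r = 1 cm -> N = 100 slices per block
# r = 2 cm -> N = 50 slices per block
# r = 5 cm -> N = 20 slices per block
```

The best-performing setting of 2 cm thus corresponds to the roughly 50-slice sequences mentioned above.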

5. Pseudocode and Computational Flow

The following pseudocode outlines the core operations of the slice pooling and unpooling procedures using standard tensor operations:

import numpy as np

def slice_pool(F, Z, r):
    """Max-pool point features F (B, n, d) into slice features along coordinates Z (B, n)."""
    B, n, d = F.shape
    z_min = Z.min()
    N = int(np.ceil((Z.max() - z_min) / r))
    K = np.floor((Z - z_min) / r).astype(int)  # slice index of each point
    K = np.clip(K, 0, N - 1)                   # points at the upper bound join the last slice
    F_s = np.full((B, N, d), -np.inf)          # initialization for max pooling
    for b in range(B):
        for i in range(n):
            F_s[b, K[b, i]] = np.maximum(F_s[b, K[b, i]], F[b, i])
    return F_s, K                              # K is needed later for unpooling

def slice_unpool(F_r, K):
    """Copy each slice feature in F_r (B, N, d_r) back to its member points via K (B, n)."""
    B, n = K.shape
    F_out = np.zeros((B, n, F_r.shape[2]))
    for b in range(B):
        for i in range(n):
            F_out[b, i] = F_r[b, K[b, i]]
    return F_out

The slice pooling layer does not require additional learnable parameters and incurs minimal memory overhead, needing only the storage of the slice indices K and a compact slice tensor per batch sample (Huang et al., 2018).

6. Context, Significance, and Distinctiveness

The slice pooling layer addresses the challenge of applying RNNs to point clouds, which are inherently unordered and unstructured, by imposing an explicit spatial ordering along a selected axis. It differs from methods that involve neighbor search or explicit clustering, offering computational advantages by avoiding costly tree builds and enabling linear-time processing in the number of points. Once the point set is organized into slices, RNNs model dependencies between adjacent slices, capturing local context lost by strictly point-wise models. The output is then restored to the per-point domain via slice unpooling, augmenting each point’s features with local spatial context. Empirical validation demonstrates that this lightweight mechanism significantly improves segmentation accuracy on standard 3D benchmarks, outperforming previous state-of-the-art approaches (Huang et al., 2018).

References (1)

Huang, Q., Wang, W., & Neumann, U. (2018). Recurrent Slice Networks for 3D Segmentation of Point Clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
