
Grouped Selective State Space Model (GS6)

Updated 12 November 2025
  • GS6 is a parameter-sharing enhancement that groups S6 state-space parameters to improve efficiency and reduce overfitting in point cloud analysis.
  • It partitions channels into groups to tie neighboring dynamics, which streamlines model complexity without sacrificing expressiveness.
  • Empirical studies on benchmarks like ModelNet40 and ScanObjectNN confirm that GS6 delivers higher accuracy with a lower parameter count.

The grouped selective state space model (GS6) is a parameter-sharing modification of the selective state space model (S6) within Mamba-type architectures, designed to mitigate overfitting and improve generalization for point cloud analysis. First introduced within the CloudMamba network, GS6 partitions the per-channel state-space parameters of S6 into groups, tying the state-space dynamics of neighboring channels while reducing the overall free parameter count. Empirical results on ModelNet40, ScanObjectNN, ShapeNet, and S3DIS demonstrate that GS6 strengthens modeling accuracy, regularizes learning, and curtails parameter redundancy, all without compromising runtime efficiency (Qu et al., 11 Nov 2025).

1. Mathematical Formulation: GS6 Versus S6

The vanilla S6 architecture applies distinct, input-dependent state-space parameters to each feature dimension $d$:

\begin{align*}
h_t &= \bar{A}_t h_{t-1} + \bar{B}_t x_t \\
y_t &= C_t h_t \\
\Delta_t(d) &= \mathrm{softplus}(W^{\Delta,(d)} x_t(d)) \\
\bar{A}_t(d) &= \exp[\Delta_t(d) A^{(d)}] \\
\bar{B}_t(d) &= (\Delta_t(d) A^{(d)})^{-1}(\exp[\Delta_t(d) A^{(d)}] - I) B^{(d)} \\
C_t(d) &= W^{C,(d)} x_t(d)
\end{align*}

Here, each channel $d$ is endowed with its own state matrix $A^{(d)}$, input vector $B^{(d)}$, and projection weights $W^{\Delta,(d)}$, $W^{C,(d)}$. For large $D$, the parameter count grows steeply ($O(DN^2)$ for the state matrices alone), precipitating overfitting.
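To make the per-channel parameterization concrete, the following NumPy sketch performs a single S6 recurrence step under a diagonal-$A$ simplification. All shapes and names here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def s6_step(x_t, h_prev, A, B, W_delta, W_C):
    """One S6 recurrence step with per-channel (diagonal) dynamics.

    Hypothetical shapes:
      x_t: (D,)        input at time t
      h_prev: (D, N)   hidden state per channel
      A: (D, N)        per-channel state parameters (negative for stability)
      B: (D, N)        per-channel input vectors
      W_delta: (D,)    per-channel step-size projection weights
      W_C: (D, N)      per-channel output projection weights
    """
    delta = softplus(W_delta * x_t)                  # Delta_t(d), shape (D,)
    dA = delta[:, None] * A                          # Delta_t(d) * A^(d), (D, N)
    A_bar = np.exp(dA)                               # zero-order-hold discretization
    B_bar = (np.exp(dA) - 1.0) / dA * B              # (dA)^{-1}(exp(dA)-I) B, diagonal case
    h_t = A_bar * h_prev + B_bar * x_t[:, None]      # per-channel state update
    C_t = W_C * x_t[:, None]                         # input-dependent C_t(d)
    y_t = np.einsum('dn,dn->d', C_t, h_t)            # y_t(d) = C_t(d) . h_t(d)
    return h_t, y_t

rng = np.random.default_rng(0)
D, N = 4, 3
h, y = s6_step(rng.standard_normal(D),
               np.zeros((D, N)),
               -np.abs(rng.standard_normal((D, N))),
               rng.standard_normal((D, N)),
               rng.standard_normal(D),
               rng.standard_normal((D, N)))
print(h.shape, y.shape)
```

Note that every parameter tensor carries a leading $D$ dimension; this is exactly the per-channel redundancy that GS6 targets.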

GS6 remediates this by partitioning the $D$ channels into $G = D/g$ groups of size $g$. All channels in group $k$ share a single set $\{A_k, B_k, W^{\Delta}_k\}$:

$$G(d) = \lceil d/g \rceil, \quad \forall d \in \{1, \ldots, D\}$$

$$\Delta_t(\text{group}) = \mathrm{softplus}\left(W^{\Delta}_k x_{t,\text{group}}\right)$$

$$\bar{A}_t(\text{group}) = \exp\left[\Delta_t(\text{group}) A_k\right]$$

$$\bar{B}_t(\text{group}) = (\Delta_t(\text{group}) A_k)^{-1}\left(\exp[\Delta_t(\text{group}) A_k] - I\right) B_k$$

The outputs of each group are then repeated $g$ times to cover all channels in the group, ensuring every channel in group $k$ uses the same state-space dynamics.

In procedural terms (see Algorithm 1 in (Qu et al., 11 Nov 2025)):

Input: x ∈ ℝ^{B×L×D}, grouping rate g
Let N be the hidden state dimension.

B_proj = Linear_N(x)                                    # ∈ ℝ^{B×L×N}
C_proj = Linear_N(x)                                    # ∈ ℝ^{B×L×N}
Δ_grouped = softplus(Linear_{D/g}(x) + Param_Δ)         # ∈ ℝ^{B×L×(D/g)}
Ā_grouped = exp(Δ_grouped ⊗ Param_A)                    # ∈ ℝ^{B×L×(D/g)×N}
B̄_grouped = Δ_grouped ⊗ Param_B                        # ∈ ℝ^{B×L×(D/g)×N}
Ā, B̄ = Repeat(Ā_grouped, g), Repeat(B̄_grouped, g)     # both ∈ ℝ^{B×L×D×N}
y = SSM(Ā, B̄, C_proj)(x)

For $g=1$, the scheme degenerates to standard S6.
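The grouped computation and the channel-wise Repeat can be sketched in NumPy as follows. The exact projection shapes are assumptions inferred from the pseudocode above, and the batch dimension is omitted:

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def gs6_discretize(x, A_k, B_k, W_delta, g):
    """Grouped discretization sketch following the GS6 pseudocode.

    Hypothetical shapes:
      x: (L, D)            input sequence
      A_k, B_k: (D/g, N)   one shared parameter set per group
      W_delta: (D/g, D)    the Linear_{D/g} projection for the step size
    Returns A_bar, B_bar repeated back out to all D channels.
    """
    delta = softplus(x @ W_delta.T)              # Delta_grouped, (L, D/g)
    dA = delta[..., None] * A_k                  # Delta x A_k, (L, D/g, N)
    A_bar_g = np.exp(dA)                         # zero-order-hold A_bar per group
    B_bar_g = (np.exp(dA) - 1.0) / dA * B_k      # discretized B_bar per group
    A_bar = np.repeat(A_bar_g, g, axis=1)        # (L, D, N): g channels per group
    B_bar = np.repeat(B_bar_g, g, axis=1)        # (L, D, N)
    return A_bar, B_bar

rng = np.random.default_rng(1)
L, D, N, g = 5, 6, 4, 3
A_bar, B_bar = gs6_discretize(
    rng.standard_normal((L, D)),
    -np.abs(rng.standard_normal((D // g, N))),   # negative A for stable dynamics
    rng.standard_normal((D // g, N)),
    rng.standard_normal((D // g, D)),
    g,
)
# Channels within the same group share identical dynamics:
print(A_bar.shape, np.allclose(A_bar[:, 0], A_bar[:, 2]))
```

Setting `g = 1` makes `np.repeat` a no-op, recovering the per-channel S6 shapes.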

2. Integration of GS6 in CloudMamba Architecture

In CloudMamba, GS6 is embedded within each hexa-orientation Mamba block, serving as the principal feature aggregation mechanism in both encoder and decoder hierarchies. Each block executes:

  • Sequence Expanding: Point features $T$ are sorted along the $X$, $Y$, $Z$ axes, each sequence prepended with an axis-specific prompt and positional embedding, yielding three causal streams $T_x$, $T_y$, $T_z$.
  • ChainedMamba Processing: Each sequence operates in a bidirectional Mamba configuration, with the forward pass outputs chained into the backward pass for higher-order geometric insight. The GS6 module provides the recurrent and input-to-state parameterization within these scans.
  • Sequence Merging: Axis prompts are removed and outputs are restored to their original point order, concatenated channel-wise, and projected back to latent dimension $H$ via an MLP $\gamma$.

This tri-axial, GS6-integrated pipeline preserves point cloud permutation invariance, fosters causal modeling, and hierarchically consolidates learned features. All state-space evolution within Mamba blocks is governed by GS6, replacing vanilla S6 entirely.
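The sequence expanding and merging around the GS6 scans can be illustrated with a small NumPy sketch. The scan itself is elided (an identity placeholder stands in for the ChainedMamba/GS6 pass), and all names are illustrative:

```python
import numpy as np

def axis_serialize(points, feats, axis):
    """Sort point features along one coordinate axis; also return the
    permutation that restores the original point order.
    Hypothetical shapes: points (P, 3), feats (P, H)."""
    order = np.argsort(points[:, axis], kind='stable')
    inverse = np.empty_like(order)
    inverse[order] = np.arange(len(order))     # inverse permutation
    return feats[order], inverse

rng = np.random.default_rng(2)
P, H = 8, 4
points = rng.standard_normal((P, 3))
feats = rng.standard_normal((P, H))

merged = []
for axis in range(3):                          # X, Y, Z causal streams
    seq, inv = axis_serialize(points, feats, axis)
    # ... the bidirectional GS6/Mamba scan over `seq` would go here ...
    merged.append(seq[inv])                    # restore original point order
out = np.concatenate(merged, axis=-1)          # channel-wise concat of 3 streams
print(out.shape)
```

Because each stream is un-sorted back to the original order before merging, the block's output does not depend on the input point ordering, which is how permutation invariance is preserved.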

3. Theoretical Motivation for Grouped Parameterization

In vanilla S6, learning separate state-space parameters for each output channel leads to prohibitively high parameterization: $O(DN^2)$ for state matrices alone, where $D$ is often large after MLP expansions. On smaller point cloud tasks, this overparameterization empirically results in overfitting, as spurious per-channel variations are learned that do not correspond to underlying data structure.

Grouping, as in GS6, enforces shared dynamics across contiguous feature channels, reducing parameter redundancy while acting as a structural regularizer. Unlike total parameter tying (i.e., sharing across all channels), GS6 maintains modeling flexibility by leveraging input-dependent $\Delta_t(\text{group})$ terms and by controlling group size $g$. Empirical results identify $g=3$ as an optimal compromise, maximizing accuracy while minimizing overfitting.

A plausible implication is that GS6 achieves a form of inductive bias beneficial for unordered, permutation-invariant representations such as those found in point clouds, by biasing the model away from learning channel-specific effects that are not data-grounded.

4. Parameter Efficiency and Computational Complexity

GS6 maintains the linear time and memory complexity of S6 and Mamba, $O(LDN)$, where $L$ is the sequence length, $D$ the feature count, and $N$ the internal state size. The grouping structure adds only negligible overhead: shallow linear projections for grouped parameter computation and a channel-wise repeat. The FlashAttention-style parallel scan for efficient SSM propagation is retained without modification.

Parameter savings are substantial owing to the grouping. In the CloudMamba backbone, with $D \approx 256$ and $g=3$, the model reduces the number of per-channel state-space parameter sets from 256 to $256/3 \approx 85$, yielding a total parameter count of 9.95 million versus 10.35 million for ungrouped S6 (a reduction of approximately 400k). Notably, this reduction occurs at constant FLOPs: 1.150 GFLOPs in both the grouped and ungrouped settings.
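The parameter-set reduction quoted above is simple arithmetic:

```python
# Arithmetic check of the grouped parameter-set reduction described in the text.
D, g = 256, 3
sets_ungrouped = D            # one {A, B, W} set per channel in vanilla S6
sets_grouped = D // g         # one shared set per group of g channels in GS6
print(sets_ungrouped, sets_grouped)  # 256 85
```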

| Group Size ($g$) | Parameter Count (M) | ModelNet40 OA (%) |
|---|---|---|
| 1 | 10.35 | 93.11 |
| 3 | 9.95 | 93.65 |

Peak empirical performance is realized at $g=3$; larger groups ($g=6, 9$) begin underfitting.

5. Empirical Evaluation

Ablation studies conducted in (Qu et al., 11 Nov 2025) provide quantifiable evidence for GS6's efficacy. On ModelNet40:

  • Without grouping: 10.35M params, 93.11% accuracy.
  • With GS6 ($g=3$): 9.95M params, 93.65% accuracy.

This constitutes a 0.54% absolute increase in overall accuracy (OA) alongside a parameter reduction.

A detailed sweep over group sizes yields:

| $g$ | OA (%) |
|---|---|
| 1 | 93.11 |
| 2 | 93.35 |
| 3 | 93.65 |
| 6 | 92.75 |
| 9 | 92.14 |

Across diverse benchmarks, CloudMamba with GS6 consistently matches or surpasses state-of-the-art SSM-based point cloud models in accuracy and computational efficiency:

  • ModelNet40: 93.7% OA (matching Point Transformer) at only 1.15 GFLOPs and 9.95M params.
  • ScanObjectNN: 88.3% OA, outperforming prior SSM-based models.
  • ShapeNet part segmentation: 86.6 mIoU at 1.234 GFLOPs.
  • S3DIS scene segmentation: 73.6 mIoU, surpassing Point Transformer V3 at lower FLOPs.

The empirical findings affirm that GS6’s grouped sharing can provide both regularization and expressiveness beyond standard S6 parameterization for point cloud tasks.

6. Summary and Significance

The grouped selective state space model (GS6) constitutes a targeted regularization and efficiency enhancement for SSMs in point cloud learning. By orchestrating parameter sharing among subsets of feature channels, GS6:

  • Reduces the SSM parameter budget by a factor of roughly $1/g$,
  • Regularizes against overfitting by constraining per-channel redundancy,
  • Retains linear runtime complexity ($O(LDN)$),
  • Yields superior or equivalent downstream task performance compared to standard S6.

Within the CloudMamba framework, GS6 is central to achieving state-of-the-art point cloud analysis across multiple datasets with a minimal parameter and computational footprint. These results indicate that localized parameter sharing may have broader applicability for other state space and sequence modeling domains where overparameterization is a concern and permutation invariance is desirable.
