RWKV Seed Generator for Scene Completion

Updated 15 November 2025
  • RWKV-SG is a specialized module that transforms partial 3D point clouds into coarse, feature-rich outputs using a linear RWKV-based mechanism.
  • It employs a modular architecture—with PointNet encoding, PRWKV stacks, and RWKV-ATTN—to effectively fuse local and global features while enhancing computational efficiency.
  • Empirical evaluations demonstrate that RWKV-SG improves completion accuracy by about 25% in Chamfer Distance and significantly reduces model size compared to conventional baselines.

The RWKV Seed Generator (RWKV-SG) constitutes a specialized module for generating coarse, feature-rich point clouds from partial input data within the context of point cloud semantic scene completion. It is central to the architecture of RWKV-PCSSC, leveraging the Receptance Weighted Key Value (RWKV) mechanism to improve parameter and memory efficiency while delivering competitive or superior accuracy. RWKV-SG operates exclusively on geometry, eschewing auxiliary modalities such as color or normal vectors.

1. Architectural Overview

RWKV-SG transforms an input partial point cloud $P_\text{in} \in \mathbb{R}^{N\times3}$ into a coarse, completed point set with associated semantic logits and features. The module is built from sub-blocks, each processing tensors of a defined shape:

| Stage | Input Shape/Type | Output Shape/Type |
| --- | --- | --- |
| PointNet Encoding | $P_\text{in}\in\mathbb{R}^{N\times3}$ | $F_0\in\mathbb{R}^{N\times C_f}$ |
| PRWKV Stack (4 layers) | $F_0\in\mathbb{R}^{N\times C_f}$ | $F_\text{in}\in\mathbb{R}^{N\times C_f}$ |
| Global Feature (SA) | $F_\text{in}\in\mathbb{R}^{N\times C_f}$ | $f\in\mathbb{R}^{C_g}$ |
| Query Generation | $[P_\text{in}\Vert f]\in\mathbb{R}^{N\times(3+C_g)}$ | $q_\text{in}\in\mathbb{R}^{N\times C_q}$ |
| RWKV-ATTN | $P_\text{in},\ q_\text{in},\ k_\text{in}$ | $H_\text{miss}\in\mathbb{R}^{N\times C_h}$ |
| Deconvolution | $H_\text{miss}$ | $F_\text{miss}\in\mathbb{R}^{N\times C_f}$ |
| Rebuild Head | $F_\text{miss}$ | $\Delta P\in\mathbb{R}^{N\times3}$ |
| Coarse Point Sampling | $[P_\text{in};P_\text{miss}],\ [F_0;F_\text{miss}]$ | $P_\text{coarse}\in\mathbb{R}^{K\times3},\ F_\text{coarse}\in\mathbb{R}^{K\times C_f}$ |
| Segment Head | $F_\text{coarse}$ | $L_\text{coarse}\in\mathbb{R}^{K\times C}$ |

Following the preliminary feature extraction $F_0$ via a PointNet-style encoder, a four-layer PointRWKV (PRWKV) stack abstracts context from local and global neighborhoods, producing $F_\text{in}$. Global context $f$ is pooled from $F_\text{in}$ and broadcast to each point to form queries $q_\text{in}$ using an MLP. RWKV-ATTN fuses queries, keys, and spatial neighborhoods to estimate missing-region features $H_\text{miss}$, which are deconvolved and reparameterized as position offsets $\Delta P$. Sampled coarse points $P_\text{coarse}$ and their features $F_\text{coarse}$ form the output, fed to a semantic segmentation head for coarse per-point class logits $L_\text{coarse}$.
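The query-generation input $[P_\text{in}\Vert f]$ can be illustrated with a minimal NumPy sketch. It assumes plain max pooling as a stand-in for the Set Abstraction pooling and sets $C_g = C_f$; the function name `build_query_input` is hypothetical, and the MLP that maps this tensor to $q_\text{in}$ is omitted.

```python
import numpy as np

def build_query_input(p_in, f_in):
    """Concatenate each input point with a pooled global feature.

    p_in: (N, 3) partial point cloud; f_in: (N, C_f) per-point features.
    Max pooling is an assumed stand-in for Set Abstraction pooling.
    """
    f_global = f_in.max(axis=0)                          # (C_g,), here C_g == C_f
    f_tiled = np.broadcast_to(f_global, (p_in.shape[0], f_global.shape[0]))
    return np.concatenate([p_in, f_tiled], axis=1)       # (N, 3 + C_g)

rng = np.random.default_rng(0)
N, C_f = 128, 256
p_in = rng.normal(size=(N, 3))
f_in = rng.normal(size=(N, C_f))
q_input = build_query_input(p_in, f_in)
print(q_input.shape)  # (128, 259)
```

Every row carries the same global vector, so the downstream MLP sees each point's coordinates alongside a summary of the whole partial scene.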

2. Core RWKV Mechanism and Equations

RWKV modules substitute the quadratic $\mathcal{O}(N^2)$ softmax self-attention with linear-complexity “Receptance Weighted Key-Value” (RWKV) aggregation. For input point features $X\in\mathbb{R}^{T\times d}$:

  • P-Shift (per-channel local reordering):

$$X'_r = \text{P-Shift}_R(X), \quad X'_k = \text{P-Shift}_K(X), \quad X'_v = \text{P-Shift}_V(X)$$

($\mu_R$, $\mu_K$, $\mu_V\in\mathbb{R}^d$ are learnable.)

  • Linear Projections:

$$R = X'_r W_R,\quad K = X'_k W_K,\quad V = X'_v W_V,\qquad W_R, W_K, W_V \in \mathbb{R}^{d \times d}$$

For output index $t\in[1,T]$,

$$a_{t,i} = \begin{cases} \exp(u + K_t) & \text{if}\ i=t \\ \exp\left(-\frac{|t-i|-1}{T}\,\omega + K_i\right) & \text{if}\ i\neq t \end{cases}$$

$$\text{wkv}_t = \frac{\sum_{i=1}^T a_{t,i} V_i}{\sum_{i=1}^T a_{t,i}}$$

where $u,\,\omega\in\mathbb{R}$ are learnable scalars.
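As a reference for the aggregation just defined, the following NumPy sketch evaluates $\text{wkv}_t$ literally from the two formulas. It is a naive version that materializes all $T \times T$ weights, so it is quadratic and useful only for checking shapes and semantics; it does not reproduce the linear-time evaluation the text refers to. The function name `bidirectional_wkv` is hypothetical.

```python
import numpy as np

def bidirectional_wkv(K, V, u, omega):
    """Naive O(T^2) evaluation of wkv_t. K, V: (T, d); u, omega: scalars."""
    T = K.shape[0]
    idx = np.arange(T)
    dist = np.abs(idx[:, None] - idx[None, :])       # |t - i|, shape (T, T)
    decay = -(dist - 1) / T * omega                  # bias for the i != t case
    log_a = decay[:, :, None] + K[None, :, :]        # log a_{t,i}, shape (T, T, d)
    log_a[idx, idx, :] = u + K                       # i == t case: u + K_t
    log_a -= log_a.max(axis=1, keepdims=True)        # stabilise; the ratio is unchanged
    a = np.exp(log_a)
    return (a * V[None, :, :]).sum(axis=1) / a.sum(axis=1)

rng = np.random.default_rng(0)
T, d = 8, 4
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))
out = bidirectional_wkv(K, V, u=0.5, omega=1.0)
print(out.shape)  # (8, 4)
```

Because the weights are positive and normalized, each output row is a per-channel convex combination of the rows of $V$; with $K = 0$, $u = 0$, and $\omega = 0$ it reduces to the plain mean of $V$.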

  • Receptance Gating:

$$r = \sigma(R),\quad \hat v = r \odot \text{wkv}$$

  • Output:

$$O = \text{LayerNorm}(\hat v\,W_O),\quad W_O\in\mathbb{R}^{d\times d}$$
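The gating and output steps can be sketched in a few lines of NumPy. This is an illustration under assumptions: the layer normalization here has no learnable scale or bias, `wkv` is taken as a precomputed $(T, d)$ array, and the helper names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    # Normalise each row over the channel axis (no affine parameters: assumption).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gate_and_project(R, wkv, W_O):
    """r = sigmoid(R); v_hat = r * wkv; O = LayerNorm(v_hat @ W_O)."""
    v_hat = sigmoid(R) * wkv
    return layer_norm(v_hat @ W_O)

rng = np.random.default_rng(0)
T, d = 6, 4
R = rng.normal(size=(T, d))
wkv = rng.normal(size=(T, d))
W_O = rng.normal(size=(d, d))
O = gate_and_project(R, wkv, W_O)
print(O.shape)  # (6, 4)
```

The sigmoid keeps the receptance in $(0, 1)$, so it acts as a per-channel gate on how much of the aggregated value passes through.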

Within RWKV-ATTN, a hybrid of global PRWKV output and local $k$-NN attention is used, where $L(i)$ denotes the $k$-NN neighborhood of point $i$:

  • Local values: $v_{ij} = \text{MLP}([q_i; k_j])$
  • Gated: $\hat v_{ij} = \sigma(\text{PRWKV}(v_{ij})) \odot \text{PRWKV}(v_{ij})$
  • Weights: $A_{i,j} = \frac{\exp(\mathrm{MLP}(q_i-k_j+\alpha_{i,j}))}{\sum_{j'\in L(i)} \exp(\mathrm{MLP}(q_i - k_{j'} + \alpha_{i,j'}))}$
  • Output: $H_i = \sum_{j\in L(i)} A_{i,j} (\hat v_{ii} - \hat v_{ij} + \alpha_{i,j}) + v_{ii}$

This structure enables global context aggregation with linear complexity and maintains spatial discrimination through local attention.
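The local part of RWKV-ATTN can be sketched as follows. Several pieces are stand-ins rather than the published design: the MLPs are single random linear maps, the PRWKV call inside the gate is replaced by the identity, $\alpha_{i,j}$ is a random placeholder tensor because the text does not define it, and the neighborhoods come from a brute-force $k$-NN search. All function names are hypothetical.

```python
import numpy as np

def knn_indices(points, k):
    # Brute-force k-NN; for distinct points each point is its own nearest neighbour.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gate(x):
    # sigma(x) * x, with PRWKV replaced by the identity (assumption).
    return x / (1.0 + np.exp(-x))

def local_fuse(logits, v_hat_self, v_hat_nbr, alpha, v_self):
    """H_i = sum_j A_ij * (v_hat_ii - v_hat_ij + alpha_ij) + v_ii."""
    A = softmax(logits, axis=1)                            # normalise over neighbours
    return (A * (v_hat_self[:, None, :] - v_hat_nbr + alpha)).sum(axis=1) + v_self

rng = np.random.default_rng(0)
N, k, C = 16, 4, 8
P = rng.normal(size=(N, 3))
q, key = rng.normal(size=(N, C)), rng.normal(size=(N, C))
idx = knn_indices(P, k)                                    # (N, k)
alpha = rng.normal(size=(N, k, C))                         # placeholder for alpha_ij
W_a = rng.normal(size=(C, C))                              # stand-in weight MLP
W_v = rng.normal(size=(2 * C, C))                          # stand-in value MLP

logits = (q[:, None, :] - key[idx] + alpha) @ W_a          # (N, k, C)
q_rep = np.broadcast_to(q[:, None, :], (N, k, C))
v_nbr = np.concatenate([q_rep, key[idx]], axis=-1) @ W_v   # v_ij, (N, k, C)
v_self = np.concatenate([q, key], axis=-1) @ W_v           # v_ii, (N, C)
H = local_fuse(logits, gate(v_self), gate(v_nbr), alpha, v_self)
print(H.shape)  # (16, 8)
```

The cost of this step scales with $N \cdot k$ rather than $N^2$, which is what lets the local attention coexist with the linear global path.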

3. Feature Aggregation Workflow

The processing steps of RWKV-SG are as follows:

  1. Preliminary Feature Extraction: $F_0 = \text{PointNet}(P_\text{in})$.
  2. Contextual Abstraction: $F_\text{in}$ computed via four PRWKV layers.
  3. Global Context Gathering: $f$ pooled using Set Abstraction; combined per-point with $P_\text{in}$ and processed to $q_\text{in}$.
  4. Local and Global Feature Fusion: $k_\text{in}$ set to $F_\text{in}$; RWKV-ATTN computes $H_\text{miss}$ per point within each $k$-NN neighborhood.
  5. Missing Feature Deconvolution: $H_\text{miss}$ upsampled to $F_\text{miss}$ through a Snowflake-style deconvolution.
  6. Coarse Completion: $\Delta P$ is regressed; $P_\text{miss} = P_\text{in} + \Delta P$.
  7. Farthest Point Sampling: $[P_\text{in}; P_\text{miss}]$ and $[F_0; F_\text{miss}]$ sampled to $K$ coarse points.
  8. Coarse Semantic Segmentation: Per-point logits $L_\text{coarse}$ computed from $F_\text{coarse}$.

This pipeline delivers plausible coarse geometry and features filling input holes, while maintaining efficiency through linear mechanisms.
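Steps 6 and 7 above can be sketched in NumPy. The offsets and features below are random placeholders for the network outputs, and `farthest_point_sampling` is a straightforward greedy implementation written for illustration, not the code used in RWKV-PCSSC.

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    n = points.shape[0]
    chosen = np.empty(k, dtype=np.int64)
    min_d2 = np.full(n, np.inf)          # squared distance to the nearest chosen point
    current = start
    for s in range(k):
        chosen[s] = current
        d2 = ((points - points[current]) ** 2).sum(axis=1)
        min_d2 = np.minimum(min_d2, d2)
        current = int(np.argmax(min_d2))
    return chosen

rng = np.random.default_rng(0)
N, C_f, K_coarse = 64, 16, 32
p_in = rng.normal(size=(N, 3))
delta_p = 0.1 * rng.normal(size=(N, 3))          # placeholder for the rebuild head output
f_0 = rng.normal(size=(N, C_f))
f_miss = rng.normal(size=(N, C_f))               # placeholder for deconvolved features

p_miss = p_in + delta_p                          # step 6: P_miss = P_in + Delta P
merged_pts = np.concatenate([p_in, p_miss], axis=0)     # [P_in; P_miss], (2N, 3)
merged_feat = np.concatenate([f_0, f_miss], axis=0)     # [F_0; F_miss], (2N, C_f)

sel = farthest_point_sampling(merged_pts, K_coarse)     # step 7
p_coarse, f_coarse = merged_pts[sel], merged_feat[sel]
print(p_coarse.shape, f_coarse.shape)  # (32, 3) (32, 16)
```

Using the same index set for points and features keeps each coarse point paired with its own feature vector.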

4. Learnable Parameters and Model Efficiency

RWKV-SG is parameterized for compactness and speed. For typical feature dimensionality $C_f = 256$:

  • PointNet encoder: $\sim$41K parameters.
  • PRWKV stack (4 layers): $4 \cdot (4 C_f^2 + 3 C_f + 2) \approx 1.05$M.
  • Query-generation MLP: $\sim$0.26M.
  • RWKV-ATTN internals: $\sim$0.40M.
  • Deconvolution: $\sim$0.15M.
  • Rebuild head: $\sim$0.03M.
  • Segment head: $\sim$0.6M.

Total: $\sim$2.5M parameters, accounting for 50–60% of the full RWKV-PCSSC network. The linear-complexity RWKV structure enables the entire model (RWKV-SG + RWKV-PD) to remain at $\sim$4M parameters, a $4.18\times$ reduction relative to the PointSSC baseline ($\sim$17M).
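The parameter budget can be checked with a few lines of arithmetic. Only the PRWKV stack is computed from its closed-form expression; the other entries are the approximate figures quoted above, so the total is itself approximate.

```python
C_f = 256

# Closed-form count for the 4-layer PRWKV stack.
prwkv_stack = 4 * (4 * C_f**2 + 3 * C_f + 2)

# Remaining entries are the rounded figures listed above.
components = {
    "pointnet_encoder": 41_000,
    "prwkv_stack": prwkv_stack,
    "query_mlp": 260_000,
    "rwkv_attn": 400_000,
    "deconvolution": 150_000,
    "rebuild_head": 30_000,
    "segment_head": 600_000,
}
total = sum(components.values())

# Full-model size implied by a 4.18x reduction from a ~17M baseline.
implied_full_model = 17e6 / 4.18

print(prwkv_stack)                        # 1051656
print(f"{total / 1e6:.2f}M")              # 2.53M
print(f"{implied_full_model / 1e6:.2f}M") # 4.07M
```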

5. Empirical Performance and Impact

RWKV-SG and its accompanying network modules offer significant improvements in parameter and memory efficiency over softmax-attention-based dense architectures:

  • Parameter reduction: Full model size $\sim$76.1% smaller than PointSSC.
  • Memory efficiency: Peak GPU memory reduced by $\sim$27% (training, batch size 8, RTX 3090).
  • Ablation study: Removal of RWKV-SG in SSC-PC increases Chamfer Distance from 0.265 to 0.353 (a factor of $1.33$ worse) and lowers mean accuracy from 97.99% to 97.49%.
  • Qualitative output: $P_\text{coarse}$ clouds generated by RWKV-SG already reconstruct large missing areas plausibly.
  • Downstream effect: RWKV-PD refinements act primarily on edges; RWKV-SG provides the structural estimate.
  • Overall effect: RWKV-SG improves completion by $\sim$25% in Chamfer Distance over non-RWKV baselines while preserving or exceeding state-of-the-art SSC accuracy.

A plausible implication is that the majority of completion accuracy and efficiency gains in RWKV-PCSSC can be attributed directly to the design of RWKV-SG.
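The headline ratios follow directly from the reported figures, as this short calculation shows. It uses only numbers stated in this section and the previous one; the variable names are illustrative.

```python
cd_with_sg = 0.265      # Chamfer Distance with RWKV-SG (SSC-PC ablation)
cd_without_sg = 0.353   # Chamfer Distance with RWKV-SG removed

worsening_factor = cd_without_sg / cd_with_sg
relative_improvement = (cd_without_sg - cd_with_sg) / cd_without_sg

# A 4.18x reduction in parameters expressed as a percentage saving.
param_saving = 1 - 1 / 4.18

print(f"{worsening_factor:.2f}")       # 1.33
print(f"{relative_improvement:.1%}")   # 24.9%
print(f"{param_saving:.1%}")           # 76.1%
```

The roughly 25% figure is therefore the ablation gap measured relative to the variant without RWKV-SG, and the 76.1% saving is the same statement as the $4.18\times$ reduction.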

6. Context and Significance within Point Cloud Completion

RWKV-SG exemplifies a new paradigm in 3D point cloud completion, replacing resource-intensive attention with a linear, context-aware mechanism. By forgoing auxiliary cues (color, normals) and reducing overparameterization, it delivers competitive semantic scene completion on both standard datasets (SSC-PC, NYUCAD-PC, PointSSC) and new benchmarks (NYUCAD-PC-V2, 3D-FRONT-PC), as developed in "RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion" (He et al., 13 Nov 2025). This suggests further investigation of RWKV-style mechanisms is warranted for scalable 3D scene understanding in memory- and compute-constrained environments.
