
VN Missing Anchor Transformer

Updated 20 January 2026
  • The paper introduces a transformer that utilizes VN networks and channel-wise subtraction attention to predict missing anchor positions with strict rotation equivariance.
  • The model encodes point clouds as compact skeletons of learned 3D anchors with invariant VN features, ensuring stability under arbitrary rotations.
  • Empirical results on datasets like MVP and KITTI confirm its state-of-the-art performance in point cloud completion without requiring pose alignment or augmentation.

A VN Missing Anchor Transformer is a Transformer architecture built atop Vector Neuron (VN) networks and specifically designed to predict missing anchor positions and features in equivariant point cloud completion tasks. This class of models employs equivariant anchor representations and specialized attention mechanisms to guarantee rotation equivariance and robust performance under arbitrary input poses, as formalized in the REVNET framework (Ni et al., 13 Jan 2026). The central innovation is the use of anchor-based queries and channel-wise subtraction attention, enabling information aggregation and completion while strictly preserving equivariant structure.

1. Equivariant Anchor Representation

Partial point clouds are encoded not as raw point sets but as compact skeletons of learned 3D anchor points, each associated with local VN features. For $N$ observed anchors, positions $P_a = \{p_{a,i} \in \mathbb{R}^3\}$ and VN features $X_a = \{X_{a,i} \in \mathbb{R}^{C \times 3}\}$ serve as the input. Each VN feature is a $C$-channel tensor of 3D vectors, ensuring rotation equivariance: under $R \in SO(3)$,

p_{a,i} \mapsto R\, p_{a,i}, \qquad X_{a,i} \mapsto X_{a,i} R

where $R$ is applied as a right multiplication on each feature-vector channel. This design guarantees stability and semantic fidelity under arbitrary point cloud orientations.
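The transformation convention above can be checked numerically: a VN feature in $\mathbb{R}^{C \times 3}$ transforms by right multiplication, so simple invariants such as the Gram matrix $X X^\top$ are unchanged by rotation. A minimal sketch (the QR-based rotation sampler is a standard construction, not from the paper):

```python
import numpy as np

def random_rotation(rng):
    # QR decomposition of a Gaussian matrix gives a random orthogonal matrix
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1             # ensure det = +1, i.e. a proper rotation
    return q

rng = np.random.default_rng(0)
R = random_rotation(rng)

C = 8
X = rng.normal(size=(C, 3))       # one VN feature: C channels of 3D vectors
p = rng.normal(size=3)            # an anchor position

# Convention from the text: p -> R p (positions), X -> X R (VN features)
p_rot = R @ p
X_rot = X @ R

# The Gram matrix X X^T is one simple rotation-invariant summary
assert np.allclose(X_rot @ X_rot.T, X @ X.T)
```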

2. Anchor Query Construction

Prediction of missing anchors involves defining $M$ candidate positions $\{\hat{p}_{a,j}\}$ and generating corresponding VN anchor embeddings. Each missing anchor's coordinate is lifted via a VN-EdgeConv operation over its $k_a$ nearest observed anchors:

X_{\mathrm{emb},j} = \mathrm{VN\text{-}EdgeConv}(\hat{p}_{a,j}; k_a) \in \mathbb{R}^{C \times 3}

A global feature $X_g$ is extracted by pooling all observed VN features. These are concatenated and transformed by a VN-MLP to yield the query:

Q_j = \mathrm{VN\text{-}MLP}([X_g ; X_{\mathrm{emb},j}]) \in \mathbb{R}^{C \times 3}

This ensures each query incorporates both local and global geometric context in an equivariant manner.
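The query assembly can be sketched with the pooling step reduced to a mean and the VN-MLP reduced to a single VN-linear layer: a weight matrix acting only on the channel dimension, which therefore commutes with right multiplication by $R$. The mean pooling and single-layer stand-in are illustrative assumptions, not the paper's exact operators:

```python
import numpy as np

def build_query(X_a, X_emb, W):
    """X_a: (N, C, 3) observed VN features; X_emb: (C, 3) embedding of one
    missing anchor; W: (C, 2C) weights of a toy single-layer 'VN-MLP'."""
    X_g = X_a.mean(axis=0)                 # global feature X_g, shape (C, 3)
    X_cat = np.concatenate([X_g, X_emb])   # channel-wise concat, (2C, 3)
    return W @ X_cat                       # VN-linear map to the query, (C, 3)

rng = np.random.default_rng(1)
N, C = 5, 4
X_a = rng.normal(size=(N, C, 3))
X_emb = rng.normal(size=(C, 3))
W = rng.normal(size=(C, 2 * C))
Q = build_query(X_a, X_emb, W)             # equivariant query for one anchor
```

Because $W$ never mixes the 3D coordinate axes, rotating every input by $R$ rotates the query by the same $R$.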

3. VN Self- and Cross-Attention Mechanisms

The encoder applies $N_{\mathrm{enc}}$ repeated VN self-attention blocks to $X_a$ to produce refined keys and values $(K, V) \in \mathbb{R}^{N \times C \times 3}$. The decoder then performs cross-attention over these, updating the queries for missing anchors through channel-wise subtraction attention (CWSA):

  • For each head $h$:
    • Relative feature: $D_{ij} = Q_j^{(h)} - K_i^{(h)}$
    • Invariant mapping: $S_{ij} = \mathrm{MLP}(\mathrm{VN\text{-}Inv}(D_{ij})) \in \mathbb{R}^{C_h}$
    • Softmax normalization: $\alpha_{ij} = \mathrm{softmax}_i\, S_{ij} \in \mathbb{R}^{C_h}$
    • Aggregation: $A_j^{(h)} = \sum_{i=1}^{N} \alpha_{ij} \odot V_i^{(h)}$

All attention, linear, and normalization layers are strictly equivariant: if the inputs are rotated by $R$, all outputs are rotated accordingly. This is mathematically ensured by the bias and normalization formulations, such as the rotation-equivariant bias in $\mathrm{VN\text{-}Linear}_B$ and ZCA-based layer normalization.
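The CWSA steps above can be sketched for a single head. Here $\mathrm{VN\text{-}Inv}$ is approximated by channel-wise vector norms and the MLP by one linear layer; both are illustrative assumptions (the paper's exact invariant map and MLP may differ), but the sketch preserves the key property that the attention weights are rotation-invariant while the aggregation is equivariant:

```python
import numpy as np

def cwsa_head(Q, K, V, W):
    """One head of channel-wise subtraction attention.
    Q: (M, Ch, 3) queries; K, V: (N, Ch, 3) keys/values; W: (Ch, Ch) toy MLP."""
    # Relative feature D_ij = Q_j - K_i, shape (M, N, Ch, 3)
    D = Q[:, None] - K[None, :]
    # Invariant mapping: channel-wise norms (rotation-invariant), then linear
    S = np.linalg.norm(D, axis=-1) @ W            # (M, N, Ch)
    # Softmax over the key index i, independently per channel
    S = S - S.max(axis=1, keepdims=True)
    alpha = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)
    # Channel-wise aggregation A_j = sum_i alpha_ij ⊙ V_i, shape (M, Ch, 3)
    return np.einsum('mnc,ncd->mcd', alpha, V)

rng = np.random.default_rng(2)
M, N, Ch = 3, 6, 4
Q = rng.normal(size=(M, Ch, 3))
K = rng.normal(size=(N, Ch, 3))
V = rng.normal(size=(N, Ch, 3))
W = rng.normal(size=(Ch, Ch))
A = cwsa_head(Q, K, V, W)   # (M, Ch, 3) aggregated VN features
```

Rotating Q, K, and V by the same $R$ leaves the norms of $D_{ij}$, and hence the weights $\alpha_{ij}$, unchanged, so the output is exactly the rotated aggregate.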

4. Fine Point Cloud Completion

After the VN features $\hat{X}_{a,j}$ are decoded for the missing anchors, they are mapped to invariant codes via $\mathrm{VN\text{-}Inv}$, processed by an MLP to generate local point offsets $\Delta P_j \in \mathbb{R}^{s \times 3}$, then rotated into the proper frame and combined with the anchor positions:

\text{Output points} = \{\hat{p}_{a,j} + \Delta P_j\}

This yields completed point clouds in the same equivariant space, eliminating the need for rotation augmentation.
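One way to realize this decode step is to read offsets from an invariant code and rotate them back through an equivariant frame built from the VN feature itself. The Gram-matrix code, the single linear map, and the frame taken from the first three feature channels are all illustrative assumptions, not the paper's construction; for simplicity the sketch rotates positions by right multiplication, matching the feature convention:

```python
import numpy as np

def frame_from_feature(F):
    # Orthonormalize three feature channels (rows) into an equivariant frame
    u0 = F[0] / np.linalg.norm(F[0])
    u1 = F[1] - (F[1] @ u0) * u0
    u1 = u1 / np.linalg.norm(u1)
    u2 = np.cross(u0, u1)                 # completes a right-handed frame
    return np.stack([u0, u1, u2])

def decode_points(X_hat, p_hat, W):
    """X_hat: (C, 3) decoded VN feature; p_hat: (3,) missing-anchor position;
    W: (s*3, C*C) toy weights mapping the invariant code to s offsets."""
    code = (X_hat @ X_hat.T).ravel()      # rotation-invariant code
    local = (W @ code).reshape(-1, 3)     # s local offsets, frame-independent
    frame = frame_from_feature(X_hat[:3]) # equivariant frame, (3, 3)
    return p_hat + local @ frame          # completed points around the anchor

rng = np.random.default_rng(3)
C, s = 6, 4
X_hat = rng.normal(size=(C, 3))
p_hat = rng.normal(size=3)
W = rng.normal(size=(s * 3, C * C))
pts = decode_points(X_hat, p_hat, W)      # (s, 3) output points
```

Since the code is invariant and the frame rotates with the feature, the decoded points rotate exactly with the input, so no rotation augmentation is needed.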

5. Mathematical Guarantees of Equivariance

All VN operations in the Missing Anchor Transformer, including attention aggregation, channel-wise scaling, VN-MLP transformations, and layer normalization, commute with right multiplication by any $R \in SO(3)$. Specifically,

\sum_i \alpha_{ij} \odot (V_i R) = \Big( \sum_i \alpha_{ij} \odot V_i \Big) R

\mathrm{VN\text{-}Linear}_B(XR; \dots) = \mathrm{VN\text{-}Linear}_B(X; \dots)\, R

Thus, the output of the network under rotated input is identically the rotated output, ensuring strict equivariance.
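The first identity holds because each $\alpha_{ij}$ scales an entire channel (a row of $V_i$) by a scalar, and scalar multiplication commutes with right multiplication by $R$. A quick numerical confirmation of the aggregation identity:

```python
import numpy as np

rng = np.random.default_rng(4)
N, C = 5, 4
alpha = rng.random((N, C))                   # per-channel attention weights
V = rng.normal(size=(N, C, 3))               # VN values
R = np.array([[0.0, -1.0, 0.0],              # 90-degree rotation about z
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

lhs = np.einsum('nc,ncd->cd', alpha, V @ R)  # sum_i alpha_ij ⊙ (V_i R)
rhs = np.einsum('nc,ncd->cd', alpha, V) @ R  # (sum_i alpha_ij ⊙ V_i) R
assert np.allclose(lhs, rhs)
```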

6. Training and Architectural Details

The VN Missing Anchor Transformer is trained end-to-end using a Chamfer-$L_1$ loss between the predicted dense point set and the ground truth. All VN-attention, bias, and normalization blocks are parameterized for equivariance. Training is robust under sparse inputs, with ZCA-based VN LayerNorm and bias terms stabilizing gradients. Prediction of missing anchors does not depend on pose alignment and requires no data augmentation.
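The training objective can be sketched as follows, using one common convention for the Chamfer-$L_1$ loss (mean unsquared nearest-neighbour distance in both directions); the paper's exact normalization or weighting may differ:

```python
import numpy as np

def chamfer_l1(P, Q):
    """Symmetric Chamfer distance between point sets P: (n, 3) and Q: (m, 3),
    averaging unsquared nearest-neighbour distances in both directions."""
    d = np.linalg.norm(P[:, None] - Q[None, :], axis=-1)  # (n, m) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Identical clouds give zero loss; a unit separation gives 1 + 1 = 2
P = np.array([[0.0, 0.0, 0.0]])
Q = np.array([[1.0, 0.0, 0.0]])
assert chamfer_l1(P, P) == 0.0
assert np.isclose(chamfer_l1(P, Q), 2.0)
```

Because the loss depends only on pairwise distances, it is itself rotation-invariant, consistent with training without pose alignment.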

7. Empirical Performance and Applications

REVNET, implementing the VN Missing Anchor Transformer, achieves state-of-the-art completion performance on the MVP dataset in the equivariant setting and competitive results on real-world data (KITTI) without pose pre-alignment (Ni et al., 13 Jan 2026). The architecture is particularly suited to point cloud completion in robotics, autonomous sensing, and any context requiring robustness to arbitrary $SO(3)$ rotations. The design enables stable local detail preservation, reliable missing anchor inference, and efficient decoding into completed clouds. Code and models for REVNET are publicly available.
