VN Missing Anchor Transformer
- The paper introduces a transformer that utilizes VN networks and channel-wise subtraction attention to predict missing anchor positions with strict rotation equivariance.
- The model encodes point clouds as compact skeletons of learned 3D anchors with invariant VN features, ensuring stability under arbitrary rotations.
- Empirical results on datasets like MVP and KITTI confirm its state-of-the-art performance in point cloud completion without requiring pose alignment or augmentation.
A VN Missing Anchor Transformer is a Transformer architecture built atop Vector Neuron (VN) networks and specifically designed to predict missing anchor positions and features in equivariant point cloud completion tasks. This class of models employs equivariant anchor representations and specialized attention mechanisms to guarantee rotation equivariance and robust performance under arbitrary input poses, as formalized in the REVNET framework (Ni et al., 13 Jan 2026). The central innovation is the use of anchor-based queries and channel-wise subtraction attention, enabling information aggregation and completion while strictly preserving equivariant structure.
1. Equivariant Anchor Representation
Partial point clouds are encoded not as raw point sets but as compact skeletons of learned 3D anchor points, each associated with local VN features. For observed anchors, the positions $a_i \in \mathbb{R}^3$ and VN features $V_i$ serve as the input. Each VN feature is a $C$-channel tensor of 3D vectors, $V_i \in \mathbb{R}^{C \times 3}$, ensuring rotation equivariance: under a rotation $R \in SO(3)$,

$$a_i \mapsto a_i R, \qquad V_i \mapsto V_i R,$$

where $R$ is applied as a right-multiplication on each feature vector channel. This design guarantees stability and semantic fidelity under arbitrary point cloud orientation.
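As a concrete illustration of the right-multiplication convention, the following minimal NumPy sketch of a VN-Linear layer (an illustrative stand-in, not REVNET's code) checks numerically that mixing channels commutes with rotating the 3D axis:

```python
# Minimal sketch of a Vector Neuron linear layer plus a numerical
# check that it commutes with rotation (right-multiplication by R).
import numpy as np

rng = np.random.default_rng(0)

def vn_linear(V, W):
    """VN-Linear mixes the C channels of a (C, 3) feature with W (C_out, C);
    it touches only the channel axis, so right-multiplication by a rotation
    commutes with it: W @ (V @ R) == (W @ V) @ R."""
    return W @ V

def random_rotation():
    # Random orthogonal matrix via QR, with det forced to +1 (proper rotation).
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1
    return Q

C_in, C_out = 8, 16
V = rng.normal(size=(C_in, 3))       # one VN feature: C channels of 3D vectors
W = rng.normal(size=(C_out, C_in))   # learned channel-mixing weights
R = random_rotation()

print(np.allclose(vn_linear(V @ R, W), vn_linear(V, W) @ R))  # True
```

Equivariance here is just matrix associativity: the weights act on the channel axis, the rotation on the 3D axis.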
2. Anchor Query Construction
Prediction of missing anchors involves defining candidate positions and generating corresponding VN anchor embeddings. Each missing anchor's coordinate $a_j$ is lifted via a VN-EdgeConv operation over its $k$ nearest observed anchors:

$$Q_j^{\mathrm{loc}} = \text{VN-EdgeConv}\big(\{(a_i - a_j,\, V_i)\}_{i \in \mathcal{N}_k(a_j)}\big).$$

A global feature $G$ is extracted by pooling all observed VN features. These are concatenated and transformed by a VN-MLP to yield the query:

$$Q_j = \text{VN-MLP}\big([\,Q_j^{\mathrm{loc}};\, G\,]\big).$$
This ensures each query incorporates both local and global geometric context in an equivariant manner.
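The two steps above can be sketched numerically; `vn_edgeconv`, `vn_mlp`, the norm-gated nonlinearity, and mean pooling are illustrative stand-ins for REVNET's actual layers:

```python
# Hypothetical sketch of anchor-query construction: VN-EdgeConv over the
# k nearest observed anchors, global mean pooling, and a small VN-MLP.
import numpy as np

rng = np.random.default_rng(1)

def vn_nonlinearity(V):
    # Equivariant gate: scale each channel vector by a function of its
    # rotation-invariant norm (a simplified stand-in for VN-ReLU).
    n = np.linalg.norm(V, axis=-1, keepdims=True)
    return V * (n / (1.0 + n))

def vn_mlp(V, weights):
    # Stack of VN-Linear (channel mixing) + equivariant nonlinearity.
    for W in weights:
        V = vn_nonlinearity(W @ V)
    return V

def vn_edgeconv(pos, anchors, feats, W, k=4):
    """Lift a missing anchor's coordinate via its k nearest observed anchors:
    edge features = (relative position, neighbor VN feature), channel-mixed
    by W and mean-pooled over neighbors (mean pooling preserves equivariance)."""
    idx = np.argsort(np.linalg.norm(anchors - pos, axis=1))[:k]
    edges = [np.concatenate([(anchors[i] - pos)[None, :], feats[i]], axis=0)
             for i in idx]                       # each (1 + C, 3)
    return (W @ np.stack(edges)).mean(axis=0)    # (C, 3)

C, N = 8, 20
anchors = rng.normal(size=(N, 3))    # observed anchor positions
feats = rng.normal(size=(N, C, 3))   # their VN features
W_edge = rng.normal(size=(C, 1 + C))
W_mix = rng.normal(size=(C, 2 * C))

q_loc = vn_edgeconv(np.zeros(3), anchors, feats, W_edge)   # local context
g = feats.mean(axis=0)                                     # pooled global feature
query = vn_mlp(np.concatenate([q_loc, g], axis=0), [W_mix])
print(query.shape)  # (8, 3): one VN query for the missing anchor
```

Mean pooling is used for both the neighbor and global aggregation because it commutes with rotation, whereas a coordinate-wise max would not.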
3. VN Self- and Cross-Attention Mechanisms
The encoder applies repeated VN self-attention blocks on the observed anchor features $\{V_i\}$ to produce refined keys $K_i$ and values $U_i$. The decoder then performs cross-attention over these, updating the queries $Q_j$ for missing anchors through channel-wise subtraction attention (CWSA):
- For each head $h$:
  - Relative feature: $D_{ji}^{(h)} = Q_j^{(h)} - K_i^{(h)}$ (subtraction applied channel-wise)
  - Invariant mapping: $s_{ji}^{(h)} = \mathrm{Inv}\big(D_{ji}^{(h)}\big)$, mapping the relative VN feature to rotation-invariant scalars (e.g., via channel inner products)
  - Softmax normalization: $\alpha_{ji}^{(h)} = \operatorname{softmax}_i\big(s_{ji}^{(h)}\big)$
  - Aggregation: $Q_j^{(h)} \leftarrow \sum_i \alpha_{ji}^{(h)}\, U_i^{(h)}$
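The steps above can be sketched as one CWSA head; the Gram-matrix invariant mapping used here is an assumed, standard choice for extracting rotation-invariant scalars from VN features, and the final check confirms that invariant scores make the aggregated output rotate with the input:

```python
# Channel-wise subtraction attention (CWSA) sketch under assumed notation.
import numpy as np

rng = np.random.default_rng(2)

def invariant_score(D, w):
    """Relative VN feature D (C, 3) -> scalar score: the Gram matrix
    D @ D.T is unchanged by D -> D R for any rotation R."""
    return float(w @ (D @ D.T).flatten())

def cwsa(Q, K, U, w):
    """One head: queries Q (M, C, 3) attend over keys K and values U (N, C, 3)."""
    M, N = Q.shape[0], K.shape[0]
    S = np.array([[invariant_score(Q[j] - K[i], w)   # channel-wise subtraction
                   for i in range(N)] for j in range(M)])
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)                # softmax over observed anchors
    return np.einsum('ji,icd->jcd', A, U)            # equivariant aggregation

C, M, N = 4, 3, 6
Q = rng.normal(size=(M, C, 3))
K = rng.normal(size=(N, C, 3))
U = rng.normal(size=(N, C, 3))
w = rng.normal(size=(C * C,))

R, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
out, out_rot = cwsa(Q, K, U, w), cwsa(Q @ R, K @ R, U @ R, w)
print(np.allclose(out @ R, out_rot))  # True: the attention weights are
                                      # invariant, so the output rotates
```

Because the softmax weights depend only on invariant scores, the aggregation is a rotation-independent convex combination of equivariant values.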
All attention, linear, and normalization layers are strictly equivariant: if inputs are rotated by $R$, all outputs are accordingly rotated. This is mathematically ensured by bias and normalization formulations such as the rotation-equivariant bias in VN-Linear and ZCA-based layer normalization.
4. Fine Point Cloud Completion
After VN features are decoded for missing anchors, they are mapped to invariant codes via an invariance layer (e.g., channel-wise inner products), processed by an MLP to generate local point offsets $o_{j,m}$ in a canonical frame, then rotated into the proper frame and combined with the anchor position:

$$p_{j,m} = a_j + o_{j,m}\, E_j,$$

where $E_j$ is an equivariant local frame derived from the anchor's VN feature.
This yields completed point clouds in the same equivariant space, eliminating the need for rotation augmentation.
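A hedged sketch of this decoding step follows; the Gram-Schmidt frame built from the first three VN channels is an illustrative construction, not necessarily the one REVNET uses:

```python
# Hypothetical decoding sketch: invariant code -> MLP offsets in a
# canonical frame -> rotate by an equivariant frame, add anchor position.
import numpy as np

rng = np.random.default_rng(3)

def equivariant_frame(V):
    """Gram-Schmidt on the first three channel vectors of V (C, 3); norms
    and dot products are rotation-invariant, so the frame rotates with V."""
    e1 = V[0] / np.linalg.norm(V[0])
    u2 = V[1] - (V[1] @ e1) * e1
    e2 = u2 / np.linalg.norm(u2)
    return np.stack([e1, e2, np.cross(e1, e2)])   # (3, 3), rows are axes

def decode_points(anchor, V, W, n_pts=4):
    inv = (V @ V.T).flatten()                 # rotation-invariant code (C*C,)
    offsets = (W @ inv).reshape(n_pts, 3)     # offsets in the canonical frame
    return anchor + offsets @ equivariant_frame(V)

def random_rotation():
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:      # force det = +1 (proper rotation)
        Q[:, 0] *= -1
    return Q

C, n_pts = 6, 4
V = rng.normal(size=(C, 3))
anchor = rng.normal(size=3)
W = rng.normal(size=(3 * n_pts, C * C)) * 0.1
R = random_rotation()

p = decode_points(anchor, V, W)
print(np.allclose(p @ R, decode_points(anchor @ R, V @ R, W)))  # True
```

Since the offsets are computed purely from invariant codes, all pose information enters only through the anchor position and the equivariant frame.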
5. Mathematical Guarantees of Equivariance
All VN operations in the Missing Anchor Transformer, including attention aggregation, channel-wise scaling, VN-MLP transformations, and layer normalization, commute with right-multiplication by any $R \in SO(3)$. Specifically, for every such layer $f$,

$$f(VR) = f(V)\,R \quad \text{for all } R \in SO(3).$$

Thus, the output of the network under rotated input is identically the rotated output, ensuring strict equivariance.
6. Training and Architectural Details
The VN Missing Anchor Transformer is trained end-to-end using a Chamfer distance loss between the predicted dense point set and the ground truth. All VN-attention, bias, and normalization blocks are parameterized for equivariance. Training is robust under sparse inputs, with ZCA-based VN LayerNorm and bias terms stabilizing gradients. Prediction of missing anchors does not depend on pose alignment and requires no data augmentation.
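For reference, a standard bidirectional Chamfer distance is shown below; the exact variant (norm and weighting) used by REVNET is not specified here:

```python
# Reference bidirectional Chamfer distance in NumPy. The loss is
# rotation-invariant, matching the equivariant pipeline: rotating both
# point sets leaves all pairwise distances unchanged.
import numpy as np

def chamfer_distance(P, Q):
    """Mean nearest-neighbor distance from P to Q plus from Q to P."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # (|P|, |Q|)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
print(chamfer_distance(pred, gt))  # 0.5/3: one ground-truth point unmatched
```

The pairwise-distance matrix makes this O(|P| × |Q|) in memory; production implementations typically use batched or KD-tree nearest-neighbor queries instead.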
7. Empirical Performance and Applications
REVNET, implementing the VN Missing Anchor Transformer, exhibits state-of-the-art completion performance on the MVP dataset in the equivariant setting, and competitive results on real-world datasets (KITTI) without pose pre-alignment (Ni et al., 13 Jan 2026). The architecture is particularly suited for point cloud completion tasks in robotics, autonomous sensing, and any context requiring robustness to arbitrary rotations. The design enables stable local detail preservation, reliable missing anchor inference, and efficient decoding into completed clouds. Codes and models for REVNET are publicly available.