RL-AD-Net: RL-Driven 3D Point Cloud Refinement
- The paper proposes an RL-based refinement framework that adaptively adjusts latent vectors to reduce Chamfer Distance and improve local geometric fidelity.
- It integrates a category-specific point autoencoder with a deterministic TD3 agent to modify incomplete 3D shapes without re-training the base completion network.
- A non-parametric selector using PointNN ensures that only geometrically consistent refinements are chosen, as validated by improved metrics across ShapeNet categories.
RL-AD-Net is a reinforcement learning-based refinement framework for point cloud completion that operates in the latent space of a pretrained point autoencoder. Recent completion models, including transformer- and denoising-based approaches, typically reconstruct globally plausible 3D shapes from partial inputs but often introduce local geometric inconsistencies. RL-AD-Net addresses these inconsistencies by leveraging a reinforcement learning (RL) agent to perform continuous adaptive displacements within a compact latent representation, producing refined completions with improved local geometric fidelity. The method integrates a geometric consistency selector to ensure the output preserves or enhances the plausibility of the original completion, and is designed to be lightweight, modular, and agnostic to the underlying completion network (Paregi et al., 21 Nov 2025).
1. Underlying Architecture and Latent Space
At the core of RL-AD-Net is a category-specific point autoencoder (AE) inspired by PointNet. The encoder applies a shared MLP independently to each point of a 3D shape, then aggregates the per-point features by max-pooling into a 128-dimensional global feature vector (GFV):

$$z = \max_{i=1,\dots,N} \mathrm{MLP}(x_i) \in \mathbb{R}^{128},$$

where $x_i$ denotes the $i$-th input point and the maximum is taken elementwise.
This GFV serves as a compact, semantically meaningful representation of the input’s global geometry. The decoder is a simple multi-layer perceptron that upsamples the GFV back to a dense 3D point cloud. The autoencoder is trained per category on complete shapes for 400 epochs using the Adam optimizer with a batch size of 32. The objective is to minimize the bidirectional Chamfer Distance (CD), which enforces fidelity between the reconstructed and ground-truth point clouds and encourages the latent manifold to capture category-specific geometric priors.
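The encoder described above can be sketched in a few lines of numpy. This is an illustrative stand-in, not the paper's implementation: the layer widths, random weights, and function names here are assumptions, and a trained model would use learned parameters.

```python
import numpy as np

def shared_mlp_encoder(points, weights, biases):
    """Apply a shared per-point MLP, then max-pool into a global feature vector.

    points:  (N, 3) array of 3D points.
    weights: list of weight matrices, e.g. shapes (3, 64) and (64, 128).
    biases:  matching bias vectors.
    Returns a 128-dimensional GFV.
    """
    h = points
    for W, b in zip(weights, biases):
        h = np.maximum(h @ W + b, 0.0)  # shared MLP layer with ReLU
    return h.max(axis=0)                # symmetric max-pooling over all points

rng = np.random.default_rng(0)
W = [rng.standard_normal((3, 64)) * 0.1, rng.standard_normal((64, 128)) * 0.1]
b = [np.zeros(64), np.zeros(128)]
cloud = rng.standard_normal((2048, 3))
gfv = shared_mlp_encoder(cloud, W, b)  # shape (128,)
```

Because max-pooling is a symmetric function, the GFV is invariant to the ordering of input points, which is the key PointNet property the design relies on.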
2. RL-Based Adaptive Displacement in Latent Code
Given a pretrained completion network (e.g., AdaPoinTr) that generates a dense but possibly imperfect completion $P_c$, RL-AD-Net encodes this output with the autoencoder to obtain a latent code $z = E(P_c)$. Refinement is posed as a Markov Decision Process (MDP), where:
- The RL agent’s state is the current GFV $z$.
- The action is a continuous displacement vector $a \in \mathbb{R}^{128}$.
- The agent proposes a refined latent code $z' = z + \alpha a$, where the scaling factor $\alpha$ is typically 0.1, and the corresponding refined completion is $P_r = D(z')$.
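The displacement step above is simple enough to sketch directly. The toy actor below (a random linear map squashed by tanh) is an assumption standing in for the trained TD3 policy network; only the update rule itself comes from the paper.

```python
import numpy as np

def refine_latent(z, actor, alpha=0.1):
    """One latent refinement step: z' = z + alpha * a, action bounded in [-1, 1]."""
    a = np.clip(actor(z), -1.0, 1.0)  # continuous displacement in R^128
    return z + alpha * a

# Toy deterministic "actor" for illustration (the real one is the TD3 policy).
rng = np.random.default_rng(1)
A = rng.standard_normal((128, 128)) * 0.05
actor = lambda z: np.tanh(A @ z)

z = rng.standard_normal(128)
z_refined = refine_latent(z, actor)
```

Note that the scaling factor caps the per-step displacement: with alpha = 0.1 and a bounded action, no latent coordinate moves by more than 0.1, which keeps the refinement local.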
The RL policy is deterministic and implemented using Twin Delayed Deep Deterministic Policy Gradient (TD3), which is well suited to high-dimensional, continuous action spaces. TD3 employs two critic networks $Q_{\theta_1}$, $Q_{\theta_2}$ and an actor network $\pi_\phi$, along with slow-moving target networks. The reward function, available only during training with ground truth $P_{gt}$, is given by the improvement in Chamfer Distance:

$$r = \mathrm{CD}(P_c, P_{gt}) - \mathrm{CD}(P_r, P_{gt}),$$

where $P_c$ is the original completion and $P_r$ the refined one.
This setup encourages the agent to make modifications that locally reduce geometric error with respect to the ground-truth shape.
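The Chamfer Distance and the improvement-based reward can be sketched as follows. The brute-force pairwise distance computation below is an illustrative choice (real pipelines typically use a KD-tree or GPU kernel), and the function names are assumptions.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Bidirectional Chamfer Distance between point sets P (N, 3) and Q (M, 3)."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)  # (N, M) squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def refinement_reward(P_refined, P_completion, P_gt):
    """Improvement in CD: positive when the refinement is closer to ground truth."""
    return chamfer_distance(P_completion, P_gt) - chamfer_distance(P_refined, P_gt)
```

Under this sign convention the agent is rewarded for any action whose decoded completion lands nearer the ground-truth shape, and penalized when the refinement drifts away from it.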
3. Geometric Consistency and Inference-Time Selection
At inference time, ground truth is unavailable. RL-AD-Net integrates a non-parametric consistency module, PointNN, to decide whether the RL-refined or the original completion is geometrically superior. PointNN computes a normalized consistency score for each candidate point cloud, utilizing hierarchical furthest-point sampling, local $k$-NN grouping, positional encodings, and global pooling. The mechanism is as follows:
- If the PointNN score of the refined completion is at least that of the original, output the refinement; otherwise, fall back to the original completion.
- At evaluation, selection is further restricted to completions that do not increase the Chamfer Distance.
This rule ensures that refinement cannot degrade geometric plausibility: the selected output maintains or improves the quality of the original completion.
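The selection rule reduces to a single comparison. In the sketch below, `consistency_score` is a placeholder for PointNN's normalized score; any callable mapping a point cloud to a scalar fits the interface, and the toy score used in the example is purely illustrative.

```python
import numpy as np

def select_output(P_completion, P_refined, consistency_score):
    """Keep the refinement only if its consistency score does not drop.

    `consistency_score` stands in for PointNN's normalized score.
    """
    if consistency_score(P_refined) >= consistency_score(P_completion):
        return P_refined
    return P_completion

# Toy stand-in score: prefers clouds with points nearer the origin.
score = lambda P: -np.abs(P).mean()
```

Because the original completion is returned whenever the comparison fails, the selector acts as a safety net: in the worst case the pipeline degrades gracefully to the base network's output.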
4. Training Protocols and Category-Specificity
Training a single autoencoder and RL agent across all ShapeNet classes was empirically observed to collapse due to divergent category-specific priors: the RL agent failed to find effective local refinements. RL-AD-Net therefore employs separate models for each object category. Each category-specific agent trains in fewer than 100,000 iterations, owing to the low dimensionality (128D) of the action space and the strong CD-based reward signal. Training combines the Chamfer Distance reward with a consistency penalty for robust geometric learning.
TD3 is preferred over DDPG and PPO: ablation studies indicate that PPO fails to improve Chamfer metrics and DDPG shows only marginal, variable gains. Training uses the standard TD3 machinery: a discounted return, an experience replay buffer, target-policy smoothing noise with action clipping, delayed policy updates, and soft target-network updates, with 64-sample batches per update.
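The core TD3 update, the clipped double-Q target with target-policy smoothing, can be sketched as follows. The numeric values of gamma, sigma, and the noise clip below are standard TD3 defaults, assumed here because the paper's exact settings are not reproduced in this summary.

```python
import numpy as np

def td3_target(r, z_next, actor_tgt, q1_tgt, q2_tgt,
               gamma=0.99, sigma=0.2, noise_clip=0.5, rng=None):
    """Clipped double-Q TD3 target with target-policy smoothing.

    Hyperparameter values are common TD3 defaults (assumed, not from the paper).
    """
    rng = rng or np.random.default_rng()
    # Smooth the target action with clipped Gaussian noise.
    noise = np.clip(rng.normal(0.0, sigma, size=z_next.shape), -noise_clip, noise_clip)
    a_next = np.clip(actor_tgt(z_next) + noise, -1.0, 1.0)
    # Take the minimum of the two target critics to curb overestimation.
    q = min(q1_tgt(z_next, a_next), q2_tgt(z_next, a_next))
    return r + gamma * q
```

Taking the minimum over two critics is what distinguishes TD3 from DDPG and is the usual explanation for its greater stability in continuous-control settings like this one.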
5. Experimental Evaluation and Ablations
Evaluation is conducted on five categories (Airplane, Chair, Lamp, Table, Car) from ShapeNetCore-2048. Two occlusion protocols stress-test the completion pipeline:
- Spherical cropping removes the points closest to a randomly chosen direction vector, simulating view-dependent occlusion (tested at 25% and 50% removal).
- Seed-point proximity cropping removes the 40% of points nearest a random seed point, creating localized holes.
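Both occlusion protocols can be sketched directly. The spherical-cropping reading below, dropping the points that project furthest along a random view direction, is one plausible interpretation of the description above; the function names are assumptions.

```python
import numpy as np

def spherical_crop(points, frac, rng):
    """Drop the `frac` of points most exposed along a random view direction."""
    d = rng.standard_normal(3)
    d /= np.linalg.norm(d)                 # random unit direction
    k = int(len(points) * frac)
    order = np.argsort(points @ d)         # ascending projection onto d
    return points[order[:-k]]              # remove the k most-exposed points

def seed_crop(points, frac, rng):
    """Drop the `frac` of points nearest a random seed point (localized hole)."""
    seed = points[rng.integers(len(points))]
    dist = np.linalg.norm(points - seed, axis=1)
    k = int(len(points) * frac)
    order = np.argsort(dist)
    return points[order[k:]]               # remove the k nearest points

rng = np.random.default_rng(3)
pc = rng.standard_normal((2048, 3))
```

The two protocols stress different failure modes: directional cropping removes a contiguous half-space of surface, while seed cropping punches a compact hole that tests local inpainting.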
RL-AD-Net is applied as a refinement stage over transformer-based AdaPoinTr completions trained for 300 epochs on Chamfer Distance. The results show that RL-AD-Net reduces CD-L2 by 0.02–0.08 under spherical cropping and by nearly 1.0 under random cropping. Concrete examples include:
| Category | Protocol | Baseline CD | RL-AD-Net CD | Baseline F-score@1% | RL-AD-Net F-score@1% |
|---|---|---|---|---|---|
| Airplane | 50% spherical | 1.321 | 1.235 | 0.236 | 0.348 |
| Chair | 40% random | 5.574 | 4.658 | 0.221 | 0.279 |
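For completeness, the F-score@1% reported in the table can be sketched using the definition common in the completion literature (precision and recall at a distance threshold of 1% of the shape's scale); this is an assumed convention, not confirmed by the summary itself.

```python
import numpy as np

def f_score(P, G, tau=0.01):
    """F-score at distance threshold tau (F-score@1% for unit-normalized shapes)."""
    d2 = ((P[:, None, :] - G[None, :, :]) ** 2).sum(axis=-1)
    precision = (np.sqrt(d2.min(axis=1)) < tau).mean()  # predicted points near GT
    recall = (np.sqrt(d2.min(axis=0)) < tau).mean()     # GT points covered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Unlike Chamfer Distance, which averages over all nearest-neighbor distances, the F-score counts only points within the threshold, so it is more sensitive to exactly the local geometric errors RL-AD-Net targets.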
Across all settings, the PointNN selector ensures that no detrimental refinements are chosen. The compact RL actor (0.21M parameters, 0.1 ms forward-pass latency) suffices for learning localized geometric corrections in the latent space, without retraining or modifying the base completion network.
6. Modularity, Generalization, and Limitations
RL-AD-Net is model-agnostic and applicable to broad classes of base completion networks due to its decoupled, modular design; adaptation to a new backend requires only training of the category-specific autoencoder and RL agent, not alterations to the base model. The method does not require retraining the base completion architecture, making it lightweight and practical for diverse deployment contexts.
A limitation, as established by ablation results, is that pooling all object categories within a single AE+RL agent framework leads to mode collapse and under-exploitation of category-specific geometry. Extension to multi-category refinement remains an area for future investigation.
7. Context and Significance
RL-AD-Net demonstrates that RL-driven latent space refinement can achieve significant improvements in local geometric fidelity over transformer-based and denoising models in both typical and atypical occlusion scenarios. The approach confirms the effectiveness of combining compact latent representations with continuous-control RL and non-parametric geometric selectors in 3D point cloud processing. Experimental results validate its capacity to improve both Chamfer Distance and F-score metrics, suggesting a promising avenue for post-hoc refinement in structured 3D reconstruction pipelines (Paregi et al., 21 Nov 2025).