
RL-AD-Net: RL-Driven 3D Point Cloud Refinement

Updated 26 November 2025
  • The paper proposes an RL-based refinement framework that adaptively adjusts latent vectors to reduce Chamfer Distance and improve local geometric fidelity.
  • It integrates a category-specific point autoencoder with a deterministic TD3 agent to modify incomplete 3D shapes without re-training the base completion network.
  • A non-parametric selector using PointNN ensures that only geometrically consistent refinements are chosen, as validated by improved metrics across ShapeNet categories.

RL-AD-Net is a reinforcement learning-based refinement framework for point cloud completion that operates in the latent space of a pretrained point autoencoder. Recent completion models, including transformer- and denoising-based approaches, typically reconstruct globally plausible 3D shapes from partial inputs but often introduce local geometric inconsistencies. RL-AD-Net addresses these inconsistencies by leveraging a reinforcement learning (RL) agent to perform continuous adaptive displacements within a compact latent representation, producing refined completions with improved local geometric fidelity. The method integrates a geometric consistency selector to ensure the output preserves or enhances the plausibility of the original completion, and is designed to be lightweight, modular, and agnostic to the underlying completion network (Paregi et al., 21 Nov 2025).

1. Underlying Architecture and Latent Space

At the core of RL-AD-Net is a category-specific point autoencoder (AE) inspired by PointNet. The encoder $E_\theta$ is applied independently to each point in a 3D shape $P \in \mathbb{R}^{2048\times 3}$ using a shared MLP, followed by a max-pooling operation that aggregates per-point features into a 128-dimensional global feature vector (GFV):

$z = E_\theta(P) \in \mathbb{R}^{128}$

This GFV serves as a compact and semantically meaningful representation of the input’s global geometry. The decoder $D_\phi : \mathbb{R}^{128} \to \mathbb{R}^{2048\times 3}$ is a simple multi-layer perceptron that upsamples the GFV back to a dense 3D point cloud. The autoencoder is trained per category on complete shapes for 400 epochs using the Adam optimizer (learning rate $1\times 10^{-4}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, batch size 32). The objective is to minimize the bidirectional Chamfer Distance (CD), which enforces fidelity between the reconstructed and ground-truth point clouds and encourages the latent manifold to capture category-specific geometric priors.
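The bidirectional Chamfer Distance objective can be sketched as follows (a minimal NumPy implementation; the paper's exact variant and normalization, e.g. CD-L2 scaling, may differ):

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Bidirectional Chamfer Distance between point clouds p (N, 3) and
    q (M, 3), using mean squared nearest-neighbour distances."""
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # Sum of the average nearest-neighbour distance in both directions.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

The two directional terms make the loss symmetric: the reconstruction must both cover the ground truth and avoid stray points.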

2. RL-Based Adaptive Displacement in Latent Code

Given a pretrained completion network (e.g., AdaPoinTr) that generates a dense but possibly imperfect output $P_{\mathrm{base}}$, RL-AD-Net encodes this output with the autoencoder to obtain $z = E_\theta(P_{\mathrm{base}})$. Refinement is posed as a Markov Decision Process (MDP), where:

  • The RL agent’s state is the current GFV $z$.
  • The action $a_t = \Delta z$ is a continuous displacement vector in $\mathbb{R}^{128}$.
  • The agent proposes a refined latent code $z' = z + \alpha \Delta z$, where $\alpha$ is typically 0.1, and the corresponding refined completion is $P_{\mathrm{ref}} = D_\phi(z')$.
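The latent displacement step above can be sketched as follows (the function name `refine_latent` is illustrative; the encoder and decoder are omitted):

```python
import numpy as np

ALPHA = 0.1  # latent step size, as reported in the paper

def refine_latent(z: np.ndarray, delta_z: np.ndarray, alpha: float = ALPHA) -> np.ndarray:
    """One RL refinement action: displace the 128-D GFV by a scaled step,
    z' = z + alpha * delta_z. Decoding z' back to points is done by D_phi."""
    assert z.shape == delta_z.shape == (128,)
    return z + alpha * delta_z
```

Keeping the action in the compact 128-D latent space is what makes the control problem tractable compared with moving 2048 points directly.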

The RL policy is deterministic and implemented using Twin Delayed Deep Deterministic Policy Gradient (TD3), which is well-suited to high-dimensional, continuous action spaces. TD3 employs two critic networks $Q_{\theta_1}, Q_{\theta_2}$ and an actor network $\mu_\psi$, along with slow-moving target networks. The reward function, available only during training with ground truth $P_{\mathrm{gt}}$, is given by the improvement in Chamfer Distance:

$r_t = \mathrm{CD}(P_{\mathrm{base}}, P_{\mathrm{gt}}) - \mathrm{CD}(P_{\mathrm{ref}}, P_{\mathrm{gt}})$

This setup encourages the agent to make modifications that locally reduce geometric error with respect to the ground-truth shape.
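A minimal sketch of this reward, assuming the same bidirectional Chamfer Distance as the training objective (the paper's exact normalization may differ):

```python
import numpy as np

def chamfer(p: np.ndarray, q: np.ndarray) -> float:
    """Bidirectional Chamfer Distance (squared, mean-reduced)."""
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def reward(p_base: np.ndarray, p_ref: np.ndarray, p_gt: np.ndarray) -> float:
    """r_t = CD(P_base, P_gt) - CD(P_ref, P_gt); positive when the refined
    cloud is closer to the ground truth than the base completion."""
    return chamfer(p_base, p_gt) - chamfer(p_ref, p_gt)
```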

3. Geometric Consistency and Inference-Time Selection

At inference time, ground truth is unavailable. RL-AD-Net integrates a non-parametric consistency module, PointNN, to decide whether the RL-refined or the original completion is geometrically superior. PointNN computes a normalized consistency score $q \in [0,1]$ for each candidate point cloud, using hierarchical farthest-point sampling, local $k$-NN grouping, positional encodings, and global pooling. The selection mechanism is as follows:

  • If $q_{\mathrm{ref}} > q_{\mathrm{base}}$, output $P_{\mathrm{ref}}$; otherwise, fall back to $P_{\mathrm{base}}$.
  • At evaluation, selection is further restricted to completions that do not increase the Chamfer Distance.

This rule ensures that refinement cannot degrade geometric plausibility, and strictly improves or maintains the quality of the result.
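A minimal sketch of this selection rule, including the evaluation-time Chamfer restriction (`select_output` is an illustrative name; the PointNN scoring itself is omitted):

```python
def select_output(p_base, p_ref, q_base, q_ref, cd_base=None, cd_ref=None):
    """Return the RL-refined cloud only when its consistency score improves;
    at evaluation time, additionally require that the Chamfer Distance to
    ground truth does not increase. Otherwise fall back to the base output."""
    cd_ok = cd_base is None or cd_ref is None or cd_ref <= cd_base
    return p_ref if (q_ref > q_base and cd_ok) else p_base
```

Because the fallback is always available, the selector makes refinement a no-regret step: the worst case is returning the base completion unchanged.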

4. Training Protocols and Category-Specificity

Training a single autoencoder and RL agent across all ShapeNet classes was empirically observed to collapse due to divergent category-specific priors: the RL agent failed to find effective local refinements. RL-AD-Net therefore employs separate models for each object category. Each category-specific agent is trained for fewer than 100,000 iterations owing to the low dimensionality (128-D) of the action space and the strong CD-based reward signal. Training combines Chamfer Distance and a consistency penalty for robust geometric learning.

TD3 is preferred over DDPG and PPO: ablation studies indicate that PPO fails to improve Chamfer metrics, while DDPG shows only marginal, variable gains. Hyperparameters include $\gamma = 0.99$, replay buffer size $10^5$, smoothing noise $\sigma = 0.1$, action clipping $c = 0.5$, policy delay $d = 2$, and target soft-update rate $\tau = 0.005$, with 64-sample batches per update.
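The target soft update with $\tau = 0.005$ can be illustrated as follows (parameters are shown as flat lists of floats rather than network weights):

```python
def soft_update(target, online, tau=0.005):
    """TD3 soft target update, applied elementwise:
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    for i, (t, o) in enumerate(zip(target, online)):
        target[i] = tau * o + (1.0 - tau) * t
```

The small `tau` keeps the target networks slow-moving, which stabilizes the critic's bootstrapped value estimates.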

5. Experimental Evaluation and Ablations

Evaluation is conducted on five categories (Airplane, Chair, Lamp, Table, Car) from ShapeNetCore-2048. Two occlusion protocols stress-test the completion pipeline:

  • Spherical cropping removes the $k$ points closest to a randomly chosen direction vector, simulating view-dependent occlusion (tested at 25% and 50% removal).
  • Seed-point proximity cropping removes 40% of points within the minimal radius of a random seed, creating localized holes.
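A rough sketch of the spherical-cropping protocol, assuming "closest to a direction" means largest projection onto a random unit vector (the paper's exact distance definition may differ):

```python
import numpy as np

def spherical_crop(points: np.ndarray, frac: float, rng=None) -> np.ndarray:
    """Remove the fraction `frac` of points nearest to a random viewing
    direction, simulating view-dependent occlusion."""
    if rng is None:
        rng = np.random.default_rng()
    d = rng.normal(size=3)
    d /= np.linalg.norm(d)
    # Treat the points with the largest projection onto d as the ones
    # "closest" to the viewing direction, and drop the top `frac` of them.
    proj = points @ d
    k = int(frac * len(points))
    keep = np.argsort(proj)[: len(points) - k]
    return points[keep]
```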

RL-AD-Net is applied as a refinement stage over transformer-based AdaPoinTr completions trained for 300 epochs with a Chamfer Distance objective. The results show that RL-AD-Net reduces CD-L2 by 0.02–0.08 under spherical cropping and by nearly 1.0 under random cropping. Concrete examples include:

| Category | Protocol | Baseline CD | RL-AD-Net CD | Baseline F-score@1% | RL-AD-Net F-score@1% |
|----------|----------|-------------|--------------|---------------------|----------------------|
| Airplane | 50% spherical | 1.321 | 1.235 | 0.236 | 0.348 |
| Chair | 40% random | 5.574 | 4.658 | 0.221 | 0.279 |

Across all settings, the PointNN selector ensures no detrimental refinements are chosen, and the compact RL actor (0.21M parameters, 0.1 ms forward-pass latency) suffices for learning localized geometric corrections in the latent space without requiring retraining or modification of the base completion network.

6. Modularity, Generalization, and Limitations

RL-AD-Net is model-agnostic and applicable to broad classes of base completion networks due to its decoupled, modular design; adaptation to a new backend requires only training of the category-specific autoencoder and RL agent, not alterations to the base model. The method does not require retraining the base completion architecture, making it lightweight and practical for diverse deployment contexts.

A limitation, as established by ablation results, is that pooling all object categories within a single AE+RL agent framework leads to mode collapse and under-exploitation of category-specific geometry. Extension to multi-category refinement remains an area for future investigation.

7. Context and Significance

RL-AD-Net demonstrates that RL-driven latent space refinement can achieve significant improvements in local geometric fidelity over transformer-based and denoising models in both typical and atypical occlusion scenarios. The approach confirms the effectiveness of combining compact latent representations with continuous-control RL and non-parametric geometric selectors in 3D point cloud processing. Experimental results validate its capacity to reduce Chamfer Distance and increase F-score, suggesting a promising avenue for post-hoc refinement in structured 3D reconstruction pipelines (Paregi et al., 21 Nov 2025).
