
FCBV-Net: Category-Level Robotic Garment Smoothing via Feature-Conditioned Bimanual Value Prediction

Published 7 Aug 2025 in cs.RO and cs.AI | arXiv:2508.05153v1

Abstract: Category-level generalization for robotic garment manipulation, such as bimanual smoothing, remains a significant hurdle due to high dimensionality, complex dynamics, and intra-category variations. Current approaches often struggle, either overfitting with concurrently learned visual features for a specific instance or, despite category-level perceptual generalization, failing to predict the value of synergistic bimanual actions. We propose the Feature-Conditioned Bimanual Value Network (FCBV-Net), operating on 3D point clouds to specifically enhance category-level policy generalization for garment smoothing. FCBV-Net conditions bimanual action value prediction on pre-trained, frozen dense geometric features, ensuring robustness to intra-category garment variations. Trainable downstream components then learn a task-specific policy using these static features. In simulated GarmentLab experiments with the CLOTH3D dataset, FCBV-Net demonstrated superior category-level generalization. It exhibited only an 11.5% efficiency drop (Steps80) on unseen garments compared to 96.2% for a 2D image-based baseline, and achieved 89% final coverage, outperforming an 83% coverage from a 3D correspondence-based baseline that uses identical per-point geometric features but a fixed primitive. These results highlight that the decoupling of geometric understanding from bimanual action value learning enables better category-level generalization.

Summary

  • The paper introduces FCBV-Net, a novel architecture that decouples dense geometric feature extraction from action-value learning for improved garment smoothing.
  • It leverages pre-trained PointNet++ features to guide bimanual actions, achieving 89% garment coverage and reducing required steps on unseen garments.
  • Simulation results highlight significant improvements over 2D image-based methods, demonstrating superior category-level generalization and efficiency.

Introduction

The manipulation of deformable objects, particularly garments, remains a significant challenge in robotics due to complex dynamics and high intra-category variability. Existing techniques tend either to overfit to specific garment instances or to fail at predicting the value of the cooperative bimanual actions that enable effective smoothing. The paper "FCBV-Net: Category-Level Robotic Garment Smoothing via Feature-Conditioned Bimanual Value Prediction" proposes the Feature-Conditioned Bimanual Value Network (FCBV-Net), a 3D point cloud-based approach that aims to generalize a robot's smoothing strategy to unseen instances within the same garment category.

Problem Statement

Garment smoothing is a critical preprocessing step in various applications such as automated dressing systems and tasks involving cloth manipulation. The inherent deformability of garments, along with their complex dynamic properties and considerable intra-category variations, results in significant generalization challenges for robotic manipulators. The primary challenge tackled in this work is crafting a policy to enable efficient bimanual smoothing across a category of garments, promoting robust generalization beyond specific instances encountered during training. This is operationalized by transforming arbitrarily crumpled garments into a flattened state while maintaining robustness to variations in shape, size, and material within the same garment category.

FCBV-Net leverages 3D point clouds to achieve this generalization. A central aspect of this approach is the decoupling of fundamental geometric understanding from task-specific interaction value learning by conditioning the latter on robust pre-trained geometric feature representations. Decoupling these components permits the robotic system to better anticipate and respond to the synergistic effects of bimanual actions across novel instances.

Method

The proposed FCBV-Net operates on the principle of utilizing pre-trained and frozen dense geometric features to condition action-value functions, targeting improved category-level generalization for garment smoothing.

Dense Geometric Feature Extraction

A pre-trained dense geometric feature extractor based on PointNet++ is employed, which processes the 3D point cloud to assign a robust, deformation-invariant feature vector to each point. These extracted features enhance the network's comprehension of the garment's geometry.
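The frozen-feature pattern can be sketched as follows. This is a minimal illustration only: `extract_dense_features` is a hypothetical stand-in for the pre-trained PointNet++ backbone, and the feature dimension (32) is arbitrary, not the paper's.

```python
# Sketch of frozen per-point feature extraction (hypothetical stand-in for
# the pre-trained PointNet++ backbone; FEATURE_DIM is arbitrary).

FEATURE_DIM = 32

def extract_dense_features(points):
    """Assign a deterministic feature vector to each 3D point.

    Frozen: this function has no trainable state, so the same garment
    geometry always yields the same per-point descriptors, which is what
    makes the downstream policy robust to intra-category variation.
    """
    features = []
    for x, y, z in points:
        # A fixed (untrained) projection of the coordinates into FEATURE_DIM.
        features.append([((x + 2 * y + 3 * z) * (k + 1)) % 1.0
                         for k in range(FEATURE_DIM)])
    return features

point_cloud = [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)]
dense_feats = extract_dense_features(point_cloud)  # one vector per point
```

Because the extractor is frozen, only the downstream value components are updated during training; the geometric representation itself never shifts.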

Action Proposal and Value Prediction

The network's architecture comprises several key components:

  • ValueDecoderPN++ Network: This component processes the input point cloud through a shared encoder-decoder architecture to produce initial grasp-quality estimates and per-point embeddings.
  • Primitive Selection Head: This module aggregates global features via set abstraction and predicts the most promising manipulation primitive.
  • Candidate Action Sampling and Descriptor Construction: This step involves sampling candidate grasp points and their orientations. It constructs descriptors that encode unconditioned grasp quality and the selected manipulation primitive.
  • Final Conditioned Bimanual Value Head: An MLP-based component computes the conditioned action value, representing the expected reward for executing a specific bimanual action, allowing a deterministic policy to select the optimal action during execution.
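Taken together, the components above amount to a score-and-argmax selection pipeline. The following is a schematic sketch under assumed interfaces: `grasp_quality` and `conditioned_value` are placeholders standing in for the network heads, and the primitive names are illustrative.

```python
import itertools

def grasp_quality(feat):
    # Placeholder for the per-point unconditioned grasp-quality estimate.
    return sum(feat) / len(feat)

def conditioned_value(desc_a, desc_b, primitive):
    # Placeholder for the MLP value head: expected reward of executing
    # a specific bimanual action under the chosen primitive.
    bonus = 0.1 if primitive == "fling" else 0.0
    return grasp_quality(desc_a) + grasp_quality(desc_b) + bonus

def select_bimanual_action(per_point_feats, primitives=("fling", "drag")):
    """Deterministic policy: score every candidate grasp pair under every
    primitive and return the argmax (value, point_i, point_j, primitive)."""
    best = None
    for primitive in primitives:
        for i, j in itertools.combinations(range(len(per_point_feats)), 2):
            value = conditioned_value(per_point_feats[i],
                                      per_point_feats[j], primitive)
            if best is None or value > best[0]:
                best = (value, i, j, primitive)
    return best

feats = [[0.2, 0.4], [0.9, 0.7], [0.1, 0.3]]
action = select_bimanual_action(feats)  # picks points 0 and 1 with "fling"
```

The real network scores a sampled subset of candidates rather than all pairs, but the deterministic argmax over conditioned values is the same idea.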

Training Procedure

The FCBV-Net architecture is trained by combining an initial human-annotated action dataset with extensive self-supervised experience collected in simulation. Training uses a reward-based learning signal focused on maximizing garment coverage and the probability of successful smoothing. Learning proceeds in phases, iteratively training different network components to ensure robust category-level generalization.
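A common reward shaping for smoothing, consistent with the stated focus on coverage improvement, is the change in garment coverage per action. The exact reward function and coverage metric are not reproduced here; the grid-based coverage proxy below is an assumption for illustration, not GarmentLab's measure.

```python
def coverage(points_2d, cell=0.05):
    """Approximate covered area by counting occupied grid cells in the
    garment's top-down projection (a simple proxy metric)."""
    cells = {(int(x // cell), int(y // cell)) for x, y in points_2d}
    return len(cells) * cell * cell

def smoothing_reward(points_before, points_after):
    # Reward = coverage gained by the executed bimanual action.
    return coverage(points_after) - coverage(points_before)

before = [(0.0, 0.0), (0.01, 0.01)]   # crumpled: both points in one cell
after = [(0.0, 0.0), (0.06, 0.0)]     # spread: two cells occupied
reward = smoothing_reward(before, after)  # positive: coverage increased
```

A positive reward for every action that spreads the garment gives the value heads a dense, self-supervised training signal in simulation.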

Results and Analysis

In simulated experiments conducted within the GarmentLab environment using the CLOTH3D dataset, FCBV-Net demonstrated strong category-level generalization, particularly on unseen garments. As shown in the paper's quantitative results table, FCBV-Net incurred an efficiency drop of just 11.5% on unseen instances, compared to a 96.2% decline for a conventional 2D image-based baseline.

FCBV-Net achieved a final garment coverage of 89% and required 2.9 steps on average to reach 80% coverage on unseen garments, significantly outperforming the 2D image-based Sim-SF baseline (79% coverage, 5.1 steps) and exceeding the final coverage of the correspondence-based policy-transfer method UGM-PolicyTransfer, which reached just 83% despite using identical per-point geometric features. This underscores the efficacy of FCBV-Net's decoupling strategy for generalizing smoothing performance to unseen garments.
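The efficiency-drop figure is a relative increase in Steps80 from seen to unseen garments. The seen-garment value of 2.6 steps below is inferred from the reported 2.9 unseen steps and 11.5% drop, not quoted directly from the paper:

```python
def efficiency_drop(steps80_seen, steps80_unseen):
    """Relative increase in average steps needed to reach 80% coverage
    when moving from seen to unseen garments."""
    return (steps80_unseen - steps80_seen) / steps80_seen

# 2.6 is an inferred seen-garment Steps80, chosen to be consistent with
# the reported 2.9 unseen steps and 11.5% drop.
drop = efficiency_drop(2.6, 2.9)  # ≈ 0.115, i.e. the reported 11.5%
```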

Conclusion

The study presented FCBV-Net, a novel architecture for advancing category-level generalization in bimanual robotic garment smoothing via feature-conditioned action-value prediction on 3D point clouds. FCBV-Net improves generalization by conditioning its value network on pre-trained dense geometric features, decoupling geometric feature learning from action-value assessment. The simulation results substantiate the method's advantages over established baselines, showcasing consistent efficiency and superior performance on unseen garment instances. The contribution lies not only in the immediate performance gains but in pointing toward separated feature-conditioning strategies as a direction for category-level generalization in robotic garment manipulation. Future work may explore real-world deployment and the associated sim-to-real transfer challenges, richer semantic feature representations, and extension to a broader range of garment types and manipulation scenarios.
