Perceiving and manipulating objects in a generalizable way has been actively studied by the computer vision and robotics communities, yet cross-category generalizable manipulation skills, although highly desired, remain underexplored. In this work, we propose to learn such generalizable perception and manipulation via Generalizable and Actionable Parts (GAParts). We identify and define 9 GAPart classes (e.g., buttons and handles) and show that this part-centric approach makes it possible to learn object perception and manipulation skills from seen object categories and generalize directly to unseen categories. Following the GAPart definition, we construct a large-scale, part-centric interactive dataset, GAPartNet, which provides rich part-level annotations (semantics and poses) for 1,166 objects and 8,489 part instances. Based on GAPartNet, we investigate three cross-category tasks: part segmentation, part pose estimation, and part-based object manipulation. Given the large domain gaps between seen and unseen object categories, we approach 3D segmentation from the perspective of domain generalization and propose a strong method that integrates adversarial learning techniques. Our method outperforms all existing methods by a large margin on both seen and unseen categories. Furthermore, using the part segmentation and pose estimation results, we leverage the GAPart pose definition to design part-based manipulation heuristics that generalize well to unseen object categories in both simulation and the real world. The dataset and code will be released.