Real-world robotic grasping can be performed robustly if a complete 3D point cloud (PCD) of the object is available. In practice, however, PCDs are often incomplete because objects are observed from only a few sparse viewpoints before the grasping action, which leads to wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. The network is inherently invariant to the object pose and to point permutation, and it generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and substantially improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
translated by Google Translate
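Many robotic tasks involving some form of 3D visual perception benefit greatly from complete knowledge of the working environment. However, robots often have to cope with unstructured environments, and their onboard visual sensors can only provide incomplete information because of limited workspaces, clutter, or object self-occlusion. In recent years, deep learning architectures for shape completion have begun to gain traction as an effective means of inferring a complete 3D object representation from partial visual data. Nevertheless, most existing state-of-the-art approaches provide a fixed output resolution in the form of voxel grids, which is strictly tied to the size of the neural network output stage. While this is sufficient for some tasks, such as obstacle avoidance in navigation, grasping and manipulation require finer resolutions, and simply scaling up the neural network output is computationally expensive. In this paper, we address this limitation with an object shape completion method based on an implicit 3D representation that provides a confidence value for each reconstructed point. As a second contribution, we propose a gradient-based method for efficiently sampling such an implicit function at an arbitrary resolution chosen at inference time. We experimentally validate our approach by comparing reconstructed shapes with ground truth and by deploying our shape completion algorithm in a robotic grasping pipeline. In both cases, we compare the results with a state-of-the-art shape completion approach.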
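As a rough illustration of sampling an implicit shape at a resolution chosen at inference time, the sketch below projects random seed points onto the zero level set of an implicit function by stepping along its gradient. The analytic sphere `implicit_fn`, the finite-difference gradient, and the Newton-style update are all assumptions made for this example; they stand in for a learned network and are not the sampling procedure of the paper.

```python
import math
import random


def implicit_fn(p, radius=0.5):
    """Toy implicit shape: signed distance to a sphere (stand-in for a learned network)."""
    return math.sqrt(sum(c * c for c in p)) - radius


def numeric_grad(f, p, eps=1e-5):
    """Central finite-difference gradient of f at point p."""
    grad = []
    for i in range(len(p)):
        hi = list(p)
        lo = list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def project_to_surface(f, p, steps=20, tol=1e-6):
    """Move p toward the zero level set of f with Newton-like steps along the gradient."""
    p = list(p)
    for _ in range(steps):
        value = f(p)
        if abs(value) < tol:
            break
        g = numeric_grad(f, p)
        norm_sq = sum(c * c for c in g)
        if norm_sq == 0.0:
            break
        p = [c - value * gc / norm_sq for c, gc in zip(p, g)]
    return p


def sample_surface(f, n_points, seed=0):
    """Draw n_points seeds in [-1, 1]^3 and project each one onto the surface."""
    rng = random.Random(seed)
    seeds = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n_points)]
    return [project_to_surface(f, s) for s in seeds]


# The number of output points is a free parameter, independent of any voxel grid size.
surface_points = sample_surface(implicit_fn, n_points=200)
```

Because the point count is just an argument, the same implicit function can be sampled coarsely for collision checks or densely for grasp planning without retraining or resizing anything.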
Being able to grasp objects is a fundamental component of most robotic manipulation systems. In this paper, we present a new approach to simultaneously reconstruct a mesh and a dense grasp quality map of an object from a depth image. At the core of our approach is a novel camera-centric object representation called the "object shell" which is composed of an observed "entry image" and a predicted "exit image". We present an image-to-image residual ConvNet architecture in which the object shell and a grasp-quality map are predicted as separate output channels. The main advantage of the shell representation and the corresponding neural network architecture, ShellGrasp-Net, is that the input-output pixel correspondences in the shell representation are explicitly represented in the architecture. We show that this coupling yields superior generalization capabilities for object reconstruction and accurate grasp quality estimation implicitly considering the object geometry. Our approach yields an efficient dense grasp quality map and an object geometry estimate in a single forward pass. Both of these outputs can be used in a wide range of robotic manipulation applications. With rigorous experimental validation, both in simulation and on a real setup, we show that our shell-based method can be used to generate precise grasps and the associated grasp quality with over 90% accuracy. Diverse grasps computed on shell reconstructions allow the robot to select and execute grasps in cluttered scenes with more than 93% success rate.
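To make the shell representation more concrete, the following sketch back-projects an observed entry depth image and a predicted exit depth image into one 3D point set. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the convention that a depth of 0 marks background, and the tiny 3x3 images are assumptions for illustration; the abstract does not specify how ShellGrasp-Net turns the shell into a mesh.

```python
def backproject(depth, fx, fy, cx, cy):
    """Turn a depth image (rows of metric depths, 0 = no object) into camera-frame 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def shell_to_points(entry_depth, exit_depth, fx, fy, cx, cy):
    """Combine the visible (entry) and hidden (exit) surfaces into one point set."""
    front = backproject(entry_depth, fx, fy, cx, cy)
    back = backproject(exit_depth, fx, fy, cx, cy)
    return front + back


# Hypothetical 3x3 shell: the exit surface lies behind the entry surface at every object pixel.
entry = [[0.0, 0.50, 0.0],
         [0.50, 0.48, 0.50],
         [0.0, 0.50, 0.0]]
exit_ = [[0.0, 0.56, 0.0],
         [0.56, 0.60, 0.56],
         [0.0, 0.56, 0.0]]
shell_points = shell_to_points(entry, exit_, fx=500.0, fy=500.0, cx=1.0, cy=1.0)
```

Since both images share the same pixel grid, every entry point has an exit point on the same camera ray, which is the pixel correspondence the abstract refers to.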
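The ability to successfully grasp objects is crucial in robotics, as it enables several interactive downstream applications. To this end, most approaches either compute the full 6D pose of the object of interest or learn to predict a set of grasping points. While the former approaches do not scale well to multiple object instances or classes, the latter require large annotated datasets and are hampered by poor generalization to new geometries. To overcome these shortcomings, we propose to teach a robot how to grasp an object with a simple and short human demonstration. Hence, our approach neither requires many annotated images nor is it restricted to a specific geometry. We first present a short sequence of RGB-D images showing a human-object interaction. This sequence is then used to build the associated hand and object meshes that represent the depicted interaction. Subsequently, we complete the missing parts of the reconstructed object shape and estimate the relative transformation between the reconstruction and the visible object in the scene. Finally, we transfer the a priori knowledge of the relative pose between object and human hand, together with the estimate of the current object pose in the scene, into the necessary grasping instructions for the robot. Exhaustive evaluations with Toyota's Human Support Robot (HSR) in real and synthetic environments demonstrate the applicability of our proposed methodology and its advantages over previous approaches.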
Generating grasp poses is a crucial component for any robot object manipulation task. In this work, we formulate the problem of grasp generation as sampling a set of grasps using a variational autoencoder and assess and refine the sampled grasps using a grasp evaluator model. Both Grasp Sampler and Grasp Refinement networks take 3D point clouds observed by a depth camera as input. We evaluate our approach in simulation and real-world robot experiments. Our approach achieves 88% success rate on various commonly used objects with diverse appearances, scales, and weights. Our model is trained purely in simulation and works in the real world without any extra steps. The video of our experiments can be found here.
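The sample, evaluate, and refine loop described above can be sketched in a few lines. Everything concrete here is a stand-in: `decode` replaces the trained VAE decoder, `evaluate` replaces the learned grasp evaluator, `TARGET` is a made-up ideal grasp position, and refinement uses finite-difference gradient ascent on the evaluator score. The sketch also ignores the point cloud conditioning and the orientation part of a grasp.

```python
import math
import random

TARGET = (0.3, -0.2, 0.1)  # hypothetical "ideal" grasp position used by the toy evaluator


def decode(z):
    """Toy decoder: maps a 2D latent vector to a grasp position (stand-in for the VAE decoder)."""
    return [0.5 * z[0], 0.5 * z[1], 0.25 * (z[0] + z[1])]


def evaluate(grasp):
    """Toy evaluator: score in (0, 1], higher when the grasp is closer to TARGET."""
    return math.exp(-sum((g - t) ** 2 for g, t in zip(grasp, TARGET)))


def refine(grasp, steps=10, step_size=0.1, eps=1e-4):
    """Gradient ascent on the evaluator score using finite-difference gradients."""
    grasp = list(grasp)
    for _ in range(steps):
        grad = []
        for i in range(len(grasp)):
            hi = list(grasp)
            lo = list(grasp)
            hi[i] += eps
            lo[i] -= eps
            grad.append((evaluate(hi) - evaluate(lo)) / (2 * eps))
        grasp = [g + step_size * d for g, d in zip(grasp, grad)]
    return grasp


def generate_grasps(n_samples=16, seed=0):
    """Sample latents from a standard normal prior, decode, refine, and rank by score."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        initial = decode(z)
        refined = refine(initial)
        results.append((evaluate(refined), evaluate(initial), refined))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


ranked = generate_grasps()  # list of (refined score, initial score, refined grasp)
```

Keeping the sampler and the evaluator separate means the sampler only has to propose diverse candidates, while the evaluator decides which ones to keep and in which direction to nudge them.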
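Grasping is the process of picking up an object by applying forces and torques at a set of contacts. Recent advances in deep learning methods have allowed rapid progress in robotic object grasping. We systematically surveyed the publications of the last decade, with a particular interest in grasping an object using all 6 degrees of freedom of the end-effector pose. Our review found four common methodologies for robotic grasping: sampling-based approaches, direct regression, reinforcement learning, and exemplar approaches. Furthermore, we found two "supporting methods" around grasping that use deep learning to support the grasping process: shape approximation and affordances. We have distilled the publications found in this systematic review (85 papers) into ten key takeaways that we consider crucial for future robotic grasping and manipulation research. An online version of the survey is available at https://rhys-newbury.github.io/projects/6dof/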
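6-DoF grasp pose detection for multiple grasps and multiple objects is a challenging task in the field of intelligent robots. To imitate the human reasoning ability for grasping objects, data-driven methods have been widely studied. With the introduction of large-scale datasets, we find that a single physical metric usually yields only a few discrete levels of grasp confidence scores, which cannot finely distinguish millions of grasp poses and leads to inaccurate prediction results. In this paper, we propose a hybrid physical metric to address this evaluation insufficiency. First, we define a novel metric based on the force-closure metric, supplemented by measurements of object flatness, gravity, and collision. Second, we leverage this hybrid physical metric to generate refined confidence scores. Third, to learn the new confidence scores effectively, we design a multi-resolution network called Flatness Gravity Collision GraspNet (FGC-GraspNet). FGC-GraspNet proposes a multi-resolution feature learning architecture for multiple tasks and introduces a new joint loss function that improves the average precision of grasp detection. The network evaluation and adequate real-robot experiments demonstrate the effectiveness of our hybrid physical metric and of FGC-GraspNet. Our method achieves a 90.5% success rate in real-world cluttered scenes. Our code is available at https://github.com/luyh20/fgc-graspnet.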
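For readers unfamiliar with the force-closure ingredient of such a metric, here is a minimal sketch of the standard antipodal test for a two-finger grasp (valid for planar point contacts with friction, or soft-finger contacts in 3D): the line joining the contacts must lie inside both friction cones. The function names and example contacts are hypothetical, and the way the paper combines force closure with flatness, gravity, and collision terms is not stated in the abstract and is not reproduced here.

```python
import math


def _unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def _angle(a, b):
    """Angle in radians between two vectors."""
    ua, ub = _unit(a), _unit(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(ua, ub))))
    return math.acos(dot)


def is_antipodal_force_closure(p1, n1, p2, n2, mu):
    """p1, p2: contact points; n1, n2: inward-pointing surface normals; mu: friction coefficient."""
    half_angle = math.atan(mu)                  # friction cone half-angle
    line_12 = [b - a for a, b in zip(p1, p2)]   # from contact 1 toward contact 2
    line_21 = [-c for c in line_12]
    return _angle(n1, line_12) <= half_angle and _angle(n2, line_21) <= half_angle


def min_friction_for_closure(p1, n1, p2, n2):
    """Smallest friction coefficient for which the antipodal test passes (inf if none)."""
    line_12 = [b - a for a, b in zip(p1, p2)]
    worst = max(_angle(n1, line_12), _angle(n2, [-c for c in line_12]))
    return math.tan(worst) if worst < math.pi / 2 else math.inf


# Hypothetical contacts on opposite faces of a 10 cm wide object.
p1, n1 = (-0.05, 0.0, 0.0), (1.0, 0.0, 0.0)
p2 = (0.05, 0.0, 0.0)
n2_flat = (-1.0, 0.0, 0.0)     # face perpendicular to the grasp axis
n2_tilted = (-1.0, 0.5, 0.0)   # slanted face: needs more friction
```

Testing the grasp only at a handful of fixed friction coefficients gives a handful of score levels, which is the coarseness the abstract criticizes; `min_friction_for_closure` shows that even this single ingredient can be made continuous.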
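We propose the Multiple View Performer (MVP), a new architecture for 3D shape completion from a series of temporally sequential views. MVP accomplishes this task by using linear-attention Transformers called Performers. Our model allows the current observation of the scene to attend to previous observations for more accurate infilling. The history of past observations is compressed via a compact associative memory that approximates modern continuous Hopfield memory but, crucially, has a size independent of the history length. We compare our model with several baselines for shape completion over time, demonstrating the generalization gains that MVP provides. To the best of our knowledge, MVP is the first multiple-view voxel reconstruction method that does not require registration of multiple depth views, and the first causal Transformer-based model for 3D shape completion.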
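The idea of a memory whose size does not grow with the history can be illustrated with generic linear (kernel) attention. In the sketch below, past key/value pairs are folded into two running sums, and reading from them gives the same result as attending over the full stored history. The `elu(x) + 1` feature map is a deliberate simplification swapped in for the random features used by Performers, and the class and function names are hypothetical; this is not the MVP architecture itself.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1 (an assumption; Performers use random features instead)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class LinearAttentionMemory:
    """Running sums S = sum phi(k) v^T and z = sum phi(k); size is independent of history length."""

    def __init__(self, key_dim, value_dim):
        self.S = [[0.0] * value_dim for _ in range(key_dim)]
        self.z = [0.0] * key_dim

    def write(self, key, value):
        """Fold one past observation (key, value) into the memory."""
        for i, fki in enumerate(phi(key)):
            self.z[i] += fki
            for j, vj in enumerate(value):
                self.S[i][j] += fki * vj

    def read(self, query):
        """Attend from the current query to everything written so far."""
        fq = phi(query)
        denom = sum(q * z for q, z in zip(fq, self.z))
        numer = [sum(fq[i] * self.S[i][j] for i in range(len(fq)))
                 for j in range(len(self.S[0]))]
        return [n / denom for n in numer]


def explicit_attention(query, keys, values):
    """Reference: the same kernel attention computed over the full stored history."""
    fq = phi(query)
    weights = [sum(a * b for a, b in zip(fq, phi(k))) for k in keys]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


keys = [[0.1, -0.4], [1.2, 0.3], [-0.7, 0.9]]
values = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [0.5, 0.5, 0.5]]
memory = LinearAttentionMemory(key_dim=2, value_dim=3)
for k, v in zip(keys, values):
    memory.write(k, v)

query = [0.3, -0.2]
from_memory = memory.read(query)
from_history = explicit_attention(query, keys, values)
```

The memory holds `key_dim * value_dim + key_dim` numbers regardless of how many views have been written, whereas `explicit_attention` has to keep every past key and value.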
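Great success has been achieved in 6-DoF grasp learning from point cloud input, yet the computational cost arising from the orderlessness of point sets remains a concern. As an alternative, we explore grasp generation from RGB-D input in this paper. The proposed solution, Keypoint-GraspNet, detects the projections of gripper keypoints in image space and then recovers the SE(3) poses with a PnP algorithm. A synthetic dataset based on primitive shapes and grasp families is constructed to examine our idea. Metric-based evaluation reveals that our method outperforms the baselines in terms of grasp proposal accuracy, diversity, and time cost. Finally, robot experiments show a high success rate, demonstrating the potential of the idea in real-world applications.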
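To clarify what a PnP step has to invert, the sketch below implements only the forward model: gripper keypoints defined in the gripper frame are moved by a pose (R, t) and projected to pixels with a pinhole camera. The keypoint layout, the camera intrinsics, and the helper names are assumptions for illustration. Recovering (R, t) from detected pixels is the PnP problem, for which an off-the-shelf solver such as OpenCV's `cv2.solvePnP` is a common choice; the abstract does not say which solver the authors use.

```python
import math


def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the camera z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform(R, t, p):
    """Apply the rigid transform (R, t) to point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def project(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical gripper keypoints in the gripper frame (metres): two fingertips, two finger bases.
GRIPPER_KEYPOINTS = [(-0.04, 0.0, 0.0), (0.04, 0.0, 0.0),
                     (-0.04, 0.0, -0.06), (0.04, 0.0, -0.06)]


def project_gripper(R, t, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """Pixel locations of all gripper keypoints for the grasp pose (R, t)."""
    return [project(transform(R, t, kp), fx, fy, cx, cy) for kp in GRIPPER_KEYPOINTS]


# A grasp 0.5 m in front of the camera with no rotation.
pixels = project_gripper(rotation_z(0.0), t=[0.0, 0.0, 0.5])
```

With known 3D keypoints and at least four detected 2D projections, the pose is generally determined up to the usual PnP ambiguities, which is why a fixed keypoint layout on the gripper is enough to recover an SE(3) grasp.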
Grasp learning has become an exciting and important topic in robotics. Just a few years ago, the problem of grasping novel objects from unstructured piles of clutter was considered a serious research challenge. Now, it is a capability that is quickly becoming incorporated into industrial supply chain automation. How did that happen? What is the current state of the art in robotic grasp learning, what are the different methodological approaches, and what machine learning models are used? This review attempts to give an overview of the current state of the art of grasp learning research.
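Nowadays, robots play an increasingly important role in our daily lives. In human-centered environments, robots often encounter piles of objects, packed items, or isolated objects. A robot must therefore be able to grasp and manipulate different objects in various situations to help humans with daily tasks. In this paper, we propose a multi-view deep learning approach for robust object grasping in human-centric domains. In particular, our approach takes a point cloud of an arbitrary object as input and generates orthographic views of the given object. The obtained views are then used to estimate a pixel-wise grasp synthesis for each object. We train the model end-to-end using a small object grasp dataset and test it on both simulated and real-world data without any further fine-tuning. To evaluate the performance of the proposed approach, we performed extensive sets of experiments in three scenarios: isolated objects, packed items, and piles of objects. Experimental results show that our approach performs very well in all simulation and real-robot scenarios and achieves reliable closed-loop grasping of novel objects across various scene configurations.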
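Generating orthographic views from a point cloud can be sketched as dropping one coordinate and keeping, per pixel, the surface nearest to a virtual camera. The image resolution, the workspace bounds `lo` and `hi`, and the choice of three axis-aligned views are assumptions for this example; the abstract does not specify how the views are selected or rendered.

```python
import math


def orthographic_depth_view(points, axis, resolution=4, lo=-1.0, hi=1.0):
    """Project points along `axis` (0=x, 1=y, 2=z) into a resolution x resolution image.

    Each pixel keeps the largest coordinate along the viewing axis, i.e. the surface
    nearest to a camera looking back along that axis; empty pixels stay None.
    """
    u_axis, v_axis = [a for a in range(3) if a != axis]
    image = [[None] * resolution for _ in range(resolution)]
    scale = resolution / (hi - lo)
    for p in points:
        col = math.floor((p[u_axis] - lo) * scale)
        row = math.floor((p[v_axis] - lo) * scale)
        if 0 <= row < resolution and 0 <= col < resolution:
            current = image[row][col]
            if current is None or p[axis] > current:
                image[row][col] = p[axis]
    return image


def three_views(points, **kwargs):
    """Views along x, y and z (a hypothetical choice of three axis-aligned views)."""
    return {name: orthographic_depth_view(points, axis, **kwargs)
            for name, axis in (("x", 0), ("y", 1), ("z", 2))}


cloud = [(0.0, 0.0, 0.5), (0.0, 0.0, -0.5), (0.6, -0.6, 0.1)]
views = three_views(cloud)
```

In the z view, the first two points fall into the same pixel and only the higher one survives, which is the self-occlusion a top-down camera would see.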
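Autonomous grasping of novel, previously unseen objects is an ongoing challenge in robotic manipulation. Over the past decades, many approaches have been proposed to address this problem for specific robot hands. The recently introduced UniGrasp framework is able to generalize to different types of robotic grippers. However, this method does not work on grippers with closed-loop constraints and is data-inefficient when applied to robot hands with multigrasp configurations. In this paper, we present EfficientGrasp, a generalized grasp synthesis and gripper control method that is independent of gripper model specifications. EfficientGrasp uses a gripper workspace feature rather than UniGrasp's gripper attribute inputs. This reduces memory use during training by 81.7% and makes it possible to generalize to more types of grippers, such as grippers with closed-loop constraints. The effectiveness of EfficientGrasp is evaluated through object grasping experiments both in simulation and in the real world; the results show that the proposed method also outperforms UniGrasp when only grippers without closed-loop constraints are considered. In these cases, EfficientGrasp achieves 9.85% higher accuracy in generating contact points and a 3.10% higher grasping success rate in simulation. The real-world experiments are conducted with a gripper with closed-loop constraints, which UniGrasp cannot handle, and EfficientGrasp achieves a success rate of 83.3%. The main causes of grasping failures of the proposed method are analyzed, highlighting ways to enhance grasp performance.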
In this paper, we focus on the problem of feature learning in the presence of scale imbalance for 6-DoF grasp detection and propose a novel approach that specifically addresses the difficulty of dealing with small-scale samples. A Multi-scale Cylinder Grouping (MsCG) module is presented to enhance local geometry representation by combining multi-scale cylinder features and global context. Moreover, a Scale Balanced Learning (SBL) loss and an Object Balanced Sampling (OBS) strategy are designed: SBL uses a priori weights to enlarge the gradients of samples whose scales occur with low frequency, while OBS captures more points on small-scale objects with the help of an auxiliary segmentation network. They alleviate the influence of the uneven distribution of grasp scales in training and inference, respectively. In addition, Noisy-clean Mix (NcM) data augmentation is introduced to facilitate training, aiming to bridge the domain gap between synthetic and raw scenes efficiently by generating more data that mix the two into single scenes at the instance level. Extensive experiments are conducted on the GraspNet-1Billion benchmark, and competitive results are reached with significant gains on small-scale cases. In addition, the performance in real-world grasping highlights the method's generalization ability. Our code is available at https://github.com/mahaoxiang822/Scale-Balanced-Grasp.
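One simple way to realize weights that enlarge the gradients of rare grasp scales is inverse-frequency weighting over scale bins, sketched below. The bin edges, the normalization to an average weight of 1, and the function names are assumptions for illustration; the abstract does not give the exact form of the SBL weights.

```python
from collections import Counter


def scale_bin(width, bin_edges):
    """Index of the first bin whose upper edge is >= width (the last bin catches the rest)."""
    for i, edge in enumerate(bin_edges):
        if width <= edge:
            return i
    return len(bin_edges)


def a_priori_weights(train_widths, bin_edges):
    """Inverse-frequency weight per scale bin, normalised so the average sample weight is 1."""
    bins = [scale_bin(w, bin_edges) for w in train_widths]
    counts = Counter(bins)
    n_samples, n_bins = len(bins), len(counts)
    return {b: n_samples / (n_bins * c) for b, c in counts.items()}


def balanced_loss(losses, widths, weights, bin_edges):
    """Weighted mean of per-sample losses; rare scales contribute larger gradients."""
    total = sum(weights.get(scale_bin(w, bin_edges), 1.0) * loss
                for loss, w in zip(losses, widths))
    return total / len(losses)


# Hypothetical grasp widths in metres: one small-scale grasp among five medium ones.
edges = [0.03, 0.10]
train_widths = [0.02, 0.05, 0.06, 0.07, 0.08, 0.09]
weights = a_priori_weights(train_widths, edges)
uniform = balanced_loss([1.0] * 6, train_widths, weights, edges)
```

Because the weights are computed once from dataset statistics, they act as fixed multipliers on each sample's loss, and therefore on its gradient, during training.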
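Shape informs how an object should be grasped, both in terms of where and how. Accordingly, this paper describes a segmentation-based architecture for decomposing objects sensed with a depth camera into multiple primitive shapes, along with a post-processing pipeline for robotic grasping. Segmentation employs a deep network, called PS-CNN, trained on synthetic data with 6 classes of primitive shapes generated using a simulation engine. Each primitive shape is designed with parametrized grasp families, permitting the pipeline to identify multiple grasp candidates per shape region. The grasps are rank-ordered, and the first feasible one is chosen for execution. For task-free grasping of individual objects, the method achieves a 94.2% success rate, placing it among the top-performing grasp methods when compared with top-down and SE(3)-based approaches. Additional tests involving variable viewpoints and clutter demonstrate robustness to the setup. For task-oriented grasping, PS-CNN achieves a 93.0% success rate. Overall, the outcomes support the hypothesis that explicitly encoding shape primitives within a grasping pipeline should boost grasping performance, for both task-free and task-relevant grasp prediction.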
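In this paper, we present a Transformer-based architecture, namely TF-Grasp, for robotic grasp detection. The TF-Grasp framework has two elaborate designs that make it well suited to visual grasping tasks. The first key design is that we adopt local window attention to capture local contextual information and detailed features of graspable objects. We then apply cross-window attention to model the long-term dependencies between distant pixels. Object knowledge, environmental configuration, and relationships between different visual entities are aggregated for subsequent grasp detection. The second key design is that we build a hierarchical encoder-decoder architecture with skip connections, delivering shallow features from the encoder to the decoder to enable multi-scale feature fusion. Thanks to the powerful attention mechanism, TF-Grasp can simultaneously obtain local information (i.e., the contours of objects) and model long-term connections, such as the relationships between distinct visual concepts in clutter. Extensive computational experiments demonstrate that TF-Grasp achieves superior results compared with state-of-the-art convolutional grasping models and attains higher accuracies of 97.99% and 94.6% on the Cornell and Jacquard grasping datasets, respectively. Real-world experiments using a 7-DoF Franka Emika Panda robot also demonstrate its capability of grasping unseen objects in a variety of scenarios. The code and pre-trained models will be available at https://github.com/wangshaosun/grasp-transformer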
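In this paper, we explore whether a robot can learn to regrasp a diverse set of objects to achieve various desired grasp poses. Regrasping is needed whenever a robot's current grasp pose fails to perform the desired manipulation task. Endowing robots with such an ability has applications in many domains, such as manufacturing or domestic services. Yet it is a challenging task because of the large diversity of geometry in everyday objects and the high dimensionality of the state and action spaces. In this paper, we propose a robotic system that takes partial point clouds of an object and the supporting environment as input and outputs a sequence of pick-and-place operations that transform the object's initial grasp pose into the desired grasp pose. The key techniques include a neural stable placement predictor and a graph-based regrasping solution that leverages and changes the surrounding environment. We introduce a new and challenging synthetic dataset for learning and evaluating the proposed approach. We demonstrate the effectiveness of our proposed system with both simulator and real-world experiments. More videos and visualization examples are available on our project webpage.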
A key technical challenge in performing 6D object pose estimation from RGB-D images is to fully leverage the two complementary data sources. Prior works either extract information from the RGB image and the depth separately or use costly post-processing steps, limiting their performance in highly cluttered scenes and real-time applications. In this work, we present DenseFusion, a generic framework for estimating the 6D pose of a set of known objects from RGB-D images. DenseFusion is a heterogeneous architecture that processes the two data sources individually and uses a novel dense fusion network to extract pixel-wise dense feature embeddings, from which the pose is estimated. Furthermore, we integrate an end-to-end iterative pose refinement procedure that further improves the pose estimation while achieving near real-time inference. Our experiments show that our method outperforms state-of-the-art approaches on two datasets, YCB-Video and LineMOD. We also deploy our proposed method on a real robot to grasp and manipulate objects based on the estimated pose. Our code and video are available at https://sites.google.com/view/densefusion/.
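Grasp detection in cluttered scenes is a very challenging task for robots. Generating synthetic grasping data is a popular way to train and test grasp methods, as is done in Dex-Net and GraspNet. However, these methods generate training grasps on 3D synthetic object models but evaluate on images or point clouds with different distributions, which degrades performance on real scenes because of sparse grasp labels and covariate shift. To solve these problems, we propose a novel on-policy grasp detection method that can train and test on the same distribution, with dense pixel-level grasp labels generated from RGB-D images. A Parallel-Depth Grasp Generation (PDG-Generation) method is proposed to generate a parallel depth image through a new imaging model that projects points in parallel; the method then generates multiple candidate grasps for each pixel and obtains robust grasps through flatness detection, a force-closure metric, and collision detection. Next, a large, comprehensive Pixel-Level Grasp Pose Dataset (PLGP-Dataset) is constructed and released. In contrast to previous datasets with sparse grasp samples, this dataset is the first pixel-level grasp dataset, with an on-policy distribution in which grasps are generated from depth images. Finally, we build and test a series of pixel-level grasp detection networks, with a data augmentation process for imbalanced training, which learn grasp poses from the input RGB-D images. Extensive experiments show that our on-policy grasp method can largely overcome the gap between simulation and reality and achieves state-of-the-art performance. Code and data are available at https://github.com/liuchunsense/plgp-dataset.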
Recent 3D-based manipulation methods either directly predict the grasp pose using 3D neural networks or solve for the grasp pose using similar objects retrieved from shape databases. However, the former face generalizability challenges when tested with new robot arms or unseen objects, and the latter assume that similar objects exist in the databases. We hypothesize that recent 3D modeling methods provide a path towards building a digital replica of the evaluation scene that affords physical simulation and supports robust manipulation algorithm learning. We propose to reconstruct high-quality meshes from real-world point clouds using a state-of-the-art neural surface reconstruction method (the Real2Sim step). Because most simulators take meshes for fast simulation, the reconstructed meshes enable the generation of grasp pose labels without human effort. The generated labels can be used to train a grasp network that performs robustly in the real evaluation scene (the Sim2Real step). In synthetic and real experiments, we show that the Real2Sim2Real pipeline performs better than baseline grasp networks trained with a large dataset and than a grasp sampling method with retrieval-based reconstruction. The benefit of the Real2Sim2Real pipeline comes from 1) decoupling scene modeling and grasp sampling into sub-problems, and 2) the fact that both sub-problems can be solved with sufficiently high quality using recent 3D learning algorithms and mesh-based physical simulation techniques.
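Grasp pose estimation is an important problem for robots interacting with the real world. However, most existing methods require either exact 3D object models available beforehand or a large amount of grasp annotations for training. To avoid these problems, we propose TransGrasp, a category-level grasp pose estimation method that predicts grasp poses for a category of objects by labeling only one object instance. Specifically, we transfer grasp poses across a category of objects based on their shape correspondences and propose a grasp pose refinement module that further fine-tunes the gripper's grasp pose to ensure successful grasps. Experiments demonstrate the effectiveness of our method in achieving high-quality grasps with the transferred grasp poses. Our code is available at https://github.com/yanjh97/transgrasp.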