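6D grasping in cluttered scenes is a longstanding problem in robotic manipulation. Open-loop manipulation pipelines may fail because of inaccurate state estimation, while most end-to-end grasping methods have not yet scaled to complex scenes with obstacles. In this work, we propose a new end-to-end learning method for grasping in cluttered scenes. Our hierarchical framework learns collision-free, target-driven grasping from partial point cloud observations. We learn an embedding space to encode expert grasping plans during training and a variational autoencoder to sample diverse grasping trajectories at test time. Furthermore, we train a critic network for plan selection and an option classifier for switching to an instance grasping policy through hierarchical reinforcement learning. We evaluate our method and compare it against several baselines in simulation, and demonstrate that our latent planning can generalize to real-world cluttered-scene grasping tasks. Our videos and code can be found at https://sites.google.com/view/latent-grasping.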
translated by 谷歌翻译
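Teaching a multi-fingered dexterous robot to grasp objects in the real world is a challenging problem because of its high-dimensional state and action spaces. We propose a robot-learning system that can take a small number of human demonstrations and learn to grasp unseen object poses given partially occluded observations. Our system leverages a small motion capture dataset and generates a large dataset of diverse and successful trajectories for a multi-fingered robot gripper. By adding domain randomization, we show that our dataset provides robust grasping trajectories that can be transferred to a policy learner. We train a dexterous grasping policy that takes the point cloud of the object as input and predicts continuous actions to grasp objects from different initial robot states. We evaluate the effectiveness of our system on a 22-DoF floating hand in simulation and on a 23-DoF Allegro robot hand with a KUKA arm in the real world. The policy learned from our dataset generalizes well to unseen object poses in both simulation and the real world.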
Robot learning provides a number of ways to teach robots simple skills, such as grasping. However, these skills are usually trained in open, clutter-free environments, and therefore would likely cause undesirable collisions in more complex, cluttered environments. In this work, we introduce an affordance model based on a graph representation of an environment, which is optimised during deployment to find suitable robot configurations to start a skill from, such that the skill can be executed without any collisions. We demonstrate that our method can generalise a priori acquired skills to previously unseen cluttered and constrained environments, in simulation and in the real world, for both a grasping and a placing task.
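Manipulating articulated objects generally requires multiple robot arms. Enabling multiple robot arms to collaboratively complete manipulation tasks on articulated objects is challenging. In this paper, we present $\textbf{V-MAO}$, a framework for learning multi-arm manipulation of articulated objects. Our framework includes a variational generative model that learns the contact point distribution over the object's rigid parts for each robot arm. The training signal is obtained from interaction with the simulation environment, which is enabled by planning and a novel formulation of object-centric control for articulated objects. We deploy our framework in a customized MuJoCo simulation environment and demonstrate that it achieves a high success rate on six different objects and two different robots. We also show that generative modeling can effectively learn the contact point distribution on articulated objects.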
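In this paper, we study the problem of learning a repertoire of low-level skills from raw images that can be sequenced to complete long-horizon visuomotor tasks. Reinforcement learning (RL) is a promising approach for acquiring short-horizon skills autonomously. However, the focus of RL algorithms has largely been on the success of those individual skills, rather than on learning and grounding a large repertoire of skills that can be sequenced to complete extended multi-stage tasks. The latter demands robustness and persistence, as errors in skills can compound over time, and may require the robot to have a number of primitive skills in its repertoire rather than just one. To this end, we introduce Ember, a model-based RL method for learning primitive skills that are suitable for completing long-horizon visuomotor tasks. Ember learns and plans using a learned model, a critic, and a success classifier, where the success classifier serves both as a reward function for RL and as a grounding mechanism that continuously detects whether the robot should retry a skill after a failure or under perturbations. Further, the learned model is task-agnostic and trained using data from all skills, enabling the robot to efficiently learn a number of distinct primitives. These visuomotor primitive skills and their associated pre- and post-conditions can then be directly combined with off-the-shelf symbolic planners to complete long-horizon tasks. On a Franka Emika robot arm, we find that Ember enables the robot to complete three long-horizon visuomotor tasks at an 85% success rate, such as organizing an office desk, a file cabinet, and drawers, which require sequencing up to 12 skills, involve 14 unique learned primitives, and demand generalization to novel objects.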
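The retry behaviour described in this abstract can be illustrated with a minimal sketch. It assumes a symbolic plan given as a list of skill names; `run_plan`, `toy_execute`, and `toy_succeeded` are hypothetical names, and the hand-written success check only stands in for a learned classifier. It is not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]  # hypothetical symbolic state: which skills have taken effect


def run_plan(
    plan: List[str],
    execute: Callable[[str, State, int], State],
    succeeded: Callable[[str, State], bool],
    state: State,
    max_attempts: int = 3,
) -> Tuple[bool, Dict[str, int]]:
    """Run skills in order; after each attempt, ask the success check whether to retry."""
    attempts: Dict[str, int] = {}
    for skill in plan:
        for attempt in range(1, max_attempts + 1):
            state = execute(skill, state, attempt)
            attempts[skill] = attempt
            if succeeded(skill, state):
                break  # post-condition holds, move on to the next skill
        else:
            return False, attempts  # skill never succeeded, abort the plan
    return True, attempts


def toy_execute(skill: str, state: State, attempt: int) -> State:
    """Toy executor: the first attempt at 'open_drawer' fails, simulating a perturbation."""
    new_state = dict(state)
    if not (skill == "open_drawer" and attempt == 1):
        new_state[skill] = True
    return new_state


def toy_succeeded(skill: str, state: State) -> bool:
    """Stand-in for a learned success classifier (an assumption of this sketch)."""
    return state.get(skill, False)


done, attempts = run_plan(
    ["open_drawer", "pick_pen", "place_pen"], toy_execute, toy_succeeded, {}
)
print(done, attempts)
```

The same check that decides whether to retry could also be used as a sparse reward signal during training, which is the dual role the abstract attributes to the success classifier.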
Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning. Skills are typically extracted from expert demonstrations and are embedded into a latent space from which they can be sampled as actions by a high-level RL agent. However, this skill space is expansive, and not all skills are relevant for a given robot state, making exploration difficult. Furthermore, the downstream RL agent is limited to learning structurally similar tasks to those used to construct the skill space. We firstly propose accelerating exploration in the skill space using state-conditioned generative models to directly bias the high-level agent towards only sampling skills relevant to a given state based on prior experience. Next, we propose a low-level residual policy for fine-grained skill adaptation enabling downstream RL agents to adapt to unseen task variations. Finally, we validate our approach across four challenging manipulation tasks that differ from those used to build the skill space, demonstrating our ability to learn across task variations while significantly accelerating exploration, outperforming prior works. Code and videos are available on our project website: https://krishanrana.github.io/reskill.
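Long-horizon everyday tasks comprising a sequence of implicit subtasks still pose a major challenge in offline robot control. While many prior methods have aimed to address this setting with variants of imitation learning and offline reinforcement learning, the learned behavior is typically narrow and often struggles to reach configurable long-horizon goals. As both paradigms have complementary strengths and weaknesses, we propose a novel hierarchical approach that combines the strengths of both to learn task-agnostic long-horizon policies from high-dimensional camera observations. Concretely, we combine a low-level policy that learns latent skills via imitation learning with a high-level policy learned from offline reinforcement learning for chaining the latent behavior priors. Experiments on various simulated and real robot control tasks show that our formulation enables previously unseen combinations of skills to reach temporally extended goals by "stitching" together latent skills through goal chaining, with an order-of-magnitude improvement in performance over state-of-the-art baselines. We even learn one multi-task visuomotor policy for 25 distinct manipulation tasks in the real world, which outperforms both imitation learning and offline reinforcement learning techniques.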
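We propose to learn to generate grasping motion for manipulation with a dexterous hand using implicit functions. With continuous time inputs, the model can generate a continuous and smooth grasping plan. We name the proposed model Continuous Grasping Function (CGF). CGF is learned via generative modeling with a conditional variational autoencoder using 3D human demonstrations. We first convert large-scale human-object interaction trajectories to robot demonstrations via motion retargeting, and then use these demonstrations to train CGF. During inference, we sample from CGF to generate different grasping plans in the simulator and select the successful ones to transfer to the real robot. By training on diverse human data, our CGF generalizes to manipulating multiple objects. Compared to previous planning algorithms, CGF is more efficient and achieves a significant improvement in success rate when transferred to grasping with the real Allegro hand. Our project page is at https://jianglongye.com/cgf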
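Grasping is the process of picking up an object by applying forces and torques at a set of contacts. Recent advances in deep learning methods have allowed rapid progress in robotic object grasping. We systematically surveyed the publications of the last decade, with a particular interest in grasping an object using all 6 degrees of freedom of the end-effector pose. Our review found four common methodologies for robotic grasping: sampling-based approaches, direct regression, reinforcement learning, and exemplar approaches. Furthermore, we found two "supporting methods" around grasping that use deep learning to support the grasping process: shape approximation and affordances. We have distilled the publications found in this systematic review (85 papers) into ten key takeaways that we consider crucial for future robotic grasping and manipulation research. An online version of the survey is available at https://rhys-newbury.github.io/projects/6dof/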
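Many possible fields of application for robots in real-world settings hinge on the ability of robots to grasp objects. As a result, robot grasping has been an active field of research for many years. With this publication, we contribute to the endeavor of enabling robots to grasp, with a particular focus on bin picking applications. Bin picking is especially challenging because of the often cluttered and unstructured arrangement of objects and the often limited graspability of objects by simple top-down grasps. To tackle these challenges, we propose a fully self-supervised reinforcement learning approach based on a hybrid discrete-continuous adaptation of soft actor-critic (SAC). We employ parameterized motion primitives for pushing and grasping movements in order to enable flexibly adaptable behavior in the difficult setups we consider. Furthermore, we use data augmentation to increase sample efficiency. We demonstrate our proposed method on challenging picking scenarios in which planar grasp learning or action discretization methods would face great difficulties.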
We formulate grasp learning as a neural field and present Neural Grasp Distance Fields (NGDF). Here, the input is a 6D pose of a robot end effector and output is a distance to a continuous manifold of valid grasps for an object. In contrast to current approaches that predict a set of discrete candidate grasps, the distance-based NGDF representation is easily interpreted as a cost, and minimizing this cost produces a successful grasp pose. This grasp distance cost can be incorporated directly into a trajectory optimizer for joint optimization with other costs such as trajectory smoothness and collision avoidance. During optimization, as the various costs are balanced and minimized, the grasp target is allowed to smoothly vary, as the learned grasp field is continuous. In simulation benchmarks with a Franka arm, we find that joint grasping and planning with NGDF outperforms baselines by 63% execution success while generalizing to unseen query poses and unseen object shapes. Project page: https://sites.google.com/view/neural-grasp-distance-fields.
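The idea of treating a grasp distance as one cost among several can be sketched in a toy setting. Everything here is an assumption made for illustration: the pose is reduced to a 2D position, an analytic distance to a circle stands in for the learned field, and `GRASP_RADIUS`, `total_cost`, and `optimize` are hypothetical names rather than anything from NGDF itself.

```python
import math
from typing import Callable, List

Vec = List[float]

GRASP_RADIUS = 0.1  # assumed: valid grasps lie on a circle around an object at the origin


def grasp_distance(p: Vec) -> float:
    """Toy stand-in for a learned field: distance from p to the circle of valid grasps."""
    return abs(math.hypot(p[0], p[1]) - GRASP_RADIUS)


def total_cost(p: Vec, start: Vec, w_smooth: float = 0.01) -> float:
    """Squared grasp distance plus a hypothetical smoothness term (displacement from start)."""
    displacement = sum((a - b) ** 2 for a, b in zip(p, start))
    return grasp_distance(p) ** 2 + w_smooth * displacement


def numerical_grad(f: Callable[[Vec], float], p: Vec, eps: float = 1e-6) -> Vec:
    """Central-difference gradient, used here in place of automatic differentiation."""
    grad = []
    for i in range(len(p)):
        hi = p[:i] + [p[i] + eps] + p[i + 1:]
        lo = p[:i] + [p[i] - eps] + p[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def optimize(start: Vec, lr: float = 0.1, steps: int = 300) -> Vec:
    """Plain gradient descent on the combined cost, starting from the query position."""
    p = list(start)
    for _ in range(steps):
        g = numerical_grad(lambda q: total_cost(q, start), p)
        p = [x - lr * gx for x, gx in zip(p, g)]
    return p


starts = [[0.5, 0.0], [0.3, 0.4]]
finals = [optimize(s) for s in starts]
for s, f in zip(starts, finals):
    print(s, "->", [round(x, 3) for x in f], "distance", round(grasp_distance(f), 4))
```

Because the cost is defined over a continuous set of valid grasps, the two start positions settle on different points of the circle instead of being pulled to one fixed candidate.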
Generating grasp poses is a crucial component for any robot object manipulation task. In this work, we formulate the problem of grasp generation as sampling a set of grasps using a variational autoencoder and assess and refine the sampled grasps using a grasp evaluator model. Both Grasp Sampler and Grasp Refinement networks take 3D point clouds observed by a depth camera as input. We evaluate our approach in simulation and real-world robot experiments. Our approach achieves 88% success rate on various commonly used objects with diverse appearances, scales, and weights. Our model is trained purely in simulation and works in the real world without any extra steps. The video of our experiments can be found here.
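Every home is different, and every person likes things done in their own particular way. Therefore, home robots of the future need both to reason about the sequential nature of day-to-day tasks and to generalize to a user's preferences. To this end, we propose a Transformer Task Planner (TTP) that learns high-level actions from demonstrations by leveraging object attribute-based representations. TTP can be pre-trained on multiple preferences and shows generalization using a single demonstration as a prompt in a simulated dishwasher loading task. Further, we demonstrate real-world rearrangement using TTP with a Franka Panda robotic arm, prompted by a single human demonstration.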
Robots operating in human environments must be able to rearrange objects into semantically-meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures out of a single RGB-D image based on high-level language goals, such as "set the table." Our method shows how diffusion models can be used for complex multi-step 3D planning tasks. StructDiffusion improves success rate on assembling physically-valid structures out of unseen objects by on average 16% over an existing multi-modal transformer model, while allowing us to use one multi-task model to produce a wider range of different structures. We show experiments on held-out objects in both simulation and on real-world rearrangement tasks. For videos and additional results, check out our website: http://weiyuliu.com/StructDiffusion/.
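Reinforcement learning can train policies that effectively perform complex tasks. However, for long-horizon tasks, the performance of these methods degrades with the horizon, often necessitating reasoning over and composing lower-level skills. Hierarchical reinforcement learning aims to enable this by providing a bank of low-level skills as action abstractions. Hierarchies can improve on this further by abstracting the state space as well. We posit that a suitable state abstraction should depend on the capabilities of the available lower-level policies. We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill. These value functions capture the affordances of the scene, thus forming a representation that compactly abstracts task-relevant information and robustly ignores distractors. Empirical evaluations on maze-solving and robotic manipulation tasks demonstrate that our approach improves long-horizon performance and enables better zero-shot generalization than alternative model-free and model-based methods.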
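A minimal sketch of the representation described above: the state is embedded as the vector of per-skill values. The value functions here are hand-written stand-ins for learned ones, and the feature names (`drawer_open`, `block_dist`, `wall_color`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

State = Dict[str, float]            # hypothetical raw state: named scalar features
ValueFn = Callable[[State], float]  # V_k(s): estimated success of skill k from state s


def value_function_space(state: State, skill_values: Sequence[ValueFn]) -> List[float]:
    """Embed a raw state as the vector of per-skill values [V_1(s), ..., V_K(s)]."""
    return [v(state) for v in skill_values]


# Hand-written stand-ins for learned value functions (assumptions of this sketch).
def v_open_drawer(s: State) -> float:
    return 1.0 - s["drawer_open"]  # useful only while the drawer is still closed


def v_pick_block(s: State) -> float:
    return 1.0 if s["block_dist"] < 0.5 else 0.0  # feasible only when the block is in reach


skills = [v_open_drawer, v_pick_block]
s_a = {"drawer_open": 0.0, "block_dist": 0.2, "wall_color": 0.1}
s_b = {"drawer_open": 0.0, "block_dist": 0.2, "wall_color": 0.9}  # differs only in a distractor

z_a = value_function_space(s_a, skills)
z_b = value_function_space(s_b, skills)
print(z_a, z_b)
```

The two states differ only in a feature that no skill value depends on, so they map to the same embedding. This is the sense in which such a representation can ignore distractors while keeping what matters for choosing the next skill.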
Solving real-world sequential manipulation tasks requires robots to have a repertoire of skills applicable to a wide range of circumstances. To acquire such skills using data-driven approaches, we need massive and diverse training data which is often labor-intensive and non-trivial to collect and curate. In this work, we introduce Active Task Randomization (ATR), an approach that learns visuomotor skills for sequential manipulation by automatically creating feasible and novel tasks in simulation. During training, our approach procedurally generates tasks using a graph-based task parameterization. To adaptively estimate the feasibility and novelty of sampled tasks, we develop a relational neural network that maps each task parameter into a compact embedding. We demonstrate that our approach can automatically create suitable tasks for efficiently training the skill policies to handle diverse scenarios with a variety of objects. We evaluate our method on simulated and real-world sequential manipulation tasks by composing the learned skills using a task planner. Compared to baseline methods, the skills learned using our approach consistently achieve better success rates.
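Deformable object manipulation has many applications in our daily lives, such as cooking and laundry folding. Manipulating elastoplastic objects such as dough is particularly challenging because dough lacks a compact state representation and requires contact-rich interactions. We consider the task of flattening a piece of dough into a specific shape from RGB-D images. While the task is seemingly intuitive for humans, common approaches such as naive trajectory optimization get stuck in local optima. We propose a novel trajectory optimizer that optimizes through a differentiable "reset" module, transforming a single-stage, fixed-initialization trajectory into a multi-stage, multi-initialization trajectory in which all stages are jointly optimized. We then train a closed-loop policy on the demonstrations generated by our trajectory optimizer. Our policy receives partial point clouds as input, which eases transfer from simulation to the real world. We show that our policy can perform real-world dough manipulation, flattening a ball of dough into a target shape.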
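Object manipulation from 3D visual inputs poses many challenges for building generalizable perception and policy models. However, the 3D assets in existing benchmarks mostly lack the diversity of 3D shapes needed to match real-world intra-class complexity in topology and geometry. Here we propose the SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator. The 3D assets in ManiSkill include large intra-class topological and geometric variations. Tasks are carefully chosen to cover distinct types of manipulation challenges. Recent progress in 3D vision also leads us to believe that the benchmark should be customized so that the challenge is inviting to researchers working on 3D deep learning. To this end, we simulate a moving panoramic camera that returns ego-centric point clouds or RGB-D images. In addition, we would like ManiSkill to serve a broad set of researchers interested in manipulation research. Besides supporting policy learning from interactions, we also support learning-from-demonstrations (LfD) methods by providing a large number of high-quality demonstrations (~36,000 successful trajectories, ~1.5M point cloud/RGB-D frames in total). We provide baselines using 3D deep learning and LfD algorithms. All code of our benchmark (simulator, environment, SDK, and baselines) is open-sourced, and a challenge open to interdisciplinary researchers will be held based on the benchmark.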
As the basis for prehensile manipulation, it is vital to enable robots to grasp as robustly as humans. In daily manipulation, our grasping system is prompt, accurate, flexible and continuous across spatial and temporal domains. Few existing methods cover all these properties for robot grasping. In this paper, we propose a new methodology for grasp perception to give robots these abilities. Specifically, we develop a dense supervision strategy with real perception and analytic labels in the spatial-temporal domain. Additional awareness of objects' center-of-mass is incorporated into the learning process to help improve grasping stability. Utilization of grasp correspondence across observations enables dynamic grasp tracking. Our model, AnyGrasp, can generate accurate, full-DoF, dense and temporally-smooth grasp poses efficiently, and works robustly against large depth sensing noise. Embedded with AnyGrasp, we achieve a 93.3% success rate when clearing bins with over 300 unseen objects, which is comparable with human subjects under controlled conditions. Over 900 MPPH is reported on a single-arm system. For dynamic grasping, we demonstrate catching swimming robot fish in the water.
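Reinforcement learning is a promising method for robotic grasping because it can learn effective reaching and grasping policies in difficult scenarios. However, achieving human-like manipulation capabilities with sophisticated robotic hands is challenging because of the problem's high dimensionality. Although remedies such as reward shaping or expert demonstrations can be employed to overcome this issue, they often lead to oversimplified and biased policies. We present Dext-Gen, a reinforcement learning framework for dexterous grasping in sparse-reward environments that is applicable to a variety of grippers and learns unbiased and intricate policies. Full orientation control of the gripper and the object is achieved through a smooth orientation representation. Our approach has reasonable training times and provides the option to include desired prior knowledge. Simulated experiments demonstrate the effectiveness and adaptability of the framework across different scenarios.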