Collaborative robots (cobots) that work alongside humans must be able to quickly learn new skills and adapt to new task configurations. Learning from demonstration (LfD) enables cobots to learn and adapt to different conditions of use. However, state-of-the-art LfD methods require manually tuned hyperparameters and are rarely used in industrial contexts without experts. In this paper, we present the development and deployment of an industrial application with naive users. We propose a hyperparameter-free method based on probabilistic movement primitives, where all hyperparameters are predetermined using Jensen-Shannon divergence and Bayesian optimization; hence, users do not have to perform manual hyperparameter tuning. The method learns movements from a small dataset of user demonstrations and generalizes the motion to various scenarios and conditions. We extensively evaluate the method in two field tests: one in which the cobot works on elevator door maintenance, and one in which three Schindler workers teach the cobot tasks useful for their workflow. Errors between the cobot end-effector and target positions range from $0$ to $1.48\pm0.35$ mm. No task failures were reported in any of the tests. Questionnaires completed by the Schindler workers highlighted the method's ease of use, feeling of safety, and the accuracy of the reproduced motion. Our code and the recorded trajectories are made available online for reproduction.
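A minimal sketch of the hyperparameter-selection idea described above, under stated assumptions: a toy ProMP with two free hyperparameters (number of RBF bases and their width), a Monte-Carlo Jensen-Shannon estimate between per-timestep Gaussians, and scikit-optimize's gp_minimize standing in for the Bayesian optimization step. All names and the ridge-regression fit are illustrative, not the authors' implementation.

```python
# Sketch: pick ProMP hyperparameters by minimizing a Monte-Carlo estimate of
# the Jensen-Shannon divergence between demonstrated and reproduced motions.
import numpy as np
from scipy.stats import norm
from skopt import gp_minimize

rng = np.random.default_rng(0)
T = 100
t = np.linspace(0, 1, T)
demos = np.stack([np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(T)
                  for _ in range(8)])                 # (n_demos, T) toy data

def rbf_features(n_basis, width):
    centers = np.linspace(0, 1, n_basis)
    phi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))
    return phi / phi.sum(axis=1, keepdims=True)       # (T, n_basis)

def js_divergence(p_mu, p_std, q_mu, q_std, n_samples=200):
    """Monte-Carlo JS divergence between two 1-D Gaussians."""
    xp = rng.normal(p_mu, p_std, n_samples)
    xq = rng.normal(q_mu, q_std, n_samples)
    def log_m(x):  # log density of the equal-weight mixture
        return np.logaddexp(norm.logpdf(x, p_mu, p_std),
                            norm.logpdf(x, q_mu, q_std)) - np.log(2)
    kl_pm = np.mean(norm.logpdf(xp, p_mu, p_std) - log_m(xp))
    kl_qm = np.mean(norm.logpdf(xq, q_mu, q_std) - log_m(xq))
    return 0.5 * (kl_pm + kl_qm)

def objective(params):
    n_basis, width = int(params[0]), params[1]
    phi = rbf_features(n_basis, width)
    # Ridge-regress one weight vector per demonstration.
    w = np.linalg.solve(phi.T @ phi + 1e-6 * np.eye(n_basis),
                        phi.T @ demos.T).T            # (n_demos, n_basis)
    recon_mu = phi @ w.mean(axis=0)
    recon_std = np.sqrt(np.einsum('tb,bc,tc->t', phi, np.cov(w.T), phi)) + 1e-6
    demo_mu, demo_std = demos.mean(axis=0), demos.std(axis=0) + 1e-6
    return float(np.mean([js_divergence(demo_mu[i], demo_std[i],
                                        recon_mu[i], recon_std[i])
                          for i in range(T)]))

# Bayesian optimization over (number of bases, basis width).
result = gp_minimize(objective, [(5, 40), (0.01, 0.3)], n_calls=25,
                     random_state=0)
print("selected n_basis, width:", result.x)
```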
In this survey, we present the current state of robots performing manipulation tasks that require varying contact with the environment, such that the robot must implicitly or explicitly control the contact force with the environment to complete the task. Robots can perform an increasing number of human-style manipulation tasks, and there is a growing number of publications on the topics of 1) performing tasks that always require contact and 2) mitigating uncertainty by leveraging the environment in tasks which, under perfect information, could be performed without contact. Recent trends have seen robots perform tasks previously left to humans, such as massage, and, in classical tasks such as peg-in-hole, achieve greater generalization to other similar tasks, better error tolerance, and faster planning or learning of the task. Thus, in this survey we cover the current state of robots performing such tasks, starting with a survey of all the different contact tasks robots can perform, observing how these tasks are controlled and represented, and finally presenting the learning and planning of the skills required to complete them.
Humans intuitively solve tasks in versatile ways, varying their behavior in terms of trajectory-based planning and for individual steps. Thus, they can easily generalize and adapt to new and changing environments. Current Imitation Learning algorithms often only consider unimodal expert demonstrations and act in a state-action-based setting, making it difficult for them to imitate human behavior in case of versatile demonstrations. Instead, we combine a mixture of movement primitives with a distribution matching objective to learn versatile behaviors that match the expert's behavior and versatility. To facilitate generalization to novel task configurations, we do not directly match the agent's and expert's trajectory distributions but rather work with concise geometric descriptors which generalize well to unseen task configurations. We empirically validate our method on various robot tasks using versatile human demonstrations and compare to imitation learning algorithms in a state-action setting as well as a trajectory-based setting. We find that the geometric descriptors greatly help in generalizing to new task configurations and that combining them with our distribution-matching objective is crucial for representing and reproducing versatile behavior.
Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years; however, the field has recently been gaining attention due to advances in computing and sensing, as well as rising demand for intelligent applications. The paradigm of learning by imitation is gaining popularity because it facilitates teaching complex tasks with minimal expert knowledge of the tasks. Generic imitation learning methods could potentially reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or designing reward functions specific to the task. Modern sensors are able to collect and transmit high volumes of data rapidly, and processors with high computational power allow fast processing that maps the sensory data to actions in a timely manner. This opens the door for many potential AI applications that require real-time perception and reaction, such as humanoid robots, self-driving vehicles, human-computer interaction, and computer games, to name a few. However, specialized algorithms are needed to effectively and robustly learn models, as learning by imitation poses its own set of challenges. In this paper, we survey imitation learning methods and present design options in different steps of the learning process. We introduce a background and motivation for the field as well as highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games, as these domains are the most popular in the literature and provide a wide array of problems and methodologies. We extensively discuss combining imitation learning approaches using different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications, and highlight current and future research directions.
Learning from demonstration (LfD) is a popular method for reproducing and generalizing robot skills from human-provided demonstrations. In this paper, we propose a novel optimization-based LfD method that encodes demonstrations as elastic maps. An elastic map is a graph of nodes connected through a mesh of springs. We build a skill model by fitting an elastic map to a set of demonstrations. The formulated optimization problem in our method includes three objectives with natural and physical interpretations. The main term rewards the mean squared error in the Cartesian coordinates. The second term penalizes a non-equidistant distribution of points, resulting in an optimal total length of the trajectory. The third term rewards smoothness while penalizing nonlinearity. These quadratic objectives form a convex problem that can be solved efficiently with local optimizers. We examine nine methods for constructing and weighting the elastic maps and study their performance in robotic tasks. We also evaluate the proposed method in several simulated and real-world experiments using a UR5e manipulator arm, and compare it to other LfD approaches to demonstrate its benefits and flexibility across a variety of metrics.
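A sketch of the three-term quadratic objective described above, under stated assumptions: data fit, equidistant node spacing (stretching springs), and smoothness (bending). Because all three terms are quadratic in the node positions, each fit reduces to a single linear solve; the weights and the nearest-node data association below are illustrative choices, not the paper's exact construction scheme.

```python
# Sketch: fit an elastic map (nodes + springs) to stacked demonstration
# points by minimizing fit error + stretching + bending energies.
import numpy as np

def fit_elastic_map(demos, n_nodes=30, w_fit=1.0, w_stretch=10.0,
                    w_bend=100.0, n_iters=5):
    pts = np.concatenate(demos, axis=0)           # (P, d) stacked demo points
    # Initialize nodes on a straight line between first and last point.
    nodes = np.linspace(0, 1, n_nodes)[:, None] * (pts[-1] - pts[0]) + pts[0]

    # First/second finite-difference operators for stretch and bend terms.
    D1 = np.diff(np.eye(n_nodes), axis=0)         # (n_nodes-1, n_nodes)
    D2 = np.diff(np.eye(n_nodes), n=2, axis=0)    # (n_nodes-2, n_nodes)

    for _ in range(n_iters):                      # EM-like reassignment loop
        # Associate each data point with its nearest node.
        dists = ((pts[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)             # (P,)
        A = np.zeros((len(pts), n_nodes))
        A[np.arange(len(pts)), assign] = 1.0
        # Quadratic objective => normal equations, one solve per iteration.
        H = w_fit * A.T @ A + w_stretch * D1.T @ D1 + w_bend * D2.T @ D2
        nodes = np.linalg.solve(H, w_fit * A.T @ pts)
    return nodes                                  # (n_nodes, d) skill model

demos = [np.column_stack([np.linspace(0, 1, 50),
                          np.sin(np.linspace(0, np.pi, 50))])
         for _ in range(3)]
print(fit_elastic_map(demos).shape)               # (30, 2)
```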
Placing robots outside controlled conditions requires versatile movement representations that allow robots to learn new tasks and adapt them to environmental changes. The introduction of obstacles or the placement of additional robots in the workspace, and modifications of the joint range due to faults or range-of-motion limits, are typical cases where the adaptation capability plays a key role in safely executing robot tasks. Probabilistic movement primitives (ProMPs), which are modeled as Gaussian distributions over trajectories, have been proposed to represent adaptable movement skills. They are analytically tractable and can be learned from a small number of demonstrations. However, both the original ProMP formulation and subsequent approaches only provide solutions to specific movement adaptation problems, e.g., obstacle avoidance, and a generic, unifying probabilistic approach to adaptation is missing. In this paper, we develop a generic probabilistic framework for adapting ProMPs. We unify previous adaptation techniques, for example, various types of obstacle avoidance, via-points, and mutual avoidance, in one framework and combine them to solve complex robotic problems. Additionally, we derive novel adaptation techniques such as temporally unbound via-points and mutual avoidance. We formulate adaptation as a constrained optimization problem in which we minimize the Kullback-Leibler divergence between the adapted distribution and the distribution of the original primitive, while constraining the probability mass associated with undesired trajectories to be low. We demonstrate our approach on several adaptation problems on simulated planar robot arms and on 7-DoF Franka-Emika robots in a dual-robot-arm setting.
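A sketch of the simplest ProMP adaptation, under stated assumptions: conditioning the Gaussian weight distribution on a via-point. Passing exactly through a desired point is the zero-variance limit of the KL-constrained formulation described above; the general obstacle and mutual-avoidance constraints require an iterative optimizer rather than this closed form. The basis setup is illustrative.

```python
# Sketch: condition a ProMP weight distribution N(mu_w, Sigma_w) on a
# via-point y(t*) ~= y_star via Gaussian conditioning.
import numpy as np

def condition_promp(mu_w, Sigma_w, phi_t, y_star, sigma_star=1e-6):
    """phi_t: (n_basis,) basis activations at the via-point time t*."""
    S = phi_t @ Sigma_w @ phi_t + sigma_star          # scalar innovation var
    K = Sigma_w @ phi_t / S                           # Kalman-style gain
    mu_new = mu_w + K * (y_star - phi_t @ mu_w)
    Sigma_new = Sigma_w - np.outer(K, phi_t @ Sigma_w)
    return mu_new, Sigma_new

# Toy example: 10 RBF bases over normalized time.
n_basis = 10
centers = np.linspace(0, 1, n_basis)
phi = lambda s: np.exp(-(s - centers) ** 2 / (2 * 0.05 ** 2))
mu_w, Sigma_w = np.zeros(n_basis), np.eye(n_basis)

mu_c, Sigma_c = condition_promp(mu_w, Sigma_w, phi(0.5), y_star=1.0)
print("mean at t*=0.5 after conditioning:", phi(0.5) @ mu_c)  # ~1.0
```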
Imitation learning approaches achieve good generalization within the range of the training data, but tend to generate unpredictable motions when querying outside this range. We present a novel approach to imitation learning with enhanced extrapolation capabilities that exploits the so-called Equation Learner Network (EQLN). Unlike conventional approaches, EQLNs use supervised learning to fit a set of analytical expressions that allows them to extrapolate beyond the range of the training data. We augment the task demonstrations with a set of task-dependent parameters representing spatial properties of each motion and use them to train the EQLN. At run time, the features are used to query the Task-Parameterized Equation Learner Network (TP-EQLN) and generate the corresponding robot trajectory. The set of features encodes kinematic constraints of the task such as desired height or a final point to reach. We validate the results of our approach on manipulation tasks where it is important to preserve the shape of the motion in the extrapolation domain. Our approach is also compared with existing state-of-the-art approaches, in simulation and in real setups. The experimental results show that TP-EQLN can respect the constraints of the trajectory encoded in the feature parameters, even in the extrapolation domain, while preserving the overall shape of the trajectory provided in the demonstrations.
Robots need to be able to adapt to unexpected changes in the environment such that they can autonomously succeed in their tasks. However, hand-designing feedback models for adaptation is tedious, if at all possible, making data-driven methods a promising alternative. In this paper we introduce a full framework for learning feedback models for reactive motion planning. Our pipeline starts by segmenting demonstrations of a complete task into motion primitives via a semi-automated segmentation algorithm. Then, given additional demonstrations of successful adaptation behaviors, we learn initial feedback models through learning from demonstrations. In the final phase, a sample-efficient reinforcement learning algorithm fine-tunes these feedback models for novel task settings through few real system interactions. We evaluate our approach on a real anthropomorphic robot in learning a tactile feedback task.
A core challenge for an autonomous agent acting in the real world is to adapt its repertoire of skills to cope with its noisy perception and dynamics. To scale skills to long-horizon tasks, robots should be able to learn in a structured manner through trajectories and then make instantaneous decisions individually at each step. To this end, we propose Soft Actor-Critic Gaussian Mixture Model (SAC-GMM), a novel hybrid approach that learns robot skills through a dynamical system and adapts the learned skills in their own trajectory distribution space through interactions with the environment. Our approach combines classical robotics techniques of learning from demonstration with the deep reinforcement learning framework and exploits their complementary nature. We show that our method utilizes sensors solely available during the execution of preliminarily learned skills to extract relevant features that lead to faster skill refinement. Extensive evaluations in both simulation and real-world environments demonstrate the effectiveness of our method in refining robot skills by leveraging physical interactions, high-dimensional sensory data, and sparse task completion rewards. Videos, code, and pre-trained models are available at http://sac-gmm.cs.uni-freiburg.de.
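A sketch of the dynamical-system half of the approach, under stated assumptions: a GMM fit over joint (position, velocity) data, queried with Gaussian Mixture Regression (GMR) to obtain a velocity command at any state. The SAC refinement stage, which perturbs these GMM parameters using sparse rewards, is omitted; the toy attractor data is illustrative.

```python
# Sketch: GMM over (x, dx) pairs + GMR conditional mean E[dx | x] as a
# dynamical-system skill representation.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Toy demonstrations of a 1-D attractor dynamic: dx = -(x - 1).
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, (500, 1))
dx = -(x - 1.0) + 0.05 * rng.standard_normal((500, 1))
gmm = GaussianMixture(n_components=5, random_state=0).fit(np.hstack([x, dx]))

def gmr_velocity(x_query, gmm, dim_in=1):
    """Conditional mean E[dx | x] under the fitted joint GMM."""
    x_query = np.atleast_1d(x_query)
    h, cond_means = [], []
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mu_x, mu_y = mu[:dim_in], mu[dim_in:]
        Sxx, Syx = S[:dim_in, :dim_in], S[dim_in:, :dim_in]
        # Component responsibility and per-component conditional mean.
        h.append(gmm.weights_[k] *
                 multivariate_normal.pdf(x_query, mu_x, Sxx))
        cond_means.append(mu_y + Syx @ np.linalg.solve(Sxx, x_query - mu_x))
    h = np.array(h) / (np.sum(h) + 1e-12)
    return sum(hk * m for hk, m in zip(h, cond_means))

print(gmr_velocity(0.0, gmm))   # should be close to -(0 - 1) = 1
```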
By learning variable impedance control policies, robot assistants can intelligently adapt their manipulation compliance to ensure both safe interaction and proper task completion when operating in human-robot interaction environments. In this paper, we propose a DMP-based framework that learns and generalizes variable impedance manipulation skills from human demonstrations. The framework improves the robots' adaptability to environmental changes (i.e., changes in the weight and shape of the object grasped at the robot's end-effector) and inherits the efficiency of demonstration-variance-based stiffness estimation methods. Furthermore, with our stiffness estimation method, we generate not only translational stiffness profiles but also rotational stiffness profiles, which are ignored or incomplete in most works on learning variable impedance control. Real-world experiments on a 7-DoF redundant robot manipulator have been conducted to validate the effectiveness of our framework.
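A sketch of demonstration-variance-based stiffness estimation, under stated assumptions: where the demonstrations agree (low variance) the robot should be stiff, and where they vary it should be compliant. The inverse-variance mapping and the clipping bounds below are illustrative; the paper's exact scaling may differ.

```python
# Sketch: map per-axis demonstration variance to a diagonal translational
# stiffness profile (high stiffness where demos are consistent).
import numpy as np

def stiffness_from_demos(demos, k_min=50.0, k_max=1000.0, eps=1e-4):
    """demos: (n_demos, T, d) time-aligned end-effector positions.

    Returns a (T, d) diagonal translational stiffness profile.
    """
    var = demos.var(axis=0)                       # (T, d) per-axis variance
    k = 1.0 / (var + eps)                         # stiffer where consistent
    # Normalize into the admissible stiffness range.
    k = k_min + (k_max - k_min) * (k - k.min()) / (k.max() - k.min() + 1e-12)
    return np.clip(k, k_min, k_max)

demos = np.random.default_rng(0).normal(
    0, [[0.001], [0.05]], (10, 2, 1))             # tight phase, then sloppy
print(stiffness_from_demos(demos).round(1))       # high K first, low K later
```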
Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.
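A sketch of the sample-based MaxEnt IOC objective named above, under stated assumptions: the cost network is trained so demonstrations receive low cost while a background sample set estimates the partition function. The tiny MLP and the uniform background sampler (a stand-in for the paper's policy samples, which would carry importance weights) are illustrative simplifications.

```python
# Sketch: sample-based MaxEnt IOC loss
#   L(theta) = E_demo[c_theta] + log E_samples[exp(-c_theta)]
import torch
import torch.nn as nn

torch.manual_seed(0)
cost_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(cost_net.parameters(), lr=1e-2)

demos = torch.randn(64, 2) * 0.1          # demo states clustered near origin

for step in range(200):
    samples = torch.rand(256, 2) * 4 - 2   # background-distribution samples
    c_demo = cost_net(demos).squeeze(-1)
    c_samp = cost_net(samples).squeeze(-1)
    # Importance weights dropped for the uniform sampler.
    loss = c_demo.mean() + torch.logsumexp(-c_samp, dim=0) \
        - torch.log(torch.tensor(float(len(samples))))
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, demo-like states should cost less than far-away ones.
print(cost_net(torch.zeros(1, 2)).item(),
      cost_net(torch.full((1, 2), 2.0)).item())
```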
Over the past two decades, the robotics community has witnessed the emergence of various motion representations that have been used extensively, particularly in behavior cloning, to compactly encode and generalize skills. Among these, probabilistic approaches have earned a relevant place, owing to their encoding of variations, correlations, and adaptability to new task conditions. However, conditioning such primitives is often cumbersome, since it requires re-optimizing parameters, which frequently involves computationally expensive operations. In this paper, we derive a non-parametric movement primitive formulation that contains a null-space projector. We show that this formulation allows for fast and efficient motion generation with computational complexity O(N²), without involving matrix inversions, whose complexity is O(N³). This is achieved by using the null space to track secondary targets, with a tracking precision determined by the training dataset. Using a 2D example associated with a time input, we show that our non-parametric solution compares favorably with a state-of-the-art parametric approach. For demonstrated skills with high-dimensional inputs, we show that it also permits online adaptation.
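A sketch of the classical null-space projection idea referenced above, under stated assumptions: a secondary target is tracked only in directions that do not disturb the primary task. This illustrates only the textbook projector; the paper's contribution is the non-parametric primitive built around it that avoids the O(N³) inversion.

```python
# Sketch: classical null-space projection of a secondary joint velocity.
import numpy as np

J = np.array([[1.0, 0.5, 0.0]])         # primary-task Jacobian (1 x 3)
J_pinv = np.linalg.pinv(J)
N = np.eye(3) - J_pinv @ J              # null-space projector of the task

dx_primary = np.array([0.2])            # desired primary task velocity
dq_secondary = np.array([0.0, 0.0, 0.5])  # e.g., posture / auxiliary target

dq = J_pinv @ dx_primary + N @ dq_secondary
print("primary task velocity achieved:", J @ dq)   # == dx_primary
print("joint velocity command:", dq)
```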
In this paper, we discuss a framework for teaching bimanual manipulation tasks through imitation. To this end, we present a system and algorithms for learning compliant and contact-rich robot behavior from human demonstrations. The presented system combines insights from admittance control and machine learning to extract control policies that can (a) recover from and adapt to a variety of disturbances in time and space, while also (b) effectively leveraging physical contact with the environment. We demonstrate the effectiveness of our approach using a real-world insertion task involving multiple simultaneous contacts between a manipulated object and insertion pegs. We also investigate efficient means of collecting training data for such bimanual settings. To this end, we conduct a human-subject study and analyze the effort and mental demand reported by the users. Our experiments show that, while harder to provide, the additional force/torque information available in teleoperated demonstrations is crucial for phase estimation and task success. Ultimately, force/torque data substantially improves manipulation robustness, resulting in a 90% success rate in a multipoint insertion task. Code and videos can be found at https://bimanualmanipulation.com/
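A sketch of the admittance-control building block the system combines with learning, under stated assumptions: measured external forces are turned into compliant end-effector motion through a virtual mass-damper-spring. The gains and the single Cartesian axis are illustrative.

```python
# Sketch: 1-D admittance control loop, M*ddx + D*dx + K*(x - x_des) = f_ext.
M, D, K = 2.0, 25.0, 150.0              # virtual mass, damping, stiffness
dt = 0.002
x, dx = 0.0, 0.0                        # one Cartesian axis, for brevity
x_des = 0.0                             # reference from the learned policy

for step in range(1500):
    f_ext = 5.0 if step < 500 else 0.0  # simulated contact force [N]
    ddx = (f_ext - D * dx - K * (x - x_des)) / M
    dx += ddx * dt                      # explicit Euler integration
    x += dx * dt

print(f"steady displacement under 5 N ~ {5.0 / K:.3f} m, final x = {x:.4f} m")
```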
One of the most important challenges in robotics is producing accurate trajectories and controlling their dynamic parameters so that robots can perform different tasks. The ability to provide such motion control is closely related to how such motions are encoded. Advances in deep learning have had a strong impact on the development of novel approaches to dynamic movement primitives. In this work, we survey the scientific literature related to neural dynamic movement primitives, to complement the existing surveys on dynamic movement primitives.
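A minimal sketch of the classical discrete DMP that the surveyed neural variants build on: a stable second-order attractor toward the goal g, shaped by a learned forcing term f(s) driven by a canonical phase system. Here f is left at zero, which reduces the DMP to its underlying point attractor; a learned DMP would regress f from a demonstration.

```python
# Sketch: classical DMP transformation + canonical systems, integrated with
# explicit Euler. Gains follow the common alpha, beta = alpha/4 convention.
alpha, beta, alpha_s, tau = 25.0, 25.0 / 4.0, 3.0, 1.0
dt, g, y, dy, s = 0.001, 1.0, 0.0, 0.0, 1.0

def forcing(s):
    return 0.0   # normally a weighted sum of RBFs regressed from a demo

trajectory = []
for _ in range(int(tau / dt)):
    # Transformation system: tau*ddy = alpha*(beta*(g - y) - dy) + f(s)
    ddy = (alpha * (beta * (g - y) - dy) + forcing(s)) / tau
    dy += ddy * dt
    y += dy * dt
    s += (-alpha_s * s / tau) * dt      # canonical system: tau*ds = -alpha_s*s
    trajectory.append(y)

print(f"y(T) = {trajectory[-1]:.4f}  (converges toward g = {g})")
```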
While visual imitation learning offers one of the most effective ways of learning from visual demonstrations, generalizing from them requires either hundreds of diverse demonstrations, task-specific priors, or large, hard-to-train parametric models. One reason such complexity arises is that standard visual imitation frameworks try to solve two coupled problems at once: learning a succinct but good representation from diverse visual data, while simultaneously learning to associate the demonstrated actions with such representations. Such joint learning causes an interdependence between these two problems, which often results in needing large amounts of demonstrations for learning. To address this challenge, we propose to decouple representation learning from behavior learning for visual imitation. First, we learn a visual representation encoder from offline data using standard supervised and self-supervised learning methods. Once the representations are trained, we use non-parametric locally weighted regression to predict the actions. We experimentally show that this simple decoupling improves the performance of visual imitation models on both offline demonstration datasets and real-robot door opening, compared to prior work in visual imitation. All of our generated data, code, and robot videos are publicly available at https://jyopari.github.io/vinn/.
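A sketch of the non-parametric action prediction step, under stated assumptions: embed the current observation with the frozen pretrained encoder, find the k nearest demonstration embeddings, and average their actions with weights decaying in distance. The random linear "encoder" below is a placeholder for the learned one, and the exponential weighting is one common locally-weighted-regression choice.

```python
# Sketch: k-nearest-neighbor locally weighted regression over embeddings.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 8))              # stand-in for the trained encoder
encoder = lambda obs: obs @ W

demo_obs = rng.standard_normal((500, 32))     # offline demonstration frames
demo_actions = rng.standard_normal((500, 4))
demo_emb = encoder(demo_obs)                  # precomputed once

def predict_action(obs, k=16, temperature=1.0):
    z = encoder(obs)
    d = np.linalg.norm(demo_emb - z, axis=1)  # distances to all demo frames
    idx = np.argsort(d)[:k]                   # k nearest neighbors
    w = np.exp(-d[idx] / temperature)
    w /= w.sum()                              # locally weighted regression
    return w @ demo_actions[idx]

print(predict_action(rng.standard_normal(32)))
```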
In this paper, we present a method for learning stable dynamical systems evolving on Riemannian manifolds. The approach leverages a data-efficient procedure to learn a diffeomorphic transformation that maps simple stable dynamical systems onto complex robotic skills. By exploiting mathematical tools from differential geometry, the method ensures that the learned skills fulfill the geometric constraints imposed by the underlying manifolds, such as unit quaternions (UQ) for orientation and symmetric positive definite (SPD) matrices for stiffness, while preserving convergence to a given target. The proposed approach is first tested in simulation on a public benchmark, obtained by projecting Cartesian data onto the UQ and SPD manifolds, and compared with existing approaches. Beyond the public benchmark, several experiments were performed on a real robot stacking bottles under different conditions and performing a drilling task in collaboration with a human operator. The evaluation shows promising results in terms of learning accuracy and task adaptation capabilities.
Dynamic movement primitives are widely used for learning skills which can be demonstrated to a robot by a skilled human or controller. While their generalization capabilities and simple formulation make them very appealing to use, they possess no strong guarantees to satisfy operational safety constraints for a task. In this paper, we present constrained dynamic movement primitives (CDMP) which can allow for constraint satisfaction in the robot workspace. We present a formulation of a non-linear optimization to perturb the DMP forcing weights regressed by locally-weighted regression to admit a Zeroing Barrier Function (ZBF), which certifies workspace constraint satisfaction. We demonstrate the proposed CDMP under different constraints on the end-effector movement such as obstacle avoidance and workspace constraints on a physical robot. A video showing the implementation of the proposed algorithm using different manipulators in different environments can be found at https://youtu.be/hJegJJkJfys.
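A sketch of the weight-perturbation idea, under stated assumptions: keep the trajectory generated by the regressed weights as close as possible to the original while enforcing a barrier-style state constraint (here, clearance from a circular obstacle standing in for the ZBF condition). A full CDMP would roll out the DMP dynamics; this sketch flattens the motion to a basis-function trajectory for brevity.

```python
# Sketch: minimize ||dw||^2 subject to h(x_t) >= 0 along the trajectory.
import numpy as np
from scipy.optimize import minimize

T, n_basis = 80, 12
t = np.linspace(0, 1, T)
centers = np.linspace(0, 1, n_basis)
Phi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * 0.05 ** 2))

# Nominal 2-D trajectory weights regressed from a straight-line "demo".
demo = np.column_stack([t, t])
w0 = np.linalg.lstsq(Phi, demo, rcond=None)[0]    # (n_basis, 2)

obstacle, radius = np.array([0.5, 0.5]), 0.15     # sits on the nominal path

def traj(dw):
    return Phi @ (w0 + dw.reshape(n_basis, 2))

def barrier(dw):                                  # h(x_t) >= 0 at every step
    return np.linalg.norm(traj(dw) - obstacle, axis=1) - radius

res = minimize(lambda dw: np.sum(dw ** 2), np.zeros(2 * n_basis),
               method='SLSQP',
               constraints=[{'type': 'ineq', 'fun': barrier}])
print("min clearance after optimization:", barrier(res.x).min())
```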
Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy representations and human-supplied demonstrations. Deep reinforcement learning alleviates this limitation by training general-purpose neural network policies, but applications of direct deep reinforcement learning algorithms have so far been restricted to simulated settings and relatively simple tasks, due to their apparent high sample complexity. In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots. We demonstrate that the training times can be further reduced by parallelizing the algorithm across multiple robots which pool their policy updates asynchronously. Our experimental evaluation shows that our method can learn a variety of 3D manipulation skills in simulation and a complex door opening skill on real robots without any prior demonstrations or manually designed representations.
Humans demonstrate a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demonstrations, often fails to capture such behavior. Specifically, commonly used learning algorithms embody inherent contradictions between the learning assumptions (e.g., single optimal action) and actual human behavior (e.g., multiple optimal actions), thereby limiting robot generalizability, applicability, and demonstration feasibility. To address this, this paper proposes designing imitation learning algorithms with a focus on utilizing human behavioral characteristics, thereby embodying principles for capturing and exploiting actual demonstrator behavioral characteristics. This paper presents the first imitation learning framework, Bayesian Disturbance Injection (BDI), that typifies human behavioral characteristics by incorporating model flexibility, robustification, and risk sensitivity. Bayesian inference is used to learn flexible non-parametric multi-action policies, while simultaneously robustifying policies by injecting risk-sensitive disturbances to induce human recovery action and ensuring demonstration feasibility. Our method is evaluated through risk-sensitive simulations and real-robot experiments (e.g., table-sweep task, shaft-reach task and shaft-insertion task) using the UR5e 6-DOF robotic arm, to demonstrate the improved characterization of behavior. Results show significant improvement in task performance, through improved flexibility, robustness as well as demonstration feasibility.
Learning fine-grained movements is among the most challenging topics in robotics, and this holds true especially for robotic hands. Robotic sign language acquisition, or more specifically fingerspelling sign language acquisition in robots, can be considered a specific instance of such a challenge. In this paper, we propose an approach for learning dexterous motor imitation from video examples without any additional information. We build a URDF model of a robotic hand with a single actuator for each joint. By leveraging pre-trained deep vision models, we extract the 3D pose of the hand from RGB videos. Then, using a state-of-the-art reinforcement learning algorithm for motion imitation (namely, proximal policy optimization), we train a policy to reproduce the movement extracted from the demonstrations. We identify the optimal set of hyperparameters to perform imitation based on the reference motion. Additionally, we demonstrate the ability of our approach to generalize over 6 different fingerspelled letters.
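A sketch of the pose-tracking reward typically used for this kind of motion imitation, under stated assumptions: the policy is rewarded for matching the reference joint angles extracted from the video at each timestep. The exponential shaping and the weight are illustrative, not the paper's exact reward.

```python
# Sketch: per-timestep imitation reward for tracking a reference hand pose.
import numpy as np

def imitation_reward(q, q_ref, w_pose=5.0):
    """q, q_ref: joint angles of the hand at the current timestep."""
    return float(np.exp(-w_pose * np.sum((q - q_ref) ** 2)))

q_ref = np.array([0.1, 0.8, 1.2, 0.4])       # frame from the extracted motion
print(imitation_reward(q_ref, q_ref))         # perfect tracking -> 1.0
print(imitation_reward(q_ref + 0.3, q_ref))   # deviation -> lower reward
```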