In reinforcement learning (RL) research, simulations enable benchmarking of algorithms, as well as prototyping and hyper-parameter tuning of agents. To promote RL both in research and real-world applications, frameworks are required that are, on the one hand, efficient at running experiments as fast as possible. On the other hand, they must be flexible enough to allow the integration of newly developed optimization techniques, e.g. new RL algorithms, which are continuously put forward by an active research community. In this paper, we introduce Karolos, an RL framework developed for robotic applications, with a particular focus on transfer scenarios with varying robot-task combinations, reflected in a modular environment architecture. In addition, we provide implementations of state-of-the-art RL algorithms along with common learning-facilitating enhancements, as well as an architecture to parallelize environments across multiple processes to significantly speed up experiments. The code is open source and published on GitHub with the aim of promoting research on RL applications in robotics.
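The parallelization approach can be illustrated with a minimal sketch in which each environment runs in its own process and the trainer exchanges steps over pipes. This is a generic illustration of the idea, not the Karolos architecture itself; all names below are ours.

```python
import multiprocessing as mp
import gym

def env_worker(conn, env_id):
    """Run one environment in its own process, serving step/close requests."""
    env = gym.make(env_id)
    conn.send(env.reset())
    while True:
        cmd, data = conn.recv()
        if cmd == "step":
            obs, reward, done, info = env.step(data)
            if done:
                obs = env.reset()
            conn.send((obs, reward, done, info))
        elif cmd == "close":
            env.close()
            break

if __name__ == "__main__":
    env_id, n_envs = "Pendulum-v1", 4
    template = gym.make(env_id)          # only used to sample actions here
    conns, procs = [], []
    for _ in range(n_envs):
        parent, child = mp.Pipe()
        p = mp.Process(target=env_worker, args=(child, env_id))
        p.start()
        conns.append(parent)
        procs.append(p)
    _ = [c.recv() for c in conns]        # collect initial observations

    for c in conns:                      # one synchronous "vectorized" step
        c.send(("step", template.action_space.sample()))
    results = [c.recv() for c in conns]

    for c in conns:
        c.send(("close", None))
    for p in procs:
        p.join()
```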
It is well known that it is difficult to have a reliable and robust framework to bridge multi-agent deep reinforcement learning algorithms and practical multi-robot applications. To fill this gap, we propose and build an open-source framework for multi-robot systems called MultiRoboLearn. This framework provides a unified setup for simulation and real-world applications. It aims to provide standard, easy-to-use simulated scenarios that can also be easily deployed to real-world multi-robot environments. In addition, the framework provides researchers with a benchmark system for comparing the performance of different reinforcement learning algorithms. We demonstrate the generality, scalability, and capability of the framework using different types of multi-agent deep reinforcement learning algorithms in both discrete and continuous action spaces.
This paper presents panda-gym, a set of reinforcement learning (RL) environments for the Franka Emika Panda robot integrated with OpenAI Gym. Five tasks are included: reach, push, slide, pick & place, and stack. They all follow a multi-goal RL framework, allowing the use of goal-oriented RL algorithms. To foster open research, we chose to use the open-source physics engine PyBullet. The implementation chosen for this package makes it very easy to define new tasks or new robots. The paper also presents results obtained with state-of-the-art model-free off-policy algorithms. panda-gym is open source and freely available at https://github.com/qgallouedec/panda-gym.
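A typical interaction loop follows the standard Gym API. The sketch below assumes the environment IDs of the panda-gym v2 release; names may differ between versions.

```python
import gym
import panda_gym  # noqa: F401  (importing registers the Panda environments with Gym)

# Assumed v2-style IDs; the other tasks follow the same pattern
# (PandaPush-v2, PandaSlide-v2, PandaPickAndPlace-v2, PandaStack-v2).
env = gym.make("PandaReach-v2")

obs = env.reset()  # dict with "observation", "achieved_goal", "desired_goal"
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a goal-conditioned policy
    obs, reward, done, info = env.step(action)
env.close()
```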
To avoid conventional control methods, which create obstacles due to the complexity of the systems and their intense demands on data density, the development of modern and more efficient control methods is required. Here, off-policy, model-free reinforcement learning algorithms help avoid working with complex models. In terms of speed and accuracy, they have become prominent methods because the algorithms use their past experience to learn optimal policies. In this study, three reinforcement learning algorithms, DDPG, TD3, and SAC, have been used to train a Fetch robotic manipulator on four different tasks in the MuJoCo simulation environment. All of these algorithms are off-policy and able to achieve their desired targets by optimizing both policy and value functions. In the current study, the efficiency and speed of these three algorithms are analyzed in a controlled environment.
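As a hedged illustration of this setup (not the authors' code), an off-policy agent such as SAC can be trained on a goal-conditioned Fetch task with stable-baselines3; the environment ID and hyper-parameters below are assumptions, and TD3 or DDPG can be swapped in the same way.

```python
import gym
from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make("FetchReach-v1")  # goal-conditioned Fetch manipulation task

model = SAC(
    "MultiInputPolicy",                   # handles the dict observations of Fetch tasks
    env,
    replay_buffer_class=HerReplayBuffer,  # hindsight relabeling for sparse rewards
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    verbose=1,
)
model.learn(total_timesteps=200_000)
```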
Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns, this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents built from simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions and as an important contribution to reproducibility in RL research. In this work, we describe the major design decisions made within Acme and give further details on how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms and show how these algorithms can be scaled up for much larger and more complex environments. This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that run at large scale while still maintaining the inherent readability of the implementation. This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation, and learning-from-demonstrations algorithms, as well as various new agents implemented as part of Acme.
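The modular design shows up most clearly in Acme's agent-environment loop. The sketch below follows the library's documented API, with a trivial random actor standing in for a learned agent (e.g. one of Acme's reference implementations).

```python
import acme
import numpy as np
from acme import core, wrappers
from dm_control import suite


class RandomActor(core.Actor):
    """Minimal Actor that samples uniform actions; a stand-in for a learned agent."""

    def __init__(self, action_spec):
        self._spec = action_spec

    def select_action(self, observation):
        return np.random.uniform(
            self._spec.minimum, self._spec.maximum, self._spec.shape
        ).astype(self._spec.dtype)

    def observe_first(self, timestep):
        pass  # a learning agent would record the first timestep here

    def observe(self, action, next_timestep):
        pass  # ... and add transitions to its replay here

    def update(self, wait=False):
        pass  # ... and take learner steps here


environment = wrappers.SinglePrecisionWrapper(suite.load("cartpole", "balance"))
actor = RandomActor(environment.action_spec())

loop = acme.EnvironmentLoop(environment, actor)  # the core Acme interaction loop
loop.run(num_episodes=10)
```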
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we provide a systematic evaluation and comparison of three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, value decomposition) in a diverse range of cooperative multi-agent learning tasks. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks, and we provide insights regarding the effectiveness of different learning approaches. We open-source EPyMARL, which extends the PyMARL codebase to include additional algorithms and allows for flexible configuration of algorithm implementation details such as parameter sharing. Finally, we open-source two environments for multi-agent research which focus on coordination under sparse rewards.
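Parameter sharing, one of the implementation details EPyMARL makes configurable, can be illustrated with a generic PyTorch sketch (this is conceptual, not EPyMARL code):

```python
import torch.nn as nn

n_agents, obs_dim, n_actions = 3, 12, 5

def make_policy():
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

# Parameter sharing: every agent queries the same network, so one set of
# weights receives gradients from all agents' experience.
shared_policy = make_policy()
shared_agents = [shared_policy] * n_agents

# No parameter sharing: each agent learns its own independent network.
independent_agents = [make_policy() for _ in range(n_agents)]
```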
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
Humans commonly solve complex problems by decomposing them into easier subproblems and then combining the subproblem solutions. This type of compositional reasoning permits the reuse of subproblem solutions when tackling future tasks that share part of the underlying compositional structure. In a continual or lifelong reinforcement learning (RL) setting, this ability to decompose knowledge into reusable components would enable agents to quickly learn new RL tasks by leveraging accumulated compositional structures. We explore a particular form of composition based on neural modules and present a set of RL problems that intuitively admit compositional solutions. Empirically, we demonstrate that neural composition indeed captures the underlying structure of this space of problems. We further propose a compositional lifelong RL method that leverages accumulated neural components to accelerate the learning of future tasks while retaining performance on previous tasks via offline RL over replayed experiences.
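The idea of composing neural modules can be conveyed with a generic sketch (an illustration of the concept, not the paper's architecture):

```python
import torch
import torch.nn as nn

class ModularPolicy(nn.Module):
    """Composes a task policy from reusable modules; modules learned on earlier
    tasks can be recombined for new tasks that share part of the structure."""

    def __init__(self, perception: nn.Module, skill: nn.Module):
        super().__init__()
        self.perception = perception  # e.g. reused across tasks with the same sensors
        self.skill = skill            # e.g. reused across tasks with the same objective

    def forward(self, obs):
        return self.skill(self.perception(obs))

# A new task reuses previously learned modules in a new combination.
perception_a = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
skill_b = nn.Linear(32, 4)
policy = ModularPolicy(perception_a, skill_b)
action_logits = policy(torch.randn(1, 16))
```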
Profile extrusion is a continuous production process for manufacturing plastic profiles from molten polymer. Especially interesting is the design of the die, through which the melt is pressed to attain the desired shape. However, due to an inhomogeneous velocity distribution at the die exit or residual stresses inside the extrudate, the final shape of the manufactured part often deviates from the desired one. To avoid these deviations, the shape of the die can be computationally optimized, which has already been investigated in the literature using classical optimization approaches. A new approach in the field of shape optimization is the utilization of Reinforcement Learning (RL) as a learning-based optimization algorithm. RL is based on trial-and-error interactions of an agent with an environment. For each action, the agent is rewarded and informed about the subsequent state of the environment. While not necessarily superior to classical optimization algorithms, e.g., gradient-based or evolutionary ones, for one single problem, RL techniques are expected to perform especially well when similar optimization tasks are repeated, since the agent learns a more general strategy for generating optimal shapes instead of concentrating on just one single problem. In this work, we investigate this approach by applying it to two 2D test cases. The flow-channel geometry can be modified by the RL agent using so-called Free-Form Deformation, a method where the computational mesh is embedded into a transformation spline, which is then manipulated based on the control-point positions. In particular, we investigate the impact of utilizing different agents on the training progress and the potential for wall-time savings by utilizing multiple environments during training.
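A minimal 2D Free-Form Deformation is easy to write down explicitly. The sketch below (our illustration, not the authors' implementation) displaces mesh points embedded in a Bézier control lattice using Bernstein polynomials:

```python
import numpy as np
from scipy.special import comb

def ffd_2d(points, control_offsets):
    """Deform 2D points in the unit square by displacing a Bezier control lattice.

    points:          (N, 2) coordinates normalized to [0, 1]^2.
    control_offsets: (L+1, M+1, 2) displacements of the lattice control points.
    """
    L, M = control_offsets.shape[0] - 1, control_offsets.shape[1] - 1
    s, t = points[:, 0], points[:, 1]
    deformed = points.copy()
    for i in range(L + 1):
        bi = comb(L, i) * (1 - s) ** (L - i) * s**i      # Bernstein basis in s
        for j in range(M + 1):
            bj = comb(M, j) * (1 - t) ** (M - j) * t**j  # Bernstein basis in t
            deformed += (bi * bj)[:, None] * control_offsets[i, j]
    return deformed

# Example: an RL action displaces one control point, bending the flow channel.
pts = np.random.rand(100, 2)
offsets = np.zeros((3, 3, 2))
offsets[1, 1] = [0.0, 0.1]   # push the central control point upward
new_pts = ffd_2d(pts, offsets)
```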
There is an ever-increasing demand for using unmanned aerial vehicles (UAVs), i.e., drones, in different applications such as package delivery, traffic monitoring, search and rescue operations, and military combat engagements. In all of these applications, the UAV is used to navigate the environment autonomously, without human interaction, to perform specific tasks and avoid obstacles. Autonomous UAV navigation is commonly accomplished using reinforcement learning (RL), where agents act as experts in a domain, navigating the environment while avoiding obstacles. Understanding the navigation environment and the algorithmic limitations plays a crucial role in choosing the appropriate RL algorithm to solve the navigation problem effectively. Consequently, this study first identifies the main UAV navigation tasks and discusses navigation frameworks and simulation software. Next, RL algorithms are classified and discussed based on the environment, algorithm characteristics, capabilities, and applications in different UAV navigation problems, which will help practitioners and researchers select the appropriate RL algorithms for their UAV navigation use cases. Moreover, the identified gaps and opportunities will drive UAV navigation research.
Progress in continual reinforcement learning has been limited due to several barriers to entry: missing code, high compute requirements, and a lack of suitable benchmarks. In this work, we present CORA, a platform for Continual Reinforcement Learning Agents that provides benchmarks, baselines, and metrics in a single code package. The benchmarks we provide are designed to evaluate different aspects of the continual RL challenge, such as catastrophic forgetting, plasticity, ability to generalize, and sample-efficient learning. Three of the benchmarks utilize video game environments (Atari, Procgen, NetHack). The fourth benchmark, CHORES, consists of four different task sequences in a visually realistic home simulator, drawn from a diverse set of task and scene parameters. To compare continual RL methods on these benchmarks, we prepare three metrics in CORA: Continual Evaluation, Isolated Forgetting, and Zero-Shot Forward Transfer. Finally, CORA includes a set of performant, open-source baselines of existing algorithms for researchers to use and expand on. We release CORA and hope that the continual RL community can benefit from our contributions, to accelerate the development of new continual RL algorithms.
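The flavor of continual evaluation can be conveyed with a schematic loop; this is a generic sketch, not CORA's actual API or metric definitions.

```python
from typing import Callable, List

def continual_evaluation(
    train: Callable[[str], None],      # trains the agent on one task
    evaluate: Callable[[str], float],  # mean return of the agent on one task
    task_sequence: List[str],
):
    """After training on each task, evaluate on *all* tasks in the sequence:
    earlier tasks probe forgetting, unseen tasks probe zero-shot transfer."""
    history = []
    for task in task_sequence:
        train(task)
        history.append({t: evaluate(t) for t in task_sequence})
    return history

# Toy usage with stand-in callables:
seen = set()
history = continual_evaluation(
    train=lambda t: seen.add(t),
    evaluate=lambda t: 1.0 if t in seen else 0.0,
    task_sequence=["atari-1", "atari-2", "procgen-1"],
)
```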
We present CompoSuite, an open-source simulated robotic manipulation benchmark for compositional multi-task reinforcement learning (RL). Each CompoSuite task requires a particular robot arm to manipulate one individual object to achieve a task objective while avoiding an obstacle. This compositional definition of the tasks endows CompoSuite with two remarkable properties. First, varying the robot/object/objective/obstacle elements leads to hundreds of RL tasks, each of which requires a meaningfully different behavior. Second, RL approaches can be evaluated specifically for their capability to learn the compositional structure of the tasks. This latter capability to functionally decompose problems would enable intelligent agents to identify and exploit commonalities between learning tasks to handle large varieties of highly diverse problems. We benchmark existing single-task, multi-task, and compositional learning algorithms in various training settings, and assess their capability to compositionally generalize to unseen tasks. Our evaluation exposes the shortcomings of existing RL approaches with respect to compositionality and opens new avenues for investigation.
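The compositional definition means the benchmark is effectively a Cartesian product over four axes. The element names below are placeholders for illustration; the real lists ship with the CompoSuite package.

```python
from itertools import product

# Placeholder element names; the actual elements are defined in CompoSuite.
robots = ["robot_a", "robot_b", "robot_c", "robot_d"]
objects = ["box", "plate", "dumbbell", "hollow_box"]
objectives = ["pick_place", "push", "shelf", "trashcan"]
obstacles = ["none", "wall", "door", "goal_wall"]

tasks = list(product(robots, objects, objectives, obstacles))
print(len(tasks))  # 4 * 4 * 4 * 4 = 256 distinct tasks from 16 elements
```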
Sparse-reward learning is generally inefficient in reinforcement learning (RL). Hindsight Experience Replay (HER) has been shown to be an effective solution for the low sample efficiency caused by sparse rewards, thanks to goal relabeling. However, HER still has an implicit virtual-positive sparse reward problem caused by already-achieved goals, especially for robot manipulation tasks. To address this problem, we propose a novel model-free continual RL algorithm, called Relay-HER (RHER). The proposed method first decomposes and rearranges the original long-horizon task into new sub-tasks of incremental complexity. Subsequently, a multi-task network is designed to learn the sub-tasks in ascending order of complexity. To solve the virtual-positive sparse reward problem, we propose a Random-Mixed Exploration Strategy (RME), in which the achieved goals of the higher-complexity sub-tasks are quickly changed under the guidance of the lower-complexity ones. Experimental results indicate significant improvements in the sample efficiency of RHER compared to vanilla HER in five typical robot manipulation tasks, including Push, PickAndPlace, Drawer, Insert, and ObstaclePush. The proposed RHER method was also applied to a contact-rich push task on a physical robot, trained from scratch, reaching a success rate of 10/10 using only 250 episodes.
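One plausible reading of the random-mixed exploration strategy, sketched as hedged pseudocode (this is our interpretation of the abstract, not the authors' implementation):

```python
import random

def random_mixed_exploration(sub_policies, obs, stage, eps=0.3):
    """With probability eps, act with a lower-complexity sub-task policy so the
    harder sub-task quickly reaches informative achieved goals; otherwise act
    with the current stage's policy. An interpretation, not the paper's code."""
    if stage > 0 and random.random() < eps:
        guide = random.randrange(stage)   # pick one of the easier sub-task policies
        return sub_policies[guide](obs)
    return sub_policies[stage](obs)

# Toy usage with stand-in policies ordered by complexity (reach < grasp < push):
policies = [lambda o: "reach", lambda o: "grasp", lambda o: "push"]
action = random_mixed_exploration(policies, obs=None, stage=2)
```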
This paper presents a benchmarking study of some state-of-the-art reinforcement learning algorithms for solving two simulated vision-based robotics problems. The algorithms considered in this study include soft actor-critic (SAC), proximal policy optimization (PPO), interpolated policy gradient (IPG), and their variants with Hindsight Experience Replay (HER). The performance of these algorithms is compared on two PyBullet simulation environments, known as KukaDiverseObjectEnv and RacecarZEDGymEnv. The state observations in these environments are provided in the form of RGB images and the action space is continuous, making them difficult to solve. A number of strategies are suggested to provide the intermediate hindsight goals required for implementing the HER algorithm on these problems, which are essentially single-goal environments. In addition, a number of feature extraction architectures are proposed to incorporate spatial and temporal attention into the learning process. Through rigorous simulation experiments, the improvements achieved with these components are established. To the best of our knowledge, this is the first benchmarking study of its kind on vision-based robotic problems, making it a novel contribution to the field.
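A common way to manufacture the intermediate hindsight goals that HER needs in a single-goal environment is final-state relabeling; the generic sketch below illustrates the mechanism, not the paper's specific strategies:

```python
def relabel_with_final_state(episode, compute_reward):
    """episode: list of (obs, action, achieved_goal, next_obs) tuples.

    The goal achieved at the episode's final step is treated as if it had been
    the desired goal all along, yielding useful positive-reward transitions."""
    final_goal = episode[-1][2]
    relabeled = []
    for obs, action, achieved, next_obs in episode:
        reward = compute_reward(achieved, final_goal)  # e.g. 0/-1 sparse reward
        relabeled.append((obs, final_goal, action, reward, next_obs))
    return relabeled
```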
This paper details our first-phase submission to the Real Robot Challenge 2021: a challenge in which a three-fingered robot must carry a cube along specified goal trajectories. To solve Phase 1, we use a pure reinforcement learning approach which requires minimal expert knowledge of the robotic system or of robotic grasping. A sparse, goal-based reward is employed in conjunction with Hindsight Experience Replay to teach the control policy to move the cube to the desired x and y coordinates of the goal. Simultaneously, a dense distance-based reward is employed to teach the policy to lift the cube to the z coordinate (the height component) of the goal. The policy is trained in simulation with domain randomization before being transferred to the real robot for evaluation. Although performance tends to deteriorate after this transfer, our best policy can successfully lift the real cube along goal trajectories via an effective pinch grasp. Our approach outperforms all other submissions, including those leveraging more traditional robotic control techniques, and is the first purely learning-based approach to solve this challenge.
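The described reward decomposition can be written compactly. The sketch below is our reading of the abstract; the tolerance and weight are assumptions.

```python
import numpy as np

def reward(cube_pos, goal_pos, xy_tol=0.02, z_weight=1.0):
    """Sparse term: success once the cube's (x, y) match the goal's (x, y)
    (the part relabeled by HER). Dense term: negative distance between the
    cube's height and the goal height. Tolerance and weight are assumed."""
    xy_success = float(np.linalg.norm(cube_pos[:2] - goal_pos[:2]) < xy_tol)
    z_shaping = -abs(cube_pos[2] - goal_pos[2])
    return xy_success + z_weight * z_shaping

r = reward(np.array([0.0, 0.0, 0.05]), np.array([0.01, 0.0, 0.10]))
```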
In recent years, the importance of robotic manipulation and control has increased. However, state-of-the-art techniques still have limitations when operation is required in real-world applications. This paper explores Hindsight Experience Replay in both simulated and real environments, highlighting its weaknesses and proposing reinforcement-learning-based alternatives based on reward and goal shaping. In addition, several research questions are identified, along with potential research directions that could be explored to address them.
Meta-reinforcement learning algorithms can enable robots to acquire new skills much more quickly, by leveraging prior experience to learn how to learn. However, much of the current research on meta-reinforcement learning focuses on task distributions that are very narrow. For example, a commonly used meta-reinforcement learning benchmark uses different running velocities for a simulated robot as different tasks. When policies are meta-trained on such narrow task distributions, they cannot possibly generalize to more quickly acquire entirely new tasks. Therefore, if the aim of these methods is to enable faster acquisition of entirely new behaviors, we must evaluate them on task distributions that are sufficiently broad to enable generalization to new behaviors. In this paper, we propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. Our aim is to make it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks. We evaluate 7 state-of-the-art meta-reinforcement learning and multi-task learning algorithms on these tasks. Surprisingly, while each task and its variations (e.g., with different object positions) can be learned with reasonable success, these algorithms struggle to learn with multiple tasks at the same time, even with as few as ten distinct training tasks. Our analysis and open-source environments pave the way for future research in multi-task learning and meta-learning that can enable meaningful generalization, thereby unlocking the full potential of these methods.
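Usage follows the benchmark's published API; the sketch below mirrors the ML1 example from the Meta-World repository (task names and the step/reset signatures may differ between releases):

```python
import random
import metaworld

ml1 = metaworld.ML1("pick-place-v2")          # one task family with many variations
env = ml1.train_classes["pick-place-v2"]()    # construct the environment
env.set_task(random.choice(ml1.train_tasks))  # sample a variation (object/goal positions)

obs = env.reset()
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
```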
Mastering robotic manipulation skills through reinforcement learning (RL) typically requires the design of shaped reward functions. Recent developments in this area have demonstrated that using sparse rewards, i.e., rewarding the agent only upon successful completion of a task, can lead to better policies. However, state-action space exploration is more difficult in this case. Recent approaches to RL with sparse rewards have leveraged high-quality human demonstrations for the task, but these can be costly, time-consuming, or even impossible to obtain. In this paper, we propose a novel and effective approach that does not require human demonstrations. We observe that every robotic manipulation task can be seen as involving a locomotion task from the perspective of the object being manipulated, i.e., the object can learn how to reach the target state on its own. To leverage this idea, we introduce a framework that initially obtains an object motion policy using a realistic physics simulator. This policy is then used to generate auxiliary rewards, called simulated locomotion demonstration rewards (SLDRs), which enable us to learn the robot manipulation policy. The proposed approach has been evaluated on 13 tasks of increasing complexity, and it can achieve higher success rates and faster learning rates compared to alternative algorithms. SLDRs are especially beneficial for tasks like multi-object stacking and non-rigid object manipulation.
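One plausible form of such an auxiliary reward, under our own assumptions (the paper defines SLDRs precisely; this is only an illustration): the object-motion policy provides a reference state, and the robot is rewarded for making the object track it.

```python
import numpy as np

def sldr_style_reward(object_state, reference_state, scale=10.0):
    """Illustrative auxiliary reward: how closely the manipulated object follows
    the state the pre-trained object-motion policy would have produced.
    The exact SLDR formulation is in the paper; this form is an assumption."""
    return np.exp(-scale * np.linalg.norm(object_state - reference_state) ** 2)
```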
Deformable object manipulation tasks have long been regarded as challenging robotic problems. However, until recently there was little work on this topic, with most robotic manipulation methods being developed for rigid objects. Deformable objects are more difficult to model and simulate, which has limited the use of model-free reinforcement learning (RL) strategies, due to their need for large amounts of data that can only be satisfied in simulation. This paper proposes a new shape-control task for deformable linear objects (DLOs). More notably, we present the first study of the effects of elastoplastic properties on this type of problem. Objects with elastoplasticity, such as metal wires, are found in various applications and are challenging due to their nonlinear behavior. We first highlight the challenges of solving such a manipulation task from an RL perspective, particularly in defining the reward. Then, based on concepts from differential geometry, we propose an intrinsic shape representation using discrete curvature and torsion. Finally, we demonstrate through an empirical study that, in order to successfully solve the proposed task using Deep Deterministic Policy Gradient (DDPG), the reward needs to include intrinsic information about the shape of the DLO.
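Discrete curvature and torsion of a DLO represented as a polyline of 3D points can be computed with standard discrete differential geometry; the numpy sketch below uses a common discretization, which may differ in detail from the paper's:

```python
import numpy as np

def discrete_curvature_torsion(points):
    """Approximate curvature and torsion along a 3D polyline of shape (N, 3).

    Curvature: turning angle between consecutive edges over the mean edge length.
    Torsion: signed angle between consecutive binormals over the edge length."""
    e = np.diff(points, axis=0)                    # edge vectors, (N-1, 3)
    el = np.linalg.norm(e, axis=1)                 # edge lengths
    t = e / el[:, None]                            # unit tangents
    b = np.cross(t[:-1], t[1:])                    # binormal directions, (N-2, 3)
    sin_k = np.linalg.norm(b, axis=1)
    cos_k = np.einsum("ij,ij->i", t[:-1], t[1:])
    kappa = np.arctan2(sin_k, cos_k) / ((el[:-1] + el[1:]) / 2)
    bn = b / np.clip(sin_k, 1e-12, None)[:, None]  # unit binormals
    sin_tau = np.einsum("ij,ij->i", np.cross(bn[:-1], bn[1:]), t[1:-1])
    cos_tau = np.einsum("ij,ij->i", bn[:-1], bn[1:])
    tau = np.arctan2(sin_tau, cos_tau) / el[1:-1]
    return kappa, tau

pts = np.cumsum(np.random.randn(20, 3) * 0.1, axis=0)  # a random wiggly wire
kappa, tau = discrete_curvature_torsion(pts)
```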
skrl is an open-source modular library for reinforcement learning written in Python, designed with a focus on readability, simplicity, and transparency of algorithm implementations. In addition to supporting environments that use the traditional OpenAI Gym and DeepMind interfaces, it provides facilities to load, configure, and operate NVIDIA Isaac Gym and NVIDIA Omniverse Isaac Gym environments. Furthermore, it enables the simultaneous training of several agents with customizable scopes (subsets of all available environments) in the same run, which may or may not share resources. The library's documentation can be found at https://skrl.readthedocs.io and its source code is available at https://github.com/toni-sm/skrl.