Recent advances in deep reinforcement learning and robotics have been driven by the availability of increasingly realistic and complex simulated environments. However, many existing platforms provide unrealistic visuals, inaccurate physics, low task complexity, or limited interaction between artificial agents. Furthermore, many platforms lack the ability to flexibly configure the simulation, turning the simulated environment into a black box from the perspective of the learning system. Here we describe a new open-source toolkit for creating and interacting with simulated environments using the Unity platform: the Unity ML-Agents Toolkit. By taking advantage of Unity as a simulation platform, the toolkit enables the development of learning environments that are rich in sensory and physical complexity, provide compelling cognitive challenges, and support dynamic multi-agent interaction. We detail the platform design, the communication protocol, a set of example environments, and the variety of training scenarios made possible by the toolkit.
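As a rough illustration of how a learning system interacts with such an environment from Python, the sketch below uses the mlagents_envs low-level API distributed with the toolkit. The exact class and method names (behavior_specs, get_steps, set_actions, action_spec.random_action) have changed across toolkit releases, so this should be read as an assumed, version-dependent example rather than canonical usage.

```python
# Minimal sketch of driving a Unity ML-Agents environment from Python.
# API names are assumptions for a recent mlagents_envs release and may differ.
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name=None)      # None attaches to a running Unity Editor
env.reset()

behavior_name = list(env.behavior_specs)[0]  # one behavior per agent type
spec = env.behavior_specs[behavior_name]

for _ in range(100):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # Sample a random action for every agent that requested a decision.
    action = spec.action_spec.random_action(len(decision_steps))
    env.set_actions(behavior_name, action)
    env.step()

env.close()
```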
We present the Toribash Learning Environment (ToriLLE), an interface to the video game Toribash for training machine learning agents. Toribash is a MuJoCo-like environment in which two humanoid characters fight each other, controlled by changing the states of their body joints. The competitive nature of Toribash naturally lends itself to two-agent experiments, while the active player base can be used for human baselines. This white paper describes the environment along with its strengths, weaknesses, and limitations, and experimentally demonstrates ToriLLE's suitability as a learning environment by successfully training reinforcement learning agents. The code is available at https://github.com/Miffyli/ToriLLE.
Reinforcement learning (RL) is an area of research that has flourished in recent years and has shown great potential for artificial-intelligence-based opponents in computer games. This success is primarily due to the power of convolutional neural networks, which can extract useful features from noisy and complex data. Games are an excellent tool for testing and pushing the boundaries of novel RL algorithms because they provide valuable insight into how an algorithm performs in an isolated environment without real-world consequences. Real-time strategy (RTS) games are a genre of vast complexity that challenges players in both the short and the long term. Much research focuses on applying RL to RTS games, so new advances can be expected in the near future. To date, however, there are few environments for testing RTS AIs: environments in the literature tend to be either too simplistic, such as microRTS, or complex and without the possibility of accelerated learning on consumer hardware, like StarCraft II. This paper introduces the Deep RTS game environment for testing cutting-edge artificial intelligence algorithms for RTS games. Deep RTS is a high-performance RTS game made specifically for artificial intelligence research. It supports accelerated learning, meaning that it can run up to 50,000 times faster than existing RTS games. Deep RTS has a flexible configuration, enabling research in several different RTS scenarios, including partially observable state spaces and varying map complexity. We show that Deep RTS lives up to our promises by comparing its performance with microRTS, ELF, and StarCraft II on high-end consumer hardware. Using Deep RTS, we find that a Deep Q-Network agent beats a random-play agent more than 70% of the time. Deep RTS is publicly available at https://github.com/cair/DeepRTS.
The recent advances in deep neural networks have led to effective vision-based reinforcement learning methods that have been employed to obtain human-level controllers in Atari 2600 games from pixel data. Atari 2600 games, however, do not resemble real-world tasks since they involve non-realistic 2D environments and the third-person perspective. Here, we propose a novel test-bed platform for reinforcement learning research from raw visual information which employs the first-person perspective in a semi-realistic 3D world. The software, called ViZDoom, is based on the classical first-person shooter video game, Doom. It allows developing bots that play the game using the screen buffer. ViZDoom is lightweight, fast, and highly customizable via a convenient mechanism of user scenarios. In the experimental part, we test the environment by trying to learn bots for two scenarios: a basic move-and-shoot task and a more complex maze-navigation problem. Using convolutional deep neural networks with Q-learning and experience replay, for both scenarios, we were able to train competent bots, which exhibit human-like behaviors. The results confirm the utility of ViZDoom as an AI research platform and imply that visual reinforcement learning in 3D realistic first-person perspective environments is feasible.
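To make the screen-buffer interface concrete, the following is a minimal sketch of ViZDoom's Python API for a move-and-shoot scenario. The config file path is illustrative; ViZDoom ships several scenario .cfg files, and an actual agent would replace the random action choice with a learned policy.

```python
# Minimal ViZDoom interaction loop; the config path is an assumed example.
import random
from vizdoom import DoomGame

game = DoomGame()
game.load_config("scenarios/basic.cfg")   # screen format, buttons, and rewards defined here
game.init()

actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # move left, move right, attack

for episode in range(10):
    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()
        frame = state.screen_buffer            # raw pixels fed to the learner
        reward = game.make_action(random.choice(actions))
    print("Episode reward:", game.get_total_reward())

game.close()
```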
This paper presents the first two editions of the Visual Doom AI Competition, held in 2016 and 2017. The challenge was to create bots that compete in a multi-player deathmatch in the first-person shooter (FPS) game Doom. The bots had to make their decisions based solely on visual information, i.e., the raw screen buffer. To play well, a bot must simultaneously understand its surroundings, navigate, explore, and deal with its opponents. These aspects, together with the competitive multi-agent nature of the game, make the competition a unique platform for evaluating state-of-the-art reinforcement learning algorithms. The paper discusses the rules, the solutions, the results, and statistics that give insight into the agents' behaviors, with the best-performing agents described in more detail. The results of the competition lead to the conclusion that, while reinforcement learning can produce capable Doom bots, they are still unable to successfully compete with humans in this game. The paper also revisits the ViZDoom environment, a flexible, easy-to-use, and efficient 3D platform for research on vision-based reinforcement learning, based on the well-known first-person-perspective game Doom.
We detail the motivation and design decisions underpinning Flow, a computational framework integrating SUMO with the deep reinforcement learning libraries rllab and RLlib, allowing researchers to apply deep reinforcement learning (RL) methods to traffic scenarios, and permitting vehicle and infrastructure control in highly varied traffic environments. Users of Flow can rapidly design a wide variety of traffic scenarios in SUMO, enabling the development of controllers for autonomous vehicles and intelligent infrastructure across a broad range of settings. Flow facilitates the use of policy optimization algorithms to train controllers that can optimize for highly customizable traffic metrics, such as traffic flow or system-wide average velocity. Training reinforcement learning agents using such methods requires a massive amount of data, thus simulator reliability and scalability were major challenges in the development of Flow. A contribution of this work is a variety of practical techniques for overcoming such challenges with SUMO, including parallelizing policy rollouts, smart exception and collision handling, and leveraging subscriptions to reduce computational overhead. To demonstrate the resulting performance and reliability of Flow, we introduce the canonical single-lane ring road benchmark and briefly discuss prior work regarding that task. We then pose a more complex and challenging multi-lane setting and present a trained controller for a single vehicle that stabilizes the system. Flow is an open-source tool and available online at https://github.com/cathywu/flow.
In this article, we review recent Deep Learning advances in the context of how they have been applied to play different types of video games such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in the context of applying these machine learning methods to video games, such as general game playing, dealing with extremely large decision spaces and sparse rewards.
A longstanding goal in character animation is to combine the data-driven specification of behavior with a system that can execute similar behaviors in physical simulation, enabling realistic responses to perturbations and environmental variation. We show that well-known reinforcement learning (RL) methods can be adapted to learn robust control policies capable of imitating a broad range of example motion clips, while also learning complex recoveries, adapting to changes in morphology, and accomplishing user-specified goals. Our method handles keyframed motions, highly dynamic actions such as motion-captured flips and spins, and retargeted motions. By combining an imitation objective with a task objective, we can train characters that react intelligently in interactive settings, e.g., by walking in a desired direction or throwing a ball at a user-specified target. This approach thus combines the convenience and motion quality of using motion clips to define the desired style and appearance with the flexibility and generality afforded by RL methods and physics-based animation. We further explore a number of methods for integrating multiple clips into the learning process to develop multi-skilled agents capable of performing a rich repertoire of diverse skills. We demonstrate results with multiple characters (human, Atlas robot, bipedal dinosaur, dragon) and a large variety of skills, including locomotion, acrobatics, and martial arts.
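The combination of imitation and task objectives described above can be pictured as a weighted reward. The sketch below is a schematic reading of that idea, not the paper's exact reward terms or weights: the imitation term rewards matching the reference clip's pose, the task term rewards progress toward a user goal, and the two are blended.

```python
# Schematic weighted combination of imitation and task rewards (assumed form).
import numpy as np

def imitation_reward(sim_pose, ref_pose, sigma=2.0):
    # Peaks when the simulated character's pose matches the reference motion clip.
    return np.exp(-sigma * np.sum((sim_pose - ref_pose) ** 2))

def task_reward(root_velocity, target_direction):
    # Rewards forward progress along a user-specified heading.
    return float(np.dot(root_velocity, target_direction))

def combined_reward(sim_pose, ref_pose, root_velocity, target_direction,
                    w_imitation=0.7, w_task=0.3):
    # Weighted sum of the imitation and task objectives.
    return (w_imitation * imitation_reward(sim_pose, ref_pose)
            + w_task * task_reward(root_velocity, target_direction))
```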
Composable Controllers for Physics-Based Character Animation. An ambitious goal in the area of physics-based computer animation is the creation of virtual actors that autonomously synthesize realistic human motions and possess a broad repertoire of lifelike motor skills. To this end, the control of dynamic, anthropomorphic figures subject to gravity and contact forces remains a difficult open problem. We propose a framework for composing controllers in order to enhance the motor abilities of such figures. A key contribution of our composition framework is an explicit model of the "pre-conditions" under which motor controllers are expected to function properly. We demonstrate controller composition with preconditions determined not only manually, but also automatically based on Support Vector Machine (SVM) learning theory. We evaluate our composition framework using a family of controllers capable of synthesizing basic actions such as balance, protective stepping when balance is disturbed, protective arm reactions when falling, and multiple ways of standing up after a fall. We furthermore demonstrate these basic controllers working in conjunction with more dynamic motor skills within a two-dimensional and a three-dimensional prototype virtual stuntperson. Our composition framework promises to enable the community of physics-based animation practitioners to more easily exchange motor controllers and integrate them into dynamic characters.
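A learned pre-condition of the kind described above can be pictured as a classifier over start states that predicts whether a controller will succeed. The sketch below illustrates that idea with scikit-learn's SVM; the data, features, and library choice are assumptions for illustration, not the thesis' implementation.

```python
# Illustrative SVM pre-condition: predict whether a controller will succeed from a state.
import numpy as np
from sklearn.svm import SVC

# states: character states (joint angles, velocities, contact flags, ...) -- placeholder data.
# labels: 1 if running the controller from that state succeeded, 0 otherwise.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 12))
labels = (states[:, 0] + 0.5 * states[:, 3] > 0).astype(int)

precondition = SVC(kernel="rbf", gamma="scale").fit(states, labels)

def controller_applicable(state):
    # The composition framework would activate a controller only when its learned
    # pre-condition predicts success from the current state.
    return bool(precondition.predict(state.reshape(1, -1))[0])
```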
Reinforcement learning algorithms can train agents that solve problems in complex, interesting environments. Normally, the complexity of the trained agent is closely related to the complexity of the environment. This suggests that a highly capable agent requires a complex environment for training. In this paper, we point out that a competitive multi-agent environment trained with self-play can produce behaviors that are far more complex than the environment itself. We also point out that such environments come with a natural curriculum, because for any skill level, an environment full of agents of this level will have the right level of difficulty. This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics. The trained agents learn a wide variety of complex and interesting skills, even though the environments themselves are relatively simple. The skills include behaviors such as running, blocking, ducking, tackling, fooling opponents, kicking, and defending using both arms and legs. A highlight of the learned behaviors can be found here: https://goo.gl/eR7fbX.
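The "natural curriculum" of self-play can be sketched as sampling opponents from a pool of the learner's own earlier snapshots, so that opponent strength grows with the agent. The loop below is a schematic of that idea under assumed helpers (env.rollout and train_step are hypothetical), not the paper's training code.

```python
# Schematic self-play loop with opponent sampling from past policy snapshots.
import copy
import random

def self_play_training(env, policy, train_step, snapshot_every=1000, iters=10000):
    opponent_pool = [copy.deepcopy(policy)]
    for it in range(iters):
        opponent = random.choice(opponent_pool)      # older snapshots act as easier opponents
        trajectory = env.rollout(policy, opponent)   # hypothetical two-agent rollout
        train_step(policy, trajectory)               # any policy-optimization update
        if (it + 1) % snapshot_every == 0:
            opponent_pool.append(copy.deepcopy(policy))
    return policy
```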
We introduce TextWorld, a sandbox learning environment for training and evaluating RL agents on text-based games. TextWorld is a Python library that handles the interactive play-through of text games, as well as backend functions such as state tracking and reward assignment. It comes with a curated list of games whose features and challenges we analyze. More importantly, it enables users to handcraft or automatically generate new games. Its generative mechanisms give precise control over the difficulty, scope, and language of constructed games, and can be used to relax challenges inherent to commercial text games such as partial observability and sparse rewards. By generating sets of varied but similar games, TextWorld can also be used to study generalization and transfer learning. We cast text-based games in the reinforcement learning formalism, use our framework to develop a set of benchmark games, and evaluate several baseline agents on this set and on the curated list.
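As a rough sketch of the interactive play-through the library handles, the snippet below starts a game file and issues text commands. The game file path is an assumed example (e.g., one produced by TextWorld's generator), and minor API details may differ between library versions.

```python
# Minimal TextWorld interaction sketch; the game path is an assumed example.
import textworld

env = textworld.start("tw_games/custom_game.ulx")   # hypothetical generated game
game_state = env.reset()
print(game_state.feedback)                           # textual observation shown to the agent

for _ in range(10):
    # A learning agent would choose the command; here it is fixed for illustration.
    game_state, reward, done = env.step("go north")
    print(game_state.feedback, reward)
    if done:
        break
```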
In this paper, we propose ELF, an Extensive, Lightweight and Flexible platform for fundamental reinforcement learning research. Using ELF, we implement a highly customizable real-time strategy (RTS) engine with three game environments (Mini-RTS, Capture the Flag and Tower Defense). Mini-RTS, as a miniature version of StarCraft, captures key game dynamics and runs at 40K frame-per-second (FPS) per core on a laptop. When coupled with modern reinforcement learning methods, the system can train a full-game bot against built-in AIs end-to-end in one day with 6 CPUs and 1 GPU. In addition, our platform is flexible in terms of environment-agent communication topologies, choices of RL methods, changes in game parameters, and can host existing C/C++-based game environments like ALE [4]. Using ELF, we thoroughly explore training parameters and show that a network with Leaky ReLU [17] and Batch Normalization [11] coupled with long-horizon training and progressive curriculum beats the rule-based built-in AI more than 70% of the time in the full game of Mini-RTS. Strong performance is also achieved on the other two games. In game replays, we show our agents learn interesting strategies. ELF, along with its RL platform, is open sourced at https://github.com/facebookresearch/ELF.
Reinforcement learning algorithms rely on carefully engineered environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed dense rewards is not scalable, motivating the need for reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function that uses prediction error as a reward signal. In this paper: (a) we perform the first large-scale study of purely curiosity-driven learning, i.e., without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance, as well as a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many game environments. (b) We investigate the effect of using different feature spaces for computing the prediction error and show that random features are sufficient for many popular RL game benchmarks, but learned features appear to generalize better (e.g., to novel game levels in Super Mario Bros.). (c) We demonstrate limitations of prediction-based rewards in stochastic settings. Game-play videos and code are at https://pathak22.github.io/large-scale-curiosity/.
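The prediction-error reward can be made concrete with a small forward-dynamics model whose error in a feature space is the curiosity bonus. The sketch below is an assumed minimal PyTorch rendering of that idea (network sizes and names are illustrative); freezing the encoder corresponds to the "random features" variant discussed above.

```python
# Minimal curiosity bonus: forward-model prediction error in a (random) feature space.
import torch
import torch.nn as nn

class CuriosityModule(nn.Module):
    def __init__(self, obs_dim, act_dim, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU(),
                                     nn.Linear(feat_dim, feat_dim))
        for p in self.encoder.parameters():          # fixed random features
            p.requires_grad_(False)
        self.forward_model = nn.Sequential(nn.Linear(feat_dim + act_dim, feat_dim),
                                           nn.ReLU(),
                                           nn.Linear(feat_dim, feat_dim))

    def intrinsic_reward(self, obs, action, next_obs):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        phi_pred = self.forward_model(torch.cat([phi, action], dim=-1))
        # Per-transition prediction error serves as the curiosity reward signal.
        return 0.5 * ((phi_pred - phi_next) ** 2).mean(dim=-1)
```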
Recent successes in reinforcement learning have encouraged a rapidly growing network of RL researchers and a number of breakthroughs in RL research. As the RL community and the body of RL work grow, so does the need for widely applicable benchmarks that can fairly and effectively evaluate a variety of RL algorithms. This need is particularly apparent in the area of Hierarchical Reinforcement Learning (HRL). While many existing test domains exhibit hierarchical action or state structure, modern RL algorithms still have great difficulty solving domains that require hierarchical modeling and action planning, even when these domains appear trivial. These difficulties highlight the need for more focus on HRL algorithms themselves, and for new testbeds that encourage and validate HRL research. Existing HRL testbeds suffer from a Goldilocks problem: they are typically either too simple (e.g., Taxi) or too complex (e.g., Montezuma's Revenge from the Arcade Learning Environment). In this paper we present the Escape Room Domain (ERD), a new flexible, scalable, and fully implemented HRL testing domain that fills the "moderate complexity" gap left by existing alternatives. ERD is open-source, freely available via GitHub, and conforms to widely used public testing interfaces, allowing simple integration and testing with a variety of public RL agent implementations. We show that ERD presents a suite of challenges with scalable difficulty, providing a smooth learning gradient from Taxi to the Arcade Learning Environment.
This paper presents the framework, rules, games, controllers and results of the first General Video Game Playing Competition, held at the IEEE Conference on Computational Intelligence and Games in 2014. The competition proposes the challenge of creating controllers for general video game play, where a single agent must be able to play many different games, some of them unknown to the participants at the time of submitting their entries. This test can be seen as an approximation of General Artificial Intelligence, as the amount of game-dependent heuristics needs to be severely limited. The games employed are stochastic real-time scenarios (where the time budget to provide the next action is measured in milliseconds) with different winning conditions, scoring mechanisms, sprite types and available actions for the player. It is a responsibility of the agents to discover the mechanics of each game, the requirements to obtain a high score and the requisites to finally achieve victory. This paper describes all controllers submitted to the competition, with an in-depth description of four of them by their authors, including the winner and the runner-up entries of the contest. The paper also analyzes the performance of the different approaches submitted, and finally proposes future tracks for the competition.
Recently, researchers have made significant progress in combining deep learning of feature representations with reinforcement learning. Notable examples include training agents to play Atari games from raw pixel data and acquiring advanced manipulation skills from raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks such as cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on a systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and the reference implementations are released at https://github.com/rllab/rllab to facilitate experimental reproducibility and to encourage adoption by other researchers.
Through many recent successes in simulation, model-free reinforcement learning has emerged as a promising approach for solving continuous-control robotic tasks. Thanks to open-source implementations of learning algorithms and simulated benchmark tasks, research teams can now quickly reproduce, analyze, and build upon these results. To carry these successes over to real-world applications, it is crucial to avoid relying on the unique advantages of simulation that do not transfer to the real world, and to experiment directly with physical robots. However, reinforcement learning research with physical robots faces substantial resistance due to the lack of benchmark tasks and supporting source code. In this work, we introduce several reinforcement learning tasks with multiple commercially available robots, covering different levels of learning difficulty, setup, and repeatability. On these tasks, we test the learning performance of off-the-shelf implementations of four reinforcement learning algorithms and analyze their sensitivity to their hyper-parameters, to determine their readiness for application to various real-world tasks. Our results show that, with careful design of the task interface and computation, some of these implementations can be readily applied to physical robots. We find that state-of-the-art learning algorithms are highly sensitive to their hyper-parameters and that their relative ordering does not transfer across tasks, suggesting the need to re-tune them for each task to obtain the best performance. On the other hand, the best hyper-parameter configuration from one task often leads to effective learning on held-out tasks, even with different robots, providing a reasonable default. We make the benchmark tasks publicly available to improve reproducibility in real-world reinforcement learning.
In current state-of-the-art commercial first-person shooter games, computer-controlled bots (also known as non-player characters) can often be easily distinguished from human players. Telltale signs such as failed navigation, "sixth-sense" knowledge of human players' whereabouts, and deterministic, scripted behaviors are some of the reasons. We propose, however, that one of the biggest indicators of non-human behavior in these games can be found in the bot's weapon-shooting capability. Consistently perfect accuracy and the ability to "lock on" to an opponent in view from any distance are capabilities of bots that are absent from human play. Traditionally, bots are handicapped in some way, with a time delay or a random perturbation of their aim that does not adapt or improve its technique over time. We hypothesize that letting bots learn the skill of shooting through trial and error, in the same way a human player does, will lead to greater variation in game-play and produce less predictable non-player characters. This paper describes a reinforcement-learning-based shooting mechanism that adapts shooting over time based on a dynamic reward signal derived from the amount of damage inflicted on opponents.
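A damage-driven shooting reward can be pictured with a small tabular Q-learning update over discretized aim adjustments, as sketched below. The state/action discretization and parameters are assumptions made for illustration and are not the paper's exact formulation.

```python
# Illustrative Q-learning update where the reward is damage inflicted since the last action.
import random
from collections import defaultdict

ACTIONS = ["aim_left", "aim_right", "aim_up", "aim_down", "fire"]
q_table = defaultdict(float)                 # (state, action) -> value
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def choose_action(state):
    # Epsilon-greedy over discretized aim adjustments and firing.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action, damage_dealt, next_state):
    # Damage dealt is the dynamic reward, so accuracy is learned by trial and error
    # rather than scripted aim-lock.
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += alpha * (damage_dealt + gamma * best_next
                                         - q_table[(state, action)])
```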
Little innovation has been made to the low-level attitude flight control used by unmanned aerial vehicles, which still predominantly relies on classical PID controllers. In this work we introduce Neuroflight, the first open-source neural-network-based flight controller firmware. We present our toolchain for training a neural network in simulation and compiling it to run on embedded hardware, and discuss the challenges encountered in jumping from simulation to reality, along with our solutions. Our evaluation shows that the neural network can execute at over 2.67 kHz on an Arm Cortex-M7 processor, and flight tests demonstrate that a quadcopter running Neuroflight can achieve stable flight and perform aerobatic maneuvers.
Legged robots pose one of the greatest challenges in robotics. The dynamic and agile maneuvers of animals cannot be imitated by existing human-crafted methods. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. However, reinforcement learning research for legged robots has so far been largely limited to simulation, and only a few, comparatively simple examples have been deployed on real systems. The main reason is that training with real robots, particularly with dynamically balancing systems, is complicated and expensive. In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes. The approach is applied to the ANYmal robot, a sophisticated, medium-dog-sized quadrupedal system. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what had been achieved with prior methods: ANYmal is capable of precisely and energy-efficiently following high-level body velocity commands, running faster than before, and recovering from falls even in complex configurations.