With the breakthrough of AlphaGo, deep reinforcement learning becomes a recognized technique for solving sequential decision-making problems. Despite its reputation, data inefficiency caused by its trial and error learning mechanism makes deep reinforcement learning hard to be practical in a wide range of areas. Plenty of methods have been developed for sample efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications, amongst which, distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming, and intelligent transportation. In this paper, we conclude the state of this exciting field, by comparing the classical distributed deep reinforcement learning methods, and studying important components to achieve efficient distributed learning, covering single player single agent distributed deep reinforcement learning to the most complex multiple players multiple agents distributed deep reinforcement learning. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications of their non-distributed versions. By analyzing their strengths and weaknesses, a multi-player multi-agent distributed deep reinforcement learning toolbox is developed and released, which is further validated on Wargame, a complex environment, showing usability of the proposed toolbox for multiple players and multiple agents distributed deep reinforcement learning under complex games. Finally, we try to point out challenges and future trends, hoping this brief review can provide a guide or a spark for researchers who are interested in distributed deep reinforcement learning.
translated by 谷歌翻译
随着alphago的突破,人机游戏的AI已经成为一个非常热门的话题,吸引了世界各地的研究人员,这通常是测试人工智能的有效标准。已经开发了各种游戏AI系统(AIS),如Plibratus,Openai Five和AlphaStar,击败了专业人员。在本文中,我们调查了最近的成功游戏AIS,覆盖棋盘游戏AIS,纸牌游戏AIS,第一人称射击游戏AIS和实时战略游戏AIS。通过这项调查,我们1)比较智能决策领域的不同类型游戏之间的主要困难; 2)说明了开发专业水平AIS的主流框架和技术; 3)提高当前AIS中的挑战或缺点,以实现智能决策; 4)试图提出奥运会和智能决策技巧的未来趋势。最后,我们希望这篇简短的审查可以为初学者提供介绍,激发了在游戏中AI提交的研究人员的见解。
translated by 谷歌翻译
尽管将进化计算整合到增强学习中的新进展,但缺乏高性能平台可赋予合成性和大规模的并行性,这对与异步商业游戏相关的研究和应用造成了非平凡的困难。在这里,我们介绍了Lamarckian-一个开源平台,其支持进化增强学习可扩展到分布式计算资源的支持。为了提高训练速度和数据效率,拉马克人采用了优化的通信方法和异步进化增强学习工作流程。为了满足商业游戏和各种方法对异步界面的需求,Lamarckian量身定制了异步的马尔可夫决策过程界面,并设计了带有脱钩模块的面向对象的软件体系结构。与最先进的RLLIB相比,我们从经验上证明了Lamarckian在基准测试中具有多达6000 CPU核心的独特优势:i)i)在Google足球游戏上运行PPO时,采样效率和训练速度都翻了一番; ii)在乒乓球比赛中运行PBT+PPO时,训练速度的速度快13倍。此外,我们还提出了两种用例:i)如何将拉马克安应用于生成行为多样性游戏AI; ii)Lamarckian如何应用于游戏平衡测试的异步商业游戏。
translated by 谷歌翻译
多智能体增强学习任务对培训样本的体积提出了很高的需求。不同于其单代理对应物,基于分布式的超代理强化学习面临着苛刻的数据传输,流程间通信管理和勘探高要求的独特挑战。我们提出了一个容器化的学习框架来解决这些问题。我们打包了几个环境实例,本地学习者和缓冲区,以及仔细设计的多队列管理器,避免阻止容器。鼓励每个容器的本地政策尽可能多样,只有最优先考虑的轨迹被送到全球学习者。通过这种方式,我们实现了具有高系统吞吐量的可扩展,较效率和多样化的分布式Marl学习框架。要拥有知识,我们的方法是第一个解决挑战的谷歌研究足球全游戏$ 5 \ _v \ _5 $。在星际争霸II微型管理基准中,与最先进的非分布式MARL算法相比,我们的方法获得了4美元 - $ 18 \倍。
translated by 谷歌翻译
Starcraft II(SC2)对强化学习(RL)提出了巨大的挑战,其中主要困难包括巨大的状态空间,不同的动作空间和长期的视野。在这项工作中,我们研究了《星际争霸II》全长游戏的一系列RL技术。我们研究了涉及提取的宏观活动和神经网络的层次结构的层次RL方法。我们研究了课程转移培训程序,并在具有4个GPU和48个CPU线的单台计算机上训练代理。在64x64地图并使用限制性单元上,我们对内置AI的获胜率达到99%。通过课程转移学习算法和战斗模型的混合物,我们在最困难的非作战水平内置AI(7级)中获得了93%的胜利率。在本文的扩展版本中,我们改进了架构,以针对作弊水平训练代理商,并在8级,9级和10级AIS上达到胜利率,为96%,97%和94 %, 分别。我们的代码在https://github.com/liuruoze/hiernet-sc2上。为了为我们的工作以及研究和开源社区提供基线,我们将其复制了一个缩放版本的Mini-Alphastar(MAS)。 MAS的最新版本为1.07,可以在具有564个动作的原始动作空间上进行培训。它旨在通过使超参数可调节来在单个普通机器上进行训练。然后,我们使用相同的资源将我们的工作与MAS进行比较,并表明我们的方法更有效。迷你α的代码在https://github.com/liuruoze/mini-alphastar上。我们希望我们的研究能够阐明对SC2和其他大型游戏有效增强学习的未来研究。
translated by 谷歌翻译
深度强化学习(RL)导致了许多最近和开创性的进步。但是,这些进步通常以培训的基础体系结构的规模增加以及用于训练它们的RL算法的复杂性提高,而均以增加规模的成本。这些增长反过来又使研究人员更难迅速原型新想法或复制已发表的RL算法。为了解决这些问题,这项工作描述了ACME,这是一个用于构建新型RL算法的框架,这些框架是专门设计的,用于启用使用简单的模块化组件构建的代理,这些组件可以在各种执行范围内使用。尽管ACME的主要目标是为算法开发提供一个框架,但第二个目标是提供重要或最先进算法的简单参考实现。这些实现既是对我们的设计决策的验证,也是对RL研究中可重复性的重要贡献。在这项工作中,我们描述了ACME内部做出的主要设计决策,并提供了有关如何使用其组件来实施各种算法的进一步详细信息。我们的实验为许多常见和最先进的算法提供了基准,并显示了如何为更大且更复杂的环境扩展这些算法。这突出了ACME的主要优点之一,即它可用于实现大型,分布式的RL算法,这些算法可以以较大的尺度运行,同时仍保持该实现的固有可读性。这项工作提出了第二篇文章的版本,恰好与模块化的增加相吻合,对离线,模仿和从演示算法学习以及作为ACME的一部分实现的各种新代理。
translated by 谷歌翻译
未来的互联网涉及几种新兴技术,例如5G和5G网络,车辆网络,无人机(UAV)网络和物联网(IOT)。此外,未来的互联网变得异质并分散了许多相关网络实体。每个实体可能需要做出本地决定,以在动态和不确定的网络环境下改善网络性能。最近使用标准学习算法,例如单药强化学习(RL)或深入强化学习(DRL),以使每个网络实体作为代理人通过与未知环境进行互动来自适应地学习最佳决策策略。但是,这种算法未能对网络实体之间的合作或竞争进行建模,而只是将其他实体视为可能导致非平稳性问题的环境的一部分。多机构增强学习(MARL)允许每个网络实体不仅观察环境,还可以观察其他实体的政策来学习其最佳政策。结果,MAL可以显着提高网络实体的学习效率,并且最近已用于解决新兴网络中的各种问题。在本文中,我们因此回顾了MAL在新兴网络中的应用。特别是,我们提供了MARL的教程,以及对MARL在下一代互联网中的应用进行全面调查。特别是,我们首先介绍单代机Agent RL和MARL。然后,我们回顾了MAL在未来互联网中解决新兴问题的许多应用程序。这些问题包括网络访问,传输电源控制,计算卸载,内容缓存,数据包路由,无人机网络的轨迹设计以及网络安全问题。
translated by 谷歌翻译
The high emission and low energy efficiency caused by internal combustion engines (ICE) have become unacceptable under environmental regulations and the energy crisis. As a promising alternative solution, multi-power source electric vehicles (MPS-EVs) introduce different clean energy systems to improve powertrain efficiency. The energy management strategy (EMS) is a critical technology for MPS-EVs to maximize efficiency, fuel economy, and range. Reinforcement learning (RL) has become an effective methodology for the development of EMS. RL has received continuous attention and research, but there is still a lack of systematic analysis of the design elements of RL-based EMS. To this end, this paper presents an in-depth analysis of the current research on RL-based EMS (RL-EMS) and summarizes the design elements of RL-based EMS. This paper first summarizes the previous applications of RL in EMS from five aspects: algorithm, perception scheme, decision scheme, reward function, and innovative training method. The contribution of advanced algorithms to the training effect is shown, the perception and control schemes in the literature are analyzed in detail, different reward function settings are classified, and innovative training methods with their roles are elaborated. Finally, by comparing the development routes of RL and RL-EMS, this paper identifies the gap between advanced RL solutions and existing RL-EMS. Finally, this paper suggests potential development directions for implementing advanced artificial intelligence (AI) solutions in EMS.
translated by 谷歌翻译
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policybased methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
translated by 谷歌翻译
Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available as part of the open source Ray project 1 .
translated by 谷歌翻译
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in real world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, inverse reinforcement learning that are related but are not classical RL algorithms. The role of simulators in training agents, methods to validate, test and robustify existing solutions in RL are discussed.
translated by 谷歌翻译
深度加强学习(DRL)在复杂的视频游戏中取得了超级性能(例如,星际争霸II和DOTA II)。然而,目前的DRL系统仍然遭受多助手协调,稀疏奖励,随机环境等的挑战。在寻求解决这些挑战时,我们雇用了足球视频游戏,例如Google Research Football(GRF),如我们测试的开发基于端到端的学习的AI系统(表示为Tickick)以完成此具有挑战性的任务。在这项工作中,我们首先从联赛培训获得的单一代理专家的自我播放中生成了一个大型重播数据集。然后,我们开发了一个分布式学习系统和新的离线算法,以从固定的单个代理数据集中学习一个强大的多辅助AI。据我们所知,Tickick是第一个基于学习的AI系统,可以接管多个Agent Google Research Footful Game,而以前的工作可以控制单一代理或实验玩具学术情景。广泛的实验进一步表明,我们的预先训练的模型可以加速现代多功能算法的训练过程,我们的方法在各种学术方案上实现了最先进的性能。
translated by 谷歌翻译
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in singlemachine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach. The source code is publicly available at github.com/deepmind/scalable agent.
translated by 谷歌翻译
蒙特卡洛树搜索(MCT)是设计游戏机器人或解决顺序决策问题的强大方法。该方法依赖于平衡探索和开发的智能树搜索。MCT以模拟的形式进行随机抽样,并存储动作的统计数据,以在每个随后的迭代中做出更有教育的选择。然而,该方法已成为组合游戏的最新技术,但是,在更复杂的游戏(例如那些具有较高的分支因素或实时系列的游戏)以及各种实用领域(例如,运输,日程安排或安全性)有效的MCT应用程序通常需要其与问题有关的修改或与其他技术集成。这种特定领域的修改和混合方法是本调查的主要重点。最后一项主要的MCT调查已于2012年发布。自发布以来出现的贡献特别感兴趣。
translated by 谷歌翻译
DeepMind的游戏理论与多代理团队研究多学科学习的几个方面,从计算近似值到游戏理论中的基本概念,再到在富裕的空间环境中模拟社会困境,并在困难的团队协调任务中培训3-D类人动物。我们小组的一个签名目的是使用DeepMind在DeepMind中提供的资源和专业知识,以深入强化学习来探索复杂环境中的多代理系统,并使用这些基准来提高我们的理解。在这里,我们总结了我们团队的最新工作,并提出了一种分类法,我们认为这重点介绍了多代理研究中许多重要的开放挑战。
translated by 谷歌翻译
在包装交付,交通监控,搜索和救援操作以及军事战斗订婚等不同应用中,对使用无人驾驶汽车(UAV)(无人机)的需求越来越不断增加。在所有这些应用程序中,无人机用于自动导航环境 - 没有人类互动,执行特定任务并避免障碍。自主无人机导航通常是使用强化学习(RL)来完成的,在该学习中,代理在域中充当专家在避免障碍的同时导航环境。了解导航环境和算法限制在选择适当的RL算法以有效解决导航问题方面起着至关重要的作用。因此,本研究首先确定了无人机导航任务,并讨论导航框架和仿真软件。接下来,根据环境,算法特征,能力和不同无人机导航问题的应用程序对RL算法进行分类和讨论,这将帮助从业人员和研究人员为其无人机导航使用情况选择适当的RL算法。此外,确定的差距和机会将推动无人机导航研究。
translated by 谷歌翻译
培训代理商的训练人群在加强学习方面表现出了巨大的希望,可以稳定训练,改善探索和渐近性能以及产生各种解决方案。但是,从业人员通常不考虑基于人群的培训,因为它被认为是过速的(依次实施),或者在计算上昂贵(如果代理在独立加速器上并行训练)。在这项工作中,我们比较了实施和重新审视以前的研究,以表明对汇编和矢量化的明智使用允许与培训单个代理相比,在单台机器上进行基于人群的培训。我们还表明,当提供一些加速器时,我们的协议扩展到诸如高参数调谐等应用的大型人口大小。我们希望这项工作和公众发布我们的代码将鼓励从业者更频繁地使用基于人群的学习来进行研究和应用。
translated by 谷歌翻译
深度强化学习(DRL)和深度多机构的强化学习(MARL)在包括游戏AI,自动驾驶汽车,机器人技术等各种领域取得了巨大的成功。但是,众所周知,DRL和Deep MARL代理的样本效率低下,即使对于相对简单的问题设置,通常也需要数百万个相互作用,从而阻止了在实地场景中的广泛应用和部署。背后的一个瓶颈挑战是众所周知的探索问题,即如何有效地探索环境和收集信息丰富的经验,从而使政策学习受益于最佳研究。在稀疏的奖励,吵闹的干扰,长距离和非平稳的共同学习者的复杂环境中,这个问题变得更加具有挑战性。在本文中,我们对单格和多代理RL的现有勘探方法进行了全面的调查。我们通过确定有效探索的几个关键挑战开始调查。除了上述两个主要分支外,我们还包括其他具有不同思想和技术的著名探索方法。除了算法分析外,我们还对一组常用基准的DRL进行了全面和统一的经验比较。根据我们的算法和实证研究,我们终于总结了DRL和Deep Marl中探索的公开问题,并指出了一些未来的方向。
translated by 谷歌翻译
互联网连接系统的规模大大增加,这些系统比以往任何时候都更接触到网络攻击。网络攻击的复杂性和动态需要保护机制响应,自适应和可扩展。机器学习,或更具体地说,深度增强学习(DRL),方法已经广泛提出以解决这些问题。通过将深入学习纳入传统的RL,DRL能够解决复杂,动态,特别是高维的网络防御问题。本文提出了对为网络安全开发的DRL方法进行了调查。我们触及不同的重要方面,包括基于DRL的网络 - 物理系统的安全方法,自主入侵检测技术和基于多元的DRL的游戏理论模拟,用于防范策略对网络攻击。还给出了对基于DRL的网络安全的广泛讨论和未来的研究方向。我们预计这一全面审查提供了基础,并促进了未来的研究,探讨了越来越复杂的网络安全问题。
translated by 谷歌翻译
Unmanned aerial vehicle (UAV) swarms are considered as a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and the ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent Reinforcement Learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such us reconfigurable intelligent surface (RIS), virtual reality (VR), semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions of DL enabled UAV swarms. In summary, this survey provides a comprehensive survey of various DL applications for UAV swarms in extensive scenarios.
translated by 谷歌翻译