2018-03-02
We propose a distributed architecture for deep reinforcement learning at scale that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. Our architecture substantially improves the state of the art on the Arcade Learning Environment, achieving better final performance in a fraction of the wall-clock training time.
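As a concrete illustration (not code from the paper), here is a minimal sketch of the proportional prioritized sampling such a learner could use over the shared replay memory; the class layout, the alpha/beta values, and the plain-list storage are simplifying assumptions:

```python
import numpy as np

# Minimal sketch of proportional prioritized replay (illustrative, not the
# paper's implementation): actors add transitions with an initial priority,
# the learner samples proportionally to priority^alpha and later refreshes
# priorities with its own TD errors.
class PrioritizedReplay:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.data = []
        self.priorities = []

    def add(self, transition, priority):
        # Evict the oldest transition once the buffer is full.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(float(priority))

    def sample(self, batch_size, beta=0.4):
        # Sample indices proportionally to priority^alpha and return
        # importance-sampling weights that correct for the non-uniform draw.
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        weights = (len(self.data) * p[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # The learner pushes back updated priorities (e.g. absolute TD errors).
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(float(e)) + 1e-6
```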

2017-04-15


2015-11-18


2015-11-20


2017-06-30
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and $\epsilon$-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub- to super-human performance.
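For intuition, a minimal numpy sketch of a "noisy" linear layer of the kind the abstract describes: each weight has a learnable mean and noise scale, and a fresh noise sample perturbs the weights on every forward pass. The class and initialisation constants below are illustrative assumptions, not the paper's implementation (which trains mu and sigma jointly with the rest of the network by gradient descent):

```python
import numpy as np

# Illustrative noisy linear layer: effective weights are mu + sigma * eps,
# with eps resampled on each forward pass. In the actual agent, mu and sigma
# would be trained by backpropagation; only the forward pass is shown here.
class NoisyLinear:
    def __init__(self, in_dim, out_dim, sigma_init=0.5):
        bound = 1.0 / np.sqrt(in_dim)
        self.w_mu = np.random.uniform(-bound, bound, (out_dim, in_dim))
        self.w_sigma = np.full((out_dim, in_dim), sigma_init * bound)
        self.b_mu = np.zeros(out_dim)
        self.b_sigma = np.full(out_dim, sigma_init * bound)

    def forward(self, x):
        # Sample independent Gaussian noise for weights and biases.
        w = self.w_mu + self.w_sigma * np.random.randn(*self.w_mu.shape)
        b = self.b_mu + self.b_sigma * np.random.randn(*self.b_mu.shape)
        return x @ w.T + b
```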

2017-04-12
Deep reinforcement learning (RL) has achieved several high-profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages even relatively small sets of demonstration data to massively accelerate the learning process, and is able to automatically assess the necessary ratio of demonstration data while learning, thanks to a prioritized replay mechanism. DQfD works by combining temporal difference updates with supervised classification of the demonstrator's actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN), as it starts with better scores on the first million steps on 41 of 42 games, and on average it takes PDD DQN 83 million steps to catch up to DQfD's performance. DQfD learns to out-perform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 11 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.
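The "supervised classification of the demonstrator's actions" is commonly realised as a large-margin loss added to the TD loss on demonstration samples. A hedged sketch of that idea follows; the function name, the margin value, and the loss weighting are assumptions, not taken from the text above:

```python
import numpy as np

# Large-margin supervised term on a demonstration transition: the action the
# demonstrator took should score at least `margin` higher than every other
# action. q_values is a float array of Q(s, a) over all actions.
def large_margin_loss(q_values, demo_action, margin=0.8):
    margins = np.full_like(q_values, margin)
    margins[demo_action] = 0.0  # no margin against the demonstrated action itself
    return np.max(q_values + margins) - q_values[demo_action]

# On demonstration data the total loss would mix this with the usual TD loss,
# e.g. loss = td_loss + lambda_supervised * large_margin_loss(q, a_demo),
# where lambda_supervised is a weighting hyperparameter (assumed here).
```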

2018-10-18


2016-02-04

The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of different problem settings and it has been receiving increasing attention from the scientific community, leading to some high-profile success stories such as the much publicized Deep Q-Networks (DQN). In this article we take a big picture look at how the ALE is being used by the research community. We show how diverse the evaluation methodologies in the ALE have become with time, and highlight some key concerns when evaluating agents in the ALE. We use this discussion to present some methodological best practices and provide new benchmark results using these best practices. To further the progress in the field, we introduce a new version of the ALE that supports multiple game modes and provides a form of stochasticity we call sticky actions. We conclude this big picture look by revisiting challenges posed when the ALE was introduced, summarizing the state-of-the-art in various problems and highlighting problems that remain open.
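As a small illustration of the sticky-actions idea, here is a hedged wrapper sketch: with some probability the environment repeats the agent's previous action instead of the newly issued one. The wrapper interface is an assumption for illustration; in the ALE itself this is exposed as a repeat-action-probability setting:

```python
import random

# Illustrative sticky-action wrapper: with probability `stickiness`, the
# previous action is executed again and the new action is ignored.
class StickyActionEnv:
    def __init__(self, env, stickiness=0.25):
        self.env = env
        self.stickiness = stickiness
        self.last_action = 0

    def reset(self):
        self.last_action = 0
        return self.env.reset()

    def step(self, action):
        if random.random() < self.stickiness:
            action = self.last_action  # repeat the previous action
        self.last_action = action
        return self.env.step(action)
```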

2015-07-15
We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN). Our distributed algorithm was applied to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-time required to achieve these results by an order of magnitude on most games.
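A toy sketch of the parameter-server pattern behind this kind of setup: learners compute gradients from stored experience and apply them to a central copy of the network, which actors periodically pull. Class and method names below are illustrative assumptions, not the paper's code:

```python
import numpy as np

# Central parameter store: learners push gradients, actors pull parameters.
class ParameterServer:
    def __init__(self, initial_params, lr=1e-4):
        self.params = {k: v.copy() for k, v in initial_params.items()}
        self.lr = lr

    def apply_gradients(self, grads):
        # Called by each learner (in a real system, asynchronously and
        # possibly sharded across many machines).
        for k, g in grads.items():
            self.params[k] -= self.lr * g

    def get_params(self):
        # Actors sync their local copy of the Q-network from here.
        return {k: v.copy() for k, v in self.params.items()}
```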

2016-02-15


2018-09-12


2015-09-22


2013-12-19


2018-12-06

Instability and variability of Deep Reinforcement Learning (DRL) algorithms tend to adversely affect their performance. Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-value estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values. To understand the effect of the algorithm, we examine the source of value function estimation errors and provide an analytical comparison within a simplified model. We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension.
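A minimal sketch of the averaged target this describes: the bootstrap value is the maximum over the mean of the last K learned Q-networks rather than over a single target network. `q_networks` is assumed here to be a list of callables returning per-action Q-values for a batch of next states:

```python
import numpy as np

# Averaged target: reduce target-value variance by averaging the Q-value
# estimates of the K previously learned networks before taking the max.
def averaged_target(q_networks, next_states, rewards, dones, gamma=0.99):
    q_avg = np.mean([q(next_states) for q in q_networks], axis=0)  # (batch, n_actions)
    max_q = q_avg.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_q
```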

2018-06-06

Elfwing, S., Uchibe, E., Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks (2018). In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, and softmax action selection can be competitive with DQN, without the need for a separate target network. We validate our proposed approach by, first, achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10×10 board, using TD(λ) learning and shallow dSiLU network agents, and, then, by outperforming DQN in the Atari 2600 domain by using a deep Sarsa(λ) agent with SiLU and dSiLU hidden units.
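Both activation functions can be written out directly from their definitions: SiLU(x) = x * sigmoid(x), and dSiLU is its derivative, sigmoid(x) * (1 + x * (1 - sigmoid(x))). A small numpy sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # Sigmoid-weighted linear unit: the input multiplied by its sigmoid.
    return x * sigmoid(x)

def dsilu(x):
    # Derivative of SiLU with respect to its input.
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))
```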