In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state-of-the-art on the Atari 2600 domain.
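The factorization above can be made concrete with a short sketch. The following PyTorch module is a minimal dueling head under assumed names and layer sizes; it illustrates the idea Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a') rather than reproducing the paper's exact implementation:

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Illustrative dueling head: splits a shared feature vector into a
    scalar state-value stream and a per-action advantage stream."""

    def __init__(self, feature_dim: int, num_actions: int):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)                 # V(s)
        self.advantage = nn.Linear(feature_dim, num_actions)   # A(s, a)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)       # shape (batch, 1)
        a = self.advantage(features)   # shape (batch, num_actions)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

# Example: 512-dimensional features from a convolutional torso, 6 actions.
q_values = DuelingHead(512, 6)(torch.randn(4, 512))
print(q_values.shape)  # torch.Size([4, 6])
```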
When intelligent agents learn visuomotor behaviors from human demonstrations, they may benefit from knowing where the human is allocating visual attention, which can be inferred from their gaze. A wealth of information regarding intelligent decision making is conveyed by human gaze allocation; hence, exploiting such information has the potential to improve the agents' performance. With this motivation, we propose the AGIL (Attention Guided Imitation Learning) framework. We collect high-quality human action and gaze data while playing Atari games in a carefully controlled experimental setting. Using these data, we first train a deep neural network that can predict human gaze positions and visual attention with high accuracy (the gaze network) and then train another network to predict human actions (the policy network). Incorporating the learned attention model from the gaze network into the policy network significantly improves the action prediction accuracy and task performance.
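As a rough illustration only, a learned gaze map could be fused with the policy network's input as in the sketch below; the function name and the concatenation-based fusion are assumptions and do not necessarily match the AGIL paper's exact scheme:

```python
import torch

def gaze_modulate(frame: torch.Tensor, gaze_map: torch.Tensor) -> torch.Tensor:
    """Illustrative fusion: weight each pixel of the input frame by the
    predicted gaze density, then stack it with the raw frame so the policy
    network sees both the original and the attention-weighted view.

    frame:    (batch, channels, H, W) stacked game frames
    gaze_map: (batch, 1, H, W) predicted gaze density, values in [0, 1]
    """
    attended = frame * gaze_map                  # emphasize fixated regions
    return torch.cat([frame, attended], dim=1)   # feed both streams onward

frames = torch.rand(8, 4, 84, 84)   # e.g. 4 stacked grayscale Atari frames
gaze = torch.rand(8, 1, 84, 84)
policy_input = gaze_modulate(frames, gaze)
print(policy_input.shape)  # torch.Size([8, 8, 84, 84])
```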
The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of different problem settings and it has been receiving increasing attention from the scientific community, leading to some high-profile success stories such as the much publicized Deep Q-Networks (DQN). In this article we take a big picture look at how the ALE is being used by the research community. We show how diverse the evaluation methodologies in the ALE have become with time, and highlight some key concerns when evaluating agents in the ALE. We use this discussion to present some methodological best practices and provide new benchmark results using these best practices. To further the progress in the field, we introduce a new version of the ALE that supports multiple game modes and provides a form of stochasticity we call sticky actions. We conclude this big picture look by revisiting challenges posed when the ALE was introduced, summarizing the state-of-the-art in various problems and highlighting problems that remain open.
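The sticky-actions mechanism can be summarized in a few lines. The wrapper below is a minimal sketch assuming a gym-style `reset`/`step` interface; the 0.25 default repeat probability is only a commonly cited setting:

```python
import random

class StickyActionWrapper:
    """Illustrative sticky-actions wrapper: with probability `stickiness`,
    the environment executes the previously taken action instead of the
    agent's new one, injecting stochasticity into an otherwise
    deterministic emulator."""

    def __init__(self, env, stickiness: float = 0.25):
        self.env = env
        self.stickiness = stickiness
        self.last_action = 0  # NOOP by convention

    def reset(self):
        self.last_action = 0
        return self.env.reset()

    def step(self, action):
        if random.random() < self.stickiness:
            action = self.last_action   # repeat the previous action
        self.last_action = action
        return self.env.step(action)
```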
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
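A minimal sketch of the Q-learning variant behind such an agent is given below (PyTorch, with assumed tensor shapes; the published agent additionally uses experience replay and frame preprocessing, which are omitted here):

```python
import torch
import torch.nn.functional as F

def q_learning_loss(q_net, batch, gamma=0.99):
    """Illustrative one-step Q-learning loss for a DQN-style network.
    The regression target is r + gamma * max_a' Q(s', a') for non-terminal
    transitions (later variants compute the target with a separate,
    periodically updated copy of the network)."""
    states, actions, rewards, next_states, dones = batch
    # actions: int64 tensor of shape (batch,); dones: float tensor of 0/1.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = q_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return F.smooth_l1_loss(q_sa, target)
```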
While deep reinforcement learning (deep RL) agents are effective at maximizing rewards, it is often unclear what strategies they use to do so. In this paper, we take a step toward explaining deep RL agents through a case study using Atari 2600 environments. In particular, we focus on using saliency maps to understand how an agent learns and executes a policy. We introduce a method for generating useful saliency maps and use it to show 1) what a strong agent attends to, 2) whether an agent is making decisions for the right or wrong reasons, and 3) how an agent evolves during learning. We also test our method on non-expert human subjects and find that it improves their ability to reason about these agents. Overall, our results show that saliency information can provide significant insight into the decisions and learning behavior of RL agents.
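The abstract does not spell out how the saliency maps are computed; the sketch below shows one common perturbation-based recipe, given purely as an assumed illustration: perturb a local patch of the input and measure how much the agent's output changes.

```python
import numpy as np

def perturbation_saliency(policy_fn, frame, radius=5, stride=5):
    """Illustrative perturbation-based saliency: replace a patch around each
    sampled location with its mean value (a crude blur) and record how
    strongly the agent's output changes. `policy_fn` maps a frame (H, W)
    to a vector of action logits or probabilities."""
    base = policy_fn(frame)
    h, w = frame.shape
    saliency = np.zeros((h, w))
    for i in range(0, h, stride):
        for j in range(0, w, stride):
            perturbed = frame.copy()
            top, bottom = max(0, i - radius), min(h, i + radius)
            left, right = max(0, j - radius), min(w, j + radius)
            perturbed[top:bottom, left:right] = frame[top:bottom, left:right].mean()
            saliency[i, j] = np.sum((policy_fn(perturbed) - base) ** 2)
    return saliency
```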
The theory of reinforcement learning provides a normative account [1], deeply rooted in psychological [2] and neuroscientific [3] perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems [4,5], the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms [3]. While reinforcement learning agents have achieved some successes in a variety of domains [6-8], their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks [9-11] to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games [12]. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks. We set out to create a single algorithm that would be able to develop a wide range of competencies on a varied range of challenging tasks, a central goal of general artificial intelligence [13] that has eluded previous efforts [8,14,15]. To achieve this, we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network [16] known as deep neural networks. Notably, recent advances in deep neural networks [9-11], in which several layers of nodes are used to build up progressively more abstract representations of the data, have made it possible for artificial neural networks to learn concepts such as object categories directly from raw sensory data. We use one particularly successful architecture, the deep convolutional network [17], which uses hierarchical layers of tiled convolutional filters to mimic the effects of receptive fields, inspired by Hubel and Wiesel's seminal work on feedforward processing
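For reference, the convolutional torso commonly reported for this agent can be written compactly. The sketch below follows widely cited layer sizes (a stack of four 84x84 frames, three convolutions, one hidden fully connected layer) and should be read as an approximate reconstruction rather than the paper's verbatim code:

```python
import torch.nn as nn

def build_dqn(num_actions: int) -> nn.Sequential:
    """Approximate reconstruction of the deep Q-network: three convolutional
    layers over a stack of four 84x84 grayscale frames, followed by a fully
    connected hidden layer and a linear output of per-action Q-values."""
    return nn.Sequential(
        nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
        nn.Linear(512, num_actions),
    )
```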
Deep reinforcement learning has become popular in recent years, showing superiority on tasks with different visual inputs, such as playing Atari games and robot navigation. Although objects are important image elements, little work has exploited object characteristics to strengthen deep reinforcement learning. In this paper, we propose a novel method that incorporates object recognition processing into deep reinforcement learning models. This approach can be adapted to any existing deep reinforcement learning framework. Experiments on Atari games show state-of-the-art results. We also propose a new approach called "object saliency maps" to visually explain the actions taken by deep reinforcement learning agents.
The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.
In this article, we review recent Deep Learning advances in the context of how they have been applied to play different types of video games such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in the context of applying these machine learning methods to video games, such as general game playing, dealing with extremely large decision spaces and sparse rewards.
Deep reinforcement learning (deep RL) has achieved superior performance on complex sequential tasks by learning directly from raw input images, using deep neural networks as function approximators. However, learning directly from raw images is data-inefficient: in addition to learning a policy, the agent must also learn a feature representation of its complex states. As a result, deep RL typically suffers from slow learning and often requires a prohibitive amount of training time and data to reach reasonable performance, making it inapplicable to real-world settings where data is expensive. In this work, we improve data efficiency in deep RL by addressing one of its two learning objectives, feature learning. We leverage supervised learning to pre-train on a small set of non-expert human demonstrations, and validate our approach with the asynchronous advantage actor-critic (A3C) algorithm in the Atari domain. Our results show significantly improved learning speed, even when the provided demonstrations are noisy and of low quality.
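A minimal sketch of the supervised pre-training stage described above follows; the names, the data loader, and the optimizer are illustrative assumptions, and the pre-trained network would subsequently be refined with A3C:

```python
import torch
import torch.nn.functional as F

def pretrain_policy(policy_net, demo_loader, optimizer, epochs=5):
    """Illustrative supervised pre-training on human demonstrations:
    treat each (state, human_action) pair as a classification example
    before handing the network over to A3C for reinforcement learning."""
    for _ in range(epochs):
        for states, human_actions in demo_loader:
            logits = policy_net(states)                  # (batch, num_actions)
            loss = F.cross_entropy(logits, human_actions)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```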
The resurgence of deep neural networks has resulted in impressive advances in natural language processing (NLP). This success, however, is contingent on access to large amounts of structured supervision, often manually constructed and unavailable for many applications and domains. In this thesis, I present novel computational models that integrate reinforcement learning with language understanding to induce grounded representations of semantics. Using unstructured feedback, these techniques not only enable task-optimized representations which reduce dependence on high quality annotations, but also exploit language in adapting control policies across different environments. First, I describe an approach for learning to play text-based games, where all interaction is through natural language and the only source of feedback is in-game rewards. Employing a deep reinforcement learning framework to jointly learn state representations and action policies, our model outperforms several baselines on different domains, demonstrating the importance of learning expressive representations. Second, I exhibit a framework for utilizing textual descriptions to tackle the challenging problem of cross-domain policy transfer for reinforcement learning (RL). We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized state representation to effectively make use of text. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. Finally, I demonstrate how reinforcement learning can enhance traditional NLP systems in low resource scenarios. In particular, I describe an autonomous agent that can learn to acquire and integrate external information to enhance information extraction. Our experiments on two databases, shooting incidents and food adulteration cases, demonstrate that our system significantly improves over traditional extractors and a competitive meta-classifier baseline.
One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically only depends on a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.
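The reward decomposition can be reduced to a short sketch: each component reward gets its own Q-value head, and the heads are summed for action selection. Names and sizes below are assumptions:

```python
import torch
import torch.nn as nn

class HybridRewardHeads(nn.Module):
    """Illustrative Hybrid Reward Architecture: one Q-value head per
    component reward; the aggregate used for action selection is the sum
    of the per-component heads."""

    def __init__(self, feature_dim: int, num_actions: int, num_components: int):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(feature_dim, num_actions) for _ in range(num_components)]
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        per_component = torch.stack([head(features) for head in self.heads])
        return per_component.sum(dim=0)   # aggregate Q-values, (batch, num_actions)

heads = HybridRewardHeads(feature_dim=256, num_actions=9, num_components=4)
action = heads(torch.randn(1, 256)).argmax(dim=1)
```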
The recently introduced Deep Q-Networks (DQN) algorithm has gained attention as one of the first successful combinations of deep neural networks and reinforcement learning. Its promise was demonstrated in the Arcade Learning Environment (ALE), a challenging framework composed of dozens of Atari 2600 games used to evaluate general competency in AI. It achieved dramatically better results than earlier approaches, showing that its ability to learn good representations is quite robust and general. This paper attempts to understand the principles that underlie DQN's impressive performance and to better contextualize its success. We systematically evaluate the importance of key representational biases encoded by DQN's network by proposing simple linear representations that make use of these concepts. Incorporating these characteristics, we obtain a computationally practical feature set that achieves competitive performance to DQN in the ALE. Besides offering insight into the strengths and weaknesses of DQN, we provide a generic representation for the ALE, significantly reducing the burden of learning a representation for each game. Moreover, we also provide a simple, reproducible benchmark for the sake of comparison to future work in the ALE.
In recent years there is a growing interest in using deep representations for reinforcement learning. In this paper, we present a methodology and tools to analyze Deep Q-networks (DQNs) in a non-blind manner. Moreover, we propose a new model, the Semi Aggregated Markov Decision Process (SAMDP), and an algorithm that learns it automatically. The SAMDP model allows us to identify spatio-temporal abstractions directly from features and may be used as a sub-goal detector in future work. Using our tools we reveal that the features learned by DQNs aggregate the state space in a hierarchical fashion, explaining its success. Moreover, we are able to understand and describe the policies learned by DQNs for three different Atari 2600 games and suggest ways to interpret, debug and optimize deep neural networks in reinforcement learning.
Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end on tasks such as object recognition, video games, and board games, achieving performance comparable to humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire knowledge and generalize it to new tasks and situations. We suggest concrete challenges and promising routes toward these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
Reinforcement learning (RL) is a branch of machine learning used to solve various sequential decision-making problems without prior supervision. Thanks to recent advances in deep learning, newly proposed deep-RL algorithms have been able to perform remarkably well in complex, high-dimensional environments. However, even after success in many domains, one major challenge for these methods remains the large number of interactions with the environment required for efficient decision making. Drawing inspiration from the brain, this problem can be tackled by incorporating instance-based learning that biases decisions by recording highly rewarding experiences. This paper reviews and surveys various recent reinforcement learning methods that incorporate external memory to solve decision-making problems. We outline the different methods, their advantages and disadvantages, their applications, and the standard experimental settings used for memory-based models. This review aims to be a useful resource that provides key insights into the latest developments in the field and supports its further progress.
In this paper, we propose the deep reinforcement relevance network (DRRN), a novel deep architecture, to design a better model for handling an unbounded action space with applications to language understanding for text-based games. For a particular class of games, a user must choose among a variable number of actions described by text, with the goal of maximizing long-term reward. In these games, the best action is typically that which best fits to the current situation (modeled as a state in the DRRN), also described by text. Because of the exponential complexity of natural language with respect to sentence length, there is typically an unbounded set of unique actions. Therefore, it is difficult to pre-define the action set. To address this challenge, the DRRN extracts separate high-level embedding vectors from the texts that describe states and actions, respectively, using a general interaction function, exploring inner product, bilinear, and DNN interaction, between these embedding vectors to approximate the Q-function. We evaluate the DRRN on two popular text games, showing superior performance over other deep Q-learning architectures.
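The state/action interaction can be sketched directly from the description above. The module below implements the inner-product variant with assumed text-feature and hidden dimensions:

```python
import torch
import torch.nn as nn

class DRRN(nn.Module):
    """Illustrative DRRN with an inner-product interaction: the state text
    and each candidate action text are embedded separately, and their dot
    product approximates Q(state, action)."""

    def __init__(self, text_dim: int, hidden_dim: int = 100):
        super().__init__()
        self.state_encoder = nn.Linear(text_dim, hidden_dim)
        self.action_encoder = nn.Linear(text_dim, hidden_dim)

    def forward(self, state_vec: torch.Tensor, action_vecs: torch.Tensor) -> torch.Tensor:
        # state_vec:   (text_dim,)  bag-of-words or other text features
        # action_vecs: (num_actions, text_dim), one row per candidate action
        s = torch.relu(self.state_encoder(state_vec))      # (hidden_dim,)
        a = torch.relu(self.action_encoder(action_vecs))   # (num_actions, hidden_dim)
        return a @ s                                        # Q-value per candidate action

drrn = DRRN(text_dim=300)
q = drrn(torch.randn(300), torch.randn(5, 300))  # 5 candidate actions this turn
```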
As deep reinforcement learning driven by visual perception becomes more widely used, there is a growing need to better understand and probe the learned agents. Understanding the decision-making process and its relationship to visual inputs is very valuable for identifying problems in the learned behavior. However, this topic has been relatively under-explored in the research community. In this work, we present a method for synthesizing visual inputs of interest for a trained agent. Such inputs or states could be situations in which specific actions are necessary. Furthermore, critical states in which a very high or a very low reward can be achieved are often helpful for understanding the situational awareness of the system, as they can correspond to risky states. To this end, we learn a generative model over the state space of the environment and use its latent space to optimize a target function for the state of interest. In our experiments, we show that this method can generate insights for a variety of environments and reinforcement learning methods. We explore results in standard Atari benchmark games as well as in an autonomous driving simulator. Based on the efficiency with which we have been able to identify behavioral weaknesses with this technique, we believe this general approach could serve as an important tool for AI safety applications.
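The latent-space optimization can be sketched as gradient ascent on an agent-defined objective (for example, the Q-value of a chosen action, or its negation for risky states). The decoder, objective, and optimizer settings below are assumptions about one plausible realization:

```python
import torch

def synthesize_state(decoder, objective_fn, latent_dim=64, steps=200, lr=0.05):
    """Illustrative latent-space search: optimize a latent code z so that the
    decoded state maximizes an agent-defined objective, e.g. the Q-value of
    a particular action, or its negation for low-reward (risky) states."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        state = decoder(z)              # generated observation
        loss = -objective_fn(state)     # ascend the target objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return decoder(z).detach()
```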
Autonomous AI systems will be entering human society in the near future to provide services and work alongside humans. For those systems to be accepted and trusted, users should be able to understand the reasoning process of the system, i.e., the system should be transparent. System transparency enables people to form coherent explanations of the system's decisions and actions. Transparency is important not only for user trust, but also for software debugging and certification. In recent years, deep neural networks have made great advances in multiple application areas. However, deep neural networks are opaque. In this paper, we report work on the transparency of Deep Reinforcement Learning Networks (DRLN). Such networks have been extremely successful in accurately learning action control in image-input domains such as Atari games. We propose a novel and general method that (a) incorporates explicit object recognition processing into deep reinforcement learning models, (b) forms the basis for developing "object saliency maps" that provide visualization of internal states of DRLNs, thus enabling the formation of explanations, and (c) can be incorporated in any existing deep reinforcement learning framework. We present computational results and human experiments to evaluate our approach.
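The abstract does not define how an object saliency map is computed; one natural realization, given here entirely as an assumption, masks each detected object out of the frame and records how much the agent's Q-values change:

```python
import numpy as np

def object_saliency(q_fn, frame, object_masks):
    """Illustrative object saliency: for each detected object, replace its
    pixels with the background mean and measure the resulting change in the
    agent's Q-values. A larger change suggests a more influential object.

    q_fn:         maps a frame to a vector of Q-values
    frame:        (H, W) observation
    object_masks: list of boolean (H, W) arrays, one per detected object
    """
    base_q = q_fn(frame)
    scores = []
    for mask in object_masks:
        occluded = frame.copy()
        occluded[mask] = frame.mean()    # crudely remove the object
        scores.append(float(np.sum((q_fn(occluded) - base_q) ** 2)))
    return scores
```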
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.