We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that, in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to scenes more complex than those seen during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games, surpassing human grandmaster performance on four of them. By considering architectural inductive biases, our work opens new directions for overcoming important but stubborn challenges in deep RL.
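To make the "self-attention over entities" idea above concrete, here is a minimal sketch, not the authors' architecture, of one round of scaled dot-product self-attention over a set of entity vectors; the projection matrices, dimensions, and function name are arbitrary placeholders.

```python
import numpy as np

def relational_self_attention(entities, wq, wk, wv):
    """One round of scaled dot-product self-attention over a set of entities.

    entities: (n, d) array, one row per entity extracted from the scene.
    wq, wk, wv: (d, d) projection matrices for queries, keys, and values.
    Returns an (n, d) array of relation-aware entity representations.
    """
    q, k, v = entities @ wq, entities @ wk, entities @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise interaction terms
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the other entities
    return weights @ v                               # each entity aggregates the others

# Toy usage: 5 entities with 8 features each, random projections.
rng = np.random.default_rng(0)
e = rng.normal(size=(5, 8))
w = [rng.normal(size=(8, 8)) for _ in range(3)]
print(relational_self_attention(e, *w).shape)  # (5, 8)
```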
Physical construction, the ability to compose objects, subject to physical dynamics, in order to serve some function, is fundamental to human intelligence. Here we introduce a suite of challenging physical construction tasks inspired by how children play with blocks, such as matching a target configuration, stacking and attaching blocks to connect objects together, and creating shelter-like structures over target objects. We then examine how a range of modern deep reinforcement learning agents fare on these challenges, and introduce several new approaches which provide superior performance. Our results show that agents which use structured representations (e.g., objects and scene graphs) and structured policies (e.g., object-centric actions) outperform those which use less structured representations, and generalize better beyond their training when asked to reason about larger scenes. Agents which use model-based planning via Monte Carlo tree search also outperform strictly model-free agents on our most challenging construction problems. We conclude that approaches which combine structured representations and reasoning with powerful learning are a key path toward agents that possess rich intuitive physics, scene understanding, and planning.
Reciprocity is an important feature of human social interaction and underpins our cooperation. Moreover, simple forms of reciprocity have proven remarkably resilient in matrix-game social dilemmas. Most famously, the tit-for-tat strategy performs very well in tournaments of the Prisoner's Dilemma. Unfortunately, this strategy does not readily apply to the real world, where the choice to cooperate or defect is extended in time and space. Here, we present a general online reinforcement learning algorithm that displays reciprocal behavior toward its co-players. We show that it can lead to better social outcomes for the wider group when learning in 2-player Markov games as well as 5-player intertemporal social dilemmas. We analyze the resulting policies to show that the reciprocating behavior is strongly influenced by the behavior of the co-players.
Learning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks. This approach can learn to navigate from raw sensory input in complicated 3D mazes, approaching human-level performance even under conditions where the goal location changes frequently. We provide detailed analysis of the agent behaviour, its ability to localise, and its network activity dynamics, showing that the agent implicitly learns key navigation abilities.
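For intuition, here is a minimal sketch, not the paper's implementation, of how auxiliary objectives are commonly folded into a main RL objective: the RL loss is summed with weighted depth-regression and loop-closure classification losses. The weights `beta_depth` and `beta_loop` and the function name are illustrative assumptions.

```python
import numpy as np

def joint_loss(rl_loss, depth_pred, depth_target, loop_logit, loop_label,
               beta_depth=1.0, beta_loop=1.0):
    """Total loss = RL loss + weighted auxiliary losses (illustrative weights)."""
    # Auxiliary depth prediction: mean-squared error on the predicted depth map.
    depth_loss = np.mean((depth_pred - depth_target) ** 2)
    # Auxiliary loop-closure detection: binary cross-entropy from a single logit.
    p = 1.0 / (1.0 + np.exp(-loop_logit))
    loop_loss = -(loop_label * np.log(p + 1e-8) + (1 - loop_label) * np.log(1 - p + 1e-8))
    return rl_loss + beta_depth * depth_loss + beta_loop * loop_loss

# Example call with dummy values.
print(joint_loss(0.5, np.zeros((4, 4)), np.ones((4, 4)) * 0.1, loop_logit=2.0, loop_label=1))
```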
We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines.
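As a rough, self-contained illustration of this idea, and not the I2A architecture itself, which uses learned convolutional environment models and recurrent rollout encoders, the sketch below imagines a few steps with a stand-in linear model per candidate action, encodes the imagined states, and concatenates that summary with ordinary model-free features before the policy head. All names, dimensions, and weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
STATE, ACTIONS, HIDDEN = 4, 3, 8

# Stand-in learned components (random weights purely for illustration).
env_model = rng.normal(size=(STATE + ACTIONS, STATE))   # predicts the next state
rollout_encoder = rng.normal(size=(STATE, HIDDEN))      # summarizes imagined states
model_free = rng.normal(size=(STATE, HIDDEN))           # ordinary model-free features
policy_head = rng.normal(size=(2 * HIDDEN, ACTIONS))

def one_hot(i, n):
    v = np.zeros(n); v[i] = 1.0; return v

def imagine(state, action, depth=3):
    """Roll the learned model forward a few steps and encode the imagined states."""
    states = []
    for _ in range(depth):
        state = np.tanh(np.concatenate([state, one_hot(action, ACTIONS)]) @ env_model)
        states.append(state)
    return np.tanh(np.mean(states, axis=0) @ rollout_encoder)

def policy_logits(state):
    # One imagined rollout per candidate action, aggregated with model-free features.
    imagined = np.mean([imagine(state, a) for a in range(ACTIONS)], axis=0)
    direct = np.tanh(state @ model_free)
    return np.concatenate([imagined, direct]) @ policy_head

print(policy_logits(rng.normal(size=STATE)))
```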
In recent years, deep reinforcement learning has shown strong successes in complex single-agent tasks, and more recently this approach has also been applied to multi-agent domains. In this paper, we propose a novel approach, called MAGnet, for multi-agent reinforcement learning (MARL) that utilizes a relevance-graph representation of the environment obtained via a self-attention mechanism, together with a message-generation technique inspired by the NerveNet architecture. We applied the MAGnet approach to the Pommerman game, and the results show that it significantly outperforms state-of-the-art MARL solutions, including DQN, MADDPG, and MCTS.
While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment. A wide variety of domains have dynamics that share common foundations like the laws of classical mechanics, which are rarely exploited by existing algorithms. In fact, humans continuously acquire and use such dynamics priors to easily adapt to operating in new environments. In this work, we propose an approach to learn task-agnostic dynamics priors from videos and incorporate them into an RL agent. Our method involves pre-training a frame predictor on task-agnostic physics videos to initialize dynamics models (and fine-tune them) for unseen target environments. Our frame prediction architecture, SpatialNet, is designed specifically to capture localized physical phenomena and interactions. Our approach allows for both faster policy learning and convergence to better policies, outperforming competitive approaches on several different environments. We also demonstrate that incorporating this prior allows for more effective transfer between environments.
Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models, based on graph networks, which implement an inductive bias for object- and relation-centric representations of complex, dynamical systems. Our results show that as a forward model, our approach supports accurate predictions from real and simulated data, and surprisingly strong and efficient generalization, across eight distinct physical systems which we varied parametrically and structurally. We also found that our inference model can perform system identification. Our models are also differentiable, and support online planning via gradient-based trajectory optimization, as well as offline policy optimization. Our framework offers new opportunities for harnessing and exploiting rich knowledge about the world, and takes a key step toward building machines with more human-like representations of the world.
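As a sketch of what such a graph-network forward model computes, here is a single simplified message-passing step, not the paper's model: per-object states sit on the nodes of an interaction graph, edges compute pairwise messages, each node aggregates its incoming messages, and a node update produces a stand-in for the predicted next state. All weights, sizes, and names are illustrative.

```python
import numpy as np

def gn_forward_step(nodes, senders, receivers, w_edge, w_node):
    """One message-passing step of a simple graph-network-style forward model.

    nodes: (n, d) per-object state (e.g., position, velocity).
    senders, receivers: index arrays defining directed edges (e.g., spring connections).
    w_edge: (2d, d) edge-update weights; w_node: (2d, d) node-update weights.
    Returns updated per-object states (a stand-in for predicted next states).
    """
    # Edge update: each edge looks at its sender and receiver nodes.
    edge_in = np.concatenate([nodes[senders], nodes[receivers]], axis=1)
    messages = np.tanh(edge_in @ w_edge)
    # Aggregate incoming messages per receiver node.
    agg = np.zeros_like(nodes)
    np.add.at(agg, receivers, messages)
    # Node update: combine each node's state with its aggregated messages.
    return np.tanh(np.concatenate([nodes, agg], axis=1) @ w_node)

# Toy usage: 3 objects connected in a chain (edges in both directions).
rng = np.random.default_rng(2)
nodes = rng.normal(size=(3, 4))
senders = np.array([0, 1, 1, 2]); receivers = np.array([1, 0, 2, 1])
print(gn_forward_step(nodes, senders, receivers,
                      rng.normal(size=(8, 4)), rng.normal(size=(8, 4))).shape)  # (3, 4)
```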
In this article, we review recent Deep Learning advances in the context of how they have been applied to play different types of video games such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in the context of applying these machine learning methods to video games, such as general game playing, dealing with extremely large decision spaces and sparse rewards.
Many tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper, we explore a simple neural model, called CommNet, that uses continuous communication for cooperative tasks. The model consists of multiple agents, and the communication between them is learned alongside their policies. We apply this model to a diverse set of tasks, demonstrating the ability of the agents to learn to communicate among themselves, yielding improved performance over non-communicative agents and baselines. In some cases, it is possible to interpret the language devised by the agents, revealing simple but effective strategies for solving the task.
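The core continuous-communication mechanism is easy to state in code: each agent's hidden state is updated from its own state plus the mean of the other agents' hidden states, which serves as a communication channel. The sketch below is a simplified single linear step with placeholder weights, not the full multi-layer CommNet model.

```python
import numpy as np

def commnet_step(hidden, w_h, w_c):
    """One CommNet-style communication step (simplified, single linear layer).

    hidden: (n_agents, d) hidden states, one per agent.
    Each agent receives the mean of the *other* agents' hidden states as a
    continuous communication vector, then updates its own hidden state.
    """
    n = hidden.shape[0]
    total = hidden.sum(axis=0, keepdims=True)
    comm = (total - hidden) / max(n - 1, 1)          # mean over the other agents
    return np.tanh(hidden @ w_h + comm @ w_c)

rng = np.random.default_rng(3)
h = rng.normal(size=(4, 6))                          # 4 agents, 6-dim hidden states
print(commnet_step(h, rng.normal(size=(6, 6)), rng.normal(size=(6, 6))).shape)  # (4, 6)
```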
Reinforcement learning in multi-agent scenarios is important for real-world applications, but presents challenges beyond those found in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism which selects relevant information for each agent at every timestep. Compared with recent approaches, this attention mechanism enables more effective and scalable learning in complex multi-agent environments. Our approach is applicable not only to cooperative settings with shared rewards, but also to individualized reward settings, including adversarial settings, and it makes no assumptions about the agents' action spaces. As such, it is flexible enough to be applied to most multi-agent learning problems.
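Here is a rough sketch, not the paper's implementation, of the attention step inside such a centralized critic: the agent being evaluated forms a query from its own observation-action encoding and attends over the other agents' encodings to pull in only the relevant information before producing a value. All weight shapes and names are illustrative placeholders.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_critic_value(agent_i, encodings, w_q, w_k, w_v, w_out):
    """Value estimate for one agent from a centralized, attention-based critic.

    encodings: (n_agents, d) encodings of each agent's observation and action.
    Agent i forms a query from its own encoding and attends over the other
    agents' encodings to select relevant information.
    """
    others = [j for j in range(encodings.shape[0]) if j != agent_i]
    q = encodings[agent_i] @ w_q
    keys = encodings[others] @ w_k
    vals = np.tanh(encodings[others] @ w_v)
    attn = softmax(keys @ q / np.sqrt(q.shape[0]))
    context = attn @ vals
    return np.concatenate([encodings[agent_i], context]) @ w_out   # scalar value

rng = np.random.default_rng(4)
enc = rng.normal(size=(3, 5))          # 3 agents, 5-dim obs-action encodings
w = [rng.normal(size=(5, 5)) for _ in range(3)] + [rng.normal(size=10)]
print(attention_critic_value(0, enc, *w))
```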
In many sequential decision-making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that the agent designer considers good. A number of different formulations of this reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et al., which defines the optimal intrinsic reward function as one that, when used by an RL agent, achieves behavior that optimizes the task-specifying extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead-search-based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remained an open question. In this paper, we derive a novel algorithm for learning intrinsic rewards for policy-gradient-based learning agents. We compare the performance of augmented agents that use our algorithm to provide additive intrinsic rewards to an A2C-based policy learner (for Atari games) and a PPO-based policy learner (for Mujoco domains) against baseline agents that use the same policy learners but only the extrinsic reward. Our results show improved performance on most, but not all, of the domains.
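To illustrate the two-level structure of learning an intrinsic reward, here is a deliberately tiny toy, not the paper's algorithm, which uses sampled trajectories with A2C/PPO learners and an analytic meta-gradient: an inner policy-gradient step optimizes extrinsic plus intrinsic reward, and an outer step adapts the intrinsic reward so the updated policy does better on the extrinsic objective alone. The bandit, exact expected gradients, and finite-difference meta-gradient are simplifying assumptions made purely for brevity.

```python
import numpy as np

# Tiny 2-armed bandit: arm 0 pays 1, arm 1 pays 0 (extrinsic reward).
r_ex = np.array([1.0, 0.0])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def inner_update(theta, eta, alpha=0.5):
    """One exact policy-gradient step using extrinsic + intrinsic reward."""
    pi = softmax(theta)
    r = r_ex + eta                        # intrinsic reward: a learned per-arm bonus
    grad = pi * r - pi * np.dot(pi, r)    # exact gradient of E[r] for a softmax policy
    return theta + alpha * grad

def extrinsic_return(theta):
    return np.dot(softmax(theta), r_ex)

theta, eta = np.zeros(2), np.zeros(2)
for _ in range(200):
    # Outer step: adjust eta so the *updated* policy earns more extrinsic reward.
    # Meta-gradient via finite differences to keep the sketch short.
    meta_grad = np.zeros_like(eta)
    for i in range(eta.size):
        bump = np.zeros_like(eta); bump[i] = 1e-4
        meta_grad[i] = (extrinsic_return(inner_update(theta, eta + bump))
                        - extrinsic_return(inner_update(theta, eta - bump))) / 2e-4
    eta += 0.1 * meta_grad
    theta = inner_update(theta, eta)      # inner step with the current intrinsic reward

print(softmax(theta), eta)                # the policy concentrates on the rewarding arm
```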
The resurgence of deep neural networks has resulted in impressive advances in natural language processing (NLP). This success, however, is contingent on access to large amounts of structured supervision, often manually constructed and unavailable for many applications and domains. In this thesis, I present novel computational models that integrate reinforcement learning with language understanding to induce grounded representations of semantics. Using unstructured feedback, these techniques not only enable task-optimized representations which reduce dependence on high quality annotations, but also exploit language in adapting control policies across different environments. First, I describe an approach for learning to play text-based games, where all interaction is through natural language and the only source of feedback is in-game rewards. Employing a deep reinforcement learning framework to jointly learn state representations and action policies, our model outperforms several baselines on different domains, demonstrating the importance of learning expressive representations. Second, I exhibit a framework for utilizing textual descriptions to tackle the challenging problem of cross-domain policy transfer for reinforcement learning (RL). We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized state representation to effectively make use of text. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. Finally, I demonstrate how reinforcement learning can enhance traditional NLP systems in low resource scenarios. In particular, I describe an autonomous agent that can learn to acquire and integrate external information to enhance information extraction. Our experiments on two databases, shooting incidents and food adulteration cases, demonstrate that our system significantly improves over traditional extractors and a competitive meta-classifier baseline.
In this paper, we explore the utilization of natural language to drive transfer for reinforcement learning (RL). Despite the widespread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel to facilitate effective policy transfer. Specifically, by learning to ground the meaning of text to the dynamics of the environment such as transitions and rewards, an autonomous agent can effectively bootstrap policy learning on a new domain given its description. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized state representation to effectively use entity descriptions. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. For instance, we achieve up to 14% and 11.5% absolute improvement over previously existing models in terms of average and initial rewards, respectively.
Deep reinforcement learning (DRL) has achieved remarkable results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent work has explored learning beyond single-agent scenarios and has considered multi-agent scenarios. Initial results report successes in complex multi-agent domains, although several challenges remain. In this context, this article first provides a clear overview of the current multi-agent deep reinforcement learning (MDRL) literature. Second, it provides guidance to complement this emerging area by (i) showing how methods and algorithms from DRL and multi-agent learning (MAL) can help solve problems in MDRL, and (ii) providing general lessons learned from these works. We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists in both areas (DRL and MAL) in a joint effort to promote fruitful research in the multi-agent community.
Collaboration is a necessary skill for performing tasks that are beyond one agent's capabilities. Widely addressed in both classical and modern AI, multi-agent collaboration has typically been studied in simple grid worlds. We argue that there are inherently visual aspects to collaboration which should be studied in visually rich environments. A key element of collaboration is communication, which can be explicit, through messages, or implicit, through perception of the other agents and the visual world. Learning to collaborate in a visual environment entails learning (1) to perform the task, (2) when and what to communicate, and (3) how to act based on these communications and on the perception of the visual world. In this paper we study the problem of learning to collaborate directly from pixels in AI2-THOR, and demonstrate the benefits of explicit and implicit modes of communication for performing visual tasks. For more details, please refer to our project page: https://prior.allenai.org/projects/two-body-problem
Modeling agent behavior is central to understanding the emergence of complex phenomena in multi-agent systems. Prior work on agent modeling has largely been task-specific and driven by hand-engineered, domain-specific prior knowledge. We propose a general learning framework for modeling agent behavior in arbitrary multi-agent systems using only a small amount of interaction data. Our framework casts agent modeling as a representation learning problem. Accordingly, we construct a novel objective inspired by imitation learning and agent identification, and design an algorithm for unsupervised learning of representations of agent policies. We empirically demonstrate the utility of the proposed framework in (i) a challenging high-dimensional competitive environment for continuous control and (ii) a cooperative environment for communication, on supervised prediction tasks, unsupervised clustering, and policy optimization using deep reinforcement learning.
The field of reinforcement learning (RL) is facing increasingly challenging domains of combinatorial complexity. For an RL agent to address these challenges, it must be able to plan effectively. Prior work has typically combined an explicit model of the environment with a specific planning algorithm (such as tree search). More recently, a new family of methods has been proposed that learns how to plan, by providing structure via inductive biases in the function approximator (such as a tree-structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go one step further and demonstrate empirically that an entirely model-free approach, with no special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across combinatorial and irreversible state spaces, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics one might expect to find in a planning algorithm. Furthermore, it exceeds the state of the art in challenging combinatorial domains such as Sokoban, and outperforms other model-free approaches that utilize strong inductive biases toward planning.
Artificial intelligence (AI) has recently undergone a renaissance, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under very different pressures, remain out of reach for current approaches. In particular, generalizing beyond one's experience, a hallmark of human intelligence from infancy, remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between "hand-engineering" and "end-to-end" learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how relational inductive biases can be used within deep learning architectures to facilitate learning about entities, relations, and the rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias, the graph network, which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behavior. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, and demonstrate how to use it in practice.
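The abstract describes the graph network block as an edge update, a node update with per-node aggregation, and a global update. Without relying on the released library's API, here is a minimal, framework-free sketch of that block; the update functions, dimensions, and names are arbitrary placeholders rather than anything from the paper or its library.

```python
import numpy as np

def gn_block(nodes, edges, senders, receivers, globals_, f_e, f_v, f_u):
    """Full graph-network block: edge update, node update, then global update.

    nodes: (n_v, d_v); edges: (n_e, d_e); globals_: (d_u,).
    senders/receivers: (n_e,) node indices for each directed edge.
    f_e, f_v, f_u: update functions (here simple callables returning arrays).
    """
    # 1. Update every edge from its features, its endpoints, and the global attribute.
    edges = f_e(np.concatenate(
        [edges, nodes[senders], nodes[receivers],
         np.repeat(globals_[None, :], edges.shape[0], axis=0)], axis=1))
    # 2. Aggregate incoming edges per node, then update every node.
    agg = np.zeros((nodes.shape[0], edges.shape[1]))
    np.add.at(agg, receivers, edges)
    nodes = f_v(np.concatenate(
        [nodes, agg, np.repeat(globals_[None, :], nodes.shape[0], axis=0)], axis=1))
    # 3. Update the global attribute from aggregated edges, nodes, and itself.
    globals_ = f_u(np.concatenate([edges.mean(axis=0), nodes.mean(axis=0), globals_]))
    return nodes, edges, globals_

# Toy usage with random linear update functions.
rng = np.random.default_rng(5)

def lin(d_in, d_out):
    w = rng.normal(size=(d_in, d_out))
    return lambda x: np.tanh(x @ w)

nodes, edges, u = rng.normal(size=(3, 4)), rng.normal(size=(2, 2)), rng.normal(size=3)
out = gn_block(nodes, edges, np.array([0, 1]), np.array([1, 2]), u,
               f_e=lin(2 + 4 + 4 + 3, 2), f_v=lin(4 + 2 + 3, 4), f_u=lin(2 + 4 + 3, 3))
print([a.shape for a in out])  # [(3, 4), (2, 2), (3,)]
```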
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.
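For a concrete sense of the pipeline, here is a deliberately simplified toy, not the paper's encoder/decoder, which use multiple rounds of neural message passing and a recurrent decoder: score a latent edge type for every ordered pair of objects from their trajectories, draw a relaxed sample with a Gumbel-softmax, and predict next states via messages on the sampled edges. All weights, shapes, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def gumbel_softmax(logits, tau=0.5):
    """Relaxed (continuous) sample of a discrete edge type."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-12) + 1e-12)
    y = (logits + g) / tau
    e = np.exp(y - y.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def nri_step(traj, w_enc, w_dec):
    """One highly simplified NRI-style pass over a trajectory of n objects.

    traj: (T, n, d) observed states. Returns (edge "on" probabilities, predicted next states).
    Encoder: scores an edge type (off/on) for every ordered pair from the
    concatenated, time-averaged features of the two objects.
    Decoder: predicts each object's next state from messages on "on" edges.
    """
    T, n, d = traj.shape
    feats = traj.mean(axis=0)                                   # (n, d) per-object summary
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    logits = np.stack([np.concatenate([feats[i], feats[j]]) @ w_enc for i, j in pairs])
    edges = gumbel_softmax(logits)                              # (n_pairs, 2) relaxed edge types
    # Decoder: message passing weighted by the probability that an edge is "on".
    msgs = np.zeros((n, d))
    for (i, j), e in zip(pairs, edges):
        msgs[j] += e[1] * feats[i]
    pred_next = traj[-1] + np.tanh(np.concatenate([traj[-1], msgs], axis=1) @ w_dec)
    return edges[:, 1], pred_next

traj = rng.normal(size=(10, 3, 4))                              # 10 steps, 3 objects, 4 features
edge_probs, pred = nri_step(traj, rng.normal(size=(8, 2)), rng.normal(size=(8, 4)))
print(edge_probs.shape, pred.shape)                             # (6,) (3, 4)
```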