Neural networks leverage robust internal representations in order to generalise. Learning them is difficult, and often requires a large training set that covers the data distribution densely. We study a common setting where our task is not purely opaque. Indeed, very often we may have access to information about the underlying system (e.g. that observations must obey certain laws of physics) that any "tabula rasa" neural network would need to re-learn from scratch, penalising performance. We incorporate this information into a pre-trained reasoning module, and investigate its role in shaping the discovered representations in diverse self-supervised learning settings from pixels. Our approach paves the way for a new class of representation learning, grounded in algorithmic priors.
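Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring the fact that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks (GNNs), as a key building block for combinatorial tasks, either directly as solvers or by enhancing exact solvers. The inductive bias of GNNs effectively encodes combinatorial and relational input, owing to their invariance to permutations and their awareness of input sparsity. This paper presents a conceptual review of recent key advancements in this emerging field, aimed at both optimization and machine learning researchers.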
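The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they have mostly focused on building specialist models. Specialist models are capable of learning to execute either only one algorithm or a collection of algorithms with an identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner: a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to show empirically that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to learn algorithms effectively in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20%. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates the knowledge captured by specialist models.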
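Despite the many recent successes of deep reinforcement learning (RL), its methods remain data-inefficient, which makes many problems prohibitively expensive to solve in terms of data. We aim to remedy this by exploiting the rich supervisory signal in unlabeled data to learn state representations. This work introduces three different representation learning algorithms that have access to different subsets of the data sources used by traditional RL algorithms: (i) GrICA is inspired by independent component analysis (ICA) and trains a deep neural network to output statistically independent features of the input. GrICA does so by minimizing the mutual information between each feature and the other features. Additionally, GrICA requires only an unsorted collection of environment states. (ii) Latent Representation Prediction (LARP) requires more context: in addition to a state as input, it also needs the previous state and the action that connects them. This method learns state representations by predicting the representation of the environment's next state given the current state and action. The predictor is used together with a graph search algorithm. (iii) RewPred learns a state representation by training a deep neural network to learn a smoothed version of the reward function. The representation is used to preprocess the inputs to deep RL, while the reward predictor is used for reward shaping. This method needs only state-reward pairs from the environment to learn the representation. We find that each method has its strengths and weaknesses, and conclude from our experiments that including unsupervised representation learning in RL problem-solving pipelines can speed up learning.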
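This review addresses the problem of learning abstract representations of measurement data in the context of deep reinforcement learning (DRL). While the data are often ambiguous, high-dimensional, and complex to interpret, many dynamical systems can be effectively described by a low-dimensional set of state variables. Discovering these state variables from the data is a crucial aspect of improving the data efficiency, robustness, and generalization of DRL methods, of tackling the curse of dimensionality, and of bringing interpretability and insight into black-box DRL. This review provides a comprehensive overview of unsupervised representation learning in DRL by describing the main deep learning tools used for learning representations of the world, providing a systematic view of the methods and principles, summarizing applications, benchmarks and evaluation strategies, and discussing open challenges and future directions.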
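Object-centric representations are a promising path toward more systematic generalization, because they provide flexible abstractions upon which compositional world models can be built. Recent work on simple 2D and 3D datasets has shown that models with object-centric inductive biases can learn to segment and represent meaningful objects from the statistical structure of the data alone, without any supervision. However, such fully unsupervised methods still fail to scale to diverse realistic data, despite the use of increasingly complex inductive biases (e.g. priors for the size of objects or the 3D geometry of the scene). In this paper, we instead take a weakly supervised approach and focus on how 1) using the temporal dynamics of video data in the form of optical flow and 2) conditioning the model on simple object location cues can be used to enable segmenting and tracking objects in significantly more realistic synthetic data. We introduce a sequential extension to Slot Attention, which we train to predict optical flow for realistic-looking synthetic scenes, and show that conditioning the initial state of this model on a small set of hints, such as the center of mass of objects in the first frame, is sufficient to significantly improve instance segmentation. These benefits generalize beyond the training distribution to novel objects, novel backgrounds, and longer video sequences. We also find that such initial-state conditioning can be used during inference as a flexible interface to query the model for specific objects or parts of objects, which could pave the way for a range of weakly supervised approaches and allow more effective interaction with trained models.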
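Unsupervised visual representation learning offers the opportunity to leverage large corpora of unlabeled trajectories to form useful visual representations, which can benefit the training of reinforcement learning (RL) algorithms. However, evaluating the fitness of such representations requires training RL algorithms, which is computationally intensive and yields high-variance outcomes. To alleviate this issue, we design an evaluation protocol for unsupervised RL representations with lower variance and a 600-fold lower computational cost. Inspired by the vision community, we propose two linear probing tasks: predicting the reward observed in a given state, and predicting the action of an expert in a given state. These two tasks are generally applicable to many RL domains, and we show through rigorous experimentation that they correlate strongly with actual downstream control performance on the Atari100k benchmark. This provides a better way to explore the space of pretraining algorithms without running RL evaluations for every setting. Leveraging this framework, we further improve existing self-supervised learning (SSL) recipes for RL, highlighting the importance of the forward model, the size of the visual backbone, and the precise formulation of the unsupervised objective.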
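As a rough illustration of the reward-probing idea in this abstract, the sketch below fits a linear map from frozen features to rewards by gradient descent on the mean squared error. It is a minimal sketch under stated assumptions, not the paper's protocol: the function name `fit_linear_probe`, the toy feature vectors, and the toy rewards are all hypothetical, and a real probe would be trained on encoder outputs from recorded trajectories.

```python
def fit_linear_probe(features, targets, lr=0.5, epochs=500):
    """Fit targets ~ w . x + b on frozen features with full-batch gradient descent.

    Hypothetical helper for illustration; the features are never updated,
    only the linear probe (w, b) is trained.
    """
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def probe_mse(features, targets, w, b):
    """Mean squared error of the probe; lower suggests more linearly decodable reward."""
    errs = [(sum(wi * xi for wi, xi in zip(w, x)) + b - y) ** 2
            for x, y in zip(features, targets)]
    return sum(errs) / len(errs)


# Toy stand-in for frozen encoder features and observed rewards (assumed data).
toy_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_rewards = [0.5, 2.5, -0.5, 1.5]  # generated as 2*x0 - 1*x1 + 0.5

probe_w, probe_b = fit_linear_probe(toy_features, toy_rewards)
probe_error = probe_mse(toy_features, toy_rewards, probe_w, probe_b)
```

The probe error, rather than a full RL run, then serves as the cheap proxy score for comparing representations.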
Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem-solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.
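This digital book contains a practical and comprehensive introduction to everything related to deep learning in the context of physical simulations. As far as possible, all topics come with hands-on code examples in the form of Jupyter notebooks, so that readers can get started quickly. Beyond standard supervised learning from data, we look at physical loss constraints, more tightly coupled learning algorithms with differentiable simulations, as well as reinforcement learning and uncertainty modeling. We live in exciting times: these methods have huge potential to fundamentally change what computer simulations can achieve.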
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.
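We present a method to learn compositional multi-object dynamics models from image observations, based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks. NeRFs have become a popular choice for representing scenes due to their strong 3D prior. However, most NeRF approaches are trained on a single scene and represent the whole scene with a global model, which makes generalization to novel scenes containing different numbers of objects challenging. Instead, we present a compositional, object-centric auto-encoder framework that maps multiple views of the scene to a set of latent vectors representing each object separately. The latent vectors parameterize individual NeRFs from which the scene can be reconstructed. Based on those latent vectors, we train a graph neural network dynamics model in the latent space to achieve compositionality for dynamics prediction. A key feature of our approach is that the latent vectors are forced to encode 3D information through the NeRF decoder, which enables us to incorporate structural priors when learning the dynamics model, making long-term predictions more stable compared to several baselines. Simulated and real-world experiments show that our method can model and learn the dynamics of compositional scenes, including rigid and deformable objects. Video: https://dannydriess.github.io/compnerfdyn/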
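Compositional generalization is a critical ability in learning and decision-making. We focus on the setting of reinforcement learning in object-oriented environments to study compositional generalization in world modeling. We (1) formalize the compositional generalization problem with an algebraic approach and (2) study how a world model can achieve it. We introduce a conceptual environment, Object Library, and two instances of it, and deploy a principled pipeline to measure generalization ability. Motivated by the formulation, we use our framework to analyze several methods with exact or no compositional generalization ability, and design a differentiable approach, the Homomorphic Object-oriented World Model (HOWM), that achieves soft but more efficient compositional generalization.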
Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not capture the compositional properties of natural scenes. In this paper, we present the Slot Attention module, an architectural component that interfaces with perceptual representations such as the output of a convolutional neural network and produces a set of task-dependent abstract representations which we call slots. These slots are exchangeable and can bind to any object in the input by specializing through a competitive procedure over multiple rounds of attention. We empirically demonstrate that Slot Attention can extract object-centric representations that enable generalization to unseen compositions when trained on unsupervised object discovery and supervised property prediction tasks.
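Learning dynamics is at the heart of many important applications of machine learning (ML), such as robotics and autonomous driving. In these settings, ML algorithms typically need to reason about a physical system using high-dimensional observations, such as images, without access to the underlying state. Recently, several methods have been proposed that integrate priors from classical mechanics into ML models to address the challenge of physical reasoning from images. In this work, we take a sober look at the current capabilities of these models. To this end, we introduce a suite of 17 datasets with visual observations based on physical systems exhibiting a wide range of dynamics. We conduct a thorough and detailed comparison of the major classes of physically inspired methods alongside several strong baselines. While models that incorporate physical priors can often learn latent spaces with desirable properties, our results show that these methods fail to improve significantly upon standard techniques. Nonetheless, we find that the use of continuous and time-reversible dynamics benefits models of all classes.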
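Current domain-independent classical planners require symbolic models of the problem domain and instance as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, its knowledge is encoded in a subsymbolic representation that is incompatible with symbolic systems such as planners. We propose Latplan, an unsupervised architecture combining deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of the transitions allowed in the environment (training inputs), Latplan learns a complete propositional PDDL action model of the environment. Later, when given a pair of images representing the initial state and the goal state (planning inputs), Latplan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. We evaluate Latplan using image-based versions of 6 planning domains: 8-puzzle, 15-puzzle, Blocksworld, Sokoban, and two variations of LightsOut.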
Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
In this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players. ALE presents significant research challenges for reinforcement learning, model learning, model-based planning, imitation learning, transfer learning, and intrinsic motivation. Most importantly, it provides a rigorous testbed for evaluating and comparing approaches to these problems. We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning. In doing so, we also propose an evaluation methodology made possible by ALE, reporting empirical results on over 55 different games. All of the software, including the benchmark agents, is publicly available.
We present CURL: Contrastive Unsupervised Representations for Reinforcement Learning. CURL extracts high-level features from raw pixels using contrastive learning and performs off-policy control on top of the extracted features. CURL outperforms prior pixel-based methods, both model-based and model-free, on complex tasks in the DeepMind Control Suite and Atari Games, showing 1.9x and 1.2x performance gains at the 100K environment and interaction steps benchmarks respectively. On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample-efficiency of methods that use state-based features. Our code is open-sourced and available at https://www.github.com/MishaLaskin/curl.
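To make the contrastive learning mentioned in this abstract more concrete, the sketch below computes a generic InfoNCE-style loss, where each query embedding should score highest against its own positive key and lower against the other keys in the batch. This is an illustrative assumption rather than CURL's implementation: the abstract does not specify the loss, the dot-product similarity is chosen here for simplicity, and the function name `info_nce_loss` and the toy embeddings are hypothetical.

```python
import math


def info_nce_loss(queries, keys, temperature=1.0):
    """Mean InfoNCE loss where keys[i] is the positive for queries[i].

    Similarity is a plain dot product (an assumption for this sketch);
    every other key in the batch acts as a negative.
    """
    losses = []
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        # Numerically stable log-sum-exp over all keys.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the positive key as the correct class.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (assumed data): each query matches exactly one key.
toy_queries = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
aligned_keys = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
shuffled_keys = [[0.0, 5.0, 0.0], [0.0, 0.0, 5.0], [5.0, 0.0, 0.0]]

aligned_loss = info_nce_loss(toy_queries, aligned_keys)
shuffled_loss = info_nce_loss(toy_queries, shuffled_keys)
```

The loss is near zero when queries and positives agree and large when the pairing is scrambled, which is the signal an encoder would be trained to minimise.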
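Despite its rise as a prominent solution to the data inefficiency of today's machine learning models, self-supervised learning has yet to be studied from a purely multi-agent perspective. In this work, we propose that aligning internal subjective representations, which arise naturally in a multi-agent setting where agents receive partial observations of the same underlying environment state, can lead to more data-efficient representations. We propose that multi-agent environments, in which agents cannot access the observations of others but can communicate within a limited range, guarantee a common context that can be leveraged in individual representation learning. The reason is that subjective observations necessarily refer to the same subset of the underlying environment states, and communication about these states can freely provide a supervisory signal. To highlight the importance of communication, we refer to our setting as socially supervised representation learning. We present a minimal architecture consisting of a population of autoencoders, for which we define loss functions that capture different aspects of effective communication, and examine their effect on the learned representations. We show that our proposed architecture allows aligned representations to emerge. The subjectivity introduced by presenting agents with distinct perspectives of the environment state contributes to learning abstract representations that outperform those learned by a single autoencoder and by a population of autoencoders presented with identical perspectives of the environment state. Altogether, our results demonstrate how communication from subjective perspectives can lead to more abstract representations in multi-agent systems, opening promising perspectives for future research at the intersection of representation learning and emergent communication.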