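A key challenge for AI is to build embodied systems that operate in dynamically changing environments. Such systems must adapt to changing task contexts and learn continuously. Although standard deep learning systems achieve state-of-the-art results on static benchmarks, they often struggle in dynamic scenarios. In these settings, error signals from multiple contexts can interfere with one another, ultimately leading to a phenomenon known as catastrophic forgetting. In this article, we investigate biologically inspired architectures as solutions to these problems. Specifically, we show that the biophysical properties of dendrites and local inhibitory systems enable networks to dynamically restrict and route information in a context-specific manner. Our key contributions are as follows. First, we propose a novel artificial neural network architecture that incorporates active dendrites and sparse representations into the standard deep learning framework. Next, we study the performance of this architecture on two separate benchmarks that require task-based adaptation: Meta-World, a multi-task reinforcement learning environment in which a robotic agent must learn to solve a variety of manipulation tasks simultaneously; and a continual learning benchmark in which the model's prediction task changes throughout training. Analysis of both benchmarks demonstrates the emergence of overlapping but distinct and sparse subnetworks, which allow the system to learn multiple tasks fluidly with minimal forgetting. Our neural implementation marks the first time a single architecture has achieved competitive results in both multi-task and continual learning settings. Our research sheds light on how the biological properties of neurons can inform deep learning systems to address dynamic scenarios that are typically impossible for traditional ANNs to solve.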
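The context-specific gating and sparse activation described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`k_winners`, `active_dendrite_unit`, `layer`), the choice of a sigmoid gate on the strongest dendritic response, and the list-based data layout are all hypothetical.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def k_winners(values, k):
    """Keep the k largest activations and zero the rest (sparse representation)."""
    if k >= len(values):
        return list(values)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    winners = set(order[:k])
    return [v if i in winners else 0.0 for i, v in enumerate(values)]


def active_dendrite_unit(x, w, dendrites, context):
    """Feedforward drive of one unit, modulated by its dendritic segments.

    Assumption: the segment with the largest absolute response to the
    context vector sets a sigmoid gate on the feedforward activation.
    """
    feedforward = dot(w, x)
    responses = [dot(segment, context) for segment in dendrites]
    strongest = max(responses, key=abs)
    gate = 1.0 / (1.0 + math.exp(-strongest))
    return feedforward * gate


def layer(x, weights, all_dendrites, context, k):
    """Gate every unit by context, then keep only the k strongest units."""
    gated = [
        active_dendrite_unit(x, w, d, context)
        for w, d in zip(weights, all_dendrites)
    ]
    return k_winners(gated, k)
```

With the same input and weights, two different context vectors select different winning units, which is the intuition behind the "overlapping but distinct and sparse subnetworks" mentioned above.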
Translated by Google Translate
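We study methods for task-agnostic continual reinforcement learning (TACRL). TACRL is a setting that combines the difficulties of partially observable RL (a consequence of task agnosticism) with the difficulties of continual learning (CL), i.e., learning on a non-stationary sequence of tasks. We compare TACRL methods with the soft upper bounds prescribed by previous literature: multi-task learning (MTL) methods, which do not have to deal with non-stationary data distributions, and task-aware methods, which are allowed to operate under full observability. We consider a previously unexplored baseline for TACRL, replay-based recurrent RL (3RL), in which we augment an RL algorithm with recurrent mechanisms to mitigate partial observability and with experience replay mechanisms to mitigate catastrophic forgetting in CL. By studying empirical performance on a sequence of RL tasks, we find surprising cases in which 3RL matches and even surpasses the MTL and task-aware soft upper bounds. We lay out hypotheses that could explain this inflection point in continual and task-agnostic learning research. Our hypotheses are tested empirically on continuous control tasks through a large-scale study of the popular multi-task and continual learning benchmark Meta-World. By analyzing different training statistics, including gradient conflict, we find evidence that 3RL's outperformance stems from its ability to quickly infer how new tasks relate to previous ones, which enables forward transfer.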
Continual Learning (CL) is a field dedicated to devising algorithms able to achieve lifelong learning. Overcoming the disruption of previously acquired knowledge, a drawback of deep learning models known as catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the data being modeled does not undergo a considerable distributional shift across subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drops very quickly. Overcoming this limitation is fundamental, as it would allow us to build truly intelligent systems that show both stability and plasticity. It would also remove the onerous need to retrain these architectures from scratch on the updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use a memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor than the quality of the data. Secondly, we propose one of the early works on incremental learning with ViT architectures, comparing functional, weight, and attention regularization approaches, and propose a novel and effective asymmetric loss. Finally, we present a study on pretraining and how it affects performance in Continual Learning, raising some questions about the effective progression of the field. We conclude with some future directions and closing remarks.
Lifelong learning aims to create AI systems that continuously and incrementally learn during a lifetime, similar to biological learning. Attempts so far have met problems, including catastrophic forgetting, interference among tasks, and the inability to exploit previous knowledge. While considerable research has focused on learning multiple input distributions, typically in classification, lifelong reinforcement learning (LRL) must also deal with variations in the state and transition distributions, and in the reward functions. Modulating masks, recently developed for classification, are particularly suitable to deal with such a large spectrum of task variations. In this paper, we adapted modulating masks to work with deep LRL, specifically PPO and IMPALA agents. The comparison with LRL baselines in both discrete and continuous RL tasks shows competitive performance. We further investigated the use of a linear combination of previously learned masks to exploit previous knowledge when learning new tasks: not only is learning faster, the algorithm solves tasks that we could not otherwise solve from scratch due to extremely sparse rewards. The results suggest that RL with modulating masks is a promising approach to lifelong learning, to the composition of knowledge to learn increasingly complex tasks, and to knowledge reuse for efficient and faster learning.
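The idea of reusing earlier masks through a linear combination can be illustrated with a small sketch. It assumes a fixed backbone weight vector, real-valued per-task mask scores, and a simple threshold for binarization; the function names and the coefficients `betas` are hypothetical and are not taken from the paper.

```python
def combine_masks(score_masks, betas):
    """Weighted sum of per-task mask scores; betas are the mixing coefficients."""
    size = len(score_masks[0])
    return [
        sum(beta * mask[i] for beta, mask in zip(betas, score_masks))
        for i in range(size)
    ]


def binarize(scores, threshold=0.0):
    """Turn real-valued scores into a 0/1 mask (assumed thresholding rule)."""
    return [1.0 if s > threshold else 0.0 for s in scores]


def apply_mask(weights, mask):
    """Modulate the fixed backbone weights element-wise with the mask."""
    return [w * m for w, m in zip(weights, mask)]


# Toy example: two previously learned masks plus a fresh mask for the new task.
backbone = [0.5, -1.0, 2.0, 0.3]
old_task_1 = [1.0, -1.0, 0.5, -0.5]
old_task_2 = [-1.0, 1.0, 0.5, -0.5]
new_task = [0.0, 0.0, 0.0, 0.0]

scores = combine_masks([old_task_1, old_task_2, new_task], [0.5, 0.3, 0.2])
effective_weights = apply_mask(backbone, binarize(scores))
```

In a sketch like this, only the new mask scores and the mixing coefficients would be trained on the new task, so the backbone and the old masks stay untouched.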
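Humans commonly solve complex problems by decomposing them into easier subproblems and then combining the subproblem solutions. This type of compositional reasoning permits reuse of the subproblem solutions when tackling future tasks that share part of the underlying compositional structure. In a continual or lifelong reinforcement learning (RL) setting, the ability to decompose knowledge into reusable components would enable agents to learn new RL tasks quickly by leveraging accumulated compositional structures. We explore a particular form of composition based on neural modules and present a set of RL problems that intuitively admit compositional solutions. Empirically, we demonstrate that neural composition indeed captures the underlying structure of this space of problems. We further propose a compositional lifelong RL method that leverages accumulated neural components to accelerate the learning of future tasks while retaining performance on previous tasks via offline RL over replayed experiences.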
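In lifelong learning systems based on artificial neural networks, one of the biggest obstacles is the inability to retain old knowledge as new information is encountered. This phenomenon is known as catastrophic forgetting. In this paper, we propose a new kind of connectionist architecture, the Sequential Neural Coding Network, that is robust to forgetting when learning from streams of data points and, unlike today's networks, does not learn via the popular back-propagation of errors. Grounded in the neurocognitive theory of predictive processing, our model adapts its synapses in a biologically plausible fashion, while another neural system learns to direct and control this cortex-like structure, mimicking some of the task-executive control functionality of the basal ganglia. In our experiments, we demonstrate that our self-organizing system experiences significantly less forgetting than standard neural models, outperforming a range of previously proposed methods, including rehearsal/data-buffer-based methods, on both standard (SplitMNIST, etc.) and custom benchmarks, even though it is trained in a stream-like fashion. Our work offers evidence that emulating mechanisms found in real neuronal systems, e.g., local learning and lateral competition, can yield new directions and possibilities for tackling the grand challenge of lifelong machine learning.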
The ability for an agent to continuously learn new skills without catastrophically forgetting existing knowledge is of critical importance for the development of generally intelligent agents. Most methods devised to address this problem depend heavily on well-defined task boundaries, and thus depend on human supervision. Our task-agnostic method, Self-Activating Neural Ensembles (SANE), uses a modular architecture designed to avoid catastrophic forgetting without making any such assumptions. At the beginning of each trajectory, a module in the SANE ensemble is activated to determine the agent's next policy. During training, new modules are created as needed and only activated modules are updated to ensure that unused modules remain unchanged. This system enables our method to retain and leverage old skills, while growing and learning new ones. We demonstrate our approach on visually rich procedurally generated environments.
A long-standing challenge in artificial intelligence is lifelong learning. In lifelong learning, many tasks are presented in sequence and learners must efficiently transfer knowledge between tasks while avoiding catastrophic forgetting over long lifetimes. On these problems, policy reuse and other multi-policy reinforcement learning techniques can learn many tasks. However, they can generate many temporary or permanent policies, resulting in memory issues. Consequently, there is a need for lifetime-scalable methods that continually refine a policy library of a pre-defined size. This paper presents a first approach to lifetime-scalable policy reuse. To pre-select the number of policies, a notion of task capacity, the maximal number of tasks that a policy can accurately solve, is proposed. To evaluate lifetime policy reuse using this method, two state-of-the-art single-actor base-learners are compared: 1) a value-based reinforcement learner, Deep Q-Network (DQN) or Deep Recurrent Q-Network (DRQN); and 2) an actor-critic reinforcement learner, Proximal Policy Optimisation (PPO) with or without Long Short-Term Memory layer. By selecting the number of policies based on task capacity, D(R)QN achieves near-optimal performance with 6 policies in a 27-task MDP domain and 9 policies in an 18-task POMDP domain; with fewer policies, catastrophic forgetting and negative transfer are observed. Due to slow, monotonic improvement, PPO requires fewer policies, 1 policy for the 27-task domain and 4 policies for the 18-task domain, but it learns the tasks with lower accuracy than D(R)QN. These findings validate lifetime-scalable policy reuse and suggest using D(R)QN for larger and PPO for smaller library sizes.
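Despite the many recent successes of deep reinforcement learning (RL), its methods remain data-inefficient, which makes many problems prohibitively expensive to solve in terms of data. We aim to remedy this by exploiting the rich supervisory signal in unlabeled data to learn state representations. This work introduces three different representation learning algorithms, each with access to a different subset of the data sources that traditional RL algorithms use: (i) GrICA is inspired by independent component analysis (ICA) and trains a deep neural network to output statistically independent features of the input. GrICA does so by minimizing the mutual information between each feature and the other features. Additionally, GrICA requires only an unsorted collection of environment states. (ii) Latent Representation Prediction (LARP) requires more context: in addition to a state as input, it also needs the previous state and the action that connects them. This method learns state representations by predicting the representation of the environment's next state given the current state and action. The predictor is used together with a graph search algorithm. (iii) RewPred learns a state representation by training a deep neural network to learn a smoothed version of the reward function. The representation is used to preprocess the inputs to deep RL, while the reward predictor is used for reward shaping. This method needs only state-reward pairs from the environment to learn the representation. We find that each method has its strengths and weaknesses, and conclude from our experiments that including unsupervised representation learning in RL problem-solving pipelines can speed up learning.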
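Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt quickly to tasks from few samples in dynamic environments. Such a feat is achieved through dynamic representations in an agent's policy network (obtained via reasoning about task context, model parameter updates, or both). However, obtaining rich dynamic representations for fast adaptation beyond simple benchmark problems is challenging because of the burden placed on the policy network to accommodate different policies. This paper addresses the challenge by introducing neuromodulation as a modular component that augments a standard policy network and regulates neuronal activity in order to produce efficient dynamic representations for task adaptation. The proposed extension to the policy network is evaluated across multiple discrete and continuous control environments of increasing complexity. To demonstrate the generality and benefits of the extension in meta-RL, the neuromodulated network is applied to two state-of-the-art meta-RL algorithms (CAVIA and PEARL). The results show that meta-RL augmented with neuromodulation produces significantly better results and richer dynamic representations than the baselines.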
Humans and animals have the ability to continually acquire, fine-tune, and transfer knowledge and skills throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for computational systems and autonomous agents interacting in the real world and processing continuous streams of information. However, lifelong learning remains a long-standing challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic forgetting. Although significant advances have been made in domain-specific learning with neural networks, extensive research efforts are required for the development of robust lifelong learning on autonomous agents and robots. We discuss well-established and emerging research motivated by lifelong learning factors in biological systems such as structural plasticity, memory replay, curriculum and transfer learning, intrinsic motivation, and multisensory integration.
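Humans can learn several tasks in succession with minimal mutual interference but perform more poorly when trained on multiple tasks at once. The opposite is true for standard deep neural networks. Here, we propose novel computational constraints for artificial neural networks, inspired by earlier work on the primate prefrontal cortex, that capture the cost of interleaved training and allow the network to learn two tasks in sequence without forgetting. We augment standard stochastic gradient descent with two algorithmic motifs: so-called "sluggish" task units and a Hebbian training step that strengthens connections between task units and the hidden units that encode task-relevant information. We find that the "sluggish" units introduce a switch cost during training, which under interleaved training biases representations towards a joint representation that ignores the contextual cue, while the Hebbian step promotes the formation of a gating scheme from task units to the hidden layer that produces orthogonal representations, which are fully protected against interference. Validating the model on previously published human behavioural data shows that it matches the performance of participants who had been trained on blocked or interleaved curricula, and that these performance differences were driven by misestimation of the true category boundary.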
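One way to picture a "sluggish" task unit is as a low-pass filtered version of the task cue. The sketch below assumes an exponential moving average with a hypothetical sluggishness parameter `alpha`; the abstract does not give the exact update rule, so this is only an illustration of why interleaved training blurs the task signal while blocked training does not.

```python
def sluggish_task_signal(task_cues, alpha):
    """Exponential moving average of one-hot task cues.

    alpha in [0, 1) is an assumed sluggishness parameter: 0 reproduces the
    cue exactly, larger values let earlier cues linger in the signal.
    """
    state = list(task_cues[0])
    history = []
    for cue in task_cues:
        state = [alpha * s + (1.0 - alpha) * c for s, c in zip(state, cue)]
        history.append(list(state))
    return history


# Blocked curriculum: the same task three trials in a row.
blocked = sluggish_task_signal([[1.0, 0.0]] * 3, alpha=0.5)

# Interleaved curriculum: the task switches on every trial.
interleaved = sluggish_task_signal(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], alpha=0.5
)
```

Under the blocked curriculum the signal stays a clean one-hot vector, whereas under the interleaved curriculum it becomes a mixture of both tasks, which is one plausible source of the switch cost described above.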
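The powerful learning ability of deep neural networks enables reinforcement learning agents to learn effective control policies directly from continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, an assumption that unfortunately does not hold in the general reinforcement learning paradigm, where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" and to a collapse in performance. In this paper, we present IQ, i.e., interference-aware deep Q-learning, to mitigate catastrophic interference in single-task deep reinforcement learning. Specifically, we resort to online clustering to achieve on-the-fly context division, together with a multi-head network and a knowledge distillation regularization term for preserving the policies of learned contexts. Built upon deep Q-networks, IQ consistently improves stability and performance compared with existing methods, as verified by extensive experiments on classic control and Atari tasks. The code is publicly available at: https://github.com/sweety-dm/interference-aware-ware-deep-q-learning.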
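We develop a new continual meta-learning method to address challenges in sequential multi-task learning. In this setting, the agent's goal is to achieve high reward quickly over any sequence of tasks. Prior meta-reinforcement learning algorithms have demonstrated promising results in accelerating the acquisition of new tasks. However, they require access to all tasks during training. Beyond simply transferring past experience to new tasks, our goal is to devise continual reinforcement learning algorithms that learn to learn, using their experience on previous tasks to learn new tasks more quickly. We introduce a new method, continual meta-policy search (CoMPS), that removes this limitation by meta-training in an incremental fashion, over each task in a sequence, without revisiting prior tasks. CoMPS continuously repeats two subroutines: learning a new task using RL, and using the experience from RL to perform completely offline meta-learning to prepare for subsequent task learning. We find that CoMPS outperforms prior continual learning and off-policy meta-reinforcement learning methods on several sequences of challenging continuous control tasks.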
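In this paper, we propose a backpropagation-free approach to robotic control through the neuro-cognitive computational framework of neural generative coding (NGC), designing an agent built entirely from powerful predictive coding/processing circuits that embody the principles of planning. Specifically, we craft an adaptive agent system, which we call active predictive coding (ActPC), that balances an internally generated epistemic signal (meant to encourage intelligent exploration) with an internally generated instrumental signal (meant to encourage goal-seeking behavior) to ultimately learn how to control various simulated robotic systems, as well as a complex robotic arm, using a realistic robotics simulator (the Surreal Robotics Suite) on the block-lifting task and can pick-and-place problems. Notably, our experimental results demonstrate that the proposed ActPC agent performs well in the face of sparse (extrinsic) reward signals and is competitive with or outperforms several powerful backprop-based RL approaches.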
Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human-realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization-based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments, demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
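The transfer versus interference view can be made concrete with per-example gradients: a positive dot product means a step on one example also helps the other, a negative one means it hurts. The sketch below uses a toy squared-error model and pairs it with a Reptile-style interpolation step over a batch that mixes new and replayed examples. The function names, the toy loss, and the specific meta-update are assumptions for illustration, not the published algorithm.

```python
def grad(w, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def alignment(w, example_a, example_b):
    """Dot product of two per-example gradients.

    A value above zero suggests transfer, below zero suggests interference.
    """
    ga = grad(w, *example_a)
    gb = grad(w, *example_b)
    return sum(a * b for a, b in zip(ga, gb))


def meta_replay_step(w, batch, lr=0.1, gamma=0.5):
    """Sequential SGD over the batch, then interpolate back toward the start.

    The batch is assumed to mix the current example with examples drawn
    from a replay buffer; gamma is the assumed interpolation rate.
    """
    fast = list(w)
    for x, y in batch:
        g = grad(fast, x, y)
        fast = [wi - lr * gi for wi, gi in zip(fast, g)]
    return [wi + gamma * (fi - wi) for wi, fi in zip(w, fast)]
```

Taking several sequential steps before interpolating is what makes the update sensitive to how the gradients of different examples relate to each other, rather than only to their sum.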
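In humans, perceptual awareness facilitates the fast recognition and extraction of information from sensory input. This awareness largely depends on how the human agent interacts with the environment. In this work, we propose active neural generative coding, a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments. Specifically, we develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning. We demonstrate on several simple control problems that our framework performs competitively with deep Q-learning. The robust performance of our agent offers promising evidence that a backprop-free approach to neural inference and learning can drive goal-directed behavior.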
The ability to sequentially learn multiple tasks without forgetting is a key skill of biological brains, whereas it represents a major challenge to the field of deep learning. To avoid catastrophic forgetting, various continual learning (CL) approaches have been devised. However, these usually require discrete task boundaries. This requirement seems biologically implausible and often limits the application of CL methods in the real world where tasks are not always well defined. Here, we take inspiration from neuroscience, where sparse, non-overlapping neuronal representations have been suggested to prevent catastrophic forgetting. As in the brain, we argue that these sparse representations should be chosen on the basis of feed forward (stimulus-specific) as well as top-down (context-specific) information. To implement such selective sparsity, we use a bio-plausible form of hierarchical credit assignment known as Deep Feedback Control (DFC) and combine it with a winner-take-all sparsity mechanism. In addition to sparsity, we introduce lateral recurrent connections within each layer to further protect previously learned representations. We evaluate the new sparse-recurrent version of DFC on the split-MNIST computer vision benchmark and show that only the combination of sparsity and intra-layer recurrent connections improves CL performance with respect to standard backpropagation. Our method achieves similar performance to well-known CL methods, such as Elastic Weight Consolidation and Synaptic Intelligence, without requiring information about task boundaries. Overall, we showcase the idea of adopting computational principles from the brain to derive new, task-free learning algorithms for CL.
The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this, and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate that our approach is scalable and effective by solving a set of classification tasks based on the MNIST handwritten digit dataset and by learning several Atari 2600 games sequentially.
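"Selectively slowing down learning on the weights important for those tasks" is commonly realized as a quadratic penalty that anchors each weight to its old value in proportion to an importance score. The sketch below assumes the importance values are already given (for example, an estimate of the diagonal Fisher information); the function names and the hyperparameters `lam` and `lr` are hypothetical.

```python
def ewc_penalty(theta, theta_old, importance, lam):
    """Quadratic anchor: 0.5 * lam * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(
        f * (t - t0) ** 2 for t, t0, f in zip(theta, theta_old, importance)
    )


def ewc_step(theta, task_grad, theta_old, importance, lam, lr):
    """One gradient step on the new-task loss plus the quadratic anchor.

    task_grad is the gradient of the new-task loss at theta; the second
    term is the gradient of ewc_penalty.
    """
    return [
        t - lr * (g + lam * f * (t - t0))
        for t, g, t0, f in zip(theta, task_grad, theta_old, importance)
    ]
```

A weight with high importance is pulled strongly back toward its old value, while a weight with zero importance is free to follow the new-task gradient.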
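For energy-efficient computation in specialized neuromorphic hardware, we present spiking neural coding, an instantiation of a family of artificial neural models grounded in the theory of predictive coding. This model, the first of its kind, works by operating in a never-ending process of "guess-and-check", in which neurons predict the activity values of one another and then adjust their own activities to make better future predictions. The interactive, iterative nature of our system fits well with the continuous-time formulation of sensory stream prediction and, as we show, the model's structure yields a local synaptic update rule that can be used to complement, or as an alternative to, online spike-timing-dependent plasticity. In this article, we experiment with an instantiation of our model consisting of leaky integrate-and-fire units. However, the framework within which our system is situated can naturally incorporate more complex neurons, such as the Hodgkin-Huxley model. Our experimental results in pattern recognition demonstrate the potential of the model when binary spike trains are the primary paradigm for inter-neuron communication. Notably, spiking neural coding is competitive in terms of classification performance and experiences less forgetting when learning from a sequence of tasks, offering a more computationally economical, biologically plausible alternative to popular artificial neural networks.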
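For readers unfamiliar with leaky integrate-and-fire units, the following sketch simulates a single unit with a forward Euler step and emits a binary spike train. The parameter values (`tau`, thresholds, time step) are arbitrary choices for illustration and are not taken from the paper.

```python
def simulate_lif(currents, dt=1.0, tau=10.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire unit; returns a list of 0/1 spikes."""
    v = v_rest
    spikes = []
    for i_in in currents:
        # Euler step of: tau * dv/dt = -(v - v_rest) + i_in
        v += (dt / tau) * (-(v - v_rest) + i_in)
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset  # reset the membrane potential after a spike
        else:
            spikes.append(0)
    return spikes


# A constant supra-threshold input produces a regular spike train,
# while a weak input never reaches the threshold.
strong = simulate_lif([2.0] * 50)
weak = simulate_lif([0.5] * 50)
```

The unit leaks toward its resting potential between inputs, so only sufficiently strong or sustained drive produces spikes, which is what makes binary spike trains a sparse communication signal.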