Real-world applications often require learning continuously from a stream of data under ever-changing conditions. When trying to learn from such non-stationary data, deep neural networks (DNNs) undergo catastrophic forgetting of previously learned information. Among the common approaches to avoid catastrophic forgetting, rehearsal-based methods have proven effective. However, they are still prone to forgetting due to task interference, as all parameters respond to all tasks. To counter this, we take inspiration from sparse coding in the brain and introduce dynamic modularity and sparsity (Dynamos) for rehearsal-based general continual learning. In this setup, the DNN learns to respond to stimuli by activating relevant subsets of neurons. We demonstrate the effectiveness of Dynamos on multiple datasets under challenging continual learning evaluation protocols. Finally, we show that our method learns representations that are modular and specialized, while maintaining reusability by activating subsets of neurons with overlaps corresponding to the similarity of stimuli.
A key challenge for AI is to build embodied systems that operate in dynamically changing environments. Such systems must adapt to changing task contexts and learn continuously. Although standard deep learning systems achieve state-of-the-art results on static benchmarks, they often struggle in dynamic scenarios. In these settings, error signals from multiple contexts can interfere with one another, ultimately leading to a phenomenon known as catastrophic forgetting. In this article we investigate biologically inspired architectures as solutions to these problems. Specifically, we show that the biophysical properties of dendrites and local inhibitory systems enable networks to dynamically restrict and route information in a context-specific manner. Our key contributions are as follows. First, we propose a novel artificial neural network architecture that incorporates active dendrites and sparse representations into the standard deep learning framework. Next, we study the performance of this architecture on two separate benchmarks requiring task-based adaptation: Meta-World, a multi-task reinforcement learning environment where a robotic agent must learn to solve a variety of manipulation tasks simultaneously; and a continual learning benchmark in which the model's prediction task changes throughout training. Analysis on both benchmarks demonstrates the emergence of overlapping but distinct and sparse subnetworks, allowing the system to fluidly learn multiple tasks with minimal forgetting. Our neural implementation marks the first time that a single architecture has achieved competitive results in both multi-task and continual learning settings. Our research sheds light on how biological properties of neurons can inform deep learning systems to solve dynamic scenarios that are typically impossible for traditional ANNs to solve.
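One concrete way to picture the dendritic gating described above is sketched below in PyTorch (an illustrative reconstruction, not the authors' implementation; the class and parameter names such as ActiveDendriteLayer, num_segments, and k_winners are assumptions): each unit owns a few dendritic segments, a context vector selects the best-matching segment to modulate the unit's feedforward response, and a k-winner-take-all step keeps the layer's activity sparse.

```python
import torch
import torch.nn as nn

class ActiveDendriteLayer(nn.Module):
    """Sketch of a linear layer whose units are gated by dendritic segments."""

    def __init__(self, in_features, out_features, context_dim,
                 num_segments=4, k_winners=10):
        super().__init__()
        self.ff = nn.Linear(in_features, out_features)
        # One weight vector per (unit, segment) pair for matching the context.
        self.segments = nn.Parameter(
            torch.randn(out_features, num_segments, context_dim) * 0.02)
        self.k_winners = k_winners

    def forward(self, x, context):
        y = self.ff(x)                                        # (B, out)
        # Dendritic response: dot product of the context with every segment.
        d = torch.einsum("bc,osc->bos", context, self.segments)
        gate = torch.sigmoid(d.max(dim=-1).values)            # strongest segment wins
        y = y * gate                                          # context-specific gating
        # k-winner-take-all: keep only the k most active units per sample.
        topk = torch.topk(y, self.k_winners, dim=-1)
        mask = torch.zeros_like(y).scatter_(-1, topk.indices, 1.0)
        return y * mask

# Hypothetical usage: a batch of 8 inputs with a 16-dim task/context signal.
layer = ActiveDendriteLayer(in_features=32, out_features=64,
                            context_dim=16, num_segments=4, k_winners=10)
out = layer(torch.randn(8, 32), torch.randn(8, 16))
print(out.shape, (out != 0).sum(dim=-1))  # at most 10 winners per sample
```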
The goal of continual learning (CL) is to learn different tasks over time. The main desiderata associated with CL are maintaining performance on older tasks, leveraging the latter to improve learning of future tasks, and introducing minimal overhead in the training process (for instance, not requiring a growing model or retraining). We propose the Neuro-Inspired Stability-Plasticity Adaptation (NISPA) architecture, which addresses these desiderata through a sparse neural network with fixed density. NISPA forms stable pathways to preserve knowledge from older tasks. Furthermore, NISPA uses connection rewiring to create new plastic pathways that reuse existing knowledge on novel tasks. Our extensive evaluation on the EMNIST, FashionMNIST, CIFAR10, and CIFAR100 datasets shows that NISPA significantly outperforms representative state-of-the-art continual learning baselines, while using up to ten times fewer learnable parameters than those baselines. We also argue that sparsity is an essential ingredient for continual learning. The NISPA code is available at https://github.com/burakgurbuz97/nispa.
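A minimal sketch of the freeze-and-rewire intuition, assuming a simple gradient-masking scheme (the MaskedLinear class, the magnitude-based selection rule, and the freeze_fraction parameter are illustrative assumptions, not NISPA's actual unit-selection algorithm): connections marked as stable never receive gradient updates, so the pathways they form are preserved while the remaining plastic connections keep adapting.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Linear):
    """Linear layer with a per-weight 'stable' mask whose gradients are zeroed."""

    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features)
        self.register_buffer("frozen",
                             torch.zeros_like(self.weight, dtype=torch.bool))

    def freeze_fraction(self, fraction=0.5):
        # Stand-in selection rule: mark the largest-magnitude weights as stable.
        k = int(fraction * self.weight.numel())
        idx = self.weight.abs().flatten().topk(k).indices
        self.frozen.view(-1)[idx] = True

    def zero_frozen_grads(self):
        # Call after backward() and before optimizer.step().
        if self.weight.grad is not None:
            self.weight.grad[self.frozen] = 0.0

layer = MaskedLinear(20, 10)
x, target = torch.randn(4, 20), torch.randn(4, 10)
loss = ((layer(x) - target) ** 2).mean()
loss.backward()
layer.freeze_fraction(0.5)
layer.zero_frozen_grads()   # frozen (stable) connections stay untouched
```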
Modularity is a compelling solution to continual learning (CL), the problem of modeling sequences of related tasks. Learning and then composing modules to solve different tasks provides an abstraction for addressing the principal challenges of CL, including catastrophic forgetting, backward and forward transfer across tasks, and sub-linear model growth. We introduce Local Module Composition (LMC), an approach to modular CL in which each module is provided with a local structural component that estimates the module's relevance to the input. Dynamic module composition is performed based on these local relevance scores. We demonstrate that agnosticism to task identities (IDs) arises from (local) structural learning that is module-specific, as opposed to the task- and/or model-specific structural learning of previous works, which makes LMC applicable to more CL settings than earlier approaches. In addition, LMC tracks statistics of the input distribution and adds new modules when outlier samples are detected. In a first set of experiments, LMC compares favorably to existing methods on recent continual transfer-learning benchmarks without requiring task identities. In another study, we show that the locality of structural learning allows LMC to interpolate to related but unseen (out-of-distribution, OOD) tasks, as well as to compose modular networks trained independently on different task sequences into a third modular network without any fine-tuning. Finally, in searching for the limitations of LMC, we study it on more challenging sequences of 30 and 100 tasks, demonstrating that local module selection becomes considerably more challenging in the presence of a large number of candidate modules. In this setting, the best-performing LMC spawns fewer modules than an oracle-based baseline, but it reaches lower overall accuracy. The codebase is available at https://github.com/oleksost/lmc.
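The local relevance scoring could look roughly like the sketch below (illustrative only; LocalModule, ModularLayer, and the softmax-weighted mixture are assumptions, not the official LMC code): each module carries its own small structural component that scores its relevance to the input, and the layer output mixes module outputs according to those local scores.

```python
import torch
import torch.nn as nn

class LocalModule(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.relevance = nn.Linear(dim, 1)   # local structural component

    def forward(self, x):
        return self.f(x), self.relevance(x)

class ModularLayer(nn.Module):
    def __init__(self, dim, num_modules=3):
        super().__init__()
        self.modules_list = nn.ModuleList(LocalModule(dim) for _ in range(num_modules))

    def forward(self, x):
        outs, scores = zip(*(m(x) for m in self.modules_list))
        weights = torch.softmax(torch.cat(scores, dim=-1), dim=-1)   # (B, M)
        stacked = torch.stack(outs, dim=-1)                          # (B, D, M)
        return torch.einsum("bdm,bm->bd", stacked, weights)

layer = ModularLayer(dim=16, num_modules=3)
print(layer(torch.randn(4, 16)).shape)   # torch.Size([4, 16])
```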
Although deep neural networks (DNNs) have achieved impressive classification performance in closed-world learning scenarios, they typically fail to generalize to unseen categories in dynamic open-world environments, in which the number of concepts is unbounded. In contrast, human and animal learners have the ability to incrementally update their knowledge by recognizing and adapting to novel observations. In particular, humans characterize concepts via exclusive (unique) sets of essential features, which are used both for recognizing known classes and for identifying novelty. Inspired by natural learners, we introduce a Sparse High-level-Exclusive, Low-level-Shared feature representation (SHELS) that simultaneously encourages learning exclusive sets of high-level features and essential, shared low-level features. The exclusivity of the high-level features enables the DNN to automatically detect out-of-distribution (OOD) data, while the efficient use of capacity via sparse low-level features permits accommodating new knowledge. The resulting approach uses OOD detection to perform class-incremental continual learning without known class boundaries. We show that using SHELS for novelty detection yields statistically significant improvements over state-of-the-art OOD detection approaches across a variety of benchmark datasets. Further, we demonstrate that the SHELS model mitigates catastrophic forgetting in a class-incremental learning setting, enabling a combined novelty detection and accommodation framework that supports learning in the open world.
Continual/lifelong learning from a non-stationary stream of input data is a cornerstone of intelligence. Despite their impressive performance in a wide variety of applications, deep neural networks are prone to forgetting previously learned information when learning new information. This phenomenon is called "catastrophic forgetting" and is deeply rooted in the stability-plasticity dilemma. Overcoming catastrophic forgetting in deep neural networks has become an active field of research in recent years. In particular, gradient-projection-based methods have recently shown exceptional performance at overcoming catastrophic forgetting. This paper proposes two biologically inspired mechanisms based on sparsity and heterogeneous dropout that significantly improve a continual learner's performance over long sequences of tasks. Our proposed approach builds on the Gradient Projection Memory (GPM) framework. We leverage k-winner activations in each layer of a neural network to enforce layer-wise sparse activations for each task, together with a between-task heterogeneous dropout that encourages the network to use non-overlapping activation patterns for different tasks. In addition, we introduce two new benchmarks for continual learning under distributional shift, namely Continual Swiss Roll and ImageNet SuperDog-40. Lastly, we provide an in-depth analysis of our proposed method and demonstrate significant performance gains on various benchmark continual learning problems.
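The two mechanisms can be pictured with the short sketch below (a simplification; the exact k-winner schedule and the dropout-probability formula used in the paper differ, and the function names and temperature parameter here are assumptions): a k-winner-take-all step enforces layer-wise sparse activations, and units that fired often for earlier tasks are assigned higher drop probabilities for the next task, pushing new tasks toward unused units.

```python
import torch

def k_winner_activation(x, k):
    """Keep only the k largest activations per sample; zero the rest."""
    topk = torch.topk(x, k, dim=-1)
    mask = torch.zeros_like(x).scatter_(-1, topk.indices, 1.0)
    return x * mask

def heterogeneous_dropout_probs(activation_counts, temperature=1.0):
    """One simple way to map activation counts to drop probabilities:
    frequently used units are dropped more often on the next task."""
    counts = activation_counts.float()
    return 1.0 - torch.exp(-temperature * counts / (counts.max() + 1e-8))

h = torch.randn(4, 32)
sparse_h = k_winner_activation(torch.relu(h), k=8)
counts = (sparse_h != 0).sum(dim=0)           # how often each unit was a winner
p_drop = heterogeneous_dropout_probs(counts)
keep = torch.bernoulli(1.0 - p_drop)          # per-unit keep mask for the next task
print(sparse_h.shape, p_drop.shape, keep.shape)
```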
One notable weakness of machine learning is the limited ability of models to solve new problems without forgetting previously acquired knowledge. To better understand this issue, continual learning has emerged to systematically investigate learning protocols in which a model sequentially observes samples generated by a series of tasks. First, we propose an optimality principle that facilitates the trade-off between learning and forgetting. We derive this principle from an information-theoretic formulation of bounded rationality and show its connections to other continual learning methods. Second, based on this principle, we propose a neural network layer for continual learning, called Mixture-of-Variational-Experts (MoVE), that alleviates forgetting while enabling the beneficial transfer of knowledge to new tasks. Our experiments on variants of the MNIST and CIFAR10 datasets demonstrate the competitive performance of MoVE layers compared to state-of-the-art approaches.
Continual Learning (CL) is a field dedicated to devising algorithms able to achieve lifelong learning. Overcoming the knowledge disruption of previously acquired concepts, a drawback affecting deep learning models that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the modeled data does not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drops very quickly. Overcoming this limitation is fundamental, as it would allow us to build truly intelligent systems showing stability and plasticity. Secondly, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with newly updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use a memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor than the quality of the data. Secondly, we present one of the early works on incremental learning for ViT architectures, comparing functional, weight, and attention regularization approaches, and propose a novel, effective asymmetric loss. At the end, we present a study on pretraining and how it affects performance in Continual Learning, raising some questions about the effective progression of the field. We then conclude with some future directions and closing remarks.
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
The ability for an agent to continuously learn new skills without catastrophically forgetting existing knowledge is of critical importance for the development of generally intelligent agents. Most methods devised to address this problem depend heavily on well-defined task boundaries, and thus depend on human supervision. Our task-agnostic method, Self-Activating Neural Ensembles (SANE), uses a modular architecture designed to avoid catastrophic forgetting without making any such assumptions. At the beginning of each trajectory, a module in the SANE ensemble is activated to determine the agent's next policy. During training, new modules are created as needed and only activated modules are updated to ensure that unused modules remain unchanged. This system enables our method to retain and leverage old skills, while growing and learning new ones. We demonstrate our approach on visually rich procedurally generated environments.
Continual learning in computational systems is challenging due to catastrophic forgetting. We discovered a two-layer neural circuit in the fruit fly olfactory system that addresses this challenge by uniquely combining sparse coding and associative learning. In the first layer, odors are encoded using sparse, high-dimensional representations, which reduces memory interference by activating non-overlapping populations of neurons for different odors. In the second layer, only the synapses between odor-activated neurons and the output neuron associated with the odor are modified during learning; the remaining weights are frozen to prevent unrelated memories from being overwritten. We show empirically and analytically that this simple and lightweight algorithm significantly boosts continual learning performance. The fly's associative learning algorithm is strikingly similar to the classic perceptron learning algorithm, although we show that two modifications are critical for reducing catastrophic forgetting. Overall, fruit flies have evolved an efficient lifelong learning algorithm, and circuit mechanisms from neuroscience can be translated to improve machine computation.
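The two-layer circuit can be caricatured in a few lines of NumPy (an illustrative sketch under simplifying assumptions; the dimensions, the top_k value, and the learning rate are made up): a fixed sparse random projection plus a winner-take-all step yields nearly non-overlapping codes, and only the synapses from the currently active units are updated, while all others stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_expand(x, projection, top_k=32):
    """Layer 1: fixed sparse random projection into a high-dimensional space,
    followed by winner-take-all, producing a binary sparse code."""
    h = projection @ x
    mask = np.zeros_like(h)
    mask[np.argsort(h)[-top_k:]] = 1.0
    return mask

def associative_update(W, code, label, lr=0.1):
    """Layer 2: only synapses from currently active units are strengthened."""
    active = code.astype(bool)
    W[label, active] += lr
    return W

d_in, d_hidden, n_classes = 50, 2000, 10
projection = (rng.random((d_hidden, d_in)) < 0.1).astype(float)  # sparse connectivity
W = np.zeros((n_classes, d_hidden))

x = rng.random(d_in)
code = sparse_expand(x, projection)
W = associative_update(W, code, label=3)
print(code.sum(), W[3].sum())   # 32 active units; only their synapses changed
```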
We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs change the underlying training objective to cause less interference with previously learned information. Our proposed version of EBMs for continual learning is simple, efficient, and outperforms baseline methods by a large margin on several benchmarks. Moreover, our proposed contrastive divergence-based training objective can be combined with other continual learning methods, resulting in substantial boosts in their performance. We further show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a useful building block for future continual learning methods.
Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
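A heavily simplified, Reptile-flavoured sketch of the replay-plus-meta-learning idea is shown below (this is not the full MER algorithm; its within-batch interpolations are omitted, and the function name and learning-rate values are assumptions): interleave the current example with replayed samples, take per-example SGD steps, then move the outer weights only part of the way toward the adapted weights, which implicitly favours gradient alignment across examples.

```python
import copy
import random
import torch
import torch.nn as nn

def mer_like_step(model, loss_fn, current, buffer, inner_lr=0.03, meta_lr=0.3, k=5):
    before = copy.deepcopy(model.state_dict())
    batch = random.sample(buffer, min(k, len(buffer))) + [current]
    for x, y in batch:                       # inner loop: per-example SGD
        loss = loss_fn(model(x), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= inner_lr * p.grad
    with torch.no_grad():                    # outer (Reptile-style) interpolation
        after = model.state_dict()
        for name, p_before in before.items():
            after[name].copy_(p_before + meta_lr * (after[name] - p_before))

model = nn.Linear(10, 2)
buffer = [(torch.randn(1, 10), torch.tensor([0])) for _ in range(20)]
current = (torch.randn(1, 10), torch.tensor([1]))
mer_like_step(model, nn.CrossEntropyLoss(), current, buffer)
```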
The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective by solving a set of classification tasks based on the MNIST handwritten digit dataset and by learning several Atari 2600 games sequentially.
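The "selective slowing down" is usually written as a quadratic penalty weighted by the (diagonal) Fisher information; the commonly cited form of the elastic weight consolidation objective when moving from an old task A to a new task B is

$$\mathcal{L}(\theta) \;=\; \mathcal{L}_B(\theta) \;+\; \sum_i \frac{\lambda}{2}\, F_i \,\bigl(\theta_i - \theta^{*}_{A,i}\bigr)^2,$$

where $\mathcal{L}_B$ is the loss on task B, $\theta^{*}_{A}$ are the parameters learned for task A, $F_i$ is the $i$-th diagonal entry of the Fisher information matrix estimated on task A, and $\lambda$ sets how important the old task is relative to the new one.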
When learning continually on datasets with different distributions, neural networks tend to forget previously learned knowledge, a phenomenon known as catastrophic forgetting. Distribution shifts between datasets cause more forgetting. Recently, parameter-isolation-based approaches have shown great potential in overcoming forgetting. However, because they fix the neural path for each dataset during training, they generalize poorly and require dataset labels during inference. In addition, they do not support backward knowledge transfer, as they prioritize past data. In this paper, we propose a new adaptive learning method, named AdaptCL, that fully reuses and grows on learned parameters to overcome catastrophic forgetting and allows positive backward transfer without requiring dataset labels. The proposed technique grows on the same neural path by allowing optimal reuse of frozen parameters. Moreover, it uses parameter-level, data-driven pruning to assign equal priority to the data. We conduct extensive experiments on MNIST variants, domain, and food freshness detection datasets without requiring dataset labels. The results show that our proposed method outperforms alternative baselines in minimizing forgetting and achieving positive backward knowledge transfer.
When learning tasks over time, artificial neural networks suffer from a problem known as catastrophic forgetting (CF). This happens when the weights of a network are overwritten during the training of a new task, causing forgetting of old information. To address this issue, we propose MARK (Meta Reusable Knowledge), a new method that fosters weight reusability instead of overwriting when learning a new task. Specifically, MARK keeps a set of weights shared among tasks. We envision these shared weights as a common Knowledge Base (KB) that is not only used to learn new tasks, but is also enriched with new knowledge as the model learns new tasks. The key components behind MARK are twofold. On the one hand, a metalearning approach provides the key mechanism to incrementally enrich the KB with new knowledge and to foster weight reusability among tasks. On the other hand, a set of trainable masks provides the key mechanism to selectively choose from the KB the weights relevant to each task. Using MARK, we achieve state-of-the-art results on several popular benchmarks, surpassing the best-performing method in terms of average accuracy on the 20-Split MiniImageNet dataset while achieving almost zero forgetting and using 55% of the number of parameters. Furthermore, an ablation study provides evidence that MARK is indeed learning reusable knowledge that is selectively used by each task.
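The mask-over-shared-weights mechanic can be sketched as follows (illustrative only; in MARK the masks are produced by a trainable mask function conditioned on the input and the KB is enriched via metalearning, whereas this sketch uses simple per-task mask parameters, and the class name MaskedKBLayer is an assumption): a single shared weight matrix acts as the knowledge base, and each task learns a soft mask that selects which KB weights it uses.

```python
import torch
import torch.nn as nn

class MaskedKBLayer(nn.Module):
    def __init__(self, in_features, out_features, num_tasks):
        super().__init__()
        # Shared knowledge base weights, used by every task.
        self.kb_weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # One mask logit tensor per task; a sigmoid turns it into a soft selection.
        self.mask_logits = nn.Parameter(torch.zeros(num_tasks, out_features, in_features))

    def forward(self, x, task_id):
        mask = torch.sigmoid(self.mask_logits[task_id])
        return nn.functional.linear(x, self.kb_weight * mask)

layer = MaskedKBLayer(in_features=32, out_features=16, num_tasks=5)
print(layer(torch.randn(4, 32), task_id=2).shape)   # torch.Size([4, 16])
```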
Humans and animals have the ability to continually acquire, fine-tune, and transfer knowledge and skills throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for computational systems and autonomous agents interacting in the real world and processing continuous streams of information. However, lifelong learning remains a long-standing challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic forgetting. Although significant advances have been made in domain-specific learning with neural networks, extensive research efforts are required for the development of robust lifelong learning on autonomous agents and robots. We discuss well-established and emerging research motivated by lifelong learning factors in biological systems such as structural plasticity, memory replay, curriculum and transfer learning, intrinsic motivation, and multisensory integration.
We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: A knowledge base, capable of solving previously encountered problems, which is connected to an active column that is employed to efficiently learn the current task. After learning a new task, the active column is distilled into the knowledge base, taking care to protect any previously acquired skills. This cycle of active learning (progression) followed by consolidation (compression) requires no architecture growth, no access to or storing of previous data or tasks, and no task-specific parameters. We demonstrate the progress & compress approach on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation.
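A minimal sketch of the "compress" step under common assumptions (KL distillation at a temperature plus an EWC-style quadratic penalty; the function name, the placeholder Fisher estimates, and the hyperparameters are illustrative, not the paper's exact recipe): the knowledge base is trained to match the active column's outputs while a penalty protects what the knowledge base already knows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def compress_loss(kb_logits, active_logits, kb_params, old_params, fisher,
                  lam=1.0, T=2.0):
    # Distill the active column's predictions into the knowledge base.
    distill = F.kl_div(
        F.log_softmax(kb_logits / T, dim=-1),
        F.softmax(active_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # EWC-style penalty keeping the KB close to its previous solution.
    penalty = sum(
        (f * (p - p_old) ** 2).sum()
        for f, p, p_old in zip(fisher, kb_params, old_params)
    )
    return distill + 0.5 * lam * penalty

kb = nn.Linear(10, 5)        # stand-in for the knowledge base
active = nn.Linear(10, 5)    # stand-in for the active column
x = torch.randn(8, 10)
old = [p.detach().clone() for p in kb.parameters()]
fisher = [torch.ones_like(p) for p in kb.parameters()]  # placeholder Fisher estimates
loss = compress_loss(kb(x), active(x).detach(), list(kb.parameters()), old, fisher)
loss.backward()
```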
Biological agents are known to learn many different tasks over the course of their lives and to be able to revisit previous tasks and behaviors without a loss of performance. In contrast, artificial agents are prone to "catastrophic forgetting", whereby performance on previous tasks deteriorates as new ones are acquired. This shortcoming has recently been addressed by methods that encourage parameters to stay close to those used for previous tasks. This can be done by (i) using specific parameter regularizers that map out suitable destinations in parameter space, or (ii) guiding the optimization journey by projecting gradients onto subspaces that do not interfere with previous tasks. However, these methods often exhibit suboptimal performance in both feedforward and recurrent neural networks, and recurrent networks are of particular interest to the study of the neural dynamics supporting biological continual learning. In this work, we propose Natural Continual Learning (NCL), a new method that unifies weight regularization and projected gradient descent. NCL uses Bayesian weight regularization to encourage good performance on all tasks at convergence and combines this with gradient projection using the prior precision, which prevents catastrophic forgetting during optimization. When applied to continual learning problems in feedforward and recurrent networks, our method outperforms both standard weight regularization techniques and projection-based approaches. Finally, the trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
We introduce a new training paradigm that enforces interval constraints on the neural network parameter space to control forgetting. Contemporary continual learning (CL) methods focus on training neural networks efficiently from a stream of data while reducing the negative impact of catastrophic forgetting, yet they do not provide any firm guarantees that network performance will not deteriorate uncontrollably over time. In this work, we show how to put bounds on forgetting by reformulating the continual learning of a model as a continual contraction of its parameter space. To that end, we propose Hyperrectangle Training, a new training methodology in which each task is represented by a hyperrectangle in parameter space, fully contained within the hyperrectangles of previous tasks. This formulation reduces the NP-hard CL problem to polynomial time while providing full resilience against forgetting. We validate our claims by developing InterContiNet (Interval Continual Learning), an algorithm that leverages interval arithmetic to effectively model parameter regions as hyperrectangles. Through experimental results, we show that our approach performs well in a continual learning setting without storing data from previous tasks.
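The hyperrectangle bookkeeping can be pictured with the short sketch below (an illustrative simplification, not the InterContiNet implementation; the class name, the radius values, and the clamp-after-each-step usage are assumptions): each task keeps per-parameter lower/upper bounds, new regions are intersected with old ones so containment is preserved, and parameters are projected back into the current region after every update.

```python
import torch

class HyperrectangleRegion:
    def __init__(self, center, radius):
        self.lower = center - radius
        self.upper = center + radius

    def shrink_to(self, center, radius):
        """Propose a new region and intersect it with the current one,
        guaranteeing containment in all previous regions."""
        self.lower = torch.maximum(self.lower, center - radius)
        self.upper = torch.minimum(self.upper, center + radius)

    def project(self, param):
        # Clamp parameters back into the current hyperrectangle.
        with torch.no_grad():
            param.clamp_(self.lower, self.upper)

w = torch.nn.Parameter(torch.zeros(5))
region = HyperrectangleRegion(center=torch.zeros(5), radius=torch.full((5,), 1.0))
# ... train on task 1, then contract the region around the found solution ...
region.shrink_to(center=w.detach(), radius=torch.full((5,), 0.3))
# During task 2 training, call after each optimizer step:
region.project(w)
print(region.lower, region.upper)
```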