持续学习的目标(CL)是随着时间的推移学习不同的任务。与CL相关的主要Desiderata是在旧任务上保持绩效,利用后者来改善未来任务的学习,并在培训过程中引入最小的开销(例如,不需要增长的模型或再培训)。我们建议通过固定密度的稀疏神经网络来解决这些避难所的神经启发性塑性适应(NISPA)体系结构。 NISPA形成了稳定的途径,可以从较旧的任务中保存知识。此外,NISPA使用连接重新设计来创建新的塑料路径,以重用有关新任务的现有知识。我们对EMNIST,FashionMnist,CIFAR10和CIFAR100数据集的广泛评估表明,NISPA的表现明显胜过代表性的最先进的持续学习基线,并且与盆地相比,它的可学习参数最多少了十倍。我们还认为稀疏是持续学习的重要组成部分。 NISPA代码可在https://github.com/burakgurbuz97/nispa上获得。
translated by 谷歌翻译
Continual Learning (CL) is a field dedicated to devise algorithms able to achieve lifelong learning. Overcoming the knowledge disruption of previously acquired concepts, a drawback affecting deep learning models and that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the data modeled does not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drop very quickly. Overcoming this limitation is fundamental as it would allow us to build truly intelligent systems showing stability and plasticity. Secondly, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with the new updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor over the quality of the data. Secondly, we propose one of the early works of incremental learning on ViTs architectures, comparing functional, weight and attention regularization approaches and propose effective novel a novel asymmetric loss. At the end we conclude with a study on pretraining and how it affects the performance in Continual Learning, raising some questions about the effective progression of the field. We then conclude with some future directions and closing remarks.
translated by 谷歌翻译
在持续学习中使用神经网络中的任务特定组件(CL)是一种令人信服的策略,可以解决固定容量模型中稳定性 - 塑性困境,而无需访问过去的数据。当前方法仅着重于选择一个新任务的子网络,以减少忘记过去任务。但是,这种选择可能会限制有助于将来学习的相关过去知识的前瞻性转移。我们的研究表明,当统一的分类器用于所有类别的任务课程学习(class-il)时,共同满足这两个目标是更具挑战性的,因为这很容易跨越任务之间的类之间的歧义。此外,当跨任务的课程相似性增加时,挑战就会增加。为了应对这一挑战,我们提出了一种名为AFAF的新CL方法,旨在避免忘记并允许使用Fix-apainality模型在IL类中向前转移。 AFAF分配了一个子网络,该子网络可以选择性地转移相关知识到新任务,同时保留过去的知识,重复一些先前分配的组件以利用固定容量,并在存在相似之处时解决类型。该实验表明,AFAF在为模型提供多种CL所需属性方面的有效性,同时在具有不同语义相似性的各种具有挑战性的基准上优于最先进的方法。
translated by 谷歌翻译
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
translated by 谷歌翻译
增量任务学习(ITL)是一个持续学习的类别,试图培训单个网络以进行多个任务(一个接一个),其中每个任务的培训数据仅在培训该任务期间可用。当神经网络接受较新的任务培训时,往往会忘记旧任务。该特性通常被称为灾难性遗忘。为了解决此问题,ITL方法使用情节内存,参数正则化,掩盖和修剪或可扩展的网络结构。在本文中,我们提出了一个基于低级别分解的新的增量任务学习框架。特别是,我们表示每一层的网络权重作为几个等级1矩阵的线性组合。为了更新新任务的网络,我们学习一个排名1(或低级别)矩阵,并将其添加到每一层的权重。我们还引入了一个其他选择器向量,该向量将不同的权重分配给对先前任务的低级矩阵。我们表明,就准确性和遗忘而言,我们的方法的表现比当前的最新方法更好。与基于情节的内存和基于面具的方法相比,我们的方法还提供了更好的内存效率。我们的代码将在https://github.com/csiplab/task-increment-rank-update.git上找到。
translated by 谷歌翻译
一组复杂的机制促进了大脑中的持续学习(CL)。这包括用于整合信息的多个内存系统的相互作用,如互补学习系统(CLS)理论和突触巩固,以保护获得的知识免受擦除。因此,我们提出了一种通用CL方法,该方法在突触巩固和双重记忆体验重播(Synergy)之间产生协同作用。我们的方法保持语义记忆,该记忆积累并巩固了整个任务中的信息,并与情节内存进行交互以有效重播。它通过跟踪训练轨迹期间参数的重要性并将其固定在语义内存中的巩固参数中,进一步采用了突触巩固。据我们所知,我们的研究是第一个与突触合并一起使用双重记忆体验重播的,该合并适用于一般CL,网络在培训或推理过程中不利用任务边界或任务标签。我们对各种具有挑战性的CL情景和特征分析的评估表明,将突触巩固和CLS理论纳入启用DNN中的有效CL的功效。
translated by 谷歌翻译
尽管深度神经网络(DNNS)在封闭世界的学习方案中取得了令人印象深刻的分类性能,但它们通常无法概括地在动态的开放世界环境中看不见的类别,在这种环境中,概念数量无界的数量。相反,人类和动物学习者具有通过识别和适应新颖观察结果来逐步更新知识的能力。特别是,人类通过独家(唯一)基本特征集来表征概念,这些特征既用于识别已知类别和识别新颖性。受到自然学习者的启发,我们引入了稀疏的高级独特,低水平共享的特征表示(Shels),同时鼓励学习独家的高级功能和必不可少的,共享的低级功能。高级功能的排他性使DNN能够自动检测到分布(OOD)数据,而通过稀疏的低级功能可以有效利用容量,可以容纳新知识。最终的方法使用OOD检测来执行班级持续学习,而没有已知的类边界。我们表明,使用木材进行新颖性检测导致对各种基准数据集的最新OOD检测方法的统计显着改善。此外,我们证明了木木模型在课堂学习环境中减轻灾难性的遗忘,从而实现了一个组合的新颖性检测和住宿框架,该框架支持在开放世界中学习
translated by 谷歌翻译
AI的一个关键挑战是构建体现的系统,该系统在动态变化的环境中运行。此类系统必须适应更改任务上下文并持续学习。虽然标准的深度学习系统实现了最先进的静态基准的结果,但它们通常在动态方案中挣扎。在这些设置中,来自多个上下文的错误信号可能会彼此干扰,最终导致称为灾难性遗忘的现象。在本文中,我们将生物学启发的架构调查为对这些问题的解决方案。具体而言,我们表明树突和局部抑制系统的生物物理特性使网络能够以特定于上下文的方式动态限制和路由信息。我们的主要贡献如下。首先,我们提出了一种新颖的人工神经网络架构,该架构将活跃的枝形和稀疏表示融入了标准的深度学习框架中。接下来,我们在需要任务的适应性的两个单独的基准上研究这种架构的性能:Meta-World,一个机器人代理必须学习同时解决各种操纵任务的多任务强化学习环境;和一个持续的学习基准,其中模型的预测任务在整个训练中都会发生变化。对两个基准的分析演示了重叠但不同和稀疏的子网的出现,允许系统流动地使用最小的遗忘。我们的神经实现标志在单一架构上第一次在多任务和持续学习设置上取得了竞争力。我们的研究揭示了神经元的生物学特性如何通知深度学习系统,以解决通常不可能对传统ANN来解决的动态情景。
translated by 谷歌翻译
Sparse neural networks attract increasing interest as they exhibit comparable performance to their dense counterparts while being computationally efficient. Pruning the dense neural networks is among the most widely used methods to obtain a sparse neural network. Driven by the high training cost of such methods that can be unaffordable for a low-resource device, training sparse neural networks sparsely from scratch has recently gained attention. However, existing sparse training algorithms suffer from various issues, including poor performance in high sparsity scenarios, computing dense gradient information during training, or pure random topology search. In this paper, inspired by the evolution of the biological brain and the Hebbian learning theory, we present a new sparse training approach that evolves sparse neural networks according to the behavior of neurons in the network. Concretely, by exploiting the cosine similarity metric to measure the importance of the connections, our proposed method, Cosine similarity-based and Random Topology Exploration (CTRE), evolves the topology of sparse neural networks by adding the most important connections to the network without calculating dense gradient in the backward. We carried out different experiments on eight datasets, including tabular, image, and text datasets, and demonstrate that our proposed method outperforms several state-of-the-art sparse training algorithms in extremely sparse neural networks by a large gap. The implementation code is available on https://github.com/zahraatashgahi/CTRE
translated by 谷歌翻译
恶意软件(恶意软件)分类为持续学习(CL)制度提供了独特的挑战,这是由于每天收到的新样本的数量以及恶意软件的发展以利用新漏洞。在典型的一天中,防病毒供应商将获得数十万个独特的软件,包括恶意和良性,并且在恶意软件分类器的一生中,有超过十亿个样品很容易积累。鉴于问题的规模,使用持续学习技术的顺序培训可以在减少培训和存储开销方面提供可观的好处。但是,迄今为止,还没有对CL应用于恶意软件分类任务的探索。在本文中,我们研究了11种应用于三个恶意软件任务的CL技术,涵盖了常见的增量学习方案,包括任务,类和域增量学习(IL)。具体而言,使用两个现实的大规模恶意软件数据集,我们评估了CL方法在二进制恶意软件分类(domain-il)和多类恶意软件家庭分类(Task-IL和类IL)任务上的性能。令我们惊讶的是,在几乎所有情况下,持续的学习方法显着不足以使训练数据的幼稚关节重播 - 在某些情况下,将精度降低了70个百分点以上。与关节重播相比,有选择性重播20%的存储数据的一种简单方法可以实现更好的性能,占训练时间的50%。最后,我们讨论了CL技术表现出乎意料差的潜在原因,希望它激发进一步研究在恶意软件分类域中更有效的技术。
translated by 谷歌翻译
模块化是持续学习(CL)的令人信服的解决方案,是相关任务建模的问题。学习和组合模块来解决不同的任务提供了一种抽象来解决CL的主要挑战,包括灾难性的遗忘,向后和向前传输跨任务以及子线性模型的增长。我们引入本地模块组成(LMC),该方法是模块化CL的方法,其中每个模块都提供了局部结构组件,其估计模块与输入的相关性。基于本地相关评分进行动态模块组合。我们展示了对任务身份(IDS)的不可知性来自(本地)结构学习,该结构学习是特定于模块和/或模型特定于以前的作品,使LMC适用于与以前的作品相比的更多CL设置。此外,LMC还跟踪输入分布的统计信息,并在检测到异常样本时添加新模块。在第一组实验中,LMC与最近的持续转移学习基准上的现有方法相比,不需要任务标识。在另一个研究中,我们表明结构学习的局部性允许LMC插入相关但未遵守的任务(OOD),以及在不同任务序列上独立于不同的任务序列培训的模块化网络,而无需任何微调。最后,在寻找LMC的限制,我们在30和100个任务的更具挑战性序列上研究它,展示了本地模块选择在存在大量候选模块时变得更具挑战性。在此设置中,与Oracle基准的基线相比,最佳执行LMC产生的模块更少,但它达到了较低的总体精度。 CodeBase可在https://github.com/oleksost/lmc下找到。
translated by 谷歌翻译
我们引入了一个新的培训范式,该范围对神经网络参数空间进行间隔约束以控制遗忘。当代持续学习(CL)方法从一系列数据流有效地培训神经网络,同时减少灾难性遗忘的负面影响,但它们不能提供任何确保的确保网络性能不会随着时间的流逝而无法控制地恶化。在这项工作中,我们展示了如何通过将模型的持续学习作为其参数空间的持续收缩来遗忘。为此,我们提出了Hypertrectangle训练,这是一种新的训练方法,其中每个任务都由参数空间中的超矩形表示,完全包含在先前任务的超矩形中。这种配方将NP-HARD CL问题降低到多项式时间,同时提供了完全防止遗忘的弹性。我们通过开发Intercontinet(间隔持续学习)算法来验证我们的主张,该算法利用间隔算术来有效地将参数区域建模为高矩形。通过实验结果,我们表明我们的方法在不连续的学习设置中表现良好,而无需存储以前的任务中的数据。
translated by 谷歌翻译
当在具有不同分布的数据集上不断学习时,神经网络往往会忘记以前学习的知识,这一现象被称为灾难性遗忘。数据集之间的分配更改会导致更多的遗忘。最近,基于参数 - 隔离的方法在克服遗忘时具有巨大的潜力。但是,当他们在培训过程中修复每个数据集的神经路径时,他们的概括不佳,并且在推断过程中需要数据集标签。此外,他们不支持向后的知识转移,因为它们优先于过去的数据。在本文中,我们提出了一种名为ADAPTCL的新的自适应学习方法,该方法完全重复使用并在学习的参数上生长,以克服灾难性的遗忘,并允许在不需要数据集标签的情况下进行积极的向后传输。我们提出的技术通过允许最佳的冷冻参数重复使用在相同的神经路径上生长。此外,它使用参数级数据驱动的修剪来为数据分配同等优先级。我们对MNIST变体,域和食物新鲜度检测数据集进行了广泛的实验,而无需数据集标签。结果表明,我们所提出的方法优于替代基线,可以最大程度地减少遗忘和实现积极的向后知识转移。
translated by 谷歌翻译
Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. A hard attention mask is learned concurrently to every task, through stochastic gradient descent, and previous masks are exploited to condition such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 45 to 80%. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities. The approach features the possibility to control both the stability and compactness of the learned knowledge, which we believe makes it also attractive for online learning or network compression applications.
translated by 谷歌翻译
持续学习研究的主要重点领域是通过设计新算法对分布变化更强大的新算法来减轻神经网络中的“灾难性遗忘”问题。尽管持续学习文献的最新进展令人鼓舞,但我们对神经网络的特性有助于灾难性遗忘的理解仍然有限。为了解决这个问题,我们不关注持续的学习算法,而是在这项工作中专注于模型本身,并研究神经网络体系结构对灾难性遗忘的“宽度”的影响,并表明宽度在遗忘遗产方面具有出人意料的显着影响。为了解释这种效果,我们从各个角度研究网络的学习动力学,例如梯度正交性,稀疏性和懒惰的培训制度。我们提供了与不同架构和持续学习基准之间的经验结果一致的潜在解释。
translated by 谷歌翻译
Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. 1 We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
translated by 谷歌翻译
人类的持续学习(CL)能力与稳定性与可塑性困境密切相关,描述了人类如何实现持续的学习能力和保存的学习信息。自发育以来,CL的概念始终存在于人工智能(AI)中。本文提出了对CL的全面审查。与之前的评论不同,主要关注CL中的灾难性遗忘现象,本文根据稳定性与可塑性机制的宏观视角来调查CL。类似于生物对应物,“智能”AI代理商应该是I)记住以前学到的信息(信息回流); ii)不断推断新信息(信息浏览:); iii)转移有用的信息(信息转移),以实现高级CL。根据分类学,评估度量,算法,应用以及一些打开问题。我们的主要贡献涉及I)从人工综合情报层面重新检查CL; ii)在CL主题提供详细和广泛的概述; iii)提出一些关于CL潜在发展的新颖思路。
translated by 谷歌翻译
We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs change the underlying training objective to cause less interference with previously learned information. Our proposed version of EBMs for continual learning is simple, efficient, and outperforms baseline methods by a large margin on several benchmarks. Moreover, our proposed contrastive divergence-based training objective can be combined with other continual learning methods, resulting in substantial boosts in their performance. We further show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a useful building block for future continual learning methods.
translated by 谷歌翻译
A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. However, ensembles' training and inference costs can increase significantly as the number of models grows. Motivated by this limitation, we study different ensemble models to understand their benefits and drawbacks in continual learning scenarios. Finally, to overcome the high compute cost of ensembles, we leverage recent advances in neural network subspace to propose a computationally cheap algorithm with similar runtime to a single model yet enjoying the performance benefits of ensembles.
translated by 谷歌翻译
尽管对历史数据的访问权限有限,但在不断学习的情况下,重播方法已成功地减轻灾难性遗忘而成功。但是,在许多现实世界中,存储历史数据都很便宜,但是由于处理时间限制,将禁止重播所有历史数据。在这种情况下,我们建议学习学习的时间来学习连续学习系统,在该系统中,我们将学习重播时间表,以在不同的时间步骤中重播哪些任务。为了证明学习时间的重要性,我们首先使用蒙特卡洛树搜索来找到适当的重播时间表,并表明它可以根据持续的学习表现优于固定的调度策略。此外,为了提高调度效率本身,我们建议使用强化学习来学习重播调度策略,这些策略可以推广到新的持续学习场景而不增加计算成本。在我们的实验中,我们展示了学习学习时间的优势,这使当前的持续学习研究更接近现实世界的需求。
translated by 谷歌翻译