We introduce a novel method of internal replay that modulates the frequency of rehearsal based on the depth of the network. While replay strategies mitigate the effects of catastrophic forgetting in neural networks, recent work on generative replay shows that performing the rehearsal only in the deeper layers of the network improves performance in continual learning. However, the generative approach introduces additional computational overhead, which limits its applicability. Motivated by the observation that the earlier layers of neural networks forget less, we propose to update layers of the network with different frequencies using intermediate-level features during replay. This reduces the computational burden by omitting both the deeper layers of the generator and the earlier layers of the main model. We name our method Progressive Latent Replay and show that it outperforms Internal Replay while using fewer resources.
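A minimal PyTorch sketch of the depth-dependent rehearsal idea, assuming replay injects an intermediate-level feature and each block has its own update frequency; the block sizes, frequencies, and function names are illustrative, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(784, 256), nn.ReLU()),  # early block: rehearsed rarely
    nn.Sequential(nn.Linear(256, 128), nn.ReLU()),  # middle block
    nn.Sequential(nn.Linear(128, 10)),              # deep block: rehearsed every step
])
update_every = [4, 2, 1]  # hypothetical per-depth rehearsal frequencies
opt = torch.optim.SGD(blocks.parameters(), lr=0.01)

def replay_step(step, latent, labels, entry_depth=1):
    # Freeze blocks whose turn has not come at this replay step.
    for d, block in enumerate(blocks):
        block.requires_grad_(step % update_every[d] == 0)
    # Rehearse from an intermediate-level feature: earlier layers are skipped entirely.
    h = latent
    for block in blocks[entry_depth:]:
        h = block(h)
    loss = F.cross_entropy(h, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# e.g. replay_step(step=3, latent=torch.randn(8, 256), labels=torch.randint(0, 10, (8,)))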
Kernel continual learning \cite{derakhshani2021kernel} has recently emerged as a strong continual learner due to its non-parametric ability to tackle task interference and catastrophic forgetting. Unfortunately, its success comes at the expense of an explicit memory for storing samples from past tasks, which hampers scalability to continual learning settings with a large number of tasks. In this paper, we introduce generative kernel continual learning, which explores the synergies between generative models and kernels for continual learning. The generative model is able to produce representative samples for kernel learning, which removes the dependence on memory in kernel continual learning. Moreover, as we replay only on the generative model, we avoid task interference while being computationally more efficient compared with previous methods that need to replay on the entire model. We further introduce a supervised contrastive regularization that enables our model to generate more discriminative samples for better kernel-based classification performance. We conduct extensive experiments on three widely used continual learning benchmarks that demonstrate the abilities and benefits of our contributions. Most notably, on the challenging SplitCIFAR100 benchmark, with just a simple linear kernel we obtain the same accuracy as kernel continual learning with one tenth of the memory, or a 10.1% accuracy gain for the same memory budget.
We study how different output layers in deep neural networks learn and forget in continual learning settings. The following three factors can affect catastrophic forgetting in the output layer: (1) weight modifications, (2) interference, and (3) projection drift. In this paper, our goal is to provide more insight into how changing the output layer can address (1) and (2). Several potential solutions to these issues are proposed and evaluated in multiple continual learning scenarios. We show that the best-performing type of output layer depends on the data distribution drift and/or the amount of available data. In particular, in some cases where a standard linear layer would fail, it is enough to change the parametrization to achieve significantly better performance, without introducing a new continual learning algorithm and instead training the model with standard SGD. Our analysis and results shed light on the dynamics of the output layer in continual learning scenarios and suggest a way of selecting the best output layer for a given scenario.
Learning new tasks and skills without losing prior learning (i.e., catastrophic forgetting) is a computational challenge for both artificial and biological neural networks, yet artificial systems struggle to achieve parity with their biological analogues. Mammalian brains employ numerous neural operations to support continual learning during sleep, and these are ripe for artificial adaptation. Here, we investigate how modeling three distinct components of mammalian sleep affects continual learning in artificial neural networks: (1) a veridical memory replay process observed during non-rapid-eye-movement (NREM) sleep; (2) a generative memory replay process linked to REM sleep; and (3) a synaptic downscaling process that has been proposed to tune signal-to-noise ratios and support neural upkeep. When evaluating performance on the continual-learning CIFAR-100 image classification benchmark, we find that including all three sleep components yields the highest accuracy during training and reduces catastrophic forgetting on later tasks. While some catastrophic forgetting persists over the course of network training, higher levels of synaptic downscaling lead to better retention of early tasks and further facilitate the recovery of early-task accuracy during subsequent training. A key takeaway is that there is a trade-off in choosing the level of synaptic downscaling: more aggressive downscaling better protects early tasks, while less downscaling enhances the ability to learn new tasks, and intermediate levels strike a balance that yields the highest overall accuracy during training. Overall, our results provide insight into how sleep components can be adapted to enhance artificial continual learning systems and highlight areas for future neuroscientific sleep research to further advance such systems.
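A minimal sketch of the synaptic downscaling component only, assuming it amounts to a uniform multiplicative shrinkage of weights applied between tasks or after a "sleep" phase; the actual rule used in the paper may differ.

import torch

@torch.no_grad()
def synaptic_downscale(model: torch.nn.Module, factor: float = 0.9):
    # Shrink every weight toward zero after a sleep phase; a smaller factor is
    # more aggressive (better retention of old tasks, slower learning of new ones).
    for name, param in model.named_parameters():
        if "weight" in name:
            param.mul_(factor)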
Incremental Task Learning (ITL) is a category of continual learning that seeks to train a single network on multiple tasks, one after another, where the training data for each task is only available while training that task. Neural networks tend to forget older tasks when trained on newer ones, a property commonly referred to as catastrophic forgetting. To address this issue, ITL methods use episodic memory, parameter regularization, masking and pruning, or extensible network structures. In this paper, we propose a new incremental task learning framework based on low-rank factorization. In particular, we represent the network weights of each layer as a linear combination of several rank-1 matrices. To update the network for a new task, we learn a rank-1 (or low-rank) matrix and add it to the weights of each layer. We also introduce an additional selector vector that assigns different weights to the low-rank matrices learned for previous tasks. We show that our approach performs better than the current state-of-the-art methods in terms of both accuracy and forgetting. It also offers better memory efficiency compared with episodic-memory-based and mask-based approaches. Our code will be available at https://github.com/csiplab/task-increment-rank-update.git.
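The weight construction amounts to W = sum_t s_t * u_t v_t^T over the tasks seen so far. A hypothetical PyTorch sketch of such a layer, freezing old factors and learning only the new rank-1 pair plus the selector, could look like the following (names and initialization are illustrative, not the authors' code).

import torch
import torch.nn as nn

class RankOneIncrementalLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.d_in, self.d_out = d_in, d_out
        self.us = nn.ParameterList()   # one u_t (d_out) per task
        self.vs = nn.ParameterList()   # one v_t (d_in) per task
        self.selector = None           # mixing weights over the rank-1 terms

    def add_task(self):
        # Freeze the factors of earlier tasks; only the new rank-1 pair and the
        # selector over all factors are trainable for the new task.
        for p in list(self.us) + list(self.vs):
            p.requires_grad_(False)
        self.us.append(nn.Parameter(torch.randn(self.d_out) * 0.01))
        self.vs.append(nn.Parameter(torch.randn(self.d_in) * 0.01))
        self.selector = nn.Parameter(torch.ones(len(self.us)))

    def forward(self, x):
        # Weight = linear combination of scaled rank-1 matrices.
        W = torch.zeros(self.d_out, self.d_in, device=x.device)
        for s, u, v in zip(self.selector, self.us, self.vs):
            W = W + s * torch.outer(u, v)
        return x @ W.t()

# e.g. layer = RankOneIncrementalLinear(784, 10); layer.add_task()  # call before each new task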
Catastrophic forgetting (CF) happens whenever a neural network overwrites past knowledge while being trained on new tasks. Common techniques to handle CF include regularization of the weights (using, e.g., their importance on past tasks), and rehearsal strategies, where the network is constantly re-trained on past data. Generative models have also been applied for the latter, in order to have endless sources of data. In this paper, we propose a novel method that combines the strengths of regularization and generative-based rehearsal approaches. Our generative model consists of a normalizing flow (NF), a probabilistic and invertible neural network, trained on the internal embeddings of the network. By keeping a single NF throughout the training process, we show that our memory overhead remains constant. In addition, exploiting the invertibility of the NF, we propose a simple approach to regularize the network's embeddings with respect to past tasks. We show that our method performs favorably with respect to state-of-the-art approaches in the literature, with bounded computational power and memory overheads.
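A minimal sketch of the core ingredients under simplifying assumptions: a single affine-coupling layer stands in for the normalizing flow over classifier embeddings, trained by maximum likelihood, and its inverse is used to draw pseudo-embeddings of past tasks; the paper stacks more expressive flow layers and defines the exact regularizer differently.

import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # Invertible map over an embedding split h = [h1, h2]: z2 = h2 * exp(s(h1)) + t(h1).
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, h):
        h1, h2 = h[:, :self.half], h[:, self.half:]
        s, t = self.net(h1).chunk(2, dim=1)
        z2 = h2 * torch.exp(s) + t
        return torch.cat([h1, z2], dim=1), s.sum(dim=1)   # transformed h and log|det|

    def inverse(self, z):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=1)
        return torch.cat([z1, (z2 - t) * torch.exp(-s)], dim=1)

def nf_negative_log_likelihood(flow, embeddings):
    # Change-of-variables objective with a standard-normal base distribution.
    z, log_det = flow(embeddings)
    log_pz = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.size(1) * math.log(2 * math.pi)
    return -(log_pz + log_det).mean()

# Pseudo-embeddings resembling past tasks can be drawn by inverting base samples,
#   h_past = flow.inverse(torch.randn(batch_size, dim)),
# and used to regularize the classifier without storing raw data.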
Malicious software (malware) classification poses a unique challenge for continual learning (CL) regimes, due both to the volume of new samples received daily and to the evolution of malware to exploit new vulnerabilities. On a typical day, antivirus vendors receive hundreds of thousands of unique pieces of software, both malicious and benign, and over the lifetime of a malware classifier well over a billion samples can easily accumulate. Given the scale of the problem, sequential training with continual learning techniques could offer substantial benefits in reducing training and storage overhead. To date, however, there has been no exploration of CL applied to malware classification tasks. In this paper, we study 11 CL techniques applied to three malware tasks covering common incremental learning scenarios, including task, class, and domain incremental learning (IL). Specifically, using two realistic, large-scale malware datasets, we evaluate the performance of the CL methods on binary malware classification (Domain-IL) and multi-class malware family classification (Task-IL and Class-IL) tasks. To our surprise, continual learning methods significantly underperformed naive joint replay of the training data in nearly all settings, in some cases reducing accuracy by more than 70 percentage points. Compared with joint replay, a simple approach of selectively replaying 20% of the stored data achieves better performance at 50% of the training time. Finally, we discuss potential reasons for the unexpectedly poor performance of the CL techniques, in the hope of spurring further research on more effective techniques for the malware classification domain.
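A minimal sketch of the selective-replay baseline mentioned above, assuming for illustration that the 20% subset is drawn uniformly at random from the stored data; the paper's actual selection criterion may differ.

import random

def selective_replay_set(stored_samples, new_task_samples, fraction=0.2):
    # Mix the new task's data with a fraction of the stored data instead of all of it,
    # trading a small accuracy change for a large reduction in training time.
    k = int(len(stored_samples) * fraction)
    return list(new_task_samples) + random.sample(list(stored_samples), k)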
Neural networks are prone to catastrophic forgetting when trained incrementally on different tasks. Popular incremental learning methods mitigate such forgetting by retaining a subset of previously seen samples and replaying them during the training on subsequent tasks. However, this is not always possible, e.g., due to data protection regulations. In such restricted scenarios, one can employ generative models to replay either artificial images or hidden features to a classifier. In this work, we propose Genifer (GENeratIve FEature-driven image Replay), where a generative model is trained to replay images that must induce the same hidden features as real samples when they are passed through the classifier. Our technique therefore incorporates the benefits of both image and feature replay, i.e.: (1) unlike conventional image replay, our generative model explicitly learns the distribution of features that are relevant for classification; (2) in contrast to feature replay, our entire classifier remains trainable; and (3) we can leverage image-space augmentations, which increase distillation performance while also mitigating overfitting during the training of the generative model. We show that Genifer substantially outperforms the previous state of the art for various settings on the CIFAR-100 and CUB-200 datasets.
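A minimal sketch of a feature-driven replay term in the spirit described above, using simple batch-level feature matching through the classifier's feature extractor; the function names and the exact objective are illustrative, not Genifer's actual loss.

import torch
import torch.nn.functional as F

def feature_matching_loss(feature_extractor, generator, real_images, noise):
    # Penalize the generator when its images induce different classifier features
    # (here, batch-mean features) than real images do.
    fake_images = generator(noise)
    with torch.no_grad():
        real_feats = feature_extractor(real_images).mean(dim=0)
    fake_feats = feature_extractor(fake_images).mean(dim=0)
    return F.mse_loss(fake_feats, real_feats)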
Continual Learning (CL) is a field dedicated to devising algorithms able to achieve lifelong learning. Overcoming the disruption of previously acquired knowledge, a drawback affecting deep learning models that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the modeled data do not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drops very quickly. Overcoming this limitation is fundamental, as it would allow us to build truly intelligent systems showing both stability and plasticity. It would also allow us to overcome the onerous limitation of retraining these architectures from scratch with the newly updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use a memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor than the quality of the data. Secondly, we present one of the early works on incremental learning with ViT architectures, comparing functional, weight, and attention regularization approaches, and propose an effective novel asymmetric loss. We conclude with a study on pretraining and how it affects performance in Continual Learning, raising some questions about the effective progression of the field, followed by future directions and closing remarks.
Anomaly Detection is a relevant problem that arises in numerous real-world applications, especially when dealing with images. However, there has been little research on this task in the Continual Learning setting. In this work, we introduce a novel approach called SCALE (SCALing is Enough) to perform Compressed Replay in a framework for Anomaly Detection in the Continual Learning setting. The proposed technique scales and compresses the original images using a Super Resolution model which, to the best of our knowledge, is studied for the first time in the Continual Learning setting. SCALE can achieve a high level of compression while maintaining a high level of image reconstruction quality. In conjunction with other Anomaly Detection approaches, it can achieve optimal results. To validate the proposed approach, we use a real-world dataset of images with pixel-based anomalies, with the aim of providing a reliable benchmark for Anomaly Detection in the context of Continual Learning, serving as a foundation for further advancements in the field.
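A minimal sketch of compressed replay via down- and up-scaling; bicubic interpolation stands in for the Super Resolution model when none is supplied, and the scale factor is illustrative.

import torch
import torch.nn.functional as F

def compress_for_memory(image, scale=4):
    # Store a low-resolution copy of the image (roughly scale**2 times smaller).
    return F.interpolate(image.unsqueeze(0), scale_factor=1.0 / scale,
                         mode="bicubic", align_corners=False).squeeze(0)

def decompress_for_replay(small_image, scale=4, sr_model=None):
    x = small_image.unsqueeze(0)
    if sr_model is not None:
        out = sr_model(x)  # Super Resolution reconstruction of the stored image
    else:
        out = F.interpolate(x, scale_factor=scale, mode="bicubic", align_corners=False)
    return out.squeeze(0)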
This paper studies class incremental learning (CIL) in continual learning (CL). Many approaches have been proposed to deal with catastrophic forgetting (CF) in CIL. Most methods build a single classifier for all classes of all tasks in a single-head network. To prevent CF, a popular approach is to memorize a small number of samples from previous tasks and replay them when training a new task. However, this approach still suffers from serious CF, because the parameters learned for previous tasks are updated or adjusted with only the limited number of saved samples in memory. This paper proposes an entirely different approach, called MORE, that builds a separate classifier (head) for each task (a multi-head model) using a transformer network. Rather than using the saved samples in memory to update the network for previous tasks/classes as in existing methods, MORE leverages the saved samples to build task-specific classifiers (adding a new classification head) without updating the network used for previous tasks/classes. The model for a new task is trained to learn the classes of the task and also to detect samples that are not from the same data distribution of the task (i.e., out-of-distribution (OOD) samples). This enables the classifier of the task that a test instance belongs to produce a high score for the correct class, while the classifiers of other tasks produce low scores because the test instance does not come from their data distributions. Experimental results show that MORE outperforms state-of-the-art baselines and is also naturally capable of OOD detection in the continual learning setting.
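A simplified, hypothetical sketch of the multi-head inference scheme described above: each task head scores an input, and the prediction is the class with the highest confidence across heads, so heads of other tasks (for which the input is out-of-distribution) are expected to score low. The class and method names are illustrative, not MORE's implementation.

import torch
import torch.nn as nn

class MultiHeadCIL(nn.Module):
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone        # shared (e.g., pre-trained transformer) encoder
        self.heads = nn.ModuleList()    # one classification head per task
        self.feat_dim = feat_dim

    def add_task(self, n_classes):
        self.heads.append(nn.Linear(self.feat_dim, n_classes))

    @torch.no_grad()
    def predict(self, x):
        feats = self.backbone(x)
        scores, labels, offset = [], [], 0
        for head in self.heads:
            probs = head(feats).softmax(dim=1)
            s, c = probs.max(dim=1)          # per-task confidence and local class
            scores.append(s)
            labels.append(c + offset)        # shift to a global class index
            offset += head.out_features
        scores = torch.stack(scores, dim=1)
        labels = torch.stack(labels, dim=1)
        best = scores.argmax(dim=1, keepdim=True)
        return labels.gather(1, best).squeeze(1)   # global class prediction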
The goal of continual learning (CL) is to learn different tasks over time. The main desiderata associated with CL are maintaining performance on older tasks, leveraging them to improve learning of future tasks, and introducing minimal overhead in the training process (e.g., no growing model or retraining). We propose the Neuro-Inspired Stability-Plasticity Adaptation (NISPA) architecture, which addresses these desiderata through a sparse neural network of fixed density. NISPA forms stable paths to preserve knowledge from older tasks. Furthermore, NISPA uses connection rewiring to create new plastic paths that reuse existing knowledge on novel tasks. Our extensive evaluation on the EMNIST, FashionMNIST, CIFAR10, and CIFAR100 datasets shows that NISPA significantly outperforms representative state-of-the-art continual learning baselines while using up to ten times fewer learnable parameters. We also argue that sparsity is an essential ingredient for continual learning. The NISPA code is available at https://github.com/burakgurbuz97/nispa.
Online continual learning is a challenging learning scenario in which a model must learn from a non-stationary stream of data where each sample is seen only once. The main challenge is to learn incrementally while avoiding catastrophic forgetting, namely the problem of forgetting previously acquired knowledge while learning from new data. A popular solution in this setting is to use a small memory to retain old data and rehearse it over time. Unfortunately, due to the limited memory size, the quality of the memory deteriorates over time. In this paper, we propose OLCGM, a novel replay-based continual learning strategy that uses knowledge condensation techniques to continuously compress the memory and make better use of its limited size. The sample condensation step compresses old samples instead of removing them as other replay strategies do. As a result, experiments show that whenever the memory budget is limited relative to the complexity of the data, OLCGM improves the final accuracy compared with state-of-the-art replay strategies.
Artificial neural networks suffer from a problem known as catastrophic forgetting (CF) when learning tasks over time. This occurs when the weights of a network are overwritten during training on a new task, causing old information to be forgotten. To address this issue, we propose Meta Reusable Knowledge, or MARK, a new approach that promotes weight reusability instead of overwriting when learning a new task. Specifically, MARK maintains a set of weights shared among tasks. We envision these shared weights as a common knowledge base (KB) that is used not only to learn new tasks but is also enriched with new knowledge as the model learns them. The key components behind MARK are two-fold. On the one hand, a metalearning approach provides the key mechanism to incrementally enrich the KB and foster weight reusability among tasks. On the other hand, a set of trainable masks provides the key mechanism to selectively choose from the KB the weights relevant to solving each task. Using MARK, we achieve state-of-the-art results on several popular benchmarks, surpassing the best performing method in average accuracy on the 20-split MiniImageNet dataset while achieving almost zero forgetting with 55% of the number of parameters. Furthermore, an ablation study provides evidence that MARK is indeed learning reusable knowledge that is selectively used by each task.
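A heavily simplified, hypothetical sketch of the shared-knowledge-base idea: a single weight bank shared by all tasks plus one trainable soft mask per task that selects which KB weights the task uses; MARK's metalearning update of the KB is not shown.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedKBLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.kb_weight = nn.Parameter(torch.randn(d_out, d_in) * 0.01)  # shared knowledge base
        self.task_masks = nn.ParameterList()                            # one mask per task

    def add_task(self):
        self.task_masks.append(nn.Parameter(torch.zeros_like(self.kb_weight)))

    def forward(self, x, task_id):
        mask = torch.sigmoid(self.task_masks[task_id])   # soft selection from the KB
        return F.linear(x, self.kb_weight * mask)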
We propose an effective regularization strategy (CW-TALAR) for solving continual learning problems. It uses a penalty term measuring the discrepancy between two probability distributions defined on a target layer of an underlying neural network that is shared by all tasks, together with a simple architecture of the Cramer-Wold generator for modeling the output data representation. Our strategy preserves the target-layer distribution while learning a new task, but does not require remembering the datasets of previous tasks. We perform experiments involving several common supervised frameworks, which demonstrate the competitiveness of the CW-TALAR method in comparison with a few existing state-of-the-art continual learning models.
Online Class Incremental learning (CIL) is a challenging setting in Continual Learning (CL), wherein data of new tasks arrive in incoming streams and online learning models need to handle incoming data streams without revisiting previous ones. Existing works use a single centroid adapted with the incoming data stream to characterize a class. This approach can expose limitations when the incoming data stream of a class is naturally multimodal. To address this issue, in this work we first propose an online mixture model learning approach based on nice properties of the mature optimal transport theory (OT-MM). Specifically, the centroids and covariance matrices of the mixture model are adapted incrementally according to the incoming data streams. The advantages are two-fold: (i) we can characterize complex data streams more accurately, and (ii) by using the centroids produced by OT-MM for each class, we can estimate the similarity of an unseen example to each class more reasonably at inference time. Moreover, to combat catastrophic forgetting in the CIL scenario, we further propose Dynamic Preservation. In particular, after performing the dynamic preservation technique across data streams, the latent representations of the classes in the old and new tasks become more compact and better separated from each other. Together with a contraction feature extractor, this technique helps the model mitigate catastrophic forgetting. Experimental results on real-world datasets show that our proposed method can significantly outperform the current state-of-the-art baselines.
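A minimal sketch of the inference side only, assuming each class is summarized by several centroids and a test feature is scored by its distance to the nearest centroid of each class; the OT-based online updates of centroids and covariance matrices are not reproduced here.

import torch

def classify_by_centroids(feat, class_centroids):
    # feat: (d,) feature vector; class_centroids: dict mapping class -> (k, d) tensor.
    best_class, best_dist = None, float("inf")
    for cls, centroids in class_centroids.items():
        dist = torch.cdist(feat.unsqueeze(0), centroids).min().item()
        if dist < best_dist:
            best_class, best_dist = cls, dist
    return best_class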
Continual learning aims to learn a sequence of tasks in an online manner by leveraging knowledge acquired in the past, while remaining able to perform well on all previous tasks. This ability is crucial for artificial intelligence (AI) systems, and continual learning is therefore better suited than traditional learning paradigms to most realistic and complex application scenarios. However, current models typically learn a generic representation over the class labels of each task and select an effective strategy to avoid catastrophic forgetting. We hypothesize that selecting only the relevant and useful parts of the acquired knowledge is more effective than utilizing all of it. Based on this, in this paper we propose a new framework, named Selecting Related Knowledge for Online Continual Learning (SRKOCL), which incorporates an additional efficient channel attention mechanism to select the knowledge relevant to each task. Our model also combines experience replay and knowledge distillation to avoid catastrophic forgetting. Finally, extensive experiments are conducted on different benchmarks, and the competitive results demonstrate that our proposed SRKOCL is a promising approach compared with the state of the art.
We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs change the underlying training objective to cause less interference with previously learned information. Our proposed version of EBMs for continual learning is simple, efficient, and outperforms baseline methods by a large margin on several benchmarks. Moreover, our proposed contrastive divergence-based training objective can be combined with other continual learning methods, resulting in substantial boosts in their performance. We further show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a useful building block for future continual learning methods.
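A minimal sketch of one way to realize the idea, assuming a joint energy E(x, y) built from an input encoder and a label embedding, trained with a softmax over negative energies restricted to the classes present in the current batch (a simple choice that limits interference with classes not currently observed); this is an illustration, not the paper's exact objective.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EnergyModel(nn.Module):
    def __init__(self, feat_dim, n_classes, emb_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                     nn.Linear(128, emb_dim))
        self.label_emb = nn.Embedding(n_classes, emb_dim)

    def energy(self, x, y):
        # Low energy = input and label are compatible.
        return -(self.encoder(x) * self.label_emb(y)).sum(dim=1)

def cl_energy_loss(model, x, y, classes_in_batch):
    # Contrast the true label only against the classes seen in this batch.
    energies = torch.stack([model.energy(x, torch.full_like(y, c))
                            for c in classes_in_batch], dim=1)
    target = torch.tensor([classes_in_batch.index(int(c)) for c in y], device=x.device)
    return F.cross_entropy(-energies, target)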
Continual Learning is considered a key step toward next-generation Artificial Intelligence. Among various methods, replay-based approaches that maintain and replay a small episodic memory of previous samples are one of the most successful strategies against catastrophic forgetting. However, since forgetting is inevitable given bounded memory and unbounded tasks, how to forget is a problem continual learning must address. Therefore, beyond simply avoiding catastrophic forgetting, an under-explored issue is how to reasonably forget while ensuring the merits of human memory, including 1. storage efficiency, 2. generalizability, and 3. some interpretability. To achieve these simultaneously, our paper proposes a new saliency-augmented memory completion framework for continual learning, inspired by recent discoveries in memory completion separation in cognitive neuroscience. Specifically, we innovatively propose to store the part of the image most important to the tasks in episodic memory by saliency map extraction and memory encoding. When learning new tasks, previous data from memory are inpainted by an adaptive data generation module, which is inspired by how humans complete episodic memory. The module's parameters are shared across all tasks and it can be jointly trained with a continual learning classifier as bilevel optimization. Extensive experiments on several continual learning and image classification benchmarks demonstrate the proposed method's effectiveness and efficiency.
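A minimal sketch of the storage side only, assuming a simple gradient-based saliency map and a top-k pixel mask; the paper's memory encoding and inpainting-based completion module are not reproduced here.

import torch
import torch.nn.functional as F

def saliency_masked_sample(model, image, label, keep_ratio=0.25):
    # Keep only the pixels most important to the classification loss before storing.
    image = image.clone().requires_grad_(True)            # image: (C, H, W)
    loss = F.cross_entropy(model(image.unsqueeze(0)), label.unsqueeze(0))
    loss.backward()
    saliency = image.grad.abs().max(dim=0).values          # (H, W) importance map
    k = int(saliency.numel() * keep_ratio)
    thresh = saliency.flatten().topk(k).values.min()
    mask = (saliency >= thresh).float()                     # keep the top-k pixels
    return (image.detach() * mask).to(torch.float16), mask.bool()  # compact memory entry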
The continual learning (CL) ability of humans is closely related to the stability-plasticity dilemma, which describes how humans achieve ongoing learning capacity while preserving learned information. The notion of CL has been present in artificial intelligence (AI) since its inception. This paper presents a comprehensive review of CL. Unlike previous reviews, which mainly focus on the catastrophic forgetting phenomenon in CL, this paper surveys CL from a more macroscopic perspective based on the stability-plasticity mechanism. Analogous to its biological counterpart, an "intelligent" AI agent should: i) remember previously learned information (information retrospection); ii) continually infer new information (information prospection); and iii) transfer useful information (information transfer), in order to achieve high-level CL. Following this taxonomy, we review evaluation metrics, algorithms, applications, and some open issues. Our main contributions are: i) re-examining CL from the level of artificial general intelligence; ii) providing a detailed and extensive overview of the CL topic; and iii) presenting some novel ideas on the potential development of CL.