持续学习(CL)旨在制定模仿人类能力顺序学习新任务的能力,同时能够保留从过去经验获得的知识。在本文中,我们介绍了内存约束在线连续学习(MC-OCL)的新问题,这对存储器开销对可能算法可以用于避免灾难性遗忘的记忆开销。最多,如果不是全部,之前的CL方法违反了这些约束,我们向MC-OCL提出了一种算法解决方案:批量蒸馏(BLD),基于正则化的CL方法,有效地平衡了稳定性和可塑性,以便学习数据流,同时保留通过蒸馏解决旧任务的能力。我们在三个公开的基准测试中进行了广泛的实验评估,经验证明我们的方法成功地解决了MC-OCL问题,并实现了需要更高内存开销的先前蒸馏方法的可比准确性。
translated by 谷歌翻译
持续学习旨在快速,不断地从一系列任务中学习当前的任务。与其他类型的方法相比,基于经验重播的方法表现出了极大的优势来克服灾难性的遗忘。该方法的一个常见局限性是上一个任务和当前任务之间的数据不平衡,这将进一步加剧遗忘。此外,如何在这种情况下有效解决稳定性困境也是一个紧迫的问题。在本文中,我们通过提出一个通过多尺度知识蒸馏和数据扩展(MMKDDA)提出一个名为Meta学习更新的新框架来克服这些挑战。具体而言,我们应用多尺度知识蒸馏来掌握不同特征级别的远程和短期空间关系的演变,以减轻数据不平衡问题。此外,我们的方法在在线持续训练程序中混合了来自情节记忆和当前任务的样品,从而减轻了由于概率分布的变化而减轻了侧面影响。此外,我们通过元学习更新来优化我们的模型,该更新诉诸于前面所看到的任务数量,这有助于保持稳定性和可塑性之间的更好平衡。最后,我们对四个基准数据集的实验评估显示了提出的MMKDDA框架对其他流行基线的有效性,并且还进行了消融研究,以进一步分析每个组件在我们的框架中的作用。
translated by 谷歌翻译
持续学习旨在通过以在线学习方式利用过去获得的知识,同时能够在所有以前的任务上表现良好,从而学习一系列任务,这对人工智能(AI)系统至关重要,因此持续学习与传统学习模式相比,更适合大多数现实和复杂的应用方案。但是,当前的模型通常在每个任务上的类标签上学习一个通用表示基础,并选择有效的策略来避免灾难性的遗忘。我们假设,仅从获得的知识中选择相关且有用的零件比利用整个知识更有效。基于这一事实,在本文中,我们提出了一个新框架,名为“选择相关的在线持续学习知识(SRKOCL),该框架结合了一种额外的有效频道注意机制,以选择每个任务的特定相关知识。我们的模型还结合了经验重播和知识蒸馏,以避免灾难性的遗忘。最后,在不同的基准上进行了广泛的实验,竞争性实验结果表明,我们提出的SRKOCL是针对最先进的承诺方法。
translated by 谷歌翻译
Continual Learning (CL) is a field dedicated to devise algorithms able to achieve lifelong learning. Overcoming the knowledge disruption of previously acquired concepts, a drawback affecting deep learning models and that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the data modeled does not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drop very quickly. Overcoming this limitation is fundamental as it would allow us to build truly intelligent systems showing stability and plasticity. Secondly, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with the new updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor over the quality of the data. Secondly, we propose one of the early works of incremental learning on ViTs architectures, comparing functional, weight and attention regularization approaches and propose effective novel a novel asymmetric loss. At the end we conclude with a study on pretraining and how it affects the performance in Continual Learning, raising some questions about the effective progression of the field. We then conclude with some future directions and closing remarks.
translated by 谷歌翻译
Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning -- a setting where not all the data samples are labeled. An underlying issue in this scenario is the model forgetting representations of unlabeled data and overfitting the labeled ones. We leverage the power of nearest-neighbor classifiers to non-linearly partition the feature space and learn a strong representation for the current task, as well as distill relevant information from previous tasks. We perform a thorough experimental evaluation and show that our method outperforms all the existing approaches by large margins, setting a strong state of the art on the continual semi-supervised learning paradigm. For example, on CIFAR100 we surpass several others even when using at least 30 times less supervision (0.8% vs. 25% of annotations).
translated by 谷歌翻译
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
translated by 谷歌翻译
在线持续学习是一个充满挑战的学习方案,模型必须从非平稳的数据流中学习,其中每个样本只能看到一次。主要的挑战是在避免灾难性遗忘的同时逐步学习,即在从新数据中学习时忘记先前获得的知识的问题。在这种情况下,一种流行的解决方案是使用较小的内存来保留旧数据并随着时间的推移进行排练。不幸的是,由于内存尺寸有限,随着时间的推移,内存的质量会恶化。在本文中,我们提出了OLCGM,这是一种基于新型重放的持续学习策略,该策略使用知识冷凝技术连续压缩记忆并更好地利用其有限的尺寸。样品冷凝步骤压缩了旧样品,而不是像其他重播策略那样将其删除。结果,实验表明,每当与数据的复杂性相比,每当记忆预算受到限制,OLCGM都会提高与最先进的重播策略相比的最终准确性。
translated by 谷歌翻译
人类智慧的主食是以不断的方式获取知识的能力。在Stark对比度下,深网络忘记灾难性,而且为此原因,类增量连续学习促进方法的子字段逐步学习一系列任务,将顺序获得的知识混合成综合预测。这项工作旨在评估和克服我们以前提案黑暗体验重播(Der)的陷阱,这是一种简单有效的方法,将排练和知识蒸馏结合在一起。灵感来自于我们的思想不断重写过去的回忆和对未来的期望,我们赋予了我的能力,即我的能力来修改其重播记忆,以欢迎有关过去数据II的新信息II)为学习尚未公开的课程铺平了道路。我们表明,这些策略的应用导致了显着的改进;实际上,得到的方法 - 被称为扩展-DAR(X-DER) - 优于标准基准(如CiFar-100和MiniimAgeNet)的技术状态,并且这里引入了一个新颖的。为了更好地了解,我们进一步提供了广泛的消融研究,以证实并扩展了我们以前研究的结果(例如,在持续学习设置中知识蒸馏和漂流最小值的价值)。
translated by 谷歌翻译
这项工作调查了持续学习(CL)与转移学习(TL)之间的纠缠。特别是,我们阐明了网络预训练的广泛应用,强调它本身受到灾难性遗忘的影响。不幸的是,这个问题导致在以后任务期间知识转移的解释不足。在此基础上,我们提出了转移而不忘记(TWF),这是在固定的经过预定的兄弟姐妹网络上建立的混合方法,该方法不断传播源域中固有的知识,通过层次损失项。我们的实验表明,TWF在各种设置上稳步优于其他CL方法,在各种数据集和不同的缓冲尺寸上,平均每种类型的精度增长了4.81%。
translated by 谷歌翻译
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model-a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and Im-ageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.
translated by 谷歌翻译
A continual learning agent learns online with a non-stationary and never-ending stream of data. The key to such learning process is to overcome the catastrophic forgetting of previously seen data, which is a well known problem of neural networks. To prevent forgetting, a replay buffer is usually employed to store the previous data for the purpose of rehearsal. Previous works often depend on task boundary and i.i.d. assumptions to properly select samples for the replay buffer. In this work, we formulate sample selection as a constraint reduction problem based on the constrained optimization view of continual learning. The goal is to select a fixed subset of constraints that best approximate the feasible region defined by the original constraints. We show that it is equivalent to maximizing the diversity of samples in the replay buffer with parameters gradient as the feature. We further develop a greedy alternative that is cheap and efficient. The advantage of the proposed method is demonstrated by comparing to other alternatives under the continual learning setting. Further comparisons are made against state of the art methods that rely on task boundaries which show comparable or even better results for our method.
translated by 谷歌翻译
恶意软件(恶意软件)分类为持续学习(CL)制度提供了独特的挑战,这是由于每天收到的新样本的数量以及恶意软件的发展以利用新漏洞。在典型的一天中,防病毒供应商将获得数十万个独特的软件,包括恶意和良性,并且在恶意软件分类器的一生中,有超过十亿个样品很容易积累。鉴于问题的规模,使用持续学习技术的顺序培训可以在减少培训和存储开销方面提供可观的好处。但是,迄今为止,还没有对CL应用于恶意软件分类任务的探索。在本文中,我们研究了11种应用于三个恶意软件任务的CL技术,涵盖了常见的增量学习方案,包括任务,类和域增量学习(IL)。具体而言,使用两个现实的大规模恶意软件数据集,我们评估了CL方法在二进制恶意软件分类(domain-il)和多类恶意软件家庭分类(Task-IL和类IL)任务上的性能。令我们惊讶的是,在几乎所有情况下,持续的学习方法显着不足以使训练数据的幼稚关节重播 - 在某些情况下,将精度降低了70个百分点以上。与关节重播相比,有选择性重播20%的存储数据的一种简单方法可以实现更好的性能,占训练时间的50%。最后,我们讨论了CL技术表现出乎意料差的潜在原因,希望它激发进一步研究在恶意软件分类域中更有效的技术。
translated by 谷歌翻译
Continual Learning (CL) is an emerging machine learning paradigm that aims to learn from a continuous stream of tasks without forgetting knowledge learned from the previous tasks. To avoid performance decrease caused by forgetting, prior studies exploit episodic memory (EM), which stores a subset of the past observed samples while learning from new non-i.i.d. data. Despite the promising results, since CL is often assumed to execute on mobile or IoT devices, the EM size is bounded by the small hardware memory capacity and makes it infeasible to meet the accuracy requirements for real-world applications. Specifically, all prior CL methods discard samples overflowed from the EM and can never retrieve them back for subsequent training steps, incurring loss of information that would exacerbate catastrophic forgetting. We explore a novel hierarchical EM management strategy to address the forgetting issue. In particular, in mobile and IoT devices, real-time data can be stored not just in high-speed RAMs but in internal storage devices as well, which offer significantly larger capacity than the RAMs. Based on this insight, we propose to exploit the abundant storage to preserve past experiences and alleviate the forgetting by allowing CL to efficiently migrate samples between memory and storage without being interfered by the slow access speed of the storage. We call it Carousel Memory (CarM). As CarM is complementary to existing CL methods, we conduct extensive evaluations of our method with seven popular CL methods and show that CarM significantly improves the accuracy of the methods across different settings by large margins in final average accuracy (up to 28.4%) while retaining the same training efficiency.
translated by 谷歌翻译
当自我监督的模型已经显示出比在规模上未标记的数据训练的情况下的监督对方的可比视觉表现。然而,它们的功效在持续的学习(CL)场景中灾难性地减少,其中数据被顺序地向模型呈现给模型。在本文中,我们表明,通过添加将表示的当前状态映射到其过去状态,可以通过添加预测的网络来无缝地转换为CL的蒸馏机制。这使我们能够制定一个持续自我监督的视觉表示的框架,学习(i)显着提高了学习象征的质量,(ii)与若干最先进的自我监督目标兼容(III)几乎没有近似参数调整。我们通过在各种CL设置中培训六种受欢迎的自我监督模型来证明我们的方法的有效性。
translated by 谷歌翻译
最近的自我监督学习方法能够学习高质量的图像表示,并通过监督方法关闭差距。但是,这些方法无法逐步获取新的知识 - 事实上,它们实际上主要仅用为具有IID数据的预训练阶段。在这项工作中,我们在没有额外的记忆或重放的情况下调查持续学习制度的自我监督方法。为防止忘记以前的知识,我们提出了功能正规化的使用。我们将表明,朴素的功能正则化,也称为特征蒸馏,导致可塑性的低可塑性,因此严重限制了连续的学习性能。为了解决这个问题,我们提出了预测的功能正则化,其中一个单独的投影网络确保新学习的特征空间保留了先前的特征空间的信息,同时允许学习新功能。这使我们可以防止在保持学习者的可塑性时忘记。针对应用于自我监督的其他增量学习方法的评估表明我们的方法在不同场景和多个数据集中获得竞争性能。
translated by 谷歌翻译
从非稳定性数据流不断学习是过去几年中日益普及的具有挑战性的研究课题。能够在高效,有效和可扩展的方式中不断地学习,适应和推广,是人工智能系统可持续发展的基础。然而,以持续学习的代理为中心的视图需要直接学习原始数据,这限制了独立代理,效率和当前方法的隐私之间的相互作用。相反,我们认为,持续学习系统应该利用经过培训的模型的形式利用压缩信息的可用性。在本文中,我们介绍并将一个名为“EX-Modul持续学习”(EXML)的新范式介绍并形式化,其中代理从一系列先前培训的模型而不是原始数据学习。我们进一步贡献了三种前模型连续学习算法和包括三个数据集(Mnist,CiFar-10和Core50)的经验设置,以及所提出的算法广泛测试的八种情况。最后,我们突出了前模式范式的特点,我们指出了有趣的未来研究方向。
translated by 谷歌翻译
Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. 1 We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
translated by 谷歌翻译
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one, under the constraints of having limited system size and computational cost, in which the main challenge comes from the "catastrophic forgetting" issue -- the inability to well remember the learnt knowledge while learning the new ones. With the specific focus on the class-incremental OCL scenario, i.e. OCL for classification, the recent advance incorporates the contrastive learning technique for learning more generalised feature representation to achieve the state-of-the-art performance but is still unable to fully resolve the catastrophic forgetting. In this paper, we follow the strategy of adopting contrastive learning but further introduce the semantically distinct augmentation technique, in which it leverages strong augmentation to generate more data samples, and we show that considering these samples semantically different from their original classes (thus being related to the out-of-distribution samples) in the contrastive learning mechanism contributes to alleviate forgetting and facilitate model stability. Moreover, in addition to contrastive learning, the typical classification mechanism and objective (i.e. softmax classifier and cross-entropy loss) are included in our model design for faster convergence and utilising the label information, but particularly equipped with a sampling strategy to tackle the tendency of favouring the new classes (i.e. model bias towards the recently learnt classes). Upon conducting extensive experiments on CIFAR-10, CIFAR-100, and Mini-Imagenet datasets, our proposed method is shown to achieve superior performance against various baselines.
translated by 谷歌翻译
人类的持续学习(CL)能力与稳定性与可塑性困境密切相关,描述了人类如何实现持续的学习能力和保存的学习信息。自发育以来,CL的概念始终存在于人工智能(AI)中。本文提出了对CL的全面审查。与之前的评论不同,主要关注CL中的灾难性遗忘现象,本文根据稳定性与可塑性机制的宏观视角来调查CL。类似于生物对应物,“智能”AI代理商应该是I)记住以前学到的信息(信息回流); ii)不断推断新信息(信息浏览:); iii)转移有用的信息(信息转移),以实现高级CL。根据分类学,评估度量,算法,应用以及一些打开问题。我们的主要贡献涉及I)从人工综合情报层面重新检查CL; ii)在CL主题提供详细和广泛的概述; iii)提出一些关于CL潜在发展的新颖思路。
translated by 谷歌翻译
本文研究持续学习(CL)的逐步学习(CIL)。已经提出了许多方法来处理CIL中的灾难性遗忘(CF)。大多数方法都会为单个头网络中所有任务的所有类别构建单个分类器。为了防止CF,一种流行的方法是记住以前任务中的少数样本,并在培训新任务时重播它们。但是,这种方法仍然患有严重的CF,因为在内存中仅使用有限的保存样本数量来更新或调整了先前任务的参数。本文提出了一种完全不同的方法,该方法使用变压器网络为每个任务(称为多头模型)构建一个单独的分类器(头部),称为更多。与其在内存中使用保存的样本在现有方法中更新以前的任务/类的网络,不如利用保存的样本来构建特定任务分类器(添加新的分类头),而无需更新用于先前任务/类的网络。新任务的模型经过培训,可以学习任务的类别,并且还可以检测到不是从相同数据分布(即,均分布(OOD))的样本。这使测试实例属于的任务的分类器能够为正确的类产生高分,而其他任务的分类器可以产生低分,因为测试实例不是来自这些分类器的数据分布。实验结果表明,更多的表现优于最先进的基线,并且自然能够在持续学习环境中进行OOD检测。
translated by 谷歌翻译