Continual Learning is considered a key step toward next-generation Artificial Intelligence. Among various methods, replay-based approaches that maintain and replay a small episodic memory of previous samples are one of the most successful strategies against catastrophic forgetting. However, since forgetting is inevitable given bounded memory and unbounded tasks, how to forget is a problem continual learning must address. Therefore, beyond simply avoiding catastrophic forgetting, an under-explored issue is how to reasonably forget while ensuring the merits of human memory, including 1. storage efficiency, 2. generalizability, and 3. some interpretability. To achieve these simultaneously, our paper proposes a new saliency-augmented memory completion framework for continual learning, inspired by recent discoveries in memory completion separation in cognitive neuroscience. Specifically, we innovatively propose to store the part of the image most important to the tasks in episodic memory by saliency map extraction and memory encoding. When learning new tasks, previous data from memory are inpainted by an adaptive data generation module, which is inspired by how humans complete episodic memory. The module's parameters are shared across all tasks and it can be jointly trained with a continual learning classifier as bilevel optimization. Extensive experiments on several continual learning and image classification benchmarks demonstrate the proposed method's effectiveness and efficiency.
translated by 谷歌翻译
持续学习旨在通过以在线学习方式利用过去获得的知识,同时能够在所有以前的任务上表现良好,从而学习一系列任务,这对人工智能(AI)系统至关重要,因此持续学习与传统学习模式相比,更适合大多数现实和复杂的应用方案。但是,当前的模型通常在每个任务上的类标签上学习一个通用表示基础,并选择有效的策略来避免灾难性的遗忘。我们假设,仅从获得的知识中选择相关且有用的零件比利用整个知识更有效。基于这一事实,在本文中,我们提出了一个新框架,名为“选择相关的在线持续学习知识(SRKOCL),该框架结合了一种额外的有效频道注意机制,以选择每个任务的特定相关知识。我们的模型还结合了经验重播和知识蒸馏,以避免灾难性的遗忘。最后,在不同的基准上进行了广泛的实验,竞争性实验结果表明,我们提出的SRKOCL是针对最先进的承诺方法。
translated by 谷歌翻译
持续学习旨在快速,不断地从一系列任务中学习当前的任务。与其他类型的方法相比,基于经验重播的方法表现出了极大的优势来克服灾难性的遗忘。该方法的一个常见局限性是上一个任务和当前任务之间的数据不平衡,这将进一步加剧遗忘。此外,如何在这种情况下有效解决稳定性困境也是一个紧迫的问题。在本文中,我们通过提出一个通过多尺度知识蒸馏和数据扩展(MMKDDA)提出一个名为Meta学习更新的新框架来克服这些挑战。具体而言,我们应用多尺度知识蒸馏来掌握不同特征级别的远程和短期空间关系的演变,以减轻数据不平衡问题。此外,我们的方法在在线持续训练程序中混合了来自情节记忆和当前任务的样品,从而减轻了由于概率分布的变化而减轻了侧面影响。此外,我们通过元学习更新来优化我们的模型,该更新诉诸于前面所看到的任务数量,这有助于保持稳定性和可塑性之间的更好平衡。最后,我们对四个基准数据集的实验评估显示了提出的MMKDDA框架对其他流行基线的有效性,并且还进行了消融研究,以进一步分析每个组件在我们的框架中的作用。
translated by 谷歌翻译
Continual Learning (CL) is a field dedicated to devise algorithms able to achieve lifelong learning. Overcoming the knowledge disruption of previously acquired concepts, a drawback affecting deep learning models and that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the data modeled does not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drop very quickly. Overcoming this limitation is fundamental as it would allow us to build truly intelligent systems showing stability and plasticity. Secondly, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with the new updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor over the quality of the data. Secondly, we propose one of the early works of incremental learning on ViTs architectures, comparing functional, weight and attention regularization approaches and propose effective novel a novel asymmetric loss. At the end we conclude with a study on pretraining and how it affects the performance in Continual Learning, raising some questions about the effective progression of the field. We then conclude with some future directions and closing remarks.
translated by 谷歌翻译
多任务学习是一个框架,可执行多个学习任务以共享知识以提高其概括能力。虽然浅做多任务学习可以学习任务关系,但它只能处理预定义的功能。现代深度多任务学习可以共同学习潜在的功能和任务共享,但任务关系却很晦涩。同样,他们预先定义哪些层和神经元应该跨任务共享,并且不能适应地学习。为了应对这些挑战,本文提出了一个新的多任务学习框架,该框架通过补充现有浅层和深层多任务学习方案的强度,共同学习潜在特征和明确的任务关系。具体而言,我们建议将任务关系建模为任务输入梯度之间的相似性,并对它们的等效性进行理论分析。此外,我们创新地提出了一个多任务学习目标,该目标可以通过新的正规机明确学习任务关系。理论分析表明,由于提出的正常化程序,概括性误差已减少。在多个多任务学习和图像分类基准上进行的广泛实验证明了所提出的方法有效性,效率以及在学习任务关系模式中的合理性。
translated by 谷歌翻译
Data-Free Class Incremental Learning (DFCIL) aims to sequentially learn tasks with access only to data from the current one. DFCIL is of interest because it mitigates concerns about privacy and long-term storage of data, while at the same time alleviating the problem of catastrophic forgetting in incremental learning. In this work, we introduce robust saliency guidance for DFCIL and propose a new framework, which we call RObust Saliency Supervision (ROSS), for mitigating the negative effect of saliency drift. Firstly, we use a teacher-student architecture leveraging low-level tasks to supervise the model with global saliency. We also apply boundary-guided saliency to protect it from drifting across object boundaries at intermediate layers. Finally, we introduce a module for injecting and recovering saliency noise to increase robustness of saliency preservation. Our experiments demonstrate that our method can retain better saliency maps across tasks and achieve state-of-the-art results on the CIFAR-100, Tiny-ImageNet and ImageNet-Subset DFCIL benchmarks. Code will be made publicly available.
translated by 谷歌翻译
持续学习(CL)旨在开发单一模型适应越来越多的任务的技术,从而潜在地利用跨任务的学习以资源有效的方式。 CL系统的主要挑战是灾难性的遗忘,在学习新任务时忘记了早期的任务。为了解决此问题,基于重播的CL方法在遇到遇到任务中选择的小缓冲区中维护和重复培训。我们提出梯度Coreset重放(GCR),一种新颖的重播缓冲区选择和使用仔细设计的优化标准的更新策略。具体而言,我们选择并维护一个“Coreset”,其与迄今为止关于当前模型参数的所有数据的梯度紧密近似,并讨论其有效应用于持续学习设置所需的关键策略。在学习的离线持续学习环境中,我们在最先进的最先进的最先进的持续学习环境中表现出显着的收益(2%-4%)。我们的调查结果还有效地转移到在线/流媒体CL设置,从而显示现有方法的5%。最后,我们展示了持续学习的监督对比损失的价值,当与我们的子集选择策略相结合时,累计增益高达5%。
translated by 谷歌翻译
持续学习需要模型来学习新任务,同时保持先前学识到的知识。已经提出了各种算法来解决这一真正的挑战。到目前为止,基于排练的方法,例如经验重播,取得了最先进的性能。这些方法将过去任务的一小部分保存为内存缓冲区,以防止模型忘记以前学识的知识。但是,它们中的大多数情况都同样对待每一个新任务,即,在学习不同的新任务时修复了框架的超级参数。这样的设置缺乏对过去和新任务之间的关系/相似性的考虑。例如,与从公共汽车中学到的人相比,从狗的知识/特征比识别猫(新任务)更有益。在这方面,我们提出了一种基于BI级优化的元学习算法,以便自适应地调整从过去和新任务中提取的知识之间的关系。因此,该模型可以在持续学习期间找到适当的梯度方向,避免在内存缓冲区上的严重过度拟合问题。广泛的实验是在三个公开的数据集(即CiFar-10,CiFar-100和微小想象网)上进行的。实验结果表明,该方法可以一致地改善所有基线的性能。
translated by 谷歌翻译
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
translated by 谷歌翻译
人类智慧的主食是以不断的方式获取知识的能力。在Stark对比度下,深网络忘记灾难性,而且为此原因,类增量连续学习促进方法的子字段逐步学习一系列任务,将顺序获得的知识混合成综合预测。这项工作旨在评估和克服我们以前提案黑暗体验重播(Der)的陷阱,这是一种简单有效的方法,将排练和知识蒸馏结合在一起。灵感来自于我们的思想不断重写过去的回忆和对未来的期望,我们赋予了我的能力,即我的能力来修改其重播记忆,以欢迎有关过去数据II的新信息II)为学习尚未公开的课程铺平了道路。我们表明,这些策略的应用导致了显着的改进;实际上,得到的方法 - 被称为扩展-DAR(X-DER) - 优于标准基准(如CiFar-100和MiniimAgeNet)的技术状态,并且这里引入了一个新颖的。为了更好地了解,我们进一步提供了广泛的消融研究,以证实并扩展了我们以前研究的结果(例如,在持续学习设置中知识蒸馏和漂流最小值的价值)。
translated by 谷歌翻译
我们探索无任务持续学习(CL),其中培训模型以避免在没有明确的任务边界或身份的情况下造成灾难性的遗忘。在无任务CL上的许多努力中,一个值得注意的方法是基于内存的,存储和重放训练示例的子集。然而,由于CL模型不断更新,所以存储的示例的效用可以随时间缩短。这里,我们提出基于梯度的存储器编辑(GMED),该框架是通过梯度更新在连续输入空间中编辑存储的示例的框架,以便为重放创建更多的“具有挑战性”示例。 GMED编辑的例子仍然类似于其未编辑的形式,但可以在即将到来的模型更新中产生增加的损失,从而使未来的重播在克服灾难性遗忘方面更有效。通过施工,GMED可以与其他基于内存的CL算法一起无缝应用,以进一步改进。实验验证了GMED的有效性,以及我们最好的方法显着优于基线和以前的五个数据集中的最先进。可以在https://github.com/ink-usc/gmed找到代码。
translated by 谷歌翻译
人类的持续学习(CL)能力与稳定性与可塑性困境密切相关,描述了人类如何实现持续的学习能力和保存的学习信息。自发育以来,CL的概念始终存在于人工智能(AI)中。本文提出了对CL的全面审查。与之前的评论不同,主要关注CL中的灾难性遗忘现象,本文根据稳定性与可塑性机制的宏观视角来调查CL。类似于生物对应物,“智能”AI代理商应该是I)记住以前学到的信息(信息回流); ii)不断推断新信息(信息浏览:); iii)转移有用的信息(信息转移),以实现高级CL。根据分类学,评估度量,算法,应用以及一些打开问题。我们的主要贡献涉及I)从人工综合情报层面重新检查CL; ii)在CL主题提供详细和广泛的概述; iii)提出一些关于CL潜在发展的新颖思路。
translated by 谷歌翻译
增量任务学习(ITL)是一个持续学习的类别,试图培训单个网络以进行多个任务(一个接一个),其中每个任务的培训数据仅在培训该任务期间可用。当神经网络接受较新的任务培训时,往往会忘记旧任务。该特性通常被称为灾难性遗忘。为了解决此问题,ITL方法使用情节内存,参数正则化,掩盖和修剪或可扩展的网络结构。在本文中,我们提出了一个基于低级别分解的新的增量任务学习框架。特别是,我们表示每一层的网络权重作为几个等级1矩阵的线性组合。为了更新新任务的网络,我们学习一个排名1(或低级别)矩阵,并将其添加到每一层的权重。我们还引入了一个其他选择器向量,该向量将不同的权重分配给对先前任务的低级矩阵。我们表明,就准确性和遗忘而言,我们的方法的表现比当前的最新方法更好。与基于情节的内存和基于面具的方法相比,我们的方法还提供了更好的内存效率。我们的代码将在https://github.com/csiplab/task-increment-rank-update.git上找到。
translated by 谷歌翻译
根据互补学习系统(CLS)理论〜\ cite {mcclelland1995there}在神经科学中,人类通过两个补充系统有效\ emph {持续学习}:一种快速学习系统,以海马为中心,用于海马,以快速学习细节,个人体验,个人体验,个人体验,个人体验,个人体验,个人体验,个人体验,个人体验的快速学习, ;以及位于新皮层中的缓慢学习系统,以逐步获取有关环境的结构化知识。在该理论的激励下,我们提出\ emph {dualnets}(对于双网络),这是一个一般的持续学习框架,该框架包括一个快速学习系统,用于监督从特定任务和慢速学习系统中的模式分离代表学习,用于表示任务的慢学习系统 - 不可知论的一般代表通过自我监督学习(SSL)。双网符可以无缝地将两种表示类型纳入整体框架中,以促进在深层神经网络中更好地持续学习。通过广泛的实验,我们在各种持续的学习协议上展示了双网络的有希望的结果,从标准离线,任务感知设置到具有挑战性的在线,无任务的场景。值得注意的是,在Ctrl〜 \ Cite {veniat2020202020202020202020202020202020202020202020202020202020202021- coite {ostapenko2021-continual}的基准中。此外,我们进行了全面的消融研究,以验证双nets功效,鲁棒性和可伸缩性。代码可在\ url {https://github.com/phquang/dualnet}上公开获得。
translated by 谷歌翻译
Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. 1 We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
translated by 谷歌翻译
在线持续学习是一个充满挑战的学习方案,模型必须从非平稳的数据流中学习,其中每个样本只能看到一次。主要的挑战是在避免灾难性遗忘的同时逐步学习,即在从新数据中学习时忘记先前获得的知识的问题。在这种情况下,一种流行的解决方案是使用较小的内存来保留旧数据并随着时间的推移进行排练。不幸的是,由于内存尺寸有限,随着时间的推移,内存的质量会恶化。在本文中,我们提出了OLCGM,这是一种基于新型重放的持续学习策略,该策略使用知识冷凝技术连续压缩记忆并更好地利用其有限的尺寸。样品冷凝步骤压缩了旧样品,而不是像其他重播策略那样将其删除。结果,实验表明,每当与数据的复杂性相比,每当记忆预算受到限制,OLCGM都会提高与最先进的重播策略相比的最终准确性。
translated by 谷歌翻译
Continual Learning (CL) is an emerging machine learning paradigm that aims to learn from a continuous stream of tasks without forgetting knowledge learned from the previous tasks. To avoid performance decrease caused by forgetting, prior studies exploit episodic memory (EM), which stores a subset of the past observed samples while learning from new non-i.i.d. data. Despite the promising results, since CL is often assumed to execute on mobile or IoT devices, the EM size is bounded by the small hardware memory capacity and makes it infeasible to meet the accuracy requirements for real-world applications. Specifically, all prior CL methods discard samples overflowed from the EM and can never retrieve them back for subsequent training steps, incurring loss of information that would exacerbate catastrophic forgetting. We explore a novel hierarchical EM management strategy to address the forgetting issue. In particular, in mobile and IoT devices, real-time data can be stored not just in high-speed RAMs but in internal storage devices as well, which offer significantly larger capacity than the RAMs. Based on this insight, we propose to exploit the abundant storage to preserve past experiences and alleviate the forgetting by allowing CL to efficiently migrate samples between memory and storage without being interfered by the slow access speed of the storage. We call it Carousel Memory (CarM). As CarM is complementary to existing CL methods, we conduct extensive evaluations of our method with seven popular CL methods and show that CarM significantly improves the accuracy of the methods across different settings by large margins in final average accuracy (up to 28.4%) while retaining the same training efficiency.
translated by 谷歌翻译
One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.
translated by 谷歌翻译
持续学习(CL)依次学习像人类这样的新任务,其目标是实现更好的稳定性(S,记住过去的任务)和可塑性(P,适应新任务)。由于过去的培训数据不可用,因此探索培训示例中S和P的影响差异很有价值,这可能会改善对更好的SP的学习模式。受影响函数的启发(如果),我们首先研究了示例通过添加扰动来示例体重和计算影响推导的影响。为了避免在神经网络中Hessian逆的存储和计算负担,我们提出了一种简单而有效的METASP算法,以模拟IF计算中的两个关键步骤,并获得S-和P-Aware示例的影响。此外,我们建议通过解决双目标优化问题来融合两种示例影响,并获得对SP Pareto最优性的融合影响。融合影响可用于控制模型的更新并优化排练的存储。经验结果表明,我们的算法在任务和类别基准CL数据集上都显着优于最先进的方法。
translated by 谷歌翻译
持续学习的现有工作(CL)的重点是减轻灾难性遗忘,即学习新任务时过去任务的模型绩效恶化。但是,CL系统的训练效率不足,这限制了CL系统在资源有限的方案下的现实应用。在这项工作中,我们提出了一个名为“稀疏持续学习”(SPARCL)的新颖框架,这是第一个利用稀疏性以使边缘设备上具有成本效益的持续学习的研究。 SPARCL通过三个方面的协同作用来实现训练加速度和准确性保护:体重稀疏性,数据效率和梯度稀疏性。具体而言,我们建议在整个CL过程中学习一个稀疏网络,动态数据删除(DDR),以删除信息较少的培训数据和动态梯度掩盖(DGM),以稀疏梯度更新。他们每个人不仅提高了效率,而且进一步减轻了灾难性的遗忘。 SPARCL始终提高现有最新CL方法(SOTA)CL方法的训练效率最多减少了训练失败,而且令人惊讶的是,SOTA的准确性最多最多提高了1.7%。 SPARCL还优于通过将SOTA稀疏训练方法适应CL设置的效率和准确性获得的竞争基线。我们还评估了SPARCL在真实手机上的有效性,进一步表明了我们方法的实际潜力。
translated by 谷歌翻译