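In the class-incremental learning (CIL) setting, groups of classes are introduced to a model in each learning phase. The goal is to learn a unified model that performs well on all the classes observed so far. Given the recent popularity of Vision Transformers (ViTs) in conventional classification settings, an interesting question is to study their continual learning behaviour. In this work, we develop a Debiased Dual Distilled Transformer for CIL, dubbed $\textrm{D}^3\textrm{Former}$. The proposed model leverages a hybrid nested ViT design to ensure data efficiency and scalability to small as well as large datasets. In contrast to a recent ViT-based CIL approach, our $\textrm{D}^3\textrm{Former}$ does not dynamically expand its architecture when new tasks are learned and remains suitable for a large number of incremental tasks. The improved CIL behaviour of $\textrm{D}^3\textrm{Former}$ is owed to two fundamental changes to the ViT design. First, we treat incremental learning as a long-tail classification problem in which the majority samples from new classes vastly outnumber the limited exemplars available for old classes. To avoid bias against the minority old classes, we propose to dynamically adjust logits to emphasize retaining the representations relevant to old tasks. Second, we propose to preserve the configuration of spatial attention maps as learning progresses across tasks. This helps reduce catastrophic forgetting by constraining the model to retain its attention on the most discriminative regions. $\textrm{D}^3\textrm{Former}$ obtains favorable results on incremental versions of the CIFAR-100, MNIST, SVHN, and ImageNet datasets.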
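The abstract above does not state the exact logit adjustment rule, so the following is only a minimal sketch of one common form of logit adjustment for imbalanced data: adding tau * log(class prior) to each logit inside the cross-entropy loss. The names `class_priors`, `adjusted_cross_entropy`, and `tau` are illustrative assumptions, not taken from the paper.

```python
import math

def class_priors(counts):
    """Empirical class frequencies from per-class training sample counts."""
    total = sum(counts)
    return [c / total for c in counts]

def adjusted_cross_entropy(logits, target, priors, tau=1.0):
    """Cross-entropy computed on logits shifted by tau * log(prior).

    Assumption: this is a generic logit-adjusted loss, not necessarily
    the rule used by the paper. With tau=0 it is plain cross-entropy.
    """
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, priors)]
    m = max(adjusted)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[target]

if __name__ == "__main__":
    # Two new classes with 500 samples each, one old class with 20 exemplars.
    priors = class_priors([500, 500, 20])
    logits = [0.0, 0.0, 0.0]
    print(adjusted_cross_entropy(logits, 2, priors))  # loss for the rare old class
    print(adjusted_cross_entropy(logits, 0, priors))  # loss for a frequent new class
```

With equal raw logits, the adjusted loss is larger for the rare old class than for a frequent new class, so training is pushed to give old classes a larger margin.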
translated by Google Translate
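This paper investigates the use of vision transformers (ViT) in class-incremental learning. Surprisingly, naively applying ViT as a replacement for convolutional neural networks (CNNs) results in performance degradation. Our analysis reveals three issues with naively using ViT: (a) ViT converges very slowly when the number of classes is small, (b) more bias towards new classes is observed in ViT than in CNN-based models, and (c) the proper learning rate for ViT is too low to learn a good classifier. Based on this analysis, we show that these issues can be addressed simply by using existing techniques: a convolutional stem, balanced finetuning to correct the bias, and a higher learning rate for the classifier. Our simple solution, named ViTIL (ViT for Incremental Learning), achieves a new state of the art for all three class-incremental learning setups by a clear margin, providing a strong baseline for the research community. For instance, on ImageNet-1000, ViTIL achieves 69.20% top-1 accuracy for the protocol of 500 initial classes with 5 incremental steps (100 new classes each), outperforming LUCIR+DDE by 1.69%. For the more challenging protocol of 10 incremental steps (100 new classes), our method outperforms PODNet by 7.27% (65.13% vs. 57.86%).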
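Deep network architectures struggle to continually learn new tasks without forgetting previous ones. A recent trend indicates that dynamic architectures based on parameter expansion can efficiently reduce catastrophic forgetting in continual learning. However, existing approaches often require a task identifier at test time, need complex tuning to balance the growing number of parameters, and barely share any information across tasks. As a result, they struggle to scale to a large number of tasks without significant overhead. In this paper, we propose a transformer architecture based on a dedicated encoder/decoder framework. Critically, the encoder and decoder are shared among all tasks. Through a dynamic expansion of special tokens, we specialize each forward pass of our decoder network on a task distribution. Because parameter expansion is strictly controlled, our strategy scales to a large number of tasks with negligible memory and time overhead. Moreover, this efficient strategy does not need any hyperparameter tuning to control the network's expansion. Our model reaches excellent results on CIFAR100 and state-of-the-art performance on the large-scale ImageNet100 and ImageNet1000 while having fewer parameters than concurrent dynamic frameworks.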
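In class-incremental learning, the model is expected to learn new classes continually while maintaining knowledge of previous classes. The challenge lies in preserving the model's ability to effectively represent prior classes in the feature space while adapting it to represent incoming new classes. We propose two distillation-based objectives for class-incremental learning that leverage the structure of the feature space to maintain accuracy on previous classes and to enable learning of new classes. In our first objective, termed cross-space clustering (CSC), we propose to use the feature-space structure of the previous model to characterize directions of optimization that maximally preserve a class: directions that all instances of a specific class should collectively optimize towards, and those that they should collectively optimize away from. Apart from minimizing forgetting, this indirectly encourages the model to cluster all instances of a class in the current feature space and gives rise to a sense of herd immunity, allowing all samples of a class to jointly keep the model from forgetting that class. Our second objective, termed controlled transfer (CT), tackles incremental learning from the understudied perspective of inter-class transfer. CT explicitly approximates, and conditions the current model on, the semantic similarities between incrementally arriving classes and prior classes. This allows the model to learn classes in a way that maximizes positive forward transfer from similar prior classes, thus increasing plasticity, and minimizes negative backward transfer on dissimilar prior classes, thereby strengthening stability. We perform extensive experiments on two benchmark datasets, adding our method (CSCCT) on top of three prominent class-incremental learning methods. We observe consistent performance improvements across a variety of experimental settings.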
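In the class-incremental learning setting, deep learning models suffer from catastrophic forgetting of the classes from older phases as they are trained on the classes introduced in a new phase. In this work, we show that the effect of catastrophic forgetting on the model's prediction varies with the orientation of the same image, which is a novel finding. Based on this, we propose a novel data-ensemble approach that combines the predictions for different orientations of an image to help the model retain more information about previously seen classes and thereby reduce the effect of forgetting on the model's predictions. However, the data-ensemble approach cannot be used directly if the model is trained with traditional techniques. Therefore, we also propose a novel dual-incremental learning framework that jointly trains the network with two incremental learning objectives: the class-incremental learning objective and our proposed data-incremental learning objective. In the dual-incremental learning framework, each image belongs to two classes: the image class (for class-incremental learning) and the orientation class (for data-incremental learning). In class-incremental learning, each new phase introduces a new set of classes, and the model cannot access the complete training data from older phases. In our proposed data-incremental learning, the orientation classes remain the same across all phases, and the data introduced by each new phase of class-incremental learning acts as new training data for these orientation classes. We empirically demonstrate that the dual-incremental learning framework is vital to the data-ensemble approach. We apply our proposed approach to existing class-incremental learning methods and empirically show that our framework significantly improves their performance.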
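The data-ensemble idea described above can be sketched as follows. This is an illustration under stated assumptions: the abstract only says that predictions for different orientations are combined, so the choice of four 90-degree rotations, the plain averaging of softmax outputs, and the names `rotate90`, `orientations`, and `ensemble_predict` are all hypothetical.

```python
import math

def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def orientations(image):
    """Return the image at 0, 90, 180, and 270 degrees."""
    views = [image]
    for _ in range(3):
        views.append(rotate90(views[-1]))
    return views

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(model, image):
    """Average the class probabilities predicted for each orientation.

    `model` is any callable mapping a 2D image to a list of class logits.
    """
    probs = [softmax(model(view)) for view in orientations(image)]
    num_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(num_classes)]

if __name__ == "__main__":
    # Toy stand-in for a network: logits read from two corner pixels.
    toy_model = lambda img: [float(img[0][0]), float(img[-1][-1])]
    print(ensemble_predict(toy_model, [[1, 2], [3, 4]]))
```

Averaging probabilities is only one way to combine the views; a trained dual-incremental model could equally weight or select orientations differently.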
Data-Free Class Incremental Learning (DFCIL) aims to sequentially learn tasks with access only to data from the current one. DFCIL is of interest because it mitigates concerns about privacy and long-term storage of data, while at the same time alleviating the problem of catastrophic forgetting in incremental learning. In this work, we introduce robust saliency guidance for DFCIL and propose a new framework, which we call RObust Saliency Supervision (ROSS), for mitigating the negative effect of saliency drift. Firstly, we use a teacher-student architecture leveraging low-level tasks to supervise the model with global saliency. We also apply boundary-guided saliency to protect it from drifting across object boundaries at intermediate layers. Finally, we introduce a module for injecting and recovering saliency noise to increase robustness of saliency preservation. Our experiments demonstrate that our method can retain better saliency maps across tasks and achieve state-of-the-art results on the CIFAR-100, Tiny-ImageNet and ImageNet-Subset DFCIL benchmarks. Code will be made publicly available.
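Deep learning models suffer from catastrophic forgetting when learning new tasks incrementally. Incremental learning has been proposed to retain the knowledge of old classes while learning to identify new ones. A typical approach is to use a few exemplars to avoid forgetting old knowledge. In such a scenario, data imbalance between old and new classes is a key issue that degrades model performance. Several strategies have been designed to rectify the bias towards new classes caused by data imbalance. However, they rely heavily on assumptions about the bias relation between old and new classes, and are therefore not suitable for complex real-world applications. In this study, we propose an assumption-agnostic method, Multi-Granularity Re-Balancing (MGRB), to address this problem. Re-balancing methods are used to alleviate the influence of data imbalance; however, we empirically find that they under-fit new classes. To this end, we further design a novel multi-granularity regularization term that enables the model to consider the correlations between classes in addition to re-balancing the data. A class hierarchy is first constructed by grouping semantically or visually similar classes. The multi-granularity regularization then transforms the one-hot label vector into a continuous label distribution that reflects the relations between the target class and the other classes based on the constructed class hierarchy. The model can thus learn inter-class relational information, which helps enhance the learning of both old and new classes. Experimental results on public datasets and a real-world fault diagnosis dataset verify the effectiveness of the proposed method.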
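To make the label transformation concrete, here is a minimal sketch of turning a one-hot label into a continuous distribution using a two-level class hierarchy. The abstract does not give the actual formula, so the mass-splitting rule and the names `hierarchical_soft_label`, `eps`, and `sibling_share` are purely illustrative assumptions.

```python
def hierarchical_soft_label(target, groups, eps=0.2, sibling_share=0.75):
    """Spread `eps` of the label mass over non-target classes.

    `groups` is a list of lists of class ids (0..C-1), one list per
    coarse group. Classes in the target's group ("siblings") receive
    `sibling_share` of `eps`; all remaining classes share the rest.
    Assumption: this rule is hypothetical, not the paper's regularizer.
    """
    num_classes = sum(len(g) for g in groups)
    group = next(g for g in groups if target in g)
    siblings = [c for c in group if c != target]
    others = [c for g in groups for c in g if c not in group]

    # Only hand out mass to sets that actually contain classes.
    sibling_mass = eps * sibling_share if siblings else 0.0
    other_mass = eps - sibling_mass if others else 0.0

    label = [0.0] * num_classes
    label[target] = 1.0 - sibling_mass - other_mass
    for c in siblings:
        label[c] = sibling_mass / len(siblings)
    for c in others:
        label[c] = other_mass / len(others)
    return label

if __name__ == "__main__":
    # Classes 0-2 form one coarse group, classes 3-4 another.
    print(hierarchical_soft_label(0, [[0, 1, 2], [3, 4]]))
```

Under this rule, classes that share a coarse group with the target receive more probability mass than unrelated classes, which is one simple way a label distribution can encode the hierarchy.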
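Continual learning aims to learn the current task rapidly and continually from a sequence of tasks. Compared with other kinds of methods, those based on experience replay have shown great advantages in overcoming catastrophic forgetting. One common limitation of these methods is the data imbalance between previous and current tasks, which further aggravates forgetting. Moreover, how to effectively address the stability-plasticity dilemma in this setting is also an urgent problem. In this paper, we overcome these challenges by proposing a novel framework called Meta-learning update via Multi-scale Knowledge Distillation and Data Augmentation (MMKDDA). Specifically, we apply multi-scale knowledge distillation to capture the evolution of long-range and short-range spatial relationships at different feature levels, which alleviates the data imbalance problem. In addition, our method mixes samples from the episodic memory and the current task during online continual training, thus alleviating the side effects caused by changes in the probability distribution. Furthermore, we optimize our model via a meta-learning update that depends on the number of tasks seen previously, which helps keep a better balance between stability and plasticity. Finally, our experimental evaluation on four benchmark datasets shows the effectiveness of the proposed MMKDDA framework against other popular baselines, and ablation studies are conducted to further analyze the role of each component in our framework.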
Continual Learning (CL) is a field dedicated to devising algorithms able to achieve lifelong learning. Overcoming the disruption of previously acquired knowledge, a drawback that affects deep learning models and goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the data modeled does not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drops very quickly. Overcoming this limitation is fundamental, as it would allow us to build truly intelligent systems showing both stability and plasticity. It would also allow us to overcome the onerous limitation of retraining these architectures from scratch on the newly updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use a memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor than the quality of the data. Secondly, we propose one of the early works on incremental learning with ViT architectures, comparing functional, weight, and attention regularization approaches, and propose an effective novel asymmetric loss. Finally, we present a study on pretraining and how it affects performance in Continual Learning, raising some questions about the effective progression of the field. We then conclude with some future directions and closing remarks.
Lifelong learning has attracted much attention, but existing works still struggle to fight catastrophic forgetting and accumulate knowledge over long stretches of incremental learning. In this work, we propose PODNet, a model inspired by representation learning. By carefully balancing the compromise between remembering the old classes and learning new ones, PODNet fights catastrophic forgetting, even over very long runs of small incremental tasks, a setting so far unexplored by current works. PODNet innovates on existing art with an efficient spatial-based distillation loss applied throughout the model and a representation comprising multiple proxy vectors for each class. We validate those innovations thoroughly, comparing PODNet with three state-of-the-art models on three datasets: CIFAR100, ImageNet100, and ImageNet1000. Our results showcase a significant advantage of PODNet over existing art, with accuracy gains of 12.10, 6.51, and 2.85 percentage points, respectively.
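Continual learning aims to learn a sequence of tasks in an online manner by leveraging knowledge acquired in the past while still performing well on all previous tasks. This ability is crucial for artificial intelligence (AI) systems, which makes continual learning more suitable than the traditional learning paradigm for most real-world and complex application scenarios. However, current models usually learn a generic representation based on the class labels of each task and select an effective strategy to avoid catastrophic forgetting. We postulate that selecting only the relevant and useful parts of the acquired knowledge is more effective than utilizing the whole of it. Based on this hypothesis, we propose a new framework named Selecting Related Knowledge for Online Continual Learning (SRKOCL), which incorporates an additional efficient channel attention mechanism to select the specific knowledge relevant to each task. Our model also combines experience replay and knowledge distillation to avoid catastrophic forgetting. Finally, extensive experiments are conducted on different benchmarks, and the competitive results demonstrate that the proposed SRKOCL is a promising approach compared with the state of the art.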
General Continual Learning (GCL) aims at learning from non-independent and identically distributed stream data, without catastrophic forgetting of old tasks and without relying on task boundaries during the training and testing stages. We reveal that relation and feature deviations are crucial problems for catastrophic forgetting, where relation deviation refers to the deficiency of the relationship among all classes in knowledge distillation, and feature deviation refers to non-discriminative feature representations. To this end, we propose a Complementary Calibration (CoCa) framework that mines the complementary model's outputs and features to alleviate the two deviations in the process of GCL. Specifically, we propose a new collaborative distillation approach for addressing the relation deviation. It distills the model's outputs by utilizing the ensemble dark knowledge of the new model's outputs and reserved outputs, maintaining the performance on old tasks while balancing the relationship among all classes. Furthermore, we explore a collaborative self-supervision idea that leverages pretext tasks and supervised contrastive learning to address the feature deviation problem by learning complete and discriminative features for all classes. Extensive experiments on four popular datasets show that our CoCa framework achieves superior performance against state-of-the-art methods. Code is available at https://github.com/lijincm/CoCa.
Continually learning to segment more and more types of image regions is a desired capability for many intelligent systems. However, such continual semantic segmentation suffers from the same catastrophic forgetting issue as continual classification learning. While multiple knowledge distillation strategies originally designed for continual classification have been well adapted to continual semantic segmentation, they only consider transferring old knowledge based on the outputs from one or more layers of deep fully convolutional networks. Unlike existing solutions, this study proposes to transfer a new type of information relevant to knowledge, i.e., the relationships between elements (e.g., pixels or small local regions) within each image, which can capture both within-class and between-class knowledge. The relationship information can be effectively obtained from the self-attention maps in a Transformer-style segmentation model. Considering that pixels belonging to the same class in each image often share similar visual properties, a class-specific region pooling is applied to provide more efficient relationship information for knowledge transfer. Extensive evaluations on multiple public benchmarks support that the proposed self-attention transfer method can further alleviate the catastrophic forgetting issue, and that its flexible combination with one or more widely adopted strategies significantly outperforms state-of-the-art solutions.
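The staple of human intelligence is the capability of acquiring knowledge in a continuous fashion. In stark contrast, deep networks forget catastrophically and, for this reason, the sub-field of Class-Incremental Continual Learning fosters methods that learn a sequence of tasks incrementally, blending sequentially gained knowledge into a comprehensive prediction. This work aims at assessing and overcoming the pitfalls of our previous proposal, Dark Experience Replay (DER), a simple and effective approach that combines rehearsal and knowledge distillation. Inspired by the way our minds constantly rewrite past recollections and set expectations for the future, we endow our model with the abilities to i) revise its replay memory to welcome novel information regarding past data and ii) pave the way for learning yet unseen classes. We show that the application of these strategies leads to remarkable improvements; indeed, the resulting method, termed eXtended-DER (X-DER), outperforms the state of the art on both standard benchmarks (such as CIFAR-100 and miniImageNet) and a novel one introduced here. To gain a better understanding, we further provide extensive ablation studies that corroborate and extend the findings of our previous research (e.g., the value of knowledge distillation and flatter minima in continual learning setups).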
Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning -- a setting where not all the data samples are labeled. An underlying issue in this scenario is the model forgetting representations of unlabeled data and overfitting the labeled ones. We leverage the power of nearest-neighbor classifiers to non-linearly partition the feature space and learn a strong representation for the current task, as well as distill relevant information from previous tasks. We perform a thorough experimental evaluation and show that our method outperforms all the existing approaches by large margins, setting a strong state of the art on the continual semi-supervised learning paradigm. For example, on CIFAR100 we surpass several others even when using at least 30 times less supervision (0.8% vs. 25% of annotations).
Conventionally, deep neural networks are trained offline, relying on a large dataset prepared in advance. This paradigm is often challenged in real-world applications, e.g., online services that involve continuous streams of incoming data. Recently, incremental learning has received increasing attention and is considered a promising solution to the practical challenges mentioned above. However, it has been observed that incremental learning is subject to a fundamental difficulty, catastrophic forgetting: adapting a model to new data often results in severe performance degradation on previous tasks or classes. Our study reveals that the imbalance between previous and new data is a crucial cause of this problem. In this work, we develop a new framework for incrementally learning a unified classifier, i.e., a classifier that treats both old and new classes uniformly. Specifically, we incorporate three components, cosine normalization, a less-forget constraint, and inter-class separation, to mitigate the adverse effects of the imbalance. Experiments show that the proposed method can effectively rebalance the training process, thus obtaining superior performance compared to existing methods. On CIFAR-100 and ImageNet, our method reduces the classification errors by more than 6% and 13% respectively, under the incremental setting of 10 phases.
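As an illustration of the cosine normalization component named above, the sketch below computes class logits as scaled cosine similarities between a feature vector and per-class weight vectors. The function names and the fixed `scale` value are assumptions made for the example (such a scale is often a learnable parameter); this is not the authors' implementation.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

def cosine_logits(feature, class_weights, scale=10.0):
    """Logit for each class = scale * cos(feature, class weight).

    Because both vectors are normalized, the magnitude of a class weight
    cannot inflate that class's logit.
    """
    f = l2_normalize(feature)
    return [
        scale * sum(a * b for a, b in zip(f, l2_normalize(w)))
        for w in class_weights
    ]

if __name__ == "__main__":
    feature = [3.0, 4.0]
    # The second weight is the first one scaled by 10; the third is orthogonal.
    weights = [[3.0, 4.0], [30.0, 40.0], [-4.0, 3.0]]
    print(cosine_logits(feature, weights))
```

The first two classes receive the same logit even though their weight norms differ by a factor of ten, which is the property that helps when new-class weights tend to grow larger than old-class weights.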
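This work investigates the entanglement between Continual Learning (CL) and Transfer Learning (TL). In particular, we shed light on the widespread application of network pretraining, highlighting that it is itself subject to catastrophic forgetting. Unfortunately, this issue leads to the under-exploitation of knowledge transfer during later tasks. On this ground, we propose Transfer without Forgetting (TwF), a hybrid approach built upon a fixed pretrained sibling network, which continuously propagates the knowledge inherent in the source domain through a layer-wise loss term. Our experiments indicate that TwF steadily outperforms other CL methods across a variety of settings, averaging a 4.81% gain in Class-Incremental accuracy over a variety of datasets and different buffer sizes.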
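Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well. In this paper, we explore the idea of nesting basic local transformers on non-overlapping image blocks and aggregating them in a hierarchical way. We find that the block aggregation function plays a critical role in enabling cross-block non-local information communication. This observation leads us to design a simplified architecture that requires only minor code changes to the original vision transformer. The benefits of the proposed, judiciously selected design are threefold: (1) NesT converges faster and requires much less training data to achieve good generalization on both ImageNet and small datasets like CIFAR; (2) when extending our key ideas to image generation, NesT leads to a strong decoder that is $8\times$ faster than previous transformer-based generators; and (3) we show that decoupling the feature learning and abstraction processes via this nested hierarchy enables constructing a novel method (named GradCAT) for visually interpreting the learned model. Source code is available at https://github.com/google-research/nested-transformer.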
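The ability to learn new concepts continually is necessary in this ever-changing world. However, deep neural networks suffer from catastrophic forgetting when learning new categories. Many works have been proposed to alleviate this phenomenon, but most of them either fall into the stability-plasticity dilemma or incur too much computation or storage overhead. Inspired by the gradient boosting algorithm, which gradually fits the residuals between the target model and the previous ensemble model, we propose a novel two-stage learning paradigm, FOSTER, that empowers the model to learn new categories adaptively. Specifically, we first dynamically expand new modules to fit the residuals between the target and the output of the original model. Next, we remove redundant parameters and feature dimensions through an effective distillation strategy to maintain a single backbone model. We validate our method FOSTER on CIFAR-100 and ImageNet-100/1000 under different settings. Experimental results show that our method achieves state-of-the-art performance. Code is available at: https://github.com/g-u-n/eccv22-foster.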
The dynamic expansion architecture is becoming popular in class-incremental learning, mainly due to its advantages in alleviating catastrophic forgetting. However, task confusion is not well assessed within this framework: the discrepancy between classes of different tasks is not well learned (inter-task confusion, ITC), and a certain priority is still given to the latest class batch (old-new confusion, ONC). We empirically validate the side effects of the two types of confusion. We also propose a novel solution called Task Correlated Incremental Learning (TCIL) to encourage discriminative and fair feature utilization across tasks. TCIL performs multi-level knowledge distillation to propagate knowledge learned from old tasks to the new one. It establishes information flow paths at both the feature and logit levels, making the learning aware of old classes. In addition, an attention mechanism and classifier re-scoring are applied to generate fairer classification scores. We conduct extensive experiments on the CIFAR100 and ImageNet100 datasets. The results demonstrate that TCIL consistently achieves state-of-the-art accuracy. It mitigates both ITC and ONC, while showing advantages in combating catastrophic forgetting even when no rehearsal memory is reserved.