Data-Free Class Incremental Learning (DFCIL) aims to sequentially learn tasks with access only to data from the current one. DFCIL is of interest because it mitigates concerns about privacy and long-term storage of data, while at the same time alleviating the problem of catastrophic forgetting in incremental learning. In this work, we introduce robust saliency guidance for DFCIL and propose a new framework, which we call RObust Saliency Supervision (ROSS), for mitigating the negative effect of saliency drift. Firstly, we use a teacher-student architecture leveraging low-level tasks to supervise the model with global saliency. We also apply boundary-guided saliency to protect it from drifting across object boundaries at intermediate layers. Finally, we introduce a module for injecting and recovering saliency noise to increase robustness of saliency preservation. Our experiments demonstrate that our method can retain better saliency maps across tasks and achieve state-of-the-art results on the CIFAR-100, Tiny-ImageNet and ImageNet-Subset DFCIL benchmarks. Code will be made publicly available.
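For concreteness, a minimal sketch of how such saliency supervision could be phrased as a distillation term, assuming a channel-pooled activation map as the saliency proxy (the `saliency_map` helper and the plain MSE penalty are illustrative assumptions, not the authors' exact formulation):

```python
import torch
import torch.nn.functional as F

def saliency_map(features: torch.Tensor) -> torch.Tensor:
    """Channel-pooled activation energy, normalized per sample to [0, 1]."""
    sal = features.pow(2).mean(dim=1, keepdim=True)                # (B, 1, H, W)
    flat = sal.flatten(1)
    lo, hi = flat.min(dim=1, keepdim=True).values, flat.max(dim=1, keepdim=True).values
    flat = (flat - lo) / (hi - lo + 1e-8)
    return flat.view_as(sal)

def saliency_distillation_loss(student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
    """Penalize drift between the student's saliency and the frozen teacher's saliency."""
    return F.mse_loss(saliency_map(student_feats), saliency_map(teacher_feats).detach())
```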
In the class-incremental learning (CIL) setting, a group of classes is introduced to the model in each learning phase. The goal is to learn a unified model that performs well on all the classes observed so far. Given the recent popularity of Vision Transformers (ViTs) in conventional classification settings, an interesting question is to study their continual learning behaviour. In this work, we develop a debiased dual distilled Transformer for CIL, dubbed $\textrm{D}^3\textrm{Former}$. The proposed model leverages a hybrid nested ViT design to ensure data efficiency and scalability to both small and large datasets. In contrast to recent ViT-based CIL approaches, our $\textrm{D}^3\textrm{Former}$ does not dynamically expand its architecture when new tasks are learned and remains suitable for a large number of incremental tasks. The improved CIL behaviour of $\textrm{D}^3\textrm{Former}$ owes to two fundamental changes to the ViT design. First, we treat incremental learning as a long-tailed classification problem, in which the majority samples of the new classes vastly outnumber the limited exemplars available for the old classes. To avoid biased predictions against the minority old classes, we propose to dynamically adjust the logits so as to emphasize retaining the representations relevant to the old tasks. Second, we propose to preserve the configuration of the spatial attention maps as learning progresses across tasks. This helps in reducing catastrophic forgetting by constraining the model to retain its attention on the most discriminative regions. $\textrm{D}^3\textrm{Former}$ obtains favorable results on incremental versions of the CIFAR-100, MNIST, SVHN, and ImageNet datasets.
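The long-tailed view above suggests a logit-adjustment style correction; below is a hedged sketch in which `class_counts` stands in for whatever per-class sample statistics are tracked (an illustrative variant, not the exact $\textrm{D}^3\textrm{Former}$ objective):

```python
import torch
import torch.nn.functional as F

def logit_adjusted_ce(logits: torch.Tensor, targets: torch.Tensor,
                      class_counts: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Cross-entropy with class-prior adjusted logits: rare (old) classes with few
    exemplars are not drowned out by the abundant new-class samples."""
    priors = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(priors + 1e-12)
    return F.cross_entropy(adjusted, targets)
```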
Continual Learning is considered a key step toward next-generation Artificial Intelligence. Among various methods, replay-based approaches that maintain and replay a small episodic memory of previous samples are one of the most successful strategies against catastrophic forgetting. However, since forgetting is inevitable given bounded memory and unbounded tasks, how to forget is a problem continual learning must address. Therefore, beyond simply avoiding catastrophic forgetting, an under-explored issue is how to reasonably forget while ensuring the merits of human memory, including 1. storage efficiency, 2. generalizability, and 3. some interpretability. To achieve these simultaneously, our paper proposes a new saliency-augmented memory completion framework for continual learning, inspired by recent discoveries in memory completion separation in cognitive neuroscience. Specifically, we innovatively propose to store the part of the image most important to the tasks in episodic memory by saliency map extraction and memory encoding. When learning new tasks, previous data from memory are inpainted by an adaptive data generation module, which is inspired by how humans complete episodic memory. The module's parameters are shared across all tasks and it can be jointly trained with a continual learning classifier as bilevel optimization. Extensive experiments on several continual learning and image classification benchmarks demonstrate the proposed method's effectiveness and efficiency.
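A rough sketch of the storage step described above, assuming a saliency map in [0, 1] is already available for each image (the top-k thresholding and mask format are illustrative choices, not the paper's exact memory encoding):

```python
import torch

def encode_for_memory(image: torch.Tensor, saliency: torch.Tensor, keep_ratio: float = 0.25):
    """Keep only the most salient pixels plus the binary mask, so a learned inpainting
    module can later complete the discarded regions when the sample is replayed."""
    flat = saliency.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.topk(k).values.min()
    mask = (saliency >= threshold).to(image.dtype)       # (H, W)
    return image * mask.unsqueeze(0), mask               # masked image (C, H, W) and mask
```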
Class-incremental learning (CIL) struggles with catastrophic forgetting when learning new knowledge, and data-free CIL (DFCIL) is even more challenging without access to the training data of previously learned classes. Although recent DFCIL works introduce techniques such as model inversion to synthesize data for previous classes, they fail to overcome forgetting due to the severe domain gap between the synthetic and real data. To address this issue, this paper proposes relation-guided representation learning (RRL) for DFCIL, dubbed R-DFCIL. In RRL, we introduce relational knowledge distillation to flexibly transfer the structural relations of new data from the old model to the current model. Our RRL-boosted DFCIL can guide the current model to learn representations of new classes that are better compatible with the representations of previous classes, which greatly reduces forgetting while improving plasticity. To avoid mutual interference between representation learning and classifier learning, we employ a local rather than a global classification loss during RRL. After RRL, the classification head is refined with a global class-balanced classification loss to address the data imbalance issue and learn the decision boundaries between new and previous classes. Extensive experiments on CIFAR100, Tiny-ImageNet200, and ImageNet100 demonstrate that our R-DFCIL significantly surpasses previous approaches and achieves new state-of-the-art performance for DFCIL. Code is available at https://github.com/jianzhangcs/r-dfcil.
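A compact sketch of what a relational (pairwise-distance) distillation term of this kind can look like; the normalization and smooth-L1 penalty are assumptions rather than the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def pairwise_distances(embeddings: torch.Tensor) -> torch.Tensor:
    """Pairwise Euclidean distances within a batch, normalized by their mean."""
    d = torch.cdist(embeddings, embeddings, p=2)
    off_diag = ~torch.eye(d.size(0), dtype=torch.bool, device=d.device)
    return d / (d[off_diag].mean() + 1e-8)

def relational_kd_loss(new_feats: torch.Tensor, old_feats: torch.Tensor) -> torch.Tensor:
    """Transfer the structural relations of new data from the frozen old model."""
    with torch.no_grad():
        target = pairwise_distances(old_feats)
    return F.smooth_l1_loss(pairwise_distances(new_feats), target)
```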
This paper introduces a solid baseline for the class-incremental semantic segmentation (CISS) problem. While recent CISS algorithms utilize variants of knowledge distillation (KD) techniques to tackle the problem, they fail to fully address the key challenges in CISS that cause catastrophic forgetting: the semantic drift of the background class and the multi-label prediction issue. To better address these challenges, we propose a new method, dubbed SSUL-M (Semantic Segmentation with Unknown Label with Memory), by carefully combining techniques tailored for semantic segmentation. Specifically, we claim three main contributions: (1) defining unknown classes within the background class to help learning future classes (helping plasticity), (2) freezing the backbone network and past classifiers with binary cross-entropy loss and pseudo-labeling to overcome catastrophic forgetting (helping stability), and (3) utilizing a tiny exemplar memory, for the first time in CISS, to improve both plasticity and stability. Extensively conducted experiments show the effectiveness of our method, achieving significantly better performance than recent state-of-the-art baselines on standard benchmark datasets. Furthermore, we justify our contributions with thorough ablation analyses and discuss the different nature of the CISS problem compared to conventional class-incremental learning for classification. The official code is available at https://github.com/clovaai/ssul.
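A simplified sketch of the pseudo-labeled binary cross-entropy idea for CISS: the new ground truth supervises the new classes, while confident predictions of the frozen previous model pseudo-label the old classes (the thresholding and layout are assumptions, not the full SSUL-M objective):

```python
import torch
import torch.nn.functional as F

def ciss_bce_loss(new_logits, gt_new, old_model_probs, new_class_ids, threshold=0.7):
    """Per-pixel multi-label BCE combining ground truth for new classes with
    pseudo-labels from the previous model's confident predictions for old classes."""
    targets = torch.zeros_like(new_logits)               # (B, C, H, W)
    old_c = old_model_probs.size(1)
    targets[:, :old_c] = (old_model_probs > threshold).float()
    for cls in new_class_ids:                            # one-hot the new-class ground truth
        targets[:, cls] = (gt_new == cls).float()
    return F.binary_cross_entropy_with_logits(new_logits, targets)
```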
The ability to learn new concepts continually is necessary in this ever-changing world. However, deep neural networks suffer from catastrophic forgetting when learning new categories. Many works have been proposed to alleviate this phenomenon, but most of them either fall into the stability-plasticity dilemma or incur excessive computational or storage overhead. Inspired by the gradient boosting algorithm, which gradually fits the residuals between the target model and the previous ensemble model, we propose a novel two-stage learning paradigm, FOSTER, that empowers the model to learn new categories adaptively. Specifically, we first dynamically expand new modules to fit the residuals between the target and the output of the original model. Next, we remove redundant parameters and feature dimensions through an effective distillation strategy to maintain a single backbone model. We validate our method, FOSTER, on CIFAR-100 and ImageNet-100/1000 under different settings. Experimental results show that our method achieves state-of-the-art performance. Code is available at: https://github.com/g-u-n/eccv22-foster.
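A bare-bones sketch of the two stages described above, assuming both branches emit logits over the full class set (the loss choices and the simple additive ensemble are assumptions):

```python
import torch
import torch.nn.functional as F

def boosting_step(old_model, new_module, x, y):
    """Stage 1: the frozen old model provides its logits; the expanded module is
    trained so the ensemble fits what the old model misses (the residual)."""
    with torch.no_grad():
        old_logits = old_model(x)
    return F.cross_entropy(old_logits + new_module(x), y)

def compression_step(student, old_model, new_module, x, temperature=2.0):
    """Stage 2: distill the two-branch ensemble back into a single backbone."""
    with torch.no_grad():
        teacher = old_model(x) + new_module(x)
    t = temperature
    return F.kl_div(F.log_softmax(student(x) / t, dim=1),
                    F.softmax(teacher / t, dim=1), reduction="batchmean") * t * t
```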
Neural networks are prone to catastrophic forgetting when trained incrementally on different tasks. Popular incremental learning methods mitigate such forgetting by retaining a subset of previously seen samples and replaying them during the training on subsequent tasks. However, this is not always possible, e.g., due to data protection regulations. In such restricted scenarios, one can employ generative models to replay either artificial images or hidden features to a classifier. In this work, we propose Genifer (GENeratIve FEature-driven image Replay), where a generative model is trained to replay images that must induce the same hidden features as real samples when they are passed through the classifier. Our technique therefore incorporates the benefits of both image and feature replay, i.e.: (1) unlike conventional image replay, our generative model explicitly learns the distribution of features that are relevant for classification; (2) in contrast to feature replay, our entire classifier remains trainable; and (3) we can leverage image-space augmentations, which increase distillation performance while also mitigating overfitting during the training of the generative model. We show that Genifer substantially outperforms the previous state of the art for various settings on the CIFAR-100 and CUB-200 datasets.
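A minimal sketch of a feature-driven objective for the generator, assuming access to the classifier's feature extractor and to stored target features for previous samples (names are hypothetical):

```python
import torch
import torch.nn.functional as F

def feature_driven_replay_loss(generator, classifier_features, z, target_feats):
    """Train the generator so that its images, passed through the classifier's
    feature extractor, induce the same hidden features as the real samples did."""
    fake_images = generator(z)
    fake_feats = classifier_features(fake_images)
    return F.mse_loss(fake_feats, target_feats.detach())
```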
Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning -- a setting where not all the data samples are labeled. An underlying issue in this scenario is the model forgetting representations of unlabeled data and overfitting the labeled ones. We leverage the power of nearest-neighbor classifiers to non-linearly partition the feature space and learn a strong representation for the current task, as well as distill relevant information from previous tasks. We perform a thorough experimental evaluation and show that our method outperforms all the existing approaches by large margins, setting a strong state of the art on the continual semi-supervised learning paradigm. For example, on CIFAR100 we surpass several others even when using at least 30 times less supervision (0.8% vs. 25% of annotations).
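A compact sketch of the nearest-neighbor classification step on top of the learned embedding (the stored-feature format and the Euclidean metric are assumptions):

```python
import torch

def nearest_neighbor_classify(query_feats: torch.Tensor, support_feats: torch.Tensor,
                              support_labels: torch.Tensor) -> torch.Tensor:
    """Assign each query the label of its nearest stored feature vector."""
    dists = torch.cdist(query_feats, support_feats)       # (num_queries, num_support)
    return support_labels[dists.argmin(dim=1)]
```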
Regularization-based methods are beneficial for alleviating the catastrophic forgetting problem in class-incremental learning. Due to the lack of old-task images, they often assume that old knowledge is well preserved as long as the classifier produces similar outputs on new images. In this paper, we find that their effectiveness largely depends on the nature of the old classes: they work well on classes that are easily distinguishable from each other, but may fail on more fine-grained ones, e.g., boy and girl. In spirit, such methods project new data onto the feature space spanned by the weight vectors of the old classes in the fully connected layer. The resulting projections are similar across fine-grained old classes, and as a consequence, the new classifier gradually loses its discriminative ability on these classes. To address this issue, we propose a memory-free generative replay strategy that preserves fine-grained old-class characteristics by generating representative old images directly from the old classifier and combining them with the new data to train the new classifier. To solve the homogenization problem of the generated samples, we also propose a diversity loss that maximizes the Kullback-Leibler (KL) divergence between the generated samples. Our method is best complemented by prior regularization-based methods, which are proven effective for easily distinguishable old classes. We validate the above designs and insights on CUB-200-2011, Caltech-101, CIFAR-100, and Tiny-ImageNet, and show that our strategy outperforms existing memory-free methods by a clear margin. Code is available at https://github.com/xmengxin/mfgr
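A hedged sketch of a diversity term of this kind: the average pairwise KL divergence between the predicted distributions of the generated samples is maximized by minimizing its negative (an illustrative formulation, not the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def diversity_loss(logits: torch.Tensor) -> torch.Tensor:
    """Encourage heterogeneous generations by maximizing the mean pairwise KL
    divergence between their class distributions."""
    log_p = F.log_softmax(logits, dim=1)                                        # (N, C)
    p = log_p.exp()
    kl = (p.unsqueeze(1) * (log_p.unsqueeze(1) - log_p.unsqueeze(0))).sum(-1)   # KL(p_i || p_j)
    n = logits.size(0)
    return -kl.sum() / (n * (n - 1))                                            # negative mean off-diagonal KL
```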
Class-incremental learning for semantic segmentation (CISS) is presently a highly researched field, which aims at updating a semantic segmentation model by sequentially learning new semantic categories. A major challenge in CISS is overcoming the effects of catastrophic forgetting, which describes the sudden drop in accuracy on previously learned classes after the model is trained on a new set of classes. Despite recent progress in mitigating catastrophic forgetting, the underlying causes of forgetting specifically in CISS are not well understood. Therefore, in a set of experiments and representational analyses, we demonstrate that the semantic shift of the background class and a bias towards new classes are the major causes of forgetting in CISS. Furthermore, we show that both causes manifest themselves mostly in the deeper classification layers of the network, while the early layers of the model are not affected. Finally, we demonstrate how both causes can be effectively mitigated by exploiting the information contained in the background, with the help of knowledge distillation and an unbiased cross-entropy loss.
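A sketch in the spirit of the unbiased cross-entropy mentioned above: pixels labeled as background are not penalized for predicting old classes, because the background channel absorbs the old-class probability mass (indices and layout are assumptions):

```python
import torch
import torch.nn.functional as F

def unbiased_cross_entropy(logits, labels, old_class_ids, bg_id=0):
    """Background-aware CE for incremental segmentation steps: the log-probability of
    background is replaced by the log-sum of background and all old classes."""
    log_probs = F.log_softmax(logits, dim=1)                              # (B, C, H, W)
    merged = torch.logsumexp(log_probs[:, [bg_id] + list(old_class_ids)], dim=1, keepdim=True)
    unbiased = log_probs.clone()
    unbiased[:, bg_id:bg_id + 1] = merged
    return F.nll_loss(unbiased, labels)
```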
Continual deep learning is an emerging field in which a lot of progress has been made. However, most approaches are evaluated only on image classification, which is of limited relevance in the field of intelligent vehicles. Only recently have approaches for class-incremental semantic segmentation been proposed, and all of them are based on some form of knowledge distillation. At the moment, replay-based methods, which are commonly used for object recognition in a continual setting, have not been investigated. At the same time, although unsupervised domain adaptation for semantic segmentation has gained a lot of traction, domain-incremental learning in a continual setting has not been sufficiently studied. The goal of our work is therefore to evaluate and adapt established solutions for continual object recognition to the semantic segmentation task, and to provide baseline methods and evaluation protocols for continual semantic segmentation. We first introduce evaluation protocols for class- and domain-incremental segmentation and analyze selected approaches. We show that the nature of the semantic segmentation task changes which methods are most effective at mitigating forgetting compared to image classification. In particular, in class-incremental learning knowledge distillation proves to be a vital tool, whereas in domain-incremental learning replay methods are the most effective.
Although existing semantic segmentation approaches achieve impressive results, they still struggle to update their models incrementally as new categories are discovered. In addition, pixel-by-pixel annotation is expensive and time-consuming. This paper proposes a novel framework for weakly supervised incremental learning of semantic segmentation, which aims at learning to segment new classes from cheap and largely available image-level labels. In contrast to existing approaches that need to generate pseudo-labels offline, we use an auxiliary classifier, trained with image-level labels and regularized by the segmentation model, to obtain pseudo-supervision online and update the model incrementally. We cope with the inherent noise in this process by using the soft labels generated by the auxiliary classifier. We demonstrate the effectiveness of our approach on the Pascal VOC and COCO datasets, outperforming offline weakly supervised methods and obtaining results on par with incremental learning methods trained with full supervision.
General Continual Learning (GCL) aims at learning from non-independent and identically distributed stream data without catastrophic forgetting of old tasks, and without relying on task boundaries during either training or testing. We reveal that relation and feature deviations are crucial problems for catastrophic forgetting, in which relation deviation refers to the deficiency of the relationship among all classes in knowledge distillation, and feature deviation refers to indiscriminative feature representations. To this end, we propose a Complementary Calibration (CoCa) framework by mining the complementary model's outputs and features to alleviate the two deviations in the process of GCL. Specifically, we propose a new collaborative distillation approach to address the relation deviation. It distills the model's outputs by utilizing the ensemble dark knowledge of the new model's outputs and the reserved outputs, which maintains the performance of old tasks as well as balancing the relationship among all classes. Furthermore, we explore a collaborative self-supervision idea that leverages pretext tasks and supervised contrastive learning to address the feature deviation problem by learning complete and discriminative features for all classes. Extensive experiments on four popular datasets show that our CoCa framework achieves superior performance against state-of-the-art methods. Code is available at https://github.com/lijincm/CoCa.
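A hedged sketch of the collaborative distillation idea for buffer samples, where the target is an ensemble of the stored (reserved) outputs and the current model's own softened outputs (the equal-weight ensemble and temperature are assumptions):

```python
import torch
import torch.nn.functional as F

def collaborative_distillation(new_logits, reserved_logits, temperature=2.0):
    """KL distillation toward the ensemble 'dark knowledge' of current and reserved outputs."""
    t = temperature
    with torch.no_grad():
        ensemble = 0.5 * (F.softmax(new_logits.detach() / t, dim=1)
                          + F.softmax(reserved_logits / t, dim=1))
    return F.kl_div(F.log_softmax(new_logits / t, dim=1), ensemble,
                    reduction="batchmean") * t * t
```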
Deep learning models suffer from catastrophic forgetting of the classes from older phases as they are trained on the classes introduced in the new phase of the class-incremental learning setting. In this work, we show that the effect of catastrophic forgetting on the model's prediction varies with the change in orientation of the same image, which is a novel finding. Based on this, we propose a novel data-ensemble approach that combines the predictions for different orientations of an image to help the model retain further information about the previously seen classes and thereby reduce the effect of forgetting on its predictions. However, the data-ensemble approach cannot be used directly if the model is trained with conventional techniques. Therefore, we also propose a novel dual-incremental learning framework that jointly trains the network with two incremental learning objectives, namely the class-incremental learning objective and our proposed data-incremental learning objective. In the dual-incremental learning framework, each image belongs to two classes: the image class (for class-incremental learning) and the orientation class (for data-incremental learning). In class-incremental learning, each new phase introduces a new set of classes, and the model cannot access the complete training data from older phases. In our proposed data-incremental learning, the orientation classes remain the same across all phases, and the data introduced by the new phase of class-incremental learning serves as new training data for these orientation classes. We empirically demonstrate that the dual-incremental learning framework is vital for the data-ensemble approach. We apply our proposed approach to state-of-the-art class-incremental learning methods and empirically show that our framework significantly improves their performance.
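A simplified sketch of the data-ensemble step: predictions are averaged over the four 90-degree rotations of each image (the mapping to the joint image/orientation heads of the full framework is omitted here):

```python
import torch

def orientation_ensemble(model, images: torch.Tensor) -> torch.Tensor:
    """Average class probabilities over rotated copies of the input batch."""
    probs = 0
    for k in range(4):
        rotated = torch.rot90(images, k, dims=(2, 3))   # rotate over the H and W axes
        probs = probs + torch.softmax(model(rotated), dim=1)
    return probs / 4
```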
Deep neural networks suffer from the major limitation of catastrophic forgetting of old tasks when learning new ones. In this paper, we focus on class-incremental continual learning in semantic segmentation, where new categories are made available over time while previous training data are not retained. The proposed continual learning scheme shapes the latent space to reduce forgetting whilst improving the recognition of novel classes. Our framework is driven by three novel components, which we also combine effortlessly on top of existing techniques. First, prototype matching enforces latent-space consistency on old classes, constraining the encoder to produce similar latent representations for previously seen classes in subsequent steps. Second, feature sparsification makes room in the latent space to accommodate novel classes. Finally, contrastive learning clusters features according to their semantics while tearing apart those of different classes. Extensive evaluation on the Pascal VOC2012 and ADE20K datasets demonstrates the effectiveness of our approach, which significantly outperforms state-of-the-art methods.
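A small sketch of the prototype-matching component: current latent representations of previously seen classes are pulled toward the prototypes stored for them (the dictionary format and MSE penalty are assumptions):

```python
import torch
import torch.nn.functional as F

def prototype_matching_loss(features, labels, prototypes):
    """Enforce latent-space consistency on old classes; `prototypes` maps a class id
    to its stored prototype vector."""
    losses = []
    for cls, proto in prototypes.items():
        mask = labels == cls
        if mask.any():
            losses.append(F.mse_loss(features[mask], proto.expand_as(features[mask])))
    return torch.stack(losses).mean() if losses else features.new_zeros(())
```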
This work investigates the entanglement between continual learning (CL) and transfer learning (TL). In particular, we shed light on the widespread application of network pretraining, highlighting that it is itself subject to catastrophic forgetting. Unfortunately, this issue leads to the under-exploitation of knowledge transfer during later tasks. On this ground, we propose Transfer without Forgetting (TwF), a hybrid approach built upon a fixed, pretrained sibling network, which continuously propagates the knowledge inherent in the source domain through a layer-wise loss term. Our experiments indicate that TwF steadily outperforms other CL methods across a variety of settings, with an average gain of 4.81% in class-incremental accuracy over a variety of datasets and different buffer sizes.
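A bare-bones sketch of propagating the fixed sibling's knowledge through a layer-wise term (the per-layer MSE and uniform weighting are assumptions, not the exact TwF loss):

```python
import torch
import torch.nn.functional as F

def sibling_transfer_loss(student_feats, sibling_feats):
    """Match the continual learner's intermediate activations, layer by layer,
    to those of the frozen pretrained sibling network."""
    loss = student_feats[0].new_zeros(())
    for s, t in zip(student_feats, sibling_feats):
        loss = loss + F.mse_loss(s, t.detach())
    return loss / len(student_feats)
```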
Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised approaches. However, these methods are unable to acquire new knowledge incrementally; in fact, they are mostly used only as a pre-training phase on IID data. In this work we investigate self-supervised methods in a continual learning regime without additional memory or replay. To prevent forgetting of previous knowledge, we propose the use of functional regularization. We show that naive functional regularization, also known as feature distillation, leads to low plasticity and therefore severely limits continual learning performance. To address this problem, we propose projected functional regularization, in which a separate projection network ensures that the newly learned feature space preserves the information of the previous feature space while still allowing new features to be learned. This lets us prevent forgetting while maintaining the plasticity of the learner. Evaluation against other incremental learning approaches applied to self-supervision shows that our method obtains competitive performance in different scenarios and on multiple datasets.
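A minimal sketch of the projected regularizer: a small projection head must map the new features back onto the previous model's features, so old information is preserved without freezing the new feature space (the MLP shape and cosine objective are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectedFunctionalRegularizer(nn.Module):
    """Projection head trained jointly with the encoder to predict the frozen
    previous encoder's features from the current ones."""
    def __init__(self, dim: int, hidden: int = 512):
        super().__init__()
        self.projector = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, dim))

    def forward(self, new_feats: torch.Tensor, old_feats: torch.Tensor) -> torch.Tensor:
        pred = F.normalize(self.projector(new_feats), dim=1)
        target = F.normalize(old_feats.detach(), dim=1)
        return -(pred * target).sum(dim=1).mean()        # negative cosine similarity
```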
Continual Learning (CL) is a field dedicated to devising algorithms able to achieve lifelong learning. Overcoming the knowledge disruption of previously acquired concepts, a drawback affecting deep learning models that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the modeled data do not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drops very quickly. Overcoming this limitation is fundamental, as it would allow us to build truly intelligent systems showing stability and plasticity. Secondly, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with the newly updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use a memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor than the quality of the data. Secondly, we present one of the early works on incremental learning with ViT architectures, comparing functional, weight and attention regularization approaches, and propose a novel, effective asymmetric loss. Finally, we present a study on pretraining and how it affects performance in Continual Learning, raising some questions about the effective progression of the field, followed by future directions and closing remarks.
Lifelong learning has attracted much attention, but existing works still struggle to fight catastrophic forgetting and accumulate knowledge over long stretches of incremental learning. In this work, we propose PODNet, a model inspired by representation learning. By carefully balancing the compromise between remembering the old classes and learning new ones, PODNet fights catastrophic forgetting, even over very long runs of small incremental tasks, a setting so far unexplored by current works. PODNet innovates on existing art with an efficient spatial-based distillation loss applied throughout the model and a representation comprising multiple proxy vectors for each class. We validate those innovations thoroughly, comparing PODNet with three state-of-the-art models on three datasets: CIFAR100, ImageNet100, and ImageNet1000. Our results showcase a significant advantage of PODNet over existing art, with accuracy gains of 12.10, 6.51, and 2.85 percentage points, respectively.
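A hedged sketch of a spatial-based distillation term in the spirit of PODNet: activation maps of the current and previous model are pooled along the width and along the height at every stage, and the pooled statistics are matched (the squaring and normalization details are assumptions):

```python
import torch
import torch.nn.functional as F

def spatial_distillation_loss(feats_new, feats_old):
    """Match width-pooled and height-pooled activation statistics across stages."""
    loss = feats_new[0].new_zeros(())
    for a, b in zip(feats_new, feats_old):                # each tensor is (B, C, H, W)
        a, b = a.pow(2), b.pow(2).detach()
        pa = torch.cat([a.sum(dim=3).flatten(1), a.sum(dim=2).flatten(1)], dim=1)
        pb = torch.cat([b.sum(dim=3).flatten(1), b.sum(dim=2).flatten(1)], dim=1)
        loss = loss + (F.normalize(pa, dim=1) - F.normalize(pb, dim=1)).norm(dim=1).mean()
    return loss / len(feats_new)
```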
In class-incremental learning, the model is expected to learn new classes continually while maintaining knowledge of previous classes. The challenge here lies in preserving the model's ability to effectively represent prior classes in the feature space, while adapting it to represent incoming new classes. We propose two distillation-based objectives for class-incremental learning that leverage the structure of the feature space to maintain accuracy on previous classes and to enable learning the new classes. In our first objective, termed cross-space clustering (CSC), we propose to use the feature-space structure of the previous model to characterize directions of optimization that maximally preserve a class: directions that all instances of a specific class should collectively move towards, and those they should collectively move away from. Apart from minimizing forgetting, this indirectly encourages the model to cluster all instances of a class in the current feature space and induces a sense of herd immunity, allowing all samples of a class to jointly fight the model's forgetting of that class. Our second objective, termed controlled transfer (CT), tackles incremental learning from the under-studied perspective of inter-class transfer. CT explicitly approximates and conditions the current model on the semantic similarities between incrementally arriving classes and prior classes. This allows the model to learn classes in such a way that it maximizes positive forward transfer from similar prior classes, thus increasing plasticity, and minimizes negative backward transfer on dissimilar prior classes, thereby strengthening stability. We perform extensive experiments on two benchmark datasets, adding our method (CSCCT) on top of three prominent class-incremental learning methods, and observe consistent improvements in performance across a variety of experimental settings.
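A hedged sketch of the cross-space clustering idea: each sample's current feature is attracted to its class centroid computed in the previous model's feature space and repelled from the other centroids (the cosine-margin form is an assumption, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def cross_space_clustering_loss(feats, labels, old_class_means, margin: float = 0.5):
    """`old_class_means` is a (K, D) tensor of class centroids from the previous model's
    feature space; features move toward their own centroid and away from the others."""
    feats = F.normalize(feats, dim=1)
    means = F.normalize(old_class_means, dim=1)
    sims = feats @ means.t()                              # (N, K) cosine similarities
    own = sims.gather(1, labels.view(-1, 1)).squeeze(1)
    mask = torch.ones_like(sims).scatter_(1, labels.view(-1, 1), 0.0)
    others = (sims * mask).sum(dim=1) / mask.sum(dim=1)
    return F.relu(margin + others - own).mean()
```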