Lifelong learning has attracted much attention, but existing works still struggle to fight catastrophic forgetting and accumulate knowledge over long stretches of incremental learning. In this work, we propose PODNet, a model inspired by representation learning. By carefully balancing the compromise between remembering the old classes and learning new ones, PODNet fights catastrophic forgetting, even over very long runs of small incremental tasks -a setting so far unexplored by current works. PODNet innovates on existing art with an efficient spatialbased distillation-loss applied throughout the model and a representation comprising multiple proxy vectors for each class. We validate those innovations thoroughly, comparing PODNet with three state-of-the-art models on three datasets: CIFAR100, ImageNet100, and ImageNet1000. Our results showcase a significant advantage of PODNet over existing art, with accuracy gains of 12.10, 6.51, and 2.85 percentage points, respectively. 5
translated by 谷歌翻译
在学习新知识时,班级学习学习(CIL)与灾难性遗忘和无数据CIL(DFCIL)的斗争更具挑战性,而无需访问以前学过的课程的培训数据。尽管最近的DFCIL作品介绍了诸如模型反转以合成以前类的数据,但由于合成数据和真实数据之间的严重域间隙,它们无法克服遗忘。为了解决这个问题,本文提出了有关DFCIL的关系引导的代表学习(RRL),称为R-DFCIL。在RRL中,我们引入了关系知识蒸馏,以灵活地将新数据的结构关系从旧模型转移到当前模型。我们的RRL增强DFCIL可以指导当前的模型来学习与以前类的表示更好地兼容的新课程的表示,从而大大减少了在改善可塑性的同时遗忘。为了避免表示和分类器学习之间的相互干扰,我们在RRL期间采用本地分类损失而不是全球分类损失。在RRL之后,分类头将通过全球类平衡的分类损失进行完善,以解决数据不平衡问题,并学习新课程和以前类之间的决策界限。关于CIFAR100,Tiny-Imagenet200和Imagenet100的广泛实验表明,我们的R-DFCIL显着超过了以前的方法,并实现了DFCIL的新最新性能。代码可从https://github.com/jianzhangcs/r-dfcil获得。
translated by 谷歌翻译
Conventionally, deep neural networks are trained offline, relying on a large dataset prepared in advance. This paradigm is often challenged in real-world applications, e.g. online services that involve continuous streams of incoming data. Recently, incremental learning receives increasing attention, and is considered as a promising solution to the practical challenges mentioned above. However, it has been observed that incremental learning is subject to a fundamental difficulty -catastrophic forgetting, namely adapting a model to new data often results in severe performance degradation on previous tasks or classes. Our study reveals that the imbalance between previous and new data is a crucial cause to this problem. In this work, we develop a new framework for incrementally learning a unified classifier, i.e. a classifier that treats both old and new classes uniformly. Specifically, we incorporate three components, cosine normalization, less-forget constraint, and inter-class separation, to mitigate the adverse effects of the imbalance. Experiments show that the proposed method can effectively rebalance the training process, thus obtaining superior performance compared to the existing methods. On CIFAR-100 and ImageNet, our method can reduce the classification errors by more than 6% and 13% respectively, under the incremental setting of 10 phases.
translated by 谷歌翻译
深度学习模型在逐步学习新任务时遭受灾难性遗忘。已经提出了增量学习,以保留旧课程的知识,同时学习识别新课程。一种典型的方法是使用一些示例来避免忘记旧知识。在这种情况下,旧类和新课之间的数据失衡是导致模型性能下降的关键问题。由于数据不平衡,已经设计了几种策略来纠正新类别的偏见。但是,他们在很大程度上依赖于新旧阶层之间偏见关系的假设。因此,它们不适合复杂的现实世界应用。在这项研究中,我们提出了一种假设不足的方法,即多粒性重新平衡(MGRB),以解决此问题。重新平衡方法用于减轻数据不平衡的影响;但是,我们从经验上发现,他们将拟合新的课程。为此,我们进一步设计了一个新颖的多晶正式化项,该项使模型还可以考虑除了重新平衡数据之外的类别的相关性。类层次结构首先是通过将语义或视觉上类似类分组来构建的。然后,多粒性正则化将单热标签向量转换为连续的标签分布,这反映了基于构造的类层次结构的目标类别和其他类之间的关系。因此,该模型可以学习类间的关系信息,这有助于增强新旧课程的学习。公共数据集和现实世界中的故障诊断数据集的实验结果验证了所提出的方法的有效性。
translated by 谷歌翻译
The dynamic expansion architecture is becoming popular in class incremental learning, mainly due to its advantages in alleviating catastrophic forgetting. However, task confusion is not well assessed within this framework, e.g., the discrepancy between classes of different tasks is not well learned (i.e., inter-task confusion, ITC), and certain priority is still given to the latest class batch (i.e., old-new confusion, ONC). We empirically validate the side effects of the two types of confusion. Meanwhile, a novel solution called Task Correlated Incremental Learning (TCIL) is proposed to encourage discriminative and fair feature utilization across tasks. TCIL performs a multi-level knowledge distillation to propagate knowledge learned from old tasks to the new one. It establishes information flow paths at both feature and logit levels, enabling the learning to be aware of old classes. Besides, attention mechanism and classifier re-scoring are applied to generate more fair classification scores. We conduct extensive experiments on CIFAR100 and ImageNet100 datasets. The results demonstrate that TCIL consistently achieves state-of-the-art accuracy. It mitigates both ITC and ONC, while showing advantages in battle with catastrophic forgetting even no rehearsal memory is reserved.
translated by 谷歌翻译
在课堂学习学习中,预计该模型将在保持以前课程的知识的同时,不断地学习新课程。这里的挑战在于保留该模型在功能空间中有效代表先前类的能力,同时调整其代表传入的新类。我们提出了两个基于蒸馏的目标,用于类增量学习,以利用特征空间的结构来维持以前的课程的准确性,并使学习新课程。在我们的第一个目标(称为跨空间聚类(CSC))中,我们建议使用先前模型的特征空间结构来表征优化的方向,这些方向可以最大程度地保留类 - 特定类的所有实例应集体优化,对,以及他们应该集体优化的人。除了最大程度地减少忘记之外,这种间接的鼓励模型将所有类的实例聚集在当前功能空间中,并引起牛群免疫的感觉,从而使班级的所有样本都可以将模型共同与遗忘班级共同打击模型。我们的第二个目标被称为受控转移(CT)从研究班间转移的研究的逐步学习。 CT明确近似于和条件,当前模型在逐步到达类和先验类之间的语义相似性上。这使模型可以学习类,以使其从相似的先前类中最大化正向转移,从而提高可塑性,并最大程度地减少不同先验类别的负向后转移,从而增强稳定性。我们在两个基准数据集上执行了广泛的实验,并在三种突出的课堂学习方法的顶部添加了我们的方法(CSCCT)。我们观察到各种实验环境的性能一致。
translated by 谷歌翻译
当在新的类或新任务上逐步训练时,深度神经网络易于灾难性遗忘,因为对新数据的适应导致旧课程和任务的性能急剧下降。通过使用小记忆进行排练和知识蒸馏,已证明最近的方法可有效缓解灾难性的遗忘。然而,由于内存的尺寸有限,旧的和新类可用的数据量之间的大不平衡仍然存在,这导致模型的整体精度恶化。为了解决这个问题,我们建议使用平衡的软制跨熵损失,并表明它可以与进出的方法相结合,以便在某些情况下降低培训过程的计算成本,以提高其性能。对竞争的想象,Subimagenet和CiFar100数据集的实验显示了最艺术态度的结果。
translated by 谷歌翻译
深网络架构在不忘记以前的任务的情况下努力继续学习新任务。最近的趋势表明,基于参数扩展的动态架构可以在持续学习中有效地减少灾难性忘记。但是,现有方法通常需要在测试时需要任务标识符,需要复杂调整以平衡越来越多的参数,并且几乎不在任务中共享任何信息。结果,他们努力扩展到大量任务,而无需显着开销。在本文中,我们提出了一种基于专用编码器/解码器框架的变压器体系结构。批判性地,编码器和解码器在所有任务中共享。通过特殊令牌的动态扩展,我们专注于任务分发的解码器网络的各个向前。由于严格控制参数扩展,我们的策略缩小到大量任务,同时具有可忽略的内存和时间开销。此外,这种有效的策略不需要任何HyperParameter调整来控制网络的扩展。我们的模型在大型ImageNet100和ImageNet100上达到了Cifar100和最先进的表演,而参数比并发动态框架的参数越小。
translated by 谷歌翻译
在课堂增量学习(CIL)设置中,在每个学习阶段将类别组引入模型。目的是学习到目前为止观察到的所有类别的统一模型表现。鉴于视觉变压器(VIT)在常规分类设置中的最新流行,一个有趣的问题是研究其持续学习行为。在这项工作中,我们为CIL开发了一个伪造的双蒸馏变压器,称为$ \ textrm {d}^3 \ textrm {前} $。提出的模型利用混合嵌套的VIT设计,以确保数据效率和可扩展性对小数据集和大数据集。与最近的基于VIT的CIL方法相反,我们的$ \ textrm {d}^3 \ textrm {前} $在学习新任务并仍然适用于大量增量任务时不会动态扩展其体系结构。 $ \ textrm {d}^3 \ textrm {oft} $的CIL行为的改善归功于VIT设计的两个基本变化。首先,我们将增量学习视为一个长尾分类问题,其中大多数新课程的大多数样本都超过了可用于旧课程的有限范例。为了避免对少数族裔的偏见,我们建议动态调整逻辑,以强调保留与旧任务相关的表示形式。其次,我们建议在学习跨任务进行时保留空间注意图的配置。这有助于减少灾难性遗忘,通过限制模型以将注意力保留到最歧视区域上。 $ \ textrm {d}^3 \ textrm {以前} $在CIFAR-100,MNIST,SVHN和Imagenet数据集的增量版本上获得了有利的结果。
translated by 谷歌翻译
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model-a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and Im-ageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.
translated by 谷歌翻译
Continual Learning (CL) is a field dedicated to devise algorithms able to achieve lifelong learning. Overcoming the knowledge disruption of previously acquired concepts, a drawback affecting deep learning models and that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the data modeled does not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drop very quickly. Overcoming this limitation is fundamental as it would allow us to build truly intelligent systems showing stability and plasticity. Secondly, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with the new updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor over the quality of the data. Secondly, we propose one of the early works of incremental learning on ViTs architectures, comparing functional, weight and attention regularization approaches and propose effective novel a novel asymmetric loss. At the end we conclude with a study on pretraining and how it affects the performance in Continual Learning, raising some questions about the effective progression of the field. We then conclude with some future directions and closing remarks.
translated by 谷歌翻译
持续学习旨在快速,不断地从一系列任务中学习当前的任务。与其他类型的方法相比,基于经验重播的方法表现出了极大的优势来克服灾难性的遗忘。该方法的一个常见局限性是上一个任务和当前任务之间的数据不平衡,这将进一步加剧遗忘。此外,如何在这种情况下有效解决稳定性困境也是一个紧迫的问题。在本文中,我们通过提出一个通过多尺度知识蒸馏和数据扩展(MMKDDA)提出一个名为Meta学习更新的新框架来克服这些挑战。具体而言,我们应用多尺度知识蒸馏来掌握不同特征级别的远程和短期空间关系的演变,以减轻数据不平衡问题。此外,我们的方法在在线持续训练程序中混合了来自情节记忆和当前任务的样品,从而减轻了由于概率分布的变化而减轻了侧面影响。此外,我们通过元学习更新来优化我们的模型,该更新诉诸于前面所看到的任务数量,这有助于保持稳定性和可塑性之间的更好平衡。最后,我们对四个基准数据集的实验评估显示了提出的MMKDDA框架对其他流行基线的有效性,并且还进行了消融研究,以进一步分析每个组件在我们的框架中的作用。
translated by 谷歌翻译
Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning -- a setting where not all the data samples are labeled. An underlying issue in this scenario is the model forgetting representations of unlabeled data and overfitting the labeled ones. We leverage the power of nearest-neighbor classifiers to non-linearly partition the feature space and learn a strong representation for the current task, as well as distill relevant information from previous tasks. We perform a thorough experimental evaluation and show that our method outperforms all the existing approaches by large margins, setting a strong state of the art on the continual semi-supervised learning paradigm. For example, on CIFAR100 we surpass several others even when using at least 30 times less supervision (0.8% vs. 25% of annotations).
translated by 谷歌翻译
当自我监督的模型已经显示出比在规模上未标记的数据训练的情况下的监督对方的可比视觉表现。然而,它们的功效在持续的学习(CL)场景中灾难性地减少,其中数据被顺序地向模型呈现给模型。在本文中,我们表明,通过添加将表示的当前状态映射到其过去状态,可以通过添加预测的网络来无缝地转换为CL的蒸馏机制。这使我们能够制定一个持续自我监督的视觉表示的框架,学习(i)显着提高了学习象征的质量,(ii)与若干最先进的自我监督目标兼容(III)几乎没有近似参数调整。我们通过在各种CL设置中培训六种受欢迎的自我监督模型来证明我们的方法的有效性。
translated by 谷歌翻译
A major open problem on the road to artificial intelligence is the development of incrementally learning systems that learn about more and more concepts over time from a stream of data. In this work, we introduce a new training strategy, iCaRL, that allows learning in such a classincremental way: only the training data for a small number of classes has to be present at the same time and new classes can be added progressively.iCaRL learns strong classifiers and a data representation simultaneously. This distinguishes it from earlier works that were fundamentally limited to fixed data representations and therefore incompatible with deep learning architectures. We show by experiments on CIFAR-100 and ImageNet ILSVRC 2012 data that iCaRL can learn many classes incrementally over a long period of time where other strategies quickly fail.
translated by 谷歌翻译
视觉世界中新对象的不断出现对现实世界部署中当前的深度学习方法构成了巨大的挑战。由于稀有性或成本,新任务学习的挑战通常会加剧新类别的数据。在这里,我们探讨了几乎没有类别学习的重要任务(FSCIL)及其极端数据稀缺条件。理想的FSCIL模型都需要在所有类别上表现良好,无论其显示顺序或数据的匮乏。开放式现实世界条件也需要健壮,并可以轻松地适应始终在现场出现的新任务。在本文中,我们首先重新评估当前的任务设置,并为FSCIL任务提出更全面和实用的设置。然后,受到FSCIL和现代面部识别系统目标的相似性的启发,我们提出了我们的方法 - 增强角损失渐进分类或爱丽丝。在爱丽丝(Alice)中,我们建议使用角度损失损失来获得良好的特征。由于所获得的功能不仅需要紧凑,而且还需要足够多样化以维持未来的增量类别的概括,我们进一步讨论了类增强,数据增强和数据平衡如何影响分类性能。在包括CIFAR100,Miniimagenet和Cub200在内的基准数据集上的实验证明了爱丽丝在最新的FSCIL方法上的性能提高。
translated by 谷歌翻译
课堂学习学习需要可塑性和稳定性,以便在保留过去的知识的同时从新数据中学习。由于灾难性的遗忘,当没有内存缓冲区可用时,在这两个属性之间找到妥协尤其具有挑战性。主流方法需要存储两个深层模型,因为它们使用微调与以前的增量状态的知识蒸馏一起整合了新类。我们提出了一种具有相似数量参数但分布不同的方法,以便在可塑性和稳定性之间找到更好的平衡。遵循已经通过基于转移的增量方法部署的方法,我们在初始状态后冻结了功能提取器。最古老的增量状态的类对这种冷冻提取器进行训练,以确保稳定性。使用部分微调模型预测最近的类别以引入可塑性。我们提出的可塑性层可以纳入任何用于无内存增量学习的基于转移的方法,并将其应用于两种此类方法。评估是通过三个大型数据集进行的。结果表明,与现有方法相比,所有测试的配置中均获得了性能提高。
translated by 谷歌翻译
Many modern computer vision algorithms suffer from two major bottlenecks: scarcity of data and learning new tasks incrementally. While training the model with new batches of data the model looses it's ability to classify the previous data judiciously which is termed as catastrophic forgetting. Conventional methods have tried to mitigate catastrophic forgetting of the previously learned data while the training at the current session has been compromised. The state-of-the-art generative replay based approaches use complicated structures such as generative adversarial network (GAN) to deal with catastrophic forgetting. Additionally, training a GAN with few samples may lead to instability. In this work, we present a novel method to deal with these two major hurdles. Our method identifies a better embedding space with an improved contrasting loss to make classification more robust. Moreover, our approach is able to retain previously acquired knowledge in the embedding space even when trained with new classes. We update previous session class prototypes while training in such a way that it is able to represent the true class mean. This is of prime importance as our classification rule is based on the nearest class mean classification strategy. We have demonstrated our results by showing that the embedding space remains intact after training the model with new classes. We showed that our method preformed better than the existing state-of-the-art algorithms in terms of accuracy across different sessions.
translated by 谷歌翻译
持续学习旨在通过以在线学习方式利用过去获得的知识,同时能够在所有以前的任务上表现良好,从而学习一系列任务,这对人工智能(AI)系统至关重要,因此持续学习与传统学习模式相比,更适合大多数现实和复杂的应用方案。但是,当前的模型通常在每个任务上的类标签上学习一个通用表示基础,并选择有效的策略来避免灾难性的遗忘。我们假设,仅从获得的知识中选择相关且有用的零件比利用整个知识更有效。基于这一事实,在本文中,我们提出了一个新框架,名为“选择相关的在线持续学习知识(SRKOCL),该框架结合了一种额外的有效频道注意机制,以选择每个任务的特定相关知识。我们的模型还结合了经验重播和知识蒸馏,以避免灾难性的遗忘。最后,在不同的基准上进行了广泛的实验,竞争性实验结果表明,我们提出的SRKOCL是针对最先进的承诺方法。
translated by 谷歌翻译
很少有课堂学习(FSCIL)旨在仅用几个样本不断学习新概念,这很容易遭受灾难性的遗忘和过度拟合的问题。旧阶级的无法获得性和新颖样本的稀缺性使实现保留旧知识和学习新颖概念之间的权衡很大。受到不同模型的启发,我们在学习新颖概念时记住了不同的知识,我们提出了一个记忆的补充网络(MCNET),以整合多个模型,以在新任务中相互补充不同的记忆知识。此外,为了用很少的新样本更新模型,我们开发了一个原型平滑的硬矿三元组(PSHT)损失,以将新型样品不仅在当前任务中彼此远离,而且在旧分布中脱颖而出。在三个基准数据集(例如CIFAR100,Miniimagenet和Cub200)上进行了广泛的实验,证明了我们提出的方法的优势。
translated by 谷歌翻译