Exemplar-free class-incremental learning requires a classification model to incrementally learn new class knowledge without retaining any old samples. Recently, the framework based on parallel one-class classifiers (POC), which trains a one-class classifier (OCC) independently for each category, has attracted extensive attention, since it can naturally avoid catastrophic forgetting. However, due to the independent training strategy of its different OCCs, POC suffers from weak discriminability and comparability. To meet this challenge, we propose a new framework, named Discriminative and Comparable One-class classifiers for Incremental Learning (DisCOIL). DisCOIL follows the basic principle of POC, but it adopts variational autoencoders (VAEs) instead of other well-established one-class classifiers (e.g., deep SVDD), because a trained VAE can not only identify the probability of an input sample belonging to a class but also generate pseudo samples of the class to assist in learning new tasks. With this advantage, DisCOIL trains a new-class VAE in contrast with the old-class VAEs, which forces the new-class VAE to reconstruct new-class samples better but old-class pseudo samples worse, thus enhancing comparability. Furthermore, DisCOIL introduces a hinge reconstruction loss to ensure discriminability. We extensively evaluate our method on MNIST, CIFAR10, and Tiny-ImageNet. The experimental results show that DisCOIL achieves state-of-the-art performance.
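A minimal PyTorch sketch of such a hinge-style reconstruction objective is given below; the function name, the margin value, and the per-sample error computation are illustrative assumptions, not the authors' actual formulation:

```python
import torch
import torch.nn.functional as F

def hinge_reconstruction_loss(vae, new_x, old_pseudo_x, margin=1.0):
    """Hedged sketch: push the new-class VAE to reconstruct new-class
    samples at least `margin` better than old-class pseudo samples.
    Assumes vae(x) returns the reconstruction of x."""
    rec_new = F.mse_loss(vae(new_x), new_x, reduction="none").flatten(1).mean(1)
    rec_old = F.mse_loss(vae(old_pseudo_x), old_pseudo_x, reduction="none").flatten(1).mean(1)
    # Hinge: penalize whenever the new-class error is not smaller than
    # the old-pseudo error by at least the margin.
    return torch.clamp(rec_new.mean() - rec_old.mean() + margin, min=0.0)
```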
General Continual Learning (GCL) aims at learning from non-independent and identically distributed stream data without catastrophic forgetting of old tasks, and without relying on task boundaries during either the training or the testing stage. We reveal that relation and feature deviations are crucial problems behind catastrophic forgetting, where relation deviation refers to the deficiency of the relationship among all classes in knowledge distillation, and feature deviation refers to indiscriminative feature representations. To this end, we propose a Complementary Calibration (CoCa) framework that mines the complementary outputs and features of the model to alleviate the two deviations in the process of GCL. Specifically, we propose a new collaborative distillation approach for addressing the relation deviation. It distills the model's outputs by utilizing the ensemble dark knowledge of the new model's outputs and the reserved outputs, which maintains the performance on old tasks as well as balancing the relationship among all classes. Furthermore, we explore a collaborative self-supervision idea that leverages pretext tasks and supervised contrastive learning to address the feature deviation problem by learning complete and discriminative features for all classes. Extensive experiments on four popular datasets show that our CoCa framework achieves superior performance against state-of-the-art methods. Code is available at https://github.com/lijincm/CoCa.
Regularization-based methods are beneficial for alleviating the catastrophic forgetting problem in class-incremental learning. With no old-task images available, they often assume that old knowledge is well preserved if the classifier produces similar outputs on new images. In this paper, we find that their effectiveness largely depends on the nature of the old classes: they work well on classes that are easily distinguishable from one another but may fail on more fine-grained ones, e.g., boys and girls. In spirit, such methods project new data onto the feature space spanned by the weight vectors in the fully-connected layer that correspond to the old classes. The resulting projections are similar across fine-grained old classes, and as a consequence, the new classifier gradually loses its discriminative ability on those classes. To address this issue, we propose a memory-free generative replay strategy that preserves fine-grained old-class characteristics by generating representative old images directly from the old classifier and combining them with new data for the training of the new classifier. To solve the homogenization problem of the generated samples, we also propose a diversity loss that maximizes the Kullback-Leibler (KL) divergence between generated samples. Our method is best complemented by prior regularization-based methods, which are proven to be effective for easily distinguishable old classes. We validate the above design and insights on CUB-200-2011, Caltech-101, CIFAR-100 and Tiny ImageNet, and show that our strategy outperforms existing memory-free methods with a clear margin. Code is available at https://github.com/xmengxin/mfgr
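One plausible form of such a diversity term is sketched below (the paper's exact formulation may differ): compute pairwise KL divergences between the class distributions predicted for a batch of generated samples, and negate them so that minimizing the term maximizes diversity.

```python
import torch
import torch.nn.functional as F

def diversity_loss(logits):
    """Hedged sketch of a diversity term over generated samples:
    returns the negative mean pairwise KL divergence between the
    per-sample class distributions, so gradient descent enlarges it."""
    log_p = F.log_softmax(logits, dim=1)          # (B, C)
    p = log_p.exp()
    # kl[i, j] = KL(p_i || p_j) for every pair in the batch.
    kl = (p.unsqueeze(1) * (log_p.unsqueeze(1) - log_p.unsqueeze(0))).sum(-1)
    return -kl.mean()  # add to the generator loss with some weight
```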
Kernel continual learning \cite{derakhshani2021kernel} has recently emerged as a strong continual learner due to its non-parametric ability to tackle task interference and catastrophic forgetting. Unfortunately, its success comes at the expense of an explicit memory to store samples from past tasks, which hampers scalability to continual learning settings with a large number of tasks. In this paper, we introduce generative kernel continual learning, which explores the synergies between generative models and kernels for continual learning. The generative model is able to produce representative samples for kernel learning, which removes the dependence on memory in kernel continual learning. Moreover, as we replay only on the generative model, we avoid task interference while being computationally more efficient compared with previous methods that need to replay on the entire model. We further introduce a supervised contrastive regularization that enables our model to generate even more discriminative samples for better kernel-based classification performance. We conduct extensive experiments on three widely used continual learning benchmarks, demonstrating the abilities and benefits of our contributions. Most notably, on the challenging SplitCIFAR100 benchmark, with just a simple linear kernel we obtain the same accuracy as kernel continual learning at one tenth of the memory, or a 10.1% accuracy gain for the same memory budget.
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny ImageNet, the large-scale unbalanced iNaturalist, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
Variational autoencoders (VAEs) have been successfully used in continual learning classification tasks thanks to their inference, data representation, and reconstruction properties. However, their ability to generate images corresponding to the classes and databases learned during continual learning (CL) is not well understood, and catastrophic forgetting remains a significant challenge. In this paper, we first analyze the forgetting behaviour of VAEs by developing a new theoretical framework that formulates CL as a dynamic optimal transport problem. This framework proves bounds akin to the data likelihood that do not require task information, and explains how prior knowledge is lost during training. We then propose a novel memory buffering approach, the Online Cooperative Memorization (OCM) framework, which consists of a Short-Term Memory (STM) that continually stores recent samples to provide future information for the model, and a Long-Term Memory (LTM) that aims to preserve a wide diversity of samples. The proposed OCM transfers certain samples from the STM to the LTM according to an information-diversity selection criterion, without requiring any supervision signals. The OCM framework is then combined with a dynamic VAE expansion mixture network to further enhance its performance.
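The buffer mechanics might look roughly like the sketch below; the distance-threshold novelty test is a simplified stand-in for the paper's information-diversity selection criterion, whose precise form is not given in this summary:

```python
import torch

class CooperativeMemory:
    """Minimal sketch of a short-term/long-term buffer pair with an
    unsupervised promotion rule (all sizes and the threshold tau are
    illustrative assumptions)."""
    def __init__(self, stm_size=256, ltm_size=1024, tau=1.0):
        self.stm, self.ltm = [], []
        self.stm_size, self.ltm_size, self.tau = stm_size, ltm_size, tau

    def add(self, x, feat):
        self.stm.append((x, feat))
        if len(self.stm) > self.stm_size:
            x_old, f_old = self.stm.pop(0)          # evict oldest STM entry
            if self._novel(f_old) and len(self.ltm) < self.ltm_size:
                self.ltm.append((x_old, f_old))     # promote to LTM

    def _novel(self, feat):
        if not self.ltm:
            return True
        d = torch.stack([torch.norm(feat - f) for _, f in self.ltm])
        return d.min().item() > self.tau            # far from all LTM entries
```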
Machine learning models often encounter samples that differ from the training distribution. Failure to recognize an out-of-distribution (OOD) sample, and consequently assigning that sample to a class label, significantly compromises the reliability of a model. The problem has gained significant attention due to its importance for safely deploying models in the open world. Detecting OOD samples is challenging due to the intractability of modeling all possible unknown distributions. To date, several research domains have tackled the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open set recognition, and out-of-distribution detection. Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection have been investigated independently. Accordingly, these research avenues have not cross-pollinated, creating research barriers. While some surveys intend to provide an overview of these approaches, they seem to focus only on a specific domain without examining the relationships between different domains. This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in the respective areas while identifying their commonalities. Researchers can benefit from the overview of research advances in different fields and can develop future methodologies synergistically. Furthermore, to the best of our knowledge, while there are surveys on anomaly detection or one-class learning, there is no comprehensive or up-to-date survey on out-of-distribution detection, which our survey covers extensively. Finally, having a unified cross-domain perspective, we discuss and shed light on future lines of research, intending to bring these fields closer together.
Classical machine learners are designed to solve only a single task, without the capability of taking on new emerging tasks or classes, whereas such a capability is more practical and human-like in the real world. To address this shortcoming, continual machine learners are elaborated to commendably learn a stream of tasks with domain and class shifts among different tasks. In this paper, we propose a feature-propagation based contrastive continual learning method that is capable of handling multiple continual learning scenarios. Specifically, we align the current and previous representation spaces by means of feature propagation and contrastive representation learning to bridge the domain shifts among distinct tasks. To further mitigate the class-wise shifts of the feature representation, a supervised contrastive loss is exploited to make the embeddings of examples from the same class closer than those of different classes. Extensive experimental results demonstrate the outstanding performance of the proposed method on six continual learning benchmarks in comparison with a collection of cutting-edge continual learning methods.
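A compact sketch of a supervised contrastive loss in the style the text describes is shown below, following the common Khosla et al. formulation; the temperature and other details are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """Sketch of a supervised contrastive loss: embeddings of the same
    class are pulled together, embeddings of different classes pushed apart."""
    z = F.normalize(features, dim=1)                       # (B, D)
    sim = z @ z.t() / temperature                          # (B, B)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))              # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    log_prob = log_prob.masked_fill(eye, 0.0)              # avoid -inf * 0
    # Average log-probability over each anchor's positive pairs.
    loss = -(log_prob * pos.float()).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```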
Malicious software (malware) classification presents a unique challenge for continual learning (CL) regimes, due to the volume of new samples received daily and the evolution of malware to exploit new vulnerabilities. On a typical day, antivirus vendors receive hundreds of thousands of unique pieces of software, both malicious and benign, and over the lifetime of a malware classifier, more than a billion samples can easily accumulate. Given the scale of the problem, sequential training using continual learning techniques could provide substantial benefits in reducing training and storage overhead. To date, however, there has been no exploration of CL applied to malware classification tasks. In this paper, we study 11 CL techniques applied to three malware tasks covering common incremental learning scenarios, including task, class, and domain incremental learning (IL). Specifically, using two realistic large-scale malware datasets, we evaluate the performance of the CL methods on both binary malware classification (Domain-IL) and multi-class malware family classification (Task-IL and Class-IL) tasks. To our surprise, the continual learning methods significantly underperformed naive joint replay of the training data in nearly all settings, in some cases reducing accuracy by more than 70 percentage points. Compared with joint replay, a simple approach of selectively replaying 20% of the stored data achieves even better performance at 50% of the training time. Finally, we discuss potential reasons for the surprisingly poor performance of the CL techniques, in the hope that it spurs further research on more effective techniques in the malware classification domain.
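The selective-replay baseline can be pictured with a few lines of Python; the uniform sampling here is an assumption, since the summary does not specify how the 20% subset is chosen:

```python
import random

def selective_replay_batch(stored, new_task_data, frac=0.2):
    """Sketch of the selective-replay baseline described above: mix the
    new task's data with a random subset of everything stored so far."""
    replay = random.sample(stored, k=int(frac * len(stored)))
    return new_task_data + replay   # train jointly on this mixture
```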
Continual Learning (CL) is a field dedicated to devising algorithms able to achieve lifelong learning. Overcoming the knowledge disruption of previously acquired concepts, a drawback affecting deep learning models that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the modeled data does not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drops very quickly. Overcoming this limitation is fundamental, as it would allow us to build truly intelligent systems showing stability and plasticity. Secondly, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with the newly updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use a memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor than the quality of the data. Secondly, we present one of the early works on incremental learning for ViT architectures, comparing functional, weight and attention regularization approaches, and propose an effective novel asymmetric loss. We conclude with a study on pretraining and how it affects performance in Continual Learning, raising some questions about the effective progression of the field, followed by future directions and closing remarks.
We consider the class-incremental learning (CIL) problem, in which a learning agent continuously learns new classes from incrementally arriving batches of training data and aims to predict well on all the classes learned so far. The main challenge of the problem is catastrophic forgetting, and for exemplar-memory based methods, it is generally known that the forgetting is commonly caused by a classification score bias that is injected due to the data imbalance between the new classes and the old classes (in the exemplar memory). While several methods have been proposed to correct such score bias with some additional post-processing, e.g., score re-scaling or balanced fine-tuning, no systematic analysis of the root cause of the bias has been carried out. To that end, we analyze that computing the softmax probabilities by combining the output scores of all old and new classes is the main cause of the bias. We then propose a new method, dubbed Separated Softmax for Incremental Learning (SS-IL), which consists of a separated softmax (SS) output layer combined with task-wise knowledge distillation (TKD) to resolve such bias. Through extensive experimental results on several large-scale CIL benchmark datasets, we show that our SS-IL achieves strong state-of-the-art accuracy by attaining much more balanced prediction scores without any additional post-processing.
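A minimal sketch of what a separated-softmax cross-entropy could look like, as one plausible reading of the mechanism rather than the authors' code: old-class and new-class logits are normalized within their own blocks, so the imbalanced new-class scores cannot suppress old-class probabilities.

```python
import torch
import torch.nn.functional as F

def separated_softmax_ce(logits, targets, n_old):
    """Hedged sketch: cross-entropy where the softmax is normalized
    only within the block (old vs. new classes) of each target."""
    old_logits, new_logits = logits[:, :n_old], logits[:, n_old:]
    is_new = targets >= n_old
    loss = torch.zeros(len(targets), device=logits.device)
    if (~is_new).any():
        loss[~is_new] = F.cross_entropy(
            old_logits[~is_new], targets[~is_new], reduction="none")
    if is_new.any():
        loss[is_new] = F.cross_entropy(
            new_logits[is_new], targets[is_new] - n_old, reduction="none")
    return loss.mean()
```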
We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs change the underlying training objective to cause less interference with previously learned information. Our proposed version of EBMs for continual learning is simple, efficient, and outperforms baseline methods by a large margin on several benchmarks. Moreover, our proposed contrastive divergence-based training objective can be combined with other continual learning methods, resulting in substantial boosts in their performance. We further show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a useful building block for future continual learning methods.
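One way the reduced-interference objective can be pictured, assuming a class-conditional energy E(x, y) whose normalization is restricted to the classes present in the current batch (a sketch consistent with the description above, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def ebm_cl_loss(energies, targets):
    """Hedged sketch: classification loss over energies E(x, y), where
    the softmax normalizer is restricted to classes seen in the batch,
    leaving the energies of absent (e.g., old) classes untouched.
    `energies` has shape (B, num_classes); lower = more compatible."""
    batch_classes = targets.unique()                 # classes in this batch only
    neg_energy = -energies[:, batch_classes]         # negative energies as logits
    # Map each target label to its index within batch_classes.
    tgt = (targets.unsqueeze(1) == batch_classes.unsqueeze(0)).float().argmax(1)
    return F.cross_entropy(neg_energy, tgt)
```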
We study the new task of class-incremental Novel Class Discovery (class-iNCD), which refers to the problem of discovering novel categories in an unlabelled dataset by leveraging a pre-trained model that has been trained on a labelled dataset containing disjoint yet related categories. In addition to discovering novel classes, we also aim to maintain the model's ability to recognize the previously seen base categories. Inspired by rehearsal-based incremental learning methods, in this paper we propose a novel approach that prevents forgetting past information about the base classes by jointly exploiting base-class feature prototypes and feature-level knowledge distillation. We also propose a self-training clustering strategy that simultaneously clusters the novel categories and trains a joint classifier for both the base and novel classes. This enables our method to operate in a class-incremental setting. Our experiments, conducted on three common benchmarks, show that our method significantly outperforms state-of-the-art approaches. Code is available at https://github.com/oatmealliu/class-incd
Data-free knowledge distillation (KD) allows transferring knowledge from a trained neural network (the teacher) to a more compact one (the student) in the absence of the original training data. Existing works use a validation set to monitor the student's accuracy on real data and report the highest performance throughout the process. However, validation data may not be available at distillation time either, making it infeasible to record the student snapshot that achieved the peak accuracy. Therefore, a practical data-free KD method should be robust and, ideally, provide monotonically increasing student accuracy during distillation. This is challenging because the student experiences knowledge degradation due to the distribution shift of the synthetic data. A straightforward way to overcome this issue is to store and rehearse the generated samples periodically, which increases the memory footprint and creates privacy concerns. We propose to model the distribution of the previously observed synthetic samples with a generative network. In particular, we design a variational autoencoder (VAE) with a training objective that is customized to learn the synthetic data representations optimally. The student is rehearsed by the generative pseudo-replay technique, with samples produced by the VAE. Hence, knowledge degradation can be prevented without storing any samples. Experiments on image classification benchmarks show that our method optimizes the expected value of the distilled model's accuracy while eliminating the large memory overhead incurred by the sample-storing methods.
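A single distillation step with generative pseudo replay might look like the sketch below; `vae.sample` is a hypothetical API for decoding from the prior, and the equal mixing of fresh and replayed samples is an assumption:

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, vae, synth_x, optimizer):
    """Hedged sketch: the student matches the teacher on fresh synthetic
    samples and is simultaneously rehearsed on samples decoded by a VAE
    that models previously observed synthetic data."""
    replay_x = vae.sample(len(synth_x))              # hypothetical VAE API
    x = torch.cat([synth_x, replay_x])
    with torch.no_grad():
        t_logits = teacher(x)                        # fixed teacher targets
    s_logits = student(x)
    loss = F.kl_div(s_logits.log_softmax(1), t_logits.softmax(1),
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```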
The problem of detecting novel classes at run time is known as open category detection, and it is important for various real-world applications such as medical applications, autonomous driving, etc. Open category detection in the context of deep learning involves solving two problems: (i) the input images must be mapped into a latent representation that contains enough information to detect the outliers, and (ii) an anomaly scoring function must be learned that can extract this information from the latent representation to identify the anomalies. Research on deep anomaly detection methods has progressed slowly. One reason may be that most papers simultaneously introduce new representation learning techniques and new anomaly scoring approaches. The goal of this work is to improve this methodology by providing ways of separately measuring the effectiveness of the representation learning and of the anomaly scoring. This work makes two methodological contributions. The first is the introduction of the notion of oracle anomaly detection for quantifying the information available in a learned latent representation. The second is the introduction of oracle representation learning, which produces a representation that is guaranteed to be sufficient for accurate anomaly detection. These two techniques help researchers separate the quality of the learned representation from the performance of the anomaly scoring mechanism, so that they can debug and improve their systems. They also provide an upper limit on how much open category detection can be improved through better anomaly scoring mechanisms, and the combination of the two oracles gives an upper bound on the performance that any open category detection method can achieve. This work introduces these two oracle techniques and demonstrates their utility by applying them to several leading open category detection methods.
Catastrophic forgetting (CF) happens whenever a neural network overwrites past knowledge while being trained on new tasks. Common techniques to handle CF include regularization of the weights (using, e.g., their importance on past tasks), and rehearsal strategies, where the network is constantly re-trained on past data. Generative models have also been applied for the latter, in order to have endless sources of data. In this paper, we propose a novel method that combines the strengths of regularization and generative-based rehearsal approaches. Our generative model consists of a normalizing flow (NF), a probabilistic and invertible neural network, trained on the internal embeddings of the network. By keeping a single NF throughout the training process, we show that our memory overhead remains constant. In addition, exploiting the invertibility of the NF, we propose a simple approach to regularize the network's embeddings with respect to past tasks. We show that our method performs favorably with respect to state-of-the-art approaches in the literature, with bounded computational power and memory overheads.
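As a concrete illustration of keeping a single flow over embeddings, here is a toy RealNVP-style coupling flow trained by maximum likelihood; all architectural choices below are assumptions for the sketch, and the embedding dimension is assumed even:

```python
import math
import torch
import torch.nn as nn

class Coupling(nn.Module):
    """One affine coupling layer of a toy normalizing flow."""
    def __init__(self, dim, hidden=128, flip=False):
        super().__init__()
        self.flip = flip
        half = dim // 2
        self.net = nn.Sequential(nn.Linear(half, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * half))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        if self.flip:
            x1, x2 = x2, x1
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                       # stabilize the scales
        y2 = x2 * s.exp() + t
        y = torch.cat([y2, x1] if self.flip else [x1, y2], dim=1)
        return y, s.sum(dim=1)                  # output and log|det J|

class EmbeddingFlow(nn.Module):
    """Sketch of a single flow kept throughout training, fitted by
    maximum likelihood on the network's internal embeddings."""
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [Coupling(dim, flip=(i % 2 == 1)) for i in range(n_layers)])

    def log_prob(self, x):
        logdet = 0.0
        for layer in self.layers:
            x, ld = layer(x)
            logdet = logdet + ld
        # Standard normal base density on the latent z.
        return -0.5 * (x.pow(2) + math.log(2 * math.pi)).sum(dim=1) + logdet
```

Training on a batch of embeddings `h` would then minimize `-flow.log_prob(h).mean()`; since each coupling step is invertible, past-task embeddings can also be mapped back and forth for the regularization the text describes.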
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model, a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and ImageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.
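Such a combined objective is commonly implemented along the following lines (a sketch; the temperature, weighting, and exact distillation target are assumptions, not necessarily the authors' choices):

```python
import torch
import torch.nn.functional as F

def incremental_loss(logits, targets, old_logits, n_old, T=2.0, lam=1.0):
    """Sketch: cross-entropy over all current classes plus a distillation
    term keeping the new model's old-class outputs close to those of the
    frozen old model (`old_logits`)."""
    ce = F.cross_entropy(logits, targets)
    kd = F.kl_div(F.log_softmax(logits[:, :n_old] / T, dim=1),
                  F.softmax(old_logits / T, dim=1),
                  reduction="batchmean") * T * T
    return ce + lam * kd
```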
Continual learning (CL) aims to devise models that mimic the human ability to learn new tasks sequentially while retaining knowledge acquired from past experience. In this paper, we introduce the novel problem of Memory-Constrained Online Continual Learning (MC-OCL), which imposes strict constraints on the memory overhead that a possible algorithm can use to avoid catastrophic forgetting. Most, if not all, previous CL methods violate these constraints, and we propose an algorithmic solution to MC-OCL: Batch-level Distillation (BLD), a regularization-based CL approach that effectively balances stability and plasticity in order to learn from data streams while preserving, through distillation, the ability to solve old tasks. Our extensive experimental evaluation on three publicly available benchmarks empirically demonstrates that our approach successfully addresses the MC-OCL problem and achieves accuracy comparable to prior distillation methods that require much higher memory overhead.
The ability to learn new concepts continually is necessary in this ever-changing world. However, deep neural networks suffer from catastrophic forgetting when learning new categories. Many works have been proposed to alleviate this phenomenon, whereas most of them either fall into the stability-plasticity dilemma or incur excessive computation or storage overhead. Inspired by the gradient boosting algorithm, which gradually fits the residuals between the target model and the previous ensemble model, we propose a novel two-stage learning paradigm, FOSTER, which empowers the model to learn new categories adaptively. Specifically, we first dynamically expand new modules to fit the residuals between the target and the output of the original model. Next, we remove redundant parameters and feature dimensions through an effective distillation strategy to maintain a single backbone model. We validate our method FOSTER under different settings on CIFAR-100 and ImageNet-100/1000. Experimental results show that our method achieves state-of-the-art performance. Code is available at: https://github.com/g-u-n/eccv22-foster.
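The first, expansion stage can be pictured as a boosting step; the sketch below assumes the frozen old model's logits are zero-padded to cover newly added classes, which is an assumption of this illustration:

```python
import torch

def boosted_logits(frozen_net, new_module, x, n_total):
    """Hedged sketch of the gradient-boosting view: a frozen copy of the
    old model provides base predictions and the newly expanded module is
    trained to fit the residual, so their sum is the ensemble output."""
    with torch.no_grad():
        base = frozen_net(x)                               # (B, n_old), fixed
    pad = torch.zeros(len(x), n_total - base.size(1), device=base.device)
    residual = new_module(x)                               # (B, n_total)
    return torch.cat([base, pad], dim=1) + residual        # train with CE
```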
Many modern computer vision algorithms suffer from two major bottlenecks: scarcity of data and learning new tasks incrementally. While training the model with new batches of data, the model loses its ability to classify the previous data judiciously, which is termed catastrophic forgetting. Conventional methods have tried to mitigate catastrophic forgetting of the previously learned data while compromising the training at the current session. The state-of-the-art generative replay based approaches use complicated structures such as generative adversarial networks (GANs) to deal with catastrophic forgetting. Additionally, training a GAN with few samples may lead to instability. In this work, we present a novel method to deal with these two major hurdles. Our method identifies a better embedding space with an improved contrastive loss to make classification more robust. Moreover, our approach is able to retain previously acquired knowledge in the embedding space even when trained with new classes. We update previous session class prototypes while training in such a way that they are able to represent the true class means. This is of prime importance as our classification rule is based on the nearest class mean classification strategy. We have demonstrated our results by showing that the embedding space remains intact after training the model with new classes, and we show that our method performs better than the existing state-of-the-art algorithms in terms of accuracy across different sessions.
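Since classification here follows the nearest-class-mean rule, a sketch of that rule and of a running prototype update is given below; the incremental-mean update is an assumption about how "representing the true class mean" could be maintained:

```python
import torch

def ncm_predict(features, prototypes):
    """Nearest-class-mean rule: assign each embedding to the class whose
    stored prototype (running class mean) is closest in embedding space."""
    d = torch.cdist(features, prototypes)        # (B, num_classes)
    return d.argmin(dim=1)

def update_prototype(proto, count, new_feats):
    """Incrementally fold a new session's features into a class mean."""
    total = count + len(new_feats)
    proto = (proto * count + new_feats.sum(dim=0)) / total
    return proto, total
```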