The dynamic expansion architecture is becoming popular in class incremental learning, mainly due to its advantages in alleviating catastrophic forgetting. However, task confusion is not well assessed within this framework: the discrepancy between classes of different tasks is not well learned (inter-task confusion, ITC), and priority is still given to the latest class batch (old-new confusion, ONC). We empirically validate the side effects of these two types of confusion. Meanwhile, a novel solution called Task Correlated Incremental Learning (TCIL) is proposed to encourage discriminative and fair feature utilization across tasks. TCIL performs multi-level knowledge distillation to propagate knowledge learned from old tasks to the new one. It establishes information-flow paths at both the feature and logit levels, making the learning aware of the old classes. In addition, an attention mechanism and classifier re-scoring are applied to generate fairer classification scores. We conduct extensive experiments on the CIFAR100 and ImageNet100 datasets. The results demonstrate that TCIL consistently achieves state-of-the-art accuracy. It mitigates both ITC and ONC, while showing advantages in combating catastrophic forgetting even when no rehearsal memory is reserved.
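As an illustration of the idea (not the authors' implementation), a two-level distillation term combining a feature-level and a logit-level component might look like the sketch below; the frozen old model, the temperature, and the weighting factors are assumptions.

```python
# Minimal sketch of a feature- plus logit-level distillation term; weights and the
# temperature are illustrative assumptions, not the paper's exact formulation.
import torch.nn.functional as F

def multi_level_distillation(old_feats, new_feats, old_logits, new_logits,
                             temperature=2.0, feat_weight=1.0, logit_weight=1.0):
    # Feature-level term: keep the new feature map close to the frozen old one.
    feat_loss = F.mse_loss(new_feats, old_feats.detach())
    # Logit-level term: match softened predictions on the old classes only.
    n_old = old_logits.shape[1]
    soft_old = F.softmax(old_logits.detach() / temperature, dim=1)
    log_soft_new = F.log_softmax(new_logits[:, :n_old] / temperature, dim=1)
    logit_loss = F.kl_div(log_soft_new, soft_old, reduction="batchmean") * temperature ** 2
    return feat_weight * feat_loss + logit_weight * logit_loss
```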
Deep learning models suffer from catastrophic forgetting when incrementally learning new tasks. Incremental learning has been proposed to retain the knowledge of old classes while learning to recognize new classes. A typical approach is to use a few exemplars to avoid forgetting old knowledge. In this case, the data imbalance between old and new classes is a key issue that degrades model performance. Several strategies have been designed to correct the bias towards the new classes caused by this data imbalance. However, they rely heavily on assumptions about the bias relation between old and new classes and are therefore unsuitable for complex real-world applications. In this study, we propose an assumption-agnostic method, multi-granularity regularized re-balancing (MGRB), to address this issue. Re-balancing methods are used to alleviate the influence of data imbalance; however, we empirically find that they tend to under-fit the new classes. To this end, we further design a novel multi-granularity regularization term that enables the model to also consider the correlations among classes in addition to re-balancing the data. A class hierarchy is first constructed by grouping semantically or visually similar classes. The multi-granularity regularization then converts the one-hot label vector into a continuous label distribution that reflects the relations between the target class and the other classes, based on the constructed class hierarchy. The model can thus learn inter-class relational information, which helps enhance the learning of both old and new classes. Experimental results on public datasets and a real-world fault-diagnosis dataset validate the effectiveness of the proposed method.
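The label-distribution step can be pictured with a small sketch, assuming a flat mapping from class to parent node and a hypothetical `sibling_mass` parameter; the paper's construction over multiple granularities is richer than this illustration.

```python
# Illustrative sketch: convert a one-hot label into a soft distribution in which classes
# sharing a parent in the hierarchy receive a small amount of probability mass.
import numpy as np

def hierarchical_soft_label(target, num_classes, parent_of, sibling_mass=0.1):
    label = np.zeros(num_classes)
    siblings = [c for c in range(num_classes)
                if c != target and parent_of[c] == parent_of[target]]
    if siblings:
        label[siblings] = sibling_mass / len(siblings)
        label[target] = 1.0 - sibling_mass
    else:
        label[target] = 1.0
    return label
```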
The ability to learn new concepts continually is necessary in this ever-changing world. However, deep neural networks suffer from catastrophic forgetting when learning new categories. Many works have been proposed to alleviate this phenomenon, but most of them either fall into the stability-plasticity dilemma or incur excessive computational or storage overhead. Inspired by the gradient boosting algorithm, which gradually fits the residual between the target model and the previous ensemble model, we propose a novel two-stage learning paradigm, FOSTER, that enables the model to adapt to new categories. Specifically, we first dynamically expand new modules to fit the residual between the target and the output of the original model. Next, we remove redundant parameters and feature dimensions through an effective distillation strategy to maintain a single backbone model. We validate our method FOSTER on CIFAR-100 and ImageNet-100/1000 under different settings. Experimental results show that our method achieves state-of-the-art performance. Code is available at: https://github.com/g-u-n/eccv22-foster.
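The gradient-boosting intuition of the expansion stage can be sketched as below, assuming a frozen previous model and a trainable branch that outputs logits over all classes seen so far; this is an illustration, not the released FOSTER code, and the compression stage is omitted.

```python
# Expansion-stage sketch: the new branch learns the residual the frozen old model
# cannot explain; a later distillation step (not shown) compresses the ensemble back
# into a single backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoostedEnsemble(nn.Module):
    def __init__(self, old_model, new_branch):
        super().__init__()
        self.old_model = old_model.eval()        # frozen model from previous tasks
        for p in self.old_model.parameters():
            p.requires_grad = False
        self.new_branch = new_branch             # trainable; outputs logits for all classes

    def forward(self, x):
        with torch.no_grad():
            old_logits = self.old_model(x)       # [B, n_old]
        new_logits = self.new_branch(x)          # [B, n_old + n_new]
        n_new = new_logits.shape[1] - old_logits.shape[1]
        old_logits = F.pad(old_logits, (0, n_new))   # zero-pad for the new classes
        return old_logits + new_logits           # boosting-style residual fit
```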
Class-incremental learning (CIL) struggles with catastrophic forgetting when learning new knowledge, and data-free CIL (DFCIL) is even more challenging without access to the training data of previously learned classes. Although recent DFCIL works introduce techniques such as model inversion to synthesize data for previous classes, they fail to overcome forgetting due to the severe domain gap between the synthetic and real data. To address this issue, this paper proposes relation-guided representation learning (RRL) for DFCIL, dubbed R-DFCIL. In RRL, we introduce relational knowledge distillation to flexibly transfer the structural relations of new data from the old model to the current model. Our RRL-boosted DFCIL can guide the current model to learn representations of new classes that are better compatible with the representations of previous classes, which greatly reduces forgetting while improving plasticity. To avoid mutual interference between representation learning and classifier learning, we employ a local rather than a global classification loss during RRL. After RRL, the classification head is refined with a global class-balanced classification loss to address the data-imbalance issue and to learn the decision boundaries between new and previous classes. Extensive experiments on CIFAR100, Tiny-ImageNet200, and ImageNet100 demonstrate that our R-DFCIL significantly surpasses previous approaches and achieves new state-of-the-art performance for DFCIL. Code is available at https://github.com/jianzhangcs/r-dfcil.
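Relational knowledge distillation is commonly formulated as matching pairwise feature distances between the old and new model; a minimal sketch of that common formulation is below (the exact relational term used in R-DFCIL may differ).

```python
# Pairwise-distance relational distillation sketch: the student's normalized distance
# matrix is pulled towards the teacher's, transferring structure rather than exact values.
import torch
import torch.nn.functional as F

def pairwise_distance_rkd(old_feats, new_feats, eps=1e-8):
    def normalized_pdist(f):
        d = torch.cdist(f, f, p=2)               # pairwise Euclidean distances
        return d / (d[d > 0].mean() + eps)       # normalize by the mean distance
    with torch.no_grad():
        teacher = normalized_pdist(old_feats)
    student = normalized_pdist(new_feats)
    return F.smooth_l1_loss(student, teacher)
```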
Deep network architectures struggle to continually learn new tasks without forgetting previous ones. A recent trend indicates that dynamic architectures based on parameter expansion can reduce catastrophic forgetting efficiently in continual learning. However, existing approaches often require a task identifier at test time, need complex tuning to balance the growing number of parameters, and barely share any information across tasks. As a result, they struggle to scale to a large number of tasks without significant overhead. In this paper, we propose a transformer architecture based on a dedicated encoder/decoder framework. Critically, the encoder and decoder are shared among all tasks. Through a dynamic expansion of special tokens, we specialize each forward pass of our decoder network on a task distribution. Thanks to the strict control of the parameter expansion, our strategy scales to a large number of tasks while incurring negligible memory and time overhead. Moreover, this efficient strategy does not require any hyperparameter tuning to control the network's expansion. Our model reaches excellent results on CIFAR100 and state-of-the-art performance on the large-scale ImageNet100 and ImageNet1000, while having fewer parameters than concurrent dynamic frameworks.
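A rough sketch of the token-expansion idea is given below, with a shared decoder layer and one learnable task token appended per task; the single-layer decoder, initialization, and naming are assumptions rather than the released implementation.

```python
# Sketch: encoder and decoder are shared; only a tiny task token is added per task, and a
# forward pass of the decoder is specialized by the token of the targeted task.
import torch
import torch.nn as nn

class TaskTokenDecoder(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.dim = dim
        self.task_tokens = nn.ParameterList()    # grows by one token per task
        self.decoder = nn.TransformerDecoderLayer(dim, num_heads, batch_first=True)

    def add_task(self):
        self.task_tokens.append(nn.Parameter(torch.randn(1, 1, self.dim) * 0.02))

    def forward(self, patch_tokens, task_id):
        # patch_tokens: [B, N, dim] from the shared encoder
        tok = self.task_tokens[task_id].expand(patch_tokens.size(0), -1, -1)
        out = self.decoder(tgt=tok, memory=patch_tokens)  # token attends to the patches
        return out[:, 0]                                  # task-specialized embedding
```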
Lifelong learning has attracted much attention, but existing works still struggle to fight catastrophic forgetting and to accumulate knowledge over long stretches of incremental learning. In this work, we propose PODNet, a model inspired by representation learning. By carefully balancing the compromise between remembering the old classes and learning new ones, PODNet fights catastrophic forgetting, even over very long runs of small incremental tasks, a setting so far unexplored by current works. PODNet innovates on existing art with an efficient spatial-based distillation loss applied throughout the model and a representation comprising multiple proxy vectors for each class. We validate those innovations thoroughly, comparing PODNet with three state-of-the-art models on three datasets: CIFAR100, ImageNet100, and ImageNet1000. Our results showcase a significant advantage of PODNet over existing art, with accuracy gains of 12.10, 6.51, and 2.85 percentage points, respectively.
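The spatial-based distillation can be sketched as pooling intermediate feature maps along each spatial axis before comparing them, which is less rigid than matching full maps; the normalization below is an assumption rather than the exact PODNet loss.

```python
# Pooled spatial distillation sketch: compare height- and width-pooled feature maps of
# the old and new model at a matching intermediate layer.
import torch.nn.functional as F

def pooled_spatial_distillation(old_map, new_map):
    # old_map, new_map: [B, C, H, W] feature maps from the same layer
    loss = 0.0
    for axis in (2, 3):                                   # pool over height, then width
        o = F.normalize(old_map.detach().sum(dim=axis).flatten(1), dim=1)
        n = F.normalize(new_map.sum(dim=axis).flatten(1), dim=1)
        loss = loss + (o - n).pow(2).sum(dim=1).mean()
    return loss
```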
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is because current neural network architectures require the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model, a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learning deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and ImageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.
Traditional machine learning systems are deployed in a closed-world setting, which requires the entire training data to be available before the offline training process. However, real-world applications often face incoming new classes, and a model should incorporate them continually. This learning paradigm is called class-incremental learning (CIL). We propose a Python toolbox that implements several key algorithms for class-incremental learning to ease the burden on researchers in the machine learning community. The toolbox contains implementations of a number of founding works of CIL, such as EWC and iCaRL, but also provides current state-of-the-art algorithms that can be used for conducting novel fundamental research. This toolbox, named PyCIL for Python Class-Incremental Learning, is available at https://github.com/g-u-n/pycil
Conventionally, deep neural networks are trained offline, relying on a large dataset prepared in advance. This paradigm is often challenged in real-world applications, e.g. online services that involve continuous streams of incoming data. Recently, incremental learning has received increasing attention and is considered a promising solution to the practical challenges mentioned above. However, it has been observed that incremental learning is subject to a fundamental difficulty, catastrophic forgetting: adapting a model to new data often results in severe performance degradation on previous tasks or classes. Our study reveals that the imbalance between previous and new data is a crucial cause of this problem. In this work, we develop a new framework for incrementally learning a unified classifier, i.e. a classifier that treats both old and new classes uniformly. Specifically, we incorporate three components, cosine normalization, less-forget constraint, and inter-class separation, to mitigate the adverse effects of the imbalance. Experiments show that the proposed method can effectively rebalance the training process, thus obtaining superior performance compared to existing methods. On CIFAR-100 and ImageNet, our method reduces the classification errors by more than 6% and 13%, respectively, under the incremental setting of 10 phases.
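Two of the three components translate into a short sketch: a cosine-normalized classifier, which removes the magnitude bias between old and new class weights, and a less-forget constraint on feature directions. The learnable scale and the omitted inter-class separation margin are simplifications, not the paper's exact code.

```python
# Cosine-normalized classifier plus a less-forget term on feature directions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        self.scale = nn.Parameter(torch.tensor(10.0))     # learnable logit scale

    def forward(self, feats):
        # logits are scaled cosine similarities, so weight magnitudes cannot bias them
        return self.scale * F.linear(F.normalize(feats, dim=1),
                                     F.normalize(self.weight, dim=1))

def less_forget_loss(old_feats, new_feats):
    # keep the orientation of new features close to that of the frozen old model
    return (1 - F.cosine_similarity(new_feats, old_feats.detach(), dim=1)).mean()
```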
Lifelong person re-identification (LReID) is in significant demand for real-world development, as a large amount of ReID data is captured from diverse locations over time and inherently cannot be accessed all at once. However, a key challenge for LReID is how to incrementally preserve old knowledge and gradually add new capabilities to the system. Unlike most existing LReID methods, which mainly focus on dealing with catastrophic forgetting, our focus is on a more challenging problem: not only reducing forgetting on old tasks, but also improving the model performance on both new and old tasks during the lifelong learning process. Inspired by the biological process of human cognition, where the somatosensory neocortex and the hippocampus work together in memory consolidation, we formulate a model called Knowledge Refreshing and Consolidation (KRC) that achieves both positive forward and backward transfer. More specifically, a knowledge refreshing scheme is incorporated with the knowledge rehearsal mechanism to enable bi-directional knowledge transfer by introducing a dynamic memory model and an adaptive working model. Moreover, a knowledge consolidation scheme operating on the dual space further improves model stability over the long term. Extensive evaluations show KRC's superiority over the state-of-the-art LReID methods on challenging pedestrian benchmarks.
A major open problem on the road to artificial intelligence is the development of incrementally learning systems that learn about more and more concepts over time from a stream of data. In this work, we introduce a new training strategy, iCaRL, that allows learning in such a class-incremental way: only the training data for a small number of classes has to be present at the same time and new classes can be added progressively. iCaRL learns strong classifiers and a data representation simultaneously. This distinguishes it from earlier works that were fundamentally limited to fixed data representations and therefore incompatible with deep learning architectures. We show by experiments on CIFAR-100 and ImageNet ILSVRC 2012 data that iCaRL can learn many classes incrementally over a long period of time where other strategies quickly fail.
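The exemplar-selection step can be illustrated with a herding-style sketch: exemplars are chosen greedily so that their running mean tracks the class mean, and classification then uses the nearest mean of exemplars (not shown). The details below are assumptions rather than the paper's exact procedure.

```python
# Greedy herding-style exemplar selection for a single class.
import numpy as np

def herding_selection(features, m):
    # features: [N, D] L2-normalized features of one class; m: memory budget per class
    class_mean = features.mean(axis=0)
    selected, running_sum = [], np.zeros_like(class_mean)
    for k in range(1, m + 1):
        # pick the sample that best pulls the exemplar mean towards the class mean
        dists = np.linalg.norm(class_mean - (features + running_sum) / k, axis=1)
        dists[selected] = np.inf                 # never reuse an exemplar
        idx = int(np.argmin(dists))
        selected.append(idx)
        running_sum += features[idx]
    return selected
```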
This paper studies class-incremental learning with vision transformers (ViT). Surprisingly, naively applying ViT in place of convolutional neural networks (CNNs) leads to performance degradation. Our analysis reveals three issues of naively using ViT: (a) ViT converges very slowly when the number of classes is small; (b) more bias towards new classes is observed in ViT than in CNN-based models; and (c) the proper learning rate for ViT is too low to learn a good classifier. Based on this analysis, we show that these issues can be simply addressed with existing techniques: using a convolutional stem, balanced finetuning to correct the bias, and a higher learning rate for the classifier. Our simple solution, named ViTIL (ViT for Incremental Learning), achieves the new state of the art for all three class-incremental learning settings by a clear margin, providing a strong baseline for the research community. For instance, on ImageNet-1000, our ViTIL achieves 69.20% top-1 accuracy for the protocol of 500 initial classes with 5 incremental steps (100 new classes each), outperforming LUCIR+DDE by 1.69%. For the more challenging protocol of 10 incremental steps (100 new classes each), our method outperforms PODNet by 7.27% (65.13% vs. 57.86%).
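Two of the three fixes are easy to sketch in PyTorch: a small convolutional stem in place of the single large-patch projection, and a higher learning rate for the classifier via optimizer parameter groups. The channel sizes and the 10x multiplier below are illustrative assumptions.

```python
# Illustrative convolutional stem and per-group learning rates; not the ViTIL code.
import torch
import torch.nn as nn

conv_stem = nn.Sequential(                       # replaces a single large-patch projection
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
)

def build_optimizer(vit_body, classifier, base_lr=5e-4, head_lr_mult=10.0):
    # give the classifier head a higher learning rate than the transformer body
    return torch.optim.AdamW([
        {"params": vit_body.parameters(), "lr": base_lr},
        {"params": classifier.parameters(), "lr": base_lr * head_lr_mult},
    ])
```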
Modern machine learning suffers from catastrophic forgetting when learning new classes incrementally. The performance dramatically degrades due to the missing data of old classes. Incremental learning methods have been proposed to retain the knowledge acquired from the old classes, by using knowledge distillation and keeping a few exemplars from the old classes. However, these methods struggle to scale up to a large number of classes. We believe this is because of the combination of two factors: (a) the data imbalance between the old and new classes, and (b) the increasing number of visually similar classes. Distinguishing between an increasing number of visually similar classes is particularly challenging when the training data is unbalanced. We propose a simple and effective method to address this data imbalance issue. We found that the last fully connected layer has a strong bias towards the new classes, and this bias can be corrected by a linear model. With two bias parameters, our method performs remarkably well on two large datasets: ImageNet (1000 classes) and MS-Celeb-1M (10000 classes), outperforming the state-of-the-art algorithms by 11.1% and 13.2% respectively.
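The two-parameter correction can be sketched directly: the new-class logits are rescaled and shifted by a single (alpha, beta) pair, typically fitted on a small balanced validation set, while the old-class logits pass through unchanged. This is an illustration consistent with the description above, not the authors' code.

```python
# Two-parameter bias-correction layer applied on top of the classifier logits.
import torch
import torch.nn as nn

class BiasCorrection(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, logits, num_old_classes):
        old, new = logits[:, :num_old_classes], logits[:, num_old_classes:]
        return torch.cat([old, self.alpha * new + self.beta], dim=1)
```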
We study the new task of class-incremental novel class discovery (class-iNCD), which refers to the problem of discovering novel categories in an unlabeled dataset by leveraging a pre-trained model that has been trained on a labeled dataset containing disjoint yet related categories. In addition to discovering novel classes, we also aim to maintain the model's ability to recognize the previously seen base categories. Inspired by rehearsal-based incremental learning methods, in this paper we propose a novel approach that prevents forgetting past information about the base classes by jointly exploiting base-class feature prototypes and feature-level knowledge distillation. We also propose a self-training clustering strategy that simultaneously clusters novel categories and trains a joint classifier for both the base and novel classes. This enables our method to operate in the class-incremental setting. Our experiments, conducted on three common benchmarks, demonstrate that our method significantly outperforms state-of-the-art approaches. Code is available at https://github.com/oatmealliu/class-incd
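One way to picture the prototype component is a replay loss in which stored base-class feature means are jittered and fed to the joint classifier; the jitter, sample count, and names below are assumptions, not the paper's exact formulation.

```python
# Prototype-replay sketch: replay per-class feature means to the joint classifier so the
# base classes are not forgotten while novel classes are being discovered.
import torch
import torch.nn.functional as F

def prototype_replay_loss(classifier, prototypes, proto_std=0.1, samples_per_class=4):
    # prototypes: [num_base_classes, D] stored per-class feature means
    n = prototypes.shape[0]
    labels = torch.arange(n).repeat_interleave(samples_per_class)
    feats = prototypes.repeat_interleave(samples_per_class, dim=0)
    feats = feats + proto_std * torch.randn_like(feats)   # jitter around each prototype
    return F.cross_entropy(classifier(feats), labels)
```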
In a class-incremental learning (CIL) setting, groups of classes are introduced to a model in each learning phase. The goal is to learn a unified model that performs well on all the classes observed so far. Given the recent popularity of vision transformers (ViTs) in conventional classification settings, an interesting question is to study their continual learning behavior. In this work, we develop a debiased dual-distilled transformer for CIL, dubbed $\textrm{D}^3\textrm{Former}$. The proposed model leverages a hybrid nested ViT design to ensure data efficiency and scalability to both small and large datasets. In contrast to recent ViT-based CIL approaches, our $\textrm{D}^3\textrm{Former}$ does not dynamically expand its architecture when new tasks are learned and remains suitable for a large number of incremental tasks. The improved CIL behavior of $\textrm{D}^3\textrm{Former}$ owes to two fundamental changes to the ViT design. First, we treat incremental learning as a long-tail classification problem, where the abundant samples of the new classes vastly outnumber the limited exemplars available for the old classes. To avoid a bias against the minority old classes, we propose to dynamically adjust the logits to emphasize retaining the representations relevant to the old tasks. Second, we propose to preserve the configuration of spatial attention maps as learning progresses across tasks. This helps in reducing catastrophic forgetting by constraining the model to retain its attention on the most discriminative regions. $\textrm{D}^3\textrm{Former}$ obtains favorable results on incremental versions of the CIFAR-100, MNIST, SVHN, and ImageNet datasets.
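The second change, preserving spatial attention maps across tasks, can be sketched as an attention-distillation term between matching layers of the old and new model; how the maps are extracted, normalized, and weighted here is an assumption.

```python
# Attention-preservation sketch: keep the current model's self-attention maps close to the
# previous model's at a matching layer.
import torch.nn.functional as F

def attention_preservation(old_attn, new_attn):
    # old_attn, new_attn: [B, heads, N, N] self-attention maps
    o = F.normalize(old_attn.detach().flatten(1), dim=1)
    n = F.normalize(new_attn.flatten(1), dim=1)
    return (o - n).pow(2).sum(dim=1).mean()
```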
In class-incremental learning, the model is expected to continually learn new classes while maintaining knowledge of previous classes. The challenge here lies in preserving the model's ability to effectively represent the previous classes in the feature space, while adapting it to represent the incoming new classes. We propose two distillation-based objectives for class-incremental learning that leverage the structure of the feature space to maintain accuracy on previous classes, as well as to enable the learning of new classes. In our first objective, termed cross-space clustering (CSC), we propose to use the feature-space structure of the previous model to characterize directions of optimization that maximally preserve a class: directions that all instances of a specific class should collectively optimize towards, and those they should collectively optimize away from. Apart from minimizing forgetting, this indirectly encourages the model to cluster all instances of a class in the current feature space and induces a sense of herd immunity, allowing all samples of a class to jointly combat the model's forgetting of that class. Our second objective, termed controlled transfer (CT), approaches incremental learning from the perspective of inter-class transfer. CT explicitly approximates and conditions the current model on the semantic similarities between incrementally arriving classes and prior classes. This allows the model to learn classes in such a way that it maximizes positive forward transfer from similar prior classes, thus increasing plasticity, and minimizes negative backward transfer on dissimilar prior classes, thereby strengthening stability. We perform extensive experiments on two benchmark datasets, adding our method (CSCCT) on top of three prominent class-incremental learning methods. We observe consistent performance improvements across a variety of experimental settings.
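A hedged sketch of the cross-space clustering idea follows: each sample's current feature is attracted to the previous model's features of same-class samples in the mini-batch and repelled from those of other classes. This is one reading of the objective, not the authors' code.

```python
# Cross-space clustering sketch: current features vs. previous-model features of the batch.
import torch.nn.functional as F

def cross_space_clustering(new_feats, old_feats, labels):
    new_feats = F.normalize(new_feats, dim=1)
    old_feats = F.normalize(old_feats.detach(), dim=1)
    sim = new_feats @ old_feats.t()                         # current-vs-previous similarity
    same = (labels[:, None] == labels[None, :]).float()
    # push towards same-class anchors (sim -> 1) and away from the rest (sim -> -1)
    return ((1 - sim) * same + (1 + sim) * (1 - same)).mean()
```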
Conventional detection networks usually require abundant labeled training samples, while humans can learn new concepts incrementally from just a few examples. This paper focuses on the more challenging but realistic class-incremental few-shot object detection problem (iFSD). It aims to incrementally transfer the model to novel objects from only a few annotated samples, without catastrophically forgetting previously learned ones. To tackle this problem, we propose a novel method, LEAST, which can transfer with less forgetting, fewer training resources, and stronger transfer capability. Specifically, we first present a transfer strategy to reduce unnecessary weight adaptation and improve the transfer capability for iFSD. On this basis, we integrate the knowledge distillation technique using a less resource-consuming approach to alleviate forgetting, and propose a novel clustering-based exemplar selection process to preserve more discriminative features learned previously. As a general and effective method, LEAST can largely improve iFSD performance on various benchmarks.
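The clustering-based exemplar selection can be sketched with a standard k-means pass per class, keeping the sample closest to each cluster centre; the use of scikit-learn and the centroid-nearest rule are assumptions about the procedure.

```python
# Clustering-based exemplar selection sketch for a single class.
import numpy as np
from sklearn.cluster import KMeans

def cluster_exemplars(features, m):
    # features: [N, D] array for one class; m: number of exemplars to keep
    km = KMeans(n_clusters=m, n_init=10).fit(features)
    exemplar_ids = []
    for c in range(m):
        dists = np.linalg.norm(features - km.cluster_centers_[c], axis=1)
        exemplar_ids.append(int(np.argmin(dists)))
    return exemplar_ids
```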
Deep neural networks are prone to catastrophic forgetting when incrementally trained on new classes or new tasks, as adaptation to the new data leads to a drastic decrease in performance on the old classes and tasks. By using a small memory for rehearsal and knowledge distillation, recent methods have proven effective in mitigating catastrophic forgetting. However, due to the limited size of the memory, a large imbalance between the amount of data available for the old and new classes remains, which leads to a deterioration of the model's overall accuracy. To address this issue, we propose the use of a balanced softmax cross-entropy loss and show that it can be combined with existing approaches to improve their performance while, in some cases, also reducing the computational cost of the training process. Experiments on the competitive ImageNet, subImageNet, and CIFAR100 datasets show state-of-the-art results.
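The balanced softmax cross-entropy itself is compact: per-class sample counts are folded into the logits before the standard cross-entropy, so the loss accounts for the old/new imbalance directly. A minimal sketch, assuming the per-class counts are available as a tensor:

```python
# Balanced softmax cross-entropy sketch (training-time loss).
import torch
import torch.nn.functional as F

def balanced_softmax_ce(logits, labels, class_counts):
    # class_counts: 1-D tensor with the number of training samples for each class
    adjusted = logits + torch.log(class_counts.float() + 1e-12)
    return F.cross_entropy(adjusted, labels)
```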
Lifelong learning aims to learn a sequence of tasks without forgetting previously acquired knowledge. However, the involved training data may not be lifelong legitimate due to privacy or copyright reasons. In practical scenarios, for instance, the model owner may wish to enable or disable the knowledge of specific tasks or specific samples from time to time. Unfortunately, such flexible control over knowledge transfer has been largely overlooked in previous incremental and decremental learning methods, even at the level of the problem setup. In this paper, we explore a novel learning scheme, termed Learning with Recoverable Forgetting (LIRF), that explicitly handles task- or sample-specific knowledge removal and recovery. Specifically, LIRF brings in two innovative schemes, knowledge deposit and withdrawal, which allow user-designated knowledge to be isolated from a pre-trained network and injected back when necessary. During the knowledge deposit process, the specified knowledge is extracted from the target network and stored in a deposit module, while the insensitive or general knowledge of the target network is preserved and further augmented. During knowledge withdrawal, the deposited knowledge is added back to the target network. The deposit and withdrawal processes only require a few epochs of finetuning on the removal data, ensuring both data and time efficiency. We conduct experiments on several datasets and demonstrate that the proposed LIRF strategy exhibits encouraging generalization capability.
Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes). Instead, we explore an understudied real-world CIL setting that starts with a strong model pre-trained on a large number of base classes. We hypothesize that a strong base model can provide good representations for novel classes, and that incremental learning can be done with small adaptations. We propose a 2-stage training scheme: i) feature augmentation, cloning part of the backbone and fine-tuning it on the novel data, and ii) fusion, combining the base and novel classifiers into a unified classifier. Experiments show that the proposed method significantly outperforms state-of-the-art CIL methods on the large-scale ImageNet dataset (e.g., +10% overall accuracy over the best prior method). We also propose and analyze understudied practical CIL scenarios, such as base-novel overlap with distribution shift. Our proposed method is robust and generalizes to all analyzed CIL settings. Code is available at https://github.com/amazon-research/sp-cil
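The 2-stage scheme can be pictured with a small sketch, assuming the backbone is split into a shared stem and a top block: stage i) clones the top block and fine-tunes the clone on the novel data, and stage ii) trains a unified classifier on the concatenated base and novel features. Module names and the split point are assumptions.

```python
# Feature-augmentation sketch: a cloned top block is fine-tuned on novel data while the
# original weights stay frozen; the fusion stage trains one classifier on both feature sets.
import copy
import torch
import torch.nn as nn

class AugmentedBackbone(nn.Module):
    def __init__(self, stem, top):
        super().__init__()
        self.stem = stem                          # shared and frozen
        self.base_top = top                       # frozen original top block
        self.novel_top = copy.deepcopy(top)       # clone fine-tuned on novel-class data
        for p in list(self.stem.parameters()) + list(self.base_top.parameters()):
            p.requires_grad = False

    def forward(self, x):
        h = self.stem(x)
        # concatenated features feed the unified classifier trained in the fusion stage
        return torch.cat([self.base_top(h), self.novel_top(h)], dim=1)
```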