Unsupervised lifelong learning refers to the ability to learn over time while remembering previously seen patterns without supervision. Prior works assumed strong prior knowledge about the incoming data (e.g., knowledge of class boundaries), which is unavailable in complex and unpredictable environments. In this paper, motivated by real-world scenarios, we formally define the online unsupervised lifelong learning problem with class-incremental streaming data, which is non-iid and single-pass. The problem is more challenging than existing lifelong learning problems due to the absence of labels and prior knowledge. To address this, we propose Self-Supervised Contrastive Lifelong Learning (SCALE), which extracts and memorizes knowledge. SCALE is designed around three major components: a pseudo-supervised contrastive loss, a self-supervised forgetting loss, and an online memory update with uniform subset selection. All three components are designed to work collaboratively to maximize learning performance. Our loss functions leverage pairwise similarity and thus remove the dependency on supervision or prior knowledge. We conduct comprehensive experiments with SCALE under iid and four non-iid data streams. SCALE outperforms the best state-of-the-art algorithm in every setting, with up to 6.43%, 5.23% and 5.86% improvements in kNN accuracy on the CIFAR-10, CIFAR-100 and SubImageNet datasets.
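The abstract does not spell out the loss, but the core idea — contrastive learning whose positives come from pairwise similarity rather than labels — can be sketched as follows (PyTorch; the function name, threshold and temperature are hypothetical choices, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def pseudo_supervised_contrastive_loss(z, temperature=0.1, sim_threshold=0.8):
    """Contrastive loss whose positives are chosen by pairwise similarity
    instead of labels (a sketch of the idea, not the paper's exact loss)."""
    z = F.normalize(z, dim=1)                      # (N, D) embeddings
    sim = z @ z.t() / temperature                  # scaled pairwise cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    # pseudo-positives: pairs that are already highly similar
    pos_mask = (sim * temperature > sim_threshold) & ~self_mask
    logits = sim.masked_fill(self_mask, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss.mean()

z = torch.randn(32, 128, requires_grad=True)       # a batch of embeddings
print(pseudo_supervised_contrastive_loss(z).item())
```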
Continual learning (CL) aims to develop techniques by which a single model adapts to an increasing number of tasks encountered sequentially, thereby potentially leveraging learning across tasks in a resource-efficient manner. A major challenge for CL systems is catastrophic forgetting, where earlier tasks are forgotten while learning a new one. To address this issue, replay-based CL approaches maintain, and repeatedly retrain on, a small buffer of data selected from the tasks encountered so far. We propose Gradient Coreset Replay (GCR), a novel strategy for replay buffer selection and update based on a carefully designed optimization criterion. Specifically, we select and maintain a "coreset" that closely approximates the gradient of all the data seen so far with respect to the current model parameters, and we discuss key strategies needed for its effective application in the continual learning setting. We show significant gains (2%-4%) over the state of the art in the well-studied offline continual learning setting. Our findings also transfer effectively to online/streaming CL settings, showing gains of up to 5% over existing approaches. Finally, we demonstrate the value of a supervised contrastive loss for continual learning, which yields cumulative gains of up to 5% when combined with our subset selection strategy.
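As a rough illustration of the coreset criterion (not GCR's actual optimization), the sketch below greedily picks a buffer whose summed per-sample gradients approximate the gradient over all data seen so far; the function names and the greedy rule are assumptions:

```python
import torch
import torch.nn as nn

def per_sample_grads(model, xs, ys, loss_fn):
    """Flattened gradient of the loss for each individual sample (naive loop for clarity)."""
    grads = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads.append(torch.cat([p.grad.flatten() for p in model.parameters()]))
    return torch.stack(grads)                        # (N, P)

def greedy_gradient_coreset(grads, k):
    """Greedily pick k samples whose summed gradients best match the full-data gradient."""
    target = grads.sum(dim=0)
    chosen, current = [], torch.zeros_like(target)
    for _ in range(k):
        residual = target - current
        scores = grads @ residual                    # alignment with what is still missing
        if chosen:
            scores[chosen] = float('-inf')           # never pick the same sample twice
        best = int(scores.argmax())
        chosen.append(best)
        current = current + grads[best]
    return chosen

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 3))
xs, ys = torch.randn(40, 10), torch.randint(0, 3, (40,))
g = per_sample_grads(model, xs, ys, nn.CrossEntropyLoss())
print(greedy_gradient_coreset(g, k=8))
```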
General Continual Learning (GCL) aims at learning from non-independent and identically distributed stream data without catastrophic forgetting of old tasks, and without relying on task boundaries during either the training or the testing stage. We reveal that relation and feature deviations are crucial problems behind catastrophic forgetting, where relation deviation refers to the deficiency of the relationship among all classes in knowledge distillation, and feature deviation refers to indiscriminative feature representations. To this end, we propose a Complementary Calibration (CoCa) framework that mines the complementary outputs and features of the model to alleviate the two deviations in the process of GCL. Specifically, we propose a new collaborative distillation approach to address the relation deviation. It distills the model's outputs by utilizing the ensemble dark knowledge of the new model's outputs and the reserved outputs, which maintains the performance of old tasks as well as balancing the relationship among all classes. Furthermore, we explore a collaborative self-supervision idea that leverages pretext tasks and supervised contrastive learning to address the feature deviation problem by learning complete and discriminative features for all classes. Extensive experiments on four popular datasets show that our CoCa framework achieves superior performance against state-of-the-art methods. Code is available at https://github.com/lijincm/CoCa.
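A minimal sketch of the collaborative distillation term, assuming the "ensemble dark knowledge" is simply an average of the current model's logits and the reserved (old) logits on buffered samples; the names and the averaging scheme are assumptions rather than the paper's exact design:

```python
import torch
import torch.nn.functional as F

def collaborative_distillation_loss(new_logits, reserved_logits, T=2.0):
    """Distill the new model towards an ensemble of its own predictions and the
    logits reserved from the old model (a sketch, not CoCa's exact formulation)."""
    ensemble = 0.5 * (new_logits.detach() + reserved_logits)   # dark-knowledge ensemble
    teacher = F.softmax(ensemble / T, dim=1)
    student = F.log_softmax(new_logits / T, dim=1)
    return F.kl_div(student, teacher, reduction='batchmean') * (T * T)

new_logits = torch.randn(16, 10, requires_grad=True)    # current model on buffered samples
reserved_logits = torch.randn(16, 10)                   # logits stored when the samples were buffered
print(collaborative_distillation_loss(new_logits, reserved_logits).item())
```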
Self-supervised models have been shown to produce visual representations comparable to those of their supervised counterparts when trained on unlabeled data at scale. However, their efficacy is catastrophically reduced in a continual learning (CL) scenario where data is presented to the model sequentially. In this paper, we show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for CL by adding a predictor network that maps the current state of the representations to their past state. This enables us to devise a framework for continual self-supervised visual representation learning that (i) significantly improves the quality of the learned representations, (ii) is compatible with several state-of-the-art self-supervised objectives, and (iii) needs little to no hyperparameter tuning. We demonstrate the effectiveness of our approach by training six popular self-supervised models in various CL settings.
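The mechanism can be sketched as follows: a small predictor maps current features onto the frozen past features, and an SSL criterion (here a placeholder negative cosine similarity) is applied between them; module sizes and names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalPredictor(nn.Module):
    """Small MLP that maps current-task features onto the frozen past feature space."""
    def __init__(self, dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, z):
        return self.net(z)

def predictive_distillation_loss(z_current, z_past, predictor):
    """Negative cosine similarity between predicted-current and (frozen) past features;
    the actual framework plugs the chosen SSL objective in here instead."""
    p = F.normalize(predictor(z_current), dim=1)
    z = F.normalize(z_past.detach(), dim=1)        # the past encoder is frozen
    return -(p * z).sum(dim=1).mean()

predictor = TemporalPredictor()
z_now, z_old = torch.randn(32, 128), torch.randn(32, 128)
print(predictive_distillation_loss(z_now, z_old, predictor).item())
```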
Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning -- a setting where not all the data samples are labeled. An underlying issue in this scenario is the model forgetting representations of unlabeled data and overfitting the labeled ones. We leverage the power of nearest-neighbor classifiers to non-linearly partition the feature space and learn a strong representation for the current task, as well as distill relevant information from previous tasks. We perform a thorough experimental evaluation and show that our method outperforms all the existing approaches by large margins, setting a strong state of the art on the continual semi-supervised learning paradigm. For example, on CIFAR100 we surpass several others even when using at least 30 times less supervision (0.8% vs. 25% of annotations).
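As a generic illustration of nearest-neighbour classification in feature space (not the paper's specific classifier), one can classify queries by similarity-weighted voting over a labeled support set; the names and the temperature are assumptions:

```python
import torch
import torch.nn.functional as F

def soft_nearest_neighbour_predict(query_feats, support_feats, support_labels,
                                   num_classes, temperature=0.1):
    """Classify queries by similarity-weighted voting over labeled support features,
    a generic nearest-neighbour classifier in feature space."""
    q = F.normalize(query_feats, dim=1)
    s = F.normalize(support_feats, dim=1)
    sim = q @ s.t() / temperature                         # (Q, S) similarities
    weights = F.softmax(sim, dim=1)
    one_hot = F.one_hot(support_labels, num_classes).float()
    return weights @ one_hot                              # (Q, C) class probabilities

queries = torch.randn(8, 64)
support = torch.randn(100, 64)
labels = torch.randint(0, 10, (100,))
print(soft_nearest_neighbour_predict(queries, support, labels, num_classes=10).argmax(dim=1))
```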
We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs change the underlying training objective to cause less interference with previously learned information. Our proposed version of EBMs for continual learning is simple, efficient, and outperforms baseline methods by a large margin on several benchmarks. Moreover, our proposed contrastive divergence-based training objective can be combined with other continual learning methods, resulting in substantial boosts in their performance. We further show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a useful building block for future continual learning methods.
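A sketch of an energy-based classification objective in this spirit: the energy of the true (x, y) pair is pushed below the energies of a few negative classes, a softmax-over-energies surrogate for a contrastive-divergence-style objective; the architecture and the choice of negative classes are assumptions:

```python
import torch
import torch.nn as nn

class EnergyModel(nn.Module):
    """Scores (x, y) pairs with an energy; lower energy means more compatible."""
    def __init__(self, feat_dim=64, num_classes=10, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))
        self.class_emb = nn.Embedding(num_classes, emb_dim)

    def energy(self, x, y):
        return -(self.encoder(x) * self.class_emb(y)).sum(dim=1)

def contrastive_energy_loss(model, x, y, negative_classes):
    """Push the true-class energy below the energies of negative classes
    (a sketch of the idea, not the paper's exact objective)."""
    pos = model.energy(x, y)                                        # (B,)
    neg = torch.stack([model.energy(x, torch.full_like(y, c)) for c in negative_classes], dim=1)
    logits = torch.cat([-pos.unsqueeze(1), -neg], dim=1)            # lower energy -> higher logit
    targets = torch.zeros(x.size(0), dtype=torch.long)              # the positive sits at index 0
    return nn.functional.cross_entropy(logits, targets)

model = EnergyModel()
x, y = torch.randn(16, 64), torch.randint(0, 10, (16,))
print(contrastive_energy_loss(model, x, y, negative_classes=[0, 1, 2, 3]).item())
```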
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one, under the constraints of limited system size and computational cost, where the main challenge comes from the "catastrophic forgetting" issue -- the inability to remember the learnt knowledge well while learning the new. With a specific focus on the class-incremental OCL scenario, i.e. OCL for classification, recent advances incorporate the contrastive learning technique for learning more generalised feature representations and achieve state-of-the-art performance, but are still unable to fully resolve catastrophic forgetting. In this paper, we follow the strategy of adopting contrastive learning but further introduce a semantically distinct augmentation technique, which leverages strong augmentation to generate more data samples, and we show that treating these samples as semantically different from their original classes (thus related to out-of-distribution samples) in the contrastive learning mechanism helps to alleviate forgetting and facilitate model stability. Moreover, in addition to contrastive learning, the typical classification mechanism and objective (i.e. softmax classifier and cross-entropy loss) are included in our model design for faster convergence and for utilising the label information, equipped with a sampling strategy to tackle the tendency of favouring the new classes (i.e. model bias towards the recently learnt classes). Upon conducting extensive experiments on the CIFAR-10, CIFAR-100, and Mini-Imagenet datasets, our proposed method is shown to achieve superior performance against various baselines.
Modern ML methods excel when training data is iid, large-scale, and well-labeled. Learning in less ideal conditions remains an open challenge. The subfields of few-shot, continual, transfer, and representation learning have made substantial strides in learning under adverse conditions, each providing distinct advantages through its methods and insights. These methods address different challenges, such as data arriving sequentially or scarce training examples; however, the difficult conditions an ML system will face cannot always be anticipated before deployment. General ML systems that can handle the many learning challenges of practical settings are therefore needed. To foster research towards the goal of general ML methods, we introduce a new unified evaluation framework - FLUID (Flexible Sequential Data). FLUID integrates the objectives of few-shot, continual, transfer, and representation learning while enabling comparison and integration of techniques from these subfields. In FLUID, a learner faces a stream of data and must make sequential predictions while choosing how to update itself, adapt quickly to novel classes, and deal with changing data distributions, all while accounting for total compute. We conduct experiments on a broad set of methods, which shed new insight on the strengths and weaknesses of current solutions and indicate new research problems to solve. As a starting point towards more general methods, we present two new baselines that outperform the other evaluated methods on FLUID. Project page: https://raivn.cs.washington.edu/projects/fluid/.
Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised approaches. However, these methods are unable to acquire new knowledge incrementally -- they are, in fact, mostly used only as a pre-training phase over iid data. In this work, we investigate self-supervised methods in continual learning regimes without additional memory or replay. To prevent forgetting of previous knowledge, we propose the use of functional regularization. We show that naive functional regularization, also known as feature distillation, leads to low plasticity and therefore severely limits continual learning performance. To address this problem, we propose Projected Functional Regularization, where a separate projection network ensures that the newly learned feature space preserves information of the previous feature space while allowing the learning of new features. This prevents forgetting while maintaining the plasticity of the learner. Evaluation against other incremental learning approaches applied to self-supervision demonstrates that our method obtains competitive performance in different scenarios and on multiple datasets.
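The contrast between naive feature distillation and the projected variant can be sketched as follows; the projector architecture and the cosine criterion are hypothetical choices:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def naive_feature_distillation(f_new, f_old):
    """Plain feature distillation: forces new features to stay where the old ones were,
    which the abstract argues destroys plasticity."""
    return F.mse_loss(f_new, f_old.detach())

class Projector(nn.Module):
    """Learned map from the new feature space back to the old one (hypothetical sizes)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z):
        return self.net(z)

def projected_functional_regularization(f_new, f_old, projector):
    """Only the projection of the new features must match the old ones, so the new
    space may still reorganize to learn new features (a sketch of the idea)."""
    p = F.normalize(projector(f_new), dim=1)
    z = F.normalize(f_old.detach(), dim=1)
    return -(p * z).sum(dim=1).mean()

proj = Projector()
f_new, f_old = torch.randn(32, 128, requires_grad=True), torch.randn(32, 128)
print(naive_feature_distillation(f_new, f_old).item(),
      projected_functional_regularization(f_new, f_old, proj).item())
```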
Prior studies on self-supervised pre-training focus on the joint training scenario, where massive unlabeled data are assumed to be given as input all at once, and only then is a learner trained. Unfortunately, such a problem setting is often impractical, if not infeasible, since many real-world tasks rely on sequential learning, e.g., data are decentralized or collected in a streaming fashion. In this paper, we conduct the first thorough and dedicated investigation of self-supervised pre-training with streaming data, aiming to shed light on model behavior under this overlooked setting. Specifically, we pre-train over 500 models on four categories of pre-training streaming data from ImageNet and DomainNet, and evaluate them on three types of downstream tasks and 12 different downstream datasets. Our studies show that, somewhat beyond our expectation, with simple data replay or parameter regularization, sequential self-supervised pre-training turns out to be an effective alternative to joint pre-training, as the performance of the former is mostly on par with that of the latter. Moreover, catastrophic forgetting, a common issue in sequential supervised learning, is largely alleviated in sequential self-supervised learning (SSL), which is well justified through our comprehensive empirical analysis of the representations and of the sharpness of minima in the loss landscape. Our findings therefore suggest that, in practice, the cumbersome joint training of SSL can largely be replaced by sequential learning, which in turn enables a much broader range of potential application scenarios.
For artificial learning systems, continual learning from a stream of data over time is essential. Emerging research on supervised continual learning has made great progress, while the study of catastrophic forgetting in unsupervised learning remains largely blank. Among unsupervised learning methods, self-supervised learning shows tremendous potential for visual representation learning without large-scale labeled data. To improve the visual representations learned by self-supervision, larger and more varied data are needed. In the real world, unlabeled data are generated all the time, which provides a huge advantage for self-supervised methods. However, in the current paradigm, packing previous data and current data together and training again is a waste of time and resources. Thus, a continual self-supervised learning method is urgently needed. In this paper, we make the first attempt to realize continual contrastive self-supervised learning by proposing a rehearsal method that keeps a few exemplars from previous data. Instead of directly combining the saved exemplars with the current dataset for training, we leverage self-supervised knowledge distillation to transfer contrastive information from previous data to the current network by mimicking the similarity score distributions inferred by the old network over a set of saved exemplars. Moreover, we build an extra sample queue to help the network distinguish between previous and current data and to prevent mutual interference while learning their own feature representations. Experimental results show that our method performs well on CIFAR-100 and ImageNet-Sub. Compared with baselines that learn tasks without adopting any technique, we improve image classification accuracy by 1.60% on CIFAR-100, 2.86% on ImageNet-Sub and 1.29% on ImageNet-Full under the 10-incremental-step setting.
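The distillation signal can be sketched as matching, over a set of saved exemplars, the similarity distributions produced by the frozen old encoder and the current encoder; the names, temperature and KL direction below are assumptions:

```python
import torch
import torch.nn.functional as F

def similarity_distillation_loss(new_feats, old_feats, new_anchor_feats, old_anchor_feats, T=0.1):
    """Match the distribution of similarities to saved exemplars (anchors) between the
    frozen old network and the current network (a sketch of the rehearsal idea)."""
    def sim_dist(feats, anchors):
        f = F.normalize(feats, dim=1)
        a = F.normalize(anchors, dim=1)
        return F.softmax(f @ a.t() / T, dim=1)            # distribution over exemplars
    p_old = sim_dist(old_feats.detach(), old_anchor_feats.detach())   # teacher
    p_new = sim_dist(new_feats, new_anchor_feats)                     # student
    return F.kl_div(p_new.clamp_min(1e-8).log(), p_old, reduction='batchmean')

# features of the same batch/exemplars under the old (frozen) and current encoders
new_b, old_b = torch.randn(32, 128, requires_grad=True), torch.randn(32, 128)
new_a, old_a = torch.randn(20, 128), torch.randn(20, 128)
print(similarity_distillation_loss(new_b, old_b, new_a, old_a).item())
```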
Continual Learning (CL) is a field dedicated to devising algorithms able to achieve lifelong learning. Overcoming the knowledge disruption of previously acquired concepts, a drawback affecting deep learning models that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the modeled data does not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drops very quickly. Overcoming this limitation is fundamental, as it would allow us to build truly intelligent systems showing both stability and plasticity. Secondly, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with the newly updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use a memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor than the quality of the data. Secondly, we propose one of the early works on incremental learning for ViT architectures, comparing functional, weight and attention regularization approaches and proposing an effective novel asymmetric loss. Finally, we present a study on pretraining and how it affects performance in Continual Learning, raising some questions about the effective progression of the field. We then conclude with some future directions and closing remarks.
Classical machine learners are designed only to solve a given task without the ability to adopt new emerging tasks or classes, whereas such a capability is more practical and human-like in the real world. To address this shortcoming, continual machine learners are elaborated to commendably handle streams of tasks, with domain and class shifts among the different tasks. In this paper, we propose a contrastive continual learning method that is capable of handling multiple continual learning scenarios. Specifically, we align the current and previous representation spaces by means of feature propagation and contrastive representation learning to bridge the domain shifts among distinct tasks. To further mitigate the class-wise shift of the feature representations, a supervised contrastive loss is exploited to make the example embeddings of the same class closer than those of different classes. Extensive experimental results demonstrate the outstanding performance of the proposed method on six continual learning benchmarks in comparison with a group of cutting-edge continual learning methods.
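The supervised contrastive term mentioned here is typically of the following standard form, where embeddings sharing a label are pulled together; this is a generic formulation, not necessarily the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: embeddings of the same class act as positives,
    all other samples act as negatives (standard formulation)."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    logits = sim.masked_fill(self_mask, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    return -((log_prob * pos_mask).sum(dim=1) / pos_count).mean()

feats = torch.randn(32, 128, requires_grad=True)
labels = torch.randint(0, 5, (32,))
print(supervised_contrastive_loss(feats, labels).item())
```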
Online continual learning in the wild is a very difficult task in machine learning. The non-stationarity in online continual learning can cause catastrophic forgetting in neural networks. Specifically, online continual learning for autonomous driving with the SODA10M dataset exhibits the additional problem of an extremely long-tailed distribution with continuous distribution shifts. To address these problems, we propose multiple deep metric representation learning via both contrastive and supervised contrastive learning, together with soft-label distillation, to improve model generalization. Moreover, we exploit a modified class-balanced focal loss for sensitive penalization with respect to class imbalance and easy versus hard samples. We also store some samples for rehearsal under the guidance of an uncertainty metric, and perform online and periodic memory updates. Our proposed method achieves considerable generalization, with 64.01% average mean class accuracy (AMCA) on validation and 64.53% AMCA on test data.
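A common way to combine class balancing with focal loss is the "effective number of samples" weighting sketched below; the paper uses its own modified variant, so the weighting scheme, beta and gamma here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def class_balanced_focal_loss(logits, targets, samples_per_class, beta=0.999, gamma=2.0):
    """Focal loss weighted by the 'effective number of samples' per class, a common
    recipe for long-tailed data (a sketch, not the paper's exact modified loss)."""
    effective_num = 1.0 - torch.pow(beta, samples_per_class.float())
    class_weights = (1.0 - beta) / effective_num
    class_weights = class_weights / class_weights.sum() * len(samples_per_class)
    ce = F.cross_entropy(logits, targets, reduction='none')       # per-sample cross-entropy
    pt = torch.exp(-ce)                                           # probability of the true class
    focal = (1.0 - pt) ** gamma * ce                              # down-weight easy samples
    return (class_weights[targets] * focal).mean()

logits = torch.randn(16, 6, requires_grad=True)
targets = torch.randint(0, 6, (16,))
counts = torch.tensor([5000, 1200, 300, 80, 20, 5])               # long-tailed class counts
print(class_balanced_focal_loss(logits, targets, counts).item())
```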
We explore task-free continual learning (CL), in which a model is trained to avoid catastrophic forgetting in the absence of explicit task boundaries or identities. Among the many efforts on task-free CL, a notable family of approaches is memory-based, storing and replaying a subset of training examples. However, since the CL model is continually updated, the utility of the stored examples can diminish over time. Here, we propose Gradient-based Memory Editing (GMED), a framework for editing stored examples in continuous input space via gradient updates, in order to create more "challenging" examples for replay. GMED-edited examples remain similar to their unedited forms, but can yield increased loss in the upcoming model updates, thereby making future replay more effective in overcoming catastrophic forgetting. By construction, GMED can be seamlessly applied together with other memory-based CL algorithms for further improvement. Experiments validate the effectiveness of GMED, and our best method significantly outperforms baselines and the previous state of the art on five datasets. Code can be found at https://github.com/ink-usc/gmed.
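The editing step can be sketched as a single gradient-ascent step on the stored inputs, which makes them slightly harder for the current model while keeping them close to the originals; the step size and function names are assumptions:

```python
import torch
import torch.nn as nn

def gradient_edit_examples(model, x_mem, y_mem, loss_fn, alpha=0.1):
    """Nudge stored examples along the input-gradient of the loss so they become slightly
    'harder' for the current model while staying close to the originals (a sketch)."""
    x = x_mem.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y_mem)
    grad_x, = torch.autograd.grad(loss, x)
    with torch.no_grad():
        x_edited = x + alpha * grad_x                 # ascend the loss in input space
    return x_edited.detach()

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
x_mem, y_mem = torch.randn(8, 10), torch.randint(0, 3, (8,))
x_new = gradient_edit_examples(model, x_mem, y_mem, nn.CrossEntropyLoss())
print((x_new - x_mem).norm().item())                  # the edits are small but non-zero
```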
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
Continual Learning (CL) is an emerging machine learning paradigm that aims to learn from a continuous stream of tasks without forgetting knowledge learned from the previous tasks. To avoid performance decrease caused by forgetting, prior studies exploit episodic memory (EM), which stores a subset of the past observed samples while learning from new non-i.i.d. data. Despite the promising results, since CL is often assumed to execute on mobile or IoT devices, the EM size is bounded by the small hardware memory capacity and makes it infeasible to meet the accuracy requirements for real-world applications. Specifically, all prior CL methods discard samples overflowed from the EM and can never retrieve them back for subsequent training steps, incurring loss of information that would exacerbate catastrophic forgetting. We explore a novel hierarchical EM management strategy to address the forgetting issue. In particular, in mobile and IoT devices, real-time data can be stored not just in high-speed RAMs but in internal storage devices as well, which offer significantly larger capacity than the RAMs. Based on this insight, we propose to exploit the abundant storage to preserve past experiences and alleviate the forgetting by allowing CL to efficiently migrate samples between memory and storage without being interfered by the slow access speed of the storage. We call it Carousel Memory (CarM). As CarM is complementary to existing CL methods, we conduct extensive evaluations of our method with seven popular CL methods and show that CarM significantly improves the accuracy of the methods across different settings by large margins in final average accuracy (up to 28.4%) while retaining the same training efficiency.
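A toy sketch of the two-tier idea: a small fast buffer backed by a much larger storage pool, with evicted samples kept in storage and periodically swapped back instead of being discarded; the eviction and swap policies here are simplifications, not CarM's actual ones:

```python
import random

class TwoTierReplayBuffer:
    """Toy two-tier buffer: a small fast in-memory buffer backed by a much larger (slow)
    storage pool. Evicted samples go to storage instead of being discarded, and swap()
    periodically rotates stored samples back in (a simplification, not CarM itself)."""
    def __init__(self, mem_capacity=100, storage_capacity=10000):
        self.memory, self.storage = [], []
        self.mem_capacity, self.storage_capacity = mem_capacity, storage_capacity

    def add(self, sample):
        if len(self.memory) < self.mem_capacity:
            self.memory.append(sample)
            return
        idx = random.randrange(self.mem_capacity)    # random eviction from the fast buffer
        evicted, self.memory[idx] = self.memory[idx], sample
        if len(self.storage) < self.storage_capacity:
            self.storage.append(evicted)             # keep the evicted sample for later

    def swap(self, k=10):
        """Exchange k samples between the fast buffer and storage."""
        k = min(k, len(self.storage), len(self.memory))
        for _ in range(k):
            i = random.randrange(len(self.memory))
            j = random.randrange(len(self.storage))
            self.memory[i], self.storage[j] = self.storage[j], self.memory[i]

    def sample_batch(self, batch_size):
        return random.sample(self.memory, min(batch_size, len(self.memory)))

buf = TwoTierReplayBuffer(mem_capacity=5, storage_capacity=50)
for t in range(100):
    buf.add(("sample", t))
buf.swap(k=3)
print(buf.sample_batch(3))
```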
The ubiquity of edge devices has led to a growing amount of unlabeled data produced at the edge. Deep learning models deployed on edge devices are required to learn from these unlabeled data to continuously improve accuracy. Self-supervised representation learning has achieved promising performances using centralized unlabeled data. However, the increasing awareness of privacy protection limits centralizing the distributed unlabeled image data on edge devices. While federated learning has been widely adopted to enable distributed machine learning with privacy preservation, without a data selection method to efficiently select streaming data, the traditional federated learning framework fails to handle these huge amounts of decentralized unlabeled data with limited storage resources on edge. To address these challenges, we propose a Federated on-device Contrastive learning framework with Coreset selection, which we call FedCoCo, to automatically select a coreset that consists of the most representative samples into the replay buffer on each device. It preserves data privacy as each client does not share raw data while learning good visual representations. Experiments demonstrate the effectiveness and significance of the proposed method in visual representation learning.
Online Class Incremental learning (CIL) is a challenging setting in Continual Learning (CL), wherein data of new tasks arrive in incoming streams and online learning models need to handle incoming data streams without revisiting previous ones. Existing works used a single centroid adapted with incoming data streams to characterize a class. This approach possibly exposes limitations when the incoming data stream of a class is naturally multimodal. To address this issue, in this work, we first propose an online mixture model learning approach based on nice properties of the mature optimal transport theory (OT-MM). Specifically, the centroids and covariance matrices of the mixture model are adapted incrementally according to incoming data streams. The advantages are two-fold: (i) we can characterize more accurately complex data streams and (ii) by using centroids for each class produced by OT-MM, we can estimate the similarity of an unseen example to each class more reasonably when doing inference. Moreover, to combat the catastrophic forgetting in the CIL scenario, we further propose Dynamic Preservation. Particularly, after performing the dynamic preservation technique across data streams, the latent representations of the classes in the old and new tasks become more condensed themselves and more separate from each other. Together with a contraction feature extractor, this technique facilitates the model in mitigating the catastrophic forgetting. The experimental results on real-world datasets show that our proposed method can significantly outperform the current state-of-the-art baselines.
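A much-simplified stand-in for the online mixture updates: per-class means and scatter matrices maintained incrementally over the stream, with nearest-centroid inference; the OT-based responsibilities and multi-centroid components of the actual method are omitted:

```python
import torch

class OnlineClassCentroids:
    """Incrementally maintained per-class means and scatter matrices over streaming
    features, with nearest-centroid inference (a simplified stand-in for OT-MM)."""
    def __init__(self, num_classes, dim):
        self.count = torch.zeros(num_classes)
        self.mean = torch.zeros(num_classes, dim)
        self.m2 = torch.zeros(num_classes, dim, dim)      # running scatter matrix per class

    def update(self, feats, labels):
        for x, y in zip(feats, labels):
            c = int(y)
            self.count[c] += 1
            delta = x - self.mean[c]
            self.mean[c] += delta / self.count[c]
            self.m2[c] += torch.outer(delta, x - self.mean[c])   # Welford-style update

    def predict(self, feats):
        dists = torch.cdist(feats, self.mean)             # distance to every class centroid
        return dists.argmin(dim=1)

stats = OnlineClassCentroids(num_classes=5, dim=16)
for _ in range(10):                                       # simulate an incoming stream
    batch, labels = torch.randn(32, 16), torch.randint(0, 5, (32,))
    stats.update(batch, labels)
print(stats.predict(torch.randn(4, 16)))
```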
Continual learning (CL) over non-stationary data streams remains one of the long-standing challenges for deep neural networks (DNNs), as they are prone to catastrophic forgetting. CL models can benefit from self-supervised pre-training, as it enables learning more generalizable task-agnostic features. However, the effect of self-supervised pre-training diminishes as the length of the task sequence increases. Furthermore, the domain shift between the pre-training data distribution and the task distribution reduces the generalizability of the learned representations. To address these limitations, we propose Task-Agnostic Representation Consolidation (TARC), a two-stage training paradigm for CL that intertwines task-agnostic and task-specific learning, whereby self-supervised training is followed by supervised learning for each task. To further restrict deviation from the representations learned in the self-supervised stage, we employ a task-agnostic auxiliary loss during the supervised stage. We show that our training paradigm can be easily added to memory- or regularization-based approaches and provides consistent performance gains across more challenging CL settings. We further show that it leads to more robust and well-calibrated models.
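The two-stage recipe can be sketched as a per-task training loop: a self-supervised warm-up followed by supervised training regularized with the same task-agnostic loss; the loaders, losses and hyperparameters below are placeholders, not the paper's exact setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_task_tarc(encoder, classifier, ssl_loader, sup_loader,
                    ssl_loss_fn, aux_weight=0.1, lr=1e-3):
    """Two-stage recipe per task: self-supervised warm-up of the encoder, then supervised
    training with a task-agnostic auxiliary term (a sketch of the paradigm)."""
    opt = torch.optim.SGD(list(encoder.parameters()) + list(classifier.parameters()), lr=lr)

    # Stage 1: task-agnostic, self-supervised training on the task's (unlabeled) data.
    for x1, x2 in ssl_loader:                          # two augmented views per sample
        opt.zero_grad()
        ssl_loss_fn(encoder(x1), encoder(x2)).backward()
        opt.step()

    # Stage 2: task-specific supervised training, regularized by the same auxiliary loss.
    for (x1, x2), y in sup_loader:
        opt.zero_grad()
        feats = encoder(x1)
        loss = F.cross_entropy(classifier(feats), y) + aux_weight * ssl_loss_fn(feats, encoder(x2))
        loss.backward()
        opt.step()

def simple_ssl_loss(z1, z2):                           # placeholder: negative cosine similarity
    return -F.cosine_similarity(z1, z2, dim=1).mean()

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
classifier = nn.Linear(64, 10)
ssl_loader = [(torch.randn(8, 32), torch.randn(8, 32)) for _ in range(3)]
sup_loader = [((torch.randn(8, 32), torch.randn(8, 32)), torch.randint(0, 10, (8,))) for _ in range(3)]
train_task_tarc(encoder, classifier, ssl_loader, sup_loader, simple_ssl_loss)
```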