When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaption techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.
translated by 谷歌翻译
In this paper we introduce a model of lifelong learning, based on a Network of Experts. New tasks / experts are learned and added to the model sequentially, building on what was learned before. To ensure scalability of this process, data from previous tasks cannot be stored and hence is not available when learning a new task. A critical issue in such context, not addressed in the literature so far, relates to the decision which expert to deploy at test time. We introduce a set of gating autoencoders that learn a representation for the task at hand, and, at test time, automatically forward the test sample to the relevant expert. This also brings memory efficiency as only one expert network has to be loaded into memory at any given time. Further, the autoencoders inherently capture the relatedness of one task to another, based on which the most relevant prior model to be used for training a new expert, with fine-tuning or learningwithout-forgetting, can be selected. We evaluate our method on image classification and video prediction problems.
translated by 谷歌翻译
There is a growing interest in learning data representations that work well for many different types of problems and data. In this paper, we look in particular at the task of learning a single visual representation that can be successfully utilized in the analysis of very different types of images, from dog breeds to stop signs and digits. Inspired by recent work on learning networks that predict the parameters of another, we develop a tunable deep network architecture that, by means of adapter residual modules, can be steered on the fly to diverse visual domains. Our method achieves a high degree of parameter sharing while maintaining or even improving the accuracy of domain-specific representations. We also introduce the Visual Decathlon Challenge, a benchmark that evaluates the ability of representations to capture simultaneously ten very different visual domains and measures their ability to perform well uniformly.
translated by 谷歌翻译
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
translated by 谷歌翻译
Humans can learn in a continuous manner. Old rarely utilized knowledge can be overwritten by new incoming information while important, frequently used knowledge is prevented from being erased. In artificial learning systems, lifelong learning so far has focused mainly on accumulating knowledge over tasks and overcoming catastrophic forgetting. In this paper, we argue that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively. Inspired by neuroplasticity, we propose a novel approach for lifelong learning, coined Memory Aware Synapses (MAS). It computes the importance of the parameters of a neural network in an unsupervised and online manner. Given a new sample which is fed to the network, MAS accumulates an importance measure for each parameter of the network, based on how sensitive the predicted output function is to a change in this parameter. When learning a new task, changes to important parameters can then be penalized, effectively preventing important knowledge related to previous tasks from being overwritten. Further, we show an interesting connection between a local version of our method and Hebb's rule, which is a model for the learning process in the brain. We test our method on a sequence of object recognition tasks and on the challenging problem of learning an embedding for predicting <subject, predicate, object> triplets. We show state-of-the-art performance and, for the first time, the ability to adapt the importance of the parameters based on unlabeled data towards what the network needs (not) to forget, which may vary depending on test conditions.
translated by 谷歌翻译
This paper presents a method for adding multiple tasks to a single deep neural network while avoiding catastrophic forgetting. Inspired by network pruning techniques, we exploit redundancies in large deep networks to free up parameters that can then be employed to learn new tasks. By performing iterative pruning and network re-training, we are able to sequentially "pack" multiple tasks into a single network while ensuring minimal drop in performance and minimal storage overhead. Unlike prior work that uses proxy losses to maintain accuracy on older tasks, we always optimize for the task at hand. We perform extensive experiments on a variety of network architectures and largescale datasets, and observe much better robustness against catastrophic forgetting than prior work. In particular, we are able to add three fine-grained classification tasks to a single ImageNet-trained VGG-16 network and achieve accuracies close to those of separately trained networks for each task. Code available at https://github.com/ arunmallya/packnet
translated by 谷歌翻译
Image classification with small datasets has been an active research area in the recent past. However, as research in this scope is still in its infancy, two key ingredients are missing for ensuring reliable and truthful progress: a systematic and extensive overview of the state of the art, and a common benchmark to allow for objective comparisons between published methods. This article addresses both issues. First, we systematically organize and connect past studies to consolidate a community that is currently fragmented and scattered. Second, we propose a common benchmark that allows for an objective comparison of approaches. It consists of five datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). We use this benchmark to re-evaluate the standard cross-entropy baseline and ten existing methods published between 2017 and 2021 at renowned venues. Surprisingly, we find that thorough hyper-parameter tuning on held-out validation data results in a highly competitive baseline and highlights a stunted growth of performance over the years. Indeed, only a single specialized method dating back to 2019 clearly wins our benchmark and outperforms the baseline classifier.
translated by 谷歌翻译
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model-a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and Im-ageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.
translated by 谷歌翻译
The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. Since 2012, there have been significant advances in representation capabilities of the models and computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen if we increase the dataset size by 10× or 100×? This paper takes a step towards clearing the clouds of mystery surrounding the relationship between 'enormous data' and visual deep learning. By exploiting the JFT-300M dataset which has more than 375M noisy labels for 300M images, we investigate how the performance of current vision tasks would change if this data was used for representation learning. Our paper delivers some surprising (and some expected) findings. First, we find that the performance on vision tasks increases logarithmically based on volume of training data size. Second, we show that representation learning (or pretraining) still holds a lot of promise. One can improve performance on many vision tasks by just training a better base model. Finally, as expected, we present new state-of-theart results for different vision tasks including image classification, object detection, semantic segmentation and human pose estimation. Our sincere hope is that this inspires vision community to not undervalue the data and develop collective efforts in building larger datasets.
translated by 谷歌翻译
Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer. An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks. However, this hypothesis has never been systematically tested. Here, we compare the performance of 16 classification networks on 12 image classification datasets. We find that, when networks are used as fixed feature extractors or fine-tuned, there is a strong correlation between ImageNet accuracy and transfer accuracy (r = 0.99 and 0.96, respectively). In the former setting, we find that this relationship is very sensitive to the way in which networks are trained on ImageNet; many common forms of regularization slightly improve ImageNet accuracy but yield penultimate layer features that are much worse for transfer learning. Additionally, we find that, on two small fine-grained image classification datasets, pretraining on ImageNet provides minimal benefits, indicating the learned features from Ima-geNet do not transfer well to fine-grained tasks. Together, our results show that ImageNet architectures generalize well across datasets, but ImageNet features are less general than previously suggested.
translated by 谷歌翻译
基于正规化的方法有利于缓解类渐进式学习中的灾难性遗忘问题。由于缺乏旧任务图像,如果分类器在新图像上产生类似的输出,它们通常会假设旧知识得到很好的保存。在本文中,我们发现他们的效果很大程度上取决于旧课程的性质:它们在彼此之间容易区分的课程上工作,但可能在更细粒度的群体上失败,例如,男孩和女孩。在SPIRIT中,此类方法将新数据项目投入到完全连接层中的权重向量中跨越的特征空间,对应于旧类。由此产生的预测在细粒度的旧课程上是相似的,因此,新分类器将逐步失去这些课程的歧视能力。为了解决这个问题,我们提出了一种无记忆生成的重播策略,通过直接从旧分类器生成代表性的旧图像并结合新的分类器培训的新数据来保留细粒度的旧阶级特征。为了解决所产生的样本的均化问题,我们还提出了一种分集体损失,使得产生的样品之间的Kullback Leibler(KL)发散。我们的方法最好是通过先前的基于正规化的方法补充,证明是为了易于区分的旧课程有效。我们验证了上述关于CUB-200-2011,CALTECH-101,CIFAR-100和微小想象的设计和见解,并表明我们的策略优于现有的无记忆方法,并具有清晰的保证金。代码可在https://github.com/xmengxin/mfgr获得
translated by 谷歌翻译
我们提出了一个统一的查看,即通过通用表示,一个深层神经网络共同学习多个视觉任务和视觉域。同时学习多个问题涉及最大程度地减少具有不同幅度和特征的多个损失函数的加权总和,从而导致一个损失的不平衡状态,与学习每个问题的单独模型相比,一个损失的不平衡状态主导了优化和差的结果。为此,我们提出了通过小容量适配器将多个任务/特定于域网络的知识提炼到单个深神经网络中的知识。我们严格地表明,通用表示在学习NYU-V2和CityScapes中多个密集的预测问题方面实现了最新的表现,来自视觉Decathlon数据集中的不同域中的多个图像分类问题以及MetadataSet中的跨域中的几个域中学习。最后,我们还通过消融和定性研究进行多次分析。
translated by 谷歌翻译
Despite significant accuracy improvement in convolutional neural networks (CNN) based object detectors, they often require prohibitive runtimes to process an image for real-time applications. State-of-the-art models often use very deep networks with a large number of floating point operations. Efforts such as model compression learn compact models with fewer number of parameters, but with much reduced accuracy. In this work, we propose a new framework to learn compact and fast object detection networks with improved accuracy using knowledge distillation [20] and hint learning [34]. Although knowledge distillation has demonstrated excellent improvements for simpler classification setups, the complexity of detection poses new challenges in the form of regression, region proposals and less voluminous labels. We address this through several innovations such as a weighted cross-entropy loss to address class imbalance, a teacher bounded loss to handle the regression component and adaptation layers to better learn from intermediate teacher distributions. We conduct comprehensive empirical evaluation with different distillation configurations over multiple datasets including PASCAL, KITTI, ILSVRC and MS-COCO. Our results show consistent improvement in accuracy-speed trade-offs for modern multi-class detection models.
translated by 谷歌翻译
深度神经网络在学习新任务时遭受灾难性遗忘的主要限制。在本文中,我们专注于语义细分中的课堂持续学习,其中新类别随着时间的推移,而在未保留以前的训练数据。建议的持续学习方案塑造了潜在的空间来减少遗忘,同时提高了对新型课程的识别。我们的框架是由三种新的组件驱动,我们还毫不费力地结合现有的技术。首先,匹配的原型匹配在旧类上强制执行潜在空间一致性,约束编码器在后续步骤中为先前看到的类生成类似的潜在潜在表示。其次,特征稀疏性允许在潜在空间中腾出空间以容纳新型课程。最后,根据他们的语义,在统一的同时撕裂不同类别的语义,对形成对比的学习。对Pascal VOC2012和ADE20K数据集的广泛评估展示了我们方法的有效性,显着优于最先进的方法。
translated by 谷歌翻译
We investigate methods for combining multiple selfsupervised tasks-i.e., supervised tasks where data can be collected without manual labeling-in order to train a single visual representation. First, we provide an apples-toapples comparison of four different self-supervised tasks using the very deep ResNet-101 architecture. We then combine tasks to jointly train a network. We also explore lasso regularization to encourage the network to factorize the information in its representation, and methods for "harmonizing" network inputs in order to learn a more unified representation. We evaluate all methods on ImageNet classification, PASCAL VOC detection, and NYU depth prediction. Our results show that deeper networks work better, and that combining tasks-even via a naïve multihead architecture-always improves performance. Our best joint network nearly matches the PASCAL performance of a model pre-trained on ImageNet classification, and matches the ImageNet network on NYU depth prediction.
translated by 谷歌翻译
Multi-task learning in Convolutional Networks has displayed remarkable success in the field of recognition. This success can be largely attributed to learning shared representations from multiple supervisory tasks. However, existing multi-task approaches rely on enumerating multiple network architectures specific to the tasks at hand, that do not generalize. In this paper, we propose a principled approach to learn shared representations in ConvNets using multitask learning. Specifically, we propose a new sharing unit: "cross-stitch" unit. These units combine the activations from multiple networks and can be trained end-to-end. A network with cross-stitch units can learn an optimal combination of shared and task-specific representations. Our proposed method generalizes across multiple tasks and shows dramatically improved performance over baseline methods for categories with few training examples.
translated by 谷歌翻译
持续学习(CL)旨在制定模仿人类能力顺序学习新任务的能力,同时能够保留从过去经验获得的知识。在本文中,我们介绍了内存约束在线连续学习(MC-OCL)的新问题,这对存储器开销对可能算法可以用于避免灾难性遗忘的记忆开销。最多,如果不是全部,之前的CL方法违反了这些约束,我们向MC-OCL提出了一种算法解决方案:批量蒸馏(BLD),基于正则化的CL方法,有效地平衡了稳定性和可塑性,以便学习数据流,同时保留通过蒸馏解决旧任务的能力。我们在三个公开的基准测试中进行了广泛的实验评估,经验证明我们的方法成功地解决了MC-OCL问题,并实现了需要更高内存开销的先前蒸馏方法的可比准确性。
translated by 谷歌翻译
卷积神经网络在分类方面表现出了显着的结果,但在即时学习新事物方面挣扎。我们提出了一种新颖的彩排方法,其中深度神经网络正在不断学习新的看不见的对象类别,而无需保存任何先前序列的数据。我们的方法称为召回,因为网络通过在培训新类别之前计算旧类别的逻辑来回忆类别。然后在培训期间使用这些,以避免更改旧类别。对于每个新序列,都会添加一个新的头部以适应新类别。为了减轻遗忘,我们提出了一种正规化策略,在该策略中我们用回归替换分类。此外,对于已知类别,我们提出了一个玛哈拉氏症损失,其中包括差异,以说明已知类别和未知类别之间的密度变化。最后,我们提供了一个用于持续学习的新颖数据集,尤其是适用于移动机器人(Hows-CL-25)上的对象识别的数据集,其中包括25个家庭对象类别的150,795个合成图像。我们的方法回忆起优于Core50和ICIFAR-100上的艺术现状,并在HOWS-CL-25上取得了最佳性能。
translated by 谷歌翻译
持续深度学习的领域是一个新兴领域,已经取得了很多进步。但是,同时仅根据图像分类的任务进行了大多数方法,这在智能车辆领域无关。直到最近才提出了班级开展语义分割的方法。但是,所有这些方法都是基于某种形式的知识蒸馏。目前,尚未对基于重播的方法进行调查,这些方法通常在连续的环境中用于对象识别。同时,尽管无监督的语义分割的域适应性获得了很多吸引力,但在持续环境中有关域内收入学习的调查并未得到充分研究。因此,我们工作的目的是评估和调整已建立的解决方案,以连续对象识别语义分割任务,并为连续语义分割的任务提供基线方法和评估协议。首先,我们介绍了类和域内的分割的评估协议,并分析了选定的方法。我们表明,语义分割变化的任务的性质在减轻与图像分类相比最有效的方法中最有效。特别是,在课堂学习中,学习知识蒸馏被证明是至关重要的工具,而在域内,学习重播方法是最有效的方法。
translated by 谷歌翻译