Deep neural networks have been shown to outperform traditional machine learning. However, deep networks lack generalisability, that is, they do not perform as well on new (test) sets drawn from a different distribution due to domain shift. To address this known issue, several transfer learning approaches have been proposed, where the knowledge of a trained model is transferred into another to improve performance with different data. However, most of these approaches require additional training steps, or they suffer from catastrophic forgetting, where the trained model overwrites previously learned knowledge. We address both problems with a novel transfer learning approach that uses network aggregation. We train dataset-specific networks together with an aggregation network in a unified framework. The loss function includes two main components: a task-specific loss (e.g. cross-entropy) and an aggregation loss. The proposed aggregation loss allows our model to learn how the parameters of the trained deep networks can be combined with an aggregation operator. We demonstrate that the proposed approach learns model aggregation at test time without any further training step, reducing the burden of transfer learning to a simple arithmetic operation. The proposed approach achieves comparable performance w.r.t. the baseline. Furthermore, if the aggregation operator has an inverse, we show that our model also inherently allows for selective forgetting, i.e., the aggregated model can forget one of the datasets it was trained on while retaining information on the others.
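A minimal sketch of how such a joint objective could be wired up, assuming the aggregation operator is a plain parameter average and using PyTorch's functional_call so that the aggregation loss back-propagates into the dataset-specific networks; the weighting lam and the model/batch structure are illustrative, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def aggregate(models):
    # Aggregation operator: element-wise mean of the dataset-specific parameters.
    names = [n for n, _ in models[0].named_parameters()]
    return {n: torch.stack([dict(m.named_parameters())[n] for m in models]).mean(0)
            for n in names}

def joint_loss(models, batches, lam=1.0):
    """Task-specific cross-entropy per dataset plus an aggregation loss that
    evaluates the averaged parameters on every dataset, so the networks learn
    to remain aggregatable."""
    agg_params = aggregate(models)
    task_loss = sum(F.cross_entropy(m(x), y) for m, (x, y) in zip(models, batches))
    agg_loss = sum(F.cross_entropy(functional_call(models[0], agg_params, (x,)), y)
                   for (x, y) in batches)
    return task_loss + lam * agg_loss
```

With a mean as the operator, selectively forgetting dataset j at test time reduces to re-averaging the remaining parameter sets, i.e. a purely arithmetic operation.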
We propose a novel deep network architecture for lifelong learning which we refer to as Dynamically Expandable Network (DEN), that can dynamically decide its network capacity as it trains on a sequence of tasks, to learn a compact overlapping knowledge sharing structure among tasks. DEN is efficiently trained in an online manner by performing selective retraining, dynamically expands network capacity upon arrival of each task with only the necessary number of units, and effectively prevents semantic drift by splitting/duplicating units and timestamping them. We validate DEN on multiple public datasets under lifelong learning scenarios, on which it not only significantly outperforms existing lifelong learning methods for deep networks, but also achieves the same level of performance as the batch counterparts with a substantially smaller number of parameters. Further, the obtained network, fine-tuned on all tasks, achieves significantly better performance than the batch models, which shows that it can be used to estimate the optimal network structure even when all tasks are available in the first place.
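A heavily simplified sketch of only the capacity-expansion decision described above, assuming a widening step of k units whenever the new task's loss stays above a threshold; selective retraining, splitting and timestamping are omitted, and all values are placeholders.

```python
import torch
import torch.nn as nn

def maybe_expand(layer: nn.Linear, new_task_loss: float, tau: float = 0.1, k: int = 10) -> nn.Linear:
    """If the new task cannot be fit well with the existing units, widen the layer
    by k units; old units keep their weights, only the new ones are trained from scratch."""
    if new_task_loss <= tau:
        return layer                                   # existing capacity suffices
    expanded = nn.Linear(layer.in_features, layer.out_features + k)
    with torch.no_grad():
        expanded.weight[:layer.out_features] = layer.weight
        expanded.bias[:layer.out_features] = layer.bias
    return expanded
```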
This paper presents a method for adding multiple tasks to a single deep neural network while avoiding catastrophic forgetting. Inspired by network pruning techniques, we exploit redundancies in large deep networks to free up parameters that can then be employed to learn new tasks. By performing iterative pruning and network re-training, we are able to sequentially "pack" multiple tasks into a single network while ensuring minimal drop in performance and minimal storage overhead. Unlike prior work that uses proxy losses to maintain accuracy on older tasks, we always optimize for the task at hand. We perform extensive experiments on a variety of network architectures and large-scale datasets, and observe much better robustness against catastrophic forgetting than prior work. In particular, we are able to add three fine-grained classification tasks to a single ImageNet-trained VGG-16 network and achieve accuracies close to those of separately trained networks for each task. Code available at https://github.com/arunmallya/packnet
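A rough sketch of the packing step for a single parameter tensor, assuming plain magnitude pruning and one boolean mask per task; the pruning fraction and the surrounding training loop are placeholders.

```python
import torch

@torch.no_grad()
def pack_task(weight, free_mask, prune_fraction=0.5):
    """After training a task on the currently free weights, keep the largest-magnitude
    fraction of them for this task and release the rest for future tasks.
    weight:    a parameter tensor of the shared network
    free_mask: True where the weight is not yet owned by any earlier task
    Returns (task_mask, new_free_mask)."""
    free_vals = weight[free_mask].abs()
    k = int(free_vals.numel() * prune_fraction)
    threshold = free_vals.kthvalue(k).values if k > 0 else free_vals.min() - 1
    task_mask = free_mask & (weight.abs() > threshold)
    weight[free_mask & ~task_mask] = 0.0      # pruned weights are zeroed and stay free
    new_free_mask = free_mask & ~task_mask
    return task_mask, new_free_mask
```

At inference for task t, the union of the masks of tasks 1..t selects the active weights, so a single network stores all tasks with only the masks as overhead.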
There is a growing interest in learning data representations that work well for many different types of problems and data. In this paper, we look in particular at the task of learning a single visual representation that can be successfully utilized in the analysis of very different types of images, from dog breeds to stop signs and digits. Inspired by recent work on learning networks that predict the parameters of another, we develop a tunable deep network architecture that, by means of adapter residual modules, can be steered on the fly to diverse visual domains. Our method achieves a high degree of parameter sharing while maintaining or even improving the accuracy of domain-specific representations. We also introduce the Visual Decathlon Challenge, a benchmark that evaluates the ability of representations to capture simultaneously ten very different visual domains and measures their ability to perform well uniformly.
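The adapter idea can be pictured as a small per-domain 1x1 convolution added in residual fashion around shared layers; the sketch below is a generic rendering with illustrative shapes, not the exact module layout of the paper.

```python
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Domain-specific 1x1 conv applied as a residual correction: h + adapter(h)."""
    def __init__(self, channels):
        super().__init__()
        self.adapter = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        nn.init.zeros_(self.adapter.weight)   # start as an identity mapping

    def forward(self, h):
        return h + self.bn(self.adapter(h))

class AdaptedBlock(nn.Module):
    """One shared convolution wrapped with a bank of per-domain adapters."""
    def __init__(self, shared_conv, num_domains, channels):
        super().__init__()
        self.shared = shared_conv              # parameters shared by all domains
        self.adapters = nn.ModuleList(ResidualAdapter(channels) for _ in range(num_domains))

    def forward(self, x, domain: int):
        return self.adapters[domain](self.shared(x))
```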
The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge. Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. Our method learns to dynamically prompt (L2P) a pre-trained model to learn tasks sequentially under different task transitions. In our proposed framework, prompts are small learnable parameters which are maintained in a memory space. The objective is to optimize prompts to instruct the model prediction and explicitly manage task-invariant and task-specific knowledge while maintaining model plasticity. We conduct comprehensive experiments under popular image classification benchmarks with different challenging continual learning settings, where L2P consistently outperforms prior state-of-the-art methods. Surprisingly, L2P achieves competitive results even without a rehearsal buffer and is directly applicable to challenging task-agnostic continual learning. Source code is available at https://github.com/google-Research/l2p.
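A condensed sketch of the prompt-pool mechanism as described above: a query feature from the frozen pre-trained model selects a few learnable prompts, which are prepended to the token embeddings; pool size, top-k and cosine matching are illustrative choices rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    def __init__(self, pool_size=10, prompt_len=5, dim=768, top_k=3):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))        # one key per prompt
        self.prompts = nn.Parameter(torch.randn(pool_size, prompt_len, dim))
        self.top_k = top_k

    def forward(self, query, token_embeddings):
        """query: [B, dim] feature from the frozen backbone;
        token_embeddings: [B, N, dim] token embeddings of the same input."""
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)  # [B, P]
        idx = sim.topk(self.top_k, dim=-1).indices                                     # [B, k]
        selected = self.prompts[idx].flatten(1, 2)               # [B, k * prompt_len, dim]
        # Prepend the selected prompts; only keys and prompts are trainable.
        return torch.cat([selected, token_embeddings], dim=1)
```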
When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.
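Learning without Forgetting is usually realized with a distillation term: the original model's outputs on the new-task images are recorded once and then serve as soft targets for the old-task head, while the new head is trained with cross-entropy. A hedged sketch, with temperature and weighting as placeholders:

```python
import torch
import torch.nn.functional as F

def lwf_loss(old_head_logits, recorded_old_logits, new_head_logits, new_labels,
             T=2.0, lam=1.0):
    """old_head_logits:     current model's old-task outputs on new-task images
    recorded_old_logits: the frozen original model's outputs on the same images
    new_head_logits:     current model's new-task outputs"""
    # Keep old-task responses on new data close to the original model's.
    distill = F.kl_div(F.log_softmax(old_head_logits / T, dim=1),
                       F.softmax(recorded_old_logits / T, dim=1),
                       reduction="batchmean") * (T * T)
    ce = F.cross_entropy(new_head_logits, new_labels)   # learn the new task
    return ce + lam * distill
```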
Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. A hard attention mask is learned concurrently to every task, through stochastic gradient descent, and previous masks are exploited to condition such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 45 to 80%. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities. The approach features the possibility to control both the stability and compactness of the learned knowledge, which we believe makes it also attractive for online learning or network compression applications.
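A simplified sketch of the gating described above: each task owns a learnable embedding per unit, a steep sigmoid turns it into a near-binary mask over the layer's activations, and the running maximum of previous tasks' masks scales down gradient updates on units those tasks claimed. The scaling constant and shapes are illustrative.

```python
import torch
import torch.nn as nn

class HardAttentionLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_tasks, s_max=400.0):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.task_embedding = nn.Embedding(num_tasks, out_dim)    # one gate vector per task
        self.s_max = s_max
        self.register_buffer("prev_mask", torch.zeros(out_dim))   # cumulative mask of past tasks

    def mask(self, task, s):
        # s is annealed towards s_max during training; large s gives a near-binary gate.
        return torch.sigmoid(s * self.task_embedding(torch.tensor(task)))

    def forward(self, x, task, s):
        return torch.relu(self.fc(x)) * self.mask(task, s)

    def constrain_gradients(self):
        # Units important to previous tasks (prev_mask near 1) receive almost no update.
        if self.fc.weight.grad is not None:
            self.fc.weight.grad *= (1.0 - self.prev_mask).unsqueeze(1)
            self.fc.bias.grad *= (1.0 - self.prev_mask)

    @torch.no_grad()
    def consolidate(self, task):
        self.prev_mask = torch.max(self.prev_mask, self.mask(task, self.s_max))
```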
Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised approaches. However, these methods are unable to acquire new knowledge incrementally; in fact, they are mostly used only as a pre-training phase with IID data. In this work we investigate self-supervised methods in continual learning regimes without any additional memory or replay. To prevent forgetting of previous knowledge, we propose the use of functional regularization. We show that naive functional regularization, also known as feature distillation, leads to low plasticity and therefore severely limits continual learning performance. To address this problem, we propose projected functional regularization, where a separate projection network ensures that the newly learned feature space preserves information of the previous feature space, while allowing the learning of new features. This prevents forgetting while maintaining the plasticity of the learner. Evaluation against other incremental learning approaches applied to self-supervision demonstrates that our method obtains competitive performance in different scenarios and on multiple datasets.
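The projected regularizer can be sketched as feature distillation computed through a small trainable projector, so the new backbone only has to retain information that can be mapped back onto the old feature space; the projector architecture and loss weight below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

projector = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

def projected_feature_regularization(new_backbone, old_backbone, x, lam=1.0):
    """Naive feature distillation would match f_new(x) to f_old(x) directly and kill
    plasticity; here the match is made through a projector trained with the backbone."""
    with torch.no_grad():
        f_old = old_backbone(x)            # frozen snapshot of the previous model
    f_proj = projector(new_backbone(x))
    return lam * (1.0 - F.cosine_similarity(f_proj, f_old, dim=-1)).mean()
```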
We propose an effective regularization strategy (CW-TaLaR) for solving continual learning problems. It uses a penalizing term expressed between two probability distributions defined on a target layer of an underlying neural network shared by all tasks, together with a simple architecture of the Cramer-Wold generator for modeling the output data representation. Our strategy preserves the target layer distribution while learning a new task, but does not require remembering previous tasks' datasets. We perform experiments involving several common supervised frameworks, which prove the competitiveness of the CW-TaLaR method compared to a few existing state-of-the-art continual learning models.
We propose a modularization method that decomposes a deep neural network (DNN) into small modules from a functional perspective and recomposes them into a new model for some other task. Decomposed modules are expected to have the advantages of interpretability and verifiability due to their small size. In contrast to existing studies based on reusing models that involve retraining, such as transfer learning models, the proposed method does not require retraining and has broad applicability, since it can easily be combined with existing functional modules. The proposed method extracts modules using weight masks and can be applied to arbitrary DNNs. Unlike existing studies, it requires no assumption about the network architecture. To extract modules, we designed a learning method and a loss function that maximize shared weights among modules. As a result, the extracted modules can be recomposed without a large increase in size. We demonstrate that the proposed method can decompose and recompose DNNs with a high compression ratio and high accuracy by sharing weights among modules, and that it outperforms existing methods.
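One way to picture the weight-mask decomposition, under the assumption of a relaxed (sigmoid) mask per module and a bonus that rewards overlapping masks so that recomposed models stay small; this is an illustration of the described objective, not the paper's exact formulation.

```python
import torch

def module_weights(weight, mask_logits):
    """A module is the original weight tensor gated by a relaxed binary mask."""
    return weight * torch.sigmoid(mask_logits)

def sharing_bonus(mask_logits_per_module, lam=0.1):
    """Reward overlap between modules' masks, i.e. maximize shared weights."""
    masks = [torch.sigmoid(m) for m in mask_logits_per_module]
    overlap = sum((a * b).sum() for i, a in enumerate(masks) for b in masks[i + 1:])
    return -lam * overlap    # subtracting overlap from the loss encourages sharing
```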
Lifelong machine learning or continual learning models attempt to learn incrementally by accumulating knowledge across a sequence of tasks. Therefore, these models learn better and faster. They are used in various intelligent systems that have to interact with humans or any dynamic environment, e.g., chatbots and self-driving cars. Memory-less approaches, which accommodate incoming information from tasks within their architecture, are more commonly used with deep neural networks. This allows them to perform well on all seen tasks. These models suffer from semantic drift or the plasticity-stability dilemma. Existing models use Minkowski distance measures to decide which nodes to freeze, update or duplicate. These distance metrics do not provide good separation of nodes, as they are susceptible to high-dimensional sparse vectors. In our proposed approach, we use angular distance to evaluate the semantic drift of individual nodes, which provides better separation of nodes and thereby a better balance between stability and plasticity. The proposed approach achieves state-of-the-art performance by maintaining higher accuracy on standard datasets.
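A small sketch of the drift measure argued for above: the angular (cosine-based) distance between a node's incoming-weight vector before and after training on a new task, contrasted with a Minkowski distance; the freezing threshold is illustrative.

```python
import torch
import torch.nn.functional as F

def angular_drift(w_before, w_after):
    """Per-node semantic drift as the angle between incoming-weight vectors.
    w_before, w_after: [num_nodes, fan_in] weight matrices of the same layer."""
    cos = F.cosine_similarity(w_before, w_after, dim=1).clamp(-1.0, 1.0)
    return torch.acos(cos)                         # radians in [0, pi]

def minkowski_drift(w_before, w_after, p=2):
    return (w_before - w_after).abs().pow(p).sum(dim=1).pow(1.0 / p)

# Nodes whose angular drift exceeds a threshold are candidates for updating or duplication.
drifted = angular_drift(torch.randn(64, 128), torch.randn(64, 128)) > 0.5
```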
Humans learn continually throughout their lifespan by accumulating diverse knowledge and fine-tuning it for future tasks. When presented with a similar goal, neural networks suffer from catastrophic forgetting if the data distributions across sequential tasks are not stationary over the course of learning. An effective approach to address such continual learning (CL) problems is to use hypernetworks, which generate task-dependent weights for a target network. However, the continual learning performance of existing hypernetwork-based approaches is limited by the assumption of independence between the weights across layers, made in order to maintain parameter efficiency. To address this limitation, we propose a novel approach that uses a dependency-preserving hypernetwork to generate weights for the target network while also maintaining parameter efficiency. We propose to use a recurrent neural network (RNN) based hypernetwork that can generate layer weights efficiently while allowing for dependencies across them. In addition, we propose novel regularization and network growth techniques for the RNN-based hypernetwork to further improve continual learning performance. To demonstrate the effectiveness of the proposed approach, we conducted experiments on several image classification continual learning tasks and settings. We found that the proposed approach based on RNN hypernetworks outperforms the baselines in all these CL settings and tasks.
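A compact sketch of the dependency-preserving idea: a recurrent cell consumes a task embedding, and its hidden state, rolled across layers, produces each layer's weight chunk in turn so that later layers' weights depend on earlier ones. Sizes and the chunking scheme are illustrative.

```python
import torch
import torch.nn as nn

class RNNHyperNetwork(nn.Module):
    """Generates one weight chunk per target layer; the recurrent hidden state
    carries information across chunks, so layer weights are not independent."""
    def __init__(self, num_tasks, task_dim=64, hidden_dim=128, chunk_sizes=(3200, 6400, 640)):
        super().__init__()
        self.task_embedding = nn.Embedding(num_tasks, task_dim)
        self.cell = nn.RNNCell(task_dim, hidden_dim)
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, size) for size in chunk_sizes)

    def forward(self, task_id: int):
        emb = self.task_embedding(torch.tensor([task_id]))
        h = torch.zeros(1, self.cell.hidden_size)
        weights = []
        for head in self.heads:
            h = self.cell(emb, h)          # hidden state links successive layers
            weights.append(head(h).squeeze(0))
        return weights                      # flat weight vectors, one per target layer
```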
Humans can learn in a continuous manner. Old rarely utilized knowledge can be overwritten by new incoming information while important, frequently used knowledge is prevented from being erased. In artificial learning systems, lifelong learning so far has focused mainly on accumulating knowledge over tasks and overcoming catastrophic forgetting. In this paper, we argue that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively. Inspired by neuroplasticity, we propose a novel approach for lifelong learning, coined Memory Aware Synapses (MAS). It computes the importance of the parameters of a neural network in an unsupervised and online manner. Given a new sample which is fed to the network, MAS accumulates an importance measure for each parameter of the network, based on how sensitive the predicted output function is to a change in this parameter. When learning a new task, changes to important parameters can then be penalized, effectively preventing important knowledge related to previous tasks from being overwritten. Further, we show an interesting connection between a local version of our method and Hebb's rule, which is a model for the learning process in the brain. We test our method on a sequence of object recognition tasks and on the challenging problem of learning an embedding for predicting <subject, predicate, object> triplets. We show state-of-the-art performance and, for the first time, the ability to adapt the importance of the parameters based on unlabeled data towards what the network needs (not) to forget, which may vary depending on test conditions.
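The importance estimate described above translates almost directly into code: accumulate, for each parameter, the magnitude of the gradient of the squared L2 norm of the network output, then penalize changes to important parameters when the next task is learned. Hyperparameters are placeholders.

```python
import torch

def accumulate_importance(model, data_loader, omega=None):
    """Unsupervised, online importance: sensitivity of ||f(x)||^2 to each parameter."""
    if omega is None:
        omega = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    count = 0
    for x in data_loader:
        model.zero_grad()
        model(x).pow(2).sum().backward()        # no labels needed
        for n, p in model.named_parameters():
            if p.grad is not None:
                omega[n] += p.grad.abs()
        count += 1
    return {n: v / max(count, 1) for n, v in omega.items()}

def mas_penalty(model, omega, old_params, lam=1.0):
    """Added to the new task's loss: important parameters are pulled towards
    the values they had after the previous task."""
    return lam * sum((omega[n] * (p - old_params[n]).pow(2)).sum()
                     for n, p in model.named_parameters())
```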
We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: A knowledge base, capable of solving previously encountered problems, which is connected to an active column that is employed to efficiently learn the current task. After learning a new task, the active column is distilled into the knowledge base, taking care to protect any previously acquired skills. This cycle of active learning (progression) followed by consolidation (compression) requires no architecture growth, no access to or storing of previous data or tasks, and no task-specific parameters. We demonstrate the progress & compress approach on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation.
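The compress (consolidation) phase can be sketched as distilling the active column into the knowledge base while an EWC-style quadratic term protects what the knowledge base already encodes; the temperature and penalty weight are illustrative.

```python
import torch
import torch.nn.functional as F

def compress_loss(kb_model, active_model, fisher, kb_old_params, x, T=2.0, lam=1.0):
    """Distill the freshly trained active column into the knowledge base (KB)
    while penalizing movement of KB parameters that earlier tasks relied on."""
    with torch.no_grad():
        teacher = F.softmax(active_model(x) / T, dim=1)     # active column is the teacher
    student = F.log_softmax(kb_model(x) / T, dim=1)
    distill = F.kl_div(student, teacher, reduction="batchmean") * (T * T)
    protect = sum((fisher[n] * (p - kb_old_params[n]).pow(2)).sum()
                  for n, p in kb_model.named_parameters())
    return distill + lam * protect
```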
Federated learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while preserving their private data. This decentralized approach is prone to suffer the consequences of data statistical heterogeneity, both across the different entities and over time, which may lead to a lack of convergence. To avoid such issues, different methods have been proposed in recent years. However, data may be heterogeneous in many different ways, and current proposals do not always specify the kind of heterogeneity they are considering. In this work, we formally classify data statistical heterogeneity and review the most notable learning strategies that are able to face it. At the same time, we introduce approaches from other machine learning frameworks, such as continual learning, that also deal with data heterogeneity and can easily be adapted to the federated learning setting.
In this paper we introduce a model of lifelong learning, based on a Network of Experts. New tasks / experts are learned and added to the model sequentially, building on what was learned before. To ensure scalability of this process, data from previous tasks cannot be stored and hence is not available when learning a new task. A critical issue in such a context, not addressed in the literature so far, relates to the decision which expert to deploy at test time. We introduce a set of gating autoencoders that learn a representation for the task at hand, and, at test time, automatically forward the test sample to the relevant expert. This also brings memory efficiency as only one expert network has to be loaded into memory at any given time. Further, the autoencoders inherently capture the relatedness of one task to another, based on which the most relevant prior model to be used for training a new expert, with fine-tuning or learning-without-forgetting, can be selected. We evaluate our method on image classification and video prediction problems.
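A bare-bones sketch of the gating: one small autoencoder per expert is trained on that task's (pre-extracted) features, and at test time the sample is routed to the expert whose autoencoder reconstructs it best; the feature and bottleneck sizes are arbitrary choices here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GateAutoencoder(nn.Module):
    """Undercomplete autoencoder trained on one task's features."""
    def __init__(self, dim=4096, bottleneck=100):
        super().__init__()
        self.enc = nn.Linear(dim, bottleneck)
        self.dec = nn.Linear(bottleneck, dim)

    def forward(self, f):
        return self.dec(torch.relu(self.enc(f)))

def route(feature, autoencoders):
    """Pick the expert whose autoencoder has the lowest reconstruction error,
    so only that expert network needs to be loaded into memory."""
    errors = [F.mse_loss(ae(feature), feature).item() for ae in autoencoders]
    return min(range(len(errors)), key=lambda i: errors[i])
```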
Although deep neural networks (DNNs) have achieved impressive classification performance in closed-world learning scenarios, they typically fail to generalize to unseen categories in dynamic open-world environments, in which the number of concepts is unbounded. In contrast, human and animal learners have the ability to incrementally update their knowledge by recognizing and adapting to novel observations. In particular, humans characterize concepts via exclusive (unique) sets of essential features, which are used both to recognize known classes and to identify novelty. Inspired by natural learners, we introduce a Sparse High-level-Exclusive, Low-level-Shared feature representation (SHELS) that simultaneously encourages learning exclusive sets of high-level features and essential, shared low-level features. The exclusivity of the high-level features enables the DNN to automatically detect out-of-distribution (OOD) data, while the efficient use of capacity via sparse low-level features permits accommodating new knowledge. The resulting approach uses OOD detection to perform class-incremental continual learning without known class boundaries. We show that using SHELS for novelty detection results in statistically significant improvements over state-of-the-art OOD detection approaches on a variety of benchmark datasets. Further, we demonstrate that the SHELS model mitigates catastrophic forgetting in a class-incremental learning setting, enabling a combined novelty detection and accommodation framework that supports learning in the open world.
Modularity is a compelling solution for continual learning (CL), the problem of modeling sequences of related tasks. Learning and then composing modules to solve different tasks provides an abstraction to address the principal challenges of CL, including catastrophic forgetting, backward and forward transfer across tasks, and sub-linear model growth. We introduce local module composition (LMC), an approach to modular CL where each module is provided with a local structural component that estimates the module's relevance to the input. Dynamic module composition is performed based on the local relevance scores. We demonstrate that agnosticity to task identities (IDs) arises from (local) structural learning that is module-specific, as opposed to the task- and/or model-specific structural learning of previous works, making LMC applicable to more CL settings than previous works. In addition, LMC also tracks statistics of the input distribution and adds new modules when outlier samples are detected. In a first set of experiments, LMC compares favorably to existing methods on a recent continual transfer-learning benchmark while not requiring task identities. In another study, we show that the locality of structural learning allows LMC to interpolate to related but unseen (OOD) tasks, as well as to compose modular networks trained independently on different task sequences into a third modular network without any fine-tuning. Finally, in search of limitations of LMC, we study it on more challenging sequences of 30 and 100 tasks, demonstrating that local module selection becomes more challenging in the presence of a large number of candidate modules. In this setting, the best-performing LMC spawns far fewer modules than an oracle-based baseline, but it reaches lower overall accuracy. The codebase is available at https://github.com/oleksost/lmc.
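A schematic sketch of local composition: each module carries its own small structural component (here an autoencoder over the module's input) whose reconstruction error yields a local relevance score, and the layer output is the relevance-weighted mixture of the module outputs. The scorer form and softmax mixing are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalModule(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.f = nn.Linear(in_dim, out_dim)                           # functional component
        self.scorer = nn.Sequential(nn.Linear(in_dim, in_dim // 4),   # local structural component
                                    nn.ReLU(), nn.Linear(in_dim // 4, in_dim))

    def relevance(self, x):
        # Low reconstruction error means the input looks familiar to this module.
        return -F.mse_loss(self.scorer(x), x, reduction="none").mean(dim=-1)

class ComposedLayer(nn.Module):
    def __init__(self, local_modules):
        super().__init__()
        self.mods = nn.ModuleList(local_modules)

    def forward(self, x):
        scores = torch.stack([m.relevance(x) for m in self.mods], dim=-1)   # [B, M]
        weights = F.softmax(scores, dim=-1)
        outs = torch.stack([m.f(x) for m in self.mods], dim=-1)             # [B, out, M]
        return (outs * weights.unsqueeze(1)).sum(dim=-1)
```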
Attempts to train a comprehensive artificial intelligence capable of solving multiple tasks have been impeded by a chronic problem called catastrophic forgetting. Although simply replaying all previous data alleviates the problem, it requires large memory and, even worse, is often infeasible in real world applications where access to past data is limited. Inspired by the generative nature of the hippocampus as a short-term memory system in the primate brain, we propose Deep Generative Replay, a novel framework with a cooperative dual model architecture consisting of a deep generative model ("generator") and a task solving model ("solver"). With only these two models, training data for previous tasks can easily be sampled and interleaved with those for a new task. We test our methods in several sequential learning settings involving image classification tasks.
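A high-level sketch of the generator/solver interplay when a new task arrives: pseudo-data for earlier tasks is sampled from the previous generator, labeled by the previous solver, and interleaved with the new task's real data. The mixing ratio is a placeholder and the generator's sample() method is hypothetical.

```python
import torch

def replay_batch(old_generator, old_solver, batch_size):
    """Sample pseudo-examples of earlier tasks and label them with the old solver."""
    with torch.no_grad():
        x_replay = old_generator.sample(batch_size)      # hypothetical sampling interface
        y_replay = old_solver(x_replay).argmax(dim=1)
    return x_replay, y_replay

def solver_step(solver, optimizer, loss_fn, new_batch, old_generator, old_solver,
                replay_ratio=0.5):
    x_new, y_new = new_batch
    n_replay = int(len(x_new) * replay_ratio)
    if old_generator is not None and n_replay > 0:
        x_old, y_old = replay_batch(old_generator, old_solver, n_replay)
        x, y = torch.cat([x_new, x_old]), torch.cat([y_new, y_old])
    else:
        x, y = x_new, y_new
    optimizer.zero_grad()
    loss = loss_fn(solver(x), y)        # solver trains on new data interleaved with replay
    loss.backward()
    optimizer.step()
    return loss.item()
```

The generator is updated in the same interleaved fashion so that it can replay both old and new tasks afterwards.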
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model, a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and ImageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.
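The composite loss can be sketched as below: a distillation term over the old-class logits (soft targets produced by the previous model on the current images and exemplars) plus cross-entropy over all classes; temperature and weighting are illustrative.

```python
import torch
import torch.nn.functional as F

def incremental_loss(logits, old_logits, labels, num_old_classes, T=2.0, lam=1.0):
    """logits:     current model outputs over old + new classes
    old_logits: previous model's outputs over the old classes on the same images"""
    # Cross-entropy over all classes learns the new classes (and the old-class exemplars).
    ce = F.cross_entropy(logits, labels)
    # Distillation keeps old-class responses close to the previous model's.
    distill = F.kl_div(F.log_softmax(logits[:, :num_old_classes] / T, dim=1),
                       F.softmax(old_logits / T, dim=1),
                       reduction="batchmean") * (T * T)
    return ce + lam * distill
```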