We propose a novel deep network architecture for lifelong learning, which we refer to as the Dynamically Expandable Network (DEN), that can dynamically decide its network capacity as it trains on a sequence of tasks, to learn a compact overlapping knowledge-sharing structure among tasks. DEN is efficiently trained in an online manner by performing selective retraining, dynamically expands network capacity upon arrival of each task with only the necessary number of units, and effectively prevents semantic drift by splitting/duplicating units and timestamping them. We validate DEN on multiple public datasets under lifelong learning scenarios, on which it not only significantly outperforms existing lifelong learning methods for deep networks, but also achieves the same level of performance as the batch counterparts with substantially fewer parameters. Further, the network obtained by fine-tuning DEN on all tasks achieves significantly better performance than the batch models, which shows that it can be used to estimate the optimal network structure even when all tasks are available in the first place.
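The expansion step can be pictured with a small sketch. The helper below uses hypothetical names and is simplified to a single linear layer (it omits the matching expansion of the following layer); it is not the authors' implementation. It widens a hidden layer by k units when the new task's loss stays above a threshold, keeping the previously trained rows in place:

```python
# Minimal sketch of capacity expansion in the spirit of DEN (hypothetical helper,
# not the authors' code): widen a hidden Linear layer by `k` output units while
# copying the previously trained weights into place.
import torch
import torch.nn as nn

def expand_linear(layer: nn.Linear, k: int) -> nn.Linear:
    """Return a new Linear layer with `k` extra output units; old rows are copied."""
    new_layer = nn.Linear(layer.in_features, layer.out_features + k)
    with torch.no_grad():
        new_layer.weight[: layer.out_features] = layer.weight
        new_layer.bias[: layer.out_features] = layer.bias
    return new_layer

# Usage: if the new task's loss plateaus above a threshold tau, grow the layer.
hidden = nn.Linear(128, 64)
tau, task_loss = 0.5, 0.8            # illustrative values
if task_loss > tau:
    hidden = expand_linear(hidden, k=16)
print(hidden)                        # the layer now has 80 output units
```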
Lifelong machine learning, or continual learning, models attempt to learn incrementally by accumulating knowledge over a sequence of tasks; as a result, these models learn better and faster. They are used in various intelligent systems that must interact with humans or with any dynamic environment, e.g., chatbots and self-driving cars. Memory-less approaches, which are more commonly used with deep neural networks, accommodate incoming information from the tasks within the architecture itself, allowing the network to perform well on all the tasks seen so far. These models suffer from semantic drift, or the stability-plasticity dilemma. Existing models use Minkowski distance measures to decide which nodes to freeze, update, or duplicate. These distance metrics do not provide good separation of nodes, since they are susceptible to high-dimensional sparse vectors. In our proposed approach, we use angular distance to evaluate the semantic drift of individual nodes, which provides better separation of nodes and thereby a better balance between stability and plasticity. The proposed approach outperforms state-of-the-art models by maintaining higher accuracy on standard datasets.
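As a rough illustration of the drift test described above (names, thresholds, and the freeze/duplicate policy are illustrative, not taken from the paper), the sketch below measures how far a node's incoming weight vector rotated after training on a new task and uses that angle to decide what to do with the node:

```python
# Hedged sketch: angular (cosine-based) distance between a node's incoming weight
# vector before and after training on a new task, instead of a Minkowski norm.
import torch
import torch.nn.functional as F

def angular_distance(w_old: torch.Tensor, w_new: torch.Tensor) -> torch.Tensor:
    """Angle between two weight vectors, normalized to [0, 1]."""
    cos = F.cosine_similarity(w_old, w_new, dim=0).clamp(-1.0, 1.0)
    return torch.arccos(cos) / torch.pi

w_before = torch.randn(512)                    # node's incoming weights before task t
w_after = w_before + 0.05 * torch.randn(512)   # same weights after training on task t
drift = angular_distance(w_before, w_after)

# Hypothetical policy: freeze stable nodes, duplicate nodes that drifted too far.
action = "freeze" if drift < 0.1 else "duplicate"
print(f"drift={drift.item():.3f} -> {action}")
```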
Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. A hard attention mask is learned concurrently to every task, through stochastic gradient descent, and previous masks are exploited to condition such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 45 to 80%. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities. The approach features the possibility to control both the stability and compactness of the learned knowledge, which we believe makes it also attractive for online learning or network compression applications.
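A minimal sketch of a task-conditioned hard-attention gate in this spirit (layer shapes and the scaling constant are assumptions of this illustration, not the paper's released code): per-task embeddings pass through a steep sigmoid so that the resulting unit masks become nearly binary, and the cumulative masks of earlier tasks are what restrict later updates.

```python
# Illustrative sketch of per-task hard attention masks over a layer's units.
import torch
import torch.nn as nn

class GatedLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, n_tasks: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.task_emb = nn.Embedding(n_tasks, out_dim)  # one embedding per task

    def forward(self, x: torch.Tensor, task_id: int, s: float = 400.0):
        # A large s drives the sigmoid toward a hard {0,1} mask at inference time.
        mask = torch.sigmoid(s * self.task_emb(torch.tensor(task_id)))
        return torch.relu(self.fc(x)) * mask, mask

layer = GatedLayer(784, 256, n_tasks=5)
out, mask = layer(torch.randn(8, 784), task_id=0)
# During training, cumulative masks from earlier tasks would be used to zero the
# gradients of units those tasks claimed, which is what limits forgetting.
```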
When continually learning on datasets with different distributions, neural networks tend to forget previously learned knowledge, a phenomenon known as catastrophic forgetting. Distribution shifts between datasets cause more forgetting. Recently, parameter-isolation-based approaches have shown great potential for overcoming forgetting. However, because they fix a neural path for each dataset during training, they generalize poorly and require dataset labels during inference. In addition, they do not support backward knowledge transfer, since they give priority to past data. In this paper, we propose a new adaptive learning method, named AdaptCL, that fully reuses and grows on learned parameters to overcome catastrophic forgetting and allows positive backward transfer without requiring dataset labels. The proposed technique grows on the same neural path by allowing optimal reuse of frozen parameters. In addition, it uses parameter-level data-driven pruning to assign equal priority to the data. We conduct extensive experiments on MNIST variants, domain, and food freshness detection datasets without requiring dataset labels. The results show that our proposed method outperforms alternative baselines in minimizing forgetting and achieving positive backward knowledge transfer.
The goal of continual learning (CL) is to learn different tasks over time. The main desiderata associated with CL are maintaining performance on older tasks, leveraging them to improve learning of future tasks, and introducing minimal overhead in the training process (for example, not requiring a growing model or retraining). We propose the Neuro-Inspired Stability-Plasticity Adaptation (NISPA) architecture, which addresses these desiderata through a sparse neural network with fixed density. NISPA forms stable paths to preserve knowledge from older tasks. In addition, NISPA uses connection rewiring to create new plastic paths that reuse existing knowledge on new tasks. Our extensive evaluation on the EMNIST, FashionMNIST, CIFAR10, and CIFAR100 datasets shows that NISPA significantly outperforms representative state-of-the-art continual learning baselines, while using up to ten times fewer learnable parameters than the baselines. We also argue that sparsity is an essential ingredient for continual learning. The NISPA code is available at https://github.com/burakgurbuz97/nispa.
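The fixed-density rewiring can be pictured with a drop-and-grow sketch (the drop fraction, zero-initialization of new connections, and the freezing scheme are assumptions of this illustration, not details from the paper): the weakest plastic connections are dropped and the same number of new ones are grown elsewhere, so the overall density never changes.

```python
# Hedged sketch of fixed-density rewiring: drop weak plastic connections, grow the
# same number of new random connections, keep connections on stable paths untouched.
import torch

weight = torch.randn(128, 128)
mask = torch.rand(128, 128) < 0.2               # ~20% density sparse connectivity
frozen = torch.zeros_like(mask)                  # connections on stable paths

def rewire(weight, mask, frozen, drop_frac=0.1):
    plastic = mask & ~frozen
    n_drop = int(drop_frac * plastic.sum().item())
    # Drop the weakest plastic connections.
    magnitudes = weight.abs().masked_fill(~plastic, float("inf"))
    drop_idx = magnitudes.flatten().argsort()[:n_drop]
    mask.view(-1)[drop_idx] = False
    # Grow the same number of connections at random currently-empty positions.
    empty_idx = (~mask).flatten().nonzero().squeeze(1)
    grow_idx = empty_idx[torch.randperm(empty_idx.numel())[:n_drop]]
    mask.view(-1)[grow_idx] = True
    weight.view(-1)[grow_idx] = 0.0              # new connections start at zero
    return mask

mask = rewire(weight, mask, frozen)              # density is unchanged after rewiring
```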
Using task-specific components within a neural network for continual learning (CL) is a compelling strategy to address the stability-plasticity dilemma in fixed-capacity models without access to past data. Current methods focus only on selecting a sub-network for a new task that reduces forgetting of past tasks. However, this selection can limit the forward transfer of relevant past knowledge that would help future learning. Our study reveals that jointly satisfying both objectives is more challenging when a unified classifier is used for all classes of the tasks seen so far (class-incremental learning, class-IL), as it is prone to ambiguities between classes across tasks. Moreover, the challenge grows as the semantic similarity of classes across tasks increases. To address this challenge, we propose a new CL method, named AFAF, that aims to avoid forgetting and allow forward transfer in class-IL using fixed-capacity models. AFAF allocates a sub-network that enables selective transfer of relevant knowledge to a new task while preserving past knowledge, reuses some of the previously allocated components to utilize the fixed capacity, and addresses class ambiguities when similarities exist. The experiments show the effectiveness of AFAF in providing models with multiple desirable CL properties, while outperforming state-of-the-art methods on various challenging benchmarks with different degrees of semantic similarity.
Humans and animals have the ability to continually acquire, fine-tune, and transfer knowledge and skills throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for computational systems and autonomous agents interacting in the real world and processing continuous streams of information. However, lifelong learning remains a long-standing challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic forgetting. Although significant advances have been made in domain-specific learning with neural networks, extensive research efforts are required for the development of robust lifelong learning on autonomous agents and robots. We discuss well-established and emerging research motivated by lifelong learning factors in biological systems such as structural plasticity, memory replay, curriculum and transfer learning, intrinsic motivation, and multisensory integration.
When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.
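The objective can be sketched as a standard cross-entropy term on the new task plus a distillation term on the old task head, computed against responses recorded before training on the new data. The temperature and loss weight below are illustrative choices, not values taken from the paper:

```python
# Hedged sketch of a Learning-without-Forgetting style objective.
import torch
import torch.nn.functional as F

def lwf_loss(new_task_logits, new_task_labels, old_head_logits, recorded_logits,
             T: float = 2.0, lam: float = 1.0):
    # Cross-entropy on the new task plus distillation toward the recorded
    # responses of the original network on the same new-task inputs.
    ce = F.cross_entropy(new_task_logits, new_task_labels)
    distill = F.kl_div(
        F.log_softmax(old_head_logits / T, dim=1),
        F.softmax(recorded_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + lam * distill

# Usage with dummy tensors: 8 samples, 10 new classes, 5 old classes.
new_logits = torch.randn(8, 10, requires_grad=True)
old_logits = torch.randn(8, 5, requires_grad=True)
loss = lwf_loss(new_logits, torch.randint(0, 10, (8,)), old_logits, torch.randn(8, 5))
loss.backward()
```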
Incremental task learning (ITL) is a category of continual learning that seeks to train a single network on multiple tasks, one after another, where the training data for each task is available only while that task is being trained. Neural networks tend to forget older tasks when trained on newer ones, a property commonly referred to as catastrophic forgetting. To address this issue, ITL methods use episodic memory, parameter regularization, masking and pruning, or expandable network structures. In this paper, we propose a new incremental task learning framework based on low-rank factorization. In particular, we represent the network weights of each layer as a linear combination of several rank-1 matrices. To update the network for a new task, we learn a rank-1 (or low-rank) matrix and add it to the weights of every layer. We also introduce an additional selector vector that assigns different weights to the low-rank matrices learned for previous tasks. We show that our approach performs better than the current state-of-the-art methods in terms of both accuracy and forgetting. Our method also offers better memory efficiency compared to episodic-memory-based and mask-based approaches. Our code will be available at https://github.com/csiplab/task-increment-rank-update.git.
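A compact sketch of the weight parameterization described above (class and variable names are illustrative, not the released code) keeps one rank-1 factor per task plus a selector vector that weighs all factors seen so far; when training task t, only the task-t factor and selector would be updated.

```python
# Illustrative sketch: a layer's weight as a selector-weighted sum of rank-1 factors.
import torch
import torch.nn as nn

class Rank1IncrementalLinear(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.us = nn.ParameterList()          # one u_t per task (out_dim)
        self.vs = nn.ParameterList()          # one v_t per task (in_dim)
        self.selectors = nn.ParameterList()   # per task: weights over factors 0..t

    def add_task(self):
        t = len(self.us)
        self.us.append(nn.Parameter(0.01 * torch.randn(self.out_dim)))
        self.vs.append(nn.Parameter(0.01 * torch.randn(self.in_dim)))
        self.selectors.append(nn.Parameter(torch.ones(t + 1)))

    def weight(self, task_id: int) -> torch.Tensor:
        alpha = self.selectors[task_id]
        return sum(alpha[i] * torch.outer(self.us[i], self.vs[i])
                   for i in range(task_id + 1))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return x @ self.weight(task_id).T

layer = Rank1IncrementalLinear(64, 32)
layer.add_task(); layer.add_task()           # two tasks seen so far
y = layer(torch.randn(4, 64), task_id=1)     # uses factors 0 and 1
```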
Graph learning is a popular approach for performing machine learning on graph-structured data. It has revolutionized the machine learning ability to model graph data to address downstream tasks. Its application is wide due to the availability of graph data ranging from all types of networks to information systems. Most graph learning methods assume that the graph is static and its complete structure is known during training. This limits their applicability since they cannot be applied to problems where the underlying graph grows over time and/or new tasks emerge incrementally. Such applications require a lifelong learning approach that can learn the graph continuously and accommodate new information whilst retaining previously learned knowledge. Lifelong learning methods that enable continuous learning in regular domains like images and text cannot be directly applied to continuously evolving graph data, due to its irregular structure. As a result, graph lifelong learning is gaining attention from the research community. This survey paper provides a comprehensive overview of recent advancements in graph lifelong learning, including the categorization of existing methods, and the discussions of potential applications and open research problems.
The continual learning (CL) ability of humans is closely related to the stability-plasticity dilemma, which describes how humans achieve ongoing learning capacity while preserving learned information. The concept of CL has been present in artificial intelligence (AI) since its inception. This paper presents a comprehensive review of CL. Unlike previous reviews, which mainly focus on the catastrophic forgetting phenomenon in CL, this paper surveys CL from the macroscopic perspective of the stability-plasticity mechanism. Analogous to its biological counterpart, an "intelligent" AI agent should: i) remember previously learned information (information retrospection); ii) continually infer new information (information prospection); and iii) transfer useful information (information transfer), in order to achieve high-level CL. Following this taxonomy, evaluation metrics, algorithms, applications, and some open problems are presented. Our main contributions are: i) re-examining CL from the level of artificial general intelligence; ii) providing a detailed and extensive overview of CL topics; and iii) presenting some novel ideas on the potential development of CL.
Inspired by the Regularized Lottery Ticket Hypothesis (RLTH), which posits that smooth (non-binary) subnetworks exist within a dense network and achieve performance competitive with the dense network, we propose a few-shot class-incremental learning (FSCIL) method referred to as Soft-SubNetworks (SoftNet). Our aim is to incrementally learn a sequence of sessions, where each session contains only a few training instances per class, while preserving previously learned knowledge. SoftNet jointly learns the model weights and adaptive non-binary soft masks during the base training session, with each mask consisting of a major and a minor subnetwork; the former aims to minimize catastrophic forgetting during training, while the latter aims to avoid overfitting to the few samples in each new training session. We provide comprehensive empirical validation showing that SoftNet effectively tackles the few-shot incremental learning problem, surpassing the performance of state-of-the-art baselines on benchmark datasets.
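One way to picture the major/minor split is to threshold the soft mask into a preserved part and a trainable part; the thresholding rule and shapes below are assumptions of this sketch, not details given in the abstract.

```python
# Hedged sketch: a soft (non-binary) weight mask split into a "major" portion kept
# for old knowledge and a "minor" portion trained during few-shot sessions.
import torch
import torch.nn as nn

weight = nn.Parameter(torch.randn(256, 128))      # dense layer weight
mask_logits = nn.Parameter(torch.zeros(256, 128))
mask = torch.sigmoid(mask_logits)                 # soft mask in (0, 1)

top = mask >= mask.flatten().quantile(0.5)        # top entries form the major subnet
major_mask = mask * top                           # preserved across sessions
minor_mask = mask * ~top                          # updated in each few-shot session

def masked_forward(x: torch.Tensor) -> torch.Tensor:
    # Both subnetworks contribute to the forward pass; only the minor part (and the
    # new session's classifier) would receive gradient updates during a session.
    return x @ (weight * (major_mask + minor_mask)).T

out = masked_forward(torch.randn(4, 128))
```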
The ability to process and retain new information as naturally as humans do is a feat highly sought after when training neural networks. Unfortunately, traditional optimization algorithms typically require large amounts of data to be available during training, and updating with respect to new data is difficult once the training process has completed. In fact, when new data or tasks arise, previous progress may be lost, since neural networks are prone to catastrophic forgetting. Catastrophic forgetting describes the phenomenon in which a neural network completely forgets previous knowledge when acquiring new information. We propose a novel training algorithm, called training by explaining, in which we leverage layer-wise relevance propagation to retain the information a neural network has already learned on previous tasks while training on new data. The method is evaluated on a range of benchmark datasets as well as on more complex data. Our method not only successfully retains the knowledge of old tasks within the neural network, but does so more resource-efficiently than other state-of-the-art solutions.
We propose a modularization method that decomposes a deep neural network (DNN) into small modules from a functionality perspective and recomposes them into new models for other tasks. The decomposed modules are expected to have the advantages of interpretability and verifiability due to their small size. In contrast to existing studies on reusing models that involve retraining, such as transfer learning, the proposed method does not require retraining and has broad applicability, as it can easily be combined with existing functional modules. The proposed method extracts modules using weight masks and can be applied to arbitrary DNNs. Unlike existing studies, it requires no assumptions about the network architecture. To extract modules, we design a learning method and a loss function that maximize the weights shared among modules. As a result, the extracted modules can be recomposed without a large increase in size. We demonstrate that the proposed method can decompose and recompose DNNs with a high compression ratio and high accuracy by sharing weights between modules, and that it outperforms existing methods.
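A rough sketch of mask-based module extraction, assuming a soft mask per module over the frozen weights and a penalty that rewards two modules for selecting the same weights; both of these are assumptions of the illustration rather than details from the abstract.

```python
# Illustrative sketch: learn per-module masks over a trained layer's weights while
# penalizing disagreement between masks, so the extracted modules share weights.
import torch
import torch.nn as nn

trained_weight = torch.randn(128, 64)                 # frozen weights of the original DNN
mask_logits = nn.Parameter(torch.zeros(2, 128, 64))   # one learnable mask per module

def module_forward(x: torch.Tensor, module_id: int) -> torch.Tensor:
    mask = torch.sigmoid(mask_logits[module_id])      # soft mask while learning
    return x @ (trained_weight * mask).T

def sharing_penalty() -> torch.Tensor:
    m = torch.sigmoid(mask_logits)
    return (m[0] - m[1]).abs().mean()                 # small when modules share weights

x = torch.randn(4, 64)
loss = module_forward(x, 0).pow(2).mean() + 0.1 * sharing_penalty()
loss.backward()                                       # gradients flow only to the masks
```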
Humans learn continually throughout their lifetime by accumulating diverse knowledge and fine-tuning it for future tasks. When presented with a similar objective, neural networks suffer from catastrophic forgetting if the data distributions across the sequential tasks are not stationary during learning. An effective approach to address such continual learning (CL) problems is to use a hypernetwork that generates task-conditioned weights for a target network. However, the continual learning performance of existing hypernetwork-based approaches is limited by the assumption of independence between the weights of different layers, made to maintain parameter efficiency. To address this limitation, we propose a novel approach that uses a dependency-preserving hypernetwork to generate the target network's weights while still maintaining parameter efficiency. We propose a recurrent neural network (RNN)-based hypernetwork that can generate layer weights efficiently while allowing for dependencies across them. In addition, we propose novel regularization and network growth techniques for the RNN-based hypernetwork to further improve continual learning performance. To demonstrate the effectiveness of the proposed methods, we conduct experiments on several image classification continual learning tasks and settings. We find that the proposed methods based on the RNN hypernetwork outperform the baselines in all these CL settings and tasks.
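The dependency-preserving idea can be sketched with a recurrent generator whose hidden state at step l is projected to the weights of layer l, so later layers are generated conditioned on earlier ones. The use of a GRU and the specific shapes are assumptions of this sketch, not the paper's architecture.

```python
# Illustrative sketch of an RNN-based hypernetwork generating per-layer weights.
import torch
import torch.nn as nn

class RNNHyperNetwork(nn.Module):
    def __init__(self, n_tasks: int, emb_dim: int, hidden: int, layer_shapes):
        super().__init__()
        self.task_emb = nn.Embedding(n_tasks, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, s[0] * s[1]) for s in layer_shapes])
        self.layer_shapes = layer_shapes

    def forward(self, task_id: int):
        emb = self.task_emb(torch.tensor([task_id]))            # (1, emb_dim)
        steps = emb.unsqueeze(1).repeat(1, len(self.heads), 1)  # one RNN step per layer
        hidden_states, _ = self.rnn(steps)                      # (1, n_layers, hidden)
        return [head(hidden_states[0, l]).view(*shape)
                for l, (head, shape) in enumerate(zip(self.heads, self.layer_shapes))]

hyper = RNNHyperNetwork(n_tasks=5, emb_dim=32, hidden=64,
                        layer_shapes=[(256, 784), (10, 256)])
w1, w2 = hyper(task_id=0)   # generated target-network weights for task 0
```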
Expandable networks have demonstrated their advantages in dealing with the catastrophic forgetting problem. Considering that different tasks may need different structures, recent methods design dynamic structures adapted to each task via sophisticated techniques. Their routine is to first search for an expandable structure and then train on the new task; however, this splits each task into multiple training stages, leading to suboptimal results or excessive computational cost. In this paper, we propose an end-to-end trainable adaptively expandable network named E2-AEN, which dynamically generates lightweight structures for new tasks without any accuracy drop on previous tasks. Specifically, the network contains a sequence of powerful feature adapters that augment the previously learned representations for new tasks and avoid task interference. These adapters are controlled by an adaptive gate-based pruning strategy, which decides whether the expanded structures can be pruned, making the network structure dynamically changeable according to the complexity of the new task. Moreover, we introduce a novel sparsity-activation regularization to encourage the model to learn discriminative features with limited parameters. E2-AEN reduces cost and can be built upon any feed-forward architecture in an end-to-end manner. Extensive experiments on both classification (i.e., CIFAR and VDD) and detection (i.e., COCO, VOC and the ICCV2021 SSLAD challenge) benchmarks demonstrate the effectiveness of the proposed method, which achieves remarkable new results.
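A minimal gated-adapter sketch conveys the mechanism (the bottleneck size, gate parameterization, and penalty weight are assumptions, not values from the paper): a lightweight residual branch is scaled by a learnable gate, and a sparsity penalty on the gate lets near-zero adapters be pruned after training.

```python
# Hedged sketch of a gated task adapter with a prunable learnable gate.
import torch
import torch.nn as nn

class GatedAdapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.gate = nn.Parameter(torch.tensor(1.0))   # learnable scalar gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.sigmoid(self.gate) * self.up(torch.relu(self.down(x)))

    def sparsity_penalty(self) -> torch.Tensor:
        return torch.sigmoid(self.gate)               # pushed toward 0 by the loss

adapter = GatedAdapter(dim=512)
x = torch.randn(8, 512)
loss = adapter(x).pow(2).mean() + 1e-3 * adapter.sparsity_penalty()
loss.backward()   # if the gate collapses toward 0, the adapter can be pruned away
```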
This paper presents a method for adding multiple tasks to a single deep neural network while avoiding catastrophic forgetting. Inspired by network pruning techniques, we exploit redundancies in large deep networks to free up parameters that can then be employed to learn new tasks. By performing iterative pruning and network re-training, we are able to sequentially "pack" multiple tasks into a single network while ensuring minimal drop in performance and minimal storage overhead. Unlike prior work that uses proxy losses to maintain accuracy on older tasks, we always optimize for the task at hand. We perform extensive experiments on a variety of network architectures and large-scale datasets, and observe much better robustness against catastrophic forgetting than prior work. In particular, we are able to add three fine-grained classification tasks to a single ImageNet-trained VGG-16 network and achieve accuracies close to those of separately trained networks for each task. Code available at https://github.com/arunmallya/packnet
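The prune-and-pack step can be sketched as claiming the largest-magnitude weights among those still free after each task; the keep fraction and shapes below are illustrative, not values from the paper.

```python
# Hedged sketch: after training a task, keep only the strongest currently-free
# weights for that task and release the rest for future tasks.
import torch

weight = torch.randn(256, 256)                      # a trained layer's weights
free = torch.ones_like(weight, dtype=torch.bool)    # weights not yet claimed by any task
task_masks = []

def pack_task(weight, free, keep_frac=0.5):
    """Claim the top `keep_frac` of currently free weights for the current task."""
    scores = weight.abs() * free                    # only free weights compete
    k = int(keep_frac * free.sum().item())
    threshold = scores.flatten().topk(k).values.min()
    claimed = (scores >= threshold) & free
    return claimed, free & ~claimed                 # task mask, remaining free set

mask_t1, free = pack_task(weight, free)             # task 1 claims half the network
task_masks.append(mask_t1)
# The weights under mask_t1 are now frozen; the next task trains only the `free`
# weights, and inference for task t applies the union of masks from tasks 1..t.
```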
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
A key challenge for AI is to build embodied systems that operate in dynamically changing environments. Such systems must adapt to changing task contexts and learn continuously. Although standard deep learning systems achieve state-of-the-art results on static benchmarks, they often struggle in dynamic scenarios. In these settings, error signals from multiple contexts can interfere with one another, ultimately leading to the phenomenon known as catastrophic forgetting. In this article, we investigate biologically inspired architectures as solutions to these problems. Specifically, we show that the biophysical properties of dendrites and local inhibitory systems enable networks to dynamically restrict and route information in a context-specific manner. Our key contributions are as follows. First, we propose a novel artificial neural network architecture that incorporates active dendrites and sparse representations into the standard deep learning framework. Next, we study the performance of this architecture on two separate benchmarks requiring task-based adaptation: Meta-World, a multi-task reinforcement learning environment in which a robotic agent must learn to solve a variety of manipulation tasks simultaneously; and a continual learning benchmark in which the model's prediction task changes throughout training. Analysis on both benchmarks demonstrates the emergence of overlapping but distinct and sparse subnetworks, allowing the system to fluidly learn multiple tasks with minimal forgetting. Our neural implementation marks the first time a single architecture has achieved competitive results in both multi-task and continual learning settings. Our research sheds light on how biological properties of neurons can inform deep learning systems to address dynamic scenarios that are typically impossible for traditional ANNs to solve.
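A sketch of a context-gated unit in this spirit (an assumption drawn from the description above, not the released code): each output unit owns several dendritic segments, and the best-matching segment against a context vector gates that unit's activation, which is what routes information in a context-specific way.

```python
# Illustrative sketch of an active-dendrites layer with context-dependent gating.
import torch
import torch.nn as nn

class ActiveDendriteLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, context_dim: int, n_segments: int = 4):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        # One set of dendritic segments per output unit: (out_dim, n_segments, context_dim).
        self.segments = nn.Parameter(0.1 * torch.randn(out_dim, n_segments, context_dim))

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        feed_forward = self.fc(x)                                    # (batch, out_dim)
        match = torch.einsum("usc,bc->bus", self.segments, context)  # segment responses
        gate = torch.sigmoid(match.max(dim=-1).values)               # best segment per unit
        return torch.relu(feed_forward) * gate                       # context-specific routing

layer = ActiveDendriteLayer(in_dim=64, out_dim=32, context_dim=16)
y = layer(torch.randn(8, 64), context=torch.randn(8, 16))
```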
Deep neural networks have been shown to outperform traditional machine learning. However, deep networks lack generalizability; that is, they do not perform as well on new (test) sets drawn from a different distribution, due to domain shift. To address this well-known problem, several transfer learning approaches have been proposed, in which the knowledge of a trained model is transferred to another model to improve performance on different data. However, most of these approaches require additional training steps, or they suffer from catastrophic forgetting, where the trained model overwrites previously learned knowledge. We address both problems with a novel transfer learning approach that uses network aggregation. We train dataset-specific networks together with an aggregation network in a unified framework. The loss function includes two main components: a task-specific loss (such as cross-entropy) and an aggregation loss. The proposed aggregation loss allows our model to learn how the parameters of the trained deep networks can be aggregated by an aggregation operator. We demonstrate that the proposed approach learns model aggregation at test time without any further training steps, reducing the burden of transfer learning to a simple arithmetic operation. The proposed approach achieves performance comparable to the baselines. In addition, if the aggregation operator has an inverse, we show that our model also inherently allows for selective forgetting; that is, the aggregated model can forget one of the datasets it was trained on while retaining information about the others.
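Taking element-wise summation as the (invertible) aggregation operator, which is an assumption of this sketch rather than a detail stated above, the aggregation penalty ties the aggregate network's parameters to the sum of the dataset-specific networks' parameters:

```python
# Hedged sketch of a task-specific loss plus an aggregation penalty.
import torch
import torch.nn as nn

net_a = nn.Linear(32, 10)       # dataset-A specific network
net_b = nn.Linear(32, 10)       # dataset-B specific network
net_agg = nn.Linear(32, 10)     # aggregation network

def aggregation_loss():
    loss = 0.0
    for pa, pb, pg in zip(net_a.parameters(), net_b.parameters(), net_agg.parameters()):
        loss = loss + (pg - (pa + pb)).pow(2).mean()
    return loss

# Total objective = task-specific loss + aggregation penalty (weight is illustrative).
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
total = nn.functional.cross_entropy(net_a(x), y) + 0.1 * aggregation_loss()
total.backward()
# Because summation is invertible, "forgetting" dataset A at test time amounts to the
# arithmetic operation net_agg - net_a on the parameters.
```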