One major obstacle towards AI is the poor ability of models to solve new problems more quickly and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM), that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state of the art.
Translated by Google Translate
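The transfer-aware metrics mentioned in the GEM abstract can be illustrated with a small sketch. It assumes an accuracy matrix `R` in which `R[i][j]` is the test accuracy on task `j` after training through task `i`; the function names, the matrix layout, and the example numbers are illustrative assumptions, not values or code from the paper.

```python
from typing import List


def average_accuracy(R: List[List[float]]) -> float:
    """Mean test accuracy over all tasks after training on the last task."""
    num_tasks = len(R)
    return sum(R[num_tasks - 1]) / num_tasks


def backward_transfer(R: List[List[float]]) -> float:
    """Average change in accuracy on each earlier task between the moment it
    was learned and the end of training (negative values indicate forgetting)."""
    num_tasks = len(R)
    if num_tasks < 2:
        raise ValueError("backward transfer needs at least two tasks")
    total = sum(R[num_tasks - 1][i] - R[i][i] for i in range(num_tasks - 1))
    return total / (num_tasks - 1)


# Hypothetical accuracies: R_example[i][j] is the accuracy on task j
# measured after training on task i.
R_example = [
    [0.90, 0.10, 0.10],
    [0.80, 0.90, 0.20],
    [0.70, 0.85, 0.95],
]
acc = average_accuracy(R_example)
bwt = backward_transfer(R_example)
```

A negative `bwt` here means that later training lowered accuracy on earlier tasks, which is the forgetting that GEM aims to alleviate.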
In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches in terms of sample complexity and computational and memory cost. Towards this end, we first introduce a new and more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which enjoys the same or even better performance as GEM, while being almost as computationally and memory efficient as EWC and other regularization-based methods. Finally, we show that all algorithms including A-GEM can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency.
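The abstract does not spell out the A-GEM update, so the following is only a sketch of the gradient projection as it is commonly described: when the current gradient `g` conflicts with a reference gradient `g_ref` computed on a batch from episodic memory (negative inner product), the conflicting component is removed. Gradients are flattened to plain lists here, and the function names are illustrative assumptions.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_gradient(g: Sequence[float], g_ref: Sequence[float]) -> List[float]:
    """Return g unchanged if it does not conflict with g_ref; otherwise
    subtract the component of g along g_ref so the inner product becomes 0."""
    inner = dot(g, g_ref)
    if inner >= 0:
        # No conflict (this also covers an all-zero reference gradient).
        return list(g)
    scale = inner / dot(g_ref, g_ref)
    return [x - scale * y for x, y in zip(g, g_ref)]


# Hypothetical gradients: the second coordinate conflicts with the memory batch.
projected = project_gradient([1.0, -1.0], [0.0, 1.0])
unchanged = project_gradient([1.0, 1.0], [0.0, 1.0])
```

Because only one reference gradient is involved, the projection has a closed form and needs a single extra backward pass, which is consistent with the efficiency claim in the abstract.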
Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human-realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization-based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
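The transfer-interference view in the MER abstract can be illustrated with the sign of the inner product between the loss gradients of two examples: aligned gradients mean a step on one example also helps the other, while opposed gradients mean it hurts. This is a minimal sketch under that reading; the function names and the flattened-list representation are illustrative assumptions, and it does not implement the MER algorithm itself.

```python
from typing import Sequence


def gradient_alignment(grad_a: Sequence[float], grad_b: Sequence[float]) -> float:
    """Inner product of the loss gradients of two examples."""
    return sum(x * y for x, y in zip(grad_a, grad_b))


def classify_interaction(grad_a: Sequence[float], grad_b: Sequence[float]) -> str:
    """Label the pair by the sign of the gradient inner product."""
    alignment = gradient_alignment(grad_a, grad_b)
    if alignment > 0:
        return "transfer"
    if alignment < 0:
        return "interference"
    return "neutral"


# Hypothetical per-example gradients.
aligned = classify_interaction([1.0, 2.0], [2.0, 1.0])
opposed = classify_interaction([1.0, 0.0], [-1.0, 0.5])
orthogonal = classify_interaction([1.0, 0.0], [0.0, 1.0])
```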
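Incremental task learning (ITL) is a category of continual learning that seeks to train a single network for multiple tasks (one after another), where the training data for each task is available only while that task is being trained. Neural networks tend to forget older tasks when they are trained on newer ones. This property is often called catastrophic forgetting. To address the issue, ITL methods use episodic memory, parameter regularization, masking and pruning, or extensible network structures. In this paper, we propose a new incremental task learning framework based on low-rank factorization. In particular, we represent the network weights of each layer as a linear combination of several rank-1 matrices. To update the network for a new task, we learn a rank-1 (or low-rank) matrix and add it to the weights of every layer. We also introduce an additional selector vector that assigns different weights to the low-rank matrices learned for previous tasks. We show that our approach performs better than current state-of-the-art methods in terms of accuracy and forgetting. Our method also offers better memory efficiency than episodic-memory-based and mask-based approaches. Our code will be available at https://github.com/csiplab/task-increment-rank-update.git.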
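The weight representation described in this abstract can be sketched as follows: a layer's weight matrix is a linear combination of rank-1 matrices, each the outer product of two vectors, with a per-task selector vector supplying the combination coefficients. The names `compose_weight`, `factors`, and `selector` and the example numbers are illustrative assumptions, not taken from the authors' code.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def compose_weight(factors: Sequence[Tuple[Vector, Vector]],
                   selector: Sequence[float]) -> List[List[float]]:
    """Build W = sum_k selector[k] * outer(u_k, v_k) from rank-1 factors."""
    if not factors:
        raise ValueError("at least one rank-1 factor is required")
    if len(factors) != len(selector):
        raise ValueError("need exactly one selector weight per factor")
    rows, cols = len(factors[0][0]), len(factors[0][1])
    weight = [[0.0] * cols for _ in range(rows)]
    for (u, v), s in zip(factors, selector):
        for i in range(rows):
            for j in range(cols):
                weight[i][j] += s * u[i] * v[j]
    return weight


# Hypothetical factors: the first was learned for an earlier task,
# the second is the rank-1 update added for the new task.
factors_example = [([1.0, 0.0], [1.0, 2.0]), ([0.0, 1.0], [3.0, 4.0])]
selector_example = [2.0, 0.5]
W = compose_weight(factors_example, selector_example)
```

Storing one pair of vectors per task instead of a full matrix is what makes such a scheme memory efficient: a rank-1 factor for an m-by-n layer costs m + n numbers rather than m * n.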
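Continual learning aims to learn the current task quickly and continually from a sequence of tasks. Compared with other kinds of methods, experience-replay-based methods have shown great advantages in overcoming catastrophic forgetting. A common limitation of these methods is the data imbalance between previous and current tasks, which further aggravates forgetting. Moreover, how to effectively address the stability-plasticity dilemma in this setting is also an urgent problem. In this paper, we overcome these challenges by proposing a novel framework called Meta-learning update via Multi-scale Knowledge Distillation and Data Augmentation (MMKDDA). Specifically, we apply multi-scale knowledge distillation to capture the evolution of long-range and short-range spatial relationships at different feature levels, alleviating the data imbalance problem. In addition, our method mixes samples from the episodic memory and the current task in the online continual training procedure, thereby alleviating the side effects of shifts in the probability distribution. Furthermore, we optimize our model via a meta-learning update that draws on the number of tasks seen so far, which helps maintain a better balance between stability and plasticity. Finally, our experimental evaluation on four benchmark datasets shows the effectiveness of the proposed MMKDDA framework against other popular baselines, and ablation studies further analyze the role of each component in our framework.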
Continual Learning is considered a key step toward next-generation Artificial Intelligence. Among various methods, replay-based approaches that maintain and replay a small episodic memory of previous samples are one of the most successful strategies against catastrophic forgetting. However, since forgetting is inevitable given bounded memory and unbounded tasks, how to forget is a problem continual learning must address. Therefore, beyond simply avoiding catastrophic forgetting, an under-explored issue is how to reasonably forget while ensuring the merits of human memory, including 1. storage efficiency, 2. generalizability, and 3. some interpretability. To achieve these simultaneously, our paper proposes a new saliency-augmented memory completion framework for continual learning, inspired by recent discoveries in memory completion separation in cognitive neuroscience. Specifically, we innovatively propose to store the part of the image most important to the tasks in episodic memory by saliency map extraction and memory encoding. When learning new tasks, previous data from memory are inpainted by an adaptive data generation module, which is inspired by how humans complete episodic memory. The module's parameters are shared across all tasks and it can be jointly trained with a continual learning classifier as bilevel optimization. Extensive experiments on several continual learning and image classification benchmarks demonstrate the proposed method's effectiveness and efficiency.
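This paper argues that continual learning methods can benefit from splitting the capacity of the learner across multiple models. We use statistical learning theory and experimental analysis to show how multiple tasks can interact with each other in a non-trivial fashion when a single model is trained on them. The generalization error on a particular task can improve when it is trained with synergistic tasks, but can also deteriorate when it is trained with competing tasks. This theory motivates our method, named Model Zoo, which, inspired by the boosting literature, grows an ensemble of small models, each of which is trained during one episode of continual learning. We demonstrate that Model Zoo improves accuracy on a variety of continual learning benchmark problems.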
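According to the Complementary Learning Systems (CLS) theory~\cite{mcclelland1995there} in neuroscience, humans achieve effective \emph{continual learning} through two complementary systems: a fast learning system, centered on the hippocampus, for rapid learning of the specifics of individual experiences; and a slow learning system, located in the neocortex, for the gradual acquisition of structured knowledge about the environment. Motivated by this theory, we propose \emph{DualNets} (for Dual Networks), a general continual learning framework comprising a fast learning system for supervised learning of pattern-separated representations from specific tasks and a slow learning system for learning task-agnostic general representations via Self-Supervised Learning (SSL). DualNets can seamlessly incorporate both representation types into a holistic framework to facilitate better continual learning in deep neural networks. Through extensive experiments, we demonstrate promising results for DualNets on a wide range of continual learning protocols, from the standard offline, task-aware setting to the challenging online, task-free scenario. Notably, on the CTrL~\cite{veniat2020...} benchmark ... \cite{ostapenko2021-continual}. Furthermore, we conduct comprehensive ablation studies to validate the efficacy, robustness, and scalability of DualNets. Code is publicly available at \url{https://github.com/phquang/dualnet}.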
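When an agent encounters a continual stream of new tasks in the lifelong learning setting, it leverages the knowledge gained from earlier tasks to help it learn the new tasks better. In such a scenario, identifying an efficient knowledge representation becomes a challenging problem. Most research works propose to store a subset of examples from past tasks in a replay buffer, to dedicate a separate set of parameters to each task, or to penalize excessive updates to the parameters by introducing a regularization term. While existing methods employ the general task-agnostic stochastic gradient descent update rule, we propose a task-aware optimizer that adapts the learning rate based on the relatedness among tasks. We utilize the directions taken by the parameters during the updates by accumulating the gradients specific to each task. These task-based accumulated gradients act as a knowledge base that is maintained and updated throughout the stream. We empirically show that our proposed adaptive learning rate not only accounts for catastrophic forgetting but also allows positive backward transfer. We also show that our method performs better than several state-of-the-art lifelong learning methods on complex datasets with a large number of tasks.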
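Continual learning aims to learn a sequence of tasks by leveraging previously acquired knowledge in an online manner while still performing well on all previous tasks. This capability is crucial for artificial intelligence (AI) systems, and it makes continual learning better suited than traditional learning paradigms to most realistic and complex application scenarios. However, current models usually learn a generic representation based on the class labels of each task and choose an effective strategy to avoid catastrophic forgetting. We postulate that selecting only the relevant and useful parts of the acquired knowledge is more effective than utilizing all of it. Building on this hypothesis, we propose a new framework, named Selecting Related Knowledge for Online Continual Learning (SRKOCL), which incorporates an additional efficient channel attention mechanism to pick the particular related knowledge for every task. Our model also combines experience replay and knowledge distillation to circumvent catastrophic forgetting. Finally, extensive experiments on different benchmarks yield competitive results, demonstrating that the proposed SRKOCL is a promising approach compared with the state of the art.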
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
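Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised methods. However, these methods are unable to acquire new knowledge incrementally; in fact, they are mostly used only as a pre-training phase with IID data. In this work, we investigate self-supervised methods in continual learning regimes without additional memory or replay. To prevent forgetting of previous knowledge, we propose the use of functional regularization. We show that naive functional regularization, also known as feature distillation, leads to low plasticity and therefore seriously limits continual learning performance. To address this problem, we propose Projected Functional Regularization, in which a separate projection network ensures that the newly learned feature space preserves the information of the previous feature space while allowing new features to be learned. This allows us to prevent forgetting while maintaining the plasticity of the learner. Evaluation against other incremental learning approaches applied to self-supervision demonstrates that our method obtains competitive performance in different scenarios and on multiple datasets.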
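Most meta-learning approaches assume the existence of a very large set of labeled data available for episodic meta-learning of base knowledge. This contrasts with the more realistic continual learning paradigm, in which data arrives incrementally in the form of tasks containing disjoint classes. In this paper we consider this problem of Incremental Meta-Learning (IML), in which classes are presented incrementally in discrete tasks. We propose an approach to IML, which we call Episodic Replay Distillation (ERD), that mixes classes from the current task with class exemplars from previous tasks when sampling episodes. These episodes are then used for knowledge distillation to minimize catastrophic forgetting. Experiments on four datasets demonstrate that ERD surpasses the state of the art. In particular, in the more challenging one-shot, long-task-sequence incremental meta-learning scenarios, we reduce the gap between IML and joint training from 3.5% / 10.1% / 13.4% with the current state of the art to 2.6% / 2.9% / 5.0% with our method on Tiered-ImageNet / Mini-ImageNet / CIFAR100, respectively.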
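The continual learning (CL) ability of humans is closely related to the stability-versus-plasticity dilemma, which describes how humans achieve an ongoing capacity to learn while preserving learned information. The notion of CL has been present in artificial intelligence (AI) since its birth. This paper offers a comprehensive review of CL. Unlike previous reviews, which mainly focus on the catastrophic forgetting phenomenon in CL, this paper surveys CL from a more macroscopic perspective based on the stability-versus-plasticity mechanism. Analogous to its biological counterpart, a "smart" AI agent is supposed to i) remember previously learned information (information retrospection); ii) infer new information continuously (information prospection); and iii) transfer useful information (information transfer), in order to achieve high-level CL. The review covers taxonomy, evaluation metrics, algorithms, applications, and some open issues. Our main contributions concern i) re-examining CL from the level of artificial general intelligence; ii) providing a detailed and extensive overview of CL topics; and iii) presenting some novel ideas on the potential development of CL.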
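A primary focus area in continual learning research is alleviating the "catastrophic forgetting" problem in neural networks by designing new algorithms that are more robust to distribution shifts. While recent progress in the continual learning literature is encouraging, our understanding of which properties of neural networks contribute to catastrophic forgetting is still limited. To address this, instead of focusing on continual learning algorithms, in this work we focus on the model itself and study the impact of the "width" of the neural network architecture on catastrophic forgetting, and show that width has a surprisingly significant effect on forgetting. To explain this effect, we study the learning dynamics of the network from various perspectives such as gradient orthogonality, sparsity, and the lazy training regime. We provide potential explanations that are consistent with the empirical results across different architectures and continual learning benchmarks.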
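Catastrophic forgetting is a significant problem that hinders the deployment of deep learning algorithms in continual learning settings. Numerous methods have been proposed to address catastrophic forgetting, in which an agent loses its ability to generalize on old tasks while learning new ones. We put forward an alternative strategy for handling catastrophic forgetting with knowledge amalgamation (CFA), which learns a student network from multiple heterogeneous teacher models, each specializing in a previous task, and which can be applied to current offline methods. The knowledge amalgamation process is carried out in a single-head manner with only a selected number of memorized samples and no annotations. The teachers and the student do not need to share the same network structure, allowing heterogeneous tasks to be adapted to a compact or sparse data representation. We compare our method with competitive baselines from different strategies, demonstrating the advantages of our approach.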
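Learning from a sequence of tasks over a lifetime is essential for an agent pursuing artificial general intelligence. This requires the agent to continuously learn and memorize new knowledge without interference. This paper first demonstrates a fundamental issue of lifelong learning with neural networks, named anterograde forgetting: preserving and transferring memory may inhibit the learning of new knowledge. This is attributed to two facts: the learning capacity of a neural network shrinks as it keeps memorizing historical knowledge, and conceptual confusion may occur as it transfers irrelevant old knowledge to the current task. This work proposes a general framework named Cycled Memory Networks (CMN) to address anterograde forgetting in neural networks for lifelong learning. The CMN consists of two individual memory networks that store short-term and long-term memories, respectively, to avoid capacity shrinkage. A transfer cell is designed to connect the two memory networks, enabling knowledge transfer from the long-term memory network to the short-term memory network to mitigate conceptual confusion, and a memory consolidation mechanism is developed to integrate short-term knowledge into the long-term memory network for knowledge accumulation. Experimental results demonstrate that the CMN can effectively address anterograde forgetting on several task-related, task-conflict, class-incremental, and cross-domain benchmarks.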
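The ability to continuously expand knowledge over time and utilize it to rapidly generalize to new tasks is a key feature of human linguistic intelligence. However, existing models that pursue rapid generalization to new tasks (e.g., few-shot learning methods) are mostly trained in a single shot on fixed datasets and cannot dynamically expand their knowledge, while continual learning algorithms are not specifically designed for rapid generalization. We present a new learning setup, Continual Learning of Few-Shot Learners (CLIF), to address the challenges of both learning settings in a unified setup. CLIF assumes that a model learns from a sequence of diverse NLP tasks arriving sequentially, accumulating knowledge for improved generalization to new tasks while also retaining performance on the tasks learned earlier. We examine how generalization ability is affected in the continual learning setup, evaluate a number of continual learning algorithms, and propose a novel regularized adapter generation approach. We find that catastrophic forgetting affects generalization ability to a lesser degree than it affects performance on seen tasks, while continual learning algorithms can still bring considerable benefit to generalization ability.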
We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: A knowledge base, capable of solving previously encountered problems, which is connected to an active column that is employed to efficiently learn the current task. After learning a new task, the active column is distilled into the knowledge base, taking care to protect any previously acquired skills. This cycle of active learning (progression) followed by consolidation (compression) requires no architecture growth, no access to or storing of previous data or tasks, and no task-specific parameters. We demonstrate the progress & compress approach on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation.
Humans and animals have the ability to continually acquire, fine-tune, and transfer knowledge and skills throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for computational systems and autonomous agents interacting in the real world and processing continuous streams of information. However, lifelong learning remains a long-standing challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic forgetting. Although significant advances have been made in domain-specific learning with neural networks, extensive research efforts are required for the development of robust lifelong learning on autonomous agents and robots. We discuss well-established and emerging research motivated by lifelong learning factors in biological systems such as structural plasticity, memory replay, curriculum and transfer learning, intrinsic motivation, and multisensory integration.