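In this paper, we present the results of empirically testing several questions about methods for overcoming catastrophic forgetting in neural networks. First, in the introduction, we describe in detail the problem of catastrophic forgetting and the methods for overcoming it, for readers who are not yet familiar with the topic. We then discuss the essence and limitations of the WVA method, which we proposed in our previous papers. Finally, we address three questions: whether the WVA method should be applied to the weight gradients or to the optimization steps, which attenuation function is best for the method, and how to choose the method's hyperparameters depending on the number of tasks when a neural network is trained sequentially.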
translated by 谷歌翻译
While deep learning has led to remarkable advances across diverse applications, it struggles in domains where the data distribution changes over the course of learning. In stark contrast, biological neural networks continually adapt to changing domains, possibly by leveraging complex molecular machinery to solve many tasks simultaneously. In this study, we introduce intelligent synapses that bring some of this biological complexity into artificial neural networks. Each synapse accumulates task relevant information over time, and exploits this information to rapidly store new memories without forgetting old ones. We evaluate our approach on continual learning of classification tasks, and show that it dramatically reduces forgetting while maintaining computational efficiency.
We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: A knowledge base, capable of solving previously encountered problems, which is connected to an active column that is employed to efficiently learn the current task. After learning a new task, the active column is distilled into the knowledge base, taking care to protect any previously acquired skills. This cycle of active learning (progression) followed by consolidation (compression) requires no architecture growth, no access to or storing of previous data or tasks, and no task-specific parameters. We demonstrate the progress & compress approach on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation.
Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective by solving a set of classification tasks based on the MNIST handwritten digit dataset and by learning several Atari 2600 games sequentially.
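Biological agents are known to learn many different tasks over the course of their lives and to be able to revisit previous tasks and behaviors without loss of performance. In contrast, artificial agents are prone to "catastrophic forgetting", whereby performance on previous tasks deteriorates as new ones are acquired. This shortcoming has recently been addressed by methods that encourage the parameters to stay close to those used for previous tasks. This can be done by (i) using specific parameter regularizers that map out suitable destinations in parameter space, or (ii) guiding the optimization journey by projecting gradients onto subspaces that do not interfere with previous tasks. However, these methods often exhibit subpar performance in both feedforward and recurrent neural networks, and recurrent networks are of interest to the study of the neural dynamics that support biological continual learning. In this work, we propose Natural Continual Learning (NCL), a new method that unifies weight regularization and projected gradient descent. NCL uses Bayesian weight regularization to encourage good performance on all tasks at convergence, and combines this with gradient projection using the prior precision, which prevents catastrophic forgetting during optimization. Our method outperforms both standard weight regularization techniques and projection-based approaches when applied to continual learning problems in feedforward and recurrent networks. Finally, the trained networks evolve task-specific dynamics that are preserved as new tasks are learned, similar to experimental findings in biological circuits.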
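Continual learning tackles the setting of learning different tasks sequentially. Despite the many previous solutions, most of them still suffer from significant forgetting or expensive memory costs. In this work, targeting these problems, we first study the continual learning process through the lens of information theory and observe that forgetting stems from the loss of the information gain on the parameters from previous tasks when a new task is learned. From this viewpoint, we propose a new continual learning approach called Bit-Level Information Preserving (BLIP), which preserves the information gain on model parameters by updating the parameters at the bit level; this can be conveniently implemented with parameter quantization. More specifically, BLIP first trains a neural network with weight quantization on the new incoming task, and then estimates the information gain on each parameter provided by the task data to determine which bits to freeze in order to prevent forgetting. We conduct extensive experiments ranging from classification tasks to reinforcement learning tasks, and the results show that our method compares favorably with previous state-of-the-art results. Indeed, BLIP achieves close to zero forgetting while requiring only constant memory overhead throughout continual learning.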
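This paper studies a new design of the optimization algorithm for training deep learning models with a fixed classification-network architecture in a continual learning framework. The training data is non-stationary, and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space where the model's performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the optimizer's trajectory after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone, whereas this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the cones of the individual tasks encountered so far during training. Based on this observation, we present our direction-constrained optimization (DCO) method, in which for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. These are then incorporated into the loss function in the form of a regularization term, so that upcoming tasks can be learned without forgetting. Furthermore, to control memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP), which allocates a fixed-size memory for storing all autoencoders. We demonstrate empirically that our algorithm performs favorably compared with other state-of-the-art regularization-based continual learning methods.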
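A primary focus area of continual learning research is alleviating the "catastrophic forgetting" problem in neural networks by designing new algorithms that are more robust to distribution shifts. While recent progress in the continual learning literature is encouraging, our understanding of which properties of neural networks contribute to catastrophic forgetting is still limited. To address this, instead of focusing on continual learning algorithms, in this work we focus on the model itself and study the impact of the "width" of the neural network architecture on catastrophic forgetting, and we show that width has a surprisingly significant effect on forgetting. To explain this effect, we study the learning dynamics of the network from various perspectives, such as gradient orthogonality, sparsity, and the lazy training regime. We provide potential explanations that are consistent with the empirical results across different architectures and continual learning benchmarks.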
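Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning. While several methods have been proposed to tackle this problem, there is limited work explaining why these methods work well. The goal of this paper is to better explain a popularly used technique for avoiding catastrophic forgetting: quadratic regularization. We show that quadratic regularizers prevent forgetting of past tasks by interpolating between the current and previous values of the model parameters at every training iteration. Over multiple training iterations, this interpolation operation reduces the learning rate of the more important model parameters, thereby minimizing their movement. Our analysis also reveals two drawbacks of quadratic regularization: (a) the dependence of the parameter interpolation on training hyperparameters often leads to training instability, and (b) lower importance is assigned to deeper layers, which are generally where forgetting occurs in DNNs. Via a simple modification to the order of operations, we show that these drawbacks can be easily avoided, resulting in 6.2% higher average accuracy at 4.5% lower average forgetting. We confirm the robustness of our results by training over 2000 models in different settings. Code is available at https://github.com/ekdeepslubana/qrforgetting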
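The interpolation view of quadratic regularization can be checked numerically for the simplest case. The sketch below assumes plain SGD and a diagonal importance vector; the function names, the penalty form `(lam / 2) * sum(importance * (theta - theta_prev) ** 2)`, and all numeric values are illustrative assumptions, not taken from the paper or its code.

```python
import numpy as np

def sgd_step_with_quadratic_penalty(theta, theta_prev, task_grad, importance, lr, lam):
    """One plain SGD step on task loss + (lam/2) * sum(importance * (theta - theta_prev)**2)."""
    penalty_grad = lam * importance * (theta - theta_prev)
    return theta - lr * (task_grad + penalty_grad)

def interpolate_then_step(theta, theta_prev, task_grad, importance, lr, lam):
    """Same update written as: interpolate toward theta_prev, then take the task gradient step."""
    alpha = lr * lam * importance          # per-parameter interpolation weight
    mixed = (1.0 - alpha) * theta + alpha * theta_prev
    return mixed - lr * task_grad

# Hypothetical toy values, only used to compare the two formulations.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
theta_prev = rng.normal(size=5)
task_grad = rng.normal(size=5)
importance = rng.uniform(0.0, 10.0, size=5)
lr, lam = 0.01, 1.0

stepped = sgd_step_with_quadratic_penalty(theta, theta_prev, task_grad, importance, lr, lam)
interpolated = interpolate_then_step(theta, theta_prev, task_grad, importance, lr, lam)
print(np.allclose(stepped, interpolated))
```

Because the interpolation weight is `lr * lam * importance`, it depends directly on the learning rate and the regularization strength, which is one way to read the hyperparameter sensitivity mentioned in the abstract. With other optimizers (momentum, Adam) the two forms need not coincide.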
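We introduce a new training paradigm that enforces interval constraints on the neural network parameter space to control forgetting. Contemporary continual learning (CL) methods train neural networks efficiently from a stream of data while reducing the negative impact of catastrophic forgetting, yet they do not provide any firm guarantee that network performance will not deteriorate uncontrollably over time. In this work, we show how to put bounds on forgetting by reformulating continual learning of a model as a continual contraction of its parameter space. To that end, we propose Hyperrectangle Training, a new training methodology in which each task is represented by a hyperrectangle in the parameter space that is fully contained in the hyperrectangles of the previous tasks. This formulation reduces the NP-hard CL problem to polynomial time while providing full resilience against forgetting. We validate our claim by developing the InterContiNet (Interval Continual Learning) algorithm, which leverages interval arithmetic to effectively model parameter regions as hyperrectangles. Through experimental results, we show that our approach performs well in a continual learning setting without storing data from previous tasks.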
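The two ingredients named above, nested hyperrectangles and interval arithmetic, can be illustrated on a single linear layer. This is a minimal sketch under our own assumptions: the helper names `is_nested` and `linear_output_bounds`, the box sizes, and the input are hypothetical, and nothing here reproduces the InterContiNet training procedure itself.

```python
import numpy as np

def is_nested(inner_lo, inner_hi, outer_lo, outer_hi):
    """True if the box [inner_lo, inner_hi] lies inside [outer_lo, outer_hi] elementwise."""
    return bool(np.all(inner_lo >= outer_lo) and np.all(inner_hi <= outer_hi))

def linear_output_bounds(W_lo, W_hi, x):
    """Interval arithmetic for y = W @ x when every weight lies in [W_lo, W_hi] and x is fixed."""
    x_pos = np.maximum(x, 0.0)   # positive inputs: smallest output uses W_lo
    x_neg = np.minimum(x, 0.0)   # negative inputs: smallest output uses W_hi
    y_lo = W_lo @ x_pos + W_hi @ x_neg
    y_hi = W_hi @ x_pos + W_lo @ x_neg
    return y_lo, y_hi

# Hypothetical boxes: the box for task 2 is contained in the box for task 1.
center = np.array([[0.5, -1.0], [2.0, 0.0]])
task1_lo, task1_hi = center - 0.2, center + 0.2
task2_lo, task2_hi = center - 0.05, center + 0.1

x = np.array([1.0, -2.0])
lo1, hi1 = linear_output_bounds(task1_lo, task1_hi, x)
lo2, hi2 = linear_output_bounds(task2_lo, task2_hi, x)
print(is_nested(task2_lo, task2_hi, task1_lo, task1_hi))
print(lo1, hi1)
print(lo2, hi2)
```

Since the second box sits inside the first, any weights chosen for task 2 produce outputs inside the output bounds already computed for task 1, which is the intuition behind bounding forgetting by contracting the parameter region.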
The recent emergence of new algorithms for permuting models into functionally equivalent regions of the solution space has shed some light on the complexity of error surfaces, and some promising properties like mode connectivity. However, finding the right permutation is challenging, and current optimization techniques are not differentiable, which makes it difficult to integrate into a gradient-based optimization, and often leads to sub-optimal solutions. In this paper, we propose a Sinkhorn re-basin network with the ability to obtain the transportation plan that better suits a given objective. Unlike the current state-of-art, our method is differentiable and, therefore, easy to adapt to any task within the deep learning domain. Furthermore, we show the advantage of our re-basin method by proposing a new cost function that allows performing incremental learning by exploiting the linear mode connectivity property. The benefit of our method is compared against similar approaches from the literature, under several conditions for both optimal transport finding and linear mode connectivity. The effectiveness of our continual learning method based on re-basin is also shown for several common benchmark datasets, providing experimental results that are competitive with state-of-art results from the literature.
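Pre-trained representations are one of the key elements in the success of modern deep learning. However, existing work on continual learning methods has mostly focused on learning models incrementally from scratch. In this paper, we explore an alternative framework for incremental learning in which we continually fine-tune the model from a pre-trained representation. Our method takes advantage of a linearization technique for pre-trained neural networks to achieve simple and effective continual learning. We show that this allows us to design a linear model for which quadratic parameter regularization is the optimal continual learning policy, while at the same time enjoying the high performance of neural networks. We also show that the proposed algorithm enables parameter regularization methods to be applied to class-incremental problems. Additionally, we provide a theoretical reason why existing parameter-space regularization algorithms such as EWC underperform on neural networks trained with cross-entropy loss. We show that the proposed method can prevent forgetting while achieving high continual fine-tuning performance on image classification tasks. To show that our method can be applied to general continual learning settings, we evaluate it on data-incremental, task-incremental, and class-incremental learning problems.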
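When learning continually on datasets with different distributions, neural networks tend to forget previously learned knowledge, a phenomenon known as catastrophic forgetting. Larger distribution shifts between datasets lead to more forgetting. Recently, parameter-isolation-based approaches have shown great potential for overcoming forgetting. However, they generalize poorly because they fix the neural path for each dataset during training, and they require dataset labels during inference. In addition, they do not support backward knowledge transfer, because they prioritize past data. In this paper, we propose a new adaptive learning method named AdaptCL, which fully reuses and grows on learned parameters to overcome catastrophic forgetting and to allow positive backward transfer without requiring dataset labels. Our proposed technique grows on the same neural path by allowing optimal reuse of frozen parameters. In addition, it uses parameter-level, data-driven pruning to assign equal priority to the data. We conduct extensive experiments on MNIST variant, domain, and food freshness detection datasets without requiring dataset labels. The results show that our proposed method outperforms alternative baselines in minimizing forgetting and achieving positive backward knowledge transfer.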
Interacting with a complex world involves continual learning, in which tasks and data distributions change over time. A continual learning system should demonstrate both plasticity (acquisition of new knowledge) and stability (preservation of old knowledge). Catastrophic forgetting is the failure of stability, in which new experience overwrites previous experience. In the brain, replay of past experience is widely believed to reduce forgetting, yet it has been largely overlooked as a solution to forgetting in deep reinforcement learning. Here, we introduce CLEAR, a replay-based method that greatly reduces catastrophic forgetting in multi-task reinforcement learning. CLEAR leverages off-policy learning and behavioral cloning from replay to enhance stability, as well as on-policy learning to preserve plasticity. We show that CLEAR performs better than state-of-the-art deep learning techniques for mitigating forgetting, despite being significantly less complicated and not requiring any knowledge of the individual tasks being learned.
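An important problem in machine learning is the ability to learn tasks in a sequential manner. When trained with standard first-order methods, most models forget previously learned tasks when they are trained on a new task, which is often referred to as catastrophic forgetting. A popular approach to overcoming forgetting is to regularize the loss function by penalizing models that perform poorly on previous tasks. For example, elastic weight consolidation (EWC) regularizes with a quadratic form involving a diagonal matrix built from past data. While EWC works very well in some settings, we show that, even under otherwise ideal conditions, it can provably suffer catastrophic forgetting if the diagonal matrix is a poor approximation of the Hessian matrix of the previous tasks. We propose a simple approach to overcome this: regularizing the training of a new task with sketches of the matrix of past data. This provably overcomes catastrophic forgetting for linear models and for wide neural networks, at the cost of memory. The overarching goal of this paper is to provide insight into when regularization-based continual learning algorithms work, and at what memory cost.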
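A toy calculation shows how a diagonal matrix can misjudge which parameter changes are harmful. The sketch below assumes a two-parameter linear regression with a single past data point, so the Hessian of the old task's squared loss is `X.T @ X`; the data, the displacement directions, and the function name `quadratic_penalty` are our own hypothetical choices and are not the construction used in the paper.

```python
import numpy as np

def quadratic_penalty(theta, theta_star, M):
    """Quadratic form 0.5 * (theta - theta_star)^T M (theta - theta_star)."""
    d = theta - theta_star
    return 0.5 * d @ M @ d

# Hypothetical old task: one data point with two perfectly correlated features.
X = np.array([[1.0, 1.0]])
hessian = X.T @ X                      # exact Hessian of 0.5 * ||X theta - y||^2
diagonal = np.diag(np.diag(hessian))   # diagonal-only approximation

theta_star = np.array([0.5, 0.5])      # parameters found on the old task

# Moving along (1, -1) leaves the old prediction X @ theta unchanged.
harmless = theta_star + np.array([1.0, -1.0])
# Moving along (1, 1) changes the old prediction the most.
harmful = theta_star + np.array([1.0, 1.0])

print(quadratic_penalty(harmless, theta_star, hessian),
      quadratic_penalty(harmless, theta_star, diagonal))
print(quadratic_penalty(harmful, theta_star, hessian),
      quadratic_penalty(harmful, theta_star, diagonal))
```

In this toy case the diagonal penalty charges the harmless and the harmful move the same amount, whereas the full quadratic form separates them. Keeping more of the off-diagonal structure is what costs the extra memory that the abstract refers to.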
Incremental learning (IL) has received a lot of attention recently, however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps so as to provide a common ground for better understanding of IL. The main challenge for an IL algorithm is to update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue while preserving knowledge, IL also suffers from a problem we call intransigence, inability of a model to update its knowledge. We introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of IL algorithms. We present RWalk, a generalization of EWC++ (our efficient version of EWC [7]) and Path Integral [26] with a theoretically grounded KL-divergence based perspective. We provide a thorough analysis of various IL algorithms on MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior results in terms of accuracy, and also provides a better trade-off between forgetting and intransigence.
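Incremental task learning (ITL) is a category of continual learning that seeks to train a single network for multiple tasks (one after another), where the training data for each task is available only while that task is being trained. Neural networks tend to forget older tasks when they are trained on newer ones; this property is often called catastrophic forgetting. To address this issue, ITL methods use episodic memory, parameter regularization, masking and pruning, or extensible network structures. In this paper, we propose a new incremental task learning framework based on low-rank factorization. In particular, we represent the network weights of each layer as a linear combination of several rank-1 matrices. To update the network for a new task, we learn a rank-1 (or low-rank) matrix and add it to the weights of every layer. We also introduce an additional selector vector that assigns different weights to the low-rank matrices learned for the previous tasks. We show that our approach performs better than the current state-of-the-art methods in terms of accuracy and forgetting. Our method also offers better memory efficiency than episodic-memory-based and mask-based approaches. Our code will be available at https://github.com/csiplab/task-increment-rank-update.git.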
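The weight representation described above can be sketched as a small data structure. This is an illustration under our own assumptions, not the authors' implementation: the class name `RankOneLayer`, the random initialization standing in for learned factors, and the all-ones selector are hypothetical, and no training loop is shown.

```python
import numpy as np

class RankOneLayer:
    """Layer weights stored as a selector-weighted sum of rank-1 factors u_i v_i^T."""

    def __init__(self, out_dim, in_dim, seed=0):
        self.out_dim, self.in_dim = out_dim, in_dim
        self.rng = np.random.default_rng(seed)
        self.U = np.zeros((out_dim, 0))   # column i holds u_i
        self.V = np.zeros((in_dim, 0))    # column i holds v_i
        self.selectors = []               # selectors[t]: one weight per factor known at task t

    def add_task(self):
        """Append one new rank-1 factor (random here; it would be learned in practice)."""
        u = self.rng.normal(size=(self.out_dim, 1))
        v = self.rng.normal(size=(self.in_dim, 1))
        self.U = np.hstack([self.U, u])
        self.V = np.hstack([self.V, v])
        self.selectors.append(np.ones(self.U.shape[1]))
        return len(self.selectors) - 1

    def weight(self, task_id):
        """W_t = sum_i s_t[i] * u_i v_i^T over the factors available at task t."""
        s = self.selectors[task_id]
        k = s.shape[0]
        return (self.U[:, :k] * s) @ self.V[:, :k].T

    def forward(self, x, task_id):
        return self.weight(task_id) @ x

layer = RankOneLayer(out_dim=4, in_dim=3)
t0 = layer.add_task()
w0_before = layer.weight(t0).copy()
t1 = layer.add_task()
print(np.allclose(layer.weight(t0), w0_before))
print(np.linalg.matrix_rank(layer.weight(t1)))
```

Each task stores only one extra pair of vectors and a short selector per layer, so adding a task leaves the weights used for earlier tasks untouched as long as the earlier factors stay frozen.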
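Second-order optimizers are thought to hold the potential to speed up neural network training, but because of the enormous size of the curvature matrix, they typically require approximations to be computationally tractable. The most successful family of approximations is Kronecker-factored, block-diagonal curvature estimates (KFAC). Here, we combine tools from prior work to evaluate exact second-order updates, together with careful ablations, to establish a surprising result: because of its approximations, KFAC is not closely related to second-order updates, and in particular it significantly outperforms true second-order updates. This challenges a widely held belief and immediately raises the question of why KFAC performs so well. To answer this question, we present strong evidence that KFAC approximates a first-order algorithm that performs gradient descent on neurons rather than on weights. Finally, we show that this optimizer often improves over KFAC in terms of computational cost and data efficiency.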
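To detect existing steganographic algorithms, recent steganalytic methods usually train a convolutional neural network (CNN) model on a dataset consisting of corresponding paired cover/stego images. However, it is inefficient and impractical for such steganalytic tools to completely retrain the CNN model so that it is effective against both the existing steganographic algorithms and a newly emerging one. As a result, existing steganalytic models usually lack dynamic extensibility to new steganographic algorithms, which limits their application in real-world scenarios. To address this issue, we propose a parameter importance estimation (APIE) based learning scheme for steganalysis. In this scheme, when a steganalytic model is trained on a new image dataset generated by a new steganographic algorithm, its network parameters are updated effectively and efficiently, with full consideration of their importance as evaluated during the previous training process. This approach can guide the steganalytic model to learn the patterns of the new steganographic algorithm without significantly degrading its detection performance against the previous steganographic algorithms. Experimental results show that the proposed scheme is extensible to newly emerging steganographic algorithms.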