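Continual learning (CL, sometimes also termed incremental learning) is a flavor of machine learning in which the usual assumption of a stationary data distribution is relaxed or omitted. When DNNs, for example, are applied naively to CL problems, changes in the data distribution can cause the so-called catastrophic forgetting (CF) effect: an abrupt loss of previous knowledge. Although many significant contributions to enabling CL have been made in recent years, most works address supervised (classification) problems. This article reviews literature that studies CL in other settings, such as learning with reduced supervision, fully unsupervised learning, and reinforcement learning. Besides proposing a simple schema for classifying CL methods w.r.t. their level of autonomy and supervision, we discuss the specific challenges associated with each setting and the potential contributions to the field of CL.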
translated by 谷歌翻译
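Classical machine learning algorithms often assume that the data are drawn i.i.d. from a stationary probability distribution. Recently, continual learning has emerged as a rapidly growing area of machine learning in which this assumption is relaxed, i.e., the data distribution is non-stationary and changes over time. This paper represents the state of the data distribution by a context variable $c$. A drift in $c$ leads to a drift in the data distribution. A context drift may change the target distribution, the input distribution, or both. Moreover, distribution drifts may be abrupt or gradual. In continual learning, context drifts may interfere with the learning process and erase previously learned knowledge; continual learning algorithms must therefore include specialized mechanisms to deal with such drifts. In this paper, we aim to identify and categorize different types of context drifts and the potential assumptions about them, in order to better characterize various continual learning scenarios. Moreover, we propose to use the distribution drift framework to provide more precise definitions of several terms commonly used in the continual learning field.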
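To make the distinction between abrupt and gradual drift concrete, here is a minimal toy sketch that is not taken from the paper. The function names `context_schedule` and `sample`, the scalar context in [0, 1], the Gaussian input model, and the label-flipping rule are all illustrative assumptions.

```python
import random


def context_schedule(step, total_steps, mode):
    """Return a hypothetical scalar context c in [0, 1] for a training step."""
    if mode == "abrupt":
        # The context jumps from 0 to 1 halfway through the stream.
        return 0.0 if step < total_steps // 2 else 1.0
    if mode == "gradual":
        # The context moves linearly from 0 to 1 (needs total_steps >= 2).
        return step / (total_steps - 1)
    raise ValueError(f"unknown drift mode: {mode}")


def sample(c, rng, drift="input"):
    """Draw one (x, y) pair from a toy distribution conditioned on c."""
    # Input drift: the mean of x shifts with c (assumed toy model).
    mean = 2.0 * c if drift in ("input", "both") else 0.0
    x = rng.gauss(mean, 1.0)
    y = int(x > mean)
    # Target drift: the label is flipped with probability c (assumed toy model).
    if drift in ("target", "both") and rng.random() < c:
        y = 1 - y
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    for mode in ("abrupt", "gradual"):
        contexts = [context_schedule(t, 10, mode) for t in range(10)]
        print(mode, [round(c, 2) for c in contexts])
    print(sample(1.0, rng, drift="both"))
```

Passing `drift="input"`, `"target"`, or `"both"` selects which part of the distribution the context affects, mirroring the three cases named in the abstract.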
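We present an empirical study on the use of continual learning (CL) methods in a reinforcement learning (RL) scenario, which, to the best of our knowledge, has not been described before. CL is a very active research topic concerned with machine learning under non-stationary data distributions. Although this naturally applies to RL, the use of dedicated CL methods is still uncommon. This may be because CL methods often assume that a CL problem decomposes into disjoint sub-tasks with stationary distributions, that the onset of these sub-tasks is known, and that the sub-tasks are non-contradictory. In this study, we perform an empirical comparison of selected CL methods on an RL problem in which a physically simulated robot must follow a racetrack by vision. To make CL methods applicable, we restrict the RL setting and introduce non-conflicting sub-tasks of known onset; these sub-tasks are, however, not disjoint, and their distribution remains non-stationary from the learner's point of view. Our results show that dedicated CL methods can significantly improve learning compared with the baseline technique of "experience replay".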
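A key challenge of continual reinforcement learning (CRL) in dynamic environments is to adapt promptly as the environment changes over its lifetime while minimizing catastrophic forgetting of what has already been learned. To address this challenge, we propose DaCoRL, i.e., dynamics-adaptive continual RL. DaCoRL learns a context-conditioned policy using progressive contextualization, which incrementally clusters a stream of stationary tasks in the dynamic environment into a series of contexts and uses an expandable multi-head neural network to approximate the policy. Specifically, we define a set of tasks with similar dynamics as an environmental context and formalize context inference as online Bayesian infinite Gaussian mixture clustering on environment features, resorting to online Bayesian inference to infer the posterior distribution over contexts. Under the assumption of a Chinese restaurant process prior, this technique can accurately classify the current task as a previously seen context or instantiate a new context as needed, without relying on any external indicator to signal environmental changes in advance. Furthermore, we employ an expandable multi-head neural network whose output layer is expanded in step with each newly instantiated context, together with a knowledge distillation regularization term that preserves performance on learned tasks. As a general framework that can be combined with various deep RL algorithms, DaCoRL shows consistent advantages over existing methods in stability, overall performance, and generalization ability, as verified by extensive experiments on several robot navigation and MuJoCo locomotion tasks.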
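As a rough illustration of how a Chinese restaurant process (CRP) prior lets a learner either reuse a seen context or open a new one, consider the sketch below. It is not the paper's implementation: the function names, the concentration parameter `alpha`, and the abstract `likelihoods` argument (which stands in for the Gaussian mixture likelihoods on environment features) are assumptions made for illustration.

```python
def crp_prior(counts, alpha):
    """Prior over contexts under a Chinese restaurant process.

    counts[k] is the number of past tasks assigned to context k. The last
    entry of the result is the probability of opening a new context.
    """
    total = sum(counts) + alpha
    return [n / total for n in counts] + [alpha / total]


def context_posterior(counts, alpha, likelihoods):
    """Combine the CRP prior with per-context likelihoods via Bayes' rule.

    likelihoods has one entry per existing context plus one for a new
    context; how they are computed is deliberately left abstract here.
    """
    prior = crp_prior(counts, alpha)
    if len(likelihoods) != len(prior):
        raise ValueError("need one likelihood per existing context plus one")
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


if __name__ == "__main__":
    # Two existing contexts with 3 and 1 tasks, concentration alpha = 1.
    print(crp_prior([3, 1], alpha=1.0))
    # The current task fits neither old context well, so "new" dominates.
    print(context_posterior([3, 1], 1.0, [0.01, 0.02, 0.5]))
```

The prior favors contexts that already hold many tasks, while `alpha` controls how readily a new context is instantiated; the likelihood term is what allows an unfamiliar task to override that preference.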
Continual Learning (CL) is a field dedicated to devising algorithms able to achieve lifelong learning. Overcoming the disruption of previously acquired knowledge, a drawback that affects deep learning models and goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the modeled data does not undergo a considerable distributional shift across subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drops very quickly. Overcoming this limitation is fundamental for two reasons. First, it would allow us to build truly intelligent systems showing both stability and plasticity. Second, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with the new updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use a memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor than the quality of the data. Secondly, we propose one of the early works on incremental learning for ViT architectures, comparing functional, weight, and attention regularization approaches, and propose an effective novel asymmetric loss. Finally, we present a study on pretraining and how it affects performance in Continual Learning, raising some questions about the effective progress of the field. We conclude with some future directions and closing remarks.
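The field of continual learning (CL) seeks to develop algorithms that accumulate knowledge and skills over time through interaction with non-stationary environments. In practice, a plethora of evaluation procedures (settings) and algorithmic solutions (methods) exist, each with its own, potentially disjoint, set of assumptions. This variety makes it difficult to measure progress in CL. We propose a taxonomy of settings in which each setting is described as a set of assumptions. A tree-shaped hierarchy emerges from this view, in which more general settings become the parents of those with more restrictive assumptions. This makes it possible to use inheritance to share and reuse research, since a method developed for a given setting is also directly applicable to any of its children. We instantiate this idea as a publicly available software framework called Sequoia, which features a variety of settings from both the continual supervised learning (CSL) and continual reinforcement learning (CRL) domains. In addition to more specialized methods from external libraries, Sequoia also includes a growing suite of methods that are easy to extend and customize. We hope that this new paradigm and its first implementation can help unify and accelerate research in CL. You can help us grow the tree by visiting github.com/lebrice/Sequoia.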
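The idea that a method written for a general setting carries over to its more restrictive children can be sketched with ordinary class inheritance. The class names and assumption strings below are hypothetical and chosen only for illustration; they are not Sequoia's actual API.

```python
class Setting:
    """Most general setting: no assumptions (hypothetical root node)."""
    assumptions = frozenset()


class ContinualRL(Setting):
    """Child setting that adds one assumption to its parent."""
    assumptions = Setting.assumptions | {"feedback is a reward signal"}


class TaskIncrementalRL(ContinualRL):
    """Grandchild setting with a stricter assumption set."""
    assumptions = ContinualRL.assumptions | {"task boundaries are known"}


class ContinualSL(Setting):
    """Sibling branch of the tree for supervised problems."""
    assumptions = Setting.assumptions | {"feedback is a label"}


class Method:
    """A method declares the most general setting it was developed for."""
    target_setting = Setting

    @classmethod
    def is_applicable(cls, setting):
        # Applicable to the target setting and to all of its descendants.
        return issubclass(setting, cls.target_setting)


class SomeRLMethod(Method):
    """Hypothetical method developed for the ContinualRL setting."""
    target_setting = ContinualRL


if __name__ == "__main__":
    print(SomeRLMethod.is_applicable(TaskIncrementalRL))  # child of target
    print(SomeRLMethod.is_applicable(ContinualSL))        # different branch
```

Each child accumulates its parent's assumptions, so moving down the tree only ever narrows the setting, which is why applicability can be checked with a plain subclass test.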
As Artificial and Robotic Systems are increasingly deployed and relied upon for real-world applications, it is important that they exhibit the ability to continually learn and adapt in dynamically-changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new tasks. This paper addresses the challenging lifelong reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in L2RL and making L2RL useful for practical applications requires more than developing individual L2RL algorithms; it requires making progress at the systems-level, especially research into the non-trivial problem of how to integrate multiple L2RL algorithms into a common framework. In this paper, we introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components (each addressing different aspects of the lifelong learning problem) into a unified system. As an instantiation of L2RLCF, we develop a standard API allowing easy integration of novel lifelong learning components. We describe a case study that demonstrates how multiple independently-developed LL components can be integrated into a single realized system. We also introduce an evaluation environment in order to measure the effect of combining various system components. Our evaluation environment employs different LL scenarios (sequences of tasks) consisting of Starcraft-2 minigames and allows for the fair, comprehensive, and quantitative comparison of different combinations of components within a challenging common evaluation environment.
Humans and animals have the ability to continually acquire, fine-tune, and transfer knowledge and skills throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for computational systems and autonomous agents interacting in the real world and processing continuous streams of information. However, lifelong learning remains a long-standing challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic forgetting. Although significant advances have been made in domain-specific learning with neural networks, extensive research efforts are required for the development of robust lifelong learning on autonomous agents and robots. We discuss well-established and emerging research motivated by lifelong learning factors in biological systems such as structural plasticity, memory replay, curriculum and transfer learning, intrinsic motivation, and multisensory integration.
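The study of generalization in deep reinforcement learning (RL) aims to produce RL algorithms whose policies generalize well to novel, unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real-world scenarios, where the environment will be diverse, dynamic, and unpredictable. This survey is an overview of this nascent field. We provide a unifying formalism and terminology for discussing different generalization problems, building upon previous works. We go on to categorize existing benchmarks for generalization, as well as current methods for tackling the generalization problem. Finally, we provide a critical discussion of the current state of the field, including recommendations for future work. Among other conclusions, we argue that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalization; we suggest fast online adaptation and tackling RL-specific problems as areas for future work on generalization methods; and we recommend building benchmarks in underexplored problem settings such as offline RL generalization and reward-function variation.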
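The human ability for continual learning (CL) is closely related to the stability-versus-plasticity dilemma, which describes how humans achieve ongoing learning capacity while preserving learned information. The notion of CL has been present in artificial intelligence (AI) since its birth. This paper presents a comprehensive review of CL. Unlike previous reviews, which mainly focus on the catastrophic forgetting phenomenon in CL, this paper surveys CL from a more macroscopic perspective based on the stability-versus-plasticity mechanism. Analogous to their biological counterparts, "smart" AI agents should i) remember previously learned information (information retrospection); ii) continuously infer new information (information prospection); and iii) transfer useful information (information transfer), in order to achieve high-level CL. Following this taxonomy, we then introduce evaluation metrics, algorithms, applications, and some open issues. Our main contributions are i) re-examining CL from the level of artificial general intelligence; ii) providing a detailed and extensive overview of CL topics; and iii) presenting some novel ideas on the potential development of CL.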
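Standard gradient descent algorithms applied to sequences of tasks are known to produce catastrophic forgetting in deep neural networks. When trained on a new task in a sequence, the model updates its parameters on the current task and forgets past knowledge. This article explores scenarios in which we scale the number of tasks in a finite environment. These scenarios consist of long sequences of tasks with recurring data. We show that in such settings, stochastic gradient descent can learn, progress, and converge to a solution that, according to the existing literature, would require a continual learning algorithm. In other words, we show that the model retains and accumulates knowledge without any specific memorization mechanism. We propose a new experimental framework, SCoLe (Scaling Continual Learning), to study the knowledge retention and accumulation of algorithms in potentially infinite sequences of tasks. To explore this setting, we performed a large number of experiments on sequences of 1,000 tasks to better understand this new family of settings. We also propose a slight modification to vanilla stochastic gradient descent to facilitate continual learning in this setting. The SCoLe framework is a good simulation of practical training environments and allows the study of convergence behavior in long sequences. Our experiments show that previous results on short scenarios cannot always be extrapolated to longer scenarios.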
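The powerful learning ability of deep neural networks enables reinforcement learning agents to learn competent control policies directly from continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs; unfortunately, this assumption does not hold in the general reinforcement learning paradigm, where the training data are temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" and to a collapse in performance. In this paper, we present IQ, i.e., interference-aware deep Q-learning, to mitigate catastrophic interference in single-task deep reinforcement learning. Specifically, we resort to online clustering to achieve on-the-fly context division, together with a multi-head network and a knowledge distillation regularization term for preserving the policy of learned contexts. Built on deep Q-networks, IQ consistently improves stability and performance compared with existing methods, as verified by extensive experiments on classic control and Atari tasks. The code is publicly available at: https://github.com/sweety-dm/interference-aware-ware-deep-q-learning.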
Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human-realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization-based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments, demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment becomes more non-stationary and as the fraction of the total experiences stored becomes smaller.
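The notion of gradient alignment can be illustrated on a toy problem. The sketch below is not MER itself; it only computes the dot product between the loss gradients of two examples for a linear model with squared loss, an assumed setup chosen for brevity. A positive dot product means a gradient step on one example also lowers the loss on the other to first order (transfer), and a negative one means it raises it (interference). All function names are hypothetical.

```python
def grad_squared_loss(theta, x, y):
    """Gradient of (theta . x - y)^2 with respect to theta."""
    error = sum(t * xi for t, xi in zip(theta, x)) - y
    return [2.0 * error * xi for xi in x]


def gradient_alignment(theta, example_i, example_j):
    """Dot product of the two per-example loss gradients."""
    g_i = grad_squared_loss(theta, *example_i)
    g_j = grad_squared_loss(theta, *example_j)
    return sum(a * b for a, b in zip(g_i, g_j))


def describe(alignment):
    """Label the sign of the alignment as transfer or interference."""
    if alignment > 0:
        return "transfer"
    if alignment < 0:
        return "interference"
    return "neither"


if __name__ == "__main__":
    theta = [0.0, 0.0]
    same_target = gradient_alignment(theta, ([1.0, 0.0], 1.0), ([1.0, 0.0], 1.0))
    opposite_target = gradient_alignment(theta, ([1.0, 0.0], 1.0), ([1.0, 0.0], -1.0))
    print(describe(same_target), describe(opposite_target))
```

Two examples that share an input but demand opposite outputs pull the parameters in opposite directions, which is the simplest case of interference.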
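Malicious software (malware) classification poses a unique challenge for continual learning (CL) regimes because of the volume of new samples received daily and the evolution of malware to exploit new vulnerabilities. On a typical day, antivirus vendors receive hundreds of thousands of unique pieces of software, both malicious and benign, and over the lifetime of a malware classifier more than a billion samples can easily accumulate. Given the scale of the problem, sequential training using continual learning techniques could provide substantial benefits in reducing training and storage overhead. To date, however, there has been no exploration of CL applied to malware classification tasks. In this paper, we study 11 CL techniques applied to three malware tasks covering common incremental learning scenarios, including task, class, and domain incremental learning (IL). Specifically, using two realistic, large-scale malware datasets, we evaluate the performance of the CL methods on binary malware classification (Domain-IL) and multi-class malware family classification (Task-IL and Class-IL) tasks. To our surprise, continual learning methods significantly underperformed naive joint replay of the training data in nearly all settings, in some cases reducing accuracy by more than 70 percentage points. A simple approach of selectively replaying 20% of the stored data achieves better performance, with 50% of the training time, compared to joint replay. Finally, we discuss potential reasons for the unexpectedly poor performance of the CL techniques, in the hope that this will spur further research on techniques that are more effective in the malware classification domain.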
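Federated learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private. This decentralized approach is prone to suffer the consequences of statistical heterogeneity in the data, both across the different entities and over time, which may lead to a lack of convergence. To avoid such issues, different methods have been proposed in the past few years. However, data may be heterogeneous in many different ways, and current proposals do not always specify the kind of heterogeneity they are considering. In this work, we formally classify statistical heterogeneity in data and review the most notable learning strategies that are able to cope with it. At the same time, we introduce approaches from other machine learning frameworks, such as continual learning, that also deal with data heterogeneity and could easily be adapted to federated learning settings.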
Many real-world learning scenarios face the challenge of slow concept drift, where data distributions change gradually over time. In this setting, we pose the problem of learning temporally sensitive importance weights for training data, in order to optimize predictive accuracy. We propose a class of temporal reweighting functions that can capture multiple timescales of change in the data, as well as instance-specific characteristics. We formulate a bi-level optimization criterion, and an associated meta-learning algorithm, by which these weights can be learned. In particular, our formulation trains an auxiliary network to output weights as a function of training instances, thereby compactly representing the instance weights. We validate our temporal reweighting scheme on a large real-world dataset of 39M images spread over a 9-year period. Our extensive experiments demonstrate the necessity of instance-based temporal reweighting in the dataset, and achieve significant improvements over classical batch-learning approaches. Further, our proposal easily generalizes to a streaming setting and shows significant gains compared to recent continual learning methods.
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks: Tiny Imagenet, the large-scale unbalanced iNaturalist, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
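Despite the significant advances achieved in artificial neural networks (ANNs), their design process remains notoriously tedious, depending primarily on intuition, experience, and trial and error. This human-dependent process is often time-consuming and prone to errors. Furthermore, the models are generally bound to their training contexts, with no consideration of changes in their surrounding environments. Continual adaptability and automation of neural networks are of paramount importance in several domains where model accessibility is limited after deployment (e.g., IoT devices, self-driving vehicles). Additionally, even accessible models require frequent post-deployment maintenance to overcome issues such as concept/data drift, which can be cumbersome and restrictive. The current state of the art on adaptive ANNs is still a premature area of research. Nevertheless, neural architecture search (NAS), a form of AutoML, and continual learning (CL) have recently gained increasing momentum in the deep learning research field, aiming to provide more robust and adaptive ANN development frameworks. This study is the first extensive review of the intersection between AutoML and CL, outlining research directions for the different methods that can facilitate full automation and lifelong plasticity in ANNs.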
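We study how different output layers in a deep neural network learn and forget in continual learning settings. Three factors can cause catastrophic forgetting in the output layer: (1) weight modifications, (2) interference, and (3) projection drift. In this paper, our goal is to provide more insight into how changing the output layer may address (1) and (2). Some potential solutions to these issues are proposed and evaluated in several continual learning scenarios. We show that the best-performing type of output layer depends on the data distribution drifts and/or the amount of data available. In particular, in some cases where a standard linear layer would fail, changing the parameterization is sufficient to achieve significantly better performance, without introducing a continual learning algorithm and instead training the model with standard SGD. Our analysis and results shed light on the dynamics of the output layer in continual learning scenarios and suggest a way of selecting the best type of output layer for a given scenario.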
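A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems, including robustly handling uncertainties, satisfying safety constraints to avoid catastrophic failures, and generalizing during deployment. This study aims to give an overview of these main perspectives of trustworthy reinforcement learning, considering its intrinsic vulnerabilities in robustness, safety, and generalizability. In particular, we give rigorous formulations, categorize the corresponding methodologies, and discuss benchmarks for each perspective. Moreover, we provide an outlook section to spur promising future directions, with a brief discussion of extrinsic vulnerabilities that considers human feedback. We hope this survey can bring separate threads of research together in a unified framework and promote the trustworthiness of reinforcement learning.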