Beyond Supervised Continual Learning: a Review

Benedikt Bagus¹, Alexander Gepperth¹ and Timothée Lesort²

¹ University of Applied Sciences Fulda - Department of Computer Science
Leipzigerstraße 123, 36037 Fulda - Germany

² Montréal University (UdeM), Mila
6666 St-Urbain Street, Montréal QC - Canada

Abstract

Continual Learning (CL, sometimes also termed incremental learning) is a flavor of machine learning where the usual assumption of a stationary data distribution is relaxed or omitted. When, e.g., deep neural networks (DNNs) are naively applied to CL problems, changes in the data distribution can cause the so-called catastrophic forgetting (CF) effect: an abrupt loss of previous knowledge. Although many significant contributions to enabling CL have been made in recent years, most works address supervised (classification) problems. This article reviews literature that studies CL in other settings, such as learning with reduced supervision, fully unsupervised learning, and reinforcement learning. Besides proposing a simple schema for classifying CL approaches w.r.t. their level of autonomy and supervision, we discuss the specific challenges associated with each setting and the potential contributions to the field of CL in general.

1 Introduction

Continual learning is a field of machine learning where the data distribution is not static. It is a natural framework for many practical problems in which data arrive progressively and the model learns continuously. For example, robots need to adapt constantly to their environments in order to interact with them and perform actions, and recommendation systems need to adapt constantly to newly available content and to the changing needs of users. However, in recent years, the field of continual learning has focused mainly on one type of classification scenario: class-incremental learning. This scenario evaluates how well models can learn a class once and remember it when data from new classes arrive. While it is important to solve this problem, relying on a single type of scenario can lead to over-specialized solutions that do not generalize to different settings. In this paper, we review the literature dealing with settings other than the default one (class-incremental) and, more generally, other than fully supervised scenarios. The goal is to shed light on efforts made to diversify the evaluation of continual learning.

We introduce the continual learning framework and the goals of continual learning (Sec 2). Then, we describe the default scenario and its characteristics (Sec 3). Next, we discuss scenarios that go beyond the default scenario in supervised learning (Sec 4), unsupervised learning (Sec 5) and reinforcement learning (Sec 6), before discussing open issues (Sec 7) and concluding (Sec 8).
Disclaimer: This article compares supervised continual learning (CL) with other settings. Each of these settings has appropriate use cases and application fields. Therefore, the goal is not to push for a different kind of CL that is supposedly more "natural" or "realistic", but to point out that other feasible settings for CL exist, with partially overlapping challenges and solutions. Thus, we review existing literature, list commonly made assumptions, and point out remaining challenges specific to non-supervised continual learning. Moreover, benchmark diversity is of high value if different benchmarks are built with the intent to evaluate one particular criterion (of which there are several). Benchmarks or scenarios that are not built for such purposes may contribute less to progress in CL.

2 Framework and goals of continual learning

Continual learning (CL) is a machine learning sub-field that studies learning under time-varying data distributions. This relaxes one of the fundamental assumptions of statistical learning theory [107], which states that the data follow a stationary distribution. This assumption has the advantage of simplicity, whereas CL scenarios are very diverse, depending on the nature of the non-stationarity. In CL, it is therefore crucial to clearly define the scenario, the goals of learning, the evaluation measures and the loss functions. The following subsection describes typical non-stationarities of the data distribution that have been considered in the literature (see also [28, 49]).

2.1 Data distribution drifts

CL under data distribution shifts needs memorization mechanisms adapted to the type of non-stationarity. Since a distribution can be non-stationary in infinitely many ways, the algorithms used must make assumptions about the kind of non-stationarity they face. For simplicity, we list several typical definitions used for supervised learning (e.g., classification) that can be generalized to other settings.

A simple way to categorize non-stationarities is based on class information. In this case, we may distinguish two types of shifts [28]: concept shift, where the annotation of existing data changes [11], and virtual shift, where new data arrive but the annotation does not change. Usually, the term shift is employed for sudden changes in data distributions, whereas the term drift is used for gradual changes over time. In supervised CL, virtual shifts are the most commonly studied non-stationarities. We can distinguish two special cases: virtual concept shift, implying new data and new labels, and domain shift, where new data of known labels are observed. These two settings are also known, respectively, as class-incremental and instance-incremental learning [54, 106].
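The shift types above can be illustrated on a toy labeled stream. The following Python sketch is only an illustration under simplifying assumptions (scalar inputs, integer labels; all function names are hypothetical): it builds class-incremental sub-tasks (virtual concept shift), applies a domain shift and a concept shift to an existing sub-task, and expresses the difference between an abrupt shift and a gradual drift as the probability of sampling from the new distribution.

```python
# Hypothetical sketch: toy versions of the distribution shifts discussed above.


def class_incremental_split(samples, classes_per_task):
    """Virtual concept shift: each sub-task introduces previously unseen classes."""
    labels = sorted({y for _, y in samples})
    groups = [set(labels[i:i + classes_per_task])
              for i in range(0, len(labels), classes_per_task)]
    return [[(x, y) for x, y in samples if y in group] for group in groups]


def domain_shift(samples, transform):
    """Domain shift: known labels, but new input statistics."""
    return [(transform(x), y) for x, y in samples]


def concept_shift(samples, relabel):
    """Concept shift: the annotation of existing inputs changes."""
    return [(x, relabel.get(y, y)) for x, y in samples]


def new_distribution_weight(t, onset, width=0):
    """Probability of drawing a sample from the new distribution at time t.
    width == 0 models an abrupt shift, width > 0 a linear, gradual drift."""
    if width == 0:
        return 1.0 if t >= onset else 0.0
    return min(1.0, max(0.0, (t - onset) / width))


# Usage: 20 scalar samples from 4 classes, split into 2 class-incremental sub-tasks
samples = [(float(i), i % 4) for i in range(20)]
tasks = class_incremental_split(samples, classes_per_task=2)
shifted = domain_shift(tasks[0], lambda x: x + 100.0)     # same labels, new inputs
relabeled = concept_shift(tasks[0], {0: 1, 1: 0})         # same inputs, new labels
```

Only the relabeling case changes the input-to-label mapping; the other two change which inputs are observed.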

The objectives to be optimized may change over time as well [49], as in continual reinforcement learning [69, 41, 7].

2.2 Common CL constraints

If CL were not subject to constraints, there would be a simple solution to any scenario: storing all incoming samples and re-training every time a decision is expected [83]. This entails a time and memory complexity that is at least linear in the number of processed samples.
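A minimal sketch of this unconstrained solution, assuming a nearest-class-mean classifier on scalar inputs as the (hypothetical) model: every sample is stored, and the model is rebuilt from scratch whenever a prediction is requested, so memory and per-decision time both grow linearly with the number of processed samples.

```python
# Hypothetical sketch of the "store everything, re-train on demand" baseline.
from collections import defaultdict


class StoreAllLearner:
    """Unconstrained CL baseline with a nearest-class-mean model (illustrative choice)."""

    def __init__(self):
        self.memory = []  # grows linearly with the number of processed samples

    def observe(self, x, y):
        self.memory.append((x, y))

    def _retrain(self):
        # One full pass over all stored samples: O(n) time per decision
        sums, counts = defaultdict(float), defaultdict(int)
        for x, y in self.memory:
            sums[y] += x
            counts[y] += 1
        return {y: sums[y] / counts[y] for y in sums}

    def predict(self, x):
        means = self._retrain()  # re-training every time a decision is expected
        return min(means, key=lambda y: abs(x - means[y]))


# Usage: sub-task 1 contains class 0, sub-task 2 contains class 1
learner = StoreAllLearner()
for x in (0.0, 1.0, 2.0):
    learner.observe(x, 0)
for x in (10.0, 11.0, 12.0):
    learner.observe(x, 1)
```

Because nothing is ever discarded, class 0 is not forgotten; the price is unbounded memory, which is exactly what the constraints below rule out.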

However, most CL proposals assume that memory is limited in some way, which rules out this (obvious) solution; in particular, many approaches assume that the storage of samples is restricted. Other constraints that have been studied concern computational cost, memory, data privacy, fast adaptation, inference speed and transfer. Constraints more closely related to the reliability of approaches are stability and explainability. A discussion of CL constraints can be found in, e.g., [76], whereas evaluation measures that take these constraints into account are given in [54].

3 The default scenario for CL

Figure 1: The default scenario for CL. The data stream is assumed to be partitioned into sub-tasks defined by data and labels (targets). Data statistics within a sub-task are assumed to be stationary. In addition, sub-task data and labels are assumed to be disjoint, i.e., from different classes. The sub-task onsets are generally assumed to be known as well.

A particular supervised setting, which we will refer to as the default scenario, currently dominates the CL literature. It is based on a classification problem divided into a small number of sub-tasks. Virtual concept shift occurs abruptly at sub-task boundaries through the appearance of samples from previously unseen classes, see Fig 1. Usually, the sub-task onsets/boundaries are known. A consequence of the disjointness of sub-tasks is that no pure concept shift is involved: the annotation of a given data point never changes or becomes conflicting as learning progresses.

In the default scenario, the goal of CL is usually to learn the complete statistics of all sub-tasks as if they had been processed all at once, rather than one after the other.

Depending on the work, samples are either processed one by one, or all samples of a given sub-task are available simultaneously. In some works, the sub-task index is known at test time and is used to select the correct head of a multi-headed DNN for inference.

The assumptions made in the default scenario are justified in many use cases. However, other scenarios, e.g., in robotics, can easily be found where they are too restrictive. Moreover, many characteristics of the default scenario, such as "drift is abrupt" or "tasks are not revisited", lend themselves to benchmark overfitting. As an example, consider sub-task $T_{2}$ in Fig 1: here, a DNN could punish incorrect decisions for class "1" more strongly than incorrect decisions for classes from the current sub-task, since the default scenario assumes that sub-tasks are disjoint. Even if researchers do not consciously exploit these assumptions, the employed CL algorithms may still rely on them indirectly. It is thus fundamental to perform experiments in scenarios where these assumptions do not hold.

Thus, creating diverse benchmarks, as well as approaches that do not critically rely on the assumptions of the default scenario, should be an ongoing effort, pushed forward notably by existing continual learning libraries such as Continuum [19], Avalanche [58] or Sequoia [69].

3.1 CL approaches for the default scenario

This section is not meant to be exhaustive, since the default scenario is not the focus of this article. Please refer to recent reviews [48, 8] for more details on CL methods for the default scenario.

Broad strategies for performing CL in the default scenario are regularization [46, 60], replay [86, 89, 96] and dynamic architectures [90, 23, 109, 72, 65]. Regularization penalizes changes to model parameters that are deemed important for past sub-tasks. This is usually achieved by adding penalty terms to the loss function, and it is implicitly assumed that new sub-tasks add only new data and classes. Dynamic architecture methods extend models over time in order to separate previously learned parameters from currently optimized ones, thus reducing cross-talk and catastrophic forgetting (but equally assuming that new sub-tasks contain only new data). Replay methods store received data for subsequent use in re-training (rehearsal). Instead of relying on stored data, re-training can also be performed using samples produced by generative models (generative replay).
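To make the regularization idea concrete, here is a sketch of a quadratic penalty in the style of EWC [46], $\frac{\lambda}{2}\sum_i F_i(\theta_i-\theta^*_i)^2$, where $\theta^*$ are the parameters after the previous sub-task and $F_i$ is the importance of parameter $i$. How the importances are estimated (EWC uses a diagonal Fisher information approximation) is outside this sketch: they are simply passed in, and plain Python lists stand in for parameter tensors. Function names are hypothetical.

```python
# Hypothetical sketch of an EWC-style quadratic regularizer on plain lists.


def quadratic_penalty(params, anchor_params, importances, lam):
    """lam / 2 * sum_i F_i * (theta_i - theta*_i) ** 2"""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, importances)
    )


def penalty_gradient(params, anchor_params, importances, lam):
    """Gradient of the penalty: pulls important parameters back towards theta*."""
    return [lam * f * (p - a) for p, a, f in zip(params, anchor_params, importances)]


def regularized_loss(task_loss, params, anchor_params, importances, lam):
    """Loss on the current sub-task plus the penalty protecting past sub-tasks."""
    return task_loss + quadratic_penalty(params, anchor_params, importances, lam)


# Usage: parameter 0 matters for the previous sub-task, parameter 1 does not
theta_star = [1.0, -2.0]
importance = [5.0, 0.0]
theta = [1.5, 0.0]
loss = regularized_loss(0.3, theta, theta_star, importance, lam=2.0)
```

Parameter 1 moved much further than parameter 0 but contributes nothing to the penalty, since its importance is zero; this is how such methods leave room for learning the new sub-task.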

Replay is known to be an effective method for preventing catastrophic forgetting (CF), especially in class-incremental settings [51, 48], but also for continual reinforcement learning [36, 103, 89] or unsupervised learning [50, 85, 62].
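Replay requires deciding which samples to keep in a bounded memory. One widely used option, assumed here and not necessarily the scheme of the cited works, is reservoir sampling: once the buffer is full, every sample seen so far remains in it with equal probability capacity / n. The class and function names are hypothetical.

```python
# Hypothetical sketch: bounded rehearsal memory filled by reservoir sampling.
import random


class ReservoirBuffer:
    """Fixed-capacity memory; after n >= capacity insertions, each sample seen
    so far is stored with probability capacity / n."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = self.rng.randrange(self.n_seen)  # uniform index in [0, n_seen)
            if j < self.capacity:
                self.data[j] = sample  # overwrite a randomly chosen stored sample

    def sample(self, k):
        return self.rng.sample(self.data, min(k, len(self.data)))


def replay_batch(current_batch, buffer, k):
    """Rehearsal: mix current sub-task samples with k samples from memory."""
    return list(current_batch) + buffer.sample(k)


buffer = ReservoirBuffer(capacity=5)
for sample in range(100):  # e.g. samples from earlier sub-tasks
    buffer.add(sample)
batch = replay_batch([200, 201], buffer, k=3)
```

Unlike a first-in-first-out memory, the reservoir does not favor recent sub-tasks, which is why it suits settings with many sequential sub-tasks.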

3.2 Metrics and evaluation procedures

In the default scenario, various measures related to the classification error are common; they have been discussed in, e.g., [40, 17, 63]. A common baseline, termed cumulative performance, is obtained by training a model on the data of all sub-tasks at once and evaluating it on the merged test sets of all sub-tasks, which corresponds to learning with stationary statistics. This baseline is often considered an upper bound for CL performance.

In addition, [60] proposed the notions of forward and backward transfer: forward transfer (FT) measures how training on sub-task $i$ impacts performance on a future sub-task $j > i$. For backward transfer (BT), the impact on previous sub-tasks $j < i$ is considered. The common case in CL is negative BT, indicating forgetting, but positive BT is theoretically possible as well.
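These transfer measures are usually computed from an accuracy matrix. The sketch below assumes that `R[i][j]` holds the test accuracy on sub-task `j` after training on sub-task `i` (0-based indices) and that `b[j]` is the accuracy on sub-task `j` of a model before any training, the reference used for forward transfer in [60]. The numbers in the usage example are made up for illustration.

```python
# Hypothetical sketch: average accuracy, backward and forward transfer from an
# accuracy matrix R, where R[i][j] = accuracy on sub-task j after training on i.


def average_accuracy(R):
    """Mean accuracy over all sub-tasks after training on the last sub-task."""
    T = len(R)
    return sum(R[T - 1]) / T


def backward_transfer(R):
    """Mean change of accuracy on earlier sub-tasks; negative means forgetting."""
    T = len(R)
    return sum(R[T - 1][i] - R[i][i] for i in range(T - 1)) / (T - 1)


def forward_transfer(R, b):
    """Mean accuracy on sub-task i just before training on it, relative to b[i]."""
    T = len(R)
    return sum(R[i - 1][i] - b[i] for i in range(1, T)) / (T - 1)


R = [
    [0.9, 0.1, 0.2],
    [0.6, 0.8, 0.3],
    [0.5, 0.7, 0.9],
]
b = [0.1, 0.1, 0.1]
acc, bwt, fwt = average_accuracy(R), backward_transfer(R), forward_transfer(R, b)
```

For this hypothetical matrix, the average accuracy is about 0.7, the backward transfer about -0.25 (forgetting) and the forward transfer about 0.1.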

Many authors, e.g., [46], assume that (although sub-tasks are presented sequentially) all sub-task data are available for model selection and hyper-parameter tuning. For example, tuning EWC’s regularization strengths $λ_{i}$ for each sub-task is often done in hindsight.

Some authors [95, 18, 21], especially in works using multi-head DNNs, assume that the sub-task ID is known during testing, although this does not seem to be the current consensus. In the limit where each sub-task contains only a single class, providing the sub-task ID at test time means providing the class label. Even if sub-tasks are more diverse, the sub-task ID contains significant information and may thus confer unfair advantages. The question of evaluation protocols in CL is discussed in [76, 78].
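A small sketch illustrates why a known sub-task ID can be an advantage. A hypothetical multi-head classifier with linear heads over shared features chooses among all classes when no ID is given, but only among the classes of the selected head otherwise. The feature function, weights and names are illustrative assumptions.

```python
# Hypothetical sketch: multi-head inference with and without a sub-task ID.


class MultiHeadClassifier:
    """Shared features plus one linear output head per sub-task (illustrative)."""

    def __init__(self, feature_fn, heads):
        self.feature_fn = feature_fn  # shared trunk
        self.heads = heads            # task_id -> {class_label: weight vector}

    @staticmethod
    def _scores(head, feats):
        return {c: sum(w * f for w, f in zip(weights, feats))
                for c, weights in head.items()}

    def predict(self, x, task_id=None):
        feats = self.feature_fn(x)
        if task_id is not None:
            # Sub-task ID known: only the classes of one head can be predicted
            scores = self._scores(self.heads[task_id], feats)
        else:
            # No sub-task ID: the classes of all heads compete
            scores = {}
            for head in self.heads.values():
                scores.update(self._scores(head, feats))
        return max(scores, key=scores.get)


model = MultiHeadClassifier(
    feature_fn=lambda x: [x, 1.0],
    heads={0: {0: [-1.0, 0.0], 1: [1.0, 0.0]},
           1: {2: [2.0, -5.0], 3: [-2.0, 5.0]}},
)
# Suppose x = 3.0 is a sample of class 2, which belongs to sub-task 1
```

Without the ID, the sample is assigned to class 1 from the other sub-task; with the ID, the decision is restricted to {2, 3} and becomes correct, although the model itself is unchanged.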

3.3 Benchmarks

Benchmarks for the default CL scenario are mostly derived from datasets such as MNIST, CIFAR10/100, ImageNet or SVHN to create class-incremental or domain-incremental scenarios. The permuted MNIST benchmark, where successive sub-tasks are created by permuting all pixels according to a sub-task-specific permutation, was initially popular [46, 59, 113] but is less so now, because it can be solved to good accuracy even without dedicated CL schemes [76]. Some authors have used Atari games [67], MuJoCo [102] or Meta-World [118] as benchmarks. CL-specific variants of standard benchmarks, e.g., colored MNIST [43], are widely used as well, since they can be used to investigate specific aspects of CL, see [25, 56].
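The construction of permuted MNIST sub-tasks can be sketched as follows, with images represented as flat lists of 784 pixel values. Keeping the first sub-task unpermuted is a common convention assumed here, and the function names are hypothetical. Since labels are unchanged, every permutation produces a domain shift in the sense of Sec 2.1.

```python
# Hypothetical sketch of a permuted-MNIST-style benchmark on flat pixel lists.
import random


def make_permutations(n_pixels, n_tasks, seed=0):
    """One fixed pixel permutation per sub-task; sub-task 0 uses the identity."""
    rng = random.Random(seed)
    perms = [list(range(n_pixels))]
    for _ in range(n_tasks - 1):
        perm = list(range(n_pixels))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def permute_image(flat_image, perm):
    """Apply a sub-task's permutation; the label of the image is left unchanged."""
    return [flat_image[k] for k in perm]


def make_permuted_tasks(dataset, n_tasks, seed=0):
    """dataset: list of (flat_image, label) pairs -> list of sub-task datasets."""
    n_pixels = len(dataset[0][0])
    return [[(permute_image(img, perm), y) for img, y in dataset]
            for perm in make_permutations(n_pixels, n_tasks, seed)]


# Usage with two dummy 28x28 "images"
dataset = [([i % 256 for i in range(784)], 3), ([0] * 784, 7)]
tasks = make_permuted_tasks(dataset, n_tasks=3)
```

Pixel statistics per position change across sub-tasks while the class structure stays intact, which helps explain why generic models handle this benchmark fairly well.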

4 Generalizations of the default scenario

The default scenario is convenient for evaluation and represents a rather controlled setting for CL. In less controlled settings, fully annotated data may not be available, or supplementary constraints may be imposed. We find it convenient to introduce a new taxonomy of CL approaches based on their level of autonomy.

4.1 Classifying the autonomy level of CL algorithms

The various applications of continual learning can be classified into autonomy levels, as is done for autonomous vehicles [45]. Intuitively, CL should become harder as less human supervision is supplied. We identify two dimensions of autonomy, both of which are discussed in depth below to better characterize generalized supervised CL approaches.

Objective Autonomy denotes the autonomy regarding the objective to achieve (labels, targets, rewards), which we group into 4 levels:

Level 0: Full data annotation: supervised.
Level 1: Sparse labeling: RL, active learning, sparse training.
Level 2: No annotation for training, query for fast adaptation.
Level 3: No annotation for training, zero-shot adaptation.

For objective autonomy levels 2 and 3, continual training can be seen as pretraining for an unknown future objective task. Note that if the objective of the scenario is itself unsupervised, it can be treated like full data annotation (level 0), since the loss can be computed without any external annotation.

Continual Learning Autonomy is concerned with autonomy regarding the distribution shifts (task label, task boundaries):

Level 0: Full task annotation at train and test.
Level 1: Full task annotation at train, no test task labels.
Level 2: Sparse task annotation at train, no test task labels: e.g., task boundaries only, without train task labels.
Level 3: No task labels at all: task-agnostic.

This classification still holds for smooth transitions (concept drift). Note that for class-incremental problems, the task-agnostic setting does not make sense, since the task information is contained in the class labels. We can now characterize CL contributions by a pair ($i$, $j$), where $i$ represents the objective autonomy level and $j$ the CL autonomy level. Hence, the pair (0, 1) describes a class-incremental scenario, i.e., a classification setting with fully annotated data and task labels for training but not for test data. It is important to note that these ratings classify the complexity of a given scenario, not of approaches. For example, the default scenario (class-incremental) has a complexity level of (0, 1), domain-incremental learning without task labels would be (0, 2), and task-agnostic continual reinforcement learning [10] (1, 3). To validate the autonomy of approaches, they should be evaluated on adequate scenarios.
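The two autonomy dimensions can be encoded compactly. The sketch below is our own reading of the level lists above (the enum and function names are hypothetical): the objective level is given directly, while the CL autonomy level is derived from which task information is available at train and test time. It reproduces pairs such as (0, 1) for class-incremental learning.

```python
# Hypothetical sketch encoding the (objective, CL) autonomy pair of a scenario.
from enum import IntEnum


class ObjectiveAutonomy(IntEnum):
    FULL_ANNOTATION = 0       # supervised
    SPARSE_LABELS = 1         # RL, active learning, sparse training
    QUERY_FOR_ADAPTATION = 2  # no training annotation, queries for fast adaptation
    ZERO_SHOT = 3             # no training annotation, zero-shot adaptation


def cl_autonomy(train_task_labels, train_boundaries, test_task_labels):
    """CL autonomy level from the task information available to the learner."""
    if train_task_labels and test_task_labels:
        return 0  # full task annotation at train and test
    if train_task_labels:
        return 1  # full task annotation at train, no test task labels
    if train_boundaries:
        return 2  # only task boundaries at train, no task labels
    return 3      # task-agnostic


def autonomy_pair(objective, train_task_labels, train_boundaries, test_task_labels):
    return (int(objective),
            cl_autonomy(train_task_labels, train_boundaries, test_task_labels))


class_incremental = autonomy_pair(ObjectiveAutonomy.FULL_ANNOTATION, True, True, False)
domain_incremental_no_labels = autonomy_pair(
    ObjectiveAutonomy.FULL_ANNOTATION, False, True, False)
task_agnostic_crl = autonomy_pair(ObjectiveAutonomy.SPARSE_LABELS, False, False, False)
```

Describing a scenario by its available information, rather than by a name, makes it harder to silently rely on task labels that the targeted autonomy level does not provide.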

Investigating the cost of lowering or increasing a task's complexity level is fundamental for applications of CL. We want our algorithms to scale up to arbitrary complexity levels, but in practice, we would always choose the lowest possible complexity. If permitted by a given application, solving a scenario at complexity level (0, 1) is obviously more efficient than solving the same scenario at level (3, 3).

4.2 Towards a generalization of the default setting

Some variants of supervised CL alleviate the need for annotated data. The reduction of annotations can concern access to task labels, as in task-agnostic CL [119, 30], or the availability of data labels, as in continual active learning [68, 74] or semi-supervised continual learning [98]. As described in Sec 4.1, reducing access to task supervision leads to a CL autonomy level of 2, and removing it entirely leads to a CL autonomy level of 3. On the other hand, reducing data annotation leads to an objective autonomy level of 1, instead of the level 0 that corresponds to full annotation.

Among potential supplementary constraints, data can be streamed without the possibility of multi-epoch training, as in online training [14, 93], the data can be imbalanced [44, 15], or they can be mixed with spurious features [55]. Scenarios where the annotations change over time (real concept drift) have been investigated in [49, 2, 11]. Some contributions [20] relax the condition of disjoint sub-tasks and assess the impact on several fundamental CL strategies such as regularization and replay. Yet others [75, 77] demonstrate that detecting sub-task boundaries autonomously is generally feasible using density estimation methods.
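As a hedged illustration of density-based boundary detection (not the specific method of [75, 77]), the sketch below fits a one-dimensional Gaussian to the first samples of the current sub-task. It reports a boundary once several consecutive samples have a log-likelihood far below the fitted data's average, then re-fits the density on the new sub-task. The parameters `fit_size`, `margin` and `patience` are arbitrary assumptions, and the function names are hypothetical.

```python
# Hypothetical sketch: unsupervised sub-task boundary detection via a density model.
import math
from statistics import mean, pstdev


def gaussian_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def detect_boundaries(stream, fit_size=20, margin=5.0, patience=5):
    """Return the indices at which a new sub-task is assumed to start."""
    boundaries, start = [], 0
    while start + fit_size < len(stream):
        # Fit the density model to the beginning of the current sub-task
        ref = stream[start:start + fit_size]
        mu, sigma = mean(ref), max(pstdev(ref), 1e-6)
        ref_ll = mean(gaussian_logpdf(x, mu, sigma) for x in ref)
        run, found = 0, None
        for t in range(start + fit_size, len(stream)):
            if gaussian_logpdf(stream[t], mu, sigma) < ref_ll - margin:
                run += 1
                if run == patience:
                    found = t - patience + 1  # first sample of the unlikely run
                    break
            else:
                run = 0  # isolated outliers do not trigger a boundary
        if found is None:
            break
        boundaries.append(found)
        start = found  # re-fit on the new sub-task
    return boundaries


# Three abrupt shifts of the mean, each sub-task with 60 samples
pattern = [0.1 * ((i * 7) % 5 - 2) for i in range(60)]
stream = pattern + [10.0 + v for v in pattern] + [-10.0 + v for v in pattern]
boundaries = detect_boundaries(stream)
```

Requiring a run of `patience` unlikely samples trades detection delay against robustness to single outliers; gradual drifts would need a different criterion.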

Supplementary constraints (as discussed in Sec 2.2) may be added to the generalized settings cited above, making them even harder. While this brings CL closer to real-life applications, it requires solutions that do not overfit to a particular CL setting. Currently, the field of CL is fundamentally meta: implicitly, the goal is not to train the best possible model w.r.t. non-continual baselines, but rather to create algorithms that generalize maximally to other CL settings. Therefore, experimenting with generalized supervised scenarios makes it possible to assess the robustness of algorithms and to make CL methods more general.

5 Unsupervised continual learning

Whether a machine learning task is considered supervised or not depends on the formulation of the loss function. In fact, the definition of CL given in Sec 1 makes no assumptions whatsoever concerning the loss. Therefore, CL carries over naturally to unsupervised machine learning, typical examples of which are density modeling, clustering, generative learning and unsupervised representation learning.

5.1 Density modeling

Density modeling aims at approximating the probability density of a given set of data samples directly, by minimizing a negative log-likelihood loss. Typically, this is achieved using mixture models, which model the data density as a weighted sum of $N$ parameterized component densities, e.g., multivariate Gaussian densities or Dirichlet distributions. Among other functions, density modeling allows performing Bayesian inference and sampling. These functionalities spawned increased interest in mixture modeling a few years ago, particularly in robotics [80, 97, 81, 47]. The main issue for CL in mixture modeling is that concept drift or shift may require an adaptation of $N$, which motivates heuristics for adding and removing components. Approaches using generative replay have been proposed in, e.g., [85]. Due to their intrinsic reliance on distances instead of scalar products, mixture models usually adapt only a small subset of components in each update step. This is why they are less prone to catastrophic forgetting than DNNs, an effect that has been demonstrated in [26] for self-organizing maps, which are an approximation to Gaussian Mixture Models (GMMs), see [27]. Modeling the data density allows partitioning data space into Voronoi cells, in each of which a separate local linear model can be trained. This is the essence of the popular Locally Weighted Projection Regression (LWPR) algorithm [110], which was explicitly constructed for incremental (continual) regression in robotics.
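The locality of updates and a heuristic for adapting $N$ can be sketched with a hypothetical one-dimensional online mixture. Each sample updates only the closest component (distance measured in standard deviations) using running mean and variance estimates, and a new component is created when no existing component lies within `novelty_sigmas` standard deviations. This winner-take-all update is a simplification of full EM-style responsibilities and is an assumption of this sketch.

```python
# Hypothetical sketch: online 1-D mixture with local updates and component growth.
import math


class IncrementalMixture1D:
    """Winner-take-all online mixture with a novelty rule that adds components."""

    def __init__(self, novelty_sigmas=3.0, init_var=1.0, min_var=1e-3):
        self.novelty_sigmas = novelty_sigmas
        self.init_var = init_var
        self.min_var = min_var
        self.means, self.vars, self.counts = [], [], []

    def _closest(self, x):
        # Components are selected by normalized distance, not by scalar products
        dists = [abs(x - m) / math.sqrt(v) for m, v in zip(self.means, self.vars)]
        k = min(range(len(dists)), key=dists.__getitem__)
        return k, dists[k]

    def update(self, x):
        """Process one sample; return the index of the component it updated."""
        if self.means:
            k, dist = self._closest(x)
            if dist <= self.novelty_sigmas:
                # Only the winning component changes; all others stay intact
                self.counts[k] += 1
                n = self.counts[k]
                delta = x - self.means[k]
                self.means[k] += delta / n
                new_var = self.vars[k] + (delta * (x - self.means[k]) - self.vars[k]) / n
                self.vars[k] = max(new_var, self.min_var)
                return k
        # Novelty: no component explains x well enough, so N grows by one
        self.means.append(x)
        self.vars.append(self.init_var)
        self.counts.append(1)
        return len(self.means) - 1

    def weights(self):
        total = sum(self.counts)
        return [c / total for c in self.counts]


pattern = [0.1 * ((i * 7) % 5 - 2) for i in range(50)]
mixture = IncrementalMixture1D()
for x in pattern + [10.0 + v for v in pattern]:  # abrupt shift after 50 samples
    mixture.update(x)
```

After the shift, the first component keeps its statistics because no sample of the second sub-task is assigned to it; removing stale components would require a further heuristic.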

5.2 Clustering

Clustering is, in a certain sense, an approximation to density modeling, although inference is limited to determining which component a given data sample was most likely generated from. Clustering methods are commonly trained using a k-means type of algorithm, which approximates gradient descent on a loss function that, in turn, approximates a GMM log-likelihood. CL for clustering algorithms faces the same basic issue as density modeling: a potentially variable number $N$ of cluster centers during concept drift or shift. This issue has been addressed in, e.g., [79, 1, 6].
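A sequential k-means sketch makes the link to gradient descent explicit. Moving the closest center towards the sample with step size 1/n_k is a stochastic gradient step on the quantization error, and it keeps each center equal to the mean of its assigned samples. The distance threshold `spawn_dist` for opening new clusters is a hypothetical heuristic for a variable $N$, in the spirit of, but not identical to, the incremental methods cited above.

```python
# Hypothetical sketch: sequential (MacQueen-style) k-means with cluster spawning.
import math


def online_kmeans(stream, spawn_dist):
    """Sequential k-means with a distance threshold that opens new clusters."""
    centers, counts = [], []
    for x in stream:
        if centers:
            dists = [math.dist(x, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] <= spawn_dist:
                counts[k] += 1
                lr = 1.0 / counts[k]  # SGD step on 0.5 * ||x - c_k||^2
                centers[k] = tuple(c + lr * (xi - c) for c, xi in zip(centers[k], x))
                continue
        # No center close enough (or no center yet): N grows by one
        centers.append(tuple(x))
        counts.append(1)
    return centers, counts


cluster_a = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)] * 5
cluster_b = [(5.1, 5.0), (4.9, 5.0), (5.0, 5.1), (5.0, 4.9)] * 5
centers, counts = online_kmeans(cluster_a + cluster_b, spawn_dist=1.0)  # shift after 20
```

The choice of `spawn_dist` plays the role of the component-addition heuristics mentioned for mixture models: too small fragments clusters, too large merges distinct ones.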

5.3 Generative learning

Generative learning aims to generate realistic samples (typically images) that are similar to a set of training data. Typical models are generative adversarial networks (GANs), variational auto-encoders (VAEs), PixelCNN and normalizing flows such as Glow, but many other variants have been proposed, see, e.g., [105] for a review. Training such generative models in, e.g., the default CL scenario introduced in Sec 3 (without the supervision information) leads to catastrophic forgetting (CF) unless additional measures are taken. Several of the approaches used in supervised learning have been successfully applied to training generative models: knowledge distillation [94], EWC [120] and replay [111, 12, 52, 84, 50]. To our knowledge, no generic approaches specific to generative learning have been proposed, apart, perhaps, from [108], which proposes to learn specific transformations. This, however, is very specific to a particular kind of (image) data and would have to be adapted if other kinds of data were targeted.
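Replay for generative models can be sketched with a deliberately simple stand-in generator: one Gaussian per label fitted to one-dimensional data. When a new sub-task arrives, pseudo-samples of earlier classes are drawn from the previous generator and merged with the new data before re-fitting, so no real data from earlier sub-tasks is stored. A real system would use a GAN or VAE; the stand-in model and all names here are hypothetical.

```python
# Hypothetical sketch of generative replay with a per-label Gaussian "generator".
import random
from statistics import mean, pstdev


def fit_generator(samples):
    """Stand-in generative model: one 1-D Gaussian (mean, std) per label."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: (mean(xs), pstdev(xs)) for y, xs in by_label.items()}


def generate(generator, n_per_label, rng):
    """Draw labeled pseudo-samples from the current generator."""
    return [(rng.gauss(mu, sigma), y)
            for y, (mu, sigma) in generator.items()
            for _ in range(n_per_label)]


def train_with_generative_replay(tasks, n_replay=200, seed=0):
    """Earlier sub-tasks are represented only by pseudo-samples of the previous
    generator, never by stored real data."""
    rng = random.Random(seed)
    generator = {}
    for task_data in tasks:
        replay = generate(generator, n_replay, rng)  # empty for the first sub-task
        generator = fit_generator(list(task_data) + replay)
    return generator


offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]
task1 = [(0.0 + d, 0) for d in offsets] + [(5.0 + d, 1) for d in offsets]
task2 = [(10.0 + d, 2) for d in offsets] + [(15.0 + d, 3) for d in offsets]
generator = train_with_generative_replay([task1, task2])
```

Fitting on `task2` alone would drop classes 0 and 1 entirely; with replay, the final generator still models them, at the cost of approximation errors that can accumulate over many sub-tasks.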

5.4 Continual representation learning

Unsupervised training for learning representations for downstream applications is a common use case for unsupervised learning. It was one of the motivations to develop various types of auto-encoders and generative models in the early days of deep learning. In CL, using an unsupervised criterion to learn representations might be useful to avoid representations that overfit a specific task and, at the same time, improve performance on downstream tasks [24, 85, 62]. Unsupervised pre-training can also be useful for learning a general feature extractor that can be frozen for future tasks [104, 71, 9].

5.5 Challenges of unsupervised CL

Unsupervised learning offers general learning criteria that can avoid the over-specialization of supervised training and reduce forgetting. Nevertheless, unsupervised CL faces the same challenges as supervised CL, and the default scenario of supervised CL can be transferred to it. Moreover, in practice, unsupervised training tends to be more complex than supervised training, especially for generation and density modeling, since it is harder to model a distribution than to determine a separating hypersurface between classes in data space. Combined with the added complexity of CL, unsupervised learning can become a formidable problem, especially w.r.t. model and hyper-parameter selection.

6 Continual reinforcement learning

In reinforcement learning (RL), an agent learns to interact with its environment by choosing an action for each state based on a reward signal. The (unknown) underlying process is formalized as a Markov decision process (MDP), where an optimal policy maximizes the expected cumulative reward. This scenario is inherently a CL setting, since the distribution of the observed data depends on the current policy: the evolution of the policy throughout the learning process mechanically makes the data distribution non-stationary. Hence, RL requires the ability to cope with non-stationary data. However, supplementary non-stationarity, for example in the environment or in the objective to fulfill, can increase the training difficulty and leads to a continual setting. We use the term Continual Reinforcement Learning (CRL) for RL in settings that go beyond the assumptions of non-stationarity usually made in conventional RL.
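The claim that the observed data distribution changes even in a static environment can be checked with a small simulation. In this sketch, all parameters are illustrative assumptions and the names are hypothetical. A fixed greedy action is followed with ε-greedy exploration under a linearly decaying ε, and the share of greedy actions is measured in consecutive windows. The action distribution drifts solely because exploration decays.

```python
# Hypothetical sketch: non-stationary action statistics caused by exploration decay.
import random


def epsilon_schedule(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Linearly decaying exploration rate."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def greedy_action_shares(greedy_action, n_actions, steps, window, seed=0):
    """Share of greedy actions per window under a FIXED greedy policy in a FIXED
    environment: only the exploration rate changes over time."""
    rng = random.Random(seed)
    shares, hits = [], 0
    for step in range(steps):
        if rng.random() < epsilon_schedule(step):
            action = rng.randrange(n_actions)  # exploratory action
        else:
            action = greedy_action
        hits += int(action == greedy_action)
        if (step + 1) % window == 0:
            shares.append(hits / window)
            hits = 0
    return shares


shares = greedy_action_shares(greedy_action=0, n_actions=4, steps=2000, window=500)
```

In expectation, the greedy share rises from roughly 0.4 in the first window to about 0.96 once ε has reached its final value, so any learner trained on these samples sees a shifting distribution.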

6.1 Existing approaches

The works presented in [88, 101] highlighted the importance of CL at an early stage and investigated it especially in the context of reinforcement learning. More recent works, e.g., [115, 89], revisit this area and consider additional aspects such as catastrophic forgetting. Frameworks to guide future research have also been published [53, 41]; both provide a comprehensive overview of the synergies between continual and reinforcement learning.

RL Approaches Experience replay [121, 89, 22] is the most common approach to counter non-stationarities in RL, and several variants have been proposed, e.g., [91, 4, 33, 70, 31].
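The sketch below shows the sliding-window (FIFO) buffer commonly used for experience replay in RL, with hypothetical names. It highlights a CRL-specific pitfall: once more transitions than `capacity` have been collected in a new sub-task, the buffer no longer contains any transition from earlier sub-tasks, so replay alone no longer protects them.

```python
# Hypothetical sketch: FIFO experience replay and its blind spot for old sub-tasks.
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class FIFOReplay:
    """Sliding-window experience replay: the oldest transitions are dropped first."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniform minibatch from the transitions currently in the buffer."""
        return self.rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


# Sub-task A yields states 0..999, sub-task B states 1000..1999
replay = FIFOReplay(capacity=500)
for s in range(2000):
    replay.push(s, 0, 0.0, s + 1, False)
batch = replay.sample(32)
```

After sub-task B, every replayed minibatch consists of B transitions only, which is why CRL work often turns to other memory-management schemes or complementary methods.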

Continuous control, multi-task and multi-goal RL are also research topics intersecting with continual reinforcement learning, but their scenarios are not always defined consistently in the literature. In general, the goal is to enable transfer learning between policies, which, however, leaves out the capacity for forgetting or re-adaptation. Some works assume a static objective [3, 100, 99, 87, 92], others a static agent and/or environment [123, 117, 29], or neither [116, 35, 39]. In contrast, multi-agent reinforcement learning mostly concerns some form of joint training and is hence not related to CL.

For CRL, the agent needs to acquire new skills to handle time-varying conditions, such as changes in the environment [73], observations or actions, and must additionally retain old knowledge. A variety of approaches has been published, among which knowledge distillation [36, 103] and context-based decompositions [64, 122] are popular. Other works focus on the employed model [37, 38, 57, 32], off-policy algorithms [114], policy gradients [66] or a task-agnostic perspective [10]. Known CL methods (e.g., GEM, A-GEM and replay) have also been evaluated in the RL domain [5, 7].

Benchmarks An overview of CRL environments can be found in [42]. Dedicated benchmarks that allow a systematic assessment are Meta-World [118], Continual World [112] and L2Explorer [34].

Libraries Some libraries aim at unifying CRL development to improve comparability and accelerate progress: Sequoia [69], Avalanche RL [61], SaLinA [16], Reverb [13] and CORA [82].

6.2 Assumptions in CRL

Three assumptions are commonly made in CRL. First, a decomposition into sub-tasks is assumed, even if their onsets are unknown, since most dedicated CL methods (see Sec 3.1) assume the existence of distinct sub-tasks. Second, samples are assumed to be non-contradictory within sub-tasks, meaning that the assessment of rewards changes only between sub-tasks, if it changes at all. Finally, it is commonly assumed that knowledge of sub-task boundaries is provided: most existing works use information about sub-task boundaries as if it were provided by an oracle, without the possibility of recognizing or determining them autonomously.

6.3 Challenges

In CRL, various types of drifts/shifts can appear:

Environment-related The agent observes its environment successively. Therefore, on a short timescale, observations will always be non-stationary, even if the environment itself is stationary. In addition, the environment itself can change over time, or abrupt modifications can occur (environment shift). This results in novel states or novel transitions between states, effectively enlarging the MDP involved.

Goal-related By maximizing the reward signal, the agent attains a defined objective. If the reward function changes, the agent receives conflicting information, which leads to an inconsistent policy. In this setting, the definition of states, actions and transitions does not change, so the underlying MDP remains structurally intact. However, different rewards are assigned to previously learned state-action mappings, which changes the optimal policy and, consequently, the distribution of observed transitions.

Agent-related The decreasing influence of exploration, regardless of whether off-policy methods such as Q-learning or on-policy methods such as policy gradient are used, temporarily creates a source of non-stationarity: the state-action space is sampled in a time-varying manner even with a static policy and a static environment. Additionally, sensors or actuators may easily degrade or undergo deliberate manipulation. Changes affecting the possible actions have immediate effects on the MDP, while a changed perception of states also impacts the transitions.

Sub-tasks and data acquisition For scenarios where the environment changes in a discrete fashion, we can introduce the notion of sub-tasks as in the default scenario for supervised CL, see Sec 3. A general challenge stems from the fact that samples are acquired as an online time series without any balancing guarantees. Moreover, similar states and actions may appear in various sub-tasks with different assigned rewards, so sub-tasks are usually not disjoint and may even be contradictory, requiring un- or re-learning, a concept absent from the default supervised CL scenario. Depending on the type of non-stationarity, sub-task onsets can be unknown, and detecting boundaries may be difficult if drifts are gradual rather than abrupt. In addition, the number of sub-tasks can be significantly higher than in supervised scenarios, up to a point where the entire concept of sub-tasks becomes questionable. Lastly, actions must be explicitly performed to transition to the appropriate subsequent state. Therefore, generative or offline sampling is of limited use, at least for exploration.
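The goal-related drift and the resulting need for re-learning can be made concrete with a worked toy example. The example uses a five-state chain with deterministic left/right moves, solved by tabular value iteration under two reward functions. The transitions are identical in both cases, yet the optimal policy flips from always moving right to always moving left, so an agent trained on the first reward must un-learn its behavior. The chain, the rewards, the discount factor and all names are illustrative assumptions.

```python
# Hypothetical sketch: a reward change flips the optimal policy of an unchanged MDP.
ACTIONS = (0, 1)  # 0: move left, 1: move right
N_STATES = 5


def chain_transition(s, a):
    """Deterministic chain dynamics, identical for both goals."""
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)


def value_iteration(transition, reward, gamma=0.9, iters=200):
    """Tabular value iteration; returns state values and the greedy policy."""
    def q(s, a, V):
        s_next = transition(s, a)
        return reward(s, a, s_next) + gamma * V[s_next]

    V = [0.0] * N_STATES
    for _ in range(iters):
        V = [max(q(s, a, V) for a in ACTIONS) for s in range(N_STATES)]
    policy = [max(ACTIONS, key=lambda a: q(s, a, V)) for s in range(N_STATES)]
    return V, policy


def reward_goal_right(s, a, s_next):
    return 1.0 if s_next == N_STATES - 1 else 0.0


def reward_goal_left(s, a, s_next):
    return 1.0 if s_next == 0 else 0.0


_, policy_before = value_iteration(chain_transition, reward_goal_right)
_, policy_after = value_iteration(chain_transition, reward_goal_left)
```

Both sub-tasks share every state and action, so they are anything but disjoint: the same (state, action) pairs receive contradictory values.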

7 Discussion

The field of CL has expanded rapidly in recent years, which is why many aspects of CL are still fluid and not subject to a common consensus among researchers. This is evidenced by the wide variety of assumptions, evaluation metrics (see Sec 3.2) and constraints (see Sec 2.2). The so-called default scenario (see Sec 3) is the nearest thing to a commonly agreed scenario, yet many details vary strongly between contributions. This leads to several interesting consequences and opportunities for further research:

CL comparability A direct consequence is that results of different articles are difficult to compare directly. This underscores the need, in CL more than in other domains of machine learning, to describe evaluation procedures precisely and, where possible, to make use of existing libraries (see Sec 3 and 6.1) and evaluation procedures. Furthermore, as stated in Sec 2.2, CL is a multi-objective problem in which achieving the cumulative baseline is important, but other measures (see Sec 3.2) matter as well.

CL autonomy As explained in Sec 4.1, CL approaches should also be evaluated based on the complexity and autonomy of the scenario they can generalize to, to prevent them from overfitting to a specific CL scenario or assumptions.

CL scalability An aspect that is often neglected in current works in favor of quantitative performance measures is scalability. Depending on the application context, CL, even in the default scenario, may be faced with a huge number of sub-tasks, each containing enormous amounts of samples. If this were not the case, the cumulative baseline, or equivalently some variant of GDumb (see Sec 3.2), would be a much less costly and better-performing alternative to dedicated CL methods. Therefore, the time and memory complexity for a large number of sub-tasks should be reported in all new works on CL algorithms to ensure comparability, at least in this respect.

CL generalization As shown in Sec 4, 5 and 6, the default CL scenario of Sec 3 can be generalized in many ways. Moreover, these sections show that many open issues, both technical and conceptual, remain when attempting to generalize CL.

8 Conclusion

This review article attempts to give an overview of the current state of CL beyond the purely supervised default scenario (see Sec 3). We describe various ways of making the default scenario more complex, as well as different learning paradigms, and propose a classification based on the autonomy characteristics of algorithms. We believe that attempts to generalize CL raise important questions about the fundamental assumptions behind CL. We thus encourage CL researchers to reflect carefully upon the implicit, hidden assumptions of each CL approach they deal with, and on whether these can (and should) be relaxed. In a still-fluid field such as CL, continuously re-examining assumptions may lead to new solutions that strongly contribute to the advancement of the field.

References

[1] B. Aaron, D. E. Tamir, N. D. Rishe, and A. Kandel (2014) Dynamic incremental k-means clustering. In 2014 International Conference on Computational Science and Computational Intelligence, Vol. 1, pp. 308–313.
[2] M. Abdelsalam, M. Faramarzi, S. Sodhani, and S. Chandar (2021) IIRC: incremental implicitly-refined classification. In CVPR, pp. 11038–11047.
[3] H. B. Ammar, E. Eaton, P. Ruvolo, and M. Taylor (2014) Online multi-task learning for policy gradient methods. In International Conference on Machine Learning, pp. 1206–1214.
[4] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, O. Pieter Abbeel, and W. Zaremba (2017) Hindsight experience replay. Advances in Neural Information Processing Systems 30.
[5] C. Atkinson, B. McCane, L. Szymanski, and A. Robins (2021) Pseudo-rehearsal: achieving deep reinforcement learning without catastrophic forgetting. Neurocomputing 428, pp. 291–307.
[6] A. M. Bagirov, J. Ugon, and D. Webb (2011) Fast modified global k-means algorithm for incremental cluster construction. Pattern Recognition 44 (4), pp. 866–876.
[7] B. Bagus and A. Gepperth (2022) A study of continual learning methods for Q-learning. In International Joint Conference on Neural Networks (IJCNN).
[8] E. Belouadah, A. Popescu, and I. Kanellos (2021) A comprehensive study of class incremental learning algorithms for visual tasks. Neural Networks 135, pp. 38–54.
[9] L. Caccia and J. Pineau (2021) SPeCiaL: self-supervised pretraining for continual learning. In IJCAI Workshop on Continual Semi-Supervised Learning.
[10] M. Caccia, J. Mueller, T. Kim, L. Charlin, and R. Fakoor (2022) Task-agnostic continual reinforcement learning: in praise of a simple baseline. arXiv preprint arXiv:2205.14495.
[11] M. Caccia, P. Rodriguez, O. Ostapenko, F. Normandin, M. Lin, L. Caccia, I. Laradji, I. Rish, A. Lacoste, D. Vazquez, and L. Charlin (2020) Online fast adaptation and knowledge accumulation: a new approach to continual learning. In NeurIPS.
[12] D. Campo, G. Slavic, M. Baydoun, L. Marcenaro, and C. Regazzoni (2020) Continual learning of predictive models in video sequences via variational autoencoders. In 2020 IEEE International Conference on Image Processing (ICIP), pp. 753–757.
[13] A. Cassirer, G. Barth-Maron, E. Brevdo, S. Ramos, T. Boyd, T. Sottiaux, and M. Kroiss (2021) Reverb: a framework for experience replay. arXiv preprint arXiv:2102.04736.
[14] A. Chaudhry, M. Ranzato, M. Rohrbach, and M. Elhoseiny (2019) Efficient lifelong learning with A-GEM. In International Conference on Learning Representations (ICLR).
[15] A. Chrysakis and M. Moens (2020) Online continual learning from imbalanced data. In International Conference on Machine Learning, pp. 1952–1961.
[16] L. Denoyer, A. de la Fuente, S. Duong, J. Gaya, P. Kamienny, and D. H. Thompson (2021) SaLinA: sequential learning of agents. arXiv preprint arXiv:2110.07910.
[17] N. Díaz-Rodríguez, V. Lomonaco, D. Filliat, and D. Maltoni (2018) Don’t forget, there is more than forgetting: new metrics for continual learning. arXiv preprint arXiv:1810.13166.
[18] T. Doan, M. Abbana Bennani, B. Mazoure, G. Rabusseau, and P. Alquier (2021) A theoretical analysis of catastrophic forgetting through the NTK overlap matrix. In International Conference on Artificial Intelligence and Statistics, pp. 1072–1080.
[19] A. Douillard and T. Lesort (2021) Continuum: simple management of complex continual learning scenarios. arXiv preprint arXiv:2102.06253.
[20] N. Dzemidovich and A. Gepperth (2022) An empirical comparison of generators in replay-based continual learning. In European Symposium on Artificial Neural Networks (ESANN).
[21] M. Farajtabar, N. Azizan, A. Mott, and A. Li (2020) Orthogonal gradient descent for continual learning. In International Conference on Artificial Intelligence and Statistics, pp. 3762–3773.
[22] W. Fedus, P. Ramachandran, R. Agarwal, Y. Bengio, H. Larochelle, M. Rowland, and W. Dabney (2020) Revisiting fundamentals of experience replay. In International Conference on Machine Learning, pp. 3061–3071.
[23] C. Fernando, D. Banarse, C. Blundell, Y. Zwols, D. Ha, A. A. Rusu, A. Pritzel, and D. Wierstra (2017) PathNet: evolution channels gradient descent in super neural networks. CoRR abs/1701.08734.
[24] E. Fini, V. G. T. da Costa, X. Alameda-Pineda, E. Ricci, K. Alahari, and J. Mairal (2022) Self-supervised models are continual learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9621–9630.
[25] I. Gat, I. Schwartz, A. Schwing, and T. Hazan (2020) Removing bias in multi-modal classifiers: regularization by maximizing functional entropies. Advances in Neural Information Processing Systems 33, pp. 3197–3208.
[26] A. Gepperth and B. Pfülb (2020) A rigorous link between self-organizing maps and Gaussian mixture models. In International Conference on Artificial Neural Networks (ICANN).
[27] A. Gepperth (2019) Incremental learning with a homeostatic self-organizing neural architecture. Neural Computing and Applications.
[28] A. Gepperth and B. Hammer (2016) Incremental learning algorithms and applications. In European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium.
[29] A. Gupta, J. Yu, T. Z. Zhao, V. Kumar, A. Rovinsky, K. Xu, T. Devlin, and S. Levine (2021) Reset-free reinforcement learning via multi-task learning: learning dexterous manipulation behaviors without human intervention. In IEEE International Conference on Robotics and Automation (ICRA), pp. 6664–6671.
[30] X. He, J. Sygnowski, A. Galashov, A. A. Rusu, Y. W. Teh, and R. Pascanu (2019) Task agnostic continual learning via meta learning. arXiv preprint arXiv:1906.05201.
[31] H. Hu, J. Ye, G. Zhu, Z. Ren, and C. Zhang (2021) Generalizable episodic memory for deep reinforcement learning. arXiv preprint arXiv:2103.06469.
[32] Y. Huang, K. Xie, H. Bharadhwaj, and F. Shkurti (2021) Continual model-based reinforcement learning with hypernetworks. In IEEE International Conference on Robotics and Automation (ICRA), pp. 799–805.
[33] D. Isele and A. Cosgun (2018) Selective experience replay for lifelong learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
[34] E. C. Johnson, E. Q. Nguyen, B. Schreurs, C. S. Ewulum, C. Ashcraft, N. M. Fendley, M. M. Baker, A. New, and G. K. Vallabha (2022) L2Explorer: a lifelong reinforcement learning assessment environment. arXiv preprint arXiv:2203.07454.
[35] D. Kalashnikov, J. Varley, Y. Chebotar, B. Swanson, R. Jonschkowski, C. Finn, S. Levine, and K. Hausman (2021) MT-opt: continuous multi-task robotic reinforcement learning at scale. arXiv preprint arXiv:2104.08212.
[36] R. T. Kalifou, H. Caselles-Dupré, T. Lesort, T. Sun, N. Diaz-Rodriguez, and D. Filliat (2019) Continual reinforcement learning deployed in real-life using policy distillation and sim2real transfer. In ICML Workshop on Multi-Task and Lifelong Learning, Vol. 4.
[37] C. Kaplanis, M. Shanahan, and C. Clopath (2018) Continual reinforcement learning with complex synapses. In International Conference on Machine Learning, pp. 2497–2506.
[38] C. Kaplanis, M. Shanahan, and C. Clopath (2019) Policy consolidation for continual reinforcement learning. arXiv preprint arXiv:1902.00255.
[39] S. Kelly, T. Voegerl, W. Banzhaf, and C. Gondro (2021) Evolving hierarchical memory-prediction machines in multi-task reinforcement learning. Genetic Programming and Evolvable Machines 22 (4), pp. 573–605.
[40] R. Kemker, M. McClure, A. Abitino, T. Hayes, and C. Kanan (2018) Measuring catastrophic forgetting in neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
[41] K. Khetarpal, M. Riemer, I. Rish, and D. Precup (2020) Towards continual reinforcement learning: a review and perspectives. arXiv preprint arXiv:2012.13490.
[42] K. Khetarpal, S. Sodhani, S. Chandar, and D. Precup (2018) Environments for lifelong reinforcement learning. arXiv preprint arXiv:1811.10732.
[43] B. Kim, H. Kim, K. Kim, S. Kim, and J. Kim (2019) Learning not to learn: training deep neural networks with biased data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9012–9020.
[44] C. D. Kim, J. Jeong, and G. Kim (2020) Imbalanced continual learning with partitioning reservoir sampling. In Proceedings of the IEEE European Conference on Computer Vision (ECCV).
[45] B. R. Kiran, I. Sobh, V. Talpaert, P. Mannion, A. A. Al Sallab, S. Yogamani, and P. Pérez (2021) Deep reinforcement learning for autonomous driving: a survey. IEEE Transactions on Intelligent Transportation Systems.
[46] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al. (2017) Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences.
[47] M. Kristan, D. Skocaj, and A. Leonardis (2008) Incremental learning with Gaussian mixture models. In Computer Vision Winter Workshop, pp. 25–32.
[48] M. D. Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars (2019) Continual learning: a comparative study on how to defy forgetting in classification tasks. arXiv preprint arXiv:1909.08383.
[49] T. Lesort, M. Caccia, and I. Rish (2021) Understanding continual learning settings with data distribution drift analysis. arXiv preprint arXiv:2104.01678.
[50] T. Lesort, H. Caselles-Dupré, M. Garcia-Ortiz, J. Goudou, and D. Filliat (2019) Generative models from the perspective of continual learning. In International Joint Conference on Neural Networks (IJCNN).
[51] T. Lesort, T. George, and I. Rish (2021) Continual learning in deep networks: an analysis of the last layer. arXiv preprint arXiv:2106.01834.
[52] T. Lesort, A. Gepperth, A. Stoian, and D. Filliat (2019) Marginal replay vs conditional replay for continual learning. In International Conference on Artificial Neural Networks, pp. 466–480.
[53] T. Lesort, V. Lomonaco, A. Stoian, D. Maltoni, D. Filliat, and N. Díaz-Rodríguez (2019) Continual learning for robotics: definition, framework, learning strategies, opportunities and challenges. Information Fusion 58, pp. 52–68.
[54] T. Lesort, V. Lomonaco, A. Stoian, D. Maltoni, D. Filliat, and N. Díaz-Rodríguez (2020) Continual learning for robotics: definition, framework, learning strategies, opportunities and challenges. Information Fusion 58, pp. 52–68.
[55] T. Lesort (2022) Continual feature selection: spurious features in continual learning. arXiv preprint arXiv:2203.01012.
[56] Y. Li and N. Vasconcelos (2019) REPAIR: removing representation bias by dataset resampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9572–9581.
[57] Y. L. Lo and S. Ghiassian (2019) Overcoming catastrophic interference in online reinforcement learning with dynamic self-organizing maps. arXiv preprint arXiv:1910.13213.
[58] V. Lomonaco, L. Pellegrini, A. Cossu, G. Graffieti, and A. Carta (2021) Avalanche: an end-to-end library for continual learning. GitHub repository.
[59] D. Lopez-Paz and M. Ranzato (2017) Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems.
[60] D. Lopez-Paz and M. Ranzato (2017) Gradient episodic memory for continual learning. Advances in Neural Information Processing Systems 30.
[61] N. Lucchesi, A. Carta, V. Lomonaco, and D. Bacciu (2022) Avalanche RL: a continual reinforcement learning library. In International Conference on Image Analysis and Processing, pp. 524–535.
[62] D. Madaan, J. Yoon, Y. Li, Y. Liu, and S. J. Hwang (2021) Representational continuity for unsupervised continual learning. In International Conference on Learning Representations (ICLR).
[63] D. Maltoni and V. Lomonaco (2019) Continuous learning in single-incremental-task scenarios. Neural Networks 116, pp. 56–73.
[64] J. A. Mendez, H. van Seijen, and E. Eaton (2021) Modular lifelong reinforcement learning via neural composition. In International Conference on Learning Representations.
[65] J. A. Mendez, H. van Seijen, and E. Eaton (2022) Modular lifelong reinforcement learning via neural composition. In International Conference on Learning Representations (ICLR).
[66] J. Mendez, B. Wang, and E. Eaton (2020) Lifelong policy gradient learning of factored policies for faster training without forgetting. Advances in Neural Information Processing Systems 33, pp. 14398–14409.
[67] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller (2013) Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602. NIPS Deep Learning Workshop 2013.
[68] M. Mundt, Y. W. Hong, I. Pliushch, and V. Ramesh (2020) A wholistic view of continual learning with deep neural networks: forgotten lessons and the bridge to active and open world learning. arXiv preprint arXiv:2009.01797.
[69] F. Normandin, F. Golemo, O. Ostapenko, P. Rodriguez, M. D. Riemer, J. Hurtado, K. Khetarpal, D. Zhao, R. Lindeborg, T. Lesort, et al. (2021) Sequoia: a software framework to unify continual learning research. arXiv preprint arXiv:2108.01005.
[70] G. Novati and P. Koumoutsakos (2019) Remember and forget for experience replay. In International Conference on Machine Learning, pp. 4851–4860.
[71] O. Ostapenko, T. Lesort, P. Rodríguez, M. R. Arefin, A. Douillard, I. Rish, and L. Charlin (2022) Foundational models for continual learning: an empirical study of latent replay. arXiv preprint.
[72] O. Ostapenko, P. Rodriguez, M. Caccia, and L. Charlin (2021) Continual learning via local module composition. In Advances in Neural Information Processing Systems.
[73] S. Padakandla (2021) A survey of reinforcement learning algorithms for dynamically varying environments. ACM Computing Surveys (CSUR) 54 (6), pp. 1–25.
[74] M. Perkonigg, J. Hofmanninger, and G. Langs (2021) Continual active learning for efficient adaptation of machine learning models to changing image acquisition. In International Conference on Information Processing in Medical Imaging, pp. 649–660.
[75] B. Pfülb, B. Bagus, and A. Gepperth (2021) Continual learning with fully probabilistic models. In CVPR Workshop CLVISION, Findings paper.
[76] B. Pfülb and A. Gepperth (2019) A comprehensive, application-oriented study of catastrophic forgetting in DNNs. In International Conference on Learning Representations (ICLR).
[77] B. Pfülb and A. Gepperth (2021) Overcoming catastrophic forgetting with Gaussian mixture replay. In International Joint Conference on Neural Networks (IJCNN).
[78] B. Pfülb (2022) Continual learning with deep learning methods in an application-oriented context. Ph.D. Thesis, University of Applied Sciences Fulda, Germany.
[79] D. T. Pham, S. S. Dimov, and C. Nguyen (2004) An incremental k-means algorithm. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 218 (7), pp. 783–795.
[80] R. C. Pinto and P. M. Engel (2015) A fast incremental Gaussian mixture model. PLoS ONE 10 (10), pp. e0139931.
[81] D. Pokrajac, A. Lazarevic, and L. J. Latecki (2007) Incremental local outlier detection for data streams. In 2007 IEEE Symposium on Computational Intelligence and Data Mining, pp. 504–515.
[82] S. Powers, E. Xing, E. Kolve, R. Mottaghi, and A. Gupta (2021) CORA: benchmarks, baselines, and metrics as a platform for continual reinforcement learning agents. arXiv preprint arXiv:2110.10067.
[83] A. Prabhu, P. H. Torr, and P. K. Dokania (2020) GDumb: a simple approach that questions our progress in continual learning. In European Conference on Computer Vision, pp. 524–540.
[84] J. Ramapuram, M. Gregorova, and A. Kalousis (2017) Lifelong generative modeling. arXiv preprint arXiv:1705.09847.
[85] D. Rao, F. Visin, A. A. Rusu, Y. W. Teh, R. Pascanu, and R. Hadsell (2019) Continual unsupervised representation learning. arXiv preprint arXiv:1910.14481.
[86] S. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert (2017) iCaRL: incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010.
[87] J. Ribeiro, F. S. Melo, and J. Dias (2019) Multi-task learning and catastrophic forgetting in continual reinforcement learning. arXiv preprint arXiv:1909.10008.
[88] M. B. Ring (1994) Continual learning in reinforcement environments. Ph.D. Thesis, University of Texas at Austin, Austin, Texas 78712.
[89] D. Rolnick, A. Ahuja, J. Schwarz, T. Lillicrap, and G. Wayne (2019) Experience replay for continual learning. Advances in Neural Information Processing Systems 32.
[90] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell (2016) Progressive neural networks. arXiv preprint arXiv:1606.04671.
[91] T. Schaul, J. Quan, I. Antonoglou, and D. Silver (2015) Prioritized experience replay. arXiv preprint arXiv:1511.05952.
[92] R. Schiewer and L. Wiskott (2021) Modular networks prevent catastrophic interference in model-based multi-task reinforcement learning. In International Conference on Machine Learning, Optimization, and Data Science, pp. 299–313.
[93] J. Schwarz, J. Luketina, W. M. Czarnecki, A. Grabska-Barwinska, Y. W. Teh, R. Pascanu, and R. Hadsell (2018) Progress & compress: a scalable framework for continual learning. In ICML.
[94] A. Seff, A. Beatson, D. Suo, and H. Liu (2017) Continual learning in generative adversarial nets. arXiv preprint arXiv:1705.08395.
[95] J. Serra, D. Suris, M. Miron, and A. Karatzoglou (2018) Overcoming catastrophic forgetting with hard attention to the task. ICML 80, pp. 4548–4557.
[96] H. Shin, J. K. Lee, J. Kim, and J. Kim (2017) Continual learning with deep generative replay. In Advances in Neural Information Processing Systems, pp. 2990–2999.
[97] K. Shmelkov, C. Schmid, and K. Alahari (2017) Incremental learning of object detectors without catastrophic forgetting. CoRR abs/1708.06977.
[98] J. Smith, J. Balloch, Y. Hsu, and Z. Kira (2021) Memory-efficient semi-supervised continual learning: the world is its own replay buffer. In 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
[99] A. Y. Sorokin and M. S. Burtsev (2019) Continual and multi-task reinforcement learning with shared episodic memory. arXiv preprint arXiv:1905.02662.
[100] Y. Teh, V. Bapst, W. M. Czarnecki, J. Quan, J. Kirkpatrick, R. Hadsell, N. Heess, and R. Pascanu (2017) Distral: robust multitask reinforcement learning. Advances in Neural Information Processing Systems 30.
[101] S. Thrun and T. M. Mitchell (1995) Lifelong robot learning. Robotics and Autonomous Systems 15 (1-2), pp. 25–46.
[102] E. Todorov, T. Erez, and Y. Tassa (2012) MuJoCo: a physics engine for model-based control. In IROS, pp. 5026–5033.
[103] R. Traoré, H. Caselles-Dupré, T. Lesort, T. Sun, G. Cai, N. Díaz-Rodríguez, and D. Filliat (2019) DisCoRL: continual reinforcement learning via policy distillation. arXiv preprint arXiv:1907.05855.
[104] R. Traoré, H. Caselles-Dupré, T. Lesort, T. Sun, G. Cai, N. D. Rodríguez, and D. Filliat (2019) DisCoRL: continual reinforcement learning via policy distillation. CoRR abs/1907.05855.
[105] C. G. Turhan and H. S. Bilge (2018) Recent trends in deep generative models: a review. In 2018 3rd International Conference on Computer Science and Engineering (UBMK), pp. 574–579.
[106] G. M. van de Ven and A. S. Tolias (2019) Three scenarios for continual learning. arXiv preprint arXiv:1904.07734.
[107] V. Vapnik (1999) The nature of statistical learning theory. Springer Science & Business Media.
[108] S. Varshney, V. K. Verma, P. Srijith, L. Carin, and P. Rai (2021) CAM-GAN: continual adaptation modules for generative adversarial networks. In Thirty-Fifth Conference on Neural Information Processing Systems.
[109] T. Veniat, L. Denoyer, and M. Ranzato (2021) Efficient continual learning with modular networks and task-driven priors. In International Conference on Learning Representations.
[110] S. Vijayakumar and S. Schaal (2000) Locally weighted projection regression: an O(n) algorithm for incremental real time learning in high dimensional space. In International Conference on Machine Learning (ICML), Vol. 1, pp. 288–293.
[111] F. Wiewel and B. Yang (2019) Continual learning for anomaly detection with variational autoencoder. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3837–3841.
[112] M. Wołczyk, M. Zajac, R. Pascanu, L. Kucinski, and P. Miłoś (2021) Continual world: a robotic benchmark for continual reinforcement learning. In Advances in Neural Information Processing Systems, Vol. 34.
[113] M. Wortsman, V. Ramanujan, R. Liu, A. Kembhavi, M. Rastegari, J. Yosinski, and A. Farhadi (2020) Supermasks in superposition. In Advances in Neural Information Processing Systems.
[114] A. Xie, J. Harrison, and C. Finn (2020) Deep reinforcement learning amidst lifelong non-stationarity. arXiv preprint arXiv:2006.10701.
[115] J. Xu and Z. Zhu (2018) Reinforced continual learning. Advances in Neural Information Processing Systems 31.
[116] Z. Xu, K. Wu, Z. Che, J. Tang, and J. Ye (2020) Knowledge transfer in multi-task deep reinforcement learning for continuous control. Advances in Neural Information Processing Systems 33, pp. 15146–15155.
[117] R. Yang, H. Xu, Y. Wu, and X. Wang (2020) Multi-task reinforcement learning with soft modularization. Advances in Neural Information Processing Systems 33, pp. 4767–4777.
[118] T. Yu, D. Quillen, Z. He, R. Julian, K. Hausman, C. Finn, and S. Levine (2020) Meta-world: a benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning, pp. 1094–1100.
[119] C. Zeno, I. Golan, E. Hoffer, and D. Soudry (2018) Task agnostic continual learning using online variational Bayes. arXiv preprint arXiv:1803.10123.
[120] M. Zhai, L. Chen, F. Tung, J. He, M. Nawhal, and G. Mori (2019) Lifelong GAN: continual learning for conditional image generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2759–2768.
[121] S. Zhang and R. S. Sutton (2017) A deeper look at experience replay. arXiv preprint arXiv:1712.01275.
[122] T. Zhang, X. Wang, B. Liang, and B. Yuan (2022) Catastrophic interference in reinforcement learning: a solution based on context division and knowledge distillation. IEEE Transactions on Neural Networks and Learning Systems.
[123] R. Zhao, X. Sun, and V. Tresp (2019) Maximum entropy-regularized multi-goal reinforcement learning. In International Conference on Machine Learning, pp. 7553–7562.