Continual learning (CL) learns a sequence of tasks incrementally. There are two popular CL settings, class incremental learning (CIL) and task incremental learning (TIL). A major challenge of CL is catastrophic forgetting (CF). While a number of techniques are already available to effectively overcome CF for TIL, CIL remains highly challenging. So far, little theoretical study has been done to provide principled guidance on how to solve the CIL problem. This paper performs such a study. It first shows that, probabilistically, the CIL problem can be decomposed into two sub-problems: Within-task Prediction (WP) and Task-id Prediction (TP). It further proves that TP is correlated with out-of-distribution (OOD) detection, which connects CIL and OOD detection. The key conclusion of this study is that, regardless of whether WP and TP or OOD detection are defined explicitly or implicitly by a CIL algorithm, good WP and good TP or OOD detection are necessary and sufficient for good CIL performance. Additionally, TIL is simply WP. Based on the theoretical result, new CIL methods are also designed, which outperform strong baselines in both CIL and TIL settings by a large margin.
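The WP/TP decomposition can be made concrete with a small sketch (names and shapes are illustrative, not the paper's implementation): a CIL prediction follows from multiplying each task's within-task class probabilities by the predicted task probability, since P(class j of task k | x) = P(j | x, task k) · P(task k | x).

```python
import math

def softmax(logits):
    m = max(logits)
    es = [math.exp(v - m) for v in logits]
    s = sum(es)
    return [e / s for e in es]

def cil_predict(wp_logits_per_task, tp_logits):
    """Combine within-task prediction (WP) and task-id prediction (TP):
        P(class j of task k | x) = P(j | x, task k) * P(task k | x).
    Good WP (per-task softmax) and good TP (task softmax) therefore
    directly yield a good CIL prediction over all classes seen so far."""
    task_probs = softmax(tp_logits)
    scores = []  # flattened over (task, within-task class) pairs
    for k, wp_logits in enumerate(wp_logits_per_task):
        for p in softmax(wp_logits):
            scores.append(p * task_probs[k])
    return scores

# Two tasks with two classes each; TP strongly favors task 0.
scores = cil_predict([[2.0, 0.0], [0.0, 1.0]], tp_logits=[3.0, 0.0])
predicted_class = max(range(len(scores)), key=scores.__getitem__)
```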
Achieving both higher sample efficiency and superior final performance is one of the major challenges of deep reinforcement learning (DRL). Previous work could address one of these challenges but typically failed to address both at once. In this paper, we attempt to tackle the two challenges simultaneously. To achieve this, we first decompose them into two classic RL problems: data richness and the exploration-exploitation trade-off. We then cast both problems as a training-data-distribution optimization problem, namely obtaining the desired training data within limited interactions, and solve them concurrently via i) explicit modeling and control of the capacity and diversity of the behavior policies, and ii) fine-grained and adaptive control of the selective/sampling distribution over the behavior policies using a monotonic data-distribution optimization. Finally, we integrate this process into Generalized Policy Iteration (GPI) and obtain a more general framework called Generalized Data Distribution Iteration (GDI). We use the GDI framework to introduce operator-based versions of well-known RL methods from DQN to Agent57, and summarize the theoretical guarantees of GDI's advantage over GPI. We also demonstrate state-of-the-art (SOTA) performance on the Arcade Learning Environment (ALE), where our algorithm achieves a 9620.33% mean human normalized score (HNS), a 1146.39% median HNS, and surpasses 22 human world records using only 200M training frames. Our performance is comparable to Agent57's while consuming 500 times less data. We argue that there is still a long way to go before truly superhuman agents are obtained in ALE.
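As a toy illustration of the GDI idea (the bandit setup and all names are our own sketch, not the paper's algorithm), the snippet below augments a GPI-style loop with an explicit data-distribution step that adapts how training data are sampled:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def gdi_iteration(values, reward, lr=0.5):
    """One iteration of a GDI-style loop on a toy two-armed bandit.
    GPI alternates policy evaluation and policy improvement; GDI adds
    an explicit optimization of the data (sampling) distribution used
    to collect training data. Here that distribution is a softmax over
    the current value estimates, and the evaluation step updates each
    arm at a rate proportional to how often it is sampled."""
    weights = softmax(values)                    # data-distribution step
    values = [v + lr * w * (reward(a) - v)       # evaluation step
              for a, (v, w) in enumerate(zip(values, weights))]
    return values, weights

# Arm 1 pays 1.0, arm 0 pays 0.2; sampling shifts toward the better arm,
# so the better arm's value estimate converges faster.
values = [0.0, 0.0]
for _ in range(50):
    values, weights = gdi_iteration(values, lambda a: 1.0 if a == 1 else 0.2)
```

GPI is recovered as the special case in which the data-distribution step is held fixed instead of being optimized.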
Deep Q-Networks (DQN), which pioneered the combination of deep learning (DL) and reinforcement learning (RL), noticed that the distribution of the acquired data changes during the training process. DQN found that this property could cause instability in training, and proposed effective methods to handle its downsides. Rather than focusing on the unfavorable aspects, we find that it is crucial for RL to ease the gap between the estimated data distribution and the ground-truth data distribution, something supervised learning (SL) fails to do. From this new perspective, we extend the basic paradigm of RL, Generalized Policy Iteration (GPI), into a broader version called Generalized Data Distribution Iteration (GDI). We show that numerous RL algorithms and techniques can be unified into the GDI paradigm, each of which can be regarded as a special case of GDI. We provide theoretical proofs of why GDI is better than GPI and how it works. Several practical algorithms based on GDI have been proposed to verify its effectiveness and generality. Empirical experiments demonstrate our state-of-the-art (SOTA) performance on the Arcade Learning Environment (ALE), where our algorithm achieves a 9620.98% mean human normalized score (HNS), a 1146.39% median HNS, and 22 human world record breakthroughs (HWRB) using only 200M training frames. Our work aims to lead RL research on a journey toward conquering human world records and seeking truly superhuman agents in both performance and efficiency.
We study the problem of model-free reinforcement learning, which is often solved following the principle of Generalized Policy Iteration (GPI). Although GPI is, in general, an interplay between policy evaluation and policy improvement, most conventional model-free methods assume independence between the granularity and other details of the two GPI steps, despite the inherent connections between them. In this paper, we present a method that regularizes the inconsistency between policy evaluation and policy improvement, leading to conflict-averse GPI solutions with reduced function-approximation error. To this end, we formulate a novel learning paradigm in which taking the policy evaluation step is equivalent to a form of compensation for performing policy improvement, thereby effectively alleviating the gradient conflict between the two GPI steps. We also show that the form of our proposed solution is equivalent to performing entropy-regularized policy improvement, which prevents the policy from being trapped in suboptimal solutions. We conduct extensive experiments to evaluate our method on the Arcade Learning Environment (ALE). Empirical results show that our method outperforms several strong baselines on the major evaluation domains.
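The entropy-regularized improvement step mentioned above has a well-known closed form, sketched below (a generic textbook form, not the paper's exact update): maximizing ⟨π, Q⟩ + τ·H(π) over the probability simplex yields the Boltzmann policy π(a) ∝ exp(Q(a)/τ).

```python
import math

def entropy_regularized_improvement(q_values, tau=1.0):
    """Closed-form maximizer of  <pi, Q> + tau * H(pi)  over the simplex:
    the Boltzmann policy pi(a) proportional to exp(Q(a) / tau). The
    entropy bonus keeps every action probability strictly positive,
    which is what keeps the policy from collapsing onto a suboptimal
    greedy action."""
    m = max(q_values)  # subtract max for numerical stability
    es = [math.exp((q - m) / tau) for q in q_values]
    s = sum(es)
    return [e / s for e in es]

pi = entropy_regularized_improvement([1.0, 2.0, 0.0], tau=0.5)
```

As τ → 0 this recovers the greedy improvement step of plain GPI; larger τ spreads probability mass more evenly across actions.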
Weakly-supervised object localization aims to indicate both the category and the spatial extent of an object in an image given only image-level labels. Most existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map so as to perceive the whole object, yet they ignore the co-occurrence confounder of object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Besides, the use of CAM also brings a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurring context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object features, we further propose a multi-teacher causal distillation framework to balance the absorption of classification and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and in resolving the dilemma between classification and localization performance.
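For readers unfamiliar with CAM itself, the standard map is just a classifier-weighted sum of the last convolutional layer's feature maps (a minimal sketch of vanilla CAM, not of KD-CI-CAM):

```python
def class_activation_map(feature_maps, class_weights):
    """Standard CAM: for class c,  CAM_c(x, y) = sum_k w_k^c * f_k(x, y),
    a weighted sum of the K feature maps of the last conv layer using the
    classifier weights of class c. High values mark the image regions the
    classifier relied on, which is exactly why co-occurring context
    (e.g., water around fish) can leak into the localization map."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * fmap[i][j]
    return cam

# Two 2x2 feature maps; the first fires on the object region (top-left),
# the second on a co-occurring context region (bottom-right).
fmaps = [[[1.0, 0.0], [0.0, 0.0]],
         [[0.0, 0.0], [0.0, 1.0]]]
cam = class_activation_map(fmaps, class_weights=[2.0, 0.5])
```

Note that the context map still contributes a nonzero activation; de-biasing that contribution is what the causal intervention above targets.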
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of these datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationships between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer-learning performance on novel tasks. The CLIP embedding design enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned ones.
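The extensibility argument can be sketched as follows (shapes and the scoring rule are illustrative; the actual model conditions a convolutional head on the embedding rather than taking a plain dot product): each class is represented by a frozen text embedding, per-voxel features are scored against every class embedding, and adding a novel class only appends one more embedding instead of new output channels.

```python
def clip_driven_masks(voxel_features, class_text_embeddings):
    """Sketch of CLIP-driven universal segmentation: one binary-mask
    logit per voxel and per class, obtained by scoring each voxel
    feature against that class's (frozen) text embedding. Extending to
    a new class adds a dictionary entry, leaving existing classes'
    predictions untouched -- hence no catastrophic forgetting of the
    previously learned classes."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return {name: [dot(f, emb) for f in voxel_features]
            for name, emb in class_text_embeddings.items()}

feats = [[1.0, 0.0], [0.0, 1.0]]            # two voxels, 2-d features
embs = {"liver": [1.0, 0.0], "tumor": [0.0, 1.0]}
logits = clip_driven_masks(feats, embs)

# Extending to a novel class is just one more (hypothetical) entry:
embs["spleen"] = [0.5, 0.5]
extended = clip_driven_masks(feats, embs)
```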
In this work, we tackle two vital tasks in automated driving systems, i.e., driver intent prediction and risk object identification from egocentric images. In particular, we investigate the question: what constitutes a good road scene-level representation for these two tasks? We contend that a scene-level representation must capture higher-level semantic and geometric properties of the traffic scene around the ego-vehicle as it acts toward its destination. To this end, we introduce the representation of semantic regions, which are areas the ego-vehicle visits while taking an afforded action (e.g., a left turn at a 4-way intersection). We propose to learn scene-level representations via a novel semantic region prediction task and an automatic semantic region labeling algorithm. Extensive evaluations are conducted on the HDD and nuScenes datasets, and the learned representations lead to state-of-the-art performance for driver intention prediction and risk object identification.
New-architecture GPUs such as the A100 are equipped with Multi-Instance GPU (MIG) technology, which allows a GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but utilizing it efficiently can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study of MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines benchmark studies of MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU-sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to employ MIG effectively, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released at https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
There are multiple scales of abstraction at which we can describe the same image, depending on whether we focus on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to read out information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
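The multi-task evaluation pattern above can be sketched generically (task names, head shapes, and the linear form are illustrative, not the benchmark's actual heads): one frozen, e.g. self-supervised, representation of an image is read out by several independent heads, one per downstream task.

```python
def readout(z, head_weights):
    """One frozen representation z of an image, read out by several
    independent linear heads -- one per downstream task (e.g. brain-
    region classification, microstructure segmentation). Evaluating
    many tasks from the same z is what probes whether a representation
    captures multiple attributes of a single image."""
    return {task: [sum(wi * zi for wi, zi in zip(row, z)) for row in W]
            for task, W in head_weights.items()}

z = [1.0, 2.0]                                   # shared representation
heads = {"region": [[1.0, 0.0], [0.0, 1.0]],     # 2-way region logits
         "segmentation": [[0.5, 0.5]]}           # 1 microstructure logit
out = readout(z, heads)
```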
Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods build networks from several modules such as CNNs, LSTMs, and attention. Recent methods combine the Transformer with these modules for better performance. However, training a network composed of mixed modules requires tedious optimization skills, making these methods inconvenient to use in practice. In this paper, we propose to design \emph{pure Transformer-based networks} for deep RL, aiming to provide off-the-shelf backbones for both the online and offline settings. Specifically, we propose the Transformer in Transformer (TIT) backbone, which cascades two Transformers in a very natural way: the inner one processes a single observation, while the outer one processes the observation history; combining the two is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT consistently achieves satisfactory performance across different settings.
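The cascade can be sketched with a stripped-down attention block standing in for a full Transformer (identity projections, no feed-forward or residual layers; all names and shapes are illustrative, not the paper's architecture):

```python
import math

def attention(tokens):
    """Minimal single-head self-attention with identity Q/K/V
    projections -- a stand-in for a full Transformer block."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [dot(q, k) / math.sqrt(d) for k in tokens]
        m = max(scores)  # stabilize the softmax
        ws = [math.exp(s - m) for s in scores]
        z = sum(ws)
        ws = [w / z for w in ws]
        out.append([sum(w * v[i] for w, v in zip(ws, tokens))
                    for i in range(d)])
    return out

def tit_backbone(observation_history):
    """TIT cascade: the inner Transformer processes the tokens of each
    single observation; its mean-pooled output becomes one token of the
    outer Transformer, which runs over the observation history to give
    a spatial-temporal representation for the decision-making head."""
    d = len(observation_history[0][0])
    history_tokens = []
    for obs_tokens in observation_history:
        inner = attention(obs_tokens)             # inner: within one obs
        pooled = [sum(t[i] for t in inner) / len(inner) for i in range(d)]
        history_tokens.append(pooled)
    return attention(history_tokens)              # outer: across history

# A 2-step history; each observation is 2 tokens of dimension 2.
rep = tit_backbone([[[1.0, 0.0], [0.0, 1.0]],
                    [[1.0, 1.0], [1.0, 1.0]]])
```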