Deep transfer learning (DTL) represents a long-term quest to enable deep neural networks (DNNs) to reuse historical experience as efficiently as humans do. This ability is called knowledge transferability. A commonly used paradigm for DTL is to first learn general knowledge (pre-training) and then reuse it (fine-tuning) for a specific target task. There are two consensus views on the transferability of pre-trained DNNs: (1) a larger domain gap between pre-training and downstream data brings lower transferability; (2) transferability gradually decreases from lower layers (near the input) to higher layers (near the output). However, these views were drawn mainly from experiments on natural images, which limits their scope of application. This work aims to study and complement them from a broader perspective by proposing a method to measure the transferability of pre-trained DNN parameters. Our experiments on twelve diverse image classification datasets reach conclusions similar to the previous consensus. More importantly, two new findings are presented: (1) in addition to the domain gap, a larger data amount and high dataset diversity in the downstream target task also inhibit transferability; (2) although the lower layers learn basic image features, they are usually not the most transferable layers because of their domain sensitivity.
Well-annotated medical datasets enable deep neural networks (DNNs) to gain strong power in extracting lesion-related features. Building such large and well-designed medical datasets is costly due to the need for high-level expertise. Model pre-training based on ImageNet is a common practice to gain better generalization when the data amount is limited. However, it suffers from the domain gap between natural and medical images. In this work, we pre-train DNNs on ultrasound (US) domains instead of ImageNet to reduce the domain gap in medical US applications. To learn US image representations based on unlabeled US videos, we propose a novel meta-learning-based contrastive learning method, namely Meta Ultrasound Contrastive Learning (Meta-USCL). To tackle the key challenge of obtaining semantically consistent sample pairs for contrastive learning, we present a positive pair generation module along with an automatic sample weighting module based on meta-learning. Experimental results on multiple computer-aided diagnosis (CAD) problems, including pneumonia detection, breast cancer classification, and breast tumor segmentation, show that the proposed self-supervised method reaches state-of-the-art (SOTA) performance. The code is available at https://github.com/Schuture/Meta-USCL.
Deep transfer learning has been widely used for knowledge transmission in recent years. The standard approach of pre-training and subsequently fine-tuning, or linear probing, has shown itself to be effective in many down-stream tasks. Therefore, a challenging and ongoing question arises: how to quantify cross-task transferability that is compatible with transferred results while keeping self-consistency? Existing transferability metrics are estimated on the particular model by conversing source and target tasks. They must be recalculated with all existing source tasks whenever a novel unknown target task is encountered, which is extremely computationally expensive. In this work, we highlight what properties should be satisfied and evaluate existing metrics in light of these characteristics. Building upon this, we propose Principal Gradient Expectation (PGE), a simple yet effective method for assessing transferability across tasks. Specifically, we use a restart scheme to calculate every batch gradient over each weight unit more than once, and then we take the average of all the gradients to get the expectation. Thus, the transferability between the source and target task is estimated by computing the distance of normalized principal gradients. Extensive experiments show that the proposed transferability metric is more stable, reliable and efficient than SOTA methods.
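To make the gradient-averaging idea in the PGE abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the toy linear model, the squared-error loss, the random re-initialisation used as the "restart", the Euclidean distance, and the names `batch_gradient`, `principal_gradient` and `pge_gap` are all assumptions made for illustration.

```python
import math
import random


def batch_gradient(weights, batch):
    """Gradient of the mean squared error of a toy linear model y ~ w.x on one batch."""
    grad = [0.0] * len(weights)
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for d, xi in enumerate(x):
            grad[d] += 2.0 * err * xi / len(batch)
    return grad


def principal_gradient(batches, dim, restarts=5, seed=0):
    """Average batch gradients over several restarts, then L2-normalise the mean."""
    rng = random.Random(seed)
    total = [0.0] * dim
    count = 0
    for _ in range(restarts):
        # Assumption: a "restart" draws fresh small random weights.
        weights = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        for batch in batches:
            grad = batch_gradient(weights, batch)
            total = [t + g for t, g in zip(total, grad)]
            count += 1
    mean = [t / count for t in total]
    norm = math.sqrt(sum(m * m for m in mean))
    return [m / norm for m in mean] if norm > 0 else mean


def pge_gap(source_batches, target_batches, dim, restarts=5, seed=0):
    """Euclidean distance between the normalised principal gradients of two tasks."""
    src = principal_gradient(source_batches, dim, restarts, seed)
    tgt = principal_gradient(target_batches, dim, restarts, seed)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(src, tgt)))


# Two toy tasks with opposite targets, one batch each.
task_a = [[([1.0, 0.0], 1.0), ([2.0, 0.0], 2.0)]]
task_b = [[([1.0, 0.0], -1.0), ([2.0, 0.0], -2.0)]]
print(pge_gap(task_a, task_a, dim=2), pge_gap(task_a, task_b, dim=2))
```

Using the same seed for both tasks means the two principal gradients are measured from the same set of initial weights, so a task compared with itself yields a gap of zero, while the two opposite tasks yield a gap close to the maximum of 2 for unit vectors.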
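Deep models must learn robust and transferable representations in order to perform well on new domains. Although domain transfer methods (e.g., domain adaptation, domain generalization) have been proposed to learn transferable representations across domains, they are typically applied to ResNet backbones pre-trained on ImageNet. Existing work therefore pays little attention to the effect of pre-training on domain transfer tasks. In this paper, we provide a broad study and in-depth analysis of pre-training for domain adaptation and generalization, namely: network architectures, size, pre-training loss, and datasets. We observe that simply using a state-of-the-art backbone outperforms existing state-of-the-art domain adaptation baselines, and we set new baselines on Office-Home and DomainNet, improving by 10.7% and 5.5%. We hope that this work can provide more insights for future domain transfer research.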
With the ever-growing model size and the limited availability of labeled training data, transfer learning has become an increasingly popular approach in many science and engineering domains. For classification problems, this work delves into the mystery of transfer learning through an intriguing phenomenon termed neural collapse (NC), where the last-layer features and classifiers of learned deep networks satisfy: (i) the within-class variability of the features collapses to zero, and (ii) the between-class feature means are maximally and equally separated. Through the lens of NC, our findings for transfer learning are the following: (i) when pre-training models, preventing intra-class variability collapse (to a certain extent) better preserves the intrinsic structures of the input data, so that it leads to better model transferability; (ii) when fine-tuning models on downstream tasks, obtaining features with more NC on downstream data results in better test accuracy on the given task. The above results not only demystify many widely used heuristics in model pre-training (e.g., data augmentation, projection head, self-supervised learning), but also lead to a more efficient and principled fine-tuning method on downstream tasks, which we demonstrate through extensive experimental results.
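The first NC property above (within-class variability collapsing relative to the separation of class means) can be checked numerically. The sketch below is a simplified proxy, not the metric used in the paper: it assumes that a plain ratio of within-class to between-class scatter is an acceptable stand-in, and the name `variability_ratio` is hypothetical.

```python
from collections import defaultdict


def variability_ratio(features, labels):
    """Within-class scatter divided by between-class scatter (lower means more collapse)."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    dim = len(features[0])
    global_mean = [sum(x[d] for x in features) / len(features) for d in range(dim)]
    within = 0.0
    between = 0.0
    for xs in by_class.values():
        mean = [sum(x[d] for x in xs) / len(xs) for d in range(dim)]
        # Squared distance of every sample to its own class mean.
        within += sum((x[d] - mean[d]) ** 2 for x in xs for d in range(dim))
        # Squared distance of the class mean to the global mean, weighted by class size.
        between += len(xs) * sum((mean[d] - global_mean[d]) ** 2 for d in range(dim))
    if between == 0.0:
        raise ValueError("class means coincide; the ratio is undefined")
    return within / between


collapsed = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
spread = [[1.5, 0.0], [0.5, 0.0], [-1.5, 0.0], [-0.5, 0.0]]
labels = [0, 0, 1, 1]
print(variability_ratio(collapsed, labels), variability_ratio(spread, labels))
```

A ratio of zero corresponds to fully collapsed features; under the abstract's findings one would expect a somewhat larger value to be preferable on pre-training data and a smaller value on downstream data.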
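Due to the scarcity of labeled data, using models pre-trained on ImageNet is the de facto standard in remote sensing scene classification. Although several larger high-resolution remote sensing (HRRS) datasets have recently appeared with the goal of establishing new benchmarks, attempts at training models from scratch on these datasets are sporadic. In this paper, we show that training models from scratch on several of the newer datasets yields results comparable to fine-tuning models pre-trained on ImageNet. Furthermore, the representations learned on HRRS datasets transfer to other HRRS scene classification tasks better than, or at least similarly to, those learned on ImageNet. Finally, we show that in many cases the best representations are obtained by a second round of pre-training on in-domain data, i.e., domain-adaptive pre-training. The source code and pre-trained models are available at \url{https://github.com/risojevicv/rssc-transfer}.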
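Transfer learning is a standard technique for transferring knowledge from one domain to another. For applications in medical imaging, transfer from ImageNet has become the de facto approach, despite differences in tasks and image characteristics between the domains. However, it is unclear which factors determine whether, and to what extent, transfer learning to the medical domain is useful. The long-standing assumption that features from the source domain get reused has recently been called into question. Through a series of experiments on several medical image benchmark datasets, we explore the relationship between transfer learning, data size, the capacity and inductive bias of the model, and the distance between the source and target domains. Our findings suggest that transfer learning is beneficial in most cases, and we characterize the important role that feature reuse plays in its success.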
Image classification with small datasets has been an active research area in the recent past. However, as research in this scope is still in its infancy, two key ingredients are missing for ensuring reliable and truthful progress: a systematic and extensive overview of the state of the art, and a common benchmark to allow for objective comparisons between published methods. This article addresses both issues. First, we systematically organize and connect past studies to consolidate a community that is currently fragmented and scattered. Second, we propose a common benchmark that allows for an objective comparison of approaches. It consists of five datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). We use this benchmark to re-evaluate the standard cross-entropy baseline and ten existing methods published between 2017 and 2021 at renowned venues. Surprisingly, we find that thorough hyper-parameter tuning on held-out validation data results in a highly competitive baseline and highlights a stunted growth of performance over the years. Indeed, only a single specialized method dating back to 2019 clearly wins our benchmark and outperforms the baseline classifier.
We introduce a novel technique for knowledge transfer, where knowledge from a pretrained deep neural network (DNN) is distilled and transferred to another DNN. As the DNN maps from the input space to the output space through many layers sequentially, we define the distilled knowledge to be transferred in terms of flow between layers, which is calculated by computing the inner product between features from two layers. When we compare the student DNN and the original network with the same size as the student DNN but trained without a teacher network, the proposed method of transferring the distilled knowledge as the flow between two layers exhibits three important phenomena: (1) the student DNN that learns the distilled knowledge is optimized much faster than the original model; (2) the student DNN outperforms the original DNN; and (3) the student DNN can learn the distilled knowledge from a teacher DNN that is trained at a different task, and the student DNN outperforms the original DNN that is trained from scratch.
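The "flow between layers" described above, an inner product between the features of two layers, can be sketched as a small channel-by-channel matrix. This is an illustrative reading only: the abstract does not specify the data layout, so the flattened per-position format, the division by the number of spatial positions, the squared-error matching loss, and the names `flow_matrix` and `flow_loss` are assumptions.

```python
def flow_matrix(feat_a, feat_b):
    """Channel-by-channel inner products between two layers' features.

    feat_a and feat_b hold one entry per spatial position; each entry is the
    list of channel activations at that position. Both layers must share the
    same spatial positions.
    """
    if not feat_a or len(feat_a) != len(feat_b):
        raise ValueError("both layers need the same, non-zero number of positions")
    positions = len(feat_a)
    channels_a, channels_b = len(feat_a[0]), len(feat_b[0])
    flow = [[0.0] * channels_b for _ in range(channels_a)]
    for pos_a, pos_b in zip(feat_a, feat_b):
        for i, a in enumerate(pos_a):
            for j, b in enumerate(pos_b):
                # Assumption: average over positions so the scale is size-independent.
                flow[i][j] += a * b / positions
    return flow


def flow_loss(teacher_flow, student_flow):
    """Sum of squared differences between a teacher's and a student's flow matrices."""
    return sum(
        (t - s) ** 2
        for t_row, s_row in zip(teacher_flow, student_flow)
        for t, s in zip(t_row, s_row)
    )


layer_in = [[1.0, 2.0], [3.0, 4.0]]   # 2 positions, 2 channels
layer_out = [[1.0], [1.0]]            # 2 positions, 1 channel
print(flow_matrix(layer_in, layer_out))
```

Because the matrix has shape (channels of the first layer) by (channels of the second layer), a student can match a teacher's flow whenever the paired layers have the same channel counts, even if the networks differ in depth.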
ImageNet pre-training has enabled state-of-the-art results on many tasks. In spite of its recognized contribution to generalization, we observed in this study that ImageNet pre-training also transfers adversarial non-robustness from the pre-trained model into the fine-tuned model in downstream classification tasks. We first conducted experiments on various datasets and network backbones to uncover the adversarial non-robustness of the fine-tuned model. Further analysis examined the learned knowledge of the fine-tuned model and the standard model, and revealed that the non-robustness stems from the non-robust features transferred from the ImageNet pre-trained model. Finally, we analyzed the feature-learning preference of the pre-trained model, explored the factors influencing robustness, and introduced a simple robust ImageNet pre-training solution. Our code is available at \url{https://github.com/jiamingzhang94/ImageNet-Pretraining-transfers-non-robustness}.
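Fine-tuning is widely applied in image classification tasks as a transfer learning approach. It reuses knowledge from a source task to learn and achieve high performance on a target task. Fine-tuning can alleviate the challenges of insufficient training data and the expensive labeling of new data. However, standard fine-tuning has limited performance on complex data distributions. To address this issue, we propose the Adaptable Multi-tuning method, which adaptively determines the fine-tuning strategy for each data sample. In this framework, multiple fine-tuning settings and one policy network are defined. The policy network in Adaptable Multi-tuning can dynamically adjust to an optimal weighting in order to feed different samples into models trained with different fine-tuning strategies. Our method outperforms standard fine-tuning by 1.69% and 2.79% on the FGVC-Aircraft and Describable Texture datasets, respectively, and yields comparable performance on Stanford Cars, CIFAR-10, and Fashion-MNIST.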
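Cross-domain few-shot learning (CD-FSL), where only a few target samples are available under extreme differences between the source and target domains, has recently attracted great attention. For CD-FSL, recent studies have generally developed transfer-learning-based approaches that pre-train a neural network on popular labeled source-domain datasets and then transfer it to target-domain data. Although the labeled datasets may provide suitable initial parameters for the target data, the domain difference between source and target may hinder fine-tuning on the target domain. This paper proposes a simple yet powerful method that re-randomizes the parameters fitted on the source domain before adapting to the target data. The re-randomization resets the source-specific parameters of the source pre-trained model and thus facilitates fine-tuning on the target domain, improving few-shot performance.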
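Transfer learning has become a popular method for leveraging pre-trained models in computer vision. However, without performing computationally expensive fine-tuning, it is difficult to quantify which pre-trained source models are suitable for a specific target task or, conversely, to which tasks a pre-trained source model can easily be adapted. In this work, we propose the Gaussian Bhattacharyya Coefficient (GBC), a novel method for quantifying transferability between a source model and a target dataset. In a first step, we embed all target images in the feature space defined by the source model and represent them with per-class Gaussians. Then, we estimate their pairwise class separability using the Bhattacharyya coefficient, yielding a simple and effective measure of how well a source model transfers to a target task. We evaluate GBC on image classification tasks in the context of dataset and architecture selection. In addition, we perform experiments on the more complex task of transferability estimation for semantic segmentation. We demonstrate that GBC outperforms state-of-the-art transferability metrics on most evaluation criteria in the semantic segmentation setting, matches the performance of the top methods for dataset transferability in image classification, and performs best on architecture selection problems for image classification.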
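The two steps named in the GBC abstract (per-class Gaussians in the source model's feature space, then pairwise Bhattacharyya coefficients) can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes diagonal covariances, a small variance floor `eps`, and a score defined as the negative sum of coefficients over ordered class pairs; the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_diag_gaussians(features, labels, eps=1e-6):
    """Fit one diagonal-covariance Gaussian (mean, variance per dimension) per class."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    stats = {}
    for y, xs in by_class.items():
        n, dim = len(xs), len(xs[0])
        mean = [sum(x[d] for x in xs) / n for d in range(dim)]
        # eps keeps the variance strictly positive for tiny or degenerate classes.
        var = [sum((x[d] - mean[d]) ** 2 for x in xs) / n + eps for d in range(dim)]
        stats[y] = (mean, var)
    return stats


def bhattacharyya_coefficient(gauss_1, gauss_2):
    """exp(-D_B) for two diagonal Gaussians, where D_B is the Bhattacharyya distance."""
    (mean_1, var_1), (mean_2, var_2) = gauss_1, gauss_2
    dist = 0.0
    for m1, m2, v1, v2 in zip(mean_1, mean_2, var_1, var_2):
        v = 0.5 * (v1 + v2)
        dist += 0.125 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return math.exp(-dist)


def gbc_score(features, labels):
    """Negative total class overlap: higher means classes are easier to separate."""
    stats = fit_diag_gaussians(features, labels)
    classes = list(stats)
    overlap = 0.0
    for i, class_i in enumerate(classes):
        for class_j in classes[i + 1:]:
            # The coefficient is symmetric, so each unordered pair counts twice.
            overlap += 2.0 * bhattacharyya_coefficient(stats[class_i], stats[class_j])
    return -overlap


labels = [0, 0, 1, 1]
separated = [[0.0], [0.2], [10.0], [10.2]]
overlapping = [[0.0], [0.2], [0.1], [0.3]]
print(gbc_score(separated, labels), gbc_score(overlapping, labels))
```

In practice `features` would be the embeddings of target images produced by the frozen source model; the toy one-dimensional lists here only show that well-separated classes receive a higher (less negative) score than overlapping ones.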
Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes, from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19-task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.
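Self-supervised learning (SSL) has achieved remarkable performance on various medical imaging tasks by drawing on priors from massive unlabeled data. However, for a specific downstream task, there is still no instruction book on how to select suitable pretext tasks and implementation details. In this work, we first review the latest applications of self-supervised methods in the field of medical imaging analysis. Then, we conduct extensive experiments to explore four significant issues in SSL for medical imaging: (1) the effect of self-supervised pre-training on imbalanced datasets, (2) network architectures, (3) the applicability of upstream tasks to downstream tasks, and (4) the stacking effect of SSL and policies commonly used in deep learning, including data resampling and augmentation. Based on the experimental results, potential guidelines are presented for self-supervised pre-training in medical imaging. Finally, we discuss future research directions and raise issues to be aware of when designing new SSL methods and paradigms.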
Computational pathology can lead to saving human lives, but models are annotation hungry and pathology images are notoriously expensive to annotate. Self-supervised learning has been shown to be an effective method for utilizing unlabeled data, and its application to pathology could greatly benefit its downstream tasks. Yet, there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data to date. Our study is conducted using 4 representative SSL methods on diverse downstream tasks. We establish that large-scale domain-aligned pre-training in pathology consistently outperforms ImageNet pre-training in standard SSL settings such as linear and fine-tuning evaluations, as well as in low-label regimes. Moreover, we propose a set of domain-specific techniques that we experimentally show lead to a performance boost. Lastly, for the first time, we apply SSL to the challenging task of nuclei instance segmentation and show large and consistent performance improvements under diverse settings.
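Pre-training models on ImageNet or other large-scale datasets has led to major advances in computer vision, albeit accompanied by shortcomings related to curation cost, privacy, usage rights, and ethical issues. In this paper, for the first time, we study the transferability of models pre-trained on synthetic data generated by graphics simulators to downstream tasks from very different domains. When using such synthetic data for pre-training, we find that downstream performance on different tasks is favored by different configurations of simulation parameters (e.g., lighting, object pose, backgrounds), and that there is no one-size-fits-all solution. It is therefore better to tailor synthetic pre-training data to a specific downstream task for best performance. We introduce Task2Sim, a unified model that maps downstream task representations to optimal simulation parameters in order to generate synthetic pre-training data for them. Task2Sim learns this mapping by training to find the set of best parameters on a set of "seen" tasks. Once trained, it can be used to predict the best simulation parameters for a novel "unseen" task without requiring additional training. Given a budget on the number of images per class, our extensive experiments with 20 diverse downstream tasks show that Task2Sim's task-adaptive pre-training data results in significantly better downstream performance than non-adaptively chosen simulation parameters on both seen and unseen tasks. It is even competitive with pre-training on real images.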
Few-shot learning aims to quickly adapt a deep model from a few examples. While pre-training and meta-training can create deep models powerful for few-shot generalization, we find that pre-training and meta-training focus respectively on cross-domain transferability and cross-task transferability, which restricts their data efficiency in the entangled settings of domain shift and task shift. We thus propose the Omni-Training framework to seamlessly bridge pre-training and meta-training for data-efficient few-shot learning. Our first contribution is a tri-flow Omni-Net architecture. Besides the joint representation flow, Omni-Net introduces two parallel flows for pre-training and meta-training, responsible for improving domain transferability and task transferability respectively. Omni-Net further coordinates the parallel flows by routing their representations via the joint flow, enabling knowledge transfer across flows. Our second contribution is the Omni-Loss, which introduces a self-distillation strategy separately on the pre-training and meta-training objectives for boosting knowledge transfer throughout different training stages. Omni-Training is a general framework that accommodates many existing algorithms. Evaluations show that our single framework consistently and clearly outperforms the individual state-of-the-art methods in both cross-task and cross-domain settings across a variety of classification, regression and reinforcement learning problems.
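This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for transfer learning to a downstream task. Thanks to the broad use of DNNs, we may easily collect hundreds of checkpoints from various sources. Which of them transfers best to our downstream task of interest? To answer this question thoroughly, we establish a neural checkpoint ranking benchmark (NeuCRaB) and study some intuitive ranking measures. These measures are generic, applying to checkpoints of different output types without knowing how, or on which dataset, the checkpoints were pre-trained. They also incur low computational cost, which makes them practically meaningful. Our results suggest that the linear separability of the features extracted by a checkpoint is a strong indicator of transferability. We also arrive at a new ranking measure, NLEEP, which gives the best performance in the experiments.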
Transferring the knowledge learned from large scale datasets (e.g., ImageNet) via fine-tuning offers an effective solution for domain-specific fine-grained visual categorization (FGVC) tasks (e.g., recognizing bird species or car make & model). In such scenarios, data annotation often calls for specialized domain knowledge and thus is difficult to scale. In this work, we first tackle a problem in large scale FGVC. Our method won first place in the iNaturalist 2017 large scale species classification challenge. Central to the success of our approach is a training scheme that uses higher image resolution and deals with the long-tailed distribution of training data. Next, we study transfer learning via fine-tuning from large scale datasets to small scale, domain-specific FGVC datasets. We propose a measure to estimate domain similarity via Earth Mover's Distance and demonstrate that transfer learning benefits from pre-training on a source domain that is similar to the target domain by this measure. Our proposed transfer learning outperforms ImageNet pre-training and obtains state-of-the-art results on multiple commonly used FGVC datasets.