State-of-the-art automatic augmentation methods (e.g., AutoAugment and RandAugment) for visual recognition tasks diversify training data using a large set of augmentation operations. The range of magnitudes of many augmentation operations (e.g., brightness and contrast) is continuous. Therefore, to make search computationally tractable, these methods use fixed and manually-defined magnitude ranges for each operation, which may lead to sub-optimal policies. To answer the open question on the importance of magnitude ranges for each augmentation operation, we introduce RangeAugment, which allows us to efficiently learn the range of magnitudes for individual as well as composite augmentation operations. RangeAugment uses an auxiliary loss based on image similarity as a measure to control the range of magnitudes of augmentation operations. As a result, RangeAugment has a single scalar parameter for search, image similarity, which we simply optimize via linear search. RangeAugment integrates seamlessly with any model and learns model- and task-specific augmentation policies. With extensive experiments on the ImageNet dataset across different networks, we show that RangeAugment achieves performance competitive with state-of-the-art automatic augmentation methods while using 4-5 times fewer augmentation operations. Experimental results on semantic segmentation, object detection, foundation models, and knowledge distillation further show RangeAugment's effectiveness.
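To make the mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the idea: an augmentation operation with a learnable magnitude range, trained with an auxiliary loss that keeps a PSNR-based image similarity close to a single target value. The brightness-style operation, the PSNR choice, and names such as `target_similarity` are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

class LearnableRangeOp(torch.nn.Module):
    """One augmentation op (e.g. brightness) with a learnable magnitude range [lo, hi].

    Hypothetical sketch: magnitudes are sampled uniformly from the current range,
    and an auxiliary image-similarity loss nudges the range toward a target similarity.
    """
    def __init__(self, lo=0.5, hi=1.5):
        super().__init__()
        self.lo = torch.nn.Parameter(torch.tensor(float(lo)))
        self.hi = torch.nn.Parameter(torch.tensor(float(hi)))

    def forward(self, x):
        # Uniform sample in [lo, hi]; sampling this way keeps the range
        # end-points differentiable through the applied magnitude.
        u = torch.rand(x.size(0), 1, 1, 1, device=x.device)
        m = self.lo + u * (self.hi - self.lo)
        return torch.clamp(x * m, 0.0, 1.0)        # brightness-style op on [0, 1] images

def psnr(x, y, eps=1e-8):
    mse = F.mse_loss(x, y) + eps
    return 10.0 * torch.log10(1.0 / mse)            # images assumed in [0, 1]

def range_augment_loss(model, images, labels, op, target_similarity=25.0, alpha=0.1):
    augmented = op(images)
    task_loss = F.cross_entropy(model(augmented), labels)
    # Auxiliary loss: keep the (PSNR-based) similarity between augmented and clean
    # images close to a single scalar target, the only quantity searched over.
    aux_loss = (psnr(augmented, images) - target_similarity).abs()
    return task_loss + alpha * aux_loss
```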
Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions and can faithfully determine the difficulty level of the distillation task. Keeping a constant temperature, i.e., a fixed level of task difficulty, is usually sub-optimal for a growing student during its progressive learning stages. In this paper, we propose a simple curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD), which controls the task difficulty level during the student's learning career through a dynamic and learnable temperature. Specifically, following an easy-to-hard curriculum, we gradually increase the distillation loss w.r.t. the temperature, leading to increased distillation difficulty in an adversarial manner. As an easy-to-use plug-in technique, CTKD can be seamlessly integrated into existing knowledge distillation frameworks and brings general improvements at a negligible additional computation cost. Extensive experiments on CIFAR-100, ImageNet-2012, and MS-COCO demonstrate the effectiveness of our method. Our code is available at https://github.com/zhengli97/CTKD.
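A minimal sketch of the core mechanism, assuming a standard PyTorch distillation setup: the temperature is a learnable scalar passed through a gradient-reversal layer, so its updates increase the distillation loss, while a curriculum weight `lam` ramps up over training. The clamping, the T^2 scaling, and the schedule are assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, multiplies gradients by -lam in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def ctkd_loss(student_logits, teacher_logits, temperature, lam):
    """KD loss with a learnable, adversarially updated temperature.

    `temperature` is a scalar nn.Parameter optimized jointly with the student;
    the gradient reversal makes its update *increase* the distillation loss,
    and `lam` follows an easy-to-hard schedule (e.g. ramping from 0 to 1).
    """
    t = GradReverse.apply(temperature, lam).clamp(min=1.0)
    p_t = F.softmax(teacher_logits / t, dim=1)
    log_p_s = F.log_softmax(student_logits / t, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (t.detach() ** 2)
```

In training, `temperature` would simply be registered as an extra parameter of the distillation module and passed to the same optimizer as the student.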
Recent work has shown that data augmentation has the potential to significantly improve the generalization of deep learning models. Automated augmentation strategies have recently led to state-of-the-art results in image classification and object detection. While these strategies were optimized for improving validation accuracy, they also led to state-of-the-art results in semi-supervised learning and improved robustness to common corruptions of images. An obstacle to large-scale adoption of these methods is a separate search phase which increases the training complexity and may substantially increase the computational cost. Additionally, due to the separate search phase, these approaches are unable to adjust the regularization strength based on model or dataset size. Automated augmentation policies are often found by training small models on small datasets and subsequently applied to train larger models. In this work, we remove both of these obstacles. RandAugment has a significantly reduced search space which allows it to be trained on the target task with no need for a separate proxy task. Furthermore, due to the parameterization, the regularization strength may be tailored to different model and dataset sizes. RandAugment can be used uniformly across different tasks and datasets and works out of the box, matching or surpassing all previous automated augmentation approaches on CIFAR-10/100, SVHN, and ImageNet. On the ImageNet dataset we achieve 85.0% accuracy, a 0.6% increase over the previous state-of-the-art and a 1.0% increase over baseline augmentation. On object detection, RandAugment leads to 1.0-1.3% improvement over baseline augmentation, and is within 0.3% mAP of AutoAugment on COCO. Finally, due to its interpretable hyperparameters, RandAugment may be used to investigate the role of data augmentation with varying model and dataset size. Code is available at github.com/tensorflow/tpu/tree/master/models/official/efficientnet.
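For reference, a simplified sketch of the RandAugment sampling rule with a toy operation pool; the real implementation uses roughly 14 PIL-based operations and per-op magnitude mappings, so the pool and mappings below are illustrative assumptions only.

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Illustrative subset of the operation pool; magnitudes are mapped from the shared
# integer level M onto a per-op range via the fraction m / m_max.
def _apply_op(img, name, frac):
    if name == "rotate":
        return img.rotate(30 * frac)
    if name == "contrast":
        return ImageEnhance.Contrast(img).enhance(1 + frac)
    if name == "color":
        return ImageEnhance.Color(img).enhance(1 + frac)
    if name == "posterize":
        return ImageOps.posterize(img, max(1, int(8 - 4 * frac)))
    return img

OP_NAMES = ["rotate", "contrast", "color", "posterize"]

def rand_augment(img: Image.Image, n: int = 2, m: int = 9, m_max: int = 30) -> Image.Image:
    """Apply N uniformly sampled ops, all at the single shared magnitude level M."""
    frac = m / m_max
    for name in random.choices(OP_NAMES, k=n):
        img = _apply_op(img, name, frac)
    return img
```

The two integers `n` and `m` are the entire search space, which is why they can be tuned directly on the target task with a small grid.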
Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al. [1], however, show a contrasting result: ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training. Our study reveals the generality and flexibility of self-training with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) in the case that pre-training is helpful, self-training improves upon pre-training. For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data. Self-training, on the other hand, shows positive improvements from +1.3 to +3.4 AP across all dataset sizes. In other words, self-training works well exactly in the setting where pre-training does not (using ImageNet to help COCO). On the PASCAL segmentation dataset, which is a much smaller dataset than COCO, though pre-training does help significantly, self-training improves upon the pre-trained model. On COCO object detection, we achieve 54.3 AP, an improvement of +1.5 AP over the strongest SpineNet model. On PASCAL segmentation, we achieve 90.5 mIOU, an improvement of +1.5% mIOU over the previous state-of-the-art result by DeepLabv3+. Code and checkpoints for our models are available at https://github.com/tensorflow/tpu/tree/master/models/official/detection/projects/self_training.
Deep learning has been one of the most popular techniques in the computer vision community in recent years. As a data-driven technique, deep models require large amounts of accurately labeled training data, which is often inaccessible in many real-world applications. A data-space solution is data augmentation (DA), which artificially generates new images from the original samples. Image augmentation strategies can vary across datasets, since different data types may require different augmentations to facilitate model training. However, the design of DA policies is largely decided by human experts with domain knowledge, which is considered highly subjective and error-prone. To mitigate such problems, a novel direction is to automatically learn image augmentation policies from a given dataset using automated data augmentation (AutoDA) techniques. The goal of an AutoDA model is to find the optimal DA policy that maximizes the improvement in model performance. This survey discusses the underlying reasons for the emergence of AutoDA techniques from the perspective of image classification. We identify three key components of a standard AutoDA model: the search space, the search algorithm, and the evaluation function. Based on their architectures, we provide a systematic taxonomy of existing image AutoDA approaches. This paper presents the major works in the AutoDA field, discusses their pros and cons, and proposes several potential directions for future improvement.
Synthetic data offers the promise of cheap and bountiful training data for settings where large amounts of labeled real-world data are unavailable. However, models trained on synthetic data significantly underperform on real-world data. In this paper, we propose Proportional Amplitude Spectrum Training Augmentation (PASTA), a simple and effective augmentation strategy to improve out-of-the-box synthetic-to-real (syn-to-real) generalization performance. PASTA involves perturbing the amplitude spectrums of the synthetic images in the Fourier domain to generate augmented views. We design PASTA to perturb the amplitude spectrums in a structured manner such that high-frequency components are perturbed relatively more than the low-frequency ones. For the tasks of semantic segmentation (GTAV to Real), object detection (Sim10K to Real), and object recognition (VisDA-C Syn to Real), across a total of 5 syn-to-real shifts, we find that PASTA outperforms more complex state-of-the-art generalization methods while being complementary to them.
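A small NumPy sketch of the stated idea, perturbing the amplitude spectrum with multiplicative noise whose strength grows with spatial frequency; the exact scaling rule and the parameter names `alpha`, `beta`, `k` are assumptions in the spirit of the paper, not its exact recipe.

```python
import numpy as np

def pasta_augment(img: np.ndarray, alpha: float = 3.0, beta: float = 0.25, k: int = 2) -> np.ndarray:
    """PASTA-style augmentation sketch for an HWC float image in [0, 1].

    The amplitude spectrum is rescaled by multiplicative Gaussian noise whose
    standard deviation grows with normalized spatial frequency, so high-frequency
    components are perturbed more than low-frequency ones.
    """
    h, w, _ = img.shape
    fft = np.fft.fft2(img, axes=(0, 1))
    amp, phase = np.abs(fft), np.angle(fft)

    # Normalized distance of each frequency bin from the zero-frequency bin.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2) / np.sqrt(0.5)      # roughly in [0, 1]

    sigma = alpha * radius[..., None] ** k + beta            # larger sigma at high frequency
    noise = 1.0 + np.random.randn(h, w, 1) * sigma
    amp_aug = amp * noise

    out = np.real(np.fft.ifft2(amp_aug * np.exp(1j * phase), axes=(0, 1)))
    return np.clip(out, 0.0, 1.0)
```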
Jitendra Malik once said, "Supervision is the opium of the AI researcher". Most deep learning techniques heavily rely on extreme amounts of human labels to work effectively. In today's world, the rate of data creation greatly surpasses the rate of data annotation. Full reliance on human annotations is just a temporary means to solve current closed problems in AI. In reality, only a tiny fraction of data is annotated. Annotation Efficient Learning (AEL) is a study of algorithms to train models effectively with fewer annotations. To thrive in AEL environments, we need deep learning techniques that rely less on manual annotations (e.g., image, bounding-box, and per-pixel labels), but learn useful information from unlabeled data. In this thesis, we explore five different techniques for handling AEL.
Data augmentation methods enrich datasets with augmented data to improve the performance of neural networks. Recently, automated data augmentation methods have emerged that automatically design augmentation strategies. Existing work focuses on image classification and object detection, whereas we provide the first study on semantic image segmentation and introduce two new approaches: SmartAugment and SmartSamplingAugment. SmartAugment uses Bayesian optimization to search over a rich space of augmentation strategies and achieves new state-of-the-art performance on all semantic segmentation tasks we consider. SmartSamplingAugment, a simple parameter-free method with a fixed augmentation strategy, competes in performance with existing resource-intensive approaches and outperforms cheap state-of-the-art data augmentation methods. Furthermore, we analyze the impact, interaction, and importance of data augmentation hyperparameters and perform an ablation study, which confirms our design choices behind SmartAugment and SmartSamplingAugment. Finally, we will provide our source code for reproducibility and to facilitate further research.
We propose a novel end-to-end curriculum learning approach for sparsely labelled animal datasets leveraging large volumes of unlabelled data to improve supervised species detectors. We exemplify the method in detail on the task of finding great apes in camera trap footage taken in challenging real-world jungle environments. In contrast to previous semi-supervised methods, our approach adjusts learning parameters dynamically over time and gradually improves detection quality by steering training towards virtuous self-reinforcement. To achieve this, we propose integrating pseudo-labelling with curriculum learning policies and show how learning collapse can be avoided. We discuss theoretical arguments, ablations, and significant performance improvements against various state-of-the-art systems when evaluating on the Extended PanAfrican Dataset holding approx. 1.8M frames. We also demonstrate our method can outperform supervised baselines with significant margins on sparse label versions of other animal datasets such as Bees and Snapshot Serengeti. We note that performance advantages are strongest for smaller labelled ratios common in ecological applications. Finally, we show that our approach achieves competitive benchmarks for generic object detection in MS-COCO and PASCAL-VOC indicating wider applicability of the dynamic learning concepts introduced. We publish all relevant source code, network weights, and data access details for full reproducibility. The code is available at https://github.com/youshyee/DCL-Detection.
Transformer models have shown promising effectiveness in dealing with various vision tasks. However, compared with training convolutional neural network (CNN) models, training Vision Transformer (ViT) models is more difficult and relies on large-scale training sets. To explain this observation, we make the hypothesis that ViT models are less effective than CNN models at capturing the high-frequency components of images, and verify it by a frequency analysis. Inspired by this finding, we first investigate the effects of existing techniques for improving ViT models from a new frequency perspective, and find that the success of some techniques (e.g., RandAugment) can be attributed to better usage of the high-frequency components. Then, to compensate for this insufficient capability of ViT models, we propose HAT, which directly augments the high-frequency components of images via adversarial training. We show that HAT can consistently boost the performance of various ViT models (e.g., +1.2% for ViT-B, +0.5% for Swin-B), and especially improves the advanced model VOLO-D5 to 87.3% using only ImageNet-1K data; the advantage is also maintained on out-of-distribution data and transfers to downstream tasks. The code is available at: https://github.com/jiawangbai/hat.
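As a rough illustration of the stated idea (adversarial training applied to the high-frequency content of an image), the sketch below splits an image into low- and high-frequency parts with a radial FFT mask and takes a single FGSM-like step on the high-frequency part. The cutoff, step size, and one-step update are assumptions, not the authors' procedure.

```python
import torch
import torch.fft
import torch.nn.functional as F

def split_freq(x: torch.Tensor, cutoff: float = 0.25):
    """Split NCHW images into low- and high-frequency parts with a radial FFT mask."""
    n, c, h, w = x.shape
    fft = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    yy = torch.linspace(-0.5, 0.5, h, device=x.device).view(h, 1)
    xx = torch.linspace(-0.5, 0.5, w, device=x.device).view(1, w)
    low_mask = ((yy ** 2 + xx ** 2).sqrt() <= cutoff).to(x.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(fft * low_mask, dim=(-2, -1))).real
    return low, x - low

def hat_style_loss(model, images, labels, eps=4 / 255):
    """One illustrative update: adversarially amplify high-frequency content, then train on it."""
    low, high = split_freq(images)
    high = high.clone().requires_grad_(True)
    loss = F.cross_entropy(model(low + high), labels)
    grad = torch.autograd.grad(loss, high)[0]
    adv = (low + high + eps * grad.sign()).clamp(0, 1).detach()
    return F.cross_entropy(model(adv), labels)
```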
What can a neural network learn about the visual world from a single image? While it obviously cannot contain the multitude of possible objects, scenes, and lighting conditions that exist in the space of all possible 256^(3x224x224) square images of size 224, it may still provide a strong prior for natural images. To analyze this hypothesis, we develop a framework for training neural networks by means of knowledge distillation from a supervised pretrained teacher. With this, we find that the answer to the above question is: "surprisingly, a lot". In quantitative terms, we find top-1 accuracies of 94%/74% on CIFAR-10/100, competitive accuracy on ImageNet, and, by extending this method to audio, 84% on speech classification. In an extensive analysis we disentangle the effects of augmentations, the choice of source image, and the network architecture, and we also discover "panda neurons" in networks that have never seen a panda. This work shows that a single image can be used to extrapolate to thousands of object classes and motivates a renewed research agenda on the fundamental interplay of augmentations and image content.
In real teaching scenarios, an excellent teacher always teaches what he (or she) is good at but the student is not. This gives the student the best assistance in making up for his (or her) weaknesses and becoming a good one overall. Enlightened by this, we introduce the approach to the knowledge distillation framework and propose a data-based distillation method named ``Teaching what you Should Teach (TST)''. To be specific, TST contains a neural network-based data augmentation module with a prior bias, which can assist in finding what the teacher is good at but the student is not, by learning magnitudes and probabilities to generate suitable samples. By training the data augmentation module and the generalized distillation paradigm in turn, a student model with excellent generalization ability can be created. To verify the effectiveness of TST, we conducted extensive comparative experiments on object recognition (CIFAR-100 and ImageNet-1k), detection (MS-COCO), and segmentation (Cityscapes) tasks. As experimentally demonstrated, TST achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct intriguing studies of TST, including how to solve the performance degradation caused by a stronger teacher and what magnitudes and probabilities are needed for the distillation framework.
Convolutional neural networks have been widely applied to medical image segmentation and achieved considerable performance. However, the performance can be significantly affected by the domain gap between training data (source domain) and testing data (target domain). To address this issue, we propose a data-manipulation-based domain generalization method, called Automated Augmentation for Domain Generalization (AADG). Our AADG framework can effectively sample data augmentation policies that generate novel domains and diversify the training set from an appropriate search space. Specifically, we introduce a novel proxy task that maximizes the diversity among multiple augmented novel domains, measured by the Sinkhorn distance in a unit-sphere space, making automated augmentation tractable. Adversarial training and deep reinforcement learning are employed to efficiently search for the objective. Quantitative and qualitative experiments are comprehensively conducted on 11 publicly available fundus image datasets (four for retinal vessel segmentation, four for optic disc and cup (OD/OC) segmentation, and three for retinal lesion segmentation). Two additional OCTA datasets for retinal vasculature segmentation are further involved to validate cross-modality generalization. Our proposed AADG exhibits state-of-the-art generalization performance and outperforms existing approaches by considerable margins on retinal vessel, OD/OC, and lesion segmentation tasks. The learned policies are empirically validated to be model-agnostic and can transfer well to other models. The source code is available at https://github.com/crazorback/aadg.
Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each operation being an image processing function such as translation, rotation, or shearing, and the probabilities and magnitudes with which the functions are applied. We use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. Our method achieves state-of-the-art accuracy on SVHN and ImageNet (without additional data). On ImageNet, we attain a Top-1 accuracy of 83.5% which is 0.4% better than the previous record of 83.1%. On CIFAR-10, we achieve an error rate of 1.5%, which is 0.6% better than the previous state-of-the-art. Augmentation policies we find are transferable between datasets. The policy learned on ImageNet transfers well to achieve significant improvements on other datasets, such as Oxford Flowers, Caltech-101, Oxford-IIIT Pets, FGVC Aircraft, and Stanford Cars.
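A toy sketch of the policy structure described above: a policy is a list of sub-policies, each holding two (operation, probability, magnitude) triples, and one sub-policy is drawn per image. The triples and the small operation set below are invented placeholders for illustration, not a policy found by the search.

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Placeholder policy: two sub-policies, each with two (op, probability, magnitude) triples.
EXAMPLE_POLICY = [
    [("rotate", 0.9, 10.0), ("solarize", 0.3, 128)],
    [("color", 0.7, 1.5), ("posterize", 0.6, 5)],
]

def _apply(img, op, magnitude):
    if op == "rotate":
        return img.rotate(magnitude)
    if op == "solarize":
        return ImageOps.solarize(img, int(magnitude))
    if op == "color":
        return ImageEnhance.Color(img).enhance(magnitude)
    if op == "posterize":
        return ImageOps.posterize(img, int(magnitude))
    return img

def auto_augment(img: Image.Image, policy=EXAMPLE_POLICY) -> Image.Image:
    """Pick one sub-policy at random and apply its two ops with their probabilities."""
    for op, prob, magnitude in random.choice(policy):
        if random.random() < prob:
            img = _apply(img, op, magnitude)
    return img
```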
Data augmentation has recently become an important component of modern training recipes for visual recognition tasks. However, despite its effectiveness, data augmentation for video recognition has been rarely explored. The few existing augmentation recipes for video recognition naively extend image augmentation methods by applying the same operations to whole video frames. Our main idea is that the magnitude of augmentation operations for each frame needs to change over time in order to capture the temporal variations of real-world videos, and that these variations should be generated as diversely as possible using few additional hyper-parameters during training. With this motivation, we propose a simple yet effective video data augmentation framework, DynaAugment. The magnitude of the augmentation operations on each frame is changed by an effective mechanism, Fourier Sampling, which parameterizes diverse, smooth, and realistic temporal variations. DynaAugment also includes an extended search space suitable for videos for automatic data augmentation methods. Experimentally, DynaAugment shows that there is additional room for performance improvement over static augmentations on various video models. Specifically, we show the effectiveness of DynaAugment on a wide range of video datasets and tasks: large-scale video recognition (Kinetics-400 and Something-Something-V2), small-scale video recognition (UCF-101 and HMDB-51), fine-grained video recognition (Diving-48 and FineGym), video action segmentation on Breakfast, video action localization on THUMOS'14, and video object detection on MOT17Det. DynaAugment also enables video models to learn more generalized representations that improve model robustness on corrupted videos.
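A small sketch of the Fourier Sampling idea under simple assumptions: a smooth per-frame magnitude curve is drawn as a sum of a few random low-frequency sinusoids and then used to modulate any frame-level augmentation. Basis count, frequency range, and normalization are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

def fourier_magnitude_curve(num_frames: int, m_mean: float = 0.5,
                            num_bases: int = 3, max_freq: int = 3) -> np.ndarray:
    """Sample a smooth per-frame augmentation magnitude curve around a mean magnitude."""
    t = np.linspace(0.0, 1.0, num_frames)
    curve = np.zeros(num_frames)
    for _ in range(num_bases):
        freq = np.random.randint(1, max_freq + 1)            # low-frequency basis
        phase = np.random.uniform(0, 2 * np.pi)
        amp = np.random.uniform(0, 1) / num_bases
        curve += amp * np.sin(2 * np.pi * freq * t + phase)
    return np.clip(m_mean + curve, 0.0, 1.0)

def augment_clip(frames, op):
    """Apply `op(frame, magnitude)` with a temporally varying magnitude per frame."""
    mags = fourier_magnitude_curve(len(frames))
    return [op(frame, m) for frame, m in zip(frames, mags)]
```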
Quantized neural networks typically require smaller memory footprints and lower computation complexity, which is crucial for efficient deployment. However, quantization inevitably distorts the distribution of the original network, which usually degrades performance. To tackle this issue, massive efforts have been made, but most existing approaches lack statistical considerations and depend on several manual configurations. In this paper, we present an adaptive-mapping quantization method to learn an optimal latent sub-distribution that is inherent within models and smoothly approximated with a concrete Gaussian Mixture (GM). In particular, the network weights are made to conform to the GM-approximated sub-distribution. This sub-distribution evolves along with the weight updates in a co-tuning schema guided by direct task-objective optimization. Sufficient experiments on image classification and object detection over various modern architectures demonstrate the effectiveness, generalization ability, and transferability of the proposed method. Besides, an efficient deployment flow for the mobile CPU is developed, achieving up to 7.46x inference acceleration on an octa-core ARM CPU. Code is publicly released at https://github.com/runpeidong/dgms.
We present You Only Cut Once (YOCO) for performing data augmentation. YOCO cuts one image into two pieces and performs data augmentation individually within each piece. Applying YOCO improves the diversity of the augmentation per sample and encourages neural networks to recognize objects from partial information. YOCO enjoys the properties of being parameter-free and easy to use, and benefits almost all augmentations for free. Thorough experiments are conducted to evaluate its effectiveness. We first demonstrate that YOCO can be seamlessly applied to varying data augmentations and neural network architectures, bringing performance gains on CIFAR and ImageNet classification tasks, sometimes surpassing conventional image-level augmentation. Moreover, we show that YOCO benefits contrastive pre-training toward a more powerful representation that can be better transferred to multiple downstream tasks. Finally, we study a number of variants of YOCO and empirically analyze the performance for respective settings. Code is available on GitHub.
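A minimal PyTorch sketch of the cut-and-augment-separately idea for a batch of images; `aug` stands for any shape-preserving, tensor-compatible augmentation (for example, a color-jitter transform) and is an assumption of this sketch rather than part of the method itself.

```python
import torch

def yoco(images: torch.Tensor, aug) -> torch.Tensor:
    """You-Only-Cut-Once style augmentation sketch for an NCHW batch.

    The batch is cut into two halves (randomly along height or width), each half
    is augmented independently by `aug`, and the halves are concatenated back.
    `aug` must preserve the spatial size of its input for the concatenation to work.
    """
    if torch.rand(1).item() < 0.5:                 # cut along width
        left, right = images.chunk(2, dim=3)
        return torch.cat([aug(left), aug(right)], dim=3)
    top, bottom = images.chunk(2, dim=2)           # cut along height
    return torch.cat([aug(top), aug(bottom)], dim=2)
```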
In this paper, we consider the problem of domain generalization in semantic segmentation, which aims to learn a robust model using only labeled synthetic (source) data. The model is expected to perform well on unseen real (target) domains. Our study finds that image style variation can largely influence the model's performance, and that the style features can be well represented by the channel-wise mean and standard deviation of images. Inspired by this, we propose a novel adversarial style augmentation (AdvStyle) approach, which can dynamically generate hard stylized images during training and thus effectively prevent the model from overfitting to the source domain. Specifically, AdvStyle regards the style features as learnable parameters and updates them by adversarial training. The learned adversarial style features are used to construct adversarial images for robust model training. AdvStyle is easy to implement and can be readily applied to different models. Experiments on two synthetic-to-real semantic segmentation benchmarks show that AdvStyle can significantly improve model performance on unseen real domains, achieving state-of-the-art results. Moreover, AdvStyle can be employed for domain generalized image classification and produces clear improvements on the considered datasets.
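A hypothetical PyTorch sketch of the adversarial style update described above: image-level style statistics (per-channel mean and standard deviation) are treated as learnable parameters and moved in the direction that increases the task loss, and the restyled image is then used for training. Step size, step count, and the sign-based update are assumptions, not the authors' exact recipe.

```python
import torch
import torch.nn.functional as F

def advstyle_images(model, images, labels, lr: float = 0.05, steps: int = 1) -> torch.Tensor:
    """Generate hard stylized images by adversarially perturbing style statistics."""
    mu = images.mean(dim=(2, 3), keepdim=True)              # N x C x 1 x 1 style mean
    sigma = images.std(dim=(2, 3), keepdim=True) + 1e-6     # N x C x 1 x 1 style std
    normalized = (images - mu) / sigma                       # content with style removed

    adv_mu = mu.clone().detach().requires_grad_(True)
    adv_sigma = sigma.clone().detach().requires_grad_(True)
    for _ in range(steps):
        restyled = normalized * adv_sigma + adv_mu
        loss = F.cross_entropy(model(restyled), labels)
        g_mu, g_sigma = torch.autograd.grad(loss, [adv_mu, adv_sigma])
        # Gradient ascent on the style parameters: harder styles increase the loss.
        adv_mu = (adv_mu + lr * g_mu.sign()).detach().requires_grad_(True)
        adv_sigma = (adv_sigma + lr * g_sigma.sign()).detach().requires_grad_(True)

    return (normalized * adv_sigma + adv_mu).detach()        # hard stylized images for training
```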
The success of neural networks on medical image segmentation tasks typically relies on large labeled datasets for model training. However, acquiring and manually labeling large medical image sets is resource-intensive, expensive, and sometimes impractical due to data-sharing and privacy issues. To address this challenge, we propose AdvChain, a generic adversarial data augmentation framework, aiming to improve the diversity and effectiveness of training data for medical image segmentation tasks. AdvChain augments data with dynamic data augmentation, generating randomly chained photometric and geometric transformations that resemble realistic yet challenging imaging variations to expand the training data. By jointly optimizing the data augmentation model and a segmentation network during training, challenging examples are generated to enhance network generalizability for the downstream task. The proposed adversarial data augmentation does not rely on generative networks and can be used as a plug-in module in general segmentation networks. It is computationally efficient and applicable to both low-shot supervised and semi-supervised learning. We analyze and evaluate the method on two MR image segmentation tasks: cardiac segmentation and prostate segmentation with limited labeled data. The results show that the proposed approach can alleviate the need for labeled data while improving model generalization ability, indicating its practical value in medical imaging applications.
Masked autoencoders have become a popular training paradigm for self-supervised visual representation learning. These models randomly mask a portion of the input and reconstruct the masked portion according to target representations. In this paper, we first show that a careful choice of the target representation is unnecessary for learning good representations, since different targets tend to derive similar models. Driven by this observation, we propose a multi-stage masked distillation pipeline that uses a randomly initialized model as the teacher, allowing us to effectively train high-capacity models without carefully designing target representations. Interestingly, we further explore using teachers of larger capacities and obtain distilled students with remarkable transfer ability. On the different tasks of classification, transfer learning, object detection, and semantic segmentation, the proposed method of performing masked knowledge distillation with bootstrapped teachers (dBOT) outperforms previous self-supervised methods by nontrivial margins. We hope our findings, as well as the proposed method, can motivate people to rethink the role of target representations in pre-training masked autoencoders.