We present You Only Cut Once (YOCO) for performing data augmentation. YOCO cuts one image into two pieces and performs data augmentation individually within each piece. Applying YOCO improves the diversity of the augmentation per sample and encourages neural networks to recognize objects from partial information. YOCO enjoys the properties of being parameter-free, easy to use, and boosting almost all augmentations for free. Thorough experiments are conducted to evaluate its effectiveness. We first demonstrate that YOCO can be seamlessly applied to different data augmentations and neural network architectures, and brings performance gains on CIFAR and ImageNet classification tasks, at times surpassing conventional image-level augmentation. Moreover, we show that YOCO benefits contrastive pre-training toward a stronger representation that transfers better to multiple downstream tasks. Finally, we study a number of variants of YOCO and empirically analyze the performance of the respective settings. Code is available on GitHub.
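To make the cut-once idea concrete, the following is a minimal PyTorch sketch of the cut-augment-reassemble step; the function name, the torchvision transform used in the example, and the half-selection logic are illustrative assumptions, not the paper's reference code.

```python
import torch
import torchvision.transforms as T

def yoco(images: torch.Tensor, aug) -> torch.Tensor:
    """Apply `aug` separately to the two halves of a (N, C, H, W) batch.

    The cut axis (height or width) is chosen at random; both augmented
    halves are concatenated back into a full-size image.
    """
    _, _, h, w = images.shape
    if torch.rand(1).item() < 0.5:
        left, right = images[..., : w // 2], images[..., w // 2:]
        return torch.cat([aug(left), aug(right)], dim=-1)
    top, bottom = images[..., : h // 2, :], images[..., h // 2:, :]
    return torch.cat([aug(top), aug(bottom)], dim=-2)

# Example: each half gets its own horizontal-flip decision (shared across the
# batch in this sketch), so one half may be flipped while the other is not.
batch = torch.rand(8, 3, 224, 224)
augmented = yoco(batch, T.RandomHorizontalFlip(p=0.5))
```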
Recent progress in self-supervised learning has demonstrated promising results across multiple visual tasks. An important ingredient in high-performing self-supervised methods is the use of data augmentation by training models to place different augmented views of the same image nearby in embedding space. However, commonly used augmentation pipelines treat images holistically, ignoring the semantic relevance of parts of an image, e.g. a subject versus a background, which can lead to learning spurious correlations. Our work addresses this problem by investigating a class of simple but highly effective "background augmentations", which encourage models to focus on semantically relevant content and discourage them from focusing on image backgrounds. Through a systematic investigation, we show that background augmentations lead to substantial improvements in performance across a range of state-of-the-art self-supervised methods (MoCo-v2, BYOL, SwAV) on a variety of tasks, with $\sim$+1-2% gains on ImageNet, enabling performance on par with the supervised baseline. Further, we find that the improvement in limited-label settings is even larger (up to 4.2%). Background augmentations also improve robustness to a number of distribution shifts, including natural adversarial examples, ImageNet-9, adversarial attacks, and ImageNet-Renditions. We also make progress on fully unsupervised saliency detection in the process of generating the saliency masks used for background augmentations.
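As a rough sketch of how a background augmentation can be composed once a saliency mask is available (the function and the stand-in random mask below are illustrative assumptions, not the paper's pipeline):

```python
import torch

def background_swap(img: torch.Tensor, saliency: torch.Tensor, bg: torch.Tensor) -> torch.Tensor:
    """Keep the salient foreground of `img` and take the background from `bg`.

    `saliency` is a (1, H, W) soft mask in [0, 1], e.g. produced by an
    unsupervised saliency model; the composite serves as one augmented view.
    """
    return saliency * img + (1.0 - saliency) * bg

# Toy usage with a stand-in mask; in practice the mask comes from saliency detection.
img, bg = torch.rand(3, 224, 224), torch.rand(3, 224, 224)
mask = (torch.rand(1, 224, 224) > 0.5).float()
view = background_swap(img, mask, bg)
```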
In contrastive self-supervised learning, the common way to learn discriminative representation is to pull different augmented "views" of the same image closer while pushing all other images further apart, which has been proven to be effective. However, it is unavoidable to construct undesirable views containing different semantic concepts during the augmentation procedure. It would damage the semantic consistency of representation to pull these augmentations closer in the feature space indiscriminately. In this study, we introduce feature-level augmentation and propose a novel semantics-consistent feature search (SCFS) method to mitigate this negative effect. The main idea of SCFS is to adaptively search semantics-consistent features to enhance the contrast between semantics-consistent regions in different augmentations. Thus, the trained model can learn to focus on meaningful object regions, improving the semantic representation ability. Extensive experiments conducted on different datasets and tasks demonstrate that SCFS effectively improves the performance of self-supervised learning and achieves state-of-the-art performance on different downstream tasks.
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend on less discriminative parts of objects (e.g. leg as opposed to head of a person), thereby letting the network generalize better and have better object localization capabilities. On the other hand, current methods for regional dropout remove informative pixels on training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it leads to information loss and inefficiency during training. We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. By making efficient use of training pixels and retaining the regularization effect of regional dropout, CutMix consistently outperforms the state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task. Moreover, unlike previous augmentation methods, our CutMix-trained ImageNet classifier, when used as a pretrained model, results in consistent performance gains in Pascal detection and MS-COCO image captioning benchmarks. We also show that CutMix improves the model robustness against input corruptions and its out-of-distribution detection performances. Source code and pretrained models are available at https://github.com/clovaai/CutMix-PyTorch.
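A condensed PyTorch sketch of one CutMix step, in the spirit of (but not copied from) the official implementation; the helper name and the Beta-parameter default are assumptions.

```python
import torch

def cutmix(images: torch.Tensor, labels: torch.Tensor, alpha: float = 1.0):
    """One CutMix step on a batch (N, C, H, W) with integer class labels.

    Modifies `images` in place and returns (mixed_images, labels_a, labels_b, lam),
    so the loss can be lam * CE(out, labels_a) + (1 - lam) * CE(out, labels_b).
    """
    n, _, h, w = images.shape
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(n)

    # Box whose area is (1 - lam) of the image, centred at a random point.
    cut_ratio = (1.0 - lam) ** 0.5
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Re-derive lam from the actual pasted area (the box may be clipped).
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return images, labels, labels[perm], lam
```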
In real-world applications of machine learning, reliable and secure systems must consider measures of performance beyond standard test-set accuracy. These other goals include out-of-distribution (OOD) robustness, prediction consistency, resilience to adversaries, calibrated uncertainty estimates, and the ability to detect anomalous inputs. However, improving performance on these goals is often a balancing act that today's methods cannot manage without sacrificing performance on other safety axes. For instance, adversarial training improves adversarial robustness but sharply degrades other classifier performance metrics. Similarly, strong data augmentation and regularization techniques often improve OOD robustness but harm anomaly detection, raising the question of whether a Pareto improvement on all existing safety measures is possible. To meet this challenge, we design a new data augmentation strategy that leverages the natural structural complexity of pictures such as fractals; it outperforms numerous baselines, is near Pareto-optimal, and roundly improves safety measures.
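As a hedged sketch of the kind of mixing such a strategy can use, an image is repeatedly blended with structurally complex "mixer" pictures (e.g. fractal renderings). The round count, Beta weights, and the two blend modes below are illustrative choices, not the paper's exact recipe.

```python
import random
import torch

def structural_mix(img: torch.Tensor, mixers, k: int = 3, beta: float = 3.0) -> torch.Tensor:
    """Repeatedly blend a (C, H, W) image in [0, 1] with structurally complex
    pictures (e.g. fractals). `mixers` is a list of (C, H, W) tensors in [0, 1]."""
    out = img.clone()
    for _ in range(random.randint(0, k)):
        mixer = random.choice(mixers)
        w = torch.distributions.Beta(beta, beta).sample().item()
        if random.random() < 0.5:
            out = (1 - w) * out + w * mixer                       # additive blend
        else:
            out = out ** (1 - w) * mixer.clamp(min=1e-3) ** w     # multiplicative blend
        out = out.clamp(0, 1)
    return out
```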
This paper starts by revealing a surprising finding: without any learning, a randomly initialized CNN can localize objects surprisingly well. That is, a CNN has an inductive bias to naturally focus on objects, named Tobias ("The object is at sight") in this paper. This empirical inductive bias is further analyzed and successfully applied to self-supervised learning (SSL). A CNN is encouraged to learn representations that focus on the foreground object by transforming each image into various versions with different backgrounds, where the foreground-background separation is guided by Tobias. Experimental results show that the proposed Tobias significantly improves downstream tasks, especially object detection. This paper also shows that Tobias yields consistent improvements across training sets of different sizes and is more resilient to changes in image augmentations. Code is available at https://github.com/cupidjay/tobias.
Computational pathology can lead to saving human lives, but models are annotation hungry and pathology images are notoriously expensive to annotate. Self-supervised learning has shown to be an effective method for utilizing unlabeled data, and its application to pathology could greatly benefit its downstream tasks. Yet, there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data, to date. Our study is conducted using 4 representative SSL methods on diverse downstream tasks. We establish that large-scale domain-aligned pre-training in pathology consistently outperforms ImageNet pre-training in standard SSL settings such as linear and fine-tuning evaluations, as well as in low-label regimes. Moreover, we propose a set of domain-specific techniques that we experimentally show leads to a performance boost. Lastly, for the first time, we apply SSL to the challenging task of nuclei instance segmentation and show large and consistent performance improvements under diverse settings.
Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes, from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19-task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.
In this paper, we ask whether Vision Transformers (ViTs) can serve as an underlying architecture for improving the adversarial robustness of machine learning models against evasion attacks. While earlier works have focused on improving convolutional neural networks, we show that ViTs are also highly suitable for adversarial training and achieve competitive performance. We achieve this with a custom adversarial training recipe, discovered through rigorous ablation studies on a subset of the ImageNet dataset. The canonical training recipe for ViTs recommends strong data augmentation, in part to compensate for the attention modules' lack of a vision inductive bias compared to convolutions. We show that this recipe achieves suboptimal performance when used for adversarial training. In contrast, we find that omitting all heavy data augmentation and adding a few extra ingredients ($\varepsilon$-warmup and larger weight decay) significantly boosts the performance of robust ViTs. We show that our recipe generalizes to different classes of ViT architectures and to large-scale models on full ImageNet-1k. Furthermore, investigating the reasons for the robustness of our models, we show that it is easier to generate strong attacks during training when using our recipe, which improves robustness at test time. Finally, we further study one consequence of adversarial training by proposing a way to quantify the semantic nature of adversarial perturbations and highlight its correlation with the robustness of the model. Overall, we recommend that the community avoid directly translating the canonical ViT training recipe to robust training and rethink common training choices in the context of adversarial training.
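To illustrate the two named ingredients, here is a minimal adversarial-training sketch with a linear $\varepsilon$-warmup and an (illustrative) heavier weight decay; the PGD settings, schedule values, and function names are assumptions, not the recipe reported in the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, step, iters=2):
    """Short L_inf PGD inner maximization used for adversarial training."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).detach()

def train_epoch(model, loader, opt, epoch, eps_max=8 / 255, warmup_epochs=10):
    # eps-warmup: grow the perturbation budget linearly over the first epochs.
    eps = eps_max * min(1.0, (epoch + 1) / warmup_epochs)
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps, step=eps / 2)
        loss = F.cross_entropy(model(x_adv), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Heavier weight decay than the canonical ViT recipe (the value is illustrative):
# opt = torch.optim.AdamW(vit.parameters(), lr=1e-4, weight_decay=0.5)
```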
Contrastive learning (CL) methods learn data representations effectively without label supervision by having the encoder contrast each positive sample against multiple negative samples via a one-vs-many softmax cross-entropy loss. By exploiting large amounts of unlabeled image data, recent CL methods have achieved promising results when pre-trained on ImageNet, a curated dataset with balanced image classes. However, they tend to perform worse when pre-trained on images in the wild. In this paper, to further improve the performance of CL and enhance its robustness on uncurated datasets, we propose a doubly contrastive strategy that contrasts positive (negative) samples within themselves with respect to the query before deciding how strongly to pull (push) them. We realize this strategy with Contrastive Attraction and Contrastive Repulsion (CACR), which makes the query not only exert a greater force to attract more distant positive samples but also to repel closer negative samples. Theoretical analysis shows that CACR generalizes the behavior of CL by accounting for the difference between the distribution from which positive/negative samples are drawn, typically independently of the query, and their true conditional distributions given the query. We demonstrate that this distinct mechanism of positive attraction and negative repulsion, which helps remove the need for a uniform prior distribution over the data and its latent representations, is especially beneficial when the dataset is less curated. Large-scale experiments on many standard vision tasks show that CACR not only consistently outperforms existing CL methods on benchmark datasets for representation learning, but also exhibits better robustness when pre-trained on imbalanced image datasets.
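A speculative sketch of the attraction/repulsion weighting described above (the weighting scheme, tensor shapes, and temperature are our illustrative choices; the paper's actual formulation may differ):

```python
import torch

def attraction_repulsion_loss(q, pos, neg, tau: float = 0.2):
    """Distance-weighted attraction and repulsion, as a rough illustration.

    q: (N, D) queries; pos: (N, P, D) positive views; neg: (N, K, D) negatives,
    all assumed L2-normalized. Farther positives receive larger attraction
    weights; closer negatives receive larger repulsion weights.
    """
    sim_p = torch.einsum("nd,npd->np", q, pos)          # similarity to positives
    sim_n = torch.einsum("nd,nkd->nk", q, neg)          # similarity to negatives
    w_p = torch.softmax(-sim_p / tau, dim=1).detach()   # weight distant positives more
    w_n = torch.softmax(sim_n / tau, dim=1).detach()    # weight close negatives more
    attract = -(w_p * sim_p).sum(dim=1)
    repel = (w_n * sim_n).sum(dim=1)
    return (attract + repel).mean()
```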
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100× fewer labels.
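The loss at the core of this framework is the normalized temperature-scaled cross-entropy (NT-Xent); a compact PyTorch version (variable names and the default temperature are ours) might look like this:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """NT-Xent loss over a batch of paired views z1, z2 of shape (N, D)."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)              # (2N, D)
    sim = z @ z.t() / tau                                    # (2N, 2N) similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))                    # exclude self-pairs
    # View i pairs with view i + n (and vice versa); all other entries are negatives.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```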
Recent advanced unsupervised learning approaches use a Siamese-like framework to compare two "views" of the same image for learning representations. Making the two views distinctive is at the core of guaranteeing that unsupervised methods can learn meaningful information. However, such frameworks are sometimes fragile to overfitting if the augmentations used to generate the two views are not strong enough, causing an over-confidence issue on the training data. This drawback hinders the model from learning subtle variance and fine-grained information. To address this, in this work we aim to involve a notion of distance on the label space in unsupervised learning and let the model be aware of the soft degree of similarity between positive or negative pairs by mixing the input data space, so that the input and loss spaces work collaboratively. Despite its conceptual simplicity, we show empirically that with our solution, unsupervised image mixing (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and the corresponding new label space. Extensive experiments are conducted on CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet and standard ImageNet with the popular unsupervised methods SimCLR, BYOL, MoCo v1 and v2, SwAV, etc. Our proposed image mixture and label assignment strategy obtains consistent improvements of 1~3% while following exactly the same hyperparameters and training procedures as the base methods. Code is publicly available at https://github.com/szq0214/un-mix.
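A minimal sketch of how an input-space mixture can induce a soft degree in the loss space of a Siamese setting (the reversed-batch pairing and Beta sampling are illustrative assumptions; region-level mixing is another option):

```python
import torch

def unmix_views(x: torch.Tensor, alpha: float = 1.0):
    """Mix one augmented view with the reverse-ordered batch.

    Returns the mixed view and lam; the SSL loss is then combined softly, e.g.
    lam * loss(mixed, view2) + (1 - lam) * loss(mixed, view2.flip(0)),
    so pair "positiveness" becomes a soft degree rather than a hard 0/1.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * x + (1.0 - lam) * x.flip(0)
    return mixed, lam
```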
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks. Over the years, the research community expands mix-up methods into two directions, with extensive efforts to improve saliency-guided procedures but minimal focus on the arbitrary path, leaving the randomization domain unexplored. In this paper, inspired by the superior qualities of each direction over one another, we introduce a novel method that lies at the junction of the two routes. By combining the best elements of randomness and saliency utilization, our method balances speed, simplicity, and accuracy. We name our method R-Mix following the concept of "Random Mix-up". We demonstrate its effectiveness in generalization, weakly supervised object localization, calibration, and robustness to adversarial attacks. Finally, in order to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies based on the classifier's performance, reducing dependency on human-designed objectives and hyperparameter tuning. Extensive experiments further show that the agent is capable of performing at the cutting-edge level, laying the foundation for a fully automatic mix-up. Our code is released at [https://github.com/minhlong94/Random-Mixup].
Self-supervised learning (SSL) methods have achieved great success by maximizing the mutual information between two augmented views, where cropping is a popular augmentation technique. Cropped regions are widely used to construct positive pairs, while the regions left after cropping have rarely been explored in existing methods, although together they constitute the same image instance and both contribute to the description of the category. In this paper, we make the first attempt to demonstrate the importance of both kinds of regions from a holistic perspective and propose a simple yet effective pretext task called Region Contrastive Learning (RegionCL). Specifically, given two different images, we randomly crop a region of the same size from each image (called the paste view) and swap them to compose two new images together with the remaining regions (called the canvas views), respectively. Contrastive pairs can then be provided according to a simple criterion: each view is (1) positive with views augmented from the same original image and (2) negative with views augmented from other images. With minor modifications to popular SSL methods, RegionCL exploits these abundant pairs and helps the model distinguish the region features of both canvas and paste views, therefore learning better visual representations. Experiments on ImageNet, COCO and Cityscapes demonstrate that RegionCL improves MoCo v2, DenseCL and SimSiam by large margins and achieves state-of-the-art performance on classification, detection and segmentation tasks. The code will be available at https://github.com/annbless/regioncl.git.
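A small sketch of the region-swapping step between two images (the patch size and the independent location sampling are simplifying assumptions):

```python
import torch

def region_swap(img_a: torch.Tensor, img_b: torch.Tensor, size: int = 96):
    """Swap a random square region of the same size between two (C, H, W) images.

    Each returned image is a "canvas" view of its own source with a "paste"
    view cut from the other image.
    """
    def rand_corner(img):
        _, h, w = img.shape
        y = torch.randint(0, h - size + 1, (1,)).item()
        x = torch.randint(0, w - size + 1, (1,)).item()
        return y, x

    ya, xa = rand_corner(img_a)
    yb, xb = rand_corner(img_b)
    new_a, new_b = img_a.clone(), img_b.clone()
    new_a[:, ya:ya + size, xa:xa + size] = img_b[:, yb:yb + size, xb:xb + size]
    new_b[:, yb:yb + size, xb:xb + size] = img_a[:, ya:ya + size, xa:xa + size]
    return new_a, new_b
```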
A core component of the recent success of self-supervised learning is cropping data augmentation, which selects sub-regions of an image to be used as positive views in the self-supervised loss. The underlying assumption is that randomly cropped and resized regions of a given image share information about the object of interest, which the learned representation will capture. This assumption is mostly satisfied in datasets such as ImageNet, where there is a large, centered object that is highly likely to be present in random crops of the full image. However, in other datasets such as OpenImages or COCO, which are more representative of real-world uncurated data, there are typically multiple small objects in an image. In this work, we show that self-supervised learning based on the usual random cropping performs poorly on such datasets. We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm. This encourages the model to learn both object- and scene-level semantic representations. Using this approach, which we call object-aware cropping, we achieve significant improvements over scene cropping on classification and object detection benchmarks. For example, on OpenImages, our approach achieves an improvement of 8.8% mAP over random scene cropping with MoCo-v2 based pre-training. We also show significant improvements on COCO and PASCAL-VOC object detection and segmentation tasks over state-of-the-art self-supervised learning methods. Our approach is efficient, simple and general, and can be used in most existing contrastive and non-contrastive self-supervised learning frameworks.
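The change itself is small: one (or both) of the random crops is replaced by a crop around an object proposal. A hedged sketch, assuming proposal boxes are already available from any proposal method (the dilation factor and resize are our illustrative choices):

```python
import random
import torch
import torchvision.transforms.functional as TF

def object_aware_crop(img: torch.Tensor, proposals, out_size: int = 224, dilate: float = 0.1):
    """Crop a (C, H, W) image around a randomly chosen proposal box.

    `proposals` is a list of (x1, y1, x2, y2) boxes from any proposal method
    (e.g. selective search); boxes are slightly dilated to keep some context.
    """
    x1, y1, x2, y2 = random.choice(proposals)
    pw, ph = x2 - x1, y2 - y1
    x1, y1 = max(0, int(x1 - dilate * pw)), max(0, int(y1 - dilate * ph))
    x2, y2 = int(x2 + dilate * pw), int(y2 + dilate * ph)
    crop = img[:, y1:y2, x1:x2]
    return TF.resize(crop, [out_size, out_size])

# One view can come from an object crop while the other stays a standard random crop.
```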
Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, limiting their applicability in performance-critical settings. Building on prior theoretical insights from Mitrovic et al. (2021), we propose ReLICv2, which combines an explicit invariance loss with a contrastive objective over a varied set of appropriately constructed data views. ReLICv2 achieves 77.1% top-1 classification accuracy on ImageNet under linear evaluation with a ResNet-50 architecture and 80.6% with larger ResNet models, outperforming previous state-of-the-art self-supervised methods by a wide margin. Most notably, ReLICv2 is the first representation learning method to consistently outperform the supervised baseline in a like-for-like comparison across a range of standard ResNet architectures. Finally, we show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.
The success of language Transformers is primarily attributed to the pretext task of masked language modeling (MLM), where texts are first tokenized into semantically meaningful pieces. In this work, we study masked image modeling (MIM) and point out the advantages and challenges of using a semantically meaningful visual tokenizer. We present a self-supervised framework, iBOT, that can perform masked prediction with an online tokenizer. Specifically, we perform self-distillation on masked patch tokens and take the teacher network as the online tokenizer, along with self-distillation on the class token to acquire visual semantics. The online tokenizer is jointly learned with the MIM objective and dispenses with a multi-stage training pipeline in which the tokenizer needs to be pre-trained beforehand. We show the prominence of iBOT by achieving 81.6% linear probing accuracy and 86.3% fine-tuning accuracy on ImageNet-1K. Beyond the state-of-the-art image classification results, we highlight emerging local semantic patterns, which help the model obtain strong robustness against common corruptions and achieve leading results on dense downstream tasks such as object detection, instance segmentation, and semantic segmentation.
Building instance segmentation models that are data-efficient and can handle rare object categories is an important challenge in computer vision. Leveraging data augmentations is a promising direction towards addressing this challenge. Here, we perform a systematic study of the Copy-Paste augmentation (e.g., [13,12]) for instance segmentation where we randomly paste objects onto an image. Prior studies on Copy-Paste relied on modeling the surrounding visual context for pasting the objects. However, we find that the simple mechanism of pasting objects randomly is good enough and can provide solid gains on top of strong baselines. Furthermore, we show Copy-Paste is additive with semi-supervised methods that leverage extra data through pseudo labeling (e.g. self-training). On COCO instance segmentation, we achieve 49.1 mask AP and 57.3 box AP, an improvement of +0.6 mask AP and +1.5 box AP over the previous state-of-the-art. We further demonstrate that Copy-Paste can lead to significant improvements on the LVIS benchmark. Our baseline model outperforms the LVIS 2020 Challenge winning entry by +3.6 mask AP on rare categories.
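A minimal sketch of the pasting operation itself, assuming the source instances come with binary masks (the interface and the returned union mask are our simplifications; a full pipeline also updates the boxes and masks of the destination image):

```python
import torch

def copy_paste(dst_img: torch.Tensor, src_img: torch.Tensor, src_masks: torch.Tensor):
    """Paste object instances from `src_img` onto `dst_img`.

    dst_img, src_img: (C, H, W) images; src_masks: (K, H, W) instance masks
    from the source image. Returns the composited image and the union mask,
    which can be used to occlude the destination image's own annotations.
    """
    pasted = (src_masks > 0).any(dim=0, keepdim=True).float()   # (1, H, W)
    out = dst_img * (1.0 - pasted) + src_img * pasted
    return out, pasted
```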
Data augmentation has been widely used to improve the performance of deep networks. Numerous approaches, such as dropout, regularization and image augmentation, have been proposed to avoid overfitting and to enhance the generalization of neural networks. One sub-area of data augmentation is image mixing and deleting. This specific type of augmentation either mixes two images or deletes image regions to hide or obscure certain features of an image, forcing the model to emphasize the overall structure of the object in the image. Models trained with this approach have been shown to perform well compared to training without mixing or deleting, and an additional benefit of this training approach is robustness against image corruptions. Owing to its low computational cost and recent success, many image mixing and deleting techniques have been proposed. This paper provides a detailed review of these methods, dividing the augmentation strategies into three main categories: cut and delete, cut and mix, and mixup. The second part of the paper evaluates these methods on image classification, fine-grained image recognition and object detection, showing that this category of data augmentation improves the overall performance of deep neural networks.
What can a neural network learn about the visual world from a single image? While a single image obviously cannot contain the multitude of possible objects, scenes and lighting conditions that exist among all possible 256^(3x224x224) square images of size 224, it can still provide a strong prior over natural images. To analyze this hypothesis, we develop a framework for training neural networks by means of knowledge distillation from a supervised, pretrained teacher. With this, we find the answer to the above question to be: "surprisingly, a lot". In quantitative terms, we find 94%/74% top-1 accuracy on CIFAR-10/100 as well as strong results on ImageNet, and, by extending this approach to audio, 84% on Speech Commands. In extensive analyses, we disentangle the effect of augmentations, the choice of source image and the network architecture, and also discover "panda neurons" in networks that have never seen a panda. This work shows that one image can be used to extrapolate to thousands of object classes, and it motivates a renewed research agenda on the fundamental interplay between augmentations and images.
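The training signal is a plain distillation objective applied to heavily augmented crops drawn from the single source image; a minimal sketch of one step (the temperature and interface are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, crops: torch.Tensor, opt, tau: float = 8.0) -> float:
    """One step of matching the teacher's soft predictions on augmented crops.

    `crops` is a batch of augmented patches sampled from the single source image;
    `teacher` is a supervised, pretrained network kept frozen.
    """
    with torch.no_grad():
        t_logits = teacher(crops)
    s_logits = student(crops)
    loss = F.kl_div(
        F.log_softmax(s_logits / tau, dim=1),
        F.softmax(t_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```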