Deep Bregman divergences measure the divergence of data points using neural networks, going beyond Euclidean distance and capturing divergence between distributions. In this paper, we propose deep Bregman divergences for contrastive learning of visual representations, where we aim to enhance the contrastive loss used in self-supervised learning by training an additional network based on a functional Bregman divergence. In contrast to conventional contrastive learning methods, which are based solely on divergences between single points, our framework can capture the divergence between distributions, which improves the quality of the learned representations. We show that combining the conventional contrastive loss with our proposed divergence loss outperforms the baseline and most previous self-supervised and semi-supervised learning methods on multiple classification and object detection tasks and datasets. Moreover, the learned representations generalize well when transferred to other datasets and tasks. The source code and our models are available in the supplementary material and will be released with the paper.
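For readers unfamiliar with the notion, the standard pointwise Bregman divergence and its functional generalization (the background behind the approach above) can be written as follows; this is textbook background, not the paper's specific network-parameterized formulation.

```latex
% Pointwise Bregman divergence generated by a strictly convex phi
% (squared Euclidean distance is recovered with phi(x) = ||x||^2):
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle .

% Functional Bregman divergence between densities p and q, where
% \delta\phi[q] denotes the functional derivative of phi at q:
D_\phi[p, q] = \phi[p] - \phi[q] - \int \delta\phi[q](t)\,\bigl(p(t) - q(t)\bigr)\,dt .
```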
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100× fewer labels.
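As a concrete reference for the contrastive objective SimCLR optimizes, here is a minimal PyTorch-style sketch of the NT-Xent (normalized temperature-scaled cross-entropy) loss; variable names and the temperature value are illustrative rather than taken from the released code.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over projections z1, z2 of shape (N, D), where
    (z1[i], z2[i]) are two augmented views of the same image."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)     # (2N, D)
    sim = torch.mm(z, z.t()) / temperature                 # (2N, 2N) scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                      # exclude self-similarity
    # the positive of sample i is i+N (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```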
Contrastive learning is one of the most successful approaches to visual representation learning, and its performance can be further improved by jointly performing clustering on the learned representations. However, existing methods for joint clustering and contrastive learning perform poorly on long-tailed data distributions, because majority classes overwhelm the loss of minority classes and prevent meaningful representations from being learned. Motivated by this, we develop a novel joint clustering and contrastive learning framework by adapting a debiased contrastive loss to avoid the collapse of minority classes into majority clusters on imbalanced datasets. We show that our proposed modified contrastive loss and divergence clustering loss improve performance on multiple datasets and learning tasks. The source code is available at https://anonymon.4open.science/r/ssl-debiased-clustering
Learning from positive and unlabeled (PU) data is a setting in which the learner only has access to positive and unlabeled samples, with no information about negative examples. This PU setting is of great importance in various tasks such as medical diagnosis, social network analysis, financial market analysis, and knowledge base completion, which also tend to be intrinsically imbalanced, i.e., most examples are actually negative. However, most existing PU learning methods only consider artificially balanced datasets, and it is unclear how well they perform in the realistic scenario of imbalanced and long-tailed data distributions. This paper proposes to tackle this challenge via robust and efficient self-supervised pretraining. However, training conventional self-supervised learning methods on highly imbalanced PU distributions calls for a better reformulation. In this paper, we propose ImPULSeS, a unified representation learning framework for imbalanced positive-unlabeled learning leveraging self-supervised debiased pre-training. ImPULSeS uses a generic combination of large-scale unsupervised learning with a debiased contrastive loss and an additional reweighted PU loss. We conduct experiments across multiple datasets to show that ImPULSeS is able to halve the error rate of the previous state of the art, even compared with previous methods that are given the true prior. Moreover, our method exhibits robustness to prior misspecification and superior performance even when pretraining was performed on an unrelated dataset. We anticipate that such robustness and efficiency will make it easier for practitioners to obtain excellent results on other PU datasets of interest. The source code is available at https://github.com/jschweisthal/impulses
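For background on the PU side of the objective, the sketch below shows the standard non-negative PU (nnPU) risk estimator with a sigmoid loss, a common starting point for the kind of reweighted PU loss described above; it is not claimed to be the exact loss used by ImPULSeS, and the function and argument names are illustrative.

```python
import torch

def nn_pu_risk(scores_p, scores_u, prior):
    """Non-negative PU risk (Kiryo et al., 2017) with the sigmoid loss.
    scores_p: classifier outputs on positive samples; scores_u: outputs on
    unlabeled samples; prior: the (known or estimated) positive class prior."""
    loss_pos = torch.sigmoid(-scores_p).mean()        # positives labeled +1
    loss_pos_as_neg = torch.sigmoid(scores_p).mean()  # positives labeled -1
    loss_unl_as_neg = torch.sigmoid(scores_u).mean()  # unlabeled labeled -1
    neg_risk = loss_unl_as_neg - prior * loss_pos_as_neg
    return prior * loss_pos + torch.clamp(neg_risk, min=0.0)  # non-negative correction
```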
We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches 74.3% top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and 79.6% with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are given on GitHub.
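A minimal sketch of the two mechanisms described above, under the usual BYOL formulation: the normalized prediction loss between the online prediction and the (stop-gradient) target projection, and the slow exponential-moving-average update of the target network. Module and argument names are placeholders.

```python
import torch
import torch.nn.functional as F

def byol_loss(online_pred, target_proj):
    """2 - 2*cosine similarity between the online prediction and the
    (stop-gradient) target projection, averaged over the batch."""
    p = F.normalize(online_pred, dim=-1)
    z = F.normalize(target_proj.detach(), dim=-1)   # no gradient through the target branch
    return 2 - 2 * (p * z).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(target_net, online_net, tau=0.996):
    """Slow-moving average update of the target network parameters."""
    for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
        t_param.data.mul_(tau).add_(o_param.data, alpha=1 - tau)
```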
Contrastive learning has become a key component of self-supervised learning approaches for computer vision. By learning to embed two augmented versions of the same image close to each other and to push the embeddings of different images apart, one can train highly transferable visual representations. As revealed by recent studies, heavy data augmentation and large sets of negatives are both crucial in learning such representations. At the same time, data mixing strategies, either at the image or the feature level, improve both supervised and semi-supervised learning by synthesizing novel examples, forcing networks to learn more robust features. In this paper, we argue that an important aspect of contrastive learning, i.e. the effect of hard negatives, has so far been neglected. To get more meaningful negative samples, current top contrastive self-supervised learning approaches either substantially increase the batch sizes, or keep very large memory banks; increasing memory requirements, however, leads to diminishing returns in terms of performance. We therefore start by delving deeper into a top-performing framework and show evidence that harder negatives are needed to facilitate better and faster learning. Based on these observations, and motivated by the success of data mixing, we propose hard negative mixing strategies at the feature level, that can be computed on-the-fly with a minimal computational overhead. We exhaustively ablate our approach on linear classification, object detection, and instance segmentation and show that employing our hard negative mixing procedure improves the quality of visual representations learned by a state-of-the-art self-supervised learning method. Project page: https://europe.naverlabs.com/mochi
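A hedged sketch of the feature-level hard-negative mixing idea: for each query, take the negatives in the memory queue that are most similar to it and synthesize additional negatives as convex combinations of them. Sizes, names, and the sampling scheme are illustrative; the paper also explores a variant that mixes with the query itself.

```python
import torch
import torch.nn.functional as F

def mix_hard_negatives(query, queue, n_hard=64, n_synth=16):
    """Synthesize extra negatives by convexly mixing the hardest ones.
    query: (D,) L2-normalized embedding; queue: (K, D) L2-normalized negatives."""
    sims = queue @ query                          # (K,) similarity of each negative to the query
    hard = queue[sims.topk(n_hard).indices]       # (n_hard, D) hardest negatives
    i = torch.randint(0, n_hard, (n_synth,))
    j = torch.randint(0, n_hard, (n_synth,))
    lam = torch.rand(n_synth, 1)
    mixed = lam * hard[i] + (1 - lam) * hard[j]   # convex combinations of hard negatives
    return F.normalize(mixed, dim=1)              # project back onto the unit sphere
```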
Contrastive self-supervised representation learning methods maximize the similarity between positive pairs while tending to minimize the similarity between negative pairs. In general, however, the interaction among negative pairs is ignored, as there is no special mechanism that treats negative pairs according to their specific differences and similarities. In this paper, we propose extended momentum contrast (XMoCo), a self-supervised representation learning method built on the momentum-encoder unit proposed in the MoCo family of configurations. To this end, we introduce a cross-consistency regularization loss, with which we extend transformation consistency to dissimilar images (negative pairs). Under the cross-consistency regularization rule, we argue that the semantic representations associated with any pair of images (positive or negative) should preserve their cross-similarity under pretext transformations. Moreover, we further regularize the training loss by enforcing a uniform distribution of similarity over the negative pairs within a batch. The proposed regularization can easily be added to existing self-supervised learning algorithms. Empirically, we report competitive performance on the standard ImageNet-1K linear-head classification benchmark. In addition, by transferring the learned representations to common downstream tasks, we show that using XMoCo with commonly used augmentations can improve performance on such tasks. We hope the findings of this paper serve as a motivation for researchers to take into consideration the important interplay among negative examples in self-supervised learning.
This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that bridges contrastive learning with clustering. PCL not only learns low-level features for the task of instance discrimination, but more importantly, it encodes semantic structures discovered by clustering into the learned embedding space. Specifically, we introduce prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in an Expectation-Maximization framework. We iteratively perform E-step as finding the distribution of prototypes via clustering and M-step as optimizing the network via contrastive learning. We propose ProtoNCE loss, a generalized version of the InfoNCE loss for contrastive learning, which encourages representations to be closer to their assigned prototypes. PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks with substantial improvement in low-resource transfer learning. Code and pretrained models are available at https://github.com/salesforce/PCL.
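A minimal sketch of the prototype term that ProtoNCE adds on top of the standard instance-wise InfoNCE term: each embedding is contrasted against the cluster prototypes, with a per-cluster concentration used in place of a fixed temperature. How assignments and concentrations are produced (the E-step) follows the paper and is assumed given here.

```python
import torch
import torch.nn.functional as F

def proto_nce_term(embeddings, prototypes, assignments, concentration):
    """Prototype term of ProtoNCE (sketch).
    embeddings:    (N, D) L2-normalized features
    prototypes:    (M, D) L2-normalized cluster centroids from the E-step
    assignments:   (N,)  index of the prototype assigned to each sample
    concentration: (M,)  per-cluster temperature (tighter clusters -> smaller value)"""
    logits = embeddings @ prototypes.t()           # (N, M) similarities to all prototypes
    logits = logits / concentration.unsqueeze(0)   # scale each prototype column by its concentration
    return F.cross_entropy(logits, assignments)
```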
Recent advanced unsupervised learning approaches use Siamese-like frameworks that compare two "views" of the same image in order to learn representations. Making the two views distinctive is core to guaranteeing that unsupervised methods learn meaningful information. However, such frameworks can be fragile and prone to overfitting if the augmentations used to generate the two views are not strong enough, leading to over-confidence on the training data. This drawback hinders the model from learning subtle variance and fine-grained information. To address this problem, in this work we aim to introduce a notion of distance on the label space in unsupervised learning and let the model be aware of the soft degree of similarity between positive or negative pairs by mixing the input data space, so that the input and loss spaces work together. Despite its conceptual simplicity, we show empirically that with our solution, unsupervised image mixtures (Un-Mix), we can learn subtler, more robust, and generalized representations from the transformed input and the corresponding new label space. Extensive experiments are conducted on CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet, and standard ImageNet with the popular unsupervised methods SimCLR, BYOL, MoCo V1 and V2, SwAV, etc. Our proposed image mixture and label assignment strategy obtains consistent improvements of 1~3% while following exactly the same hyperparameters and training procedures as the base methods. Code is publicly available at https://github.com/szq0214/un-mix.
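A hedged sketch of the image-mixture recipe described above: one branch receives a mixup of each image with its reversed-batch partner, and the base method's contrastive loss is weighted by the mixing coefficient to reflect the soft degree of similarity. This follows the general description rather than the released code.

```python
import torch

def unmix_batch(images, alpha=1.0):
    """Mix each image with its reverse-order partner in the batch (mixup form).
    Returns the mixed batch and the mixing coefficient lambda, which is later
    used to weight the contrastive loss against both original partners."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * images + (1 - lam) * images.flip(0)   # partner = reversed batch order
    return mixed, lam

# Usage sketch (loss_fn is the base method's contrastive loss, e.g. SimCLR/BYOL):
#   mixed, lam = unmix_batch(view2)
#   loss = lam * loss_fn(view1, mixed) + (1 - lam) * loss_fn(view1.flip(0), mixed)
```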
Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or "views") of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a "swapped" prediction mechanism where we predict the code of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks.
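A minimal sketch of the "swapped" prediction loss: the cluster code of one view, produced by SwAV's online (Sinkhorn-based) assignment step and treated as a target, is predicted from the softmax over prototype scores of the other view. The assignment step itself is omitted; names and the temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def swapped_prediction_loss(scores1, scores2, q1, q2, temperature=0.1):
    """scores*: (N, K) dot products between features and the K prototypes;
    q*: (N, K) soft cluster assignments (codes) of the corresponding view,
    produced by SwAV's equipartition (Sinkhorn) step and treated as targets."""
    log_p1 = F.log_softmax(scores1 / temperature, dim=1)
    log_p2 = F.log_softmax(scores2 / temperature, dim=1)
    # predict the code of each view from the representation of the other view
    loss = -(q2 * log_p1).sum(dim=1).mean() - (q1 * log_p2).sum(dim=1).mean()
    return 0.5 * loss
```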
Self-supervised learning has recently shown great potential in vision tasks via contrastive learning, which aims to discriminate each image, or instance, in the dataset. However, such instance-level learning ignores the semantic relationship between instances and sometimes undesirably repels the anchor from semantically similar samples, termed "false negatives". In this work, we show that the unfavorable effect of false negatives is more significant for large-scale datasets with more semantic concepts. To address the issue, we propose a novel self-supervised contrastive learning framework that incrementally detects and explicitly removes false negative samples. Specifically, as training proceeds, our method dynamically detects an increasing number of high-quality false negatives, taking into account that the encoder gradually improves and the embedding space becomes more semantically structured. Next, we discuss two strategies to explicitly remove the detected false negatives during contrastive learning. Extensive experiments show that our framework outperforms other self-supervised contrastive learning methods on multiple benchmarks in a limited-resource setup.
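A hedged sketch of the "elimination" strategy mentioned above: candidates detected as false negatives of an anchor are simply masked out of the InfoNCE denominator. The detection step is paper-specific and is represented here by a precomputed boolean mask.

```python
import torch
import torch.nn.functional as F

def infonce_with_fn_elimination(sim, pos_idx, false_neg_mask, temperature=0.1):
    """sim: (N, M) similarities between anchors and candidates; pos_idx: (N,)
    column index of each anchor's positive; false_neg_mask: (N, M) True where a
    candidate was detected as a false negative of that anchor (never the
    positive itself), so it is dropped from the InfoNCE denominator."""
    logits = (sim / temperature).masked_fill(false_neg_mask, float('-inf'))
    return F.cross_entropy(logits, pos_idx)
```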
We present Consistent Assignment for Representation Learning (CARL), an unsupervised method for learning visual representations that combines ideas from self-supervised contrastive learning and deep clustering. By viewing contrastive learning from a clustering perspective, CARL learns unsupervised representations by learning a set of general prototypes that serve as energy anchors to enforce that different views of a given image are assigned to the same prototype. Unlike contemporary work on contrastive learning with deep clustering, CARL proposes to learn the set of general prototypes in an online fashion using gradient descent, without the need for non-differentiable algorithms or k-means to solve the cluster assignment problem. CARL surpasses its competitors on many representation learning benchmarks, including linear evaluation, semi-supervised learning, and transfer learning.
Contrastive learning has recently shown immense potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning, which typically uses rich intra-image transformations to construct positive pairs and then maximizes agreement with a contrastive loss. The merits of inter-image invariance, by contrast, remain much less explored. One major obstacle to exploiting inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs and further derive effective supervision from them, since no pairwise annotations are available. In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main constituting components: pseudo-label maintenance, sampling strategy, and decision boundary design. To facilitate the study, we introduce a unified and generic framework that supports the integration of unsupervised intra- and inter-image invariance learning. Through carefully designed comparisons and analysis, multiple valuable observations are revealed: 1) online labels converge faster than offline labels; 2) semi-hard negative samples are more reliable and unbiased than hard negative samples; 3) a less stringent decision boundary is more favorable for inter-image invariance learning. With all the obtained recipes, our final model, namely InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. We hope this work will provide useful experience for devising effective unsupervised inter-image invariance learning. Code: https://github.com/open-mmlab/mmselfsup.
Despite the large augmentation family, only a few cherry-picked robust augmentation policies are beneficial to self-supervised image representation learning. In this paper, we propose a directional self-supervised learning paradigm (DSSL), which is compatible with significantly more augmentations. Specifically, we adapt heavy augmentation policies after the views have been lightly augmented by standard augmentations, to generate harder views (HV). HV usually has a higher deviation from the original image than the lightly augmented standard view (SV). Unlike previous methods that symmetrically pair all augmented views to maximize their similarity, DSSL treats the augmented views of the same instance as a partially ordered set (with SV $\leftrightarrow$ SV, SV $\leftarrow$ HV) and then equips a directional objective function that respects the derived relations among views. DSSL can be easily implemented with a few lines of code and is highly flexible for popular self-supervised learning frameworks, including SimCLR, SimSiam, and BYOL. Extensive experimental results on CIFAR and ImageNet demonstrate that DSSL can stably improve various baselines and is compatible with a wider range of augmentations.
Many recent methods for self-supervised learning have demonstrated impressive performance on image classification and other tasks. A bewildering variety of techniques have been used, and it is not always clear what is responsible for their gains, especially when used in combination. Here we treat the embeddings of images as point particles and view model optimization as a dynamic process on this particle system. Our dynamic model combines an attractive force for similar images, a locally dispersive force to avoid local collapse, and a globally dispersive force to achieve a globally uniform distribution of particles. The dynamic perspective highlights the advantage of using a delayed-parameter image embedding (à la BYOL) together with multiple views of the same image. It also uses a purely dynamic local dispersive force (Brownian motion) that shows improved performance over other methods and does not require knowledge of other particle coordinates. The method is called MSBReg, which stands for (i) a Multiview centroid loss, which applies an attractive force to pull the embeddings of different image views toward their centroid, (ii) a Singular value loss, which pushes the particle system toward a spatially uniform density, and (iii) a Brownian diffusive loss. We evaluate the downstream classification performance of MSBReg on ImageNet as well as on transfer learning tasks including fine-grained classification, multi-class object classification, object detection, and instance segmentation. In addition, we show that applying our regularization terms to other methods further improves their performance and stabilizes training by preventing mode collapse.
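A hedged sketch of the multi-view centroid term (i) only: every view embedding of an image is attracted toward the centroid of that image's view embeddings. The normalization and absence of weighting here are assumptions made for illustration; the singular-value and Brownian-diffusion terms follow the paper and are not reproduced.

```python
import torch
import torch.nn.functional as F

def multiview_centroid_loss(view_embeddings):
    """view_embeddings: (V, N, D) embeddings of V augmented views of N images.
    Attract every view of an image toward the centroid of that image's views."""
    z = F.normalize(view_embeddings, dim=-1)
    centroid = z.mean(dim=0, keepdim=True)            # (1, N, D) per-image centroid
    return ((z - centroid) ** 2).sum(dim=-1).mean()   # mean squared distance to the centroid
```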
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models. Modern batch contrastive approaches subsume or significantly outperform traditional contrastive losses such as triplet, max-margin and the N-pairs loss. In this work, we extend the self-supervised batch contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of points belonging to the same class are pulled together in embedding space, while simultaneously pushing apart clusters of samples from different classes. We analyze two possible versions of the supervised contrastive (SupCon) loss, identifying the best-performing formulation of the loss. On ResNet-200, we achieve top-1 accuracy of 81.4% on the ImageNet dataset, which is 0.8% above the best number reported for this architecture. We show consistent outperformance over cross-entropy on other datasets and two ResNet variants. The loss shows benefits for robustness to natural corruptions, and is more stable to hyperparameter settings such as optimizers and data augmentations. Our loss function is simple to implement and reference TensorFlow code is released at https://t.ly/supcon.
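For reference, a minimal PyTorch-style sketch of the supervised contrastive (SupCon) loss in its "sum over positives outside the log" form; variable names and the temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss (sketch of the 'L_out' formulation).
    features: (N, D) L2-normalized embeddings of all augmented views in the batch;
    labels:   (N,)  class label of each view."""
    n = features.size(0)
    sim = features @ features.t() / temperature                    # (N, N) scaled similarities
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # log-probability of each candidate under a softmax over all non-self candidates
    logits = sim.masked_fill(self_mask, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # average positive log-probabilities per anchor, then over anchors
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)                  # avoid division by zero
    mean_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return -mean_log_prob_pos.mean()
```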
Contrastive learning (CL) methods effectively learn data representations without label supervision, where the encoder contrasts each positive sample against multiple negative samples via a one-vs-many softmax cross-entropy loss. By leveraging large amounts of unlabeled image data, recent CL methods achieve promising results when pretrained on ImageNet, a curated dataset with balanced image classes. However, they tend to yield worse performance when pretrained on images in the wild. In this paper, to further improve the performance of CL and enhance its robustness on uncurated datasets, we propose a doubly contrastive strategy that contrasts positive (negative) samples within the query's own positive (negative) set before deciding how strongly to pull (push) them. We realize this strategy with contrastive attraction and contrastive repulsion (CACR), which makes the query not only exert a greater force to attract more distant positive samples but also repel closer negative samples. Theoretical analysis shows that CACR generalizes CL's behavior by taking into account the difference between the distribution of positive/negative samples, which are typically sampled independently of the query, and their true conditional distributions given the query. We demonstrate that this unique positive-attraction and negative-repulsion mechanism helps remove the need for a uniform prior distribution over the data and its latent representations, which is especially beneficial when the dataset is less curated. Large-scale experiments on many standard vision tasks show that CACR not only consistently outperforms existing CL methods on benchmark datasets for representation learning, but also shows better robustness when pretrained on imbalanced image datasets.
Recent advances in contrastive learning have demonstrated remarkable performance. However, the vast majority of approaches are limited to the closed-world setting. In this paper, we enrich the landscape of representation learning by tapping into an open-world setting, where unlabeled samples from novel classes can naturally emerge in the wild. To bridge the gap, we introduce a new learning framework, open-world contrastive learning (OpenCon). OpenCon tackles the challenges of learning compact representations for both known and novel classes and facilitates novelty discovery along the way. We demonstrate the effectiveness of OpenCon on challenging benchmark datasets and establish competitive performance. On the ImageNet dataset, OpenCon outperforms the current best method by 11.9% and 7.4% in novel and overall classification accuracy, respectively. We hope our work opens up new doors for future work to tackle this important problem.
We present TWIST, a simple and theoretically explainable self-supervised representation learning method that classifies large-scale unlabeled datasets in an end-to-end way. We employ a Siamese network terminated by a softmax operation to produce twin class distributions of two augmented images. Without supervision, we enforce the class distributions of different augmentations to be consistent. However, simply minimizing the divergence between augmentations will lead to collapsed solutions, i.e., outputting the same class probability distribution for all images, in which case no information about the input image is preserved. To solve this problem, we propose to maximize the mutual information between the input and the class predictions. Specifically, we minimize the entropy of the distribution for each sample to make the class prediction for each sample confident, and maximize the entropy of the mean distribution to make the predictions of different samples diverse. In this way, TWIST naturally avoids collapsed solutions without specific designs such as asymmetric networks, stop-gradient operations, or momentum encoders. As a result, TWIST outperforms state-of-the-art methods on a wide range of tasks. In particular, TWIST performs surprisingly well on semi-supervised learning, achieving 61.2% top-1 accuracy with 1% of ImageNet labels using a ResNet-50 backbone, surpassing the previous best result by 6.2%. Code and pre-trained models are given at: https://github.com/byteDance/twist
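A minimal sketch of the three ingredients the abstract describes: cross-augmentation consistency of the twin class distributions, low per-sample entropy (confident predictions), and high entropy of the batch-mean distribution (diverse predictions). The relative weights here are illustrative, not the paper's.

```python
import torch
import torch.nn.functional as F

def twist_loss(logits1, logits2, eps=1e-8):
    """logits1, logits2: (N, K) class logits of two augmentations of the same N images."""
    p1 = F.softmax(logits1, dim=1)
    p2 = F.softmax(logits2, dim=1)

    def entropy(p):
        return -(p * (p + eps).log()).sum(dim=-1)

    # (1) symmetric KL consistency between the twin class distributions
    consistency = 0.5 * (F.kl_div((p2 + eps).log(), p1, reduction='batchmean')
                         + F.kl_div((p1 + eps).log(), p2, reduction='batchmean'))
    # (2) minimize per-sample entropy -> confident predictions
    sharpness = 0.5 * (entropy(p1).mean() + entropy(p2).mean())
    # (3) maximize entropy of the mean distribution -> diverse predictions
    diversity = 0.5 * (entropy(p1.mean(dim=0)) + entropy(p2.mean(dim=0)))
    return consistency + sharpness - diversity
```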
The goal of self-supervised learning from images is to construct image representations that are semantically meaningful via pretext tasks that do not require semantic annotations. Many pretext tasks lead to representations that are covariant with image transformations. We argue that, instead, semantic representations ought to be invariant under such transformations. Specifically, we develop Pretext-Invariant Representation Learning (PIRL, pronounced as "pearl") that learns invariant representations based on pretext tasks. We use PIRL with a commonly used pretext task that involves solving jigsaw puzzles. We find that PIRL substantially improves the semantic quality of the learned image representations. Our approach sets a new state-of-the-art in self-supervised learning from images on several popular benchmarks for self-supervised learning. Despite being unsupervised, PIRL outperforms supervised pre-training in learning image representations for object detection. Altogether, our results demonstrate the potential of self-supervised representations with good invariance properties.