This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that bridges contrastive learning with clustering. PCL not only learns low-level features for the task of instance discrimination, but more importantly, it encodes semantic structures discovered by clustering into the learned embedding space. Specifically, we introduce prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in an Expectation-Maximization framework. We iteratively perform E-step as finding the distribution of prototypes via clustering and M-step as optimizing the network via contrastive learning. We propose ProtoNCE loss, a generalized version of the InfoNCE loss for contrastive learning, which encourages representations to be closer to their assigned prototypes. PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks with substantial improvement in low-resource transfer learning. Code and pretrained models are available at https://github.com/salesforce/PCL.
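As a minimal sketch of the ProtoNCE idea for a single query, the code below makes several assumptions. It uses one clustering only (the abstract's ProtoNCE generalizes over several clusterings). Embeddings are L2-normalized plain Python lists. The per-prototype concentrations `phi` are supplied by the caller. The function names are illustrative and are not taken from the PCL code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neg_log_softmax(logits, index):
    """-log softmax(logits)[index], computed in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[index]


def info_nce(query, keys, pos_index, temperature=0.1):
    # Instance discrimination: the positive key competes with every other key.
    return neg_log_softmax([dot(query, k) / temperature for k in keys], pos_index)


def proto_nce(query, keys, pos_index, prototypes, assigned, phi, temperature=0.1):
    # Prototype term (single clustering assumed): pull the query toward its
    # assigned prototype; prototype j uses its own concentration phi[j]
    # in place of the global temperature.
    proto_logits = [dot(query, c) / p for c, p in zip(prototypes, phi)]
    return info_nce(query, keys, pos_index, temperature) + neg_log_softmax(proto_logits, assigned)
```

Because the prototype term is a non-negative cross-entropy, ProtoNCE is never smaller than the instance-only InfoNCE. A query assigned to a nearby prototype receives a lower loss than one assigned to a distant prototype.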
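Self-supervised learning has recently shown great potential in vision tasks through contrastive learning, which aims to discriminate each image, or instance, in the dataset. However, such instance-level learning ignores the semantic relationships among instances and sometimes undesirably repels the anchor from semantically similar samples, termed "false negatives". In this work, we show that the adverse effect of false negatives is more significant for large-scale datasets with more semantic concepts. To address this issue, we propose a novel self-supervised contrastive learning framework that incrementally detects and explicitly removes false negative samples. Specifically, as training proceeds, the encoder gradually improves and the embedding space becomes more semantically structured, so our method dynamically detects an increasing number of high-quality false negatives. Next, we discuss two strategies to explicitly remove the detected false negatives during contrastive learning. Extensive experiments show that our framework outperforms other self-supervised contrastive learning methods on multiple benchmarks in a limited-resource setting.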
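A minimal sketch of the elimination idea follows. It assumes that detected false negatives are the negatives whose cluster pseudo-label matches the anchor's, and that these are simply dropped from the InfoNCE denominator. The pseudo-labels are hypothetical caller-supplied inputs, and the incremental detection schedule described in the abstract is not modeled.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_eliminate(anchor, positive, negatives, anchor_label, negative_labels, temperature=0.1):
    """InfoNCE in which negatives sharing the anchor's pseudo-label are treated
    as detected false negatives and removed from the denominator.

    Returns (loss, number_of_negatives_removed).
    """
    kept = [n for n, lab in zip(negatives, negative_labels) if lab != anchor_label]
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in kept]
    # Stable log-sum-exp over the positive and the remaining negatives.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0], len(negatives) - len(kept)
```

When a negative is identical to the anchor but shares its pseudo-label, removing it stops the loss from pushing the anchor away from a semantically similar sample.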
Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or "views") of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a "swapped" prediction mechanism where we predict the code of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks.
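The swapped prediction can be sketched in plain Python as follows. Soft codes come from a few Sinkhorn-Knopp normalization steps that spread the batch over the prototypes, and each view's prototype scores predict the other view's code. The scores are assumed to be cosine similarities in [-1, 1]. `eps=0.05`, `n_iters=3` and `temperature=0.1` are illustrative values, and codes are treated as fixed targets because there is no autograd here.

```python
import math


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def sinkhorn(scores, eps=0.05, n_iters=3):
    """Map a B x K matrix of sample-prototype similarities to soft codes that
    spread the batch roughly evenly over the K prototypes."""
    b, k = len(scores), len(scores[0])
    q = [[math.exp(s / eps) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: every prototype receives total mass 1/K.
        col = [sum(q[i][j] for i in range(b)) for j in range(k)]
        q = [[q[i][j] / (col[j] * k) for j in range(k)] for i in range(b)]
        # Row step: every sample carries total mass 1/B.
        row = [sum(r) for r in q]
        q = [[v / (row[i] * b) for v in q[i]] for i in range(b)]
    # Rescale rows so each code is a probability distribution over prototypes.
    return [[v * b for v in r] for r in q]


def swapped_prediction_loss(scores1, scores2, temperature=0.1):
    """Predict each view's code from the other view's prototype scores."""
    q1, q2 = sinkhorn(scores1), sinkhorn(scores2)
    loss = 0.0
    for i in range(len(scores1)):
        p1 = softmax([s / temperature for s in scores1[i]])
        p2 = softmax([s / temperature for s in scores2[i]])
        loss -= sum(c * math.log(p) for c, p in zip(q2[i], p1))
        loss -= sum(c * math.log(p) for c, p in zip(q1[i], p2))
    return loss / len(scores1)
```

No pairwise feature comparison appears anywhere: each view is compared only with the K prototypes, which is the point the abstract makes about avoiding explicit pairwise comparisons.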
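We introduce Consistent Assignment for Representation Learning (CARL), an unsupervised method for learning visual representations that combines ideas from self-supervised contrastive learning and deep clustering. By viewing contrastive learning from a clustering perspective, CARL learns unsupervised representations by learning a set of general prototypes that serve as energy anchors, enforcing that different views of a given image are assigned to the same prototype. Unlike contemporary work combining contrastive learning with deep clustering, CARL learns the set of general prototypes online using gradient descent, without needing non-differentiable algorithms or K-Means to solve the cluster assignment problem. CARL outperforms its competitors on many representation learning benchmarks, including linear evaluation, semi-supervised learning, and transfer learning.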
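Contrastive learning has recently shown great potential in unsupervised visual representation learning. Existing studies in this line mainly focus on intra-image invariance learning: they typically use rich intra-image transformations to construct positive pairs and then maximize agreement with a contrastive loss. The merits of inter-image invariance, by contrast, remain much less explored. A major obstacle to exploiting inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs, and further how to derive effective supervision from them, since no pair annotations are available. In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main components: pseudo-label maintenance, sampling strategy, and decision boundary design. To facilitate the study, we introduce a unified and generic framework that supports the integration of unsupervised intra- and inter-image invariance learning. Through carefully designed comparisons and analysis, we reveal several valuable observations: 1) online labels converge faster than offline labels; 2) semi-hard negative samples are more reliable and unbiased than hard negative samples; 3) a less stringent decision boundary is more favorable for inter-image invariance learning. With all the obtained recipes, our final model, InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. We hope this work provides useful experience for designing effective unsupervised inter-image invariance learning. Code: https://github.com/open-mmlab/mmselfsup.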
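Existing deep clustering methods rely on contrastive learning for representation learning, which requires negative examples to form an embedding space in which all instances are well separated. However, negative examples inevitably cause the class collision issue, which compromises representation learning for clustering. In this paper, we explore non-contrastive representation learning for deep clustering, termed NCC, which is based on BYOL, a representative method that uses no negative examples. First, we propose to align one augmented view of an instance with the neighbors of another view in the embedding space; this positive sampling strategy avoids the class collision issue caused by negative examples and thus improves within-cluster compactness. Second, we propose to encourage alignment between two augmented views of one prototype and uniformity among all prototypes, named the prototypical contrastive loss (ProtoCL), which maximizes the inter-cluster distance. Moreover, we formulate NCC in an Expectation-Maximization (EM) framework, in which the E-step uses spherical k-means to estimate the pseudo-labels of instances and the distribution of prototypes from a target network, and the M-step uses the proposed losses to optimize an online network. As a result, NCC forms an embedding space in which all clusters are well separated and within-cluster examples are compact. Experimental results on several clustering benchmark datasets, including ImageNet-1K, demonstrate that NCC outperforms state-of-the-art methods by a significant margin.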
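One plausible reading of ProtoCL can be sketched in a few lines. The code computes prototypes as the normalized mean embedding of each pseudo-label under two views. Prototype k of one view is then treated as the positive for prototype k of the other view, and all remaining prototypes serve as negatives. That InfoNCE-over-prototypes form, the helper names and `temperature=0.5` are assumptions for illustration, not the NCC implementation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cluster_prototypes(embeddings, labels, n_clusters):
    """Hypothetical helper: L2-normalized mean embedding per pseudo-label."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(n_clusters)]
    for e, lab in zip(embeddings, labels):
        sums[lab] = [s + x for s, x in zip(sums[lab], e)]
    return [normalize(s) for s in sums]


def proto_contrastive_loss(protos_a, protos_b, temperature=0.5):
    """Align prototype k across views (alignment); other prototypes act as
    negatives, which pushes clusters apart (uniformity)."""
    k = len(protos_a)
    total = 0.0
    for i in range(k):
        logits = [dot(protos_a[i], protos_b[j]) / temperature for j in range(k)]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / k
```

In NCC the two sets of prototypes would come from the online and target networks. In this sketch they are simply two lists passed in by the caller.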
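While self-supervised representation learning (SSL) has proven effective for large models, a large gap remains between SSL and supervised methods for lightweight models when the same solution is followed. We investigate this problem and find that lightweight models tend to collapse in the semantic space when they simply perform instance-wise contrast. To address this issue, we propose a relation-wise contrastive paradigm with Relation Knowledge Distillation (ReKD). We introduce a heterogeneous teacher that explicitly mines semantic information and transfers novel relation knowledge to the student (the lightweight model). Our theoretical analysis supports our main concern about instance-wise contrast and verifies the effectiveness of relation-wise contrastive learning. Extensive experimental results also show that our method achieves significant improvements on multiple lightweight models. In particular, linear evaluation on AlexNet improves the current state of the art from 44.7% to 50.1%, making this the first work to approach the supervised result of 50.5%. Code will be made available.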
Self-supervised learning (SSL) is rapidly closing the gap with supervised methods. Barlow Twins is competitive with state-of-the-art methods for self-supervised learning while being conceptually simpler, naturally avoiding trivial constant (i.e., collapsed) embeddings, and being robust to the training batch size.
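The sketch below shows the Barlow Twins objective as we understand it, with its two terms. Each embedding dimension is standardized along the batch, and the cross-correlation matrix between the two views is then pushed toward the identity. The diagonal entries form the invariance term, and the off-diagonal entries, weighted by `lambd`, form the redundancy-reduction term. The value `lambd=5e-3` and the small `eps` are assumptions for this illustration.

```python
import math


def standardize(column, eps=1e-8):
    """Zero-mean, unit-variance normalization of one dimension along the batch."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    return [(x - mean) / (math.sqrt(var) + eps) for x in column]


def barlow_twins_loss(z1, z2, lambd=5e-3):
    """z1, z2: N x D embeddings of two augmented views of the same N images."""
    n, d = len(z1), len(z1[0])
    cols1 = [standardize([z1[i][j] for i in range(n)]) for j in range(d)]
    cols2 = [standardize([z2[i][j] for i in range(n)]) for j in range(d)]
    loss = 0.0
    for a in range(d):
        for b in range(d):
            # Cross-correlation between dimension a of view 1 and b of view 2.
            c_ab = sum(x * y for x, y in zip(cols1[a], cols2[b])) / n
            if a == b:
                loss += (1.0 - c_ab) ** 2   # invariance: diagonal -> 1
            else:
                loss += lambd * c_ab ** 2   # redundancy reduction: off-diagonal -> 0
    return loss
```

A collapsed (constant) embedding standardizes to all zeros, so its diagonal correlations are 0 and the invariance term stays large. This is why the objective avoids trivial solutions without negatives, a predictor or a momentum encoder.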
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over the previous state of the art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100× fewer labels.
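A minimal sketch of the contrastive loss used in this setting (NT-Xent, normalized temperature-scaled cross-entropy) follows. For a batch of N images there are 2N augmented views. Each view's positive is the other view of the same image, and the remaining 2N - 2 views are negatives. The `temperature=0.5` default is chosen for illustration, and the projection head and augmentations are omitted.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def nt_xent(z1, z2, temperature=0.5):
    """z1[i] and z2[i] are projections of two augmented views of image i.
    Every other view in the 2N batch serves as a negative."""
    z = z1 + z2
    n2 = len(z)
    n = n2 // 2
    total = 0.0
    for i in range(n2):
        j = (i + n) % n2  # index of the positive partner view
        # Denominator covers all views except the anchor itself.
        logits = [cosine(z[i], z[k]) / temperature for k in range(n2) if k != i]
        pos = cosine(z[i], z[j]) / temperature
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - pos
    return total / n2
```

Larger batches add more negatives to each denominator. This is one way to see why, per the abstract, contrastive learning benefits from larger batch sizes.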
Graph Contrastive Learning (GCL) has recently drawn much research interest for learning generalizable node representations in a self-supervised manner. In general, the contrastive learning process in GCL is performed on top of the representations learned by a graph neural network (GNN) backbone, which transforms and propagates the node contextual information based on its local neighborhoods. However, nodes sharing similar characteristics may not always be geographically close, which poses a great challenge for unsupervised GCL efforts due to their inherent limitations in capturing such global graph knowledge. In this work, we address their inherent limitations by proposing a simple yet effective framework: Simple Neural Networks with Structural and Semantic Contrastive Learning (S^3-CL). Notably, by virtue of the proposed structural and semantic contrastive learning algorithms, even a simple neural network can learn expressive node representations that preserve valuable global structural and semantic patterns. Our experiments demonstrate that the node representations learned by S^3-CL achieve superior performance on different downstream tasks compared with the state-of-the-art unsupervised GCL methods. Implementation and more experimental details are publicly available at https://github.com/kaize0409/S-3-CL.
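We present TWIST, a simple and theoretically explainable self-supervised representation learning method that classifies large-scale unlabeled datasets in an end-to-end way. We use a siamese network terminated by a softmax operation to produce twin class distributions for two augmented images. Without supervision, we enforce the class distributions of different augmentations to be consistent. However, simply minimizing the divergence between augmentations leads to collapsed solutions, i.e., outputting the same class probability distribution for all images, in which case no information about the input image is retained. To solve this problem, we propose to maximize the mutual information between the input and the class predictions. Specifically, we minimize the entropy of the distribution for each sample to make each sample's class prediction confident, and maximize the entropy of the mean distribution to make the predictions of different samples diverse. In this way, TWIST naturally avoids collapsed solutions without specific designs such as asymmetric networks, stop-gradient operations, or momentum encoders. As a result, TWIST outperforms state-of-the-art methods on a wide range of tasks. In particular, TWIST performs surprisingly well on semi-supervised learning, achieving 61.2% top-1 accuracy with 1% of ImageNet labels using ResNet-50 as the backbone, surpassing the previous best result by 6.2%. Code and pre-trained models are available at: https://github.com/byteDance/twist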
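The objective described above can be sketched as three terms applied to the twin softmax outputs. The code assumes that consistency is measured with KL divergence and that the loss is symmetrized over the two views. The function name is hypothetical. Inputs are N x K softmax outputs, so every entry is strictly positive wherever the KL is evaluated.

```python
import math


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def kl(p, q):
    # Assumes q > 0 wherever p > 0, which holds for softmax outputs.
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def twist_style_loss(p1, p2):
    """p1, p2: N x K class distributions for two augmentations of N images."""
    n, k = len(p1), len(p1[0])

    def one_direction(a, b):
        consistency = sum(kl(x, y) for x, y in zip(a, b)) / n   # views agree
        sharpness = sum(entropy(x) for x in a) / n              # confident per sample
        mean = [sum(row[c] for row in a) / n for c in range(k)]
        diversity = entropy(mean)                               # diverse across samples
        return consistency + sharpness - diversity

    return 0.5 * (one_direction(p1, p2) + one_direction(p2, p1))
```

Both collapsed solutions score worse than a confident, diverse assignment. A uniform output is penalized by the sharpness term, and an identical one-hot output for every image earns no diversity reward.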
We present DetCo, a simple yet effective self-supervised approach for object detection. Unsupervised pre-training methods have recently been designed for object detection, but they are usually deficient in image classification, or the opposite. Unlike them, DetCo transfers well to downstream instance-level dense prediction tasks while maintaining competitive image-level classification accuracy. The advantages derive from (1) multi-level supervision of intermediate representations and (2) contrastive learning between the global image and local patches. These two designs facilitate discriminative and consistent global and local representations at each level of the feature pyramid, improving detection and classification simultaneously. Extensive experiments on VOC, COCO, Cityscapes, and ImageNet demonstrate that DetCo not only outperforms recent methods on a series of 2D and 3D instance-level detection tasks, but is also competitive on image classification. For example, on ImageNet classification, DetCo achieves 6.9% and 5.0% higher top-1 accuracy than InsLoc and DenseCL, respectively, two contemporary works designed for object detection. Moreover, on COCO detection, DetCo is 6.9 AP better than SwAV with Mask R-CNN C4. Notably, DetCo substantially boosts Sparse R-CNN, a recent strong detector, from 45.0 AP to 46.5 AP (+1.5 AP), establishing a new state of the art on COCO. Code is available.
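Contrastive self-supervised learning (CSL) is a practical solution for learning meaningful visual representations from massive data in an unsupervised manner. Ordinary CSL embeds the features extracted by neural networks onto specific topological structures. During training, the contrastive loss draws different views of the same input together while pushing the embeddings of different inputs apart. One drawback of CSL is that the loss term requires a large number of negative samples to provide a better mutual information bound. However, increasing the number of negative samples through a larger batch size also amplifies the effect of false negatives: semantically similar samples are pushed away from the anchor, degrading downstream performance. In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework. The key insight is to use a siamese-style metric loss to match intra-prototype features while increasing the distance between inter-prototype features. We conduct extensive experiments on various benchmarks, and the results demonstrate the effectiveness of our method in improving the quality of visual representations. Specifically, our unsupervised pre-trained ResNet-50 with a linear probe outperforms the fully supervised trained version on the ImageNet-1K dataset.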
Contrastive learning has become a key component of self-supervised learning approaches for computer vision. By learning to embed two augmented versions of the same image close to each other and to push the embeddings of different images apart, one can train highly transferable visual representations. As revealed by recent studies, heavy data augmentation and large sets of negatives are both crucial in learning such representations. At the same time, data mixing strategies, either at the image or the feature level, improve both supervised and semi-supervised learning by synthesizing novel examples, forcing networks to learn more robust features. In this paper, we argue that an important aspect of contrastive learning, i.e., the effect of hard negatives, has so far been neglected. To get more meaningful negative samples, current top contrastive self-supervised learning approaches either substantially increase the batch sizes, or keep very large memory banks; increasing memory requirements, however, leads to diminishing returns in terms of performance. We therefore start by delving deeper into a top-performing framework and show evidence that harder negatives are needed to facilitate better and faster learning. Based on these observations, and motivated by the success of data mixing, we propose hard negative mixing strategies at the feature level, that can be computed on-the-fly with a minimal computational overhead. We exhaustively ablate our approach on linear classification, object detection, and instance segmentation and show that employing our hard negative mixing procedure improves the quality of visual representations learned by a state-of-the-art self-supervised learning method. Project page: https://europe.naverlabs.com/mochi. 34th Conference on Neural Information Processing Systems (NeurIPS 2020).
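Feature-level hard negative mixing can be sketched as follows, under stated assumptions. The negatives are ranked by similarity to the query and the `n_hard` hardest are kept. `s` synthetic negatives are made as random convex combinations of two hard negatives, and `s_prime` more are made by mixing the query into a hard negative with a coefficient below 0.5. Everything is L2-normalized. The parameter names and defaults are illustrative, and the synthetic vectors would simply be appended to the negatives of the contrastive loss.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def mix_hard_negatives(query, negatives, n_hard=4, s=2, s_prime=2, rng=None):
    """Return s + s_prime synthetic hard negatives (unit vectors)."""
    rng = rng if rng is not None else random.Random(0)
    # Hardest negatives = most similar to the query.
    hard = sorted(negatives, key=lambda neg: dot(query, neg), reverse=True)[:n_hard]
    synthetic = []
    for _ in range(s):
        a, b = rng.sample(hard, 2)  # two distinct hard negatives
        alpha = rng.random()
        synthetic.append(normalize([alpha * x + (1 - alpha) * y for x, y in zip(a, b)]))
    for _ in range(s_prime):
        neg = rng.choice(hard)
        beta = 0.5 * rng.random()  # query contributes less than half
        synthetic.append(normalize([beta * q + (1 - beta) * x for q, x in zip(query, neg)]))
    return synthetic
```

Because the mixing happens on embeddings rather than images, no extra forward passes are needed. This matches the abstract's claim of minimal computational overhead.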
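Contrastive self-supervised learning has largely narrowed the gap to supervised pre-training on ImageNet. However, its success relies heavily on the object-centric priors of ImageNet, i.e., different augmented views of the same image correspond to the same object. Such a heavily curated constraint becomes immediately infeasible when pre-training on more complex scene images containing many objects. To overcome this limitation, we introduce Object-level Representation Learning (ORL), a new self-supervised learning framework for scene images. Our key insight is to leverage image-level self-supervised pre-training as a prior for discovering object-level semantic correspondence, thus enabling object-level representation learning from scene images. Extensive experiments on COCO show that ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks. Furthermore, ORL improves downstream performance when more unlabeled scene images are available, demonstrating its great potential for harnessing unlabeled data in the wild. We hope our approach can motivate future research on more general-purpose unsupervised representation learning from scene data.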
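Although self-supervised learning has been shown to benefit many vision tasks, existing techniques mainly focus on image-level manipulation, which may not generalize well to downstream tasks at the patch or pixel level. Moreover, existing SSL methods may not sufficiently describe and associate such representations within and across image scales. In this paper, we propose a Self-Supervised Pyramid Representation Learning (SS-PRL) framework. SS-PRL is designed to derive pyramid representations at the patch level by learning proper prototypes, and to observe and relate the inherent semantic information within an image. In particular, we present cross-scale patch-level correlation learning in SS-PRL, which allows the model to aggregate and associate information across patch scales. We show that, with SS-PRL for model pre-training, one can easily adapt and fine-tune models for a variety of applications, including multi-label classification, object detection, and instance segmentation.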
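Recent advanced unsupervised learning approaches use a siamese-like framework that compares two "views" of the same image to learn representations. Making the two views distinctive is key to ensuring that unsupervised methods learn meaningful information. However, such frameworks are sometimes prone to overfitting if the augmentations used to generate the two views are not strong enough, causing over-confidence on the training data. This drawback prevents the model from learning subtle variance and fine-grained information. To address this, we aim to introduce the concept of distance in the label space into unsupervised learning, and to make the model aware of the soft degree of similarity between positive or negative pairs by mixing the input data space, so that the input and loss spaces work collaboratively. Despite its conceptual simplicity, we show empirically that with our solution, Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust, and more generalized representations from the transformed input and the corresponding new label space. Extensive experiments are conducted on CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet, and standard ImageNet with popular unsupervised methods including SimCLR, BYOL, MoCo v1 and v2, and SwAV. Our proposed image mixture and label assignment strategy obtains consistent improvements of 1-3% while following exactly the same hyperparameters and training procedures as the base methods. Code is publicly available at https://github.com/szq0214/un-mix.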
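Here is a minimal sketch of the image-mixing and soft-label idea, under assumptions of our own. Each image in the batch is mixed with its counterpart in reversed batch order, using a coefficient drawn from a Beta distribution. The base method's loss is then weighted by that coefficient against the two source images' partners. Images are flattened lists, `alpha=1.0` is illustrative, and both function names are hypothetical.

```python
import random


def unmix_batch(images, rng, alpha=1.0):
    """Mix image i with image n-1-i; returns (mixed_images, lam)."""
    lam = rng.betavariate(alpha, alpha)
    n = len(images)
    mixed = [
        [lam * a + (1 - lam) * b for a, b in zip(images[i], images[n - 1 - i])]
        for i in range(n)
    ]
    return mixed, lam


def unmix_loss(loss_to_own_partner, loss_to_reversed_partner, lam):
    # Soft label: the mixed view is "lam" similar to its own source image
    # and "1 - lam" similar to the image it was mixed with.
    return lam * loss_to_own_partner + (1 - lam) * loss_to_reversed_partner
```

The two loss arguments would come from the unchanged base method (e.g. SimCLR or MoCo), which is consistent with the abstract's point that Un-Mix keeps the base hyperparameters and training procedure.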
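Deep Bregman divergences measure the divergence between data points using neural networks, going beyond Euclidean distance, and are capable of capturing divergence between distributions. In this paper, we propose deep Bregman divergences for contrastive learning of visual representations, aiming to enhance the contrastive loss used in self-supervised learning by training additional networks based on functional Bregman divergence. In contrast to conventional contrastive learning methods, which are based solely on divergences between single points, our framework can capture the divergence between distributions, which improves the quality of the learned representations. We show that combining the conventional contrastive loss with our proposed divergence loss outperforms the baseline and most previous methods for self-supervised and semi-supervised learning on multiple classification and object detection tasks and datasets. Moreover, the learned representations generalize well when transferred to other datasets and tasks. The source code and our models are available in the supplementary material and will be released with the paper.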
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning [29] as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
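The two mechanisms behind MoCo's dynamic dictionary can be sketched in a few lines. First, the key encoder's parameters follow the query encoder's as an exponential moving average. Second, keys live in a fixed-size FIFO queue, so each new batch is enqueued and the oldest keys are dropped. Parameters are flat lists here. `m=0.999` is given as a typical momentum value, and the class name is hypothetical.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """theta_k <- m * theta_k + (1 - m) * theta_q, applied parameter-wise."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Fixed-size FIFO dictionary: the newest keys are enqueued and the
    oldest are dropped automatically once the queue is full."""

    def __init__(self, size):
        self.keys = deque(maxlen=size)

    def enqueue(self, batch_keys):
        self.keys.extend(batch_keys)

    def negatives(self):
        return list(self.keys)
```

Decoupling the dictionary size from the batch size is what makes the dictionary large, and the slowly moving key encoder keeps the queued keys consistent with one another.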
Partial label learning (PLL) is an important problem that allows each training example to be labeled with a coarse candidate set, which well suits many real-world data annotation scenarios with label ambiguity. Despite the promise, the performance of PLL often lags behind the supervised counterpart. In this work, we bridge the gap by addressing two key research challenges in PLL -- representation learning and label disambiguation -- in one coherent framework. Specifically, our proposed framework PiCO consists of a contrastive learning module along with a novel class prototype-based label disambiguation algorithm. PiCO produces closely aligned representations for examples from the same classes and facilitates label disambiguation. Theoretically, we show that these two components are mutually beneficial, and can be rigorously justified from an expectation-maximization (EM) algorithm perspective. Moreover, we study a challenging yet practical noisy partial label learning setup, where the ground-truth may not be included in the candidate set. To remedy this problem, we present an extension PiCO+ that performs distance-based clean sample selection and learns robust classifiers by a semi-supervised contrastive learning algorithm. Extensive experiments demonstrate that our proposed methods significantly outperform the current state-of-the-art approaches in standard and noisy PLL tasks and even achieve comparable results to fully supervised learning.
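Prototype-based label disambiguation can be sketched as follows, as we read it. Each example keeps a soft pseudo-target over classes. At each step, the candidate label whose class prototype is most similar to the example's embedding becomes a one-hot vector, which is folded into the target by a moving average. Class prototypes are also updated by a normalized moving average of embeddings. The function names and the values `phi=0.99` and `gamma=0.99` are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def update_pseudo_target(target, embedding, prototypes, candidates, phi=0.99):
    """Moving-average update toward the closest candidate-class prototype.
    Non-candidate classes never receive new mass."""
    best = max(candidates, key=lambda c: dot(embedding, prototypes[c]))
    onehot = [1.0 if c == best else 0.0 for c in range(len(target))]
    return [phi * t + (1.0 - phi) * z for t, z in zip(target, onehot)]


def update_prototype(prototype, embedding, gamma=0.99):
    """Moving-average prototype update, re-normalized to unit length."""
    mixed = [gamma * p + (1.0 - gamma) * e for p, e in zip(prototype, embedding)]
    norm = math.sqrt(sum(x * x for x in mixed))
    return [x / norm for x in mixed]
```

Restricting the argmax to the candidate set is what ties disambiguation to the partial labels. In the test below, a non-candidate class with a closer prototype is ignored.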