Recently, webly supervised learning (WSL) has been studied to leverage the abundant and accessible data on the Internet. Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between the web domain and the real-world domain. However, only by closing this performance gap can we fully exploit the practical value of web datasets. To this end, we propose a Few-shot guided Prototypical (FoPro) representation learning method, which needs only a few labeled examples from reality and can significantly improve performance in the real-world domain. Specifically, we initialize each class center with few-shot real-world data as the "realistic" prototype. Then, the intra-class distance between web instances and "realistic" prototypes is narrowed by contrastive learning. Finally, we measure image-prototype distance with a learnable metric. Prototypes are polished by adjacent high-quality web images and are used to remove distant out-of-distribution samples. In experiments, FoPro is trained on web datasets under the guidance of a few real-world examples and evaluated on real-world datasets. Our method achieves state-of-the-art performance on three fine-grained datasets and two large-scale datasets. Compared with existing WSL methods under the same few-shot settings, FoPro still excels in real-world generalization. Code is available at https://github.com/yuleiqin/fopro.
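The prototype initialization and out-of-distribution filtering described above can be sketched as follows. This is a minimal illustration, not FoPro's implementation: the abstract states that the image-prototype distance uses a learnable metric, whereas this sketch substitutes plain cosine similarity, and the function names, the toy embeddings, and the threshold of 0.5 are all assumptions made for the example.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def init_prototypes(few_shot):
    """few_shot maps a class name to its few real-world embeddings."""
    return {label: mean_vector(embs) for label, embs in few_shot.items()}

def filter_web_samples(web, prototypes, threshold=0.5):
    """Keep web samples (embedding, web_label) that lie close to the
    prototype of their web label; cosine stands in for the learnable metric."""
    return [(emb, label) for emb, label in web
            if cosine(emb, prototypes[label]) >= threshold]

# Hypothetical 2-D embeddings, two classes with two real-world shots each.
few_shot = {"cat": [[1.0, 0.1], [0.9, 0.0]], "dog": [[0.0, 1.0], [0.1, 0.9]]}
prototypes = init_prototypes(few_shot)
web = [([0.8, 0.2], "cat"), ([0.1, 1.0], "cat"), ([0.2, 0.7], "dog")]
clean = filter_web_samples(web, prototypes)
```

In this toy run the second web sample carries the label "cat" but sits next to the "dog" prototype, so it is the one that gets dropped.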
This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that bridges contrastive learning with clustering. PCL not only learns low-level features for the task of instance discrimination, but more importantly, it encodes semantic structures discovered by clustering into the learned embedding space. Specifically, we introduce prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in an Expectation-Maximization framework. We iteratively perform E-step as finding the distribution of prototypes via clustering and M-step as optimizing the network via contrastive learning. We propose ProtoNCE loss, a generalized version of the InfoNCE loss for contrastive learning, which encourages representations to be closer to their assigned prototypes. PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks with substantial improvement in low-resource transfer learning. Code and pretrained models are available at https://github.com/salesforce/PCL.
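As a rough illustration of the loss family mentioned above, the sketch below implements a generic InfoNCE-style term in plain Python. The same function can score an instance against its augmented view or against its assigned prototype. It is a sketch under assumptions: the function name and toy vectors are hypothetical, and the paper's ProtoNCE may differ in details such as how the temperature is chosen for each prototype.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nce_loss(query, positive, negatives, temperature):
    """-log( exp(q.k+/t) / (exp(q.k+/t) + sum_j exp(q.k_j/t)) )."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, k) / temperature for k in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

# Instance-level term: the positive is another view of the same image.
instance_loss = nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]], 0.1)

# Prototype-level term: the positive is the assigned cluster centroid and
# the negatives are the other centroids (toy values).
prototype_loss = nce_loss([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]], 0.1)
```

Both terms shrink as the query moves toward its positive and away from the negatives, which is the behaviour the abstract describes as pulling representations closer to their assigned prototypes.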
Partial label learning (PLL) is an important problem that allows each training example to be labeled with a coarse candidate set, which well suits many real-world data annotation scenarios with label ambiguity. Despite the promise, the performance of PLL often lags behind the supervised counterpart. In this work, we bridge the gap by addressing two key research challenges in PLL -- representation learning and label disambiguation -- in one coherent framework. Specifically, our proposed framework PiCO consists of a contrastive learning module along with a novel class prototype-based label disambiguation algorithm. PiCO produces closely aligned representations for examples from the same classes and facilitates label disambiguation. Theoretically, we show that these two components are mutually beneficial, and can be rigorously justified from an expectation-maximization (EM) algorithm perspective. Moreover, we study a challenging yet practical noisy partial label learning setup, where the ground-truth may not be included in the candidate set. To remedy this problem, we present an extension PiCO+ that performs distance-based clean sample selection and learns robust classifiers by a semi-supervised contrastive learning algorithm. Extensive experiments demonstrate that our proposed methods significantly outperform the current state-of-the-art approaches in standard and noisy PLL tasks and even achieve comparable results to fully supervised learning.
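Self-supervised learning has recently shown great potential in vision tasks through contrastive learning, which aims to discriminate each image, or instance, in the dataset. However, such instance-level learning ignores the semantic relationships among instances and sometimes undesirably repels the anchor from semantically similar samples, termed "false negatives". In this work, we show that the adverse effect of false negatives is more significant for large-scale datasets with more semantic concepts. To address this issue, we propose a novel self-supervised contrastive learning framework that incrementally detects and explicitly removes false negative samples. Specifically, as training proceeds, the encoder gradually improves and the embedding space becomes more semantically structured, so our method dynamically detects a growing number of high-quality false negatives. Next, we discuss two strategies for explicitly removing the detected false negatives during contrastive learning. Extensive experiments show that our framework outperforms other self-supervised contrastive learning methods on multiple benchmarks in a limited-resource setting.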
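Contrastive learning has recently shown great potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning. The learning typically uses rich intra-image transformations to construct positive pairs and then maximizes agreement using a contrastive loss. The merits of inter-image invariance, by contrast, remain much less explored. One major obstacle to exploiting inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs, and further derive effective supervision from them, since no pair annotations are available. In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main components: pseudo-label maintenance, sampling strategy, and decision boundary design. To facilitate the study, we introduce a unified and generic framework that supports the integration of unsupervised intra- and inter-image invariance learning. Through carefully designed comparisons and analysis, several valuable observations are revealed: 1) online labels converge faster than offline labels; 2) semi-hard negative samples are more reliable and unbiased than hard negative samples; 3) a less stringent decision boundary is more favorable for inter-image invariance learning. With all the obtained recipes, our final model, namely InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. We hope this work will provide useful experience for devising effective unsupervised inter-image invariance learning. Code: https://github.com/open-mmlab/mmselfsup.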
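Recent advances in contrastive learning have shown remarkable performance. However, the vast majority of approaches are limited to the closed-world setting. In this paper, we enrich the landscape of representation learning by tapping into an open-world setting, where unlabeled samples from novel classes can naturally emerge in the wild. To bridge the gap, we introduce a new learning framework, open-world contrastive learning (OpenCon). OpenCon tackles the challenges of learning compact representations for both known and novel classes and facilitates novelty discovery along the way. We demonstrate the effectiveness of OpenCon on challenging benchmark datasets and establish competitive performance. On the ImageNet dataset, OpenCon outperforms the current best method by 11.9% and 7.4% on novel and overall classification accuracy, respectively. We hope that our work will open new doors for future work on this important problem.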
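While self-supervised representation learning (SSL) has proved effective in large models, a huge gap remains between SSL and supervised methods in lightweight models when following the same solution. We delve into this problem and find that lightweight models are prone to collapse in the semantic space when simply performing instance-wise contrast. To address this issue, we propose a relation-wise contrastive paradigm with Relation Knowledge Distillation (ReKD). We introduce a heterogeneous teacher to explicitly mine semantic information and transfer novel relation knowledge to the student (the lightweight model). Theoretical analysis supports our main concern about instance-wise contrast and verifies the effectiveness of our relation-wise contrastive learning. Extensive experimental results also demonstrate that our method achieves significant improvements on multiple lightweight models. In particular, linear evaluation on AlexNet improves the current state of the art from 44.7% to 50.1%, making this the first work to come close to the supervised result of 50.5%. Code will be made available.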
Metric-based meta-learning is one of the de facto standards in few-shot learning. It consists of representation learning and metric calculation designs. Previous works construct class representations in different ways, ranging from mean output embeddings to covariances and distributions. However, using embeddings in space lacks expressivity and cannot capture class information robustly, while complex statistical modeling makes metric design difficult. In this work, we use tensor fields ("areas") to model classes from a geometrical perspective for few-shot learning. We present a simple and effective method, dubbed hypersphere prototypes (HyperProto), where class information is represented by hyperspheres of dynamic size with two sets of learnable parameters: the hypersphere's center and its radius. Extending from points to areas, hyperspheres are much more expressive than embeddings. Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than with statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere. Following this idea, we also develop two variants of prototypes under other measurements. Extensive experiments and analysis on few-shot learning tasks across NLP and CV, and comparison with 20+ competitive baselines, demonstrate the effectiveness of our approach.
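The point-to-surface distance mentioned above can be sketched in a few lines. This is an illustration only: the signed form of the distance (negative inside the sphere), the function names, and the toy centers and radii are assumptions, and in the actual method the centers and radii would be learned rather than fixed.

```python
import math

def surface_distance(point, center, radius):
    """Distance from a point to the hypersphere surface.
    Signed by assumption: negative when the point lies inside the sphere."""
    return math.dist(point, center) - radius

def classify(point, prototypes):
    """prototypes maps a label to (center, radius); pick the closest surface."""
    return min(prototypes,
               key=lambda label: surface_distance(point, *prototypes[label]))

# Two hypothetical classes: a tight one and a broad one.
prototypes = {"A": ([0.0, 0.0], 1.0), "B": ([4.0, 0.0], 2.5)}
query = [2.0, 0.0]
predicted = classify(query, prototypes)
```

The query is equidistant from both centers, so point prototypes alone would tie. The radius breaks the tie in favour of the broader class, which is the extra expressiveness the abstract attributes to moving from points to areas.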
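Few-shot image classification is a challenging problem that aims to achieve human-level recognition based on only a small number of training images. One main solution for few-shot image classification is deep metric learning. By classifying unseen samples according to their distances to the few seen samples in an embedding space learned by powerful deep neural networks, these methods avoid overfitting to the few training images and have achieved state-of-the-art performance. In this paper, we provide an up-to-date review of deep metric learning methods for few-shot image classification from 2018 to 2022 and categorize them into three groups according to the three stages of metric learning, namely learning feature embeddings, learning class representations, and learning distance measures. With this taxonomy, we identify the novelties of different methods and the problems they face. We conclude the review with a discussion of current challenges and future trends in few-shot image classification.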
Semi-supervised learning based methods are the current SOTA solutions to the noisy-label learning problem. They rely on first learning an unsupervised label cleaner to divide the training samples into a labeled set for clean data and an unlabeled set for noisy data. Typically, the cleaner is obtained by fitting a mixture model to the distribution of per-sample training losses. However, the modeling procedure is class agnostic and assumes the loss distributions of clean and noisy samples are the same across different classes. Unfortunately, in practice, such an assumption does not always hold due to the varying learning difficulty of different classes, thus leading to sub-optimal label noise partition criteria. In this work, we reveal this long-ignored problem and propose a simple yet effective solution, named Class Prototype-based label noise Cleaner (CPC). Unlike previous works that treat all classes equally, CPC fully considers loss distribution heterogeneity and applies class-aware modulation to partition the clean and noisy data. CPC takes advantage of loss distribution modeling and intra-class consistency regularization in feature space simultaneously and thus can better distinguish clean and noisy labels. We theoretically justify the effectiveness of our method by explaining it from the Expectation-Maximization (EM) framework. Extensive experiments are conducted on the noisy-label benchmarks CIFAR-10, CIFAR-100, Clothing1M and WebVision. The results show that CPC consistently improves performance across all benchmarks. Code and pre-trained models will be released at https://github.com/hjjpku/CPC.git.
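To make the class-agnostic assumption concrete, the sketch below fits a two-component one-dimensional Gaussian mixture to per-sample losses with EM, once per class. It is a toy illustration, not the CPC algorithm: the function names, the initialization, the variance floor, and the loss values are all assumptions, and CPC's prototype-based modulation is not reproduced here.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def clean_posteriors(losses, iterations=50):
    """EM for a 1-D two-component Gaussian mixture over training losses.
    Returns each sample's posterior of belonging to the low-loss
    ("clean") component."""
    means = [min(losses), max(losses)]
    spread = max(losses) - min(losses)
    variances = [max((spread / 4) ** 2, 1e-4)] * 2
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each loss value.
        resp = []
        for x in losses:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                 for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, losses)) / nk
            variances[k] = max(var, 1e-4)
            weights[k] = nk / len(losses)
    clean = 0 if means[0] <= means[1] else 1
    return [r[clean] for r in resp]

# Hypothetical losses: the "hard" class has higher losses even when clean.
losses_by_class = {
    "easy": [0.08, 0.10, 0.12, 1.00, 1.10],
    "hard": [0.90, 1.00, 1.10, 3.00, 3.20],
}
clean_prob = {c: clean_posteriors(l) for c, l in losses_by_class.items()}
```

With these toy numbers, a loss of 1.0 is judged noisy in the easy class but clean in the hard class, so a single class-agnostic cut-off on the pooled losses could not get both right.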
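Existing deep clustering methods rely on contrastive learning for representation learning, which requires negative examples to form an embedding space where all instances are well separated. However, negative examples inevitably give rise to the class collision issue, compromising representation learning for clustering. In this paper, we explore non-contrastive representation learning for deep clustering, termed NCC, which is based on BYOL, a representative method without negative examples. First, we propose to align one augmented view of an instance with the neighbors of another view in the embedding space, called the positive sampling strategy, which avoids the class collision issue caused by negative examples and hence improves within-cluster compactness. Second, we propose to encourage alignment between two augmented views of one prototype and uniformity among all prototypes, named prototypical contrastive loss or ProtoCL, which can maximize the inter-cluster distance. Moreover, we formulate NCC in an Expectation-Maximization (EM) framework, in which the E-step uses spherical k-means to estimate the pseudo-labels of instances and the distribution of prototypes from a target network, and the M-step leverages the proposed losses to optimize an online network. As a result, NCC forms an embedding space where all clusters are well separated and within-cluster examples are compact. Experimental results on several clustering benchmark datasets, including ImageNet-1K, demonstrate that NCC outperforms state-of-the-art methods by a significant margin.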
Learning with noisy labels (LNL) is a classic problem that has been extensively studied for image tasks, but much less for video. A straightforward migration from images to videos that ignores the properties of videos, such as computational cost and redundant information, is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) A lightweight channel selection method, dubbed Channel Truncation, for feature-based label noise detection. This method selects the most discriminative channels to split clean and noisy instances in each category; 2) A novel contrastive strategy, dubbed Noise Contrastive Learning, which constructs the relationship between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed truNcatE-split-contrAsT (NEAT) significantly outperforms the existing baselines. By reducing the dimension to 10% of the original, our method achieves a noise detection F1-score of over 0.4 and a 5% classification accuracy improvement on the Mini-Kinetics dataset under severe noise (symmetric-80%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 is over 1.6%.
Cross entropy loss has served as the main objective function for classification-based tasks. Widely deployed for learning neural network classifiers, it offers both effectiveness and a probabilistic interpretation. Recently, following the success of self-supervised contrastive representation learning methods, supervised contrastive methods have been proposed to learn representations and have shown superior and more robust performance compared to training solely with cross entropy loss. However, cross entropy loss is still needed to train the final classification layer. In this work, we investigate the possibility of learning both the representation and the classifier using one objective function that combines the robustness of contrastive learning and the probabilistic interpretation of cross entropy loss. First, we revisit a previously proposed contrastive-based objective function that approximates cross entropy loss and present a simple extension to learn the classifier jointly. Second, we propose a new version of supervised contrastive training that jointly learns the parameters of the classifier and the backbone of the network. We empirically show that our proposed objective functions yield a significant improvement over the standard cross entropy loss, with more training stability and robustness in various challenging settings.
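Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples when training on novel classes. This assumption can often be unrealistic: support sets, no matter how small, can still include mislabeled samples. Robustness to label noise is therefore essential for FSL methods to be practical, but this problem surprisingly remains largely unexplored. To address mislabeled samples in FSL settings, we make several technical contributions. (1) We offer simple yet effective feature aggregation methods that improve the prototypes used by ProtoNet, a popular FSL technique. (2) We describe a novel Transformer model for Noisy Few-Shot Learning (TraNFS). TraNFS leverages a transformer's attention mechanism to weigh mislabeled versus correct samples. (3) Finally, we extensively test these methods on noisy versions of MiniImageNet and TieredImageNet. Our results show that TraNFS is on par with leading FSL methods on clean support sets, yet outperforms them by far in the presence of label noise.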
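Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset, but their main drawback remains the proportion of incorrect (noisy) samples retrieved. Previous works have shown these noisy samples to be a mixture of in-distribution (ID) samples, which are assigned to the wrong category but present visual semantics similar to other classes in the dataset, and out-of-distribution (OOD) images, which share no semantic correlation with any category in the dataset. The latter are, in practice, the dominant type of noisy images retrieved. To tackle this noise duality, we propose a two-stage algorithm starting with a detection step in which we use unsupervised contrastive feature learning to represent images in a feature space. We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere. We then embed the unsupervised representations using a fixed neighborhood size and apply outlier-sensitive clustering at the class level to detect the clean and OOD clusters as well as ID noisy outliers. We finally train a noise-robust neural network that corrects ID noise to the correct category and uses OOD samples in a guided contrastive objective, clustering them to improve low-level features. Our algorithm improves on state-of-the-art results for synthetic noise image datasets as well as real-world web-crawled data. Our work is fully reproducible [github].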
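This paper tackles the problem of novel category discovery (NCD), which aims to discriminate unknown categories in large-scale image collections. The NCD task is challenging because of its closeness to real-world scenarios, where we have encountered only some of the classes and images. Unlike other works on NCD, we leverage prototypes to emphasize the importance of category discrimination and to alleviate the issue of missing annotations for novel classes. Concretely, we propose a novel adaptive prototype learning method consisting of two main stages: prototypical representation learning and prototypical self-training. In the first stage, we obtain a robust feature extractor that can serve all images from both base and novel categories. The instance and category discrimination ability of this feature extractor is boosted by self-supervised learning and adaptive prototypes. In the second stage, we use the prototypes again to rectify offline pseudo labels and train a final parametric classifier for category clustering. We conduct extensive experiments on four benchmark datasets and demonstrate the effectiveness and robustness of the proposed method, which achieves state-of-the-art performance.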
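Contrastive learning has recently shown significant progress in learning visual representations from unlabeled data. The core idea is to train the backbone to be invariant to different augmentations of an instance. While most methods only maximize the feature similarity between two augmented views, we further generate more challenging training samples and force the model to keep predicting discriminative representations on these hard samples. In this paper, we propose MixSiam, a mixture-based approach built on the traditional siamese network. On the one hand, we input two augmented images of an instance to the backbone and obtain the discriminative representation by taking the element-wise maximum of the two features. On the other hand, we take the mixture of these augmented images as input and expect the model's prediction to be close to the discriminative representation. In this way, the model can access more varied data samples of an instance and keep predicting invariant discriminative representations for them. Thus, the learned model is more robust than previous contrastive learning methods. Extensive experiments on large-scale datasets show that MixSiam steadily improves the baseline and achieves results competitive with state-of-the-art methods. Our code will be released soon.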
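The two branches described above (an element-wise maximum of two view features as the target, and a mixed image as the input) can be sketched as follows. This is a toy under stated assumptions: `toy_encoder` is a hypothetical stand-in for the backbone, the linear mixing coefficient of 0.5 and the negative-cosine objective are illustrative choices, and no gradient or stop-gradient handling is shown.

```python
import math

def elementwise_max(f1, f2):
    """Target representation: element-wise maximum of two view features."""
    return [max(a, b) for a, b in zip(f1, f2)]

def mix(x1, x2, lam):
    """Linear mixture of two augmented inputs with coefficient lam."""
    return [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]

def negative_cosine(p, z):
    """Similarity objective: -1 when p and z point the same way."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)

def toy_encoder(x):
    """Hypothetical stand-in for the backbone: a fixed linear map + ReLU."""
    return [max(0.0, x[0] + 0.5 * x[1]), max(0.0, x[1] + 0.5 * x[0])]

view_1, view_2 = [1.0, 0.2], [0.3, 0.9]  # two augmentations, flattened
target = elementwise_max(toy_encoder(view_1), toy_encoder(view_2))
prediction = toy_encoder(mix(view_1, view_2, lam=0.5))
loss = negative_cosine(prediction, target)
```

Minimizing this loss pushes the representation of the mixed input toward the aggregated target, which is how the abstract describes training on harder samples.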
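The rapid development of deep learning has brought great progress in segmentation, one of the fundamental tasks of computer vision. However, current segmentation algorithms mostly rely on the availability of pixel-level annotations, which are often expensive, tedious, and laborious to obtain. To alleviate this burden, recent years have witnessed increasing attention to building label-efficient, deep-learning-based segmentation algorithms. This paper offers a comprehensive review of label-efficient segmentation methods. To this end, we first develop a taxonomy that organizes these methods according to the supervision provided by different types of weak labels (including no supervision, coarse supervision, incomplete supervision, and noisy supervision), supplemented by the type of segmentation problem (including semantic segmentation, instance segmentation, and panoptic segmentation). Next, we summarize the existing label-efficient segmentation methods from a unified perspective that addresses an important question: how to bridge the gap between weak supervision and dense prediction. Current methods are mostly based on heuristic priors, such as cross-pixel similarity, cross-label constraints, cross-view consistency, and cross-image relations. Finally, we share our views on future research directions for label-efficient deep segmentation.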
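Contrastive self-supervised learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised manner. Ordinary CSL embeds the features extracted from neural networks onto specific topological structures. During training, the contrastive loss draws different views of the same input together while pushing the embeddings of different inputs apart. One drawback of CSL is that the loss term ideally requires a large number of negative samples to provide a better mutual information bound. However, increasing the number of negative samples through a larger batch size also amplifies the effect of false negatives: semantically similar samples are pushed away from the anchor, degrading downstream performance. In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework. The key insight is to use a siamese-style metric loss to match intra-prototype features while increasing the distance between inter-prototype features. We conduct extensive experiments on various benchmarks, and the results demonstrate the effectiveness of our method in improving the quality of visual representations. Specifically, our unsupervised pre-trained ResNet-50 with a linear probe outperforms the fully supervised trained version on the ImageNet-1K dataset.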
Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring the computation of pairwise comparisons. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or "views") of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a "swapped" prediction mechanism where we predict the code of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks.
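The "swapped" prediction mechanism can be sketched as a pair of cross-entropy terms: the code of one view is predicted from the prototype scores of the other view. This is a minimal sketch under assumptions: the codes are given by hand here rather than computed online, the temperature of 0.1 and all names and toy vectors are illustrative, and the multi-crop strategy is omitted.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_probs(z, prototypes, temperature=0.1):
    """Softmax over the scaled dot products between a feature and prototypes."""
    return softmax([sum(a * b for a, b in zip(z, c)) / temperature
                    for c in prototypes])

def cross_entropy(code, probs):
    return -sum(q * math.log(p) for q, p in zip(code, probs) if q > 0)

def swapped_loss(z_t, z_s, q_t, q_s, prototypes, temperature=0.1):
    """Predict the code of view s from view t's features, and vice versa."""
    p_t = prototype_probs(z_t, prototypes, temperature)
    p_s = prototype_probs(z_s, prototypes, temperature)
    return cross_entropy(q_s, p_t) + cross_entropy(q_t, p_s)

prototypes = [[1.0, 0.0], [0.0, 1.0]]      # two toy cluster prototypes
z_t, z_s = [1.0, 0.0], [0.8, 0.6]          # unit-norm features of two views
consistent = swapped_loss(z_t, z_s, [1.0, 0.0], [1.0, 0.0], prototypes)
conflicting = swapped_loss(z_t, z_s, [1.0, 0.0], [0.0, 1.0], prototypes)
```

The loss is small when both views are assigned to the same prototype and large when their codes disagree. No feature is ever compared directly with another feature, only with the prototypes.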