Recently, substantial research efforts in deep metric learning (DML) have focused on designing complex pairwise-distance losses, which require convoluted schemes to ease optimization, such as sample mining or pair weighting. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, cross-entropy may seem unrelated and irrelevant to metric learning, as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight; the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing cross-entropy can be viewed as an approximate bound-optimization (or majorize-minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing cross-entropy is actually equivalent to maximizing mutual information, to which we connect several well-known pairwise losses. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our findings indicate that cross-entropy represents a proxy for maximizing mutual information, as pairwise losses do, without the need for complex sample-mining heuristics. Our experiments on four standard DML benchmarks strongly support our findings. We obtain state-of-the-art results, outperforming recent and complex DML methods.
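To make the two objectives concrete, here is a minimal PyTorch sketch (not the paper's derivation) contrasting standard cross-entropy on classifier logits with a simple pairwise loss that shrinks intra-class distances and enlarges inter-class distances; the margin value and the toy linear head are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cross_entropy_loss(logits, labels):
    # Standard classification objective: negative log-softmax probability of the true class.
    return F.cross_entropy(logits, labels)

def simple_pairwise_loss(embeddings, labels, margin=1.0):
    # Illustrative pairwise objective: pull same-class embeddings together and
    # push different-class embeddings at least `margin` apart (contrastive-style).
    d = torch.cdist(embeddings, embeddings)                 # (N, N) Euclidean distances
    same = labels[:, None] == labels[None, :]               # (N, N) same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos = d[same & ~eye]                                    # intra-class distances
    neg = d[~same]                                          # inter-class distances
    return pos.pow(2).mean() + F.relu(margin - neg).pow(2).mean()

# Toy usage: both objectives act on the same batch of embeddings and labels.
emb = torch.randn(8, 16, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
logits = emb @ torch.randn(16, 3)                           # simple linear classifier head
print(cross_entropy_loss(logits, labels), simple_pairwise_loss(emb, labels))
```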
A family of loss functions built on pair-based computation has been proposed in the literature, providing a myriad of solutions for deep metric learning. In this paper, we provide a general weighting framework for understanding recent pair-based loss functions. Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions; (2) we show that with GPW, various existing pair-based methods can be compared and discussed comprehensively, with clear differences and key limitations identified; (3) we propose a new loss called multi-similarity loss (MS loss) under the GPW, which is implemented in two iterative steps (i.e., mining and weighting). This allows it to fully consider three similarities for pair weighting, providing a more principled approach for collecting and weighting informative pairs. Finally, the proposed MS loss obtains new state-of-the-art performance on four image retrieval benchmarks, where it outperforms the most recent approaches, such as ABE [14] and HTL [4], by a large margin, e.g., 80.9% → 88.0% on the In-Shop Clothes Retrieval dataset.
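For reference, a hedged PyTorch sketch of the MS loss in its commonly published form is given below; the pair-mining step is omitted and the hyperparameter values (alpha, beta, lambda) are illustrative rather than the paper's tuned settings.

```python
import torch

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5):
    # Sketch of the MS-loss form: softly weights positive and negative pairs by
    # their cosine similarity relative to the threshold `lam` (mining step omitted).
    x = torch.nn.functional.normalize(embeddings, dim=1)
    sim = x @ x.t()                                         # (N, N) cosine similarities
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    eye = torch.eye(n, dtype=torch.bool)
    loss = x.new_zeros(())
    for i in range(n):
        pos = sim[i][same[i] & ~eye[i]]
        neg = sim[i][~same[i]]
        if len(pos) == 0 or len(neg) == 0:
            continue
        loss = loss + torch.log1p(torch.exp(-alpha * (pos - lam)).sum()) / alpha
        loss = loss + torch.log1p(torch.exp(beta * (neg - lam)).sum()) / beta
    return loss / n

emb = torch.randn(8, 16)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(multi_similarity_loss(emb, labels))
```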
The standard approach to contrastive learning is to maximize the agreement between different views of the data. The views are paired so that they are either positive, encoding different views of the same object, or negative, corresponding to views of different objects. The supervisory signal comes from maximizing the total similarity over positive pairs, while the negative pairs are needed to avoid collapse. In this work, we note that the approach of considering individual pairs cannot account for both intra-set and inter-set similarities when the views are formed from sets of the data. It thus limits the information content of the supervisory signal available to train representations. We propose to go beyond contrasting individual pairs of objects by contrasting objects as sets. To this end, we use combinatorial quadratic assignment theory, designed to evaluate set and graph similarities, and derive a set-contrastive objective as a regularizer for contrastive learning methods. We conduct experiments and demonstrate that our method improves the learned representations on metric learning and self-supervised classification tasks.
Deep Metric Learning (DML) learns a non-linear semantic embedding from input data that brings similar pairs together while keeping dissimilar data away from each other. To this end, many different methods have been proposed over the last decade, with promising results in various applications. The success of a DML algorithm greatly depends on its loss function. However, no loss function is perfect; each deals only with some aspects of an optimal similarity embedding. Moreover, the generalizability of DML to unseen categories at test time is an important issue that existing loss functions do not consider. To address these challenges, we propose novel approaches to combine different losses built on top of a shared deep feature extractor. The proposed ensemble of losses forces the deep model to extract features that are consistent with all losses. Since the selected losses are diverse and each emphasizes different aspects of an optimal semantic embedding, our effective combining methods yield a considerable improvement over any individual loss and generalize well on unseen categories. There is no restriction on the choice of loss functions, and our methods can work with any set of existing ones. Moreover, they can optimize each loss function, as well as its weight, in an end-to-end paradigm with no need to tune any hyper-parameter. We evaluate our methods on popular machine vision datasets in conventional Zero-Shot Learning (ZSL) settings. The results are very encouraging and show that our methods outperform all baseline losses by a large margin on all datasets.
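The combination idea might be sketched as follows, assuming one softmax-normalized, learnable weight per loss trained jointly with the backbone; the two stand-in losses and this particular weighting rule are illustrative assumptions rather than the paper's exact method.

```python
import torch
import torch.nn as nn

class WeightedLossEnsemble(nn.Module):
    """Combine several metric-learning losses on one shared embedding.

    The mixing weights are free parameters passed through a softmax, so each
    loss's contribution is learned end to end (a sketch, not necessarily the
    paper's exact weighting scheme)."""

    def __init__(self, losses):
        super().__init__()
        self.losses = losses                        # list of callables: (emb, labels) -> scalar
        self.logits = nn.Parameter(torch.zeros(len(losses)))

    def forward(self, embeddings, labels):
        w = torch.softmax(self.logits, dim=0)
        terms = torch.stack([f(embeddings, labels) for f in self.losses])
        return (w * terms).sum()

# Toy usage with two simple stand-in losses.
def mean_intra_class_distance(emb, labels):
    d = torch.cdist(emb, emb)
    same = (labels[:, None] == labels[None, :]) & ~torch.eye(len(labels), dtype=torch.bool)
    return d[same].mean()

def negative_inter_class_distance(emb, labels):
    d = torch.cdist(emb, emb)
    return -d[labels[:, None] != labels[None, :]].mean()

ensemble = WeightedLossEnsemble([mean_intra_class_distance, negative_inter_class_distance])
emb = torch.randn(8, 16, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(ensemble(emb, labels))
```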
We address the problem of distance metric learning (DML), defined as learning a distance consistent with a notion of semantic similarity. Traditionally, for this problem supervision is expressed in the form of sets of points that follow an ordinal relationship: an anchor point x is similar to a set of positive points Y, and dissimilar to a set of negative points Z, and a loss defined over these distances is minimized. While the specifics of the optimization differ, in this work we collectively call this type of supervision Triplets and all methods that follow this pattern Triplet-Based methods. These methods are challenging to optimize. A main issue is the need for finding informative triplets, which is usually achieved by a variety of tricks such as increasing the batch size, hard or semi-hard triplet mining, etc. Even with these tricks, the convergence rate of such methods is slow. In this paper we propose to optimize the triplet loss on a different space of triplets, consisting of an anchor data point and similar and dissimilar proxy points which are learned as well. These proxies approximate the original data points, so that a triplet loss over the proxies is a tight upper bound of the original loss. This proxy-based loss is empirically better behaved. As a result, the proxy loss improves on state-of-the-art results for three standard zero-shot learning datasets, by up to 15 percentage points, while converging three times as fast as other triplet-based losses.
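A minimal sketch of the proxy idea: one learned proxy per class, with an NCA-style objective computed between anchors and proxies instead of data triplets. For simplicity the positive proxy is kept in the softmax denominator (a common simplification of the original formulation), and the dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyNCA(nn.Module):
    """Sketch of a proxy-based NCA/triplet-style loss: one learned proxy per class."""

    def __init__(self, num_classes, dim):
        super().__init__()
        self.proxies = nn.Parameter(torch.randn(num_classes, dim))

    def forward(self, embeddings, labels):
        # Distances from each (normalized) embedding to each (normalized) proxy.
        x = F.normalize(embeddings, dim=1)
        p = F.normalize(self.proxies, dim=1)
        d2 = torch.cdist(x, p).pow(2)               # (N, C) squared distances
        # NCA-style objective: the true-class proxy should be the closest one;
        # here expressed as cross-entropy over negative squared distances.
        return F.cross_entropy(-d2, labels)

criterion = ProxyNCA(num_classes=3, dim=16)
emb = torch.randn(8, 16, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(criterion(emb, labels))
```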
Deep metric learning (DML) serves to learn an embedding function that projects semantically similar data into nearby regions of the embedding space, and it plays a vital role in many applications such as image retrieval and face recognition. However, the performance of DML methods often depends heavily on the sampling method used to select effective data from the embedding space during training. In practice, the embeddings are produced by a deep model, and the embedding space often contains barren regions due to the absence of training points, leading to the so-called "missing embedding" issue. This issue can impair the sample quality and thus degrade DML performance. In this work, we study how to alleviate the "missing embedding" issue to improve sampling quality and achieve effective DML. To this end, we propose a Densely-Anchored Sampling (DAS) scheme that treats the embedding of a data point as an "anchor" and exploits the embedding space around the anchor to densely produce embeddings without corresponding data points. Specifically, we propose to exploit the embedding space around a single anchor with Discriminative Feature Scaling (DFS) and around multiple anchors with Memorized Transformation Shifting (MTS). In this way, by combining embeddings with and without data points, we are able to provide more embeddings to facilitate the sampling process and thus boost the performance of DML. Our method is effortlessly integrated into existing DML frameworks and improves them without bells and whistles. Extensive experiments on three benchmark datasets demonstrate the superiority of our method.
Recent methods for deep metric learning have focused on designing different contrastive loss functions between positive and negative pairs of samples so that the learned feature embedding is able to pull positive samples of the same class closer and push negative samples from different classes away from each other. In this work, we recognize that there is a significant semantic gap between features at the intermediate feature layer and class labels at the final output layer. To bridge this gap, we develop a contrastive Bayesian analysis to characterize and model the posterior probabilities of image labels conditioned on their feature similarity in a contrastive learning setting. This contrastive Bayesian analysis leads to a new loss function for deep metric learning. To improve the generalization capability of the proposed method to new classes, we further extend the contrastive Bayesian loss with a metric variance constraint. Our experimental results and ablation studies demonstrate that the proposed contrastive Bayesian metric learning method significantly improves the performance of deep metric learning in both supervised and pseudo-supervised scenarios, outperforming existing methods by a large margin.
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state-of-the-art performance in the unsupervised training of deep image models. Modern batch contrastive approaches subsume or significantly outperform traditional contrastive losses such as triplet, max-margin and the N-pairs loss. In this work, we extend the self-supervised batch contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of points belonging to the same class are pulled together in embedding space, while simultaneously pushing apart clusters of samples from different classes. We analyze two possible versions of the supervised contrastive (SupCon) loss, identifying the best-performing formulation of the loss. On ResNet-200, we achieve top-1 accuracy of 81.4% on the ImageNet dataset, which is 0.8% above the best number reported for this architecture. We show consistent outperformance over cross-entropy on other datasets and two ResNet variants. The loss shows benefits for robustness to natural corruptions, and is more stable to hyperparameter settings such as optimizers and data augmentations. Our loss function is simple to implement and reference TensorFlow code is released at https://t.ly/supcon.
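A hedged PyTorch sketch of the SupCon loss in its commonly used "L_out" form follows; the temperature and the toy batch are illustrative.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """Sketch of the supervised contrastive (SupCon) loss, 'L_out' form:
    for every anchor, all other samples sharing its label act as positives."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                   # (N, N) scaled cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask

    # Log-softmax over all other samples (the anchor itself is excluded).
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability of the positives, per anchor that has positives.
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0
    sum_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
    return -(sum_log_prob_pos[valid] / pos_counts[valid]).mean()

feat = torch.randn(8, 16, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(supcon_loss(feat, labels))
```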
We introduce an information-maximization approach for the Generalized Category Discovery (GCD) problem. Specifically, we explore a parametric family of loss functions evaluating the mutual information between the features and the labels, and automatically find the one that maximizes predictive performance. Furthermore, we introduce the Elbow Maximum Centroid-Shift (EMaCS) technique, which estimates the number of classes in the unlabeled set. We report comprehensive experiments, which show that our mutual information-based approach (MIB) is both versatile and highly competitive under various GCD scenarios. The gap between the proposed approach and the existing methods is significant, especially for fine-grained classification problems. Our code: https://github.com/fchiaroni/Mutual-Information-Based-GCD
Deep embeddings answer one simple question: How similar are two images? Learning these embeddings is the bedrock of verification, zero-shot learning, and visual search. The most prominent approaches optimize a deep convolutional network with a suitable loss function, such as contrastive loss or triplet loss. While a rich line of work focuses solely on the loss functions, we show in this paper that selecting training examples plays an equally important role. We propose distance weighted sampling, which selects more informative and stable examples than traditional approaches. In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions. We evaluate our approach on the Stanford Online Products, CAR196, and the CUB200-2011 datasets for image retrieval and clustering, and on the LFW dataset for face verification. Our method achieves state-of-the-art performance on all of them.
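A hedged sketch of the two ingredients named above: the margin-based loss on a pairwise distance, and distance-weighted sampling of negatives with weights proportional to the inverse of the analytic pairwise-distance density on the unit hypersphere. The cutoff and the fixed beta are illustrative; in the paper beta is learned.

```python
import torch
import torch.nn.functional as F

def margin_loss(d, is_positive, alpha=0.2, beta=1.2):
    # Margin-based loss on a single pairwise distance d: positives should end up
    # with d < beta - alpha, negatives with d > beta + alpha (beta is learnable
    # in the paper; here it is a fixed illustrative value).
    sign = 1.0 if is_positive else -1.0
    return F.relu(alpha + sign * (d - beta))

def distance_weighted_negative(anchor, candidates, dim, cutoff=0.5):
    # Pick one negative with probability proportional to the inverse of the
    # analytic pairwise-distance density on the unit hypersphere,
    # q(d) ~ d^(dim-2) * (1 - d^2/4)^((dim-3)/2); distances are clipped for
    # numerical stability (constants are illustrative).
    a = F.normalize(anchor, dim=0)
    c = F.normalize(candidates, dim=1)
    d = (c - a).norm(dim=1).clamp(min=cutoff, max=2.0 - 1e-3)
    log_q = (dim - 2) * torch.log(d) + 0.5 * (dim - 3) * torch.log(1.0 - d.pow(2) / 4.0)
    weights = torch.softmax(-log_q, dim=0)          # ~ 1 / q(d), normalized
    idx = torch.multinomial(weights, 1).item()
    return idx, d[idx]

anchor, negatives = torch.randn(16), torch.randn(32, 16)
idx, dist = distance_weighted_negative(anchor, negatives, dim=16)
print(idx, margin_loss(dist, is_positive=False))
```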
Deep metric learning (DML) aims to minimize the empirical expected loss of pairwise intra-/inter-class proximity violations in the embedding space. We relate DML to a feasibility problem with finitely many chance constraints. We show that the minimizer of a proxy-based DML loss satisfies certain chance constraints, and that the worst-case generalization of proxy-based methods can be characterized by the radius of the smallest ball around a class proxy that covers the entire domain of the corresponding class's samples, suggesting that multiple proxies per class help performance. To provide a scalable algorithm and exploit more proxies, we consider the chance constraints implied by the minimizers of proxy-based DML instances and reformulate DML as finding a feasible point in the intersection of such constraints, resulting in a problem that is approximately solved by iterative projections. Simply put, we repeatedly train a proxy-based loss and re-initialize the proxies with the embeddings of deliberately selected new samples. We apply our method to well-established losses and evaluate it on four popular benchmark datasets for image retrieval. Outperforming the state of the art, our method consistently improves the performance of the applied losses. Code is available at: https://github.com/yetigurbuz/ccp-dml
During the training of a distance metric learning network, the minimizers of typical loss functions can be considered "feasible points" that satisfy a set of constraints imposed by the training data. Accordingly, we reformulate the distance metric learning problem as finding a feasible point of a constraint set in which the embedding vectors of the training data satisfy desired intra-class and inter-class proximities. The feasible set induced by these constraints is expressed as the intersection of relaxed feasible sets that enforce the proximity constraints only for particular samples (one sample from each class) of the training data. The feasible-point problem is then approximately solved by performing alternating projections onto those feasible sets. This approach introduces a regularization term and results in minimizing a typical loss function with a systematic batch construction, where the batches are constrained to contain the same sample from each class for a certain number of iterations. Moreover, these particular samples can be considered class representatives, allowing efficient use of hard mining during batch construction; a sketch of this batch scheme is given below. The proposed technique is applied to well-established losses and evaluated on the Stanford Online Products, CAR196, and CUB200-2011 datasets for image retrieval and clustering. Outperforming the state of the art, the proposed approach consistently improves the performance of the integrated loss functions with no additional computational cost, and further boosts performance with hard negative mining.
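One way to picture the resulting batch construction: each batch carries a fixed class representative that is kept for a number of iterations before being refreshed. This is an illustrative reading of the scheme, not the exact algorithm.

```python
import random
from collections import defaultdict

def representative_batches(labels, classes_per_batch=4, samples_per_class=4, refresh_every=10):
    """Sketch of batch construction with per-class representatives: every batch
    contains one fixed 'representative' sample of each selected class, kept the
    same for `refresh_every` batches, plus random samples of the same classes.
    Illustrative only; the paper derives its scheme from alternating projections."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    reps, step = {}, 0
    while True:
        if step % refresh_every == 0:               # periodically re-pick the representatives
            reps = {c: random.choice(idxs) for c, idxs in by_class.items()}
        chosen = random.sample(list(by_class), k=min(classes_per_batch, len(by_class)))
        batch = []
        for c in chosen:
            others = [i for i in by_class[c] if i != reps[c]]
            batch += [reps[c]] + random.sample(others, k=min(samples_per_class - 1, len(others)))
        step += 1
        yield batch

labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
loader = representative_batches(labels, classes_per_batch=3, samples_per_class=3)
print(next(loader), next(loader))
```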
Cross entropy loss has served as the main objective function for classification-based tasks. Widely deployed for learning neural network classifiers, it shows both effectiveness and a probabilistic interpretation. Recently, after the success of self-supervised contrastive representation learning methods, supervised contrastive methods have been proposed to learn representations and have shown superior and more robust performance, compared to solely training with cross entropy loss. However, cross entropy loss is still needed to train the final classification layer. In this work, we investigate the possibility of learning both the representation and the classifier using one objective function that combines the robustness of contrastive learning and the probabilistic interpretation of cross entropy loss. First, we revisit a previously proposed contrastive-based objective function that approximates cross entropy loss and present a simple extension to learn the classifier jointly. Second, we propose a new version of the supervised contrastive training that jointly learns the parameters of the classifier and the backbone of the network. We empirically show that our proposed objective functions yield a significant improvement over the standard cross entropy loss, with more training stability and robustness in various challenging settings.
We approach self-supervised image representation learning from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC). SSL-HSIC maximizes the dependence between the representations of transformed versions of an image and the image identity, while minimizing the kernelized variance of those representations. This framework yields a new understanding of InfoNCE, a variational lower bound on the mutual information (MI) between different transformations. While MI itself is known to have pathologies that can lead to learning meaningless representations, its bound is much better behaved: we show that it implicitly approximates SSL-HSIC (with a slightly different regularizer). Our approach also gives insight into BYOL, a negative-free SSL method, since SSL-HSIC similarly learns local neighborhoods of samples. SSL-HSIC allows us to optimize statistical dependence directly, in time linear in the batch size, without restrictive data assumptions or indirect mutual information estimators. Trained with or without a target network, SSL-HSIC matches the current state of the art for standard linear evaluation on ImageNet, for semi-supervised learning, and for transfer to other classification and vision tasks such as semantic segmentation, depth estimation and object recognition. Code is available at https://github.com/deepmind/ssl_hsic.
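For orientation, a sketch of the standard biased empirical HSIC estimator, trace(KHLH)/(n-1)^2 with Gaussian kernels, which measures the statistical dependence the method builds on; the SSL-HSIC objective itself uses particular kernels and a variance penalty not reproduced here.

```python
import torch

def gaussian_kernel(x, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix.
    d2 = torch.cdist(x, x).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2.
    Larger values indicate stronger statistical dependence between x and y."""
    n = x.size(0)
    k = gaussian_kernel(x, sigma)
    l = gaussian_kernel(y, sigma)
    h = torch.eye(n) - torch.ones(n, n) / n     # centering matrix
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2

# A dependent pair (y is a noisy function of x) versus an independent pair.
x = torch.randn(128, 8)
print(hsic(x, x + 0.1 * torch.randn_like(x)), hsic(x, torch.randn(128, 8)))
```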
Deep metric learning aims to learn an embedding space where semantically similar samples are close together and dissimilar ones are pushed apart. To explore harder and more informative training signals for augmentation and generalization, recent methods focus on generating synthetic samples to boost metric learning losses. However, these methods use only deterministic and class-independent generation (e.g., simple linear interpolation), which can cover only a limited part of the distribution space around the original samples. They overlook the wide variation characteristics of different classes and cannot model the abundant intra-class variations needed for generation. Therefore, the generated samples not only lack rich semantics within their class, but may also act as noisy signals that disturb training. In this paper, we propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning. We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard sample mining and boost metric learning losses. Further, for datasets that have only a few samples per class, we propose a neighbor correction to revise inaccurate estimations, based on our observation that similar classes generally have similar variation distributions. Extensive experiments on five benchmarks show that our method significantly improves upon state-of-the-art methods, raising retrieval performance by 3%-6%. Our code is available at https://github.com/darkpromise98/IAA
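A rough sketch of the class-adaptive generation idea, under the assumption that each class's per-dimension standard deviation sets the noise scale for its anchors; the paper's actual variation estimation and neighbor correction differ in detail.

```python
import torch

def intra_class_adaptive_augment(embeddings, labels, num_per_anchor=2, scale=0.5):
    """Sketch of class-adaptive synthetic embeddings: each class's own
    per-dimension standard deviation sets the noise scale used to perturb its
    anchors (illustrative; the paper adds a neighbor correction for small classes)."""
    synth, synth_labels = [], []
    for c in labels.unique():
        cls = embeddings[labels == c]
        std = cls.std(dim=0, unbiased=False) if len(cls) > 1 else torch.zeros_like(cls[0])
        for anchor in cls:
            noise = torch.randn(num_per_anchor, cls.size(1)) * std * scale
            synth.append(anchor + noise)                       # class-adaptive perturbations
            synth_labels.append(torch.full((num_per_anchor,), int(c)))
    return torch.cat(synth), torch.cat(synth_labels)

emb = torch.randn(8, 16)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
aug, aug_labels = intra_class_adaptive_augment(emb, labels)
print(aug.shape, aug_labels.shape)
```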
Partial label learning (PLL) is an important problem that allows each training example to be labeled with a coarse candidate set, which well suits many real-world data annotation scenarios with label ambiguity. Despite the promise, the performance of PLL often lags behind the supervised counterpart. In this work, we bridge the gap by addressing two key research challenges in PLL -- representation learning and label disambiguation -- in one coherent framework. Specifically, our proposed framework PiCO consists of a contrastive learning module along with a novel class prototype-based label disambiguation algorithm. PiCO produces closely aligned representations for examples from the same classes and facilitates label disambiguation. Theoretically, we show that these two components are mutually beneficial, and can be rigorously justified from an expectation-maximization (EM) algorithm perspective. Moreover, we study a challenging yet practical noisy partial label learning setup, where the ground-truth may not be included in the candidate set. To remedy this problem, we present an extension PiCO+ that performs distance-based clean sample selection and learns robust classifiers by a semi-supervised contrastive learning algorithm. Extensive experiments demonstrate that our proposed methods significantly outperform the current state-of-the-art approaches in standard and noisy PLL tasks and even achieve comparable results to fully supervised learning.
Self-supervised learning is a popular and powerful method for utilizing large amounts of unlabeled data, for which a wide variety of training objectives have been proposed in the literature. In this study, we perform a Bayesian analysis of state-of-the-art self-supervised learning objectives and propose a unified formulation based on likelihood learning. Our analysis suggests a simple method for integrating self-supervised learning with generative models, allowing for the joint training of these two seemingly distinct approaches. We refer to this combined framework as GEDI, which stands for GEnerative and DIscriminative training. Additionally, we demonstrate an instantiation of the GEDI framework by integrating an energy-based model with a cluster-based self-supervised learning model. Through experiments on synthetic and real-world data, including SVHN, CIFAR10, and CIFAR100, we show that GEDI outperforms existing self-supervised learning strategies in terms of clustering performance by a wide margin. We also demonstrate that GEDI can be integrated into a neural-symbolic framework to address tasks in the small data regime, where it can use logical constraints to further improve clustering and classification performance.
Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics. Our approach achieves state-of-the-art results on image and video unsupervised learning benchmarks.
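A minimal sketch of the two-view InfoNCE objective that this line of multiview contrastive learning builds on; the full method extends the idea to an arbitrary number of views, and the temperature and batch here are illustrative.

```python
import torch
import torch.nn.functional as F

def infonce(view1, view2, temperature=0.1):
    """Sketch of a two-view InfoNCE loss: the i-th sample in view1 should be
    most similar to the i-th sample in view2 among all samples in the batch."""
    z1 = F.normalize(view1, dim=1)
    z2 = F.normalize(view2, dim=1)
    logits = z1 @ z2.t() / temperature              # (N, N) cross-view similarities
    targets = torch.arange(z1.size(0))
    # Symmetrize over the two anchoring directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

v1, v2 = torch.randn(32, 16), torch.randn(32, 16)
print(infonce(v1, v2))
```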
One of the main challenges in feature representation for deep-learning-based classification is designing loss functions that exhibit strong discriminative power. The classical softmax loss does not explicitly encourage discriminative learning of features. A popular line of research incorporates margins into well-established losses to enforce extra intra-class compactness and inter-class separability; however, such margins were developed through heuristics rather than rigorous mathematical principles. In this work, we attempt to address this limitation by formulating a principled optimization objective: learning towards the largest margins. Specifically, we first define the class margin as a measure of inter-class separability and the sample margin as a measure of intra-class compactness. Accordingly, to encourage discriminative feature representations, the loss function should promote the largest possible margins for both classes and samples. Furthermore, we derive a generalized margin softmax loss to draw general conclusions about existing margin-based losses. This principled framework not only offers a new perspective for understanding and interpreting existing margin-based losses, but also provides new insights to guide the design of new tools, including sample-margin regularization and a largest-margin softmax loss for the class-balanced case, and zero-centroid regularization for the class-imbalanced case. Experimental results demonstrate the effectiveness of our strategy across a variety of tasks, including visual classification, imbalanced classification, person re-identification, and face verification.
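As a concrete member of the margin-based softmax family analyzed above, here is a sketch of an additive (cosine) margin softmax head; the scale and margin values are illustrative, and the paper's generalized margin softmax is more general than this single variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveMarginSoftmax(nn.Module):
    """Sketch of an additive-margin (cosine-margin) softmax classifier head."""

    def __init__(self, dim, num_classes, scale=30.0, margin=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, dim))
        self.scale, self.margin = scale, margin

    def forward(self, features, labels):
        # Cosine similarity between normalized features and normalized class weights.
        cos = F.normalize(features, dim=1) @ F.normalize(self.weight, dim=1).t()
        # Subtract the margin from the true-class cosine only, then rescale.
        one_hot = F.one_hot(labels, cos.size(1)).to(cos.dtype)
        logits = self.scale * (cos - self.margin * one_hot)
        return F.cross_entropy(logits, labels)

head = AdditiveMarginSoftmax(dim=16, num_classes=3)
feat = torch.randn(8, 16, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(head(feat, labels))
```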
Many datasets are biased, namely they contain easy-to-learn features that are highly correlated with the target class only in the dataset but not in the true underlying distribution of the data. For this reason, learning unbiased models from biased data has become a very relevant research topic in recent years. In this work, we tackle the problem of learning representations that are robust to biases. We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when dealing with biased data. Based on that, we derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples. Furthermore, thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data. We validate the proposed losses on standard vision datasets including CIFAR10, CIFAR100, and ImageNet, and we assess the debiasing capability of FairKL with epsilon-SupInfoNCE, reaching state-of-the-art performance on a number of biased datasets, including real instances of biases in the wild.