In contrastive representation learning, a data representation is trained so that an image instance can be identified even when it has been altered by augmentation. Depending on the dataset, however, some augmentations can corrupt an image's information beyond recognition, and such augmentations can lead to collapsed representations. We offer a partial solution to this problem by formalizing a stochastic encoding process in which the corruption of data by augmentation is disentangled from the information retained by the encoder. We show that, with an InfoMax objective based on this framework, we can learn a data-dependent distribution over augmentations that avoids collapse of the representations.
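Schematically (our notation, not the paper's; p_phi is the learned data-dependent augmentation distribution and f_theta the encoder), the InfoMax objective described here has roughly the following shape, maximizing the information that the encoding of an augmented view retains about the underlying instance:

```latex
\max_{\theta,\,\phi}\; I(Z; X),
\qquad Z = f_\theta\big(t(X)\big),\quad t \sim p_\phi(t \mid X).
```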
Contrastive representation learning seeks to acquire useful representations by estimating the information shared between multiple views of the data. Here, the quality of the learned representations is sensitive to the choice of data augmentation: as harder augmentations are applied, the views share more task-relevant information, but harder augmentations can also hinder the generalization ability of the representations. Motivated by this, we present a new robust contrastive learning scheme, coined RényiCL, which can effectively manage harder augmentations by utilizing the Rényi divergence. Our method is built upon a variational lower bound of the Rényi divergence, but a naive use of variational methods is impractical due to their large variance. To tackle this challenge, we propose a novel contrastive objective that performs variational estimation of a skewed Rényi divergence, and we provide theoretical guarantees on how variational estimation of the skewed divergence leads to stable training. We show that the Rényi contrastive learning objective performs innate hard negative sampling and easy positive sampling, so that it learns useful features and ignores nuisance features. Through experiments on ImageNet, we show that Rényi contrastive learning with stronger augmentations outperforms other self-supervised methods without extra regularization or computational overhead. We also validate our method on other domains, such as graph and tabular data, showing empirical gains over other contrastive methods.
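For reference, the Rényi divergence of order α generalizes the KL divergence (recovered as α → 1); RényiCL's skewed variant replaces one argument with a mixture of the two distributions, with the exact parameterization given in the paper:

```latex
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1}
\log \mathbb{E}_{Q}\!\left[\left(\frac{dP}{dQ}\right)^{\!\alpha}\right],
\qquad
\lim_{\alpha \to 1} D_\alpha(P \,\|\, Q) = D_{\mathrm{KL}}(P \,\|\, Q).
```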
We approach self-supervised learning of image representations from a statistical-dependence perspective, proposing self-supervised learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC). SSL-HSIC maximizes the dependence between the representations of transformed versions of an image and the image's identity, while minimizing the kernelized variance of those representations. This framework yields a new understanding of InfoNCE, a variational lower bound on the mutual information (MI) between different transformations. While the MI itself is known to have pathologies that can result in learning meaningless representations, its bound is much better behaved: we show that it implicitly approximates SSL-HSIC (with a slightly different regularizer). Our approach also gives insight into BYOL, a negative-free SSL method, since SSL-HSIC similarly learns local neighborhoods of samples. SSL-HSIC allows us to directly optimize statistical dependence in time linear in the batch size, without restrictive data assumptions or indirect mutual information estimators. Trained with or without a target network, SSL-HSIC matches the state of the art for standard linear evaluation on ImageNet, for semi-supervised learning, and for transfer to other classification and vision tasks such as semantic segmentation, depth estimation, and object recognition. Code is available at https://github.com/deepmind/ssl_hsic.
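As a rough illustration only (the paper uses an unbiased HSIC estimator and inverse-multiquadric kernels with random Fourier features, not the biased estimator and Gaussian kernel below), the objective −HSIC(Z, Y) + γ·sqrt(HSIC(Z, Z)) could be sketched in PyTorch as:

```python
import torch
import torch.nn.functional as F

def hsic_biased(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2, with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = torch.eye(n, device=K.device) - 1.0 / n   # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2

def ssl_hsic_loss(z, image_id, gamma=3.0, sigma=1.0):
    """Hedged sketch of SSL-HSIC: maximize dependence between features z and
    image identity while penalizing sqrt(HSIC(Z, Z)). gamma and sigma are
    illustrative, not the paper's settings."""
    K = torch.exp(-torch.cdist(z, z) ** 2 / (2 * sigma ** 2))  # feature kernel
    Y = F.one_hot(image_id).float()
    L = Y @ Y.T                                   # linear kernel on identities
    return -hsic_biased(K, L) + gamma * hsic_biased(K, K).sqrt()
```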
Machine learning systems often experience a distribution shift between training and testing. In this paper, we introduce a simple variational objective whose optima are exactly the set of all representations on which risk minimizers are guaranteed to remain robust to any distribution shift that preserves the Bayes predictor, e.g., covariate shift. Our objective has two components. First, the representation must remain discriminative for the task, i.e., some predictor must be able to simultaneously minimize the source and target risk. Second, the representation's marginal support needs to be the same across source and target. We make this practical by designing self-supervised learning methods that use only unlabeled data and augmentations to train robust representations. Our objectives achieve state-of-the-art results on DomainBed and give insight into the robustness of recent methods such as CLIP.
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100× fewer labels.
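For concreteness, a minimal PyTorch sketch of the NT-Xent (normalized temperature-scaled cross-entropy) loss that SimCLR optimizes, assuming `z1` and `z2` are the projection-head outputs for two augmented views of the same batch:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of N positive pairs (2N views total)."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2N unit vectors
    sim = z @ z.T / temperature                         # scaled cosine similarity
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    n = z1.shape[0]
    # row i's positive is row i + n, and vice versa
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

In the paper's setup, only the encoder beneath the projection head is kept for downstream evaluation.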
The goal of the out-of-distribution (OOD) generalization problem is to train a predictor that generalizes across all environments. Popular methods in this field adopt the hypothesis that such a predictor should be an \textit{invariant predictor} that captures mechanisms remaining constant across environments. While these methods have been experimentally successful in various case studies, there is still much room for the theoretical validation of this hypothesis. This paper presents a new set of theoretical conditions necessary for an invariant predictor to achieve OOD optimality. Our theory applies not only to nonlinear cases but also generalizes the necessary conditions used in \citet{Rojas2018Invariant}. We also derive a gradient-alignment algorithm from our theory and demonstrate its competitiveness on two of the three \textit{invariance unit tests} proposed by \citet{Aubinlinear}.
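The paper derives its gradient-alignment algorithm from the theory above; as a generic illustration of the idea only (not the authors' exact procedure), a penalty rewarding agreement between per-environment risk gradients can be sketched as:

```python
import torch

def gradient_alignment_penalty(env_losses, params):
    """Hypothetical sketch: negative sum of inner products between
    per-environment loss gradients. Minimizing it pushes environments
    toward a shared descent direction."""
    grads = [torch.cat([g.flatten() for g in
                        torch.autograd.grad(loss, params, create_graph=True)])
             for loss in env_losses]
    penalty = torch.zeros((), device=grads[0].device)
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            penalty = penalty - grads[i] @ grads[j]   # misalignment cost
    return penalty
```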
Contrastive representation learning has been outstandingly successful in practice. In this work, we identify two key properties related to the contrastive loss: (1) alignment (closeness) of features from positive pairs, and (2) uniformity of the induced distribution of the (normalized) features on the hypersphere. We prove that, asymptotically, the contrastive loss optimizes these properties, and analyze their positive effects on downstream tasks. Empirically, we introduce an optimizable metric to quantify each property. Extensive experiments on standard vision and language datasets confirm the strong agreement between both metrics and downstream task performance. Directly optimizing for these two metrics leads to representations with comparable or better performance at downstream tasks than contrastive learning.
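Both properties admit compact empirical estimators; the snippet below follows the formulation from this work (α = 2 and t = 2 are the commonly used defaults), assuming `x` and `y` hold L2-normalized features of positive pairs:

```python
import torch

def align_loss(x, y, alpha=2):
    """Alignment: mean distance between features of positive pairs."""
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    """Uniformity: log mean Gaussian potential over all pairs; minimized
    when features spread uniformly on the hypersphere."""
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```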
Self-supervised learning is a popular and powerful method for utilizing large amounts of unlabeled data, for which a wide variety of training objectives have been proposed in the literature. In this study, we perform a Bayesian analysis of state-of-the-art self-supervised learning objectives and propose a unified formulation based on likelihood learning. Our analysis suggests a simple method for integrating self-supervised learning with generative models, allowing for the joint training of these two seemingly distinct approaches. We refer to this combined framework as GEDI, which stands for GEnerative and DIscriminative training. Additionally, we demonstrate an instantiation of the GEDI framework by integrating an energy-based model with a cluster-based self-supervised learning model. Through experiments on synthetic and real-world data, including SVHN, CIFAR10, and CIFAR100, we show that GEDI outperforms existing self-supervised learning strategies in terms of clustering performance by a wide margin. We also demonstrate that GEDI can be integrated into a neural-symbolic framework to address tasks in the small data regime, where it can use logical constraints to further improve clustering and classification performance.
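For context, the energy-based component of such a combination defines a density through an energy function, with an intractable normalizer Z_θ; how GEDI couples this likelihood term with the cluster-based self-supervised objective is specified in the paper:

```latex
p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z_\theta},
\qquad Z_\theta = \int \exp(-E_\theta(x))\, dx .
```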
Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics. Our approach achieves state-of-the-art results on image and video unsupervised learning benchmarks.
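With M views, the view-agnostic objective amounts to summing a pairwise contrastive term over all view pairs (a sketch of the "full-graph" form; the exact pairwise term is the paper's NCE-based loss):

```latex
\mathcal{L}_{\text{full}} = \sum_{1 \le i < j \le M} \mathcal{L}\big(V_i, V_j\big).
```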
In this paper, we study the problem of self-supervised node representation learning on non-homophilous graphs. Existing self-supervised learning methods typically assume that the graph is homophilous, where linked nodes often belong to the same class or have similar features. However, this homophily assumption does not always hold in real-world graphs. We address this problem by developing a decoupled self-supervised learning (DSSL) framework for graph neural networks. DSSL imitates the generative process of nodes and links via latent-variable modeling of the semantic structure, which decouples the different underlying semantics of different neighborhoods in the self-supervised node-learning process. Our DSSL framework is encoder-agnostic and requires no predefined augmentations, and is thus flexible across different graphs. To effectively optimize the framework with latent variables, we derive the evidence lower bound of the self-supervised objective and develop a scalable training algorithm with variational inference. We also provide a theoretical analysis justifying that DSSL enjoys better downstream performance. Extensive experiments on various classes of graph benchmarks demonstrate that our proposed framework achieves significantly better performance than competitive self-supervised learning baselines.
Despite the empirical successes of self-supervised learning (SSL) methods, it remains unclear which characteristics of their representations lead to high downstream accuracy. In this work, we characterize the properties that SSL representations should satisfy. Specifically, we prove necessary and sufficient conditions such that, for any task invariant to the given data augmentations, desired probes (e.g., linear or MLP) trained on the representation attain perfect accuracy. These requirements lead to a unifying conceptual framework for improving existing SSL methods and deriving new ones. For contrastive learning, our framework prescribes simple but significant improvements to previous methods, such as using asymmetric projection heads. For non-contrastive learning, we use our framework to derive a simple and novel objective. Our resulting SSL algorithms outperform baselines on standard benchmarks, including SwAV with multi-crop on ImageNet linear probing.
Learning effective visual representations that generalize well without human supervision is a fundamental problem for applying machine learning to a wide variety of tasks. Recently, two families of self-supervised methods, contrastive learning and latent bootstrapping, exemplified by SimCLR and BYOL respectively, have made significant progress. In this work, we hypothesize that adding explicit information compression to these algorithms yields better and more robust representations. We verify this by developing SimCLR and BYOL formulations compatible with the Conditional Entropy Bottleneck (CEB) objective, allowing us to measure and control the amount of compression in the learned representations and observe its impact on downstream tasks. Furthermore, we explore the relationship between Lipschitz continuity and compression, showing a tractable lower bound on the Lipschitz constant of the encoders we learn. As Lipschitz continuity is closely related to robustness, this provides a new explanation for why compressed models are more robust. Our experiments confirm that adding compression to SimCLR and BYOL significantly improves linear evaluation accuracy and model robustness across a wide range of domain shifts. In particular, the compressed version of BYOL achieves 76.0% top-1 linear evaluation accuracy on ImageNet with ResNet-50, and 78.8% with ResNet-50 2x.
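As a hedged sketch of the compression mechanism (our notation and weighting, not necessarily the paper's exact parameterization): CEB penalizes the information the representation Z carries about its input view X that is not predictive of the target view Y, using the Markov-chain identity on the right:

```latex
\min_{Z}\;\; \beta\, I(X; Z \mid Y) - I(Y; Z),
\qquad I(X; Z \mid Y) = I(X; Z) - I(Y; Z).
```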
Contrastive learning between multiple views of the data has recently achieved state of the art performance in the field of self-supervised representation learning. Despite its success, the influence of different view choices has been less studied. In this paper, we use theoretical and empirical analysis to better understand the importance of view selection, and argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact. To verify this hypothesis, we devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI. We also consider data augmentation as a way to reduce MI, and show that increasing data augmentation indeed leads to decreasing MI and improves downstream classification accuracy. As a byproduct, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification (73% top-1 linear readout with a ResNet-50).
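The resulting criterion for optimal views can be stated compactly (paraphrasing the paper, with y the downstream label): among views that preserve all task-relevant information, prefer those that share the least information overall:

```latex
(v_1^{\ast}, v_2^{\ast}) = \operatorname*{arg\,min}_{v_1, v_2} I(v_1; v_2)
\quad \text{s.t.} \quad I(v_1; y) = I(v_2; y) = I(x; y).
```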
Existing graph contrastive learning methods rely on augmentation techniques based on random perturbations (e.g., randomly adding or dropping edges and nodes). Nevertheless, altering certain edges or nodes can unexpectedly change the graph characteristics, and choosing the optimal perturbing ratio for each dataset requires onerous manual tuning. In this paper, we introduce Implicit Graph Contrastive Learning (iGCL), which utilizes augmentations in the latent space learned from a Variational Graph Auto-Encoder by reconstructing graph topological structure. Importantly, instead of explicitly sampling augmentations from latent distributions, we further propose an upper bound for the expected contrastive loss to improve the efficiency of our learning algorithm. Thus, graph semantics can be preserved within the augmentations in an intelligent way without arbitrary manual design or prior human knowledge. Experimental results on both graph-level and node-level tasks show that the proposed method achieves state-of-the-art performance compared to other baseline methods, and ablation studies demonstrate the effectiveness of each module in iGCL.
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models. Modern batch contrastive approaches subsume or significantly outperform traditional contrastive losses such as triplet, max-margin and the N-pairs loss. In this work, we extend the self-supervised batch contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of points belonging to the same class are pulled together in embedding space, while simultaneously pushing apart clusters of samples from different classes. We analyze two possible versions of the supervised contrastive (SupCon) loss, identifying the best-performing formulation of the loss. On ResNet-200, we achieve top-1 accuracy of 81.4% on the ImageNet dataset, which is 0.8% above the best number reported for this architecture. We show consistent outperformance over cross-entropy on other datasets and two ResNet variants. The loss shows benefits for robustness to natural corruptions, and is more stable to hyperparameter settings such as optimizers and data augmentations. Our loss function is simple to implement and reference TensorFlow code is released at https://t.ly/supcon.
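A minimal PyTorch sketch of the batch computation (following the widely cited L_out formulation; the released reference implementation adds multi-view handling and further numerical care), assuming `z` holds L2-normalized features and `labels` integer classes:

```python
import torch

def supcon_loss(z, labels, temperature=0.1):
    """SupCon (L_out variant): every same-label sample in the batch is a
    positive for the anchor; all other samples form the denominator."""
    sim = z @ z.T / temperature
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()  # stability
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = ((labels[:, None] == labels[None, :]) & ~self_mask).float()
    exp_logits = logits.exp().masked_fill(self_mask, 0.0)
    log_prob = logits - exp_logits.sum(dim=1, keepdim=True).log()
    # average log-likelihood over positives, then over anchors
    # (anchors with no positive contribute zero in this sketch)
    mean_log_prob_pos = (pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()
```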
The standard approach in contrastive learning is to maximize the agreement between different views of the data. The views are ordered in pairs, such that they are either positive, encoding different views of the same object, or negative, corresponding to views of different objects. The supervisory signal comes from maximizing the total similarity over positive pairs, while negative pairs are needed to avoid collapse. In this work, we note that methods that consider individual pairs cannot account for both intra-set and inter-set similarities when sets are formed from the views of the data; this limits the information content of the supervisory signal available for training representations. We propose to go beyond contrasting individual pairs of objects by contrasting objects as sets. For this, we use combinatorial quadratic assignment theory, designed to evaluate set and graph similarities, and derive set-contrastive objectives as regularizers for contrastive learning methods. We conduct experiments and demonstrate that our method improves learned representations on metric learning and self-supervised classification tasks.
Diffusion-based methods, expressed as stochastic differential equations on a continuous-time domain, have recently proven successful as non-adversarial generative models. Training such models relies on denoising score matching, which can be viewed as a multi-scale denoising autoencoder. Here, we extend the denoising score matching framework to enable representation learning without any supervised signal. GANs and VAEs learn representations by directly transforming latent codes into data samples. In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective, and thus encodes the information needed for denoising. We illustrate how this difference allows manual control over the level of detail encoded in the representation. Using the same approach, we propose to learn an infinite-dimensional latent code that improves on state-of-the-art models in semi-supervised image classification. We also compare the quality of representations learned by diffusion score matching with that of other methods, such as autoencoders and contrastively trained systems, through their performance on downstream tasks.
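A hedged sketch of the modified objective (our notation): standard denoising score matching, with the score network additionally conditioned on a learned code of the clean input, so that the encoder E_φ is pushed to capture the information needed for denoising:

```latex
\min_{\theta,\,\phi}\;
\mathbb{E}_{t}\, \mathbb{E}_{x_0}\, \mathbb{E}_{x_t \mid x_0}
\Big[\, \lambda(t)\,
\big\| s_\theta\big(x_t, t, E_\phi(x_0)\big)
 - \nabla_{x_t} \log p_t(x_t \mid x_0) \big\|_2^2 \,\Big].
```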
We present neural activation coding (NAC) as a novel approach for learning deep representations from unlabeled data for downstream applications. We argue that a deep encoder should maximize its nonlinear expressivity on the data so that downstream predictors can take full advantage of its representational power. To this end, NAC maximizes the mutual information between the activation patterns of the encoder and the data over a noisy communication channel. We show that learning a noise-robust activation code increases the number of distinct linear regions of a ReLU encoder, and hence its maximum nonlinear expressivity. NAC learns both continuous and discrete representations of data, which we evaluate on two downstream tasks, respectively: (i) linear classification on CIFAR-10 and ImageNet-1K, and (ii) nearest-neighbor retrieval on CIFAR-10 and FLICKR-25K. Empirical results show that NAC attains better or comparable performance to recent baselines, including SimCLR and DistillHash. In addition, NAC pretraining provides significant benefits for the training of deep generative models. Our code is available at https://github.com/yookoon/nac.
Deep Bregman divergences measure the divergence of data points using neural networks, going beyond Euclidean distance, and are capable of capturing divergences between distributions. In this paper, we propose deep Bregman divergences for contrastive learning of visual representations, where our aim is to improve the contrastive loss used in self-supervised learning by training an additional network based on the functional Bregman divergence. In contrast to traditional contrastive learning methods, which are based entirely on divergences between single points, our framework can capture divergences between distributions, which improves the quality of the learned representations. We show that the combination of the traditional contrastive loss and our proposed divergence loss outperforms baselines and most previous methods for self-supervised and semi-supervised learning on multiple classification and object-detection tasks and datasets. Moreover, the learned representations generalize well when transferred to other datasets and tasks. The source code and our models are available in the supplementary material and will be released with the paper.
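For reference, the pointwise Bregman divergence generated by a strictly convex, differentiable φ is shown below; choosing φ(x) = ‖x‖² recovers the squared Euclidean distance, and the functional variant used here lifts the same construction from points to distributions:

```latex
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle .
```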
Self-supervised graph neural networks (GNNs) are in great demand because of the widespread label-scarcity problem in real-world graph/network data. Graph contrastive learning (GCL), which trains GNNs to maximize the correspondence between the representations of the same graph in its different augmented forms, can yield robust and transferable GNNs even without using labels. However, GNNs trained by traditional GCL often risk capturing redundant graph features and thus may be brittle and deliver sub-par performance in downstream tasks. Here, we propose a new principle, termed adversarial GCL (AD-GCL), which enables GNNs to avoid capturing redundant information during training by optimizing the adversarial graph augmentation strategies used in GCL. We pair AD-GCL with theoretical explanations and design a practical instantiation based on trainable edge-dropping graph augmentation. We experimentally validate AD-GCL against state-of-the-art GCL methods, achieving overall performance gains of up to 14% in unsupervised, 6% in transfer, and 3% in semi-supervised learning settings on 18 different benchmark datasets for molecular property regression and classification and for social-network classification.
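The principle has a compact min-max form (our paraphrase): the augmenter T, instantiated as a trainable edge-dropping distribution, tries to destroy as much information as possible, while the encoder f is trained to preserve the correspondence between a graph G and its augmented form:

```latex
\min_{T \in \mathcal{T}}\; \max_{f}\; I\big(f(G);\, f(T(G))\big).
```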