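Recent work has argued that classification losses based on softmax cross-entropy are superior not only for fixed-set classification tasks but also for open-set tasks, including few-shot learning and retrieval, where they outperform losses developed specifically for those tasks. Softmax classifiers have been studied with different embedding geometries (Euclidean, hyperbolic, and spherical), and claims have been made about the superiority of one or another, but these geometries have not been systematically compared under careful controls. We conduct an empirical investigation of embedding geometry for softmax losses on a variety of fixed-set classification and image retrieval tasks. An interesting property observed for the spherical losses led us to propose a probabilistic classifier based on the von Mises-Fisher distribution, and we show that it is competitive with state-of-the-art methods while producing improved out-of-the-box calibration. We provide guidance on the trade-offs between the losses and how to choose among them.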
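A minimal sketch of why spherical softmax losses have a von Mises-Fisher reading, assuming every class is a vMF distribution on the unit sphere with equal class priors and one concentration `kappa` shared by all classes. Under those assumptions the vMF normalizer cancels, and the class posterior reduces to a softmax over `kappa` times cosine similarity. The paper's vMF classifier is richer than this; `vmf_class_posterior` is a hypothetical helper for illustration only.

```python
import numpy as np


def vmf_class_posterior(z, class_means, kappa):
    """p(y | z) when class y ~ vMF(mu_y, kappa) with a shared kappa and equal priors.

    The normalizer C_d(kappa) is identical for every class, so it cancels and
    the posterior is softmax(kappa * cos(z, mu_y)).
    """
    z = z / np.linalg.norm(z)                                        # project onto the unit sphere
    mu = class_means / np.linalg.norm(class_means, axis=1, keepdims=True)
    logits = kappa * (mu @ z)                                        # kappa * cosine similarity
    logits = logits - logits.max()                                   # numerically stable softmax
    p = np.exp(logits)
    return p / p.sum()


p = vmf_class_posterior(np.array([1.0, 0.1, 0.0]), np.eye(3), kappa=10.0)
print(p)
```

Here `kappa` plays the role of the inverse temperature or scale used in cosine-softmax losses. A larger value concentrates the posterior on the nearest class mean, which is one reason calibration depends on how it is chosen.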
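Proxy-based deep metric learning (DML) learns deep representations by embedding images close to their class representatives (proxies), commonly with respect to the angle between them. However, this disregards the embedding norm, which can carry additional beneficial context such as class- or image-intrinsic uncertainty. In addition, proxy-based DML struggles to learn class-internal structures. To address both issues at once, we introduce non-isotropic probabilistic proxy-based DML. We model images as directional von Mises-Fisher (vMF) distributions on the hypersphere that can reflect image-intrinsic uncertainties. Further, we derive non-isotropic von Mises-Fisher (nivMF) distributions for class proxies to better represent complex class-specific variances. To measure the proxy-to-image distance between these models, we develop and investigate multiple distribution-to-point and distribution-to-distribution metrics. Each framework choice is motivated by a set of ablation studies, which showcase beneficial properties of our probabilistic approach to proxy-based DML, such as uncertainty awareness, better-behaved gradients during training, and overall improved generalization performance. The latter is especially reflected in competitive performance on standard DML benchmarks, where our approach compares favorably, suggesting that existing proxy-based DML can benefit significantly from a more probabilistic treatment. Code is available at github.com/explainableml/probabilistic_deep_metric_learning.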
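A minimal sketch of the simplest, isotropic distribution-to-point case. It assumes the embedding direction is the vMF mean and the embedding norm is the concentration `kappa`, so a low-norm (uncertain) image gives a flat likelihood over proxies. The non-isotropic nivMF proxies and the distribution-to-distribution metrics are not reproduced. All function names are hypothetical. The Bessel function is evaluated by its power series so the sketch needs only `math` and NumPy.

```python
import math

import numpy as np


def log_bessel_iv(v, x, n_terms=None):
    """log I_v(x) for x > 0 from the power series sum (x/2)^(2m+v) / (m! Gamma(m+v+1))."""
    if n_terms is None:
        n_terms = int(2 * x + 100)   # terms peak well before m = x, so this covers them
    log_half_x = math.log(x / 2.0)
    logs = [(2 * m + v) * log_half_x - math.lgamma(m + 1) - math.lgamma(m + v + 1)
            for m in range(n_terms)]
    top = max(logs)                  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in logs))


def vmf_log_normalizer(kappa, d):
    """log C_d(kappa) for the vMF density C_d(kappa) * exp(kappa * mu^T x) on S^(d-1)."""
    v = d / 2.0 - 1.0
    return v * math.log(kappa) - (d / 2.0) * math.log(2.0 * math.pi) - log_bessel_iv(v, kappa)


def embedding_to_vmf(z):
    """Assumed mapping: direction -> mean, norm -> concentration (norm must be > 0)."""
    kappa = float(np.linalg.norm(z))
    return z / kappa, kappa


def proxy_log_likelihood(z, proxy):
    """Log-density of the (unit-normalized) proxy under the image's vMF distribution."""
    mu, kappa = embedding_to_vmf(z)
    p = proxy / np.linalg.norm(proxy)
    return vmf_log_normalizer(kappa, z.size) + kappa * float(mu @ p)
```

For d = 3 the normalizer has the closed form kappa / (4 pi sinh kappa), which makes a convenient sanity check for the series.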
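Previous work has proposed many new loss functions and regularizers that improve test accuracy on image classification tasks. However, it is not clear whether these loss functions learn better representations for downstream tasks. This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet. We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks, and the choice of loss has little effect when networks are fully fine-tuned on the new tasks. Using centered kernel alignment to measure similarity between the hidden representations of networks, we find that differences among loss functions are apparent only in the last few layers of the network. We delve deeper into the representations of the penultimate layer and find that different objectives and hyperparameter combinations lead to dramatically different levels of class separation. Representations with higher class separation obtain higher accuracy on the original task, but their features are less useful for downstream tasks. Our results suggest a trade-off between learning invariant features for the original task and learning features relevant for transfer tasks.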
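Centered kernel alignment is what the study uses to compare hidden representations. A minimal sketch of its linear-kernel variant follows, assuming both representations are computed on the same `n` examples (rows). It is invariant to orthogonal transformations and isotropic scaling of either representation, so two layers that differ only by a rotation score 1.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between X (n x p) and Y (n x q), rows aligned by example."""
    X = X - X.mean(axis=0, keepdims=True)        # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2   # cross-covariance alignment
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))    # random orthogonal matrix
print(linear_cka(X, 3.0 * X @ Q))               # ~1.0: rotation and scale are ignored
```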
Recent research in clustering face embeddings has found that unsupervised, shallow, heuristic-based methods -- including $k$-means and hierarchical agglomerative clustering -- underperform supervised, deep, inductive methods. While the reported improvements are indeed impressive, experiments are mostly limited to face datasets, where the clustered embeddings are highly discriminative or well-separated by class (Recall@1 above 90% and often nearing ceiling), and the experimental methodology seemingly favors the deep methods. We conduct a large-scale empirical study of 17 clustering methods across three datasets and obtain several robust findings. Notably, deep methods are surprisingly fragile for embeddings with more uncertainty, where they match or even perform worse than shallow, heuristic-based methods. When embeddings are highly discriminative, deep methods do outperform the baselines, consistent with past results, but the margin between methods is much smaller than previously reported. We believe our benchmarks broaden the scope of supervised clustering methods beyond the face domain and can serve as a foundation on which these methods could be improved. To enable reproducibility, we include all necessary details in the appendices, and plan to release the code.
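Recall@1 is the discriminativeness measure quoted above. A minimal sketch, assuming cosine similarity as the retrieval metric and leave-one-out queries (each embedding queries all the others). The function name is illustrative.

```python
import numpy as np


def recall_at_1(embeddings, labels):
    """Fraction of points whose nearest neighbour (excluding itself) has the same label."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = emb @ emb.T                  # cosine similarities
    np.fill_diagonal(sims, -np.inf)     # a point may not retrieve itself
    nn = sims.argmax(axis=1)
    return float((labels[nn] == labels).mean())


emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(recall_at_1(emb, np.array([0, 0, 1, 1])))  # 1.0 for two well-separated classes
```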
In recent years, deep metric learning and its probabilistic extensions claimed state-of-the-art results in the face verification task. Despite improvements in face verification, probabilistic methods received little attention in the research community and practical applications. In this paper, we, for the first time, perform an in-depth analysis of known probabilistic methods in verification and retrieval tasks. We study different design choices and propose a simple extension, achieving new state-of-the-art results among probabilistic methods. Finally, we study confidence prediction and show that it correlates with data quality, but contains little information about prediction error probability. We thus provide a new confidence evaluation benchmark and establish a baseline for future confidence prediction research. PyTorch implementation is publicly released.
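In this work, we propose a new loss to improve feature discriminability and classification performance. Motivated by the adaptive cosine/coherence estimator (ACE), our proposed method incorporates angular information that is inherently learned by artificial neural networks. Our learnable ACE (LACE) transforms the data into a new "whitened" space that improves inter-class separability and intra-class compactness. We compare LACE with alternative state-of-the-art softmax-based and feature-regularization approaches. Our results show that the proposed method can serve as a viable alternative to cross-entropy and angular softmax approaches. Our code is publicly available: https://github.com/gatorsense/lace.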
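The classical ACE statistic that motivates LACE is the squared cosine between a sample and a target signature after both are whitened with the background covariance. This is a minimal sketch of that classical detector only, not of the learnable LACE loss. It assumes the background mean and covariance are estimated from sample rows, with a small `eps` ridge for numerical stability.

```python
import numpy as np


def whitening_matrix(background, eps=1e-6):
    """Symmetric Sigma^{-1/2} estimated from background samples (one per row)."""
    cov = np.cov(background, rowvar=False) + eps * np.eye(background.shape[1])
    evals, evecs = np.linalg.eigh(cov)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


def ace_score(x, s, background):
    """ACE(x; s) = (s' S^-1 x)^2 / ((s' S^-1 s)(x' S^-1 x)), computed as cos^2 in whitened space."""
    mean = background.mean(axis=0)
    W = whitening_matrix(background)
    xw = W @ (x - mean)               # whitened, mean-subtracted sample
    sw = W @ (s - mean)               # whitened, mean-subtracted signature
    cos = xw @ sw / (np.linalg.norm(xw) * np.linalg.norm(sw))
    return cos ** 2
```

Writing ACE as a cosine in a whitened space is what connects it to angular softmax losses: whitening removes the background correlations, after which only the angle matters.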
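In this work, we use variational inference to quantify the degree of uncertainty in deep learning model predictions for radio galaxy classification. We show that the level of model posterior variance for individual test samples is correlated with human uncertainty when labelling radio galaxies. We explore model performance and uncertainty calibration for a variety of weight priors and show that a sparse prior produces better-calibrated uncertainty estimates. Using the posterior distributions of individual weights, we show that we can prune 30% of the fully-connected layer weights without significant loss of performance by removing the weights with the lowest signal-to-noise ratio (SNR). We demonstrate that a larger degree of pruning can be achieved using a Fisher-information-based ranking, but we note that both pruning methods affect the uncertainty calibration for Fanaroff-Riley type I and type II radio galaxies. Finally, we show that, like other work in this field, we experience a cold posterior effect, whereby the posterior must be down-weighted to achieve good predictive performance. We examine whether adapting the cost function to accommodate model misspecification can compensate for this effect, but find that it makes no significant difference. We also examine the effect of principled data augmentation and find that it improves upon the baseline but also does not compensate for the observed effect. We interpret this as the cold posterior effect being due to the overly effective curation of our training sample leading to likelihood misspecification, and raise it as a potential problem for future radio galaxy classification.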
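A minimal sketch of SNR-based pruning, assuming a mean-field Gaussian posterior over weights, where each weight has a posterior mean `mu` and standard deviation `sigma`, and SNR = |mu| / sigma. The weights with the lowest SNR are the ones the posterior is least sure are non-zero. The helper name and the 30% default are illustrative. The Fisher-information ranking mentioned above is not shown.

```python
import numpy as np


def snr_prune_mask(mu, sigma, fraction=0.3):
    """Boolean keep-mask that drops the `fraction` of weights with the lowest |mu| / sigma."""
    snr = np.abs(mu) / sigma
    k = int(round(fraction * snr.size))
    mask = np.ones(snr.size, dtype=bool)
    if k > 0:
        drop = np.argsort(snr, axis=None)[:k]   # flat indices of the k lowest-SNR weights
        mask[drop] = False
    return mask.reshape(snr.shape)


mu = np.array([1.0, 0.1, 3.0, 0.5, 2.0, 0.01, 4.0, 0.2, 5.0, 0.3])
mask = snr_prune_mask(mu, np.ones_like(mu))
print(mask)   # weights 1, 5 and 7 (SNR 0.1, 0.01, 0.2) are pruned
```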
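Learning effective visual representations that generalize well without human supervision is a fundamental problem in applying machine learning to a wide variety of tasks. Recently, two families of self-supervised methods, contrastive learning and latent bootstrapping, exemplified by SimCLR and BYOL respectively, have made significant progress. In this work, we hypothesize that adding explicit information compression to these algorithms yields better and more robust representations. We verify this by developing SimCLR and BYOL formulations compatible with the Conditional Entropy Bottleneck (CEB) objective, which allows us to both measure and control the amount of compression in the learned representation and to observe its impact on downstream tasks. Furthermore, we explore the relationship between Lipschitz continuity and compression, showing a tractable lower bound on the Lipschitz constant of the encoders we learn. Because Lipschitz continuity is closely related to robustness, this provides a new explanation for why compressed models are more robust. Our experiments confirm that adding compression to SimCLR and BYOL significantly improves linear evaluation accuracy and model robustness across a wide range of domain shifts. In particular, the compressed version of BYOL achieves 76.0% top-1 linear evaluation accuracy on ImageNet with ResNet-50, and 78.8% with ResNet-50 2x.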
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state-of-the-art performance in the unsupervised training of deep image models. Modern batch contrastive approaches subsume or significantly outperform traditional contrastive losses such as the triplet, max-margin and N-pairs losses. In this work, we extend the self-supervised batch contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of points belonging to the same class are pulled together in embedding space, while clusters of samples from different classes are simultaneously pushed apart. We analyze two possible versions of the supervised contrastive (SupCon) loss and identify the best-performing formulation. On ResNet-200, we achieve top-1 accuracy of 81.4% on the ImageNet dataset, which is 0.8% above the best number reported for this architecture. We show consistent outperformance over cross-entropy on other datasets and two ResNet variants. The loss shows benefits for robustness to natural corruptions and is more stable to hyperparameter settings such as optimizers and data augmentations. Our loss function is simple to implement, and reference TensorFlow code is released at https://t.ly/supcon.
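A minimal NumPy sketch of a supervised contrastive loss, written in the form that averages the log-probabilities of all positives outside the log. We believe this matches the better-performing formulation the paper identifies, but treat that correspondence as an assumption. The temperature default and function name are illustrative. Each anchor's denominator runs over every other sample in the batch, and anchors with no positives are skipped.

```python
import numpy as np


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over one batch of embeddings (rows) with integer labels."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, z @ z.T / temperature)  # exclude the anchor itself
    # Log-softmax over all a != i, computed stably.
    row_max = logits.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0                                              # anchors with >= 1 positive
    mean_log_prob_pos = np.where(positives, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())


feats = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
print(supcon_loss(feats, np.array([0, 0, 1, 1])))   # small: same-class points are close
print(supcon_loss(feats, np.array([0, 1, 0, 1])))   # large: positives are far apart
```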
We present a variational approximation to the information bottleneck of Tishby et al. (1999). This variational approach allows us to parameterize the information bottleneck model using a neural network and leverage the reparameterization trick for efficient training. We call this method "Deep Variational Information Bottleneck", or Deep VIB. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.
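A minimal forward-pass sketch of a VIB-style objective, assuming a diagonal-Gaussian encoder output `(mu, log_var)`, a standard-normal prior on the code, and a single reparameterized sample. The loss is cross-entropy of the decoder on the sample plus `beta` times the closed-form KL to the prior. `decoder_logits_fn` is a hypothetical stand-in for the classifier head. NumPy has no autodiff, so this only shows the computation that a framework would differentiate through the reparameterized sample.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over the last axis."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); gradients flow through mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def vib_loss(mu, log_var, decoder_logits_fn, labels, beta, rng):
    """Batch mean of cross-entropy(decoder(z), y) + beta * KL(q(z|x) || p(z))."""
    z = reparameterize(mu, log_var, rng)
    logits = decoder_logits_fn(z)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels]
    kl = gaussian_kl_to_standard_normal(mu, log_var)
    return float(np.mean(ce + beta * kl))
```

Small `beta` values keep most of the predictive information in the code, while larger ones compress it more aggressively toward the prior.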
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high-confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimating predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited by their high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improving the uncertainty properties of a single network, based on a single, deterministic representation. By formalizing uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a test example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose the Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness of modern DNNs with two simple changes: (1) applying spectral normalization to hidden weights to enforce bi-Lipschitz smoothness in representations and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines
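A minimal sketch of change (1), spectral normalization. It estimates the largest singular value of a weight matrix by power iteration and rescales the matrix only when that value exceeds a bound `bound`, which keeps the layer's Lipschitz constant at most `bound`. The default bound and iteration count are assumptions for illustration. The Gaussian-process output layer of change (2) is not shown.

```python
import numpy as np


def spectral_norm_estimate(W, n_iters=50, rng=None):
    """Largest singular value of W via power iteration on W^T W."""
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return float(u @ W @ v)   # equals ||W v|| > 0 after the last update


def spectral_normalize(W, bound=0.95, n_iters=50):
    """Rescale W so its spectral norm is at most `bound`; leave it unchanged otherwise."""
    sigma = spectral_norm_estimate(W, n_iters)
    return W * (bound / sigma) if sigma > bound else W


W = np.random.default_rng(1).normal(size=(8, 5))
print(spectral_norm_estimate(W, n_iters=500), np.linalg.norm(W, 2))  # should agree closely
```

Rescaling only when the estimate exceeds the bound, rather than always dividing by it, leaves already-contractive layers untouched, so their expressiveness is not reduced unnecessarily.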
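Curriculum learning requires example difficulty in order to proceed from easy to hard examples. However, the credibility of image difficulty estimates is rarely investigated, and it can seriously affect the effectiveness of curricula. In this work, we propose Angular Gap, a measure of difficulty based on the difference in angular distance between feature embeddings and class-weight embeddings built by hyperspherical learning. To obtain reliable difficulty estimates, we introduce class-wise model calibration, as a post-training technique, to the learnt hyperbolic space. This bridges the gap between probabilistic model calibration and the angular distance estimation of hyperspherical learning. We show the superiority of our calibrated Angular Gap over recent difficulty metrics on CIFAR10-H and ImageNetV2. We further propose Angular-Gap-based curriculum learning for unsupervised domain adaptation, which moves from learning easy samples to mining hard samples. We combine this curriculum with a state-of-the-art self-training method, CST. The proposed Curricular CST learns robust representations and outperforms recent baselines on Office31 and VisDA 2017.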
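A minimal sketch of an angle-based difficulty score. It assumes the gap is the angle between a feature and its ground-truth class weight minus the smallest angle to any other class weight, so negative values mean the sample sits closer to its own class (easy) and positive values mean it is angularly misclassified (hard). This exact definition is an assumption, and the paper's class-wise calibration step is not reproduced. The function names are hypothetical.

```python
import numpy as np


def angles_to_classes(feature, class_weights):
    """Angles (radians) between one feature vector and each class-weight vector."""
    f = feature / np.linalg.norm(feature)
    W = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    return np.arccos(np.clip(W @ f, -1.0, 1.0))


def angular_gap(feature, class_weights, label):
    """Assumed definition: angle to the true class minus the smallest angle to another class."""
    theta = angles_to_classes(feature, class_weights)
    return float(theta[label] - np.delete(theta, label).min())


W = np.eye(3)
x = np.array([1.0, 0.0, 0.0])
print(angular_gap(x, W, 0), angular_gap(x, W, 1))   # -pi/2 (easy), +pi/2 (hard)
```

Under this convention, an easy-to-hard curriculum simply visits samples in ascending order of their gap, e.g. `np.argsort(gaps)`.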
We present an approach to quantifying both aleatoric and epistemic uncertainty for deep neural networks in image classification, based on generative adversarial networks (GANs). While most works in the literature that use GANs to generate out-of-distribution (OoD) examples only focus on the evaluation of OoD detection, we present a GAN based approach to learn a classifier that produces proper uncertainties for OoD examples as well as for false positives (FPs). Instead of shielding the entire in-distribution data with GAN generated OoD examples which is state-of-the-art, we shield each class separately with out-of-class examples generated by a conditional GAN and complement this with a one-vs-all image classifier. In our experiments, in particular on CIFAR10, CIFAR100 and Tiny ImageNet, we improve over the OoD detection and FP detection performance of state-of-the-art GAN-training based classifiers. Furthermore, we also find that the generated GAN examples do not significantly affect the calibration error of our classifier and result in a significant gain in model accuracy.
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, SGLD, and temperature scaling.
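A minimal sketch of the SWAG posterior described above. It keeps running first and second moments of collected SGD iterates for the diagonal part and the last `max_rank` deviations from the running mean for the low-rank part. It then samples weights as theta_SWA + (1/sqrt(2)) * sqrt(diag) * z1 + D z2 / sqrt(2 (K - 1)). When to call `collect` (for example once per epoch under the modified learning-rate schedule) is left to the caller and is an assumption here, as is the class name.

```python
import numpy as np


class SWAG:
    """Diagonal-plus-low-rank Gaussian over flattened weight vectors."""

    def __init__(self, max_rank=20):
        self.max_rank = max_rank
        self.n = 0
        self.mean = None          # running first moment (the SWA solution)
        self.sq_mean = None       # running second moment
        self.deviations = []      # columns of the low-rank deviation matrix D

    def collect(self, theta):
        theta = np.asarray(theta, dtype=float)
        if self.mean is None:
            self.mean = np.zeros_like(theta)
            self.sq_mean = np.zeros_like(theta)
        self.n += 1
        self.mean += (theta - self.mean) / self.n
        self.sq_mean += (theta ** 2 - self.sq_mean) / self.n
        self.deviations.append(theta - self.mean)        # deviation from the current mean
        self.deviations = self.deviations[-self.max_rank:]

    def sample(self, rng):
        var = np.clip(self.sq_mean - self.mean ** 2, 1e-30, None)   # diagonal covariance
        sample = self.mean + np.sqrt(var) * rng.standard_normal(self.mean.shape) / np.sqrt(2.0)
        K = len(self.deviations)
        if K > 1:
            D = np.stack(self.deviations, axis=1)         # (num_params, K)
            sample += D @ rng.standard_normal(K) / np.sqrt(2.0 * (K - 1))
        return sample
```

Bayesian model averaging then means drawing several samples, loading each into the network, and averaging the predicted class probabilities.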
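It is well known that vision classification models suffer from poor calibration under data distribution shifts. In this paper, we take a geometric approach to this problem. We propose Geometric Sensitivity Decomposition (GSD), which decomposes the norm of a sample's feature embedding and its angular similarity to a target classifier into an instance-dependent and an instance-independent component. The instance-dependent component captures information that is sensitive to changes in the input, while the instance-independent component represents insensitive information that serves solely to minimize the loss on the training dataset. Inspired by the decomposition, we analyze a simple extension to current softmax-linear models that learns to disentangle the two components during training. On several common vision models, the disentangled model outperforms other calibration methods on standard calibration metrics in the face of out-of-distribution (OOD) data and corruptions, with significantly less complexity. Specifically, we surpass the current state of the art by a 30.8% relative improvement in expected calibration error on corrupted CIFAR100. Code is available at https://github.com/gt-ripl/geometric-sentivity-decomposition.git.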
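The geometry GSD starts from is the identity that a linear logit factors into feature norm, weight norm and cosine: w·x = ||w|| ||x|| cos(theta). A minimal sketch of that identity only follows; GSD's further split into instance-dependent and instance-independent components is not reproduced, and the helper name is hypothetical. Scaling the feature leaves every cosine unchanged but scales all logits, which sharpens the softmax. This is why the feature norm matters for confidence.

```python
import numpy as np


def logit_geometry(x, W, b=None):
    """Split linear logits W @ x (+ b) into feature norm, weight norms and cosines."""
    feat_norm = np.linalg.norm(x)
    w_norms = np.linalg.norm(W, axis=1)
    cos = (W @ x) / (w_norms * feat_norm)
    logits = feat_norm * w_norms * cos      # identical to W @ x
    if b is not None:
        logits = logits + b
    return feat_norm, w_norms, cos, logits


rng = np.random.default_rng(0)
x, W = rng.normal(size=4), rng.normal(size=(3, 4))
_, _, cos1, logits1 = logit_geometry(x, W)
_, _, cos2, logits2 = logit_geometry(2.0 * x, W)
print(np.allclose(cos1, cos2), np.allclose(logits2, 2.0 * logits1))   # True True
```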
Face recognition has made extraordinary progress owing to the advancement of deep convolutional neural networks (CNNs). The central task of face recognition, including face verification and identification, involves face feature discrimination. However, the traditional softmax loss of deep CNNs usually lacks discriminative power. To address this problem, several loss functions such as center loss, large margin softmax loss, and angular softmax loss have recently been proposed. All these improved losses share the same idea: maximizing inter-class variance and minimizing intra-class variance. In this paper, we propose a novel loss function, namely the large margin cosine loss (LMCL), to realize this idea from a different perspective. More specifically, we reformulate the softmax loss as a cosine loss by L2-normalizing both features and weight vectors to remove radial variations, on top of which a cosine margin term is introduced to further maximize the decision margin in the angular space. As a result, minimum intra-class variance and maximum inter-class variance are achieved by virtue of normalization and cosine decision margin maximization. We refer to our model trained with LMCL as CosFace. Extensive experimental evaluations are conducted on the most popular public-domain face recognition datasets, such as the MegaFace Challenge, YouTube Faces (YTF) and Labeled Faces in the Wild (LFW). We achieve state-of-the-art performance on these benchmarks, which confirms the effectiveness of our proposed approach.
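A minimal NumPy sketch of the large margin cosine loss as described above. Features and class weights are L2-normalized, the target-class cosine has the margin `m` subtracted, all cosines are multiplied by a scale `s`, and softmax cross-entropy is applied. The scale and margin defaults are example values chosen for illustration, not prescriptions.

```python
import numpy as np


def lmcl_loss(features, weights, labels, s=64.0, m=0.35):
    """Mean over the batch of -log softmax(s * (cos - m * onehot(y)))[y]."""
    x = features / np.linalg.norm(features, axis=1, keepdims=True)   # remove radial variation
    W = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = x @ W.T                                                    # (n, num_classes)
    idx = np.arange(len(labels))
    logits = s * cos
    logits[idx, labels] = s * (cos[idx, labels] - m)                 # cosine margin on the target
    logits = logits - logits.max(axis=1, keepdims=True)              # stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[idx, labels].mean())
```

With `m = 0` this reduces to softmax cross-entropy on scaled cosines. A positive margin forces the target cosine to beat every other class by at least `m` before the loss becomes small.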
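Current domain-independent classical planners require symbolic models of the problem domain and instance as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, the knowledge it learns is encoded in subsymbolic representations that are incompatible with symbolic systems such as planners. We propose Latplan, an unsupervised architecture that combines deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of the transitions allowed in the environment (training inputs), Latplan learns a complete propositional PDDL action model of the environment. Later, when given a pair of images representing the initial and goal states (planning inputs), Latplan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. We evaluate Latplan on image-based versions of 6 planning domains: 8-puzzle, 15-puzzle, Blocksworld, Sokoban and two variants of LightsOut.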
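The Variational Auto-Encoder (VAE) is one of the most widely used unsupervised machine learning models. Although the default choice of a Gaussian distribution for both the prior and the posterior is mathematically convenient and often leads to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue, we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments, we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is better suited to capturing data with a hyperspherical latent structure, while outperforming a normal $\mathcal{N}$-VAE in low dimensions on other data types. Code is available at http://github.com/nicola-decao/s-vae-tf and https://github.com/nicola-decao/s-vae-pytorch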
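Sampling from the vMF posterior is the non-trivial step an $\mathcal{S}$-VAE adds. This is a minimal NumPy sketch of Wood's rejection sampler, the approach used in the S-VAE work as we recall it. It draws the component w along the first axis by rejection sampling, then a uniform direction on the orthogonal sub-sphere, then applies a Householder reflection that maps the first basis vector onto `mu`. It only produces samples. The reparameterized gradients that the S-VAE computes through the accepted proposal are not handled here.

```python
import numpy as np


def sample_vmf(mu, kappa, n, rng):
    """Draw n samples from vMF(mu, kappa) on the unit sphere S^(m-1), m = len(mu) >= 2."""
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)
    m = mu.size
    root = np.sqrt(4.0 * kappa ** 2 + (m - 1) ** 2)
    b = (-2.0 * kappa + root) / (m - 1)
    a = ((m - 1) + 2.0 * kappa + root) / 4.0
    d = 4.0 * a * b / (1.0 + b) - (m - 1) * np.log(m - 1)
    e1 = np.zeros(m)
    e1[0] = 1.0
    u = e1 - mu                                   # Householder direction mapping e1 -> mu
    u_norm = np.linalg.norm(u)
    samples = np.empty((n, m))
    for i in range(n):
        while True:                               # rejection sampling for w = <z, e1>
            eps = rng.beta((m - 1) / 2.0, (m - 1) / 2.0)
            w = (1.0 - (1.0 + b) * eps) / (1.0 - (1.0 - b) * eps)
            t = 2.0 * a * b / (1.0 - (1.0 - b) * eps)
            if (m - 1) * np.log(t) - t + d >= np.log(rng.uniform()):
                break
        v = rng.standard_normal(m - 1)
        v /= np.linalg.norm(v)                    # uniform direction orthogonal to e1
        z = np.concatenate(([w], np.sqrt(max(1.0 - w ** 2, 0.0)) * v))
        if u_norm > 1e-12:                        # skip the reflection when mu == e1
            h = u / u_norm
            z = z - 2.0 * h * (h @ z)
        samples[i] = z
    return samples


S = sample_vmf(np.array([0.0, 0.0, 1.0]), kappa=50.0, n=500, rng=np.random.default_rng(0))
print((S @ np.array([0.0, 0.0, 1.0])).mean())   # close to coth(50) - 1/50 ≈ 0.98
```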
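Machine learning models often encounter samples that diverge from their training distribution. Failure to recognize an out-of-distribution (OOD) sample, and consequently assigning that sample a class label, significantly compromises the reliability of a model. The problem has gained significant attention because of its importance for safely deploying models in open-world settings. Detecting OOD samples is challenging due to the intractability of modeling all possible unknown distributions. To date, several research domains tackle the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open-set recognition, and out-of-distribution detection. Despite sharing similar concepts, out-of-distribution detection, open-set recognition, and anomaly detection have been investigated independently. Accordingly, these research avenues have not cross-pollinated, creating research barriers. While some surveys intend to provide an overview of these approaches, they seem to focus only on a specific domain without examining the relationships between domains. This survey aims to provide a cross-domain and comprehensive review of numerous prominent works in the respective areas while identifying their commonalities. Researchers can benefit from this overview of research advances in different fields and develop future methodology synergistically. Furthermore, to the best of our knowledge, while surveys exist on anomaly detection or one-class learning, there is no comprehensive or up-to-date survey on out-of-distribution detection, which our survey covers extensively. Finally, with a unified cross-domain perspective, we discuss and shed light on future lines of research, intending to bring these fields closer together.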
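An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but it produces representations that do not capture these properties because of class collapse, in which all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them, but the precise mechanism is poorly understood. We argue that creating spread alone is insufficient for better representations, since spread is invariant to permutations within classes. Instead, both the correct degree of spread and a mechanism for breaking this invariance are necessary. We first prove that adding a weighted class-conditional InfoNCE loss controls the degree of spread. Next, we study three mechanisms to break permutation invariance: using a constrained encoder, adding a class-conditional autoencoder, and using data augmentation. We show that the latter two encourage clustering of latent subclasses under more realistic conditions than the former. Using these insights, we show that adding a properly weighted class-conditional InfoNCE loss and a class-conditional autoencoder to SupCon achieves 11.1 points of lift on coarse-to-fine transfer across 5 standard datasets and 4.7 points on worst-group robustness across 3 datasets, setting the state of the art on CelebA by 11.5 points.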