现代基于深度学习的系统的性能极大地取决于输入对象的质量。例如,对于模糊或损坏的输入,面部识别质量将较低。但是,在更复杂的情况下,很难预测输入质量对所得准确性的影响。我们提出了一种深度度量学习的方法,该方法允许直接估算不确定性,几乎没有额外的计算成本。开发的\ textit {scaleface}算法使用可训练的比例值,以修改嵌入式空间中的相似性。这些依赖于输入的量表值代表了对识别结果的信心的度量,从而允许估计不确定性。我们提供了有关面部识别任务的全面实验,这些实验表明与其他不确定性感知的面部识别方法相比,比例表面的表现出色。我们还将结果扩展到了文本到图像检索的任务,表明所提出的方法以显着的利润击败了竞争对手。
translated by 谷歌翻译
Data uncertainty is commonly observed in the images for face recognition (FR). However, deep learning algorithms often make predictions with high confidence even for uncertain or irrelevant inputs. Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples. Taking a probabilistic view of the current classification model, the temperature scalar is exactly the scale of uncertainty noise implicitly added in the softmax function. Meanwhile, the uncertainty of images in a dataset should follow a prior distribution. Based on the observation, a unified framework for uncertainty modeling and FR, Random Temperature Scaling (RTS), is proposed to learn a reliable FR algorithm. The benefits of RTS are two-fold. (1) In the training phase, it can adjust the learning strength of clean and noisy samples for stability and accuracy. (2) In the test phase, it can provide a score of confidence to detect uncertain, low-quality and even OOD samples, without training on extra labels. Extensive experiments on FR benchmarks demonstrate that the magnitude of variance in RTS, which serves as an OOD detection metric, is closely related to the uncertainty of the input image. RTS can achieve top performance on both the FR and OOD detection tasks. Moreover, the model trained with RTS can perform robustly on datasets with noise. The proposed module is light-weight and only adds negligible computation cost to the model.
translated by 谷歌翻译
In recent years, deep metric learning and its probabilistic extensions claimed state-of-the-art results in the face verification task. Despite improvements in face verification, probabilistic methods received little attention in the research community and practical applications. In this paper, we, for the first time, perform an in-depth analysis of known probabilistic methods in verification and retrieval tasks. We study different design choices and propose a simple extension, achieving new state-of-the-art results among probabilistic methods. Finally, we study confidence prediction and show that it correlates with data quality, but contains little information about prediction error probability. We thus provide a new confidence evaluation benchmark and establish a baseline for future confidence prediction research. PyTorch implementation is publicly released.
translated by 谷歌翻译
In this paper, we investigate the problem of predictive confidence in face and kinship verification. Most existing face and kinship verification methods focus on accuracy performance while ignoring confidence estimation for their prediction results. However, confidence estimation is essential for modeling reliability in such high-risk tasks. To address this issue, we first introduce a novel yet simple confidence measure for face and kinship verification, which allows the verification models to transform the similarity score into a confidence score for a given face pair. We further propose a confidence-calibrated approach called angular scaling calibration (ASC). ASC is easy to implement and can be directly applied to existing face and kinship verification models without model modifications, yielding accuracy-preserving and confidence-calibrated probabilistic verification models. To the best of our knowledge, our approach is the first general confidence-calibrated solution to face and kinship verification in a modern context. We conduct extensive experiments on four widely used face and kinship verification datasets, and the results demonstrate the effectiveness of our approach.
translated by 谷歌翻译
深度神经网络具有令人印象深刻的性能,但是他们无法可靠地估计其预测信心,从而限制了其在高风险领域中的适用性。我们表明,应用多标签的一VS损失揭示了分类的歧义并降低了模型的过度自信。引入的Slova(单标签One-Vs-All)模型重新定义了单个标签情况的典型单VS-ALL预测概率,其中只有一个类是正确的答案。仅当单个类具有很高的概率并且其他概率可忽略不计时,提议的分类器才有信心。与典型的SoftMax函数不同,如果所有其他类的概率都很小,Slova自然会检测到分布的样本。该模型还通过指数校准进行了微调,这使我们能够与模型精度准确地对齐置信分数。我们在三个任务上验证我们的方法。首先,我们证明了斯洛伐克与最先进的分布校准具有竞争力。其次,在数据集偏移下,斯洛伐克的性能很强。最后,我们的方法在检测到分布样品的检测方面表现出色。因此,斯洛伐克是一种工具,可以在需要不确定性建模的各种应用中使用。
translated by 谷歌翻译
Recent years witnessed the breakthrough of face recognition with deep convolutional neural networks. Dozens of papers in the field of FR are published every year. Some of them were applied in the industrial community and played an important role in human life such as device unlock, mobile payment, and so on. This paper provides an introduction to face recognition, including its history, pipeline, algorithms based on conventional manually designed features or deep learning, mainstream training, evaluation datasets, and related applications. We have analyzed and compared state-of-the-art works as many as possible, and also carefully designed a set of experiments to find the effect of backbone size and data distribution. This survey is a material of the tutorial named The Practical Face Recognition Technology in the Industrial World in the FG2023.
translated by 谷歌翻译
面部识别系统必须处理可能导致匹配决策不正确的大型变量(例如不同的姿势,照明和表达)。这些可变性可以根据面部图像质量来测量,这在样本的效用上定义了用于识别的实用性。以前的识别作品不使用这种有价值的信息或利用非本质上的质量估算。在这项工作中,我们提出了一种简单且有效的面部识别解决方案(Qmagface),其将质量感知的比较分数与基于大小感知角裕度损耗的识别模型相结合。所提出的方法包括比较过程中特定于模型的面部图像质量,以增强在无约束情况下的识别性能。利用利用损失诱导的质量与其比较评分之间的线性,我们的质量意识比较功能简单且高度普遍。在几个面部识别数据库和基准上进行的实验表明,引入的质量意识导致识别性能一致的改进。此外,所提出的Qmagface方法在挑战性环境下特别好,例如交叉姿势,跨年或跨品。因此,它导致最先进的性能在几个面部识别基准上,例如在XQLFQ上的98.50%,83.97%,CFP-FP上的98.74%。 QMagface的代码是公开可用的。
translated by 谷歌翻译
面部图像的质量显着影响底层识别算法的性能。面部图像质量评估(FIQA)估计捕获的图像的效用在实现可靠和准确的识别性能方面。在这项工作中,我们提出了一种新的学习范式,可以在培训过程中学习内部网络观察。基于此,我们所提出的CR-FiQA使用该范例来通过预测其相对分类性来估计样品的面部图像质量。基于关于其类中心和最近的负类中心的角度空间中的训练样本特征表示来测量该分类性。我们通过实验说明了面部图像质量与样本相对分类性之间的相关性。由于此类属性仅为培训数据集可观察到,因此我们建议从培训数据集中学习此属性,并利用它来预测看不见样品的质量措施。该培训同时执行,同时通过用于面部识别模型训练的角度裕度罚款的软墨损失来优化类中心。通过对八个基准和四个面部识别模型的广泛评估实验,我们展示了我们提出的CR-FiQA在最先进(SOTA)FIQ算法上的优越性。
translated by 谷歌翻译
机器学习模型通常会遇到与训练分布不同的样本。无法识别分布(OOD)样本,因此将该样本分配给课堂标签会显着损害模​​型的可靠性。由于其对在开放世界中的安全部署模型的重要性,该问题引起了重大关注。由于对所有可能的未知分布进行建模的棘手性,检测OOD样品是具有挑战性的。迄今为止,一些研究领域解决了检测陌生样本的问题,包括异常检测,新颖性检测,一级学习,开放式识别识别和分布外检测。尽管有相似和共同的概念,但分别分布,开放式检测和异常检测已被独立研究。因此,这些研究途径尚未交叉授粉,创造了研究障碍。尽管某些调查打算概述这些方法,但它们似乎仅关注特定领域,而无需检查不同领域之间的关系。这项调查旨在在确定其共同点的同时,对各个领域的众多著名作品进行跨域和全面的审查。研究人员可以从不同领域的研究进展概述中受益,并协同发展未来的方法。此外,据我们所知,虽然进行异常检测或单级学习进行了调查,但没有关于分布外检测的全面或最新的调查,我们的调查可广泛涵盖。最后,有了统一的跨域视角,我们讨论并阐明了未来的研究线,打算将这些领域更加紧密地融为一体。
translated by 谷歌翻译
位置识别是同时定位和映射(SLAM)和空间感知的关键。但是,野外的地方识别通常会因图像变化(例如改变观点和街头外观)而产生错误的预测。将不确定性估计纳入地点识别的生命周期是减轻变化对位置识别性能的影响的有前途的方法。但是,这种静脉的现有不确定性估计方法要么是计算效率低下(例如蒙特卡洛辍学),要么以降低准确性为代价。本文提出了Stun,这是一个自学框架,该框架学会同时预测位置并估计给定输入图像的预测不确定性。为此,我们首先使用标准的度量学习管道训练老师网培训网络,以生产嵌入培训。然后,在经过预告片的教师网络监督的情况下,培训了一个具有额外差异分支的学生网,以对嵌入先验的培训进行训练,并按样本估算不确定性样本。在在线推理阶段,我们仅使用学生网与不确定性结合产生位置预测。与对不确定性一无所知的位置识别系统相比,我们的框架具有自由估计的不确定性估计而无需牺牲任何预测准确性。我们对大规模匹兹堡30K数据集的实验结果表明,昏迷在识别精度和不确定性估计质量方面的表现都优于最先进的方法。
translated by 谷歌翻译
Recently, a popular line of research in face recognition is adopting margins in the well-established softmax loss function to maximize class separability. In this paper, we first introduce an Additive Angular Margin Loss (ArcFace), which not only has a clear geometric interpretation but also significantly enhances the discriminative power. Since ArcFace is susceptible to the massive label noise, we further propose sub-center ArcFace, in which each class contains K sub-centers and training samples only need to be close to any of the K positive sub-centers. Sub-center ArcFace encourages one dominant sub-class that contains the majority of clean faces and non-dominant sub-classes that include hard or noisy faces. Based on this self-propelled isolation, we boost the performance through automatically purifying raw web faces under massive real-world noise. Besides discriminative feature embedding, we also explore the inverse problem, mapping feature vectors to face images. Without training any additional generator or discriminator, the pre-trained ArcFace model can generate identity-preserved face images for both subjects inside and outside the training data only by using the network gradient and Batch Normalization (BN) priors. Extensive experiments demonstrate that ArcFace can enhance the discriminative feature embedding as well as strengthen the generative face synthesis.
translated by 谷歌翻译
Deep neural networks have attained remarkable performance when applied to data that comes from the same distribution as that of the training set, but can significantly degrade otherwise. Therefore, detecting whether an example is out-of-distribution (OoD) is crucial to enable a system that can reject such samples or alert users. Recent works have made significant progress on OoD benchmarks consisting of small image datasets. However, many recent methods based on neural networks rely on training or tuning with both in-distribution and out-of-distribution data. The latter is generally hard to define a-priori, and its selection can easily bias the learning. We base our work on a popular method ODIN 1 [21], proposing two strategies for freeing it from the needs of tuning with OoD data, while improving its OoD detection performance. We specifically propose to decompose confidence scoring as well as a modified input pre-processing method. We show that both of these significantly help in detection performance. Our further analysis on a larger scale image dataset shows that the two types of distribution shifts, specifically semantic shift and non-semantic shift, present a significant difference in the difficulty of the problem, providing an analysis of when ODIN-like strategies do or do not work.
translated by 谷歌翻译
在过去的几年中,关于分类,检测和分割问题的3D学习领域取得了重大进展。现有的绝大多数研究都集中在规范的封闭式条件上,忽略了现实世界的内在开放性。这限制了需要管理新颖和未知信号的自主系统的能力。在这种情况下,利用3D数据可以是有价值的资产,因为它传达了有关感应物体和场景几何形状的丰富信息。本文提供了关于开放式3D学习的首次广泛研究。我们介绍了一种新颖的测试床,其设置在类别语义转移方面的难度增加,并且涵盖了内域(合成之间)和跨域(合成对真实)场景。此外,我们研究了相关的分布情况,并开放了2D文献,以了解其最新方法是否以及如何在3D数据上有效。我们广泛的基准测试在同一连贯的图片中定位了几种算法,从而揭示了它们的优势和局限性。我们的分析结果可能是未来量身定制的开放式3D模型的可靠立足点。
translated by 谷歌翻译
自动面部识别是一个知名的研究领域。在该领域的最后三十年的深入研究中,已经提出了许多不同的面部识别算法。随着深度学习的普及及其解决各种不同问题的能力,面部识别研究人员集中精力在此范式下创建更好的模型。从2015年开始,最先进的面部识别就植根于深度学习模型。尽管有大规模和多样化的数据集可用于评估面部识别算法的性能,但许多现代数据集仅结合了影响面部识别的不同因素,例如面部姿势,遮挡,照明,面部表情和图像质量。当算法在这些数据集上产生错误时,尚不清楚哪些因素导致了此错误,因此,没有指导需要多个方向进行更多的研究。这项工作是我们以前在2014年开发的作品的后续作品,最终于2016年发表,显示了各种面部方面对面部识别算法的影响。通过将当前的最新技术与过去的最佳系统进行比较,我们证明了在强烈的遮挡下,某些类型的照明和强烈表达的面孔是深入学习算法所掌握的问题,而具有低分辨率图像的识别,极端的姿势变化和开放式识别仍然是一个开放的问题。为了证明这一点,我们使用六个不同的数据集和五种不同的面部识别算法以开源和可重现的方式运行一系列实验。我们提供了运行所有实验的源代码,这很容易扩展,因此在我们的评估中利用自己的深网只有几分钟的路程。
translated by 谷歌翻译
开放式识别使深度神经网络(DNN)能够识别未知类别的样本,同时在已知类别的样本上保持高分类精度。基于自动编码器(AE)和原型学习的现有方法在处理这项具有挑战性的任务方面具有巨大的潜力。在这项研究中,我们提出了一种新的方法,称为类别特定的语义重建(CSSR),该方法整合了AE和原型学习的力量。具体而言,CSSR用特定于类的AE表示的歧管替代了原型点。与传统的基于原型的方法不同,CSSR在单个AE歧管上的每个已知类模型,并通过AE的重建误差来测量类归属感。特定于类的AE被插入DNN主链的顶部,并重建DNN而不是原始图像所学的语义表示。通过端到端的学习,DNN和AES互相促进,以学习歧视性和代表性信息。在多个数据集上进行的实验结果表明,所提出的方法在封闭式和开放式识别中都达到了出色的性能,并且非常简单且灵活地将其纳入现有框架中。
translated by 谷歌翻译
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.
translated by 谷歌翻译
检测到分布输入对于在现实世界中安全部署机器学习模型至关重要。然而,已知神经网络遭受过度自信的问题,在该问题中,它们对分布和分布的输入的信心异常高。在这项工作中,我们表明,可以通过在训练中实施恒定的向量规范来通过logit归一化(logitnorm)(logitnorm)来缓解此问题。我们的方法是通过分析的激励,即logit的规范在训练过程中不断增加,从而导致过度自信的产出。因此,LogitNorm背后的关键思想是将网络优化期间输出规范的影响解散。通过LogitNorm培训,神经网络在分布数据和分布数据之间产生高度可区分的置信度得分。广泛的实验证明了LogitNorm的优势,在公共基准上,平均FPR95最高为42.30%。
translated by 谷歌翻译
自主驾驶应用中的对象检测意味着语义对象的检测和跟踪通常是城市驾驶环境的原产,作为行人和车辆。最先进的基于深度学习的物体检测中的主要挑战之一是假阳性,其出现过于自信得分。由于安全问题,这在自动驾驶和其他关键机器人感知域中是非常不可取的。本文提出了一种通过将新的概率层引入测试中的深度对象检测网络来缓解过度自信预测问题的方法。建议的方法避免了传统的乙状结肠或Softmax预测层,其通常产生过度自信预测。证明所提出的技术在不降低真实阳性上的性能的情况下降低了误报的过度频率。通过yolov4和第二(基于LiDar的探测器)对2D-Kitti异点检测验证了该方法。该方法使得能够实现可解释的概率预测,而无需重新培训网络,因此非常实用。
translated by 谷歌翻译
Face recognition has made extraordinary progress owing to the advancement of deep convolutional neural networks (CNNs). The central task of face recognition, including face verification and identification, involves face feature discrimination. However, the traditional softmax loss of deep CNNs usually lacks the power of discrimination. To address this problem, recently several loss functions such as center loss, large margin softmax loss, and angular softmax loss have been proposed. All these improved losses share the same idea: maximizing inter-class variance and minimizing intra-class variance. In this paper, we propose a novel loss function, namely large margin cosine loss (LMCL), to realize this idea from a different perspective. More specifically, we reformulate the softmax loss as a cosine loss by L 2 normalizing both features and weight vectors to remove radial variations, based on which a cosine margin term is introduced to further maximize the decision margin in the angular space. As a result, minimum intra-class variance and maximum inter-class variance are achieved by virtue of normalization and cosine decision margin maximization. We refer to our model trained with LMCL as CosFace. Extensive experimental evaluations are conducted on the most popular public-domain face recognition datasets such as MegaFace Challenge, Youtube Faces (YTF) and Labeled Face in the Wild (LFW). We achieve the state-of-the-art performance on these benchmarks, which confirms the effectiveness of our proposed approach.
translated by 谷歌翻译
学习歧视性面部特征在建立高性能面部识别模型方面发挥着重要作用。最近的最先进的面部识别解决方案,提出了一种在常用的分类损失函数,Softmax损失中纳入固定的惩罚率,通过最大限度地减少级别的变化来增加面部识别模型的辨别力并最大化级别的帧间变化。边缘惩罚Softmax损失,如arcFace和Cosface,假设可以使用固定的惩罚余量同样地学习不同身份之间的测地距。然而,这种学习目标对于具有不一致的间帧内变化的真实数据并不是现实的,这可能限制了面部识别模型的判别和概括性。在本文中,我们通过提出弹性罚款损失(弹性面)来放松固定的罚款边缘约束,这允许在推动阶级可分离性中灵活性。主要思想是利用从每个训练迭代中的正常分布中汲取的随机保证金值。这旨在提供决策边界机会,以提取和缩回,以允许灵活的类别可分离学习的空间。我们展示了在大量主流基准上使用相同的几何变换,展示了我们的弹性面损失和COSFace损失的优势。从更广泛的角度来看,我们的弹性面在九个主流基准中提出了最先进的面部识别性能。
translated by 谷歌翻译