In some face recognition applications, we are interested in verifying whether an individual belongs to a group without revealing their identity. Some existing methods propose a mechanism that quantizes pre-computed face descriptors into discrete embeddings and aggregates them into a single group representation. However, this mechanism is only optimized for a given, closed set of individuals, and the group representation must be learned from scratch every time the group changes. In this paper, we propose a deep architecture that jointly learns the face descriptors and the aggregation mechanism for better end-to-end performance. The system can be applied to new groups it has never seen before, and it easily handles new members joining or existing members leaving. Experiments on multiple large in-the-wild face datasets show that the method achieves higher verification performance than other baselines.
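As a rough illustration of the aggregation idea described above (not the paper's actual architecture), the following minimal NumPy sketch binarizes face embeddings, averages them into a single group representation, and scores a query against it; the quantization scheme, threshold, and function names are all assumptions made for the example.

```python
import numpy as np

def quantize(embeddings: np.ndarray) -> np.ndarray:
    """Binarize real-valued face embeddings to {-1, +1} (sign quantization)."""
    return np.where(embeddings >= 0, 1.0, -1.0)

def group_representation(member_embeddings: np.ndarray) -> np.ndarray:
    """Aggregate quantized member embeddings into a single group vector."""
    return quantize(member_embeddings).mean(axis=0)

def is_member(query: np.ndarray, group_rep: np.ndarray, threshold: float = 0.2) -> bool:
    """Score a query embedding against the group representation via cosine similarity."""
    q = quantize(query[None, :])[0]
    score = np.dot(q, group_rep) / (np.linalg.norm(q) * np.linalg.norm(group_rep) + 1e-12)
    return score > threshold

# Toy usage: 5 members with 128-D embeddings, plus an unrelated query.
rng = np.random.default_rng(0)
members = rng.normal(size=(5, 128))
rep = group_representation(members)
print(is_member(members[0] + 0.1 * rng.normal(size=128), rep))  # likely True
print(is_member(rng.normal(size=128), rep))                     # likely False
```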
Recent years have witnessed the breakthrough of face recognition with deep convolutional neural networks. Dozens of papers in the field of FR are published every year. Some of them have been applied in the industrial community and play an important role in everyday life, such as device unlocking, mobile payment, and so on. This paper provides an introduction to face recognition, including its history, pipeline, algorithms based on conventional hand-crafted features or deep learning, mainstream training and evaluation datasets, and related applications. We have analysed and compared as many state-of-the-art works as possible, and also carefully designed a set of experiments to study the effect of backbone size and data distribution. This survey serves as material for the tutorial The Practical Face Recognition Technology in the Industrial World at FG2023.
In this paper, we make the first attempt to investigate the integration of deep hash learning with vehicle re-identification. We propose a deep-hashing-based vehicle re-identification framework, dubbed DVHN, which substantially reduces memory usage and improves retrieval efficiency while preserving nearest-neighbour search accuracy. Specifically, DVHN directly learns discrete, compact binary hash codes for each image by jointly optimizing the feature-learning network and the hash-code generation module: we constrain the outputs of the convolutional neural network to be discrete binary codes and ensure that the learned binary codes are optimal for classification. To optimize the deep discrete hashing framework, we further propose an alternating minimization method for learning binary, similarity-preserving hash codes. Extensive experiments on two widely studied vehicle re-identification datasets, VehicleID and VeRi, demonstrate the superiority of our method over state-of-the-art deep hashing methods. DVHN with 2048-bit codes achieves improvements of 13.94% and 10.21% in mAP and Rank@1 on the VehicleID (800) dataset. On VeRi, we achieve performance gains of 35.45% and 32.72% in Rank@1 and mAP, respectively.
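To make the retrieval side of such a deep-hashing framework concrete, here is a minimal NumPy sketch (not the DVHN implementation): real-valued network outputs are binarized by sign thresholding and gallery images are ranked by Hamming distance to the query code. The helper names and the use of random stand-in features are illustrative assumptions.

```python
import numpy as np

def to_hash_codes(features: np.ndarray) -> np.ndarray:
    """Binarize real-valued network outputs into {0, 1} hash codes via sign thresholding."""
    return (features > 0).astype(np.uint8)

def hamming_rank(query_code: np.ndarray, gallery_codes: np.ndarray) -> np.ndarray:
    """Return gallery indices sorted by Hamming distance to the query code."""
    distances = np.count_nonzero(gallery_codes != query_code, axis=1)
    return np.argsort(distances)

# Toy usage with 2048-bit codes, as in the reported DVHN configuration.
rng = np.random.default_rng(42)
gallery_features = rng.normal(size=(1000, 2048))   # stand-in for CNN outputs
query_feature = gallery_features[7] + 0.05 * rng.normal(size=2048)

gallery_codes = to_hash_codes(gallery_features)
ranking = hamming_rank(to_hash_codes(query_feature), gallery_codes)
print(ranking[:5])  # index 7 should rank first
```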
As automated face recognition applications tend towards ubiquity, there is a growing need to secure the sensitive face data used within these systems. This paper presents a survey of biometric template protection (BTP) methods proposed for securing face templates (images/features) in neural-network-based face recognition systems. The BTP methods are categorised into two types: Non-NN and NN-learned. Non-NN methods use a neural network (NN) as a feature extractor, but the BTP part is based on a non-NN algorithm, whereas NN-learned methods employ a NN to learn a protected template from the unprotected template. We present examples of Non-NN and NN-learned face BTP methods from the literature, along with a discussion of their strengths and weaknesses. We also investigate the techniques used to evaluate these methods in terms of the three most common BTP criteria: recognition accuracy, irreversibility, and renewability/unlinkability. The recognition accuracy of protected face recognition systems is generally evaluated using the same (empirical) techniques employed for evaluating standard (unprotected) biometric systems. However, most irreversibility and renewability/unlinkability evaluations are found to be based on theoretical assumptions/estimates or verbal implications, with a lack of empirical validation in a practical face recognition context. So, we recommend a greater focus on empirical evaluations to provide more concrete insights into the irreversibility and renewability/unlinkability of face BTP methods in practice. Additionally, an exploration of the reproducibility of the studied BTP works, in terms of the public availability of their implementation code and evaluation datasets/procedures, suggests that it would be difficult to faithfully replicate most of the reported findings. So, we advocate for a push towards reproducibility, in the hope of advancing face BTP research.
In recent years, a tremendous amount of visual content has been generated and shared from many domains, such as social media platforms, medical imaging, and robotics. This abundance of content creation and sharing has introduced new challenges, particularly for content-based image retrieval (CBIR), i.e., searching databases for similar content, a long-established research area in which improved efficiency and accuracy are needed for real-time retrieval. Artificial intelligence has made progress in CBIR and has greatly facilitated the instance-search process. In this survey, we review recent instance retrieval work built on deep learning algorithms and techniques, organized by deep network architecture type, deep features, feature embedding methods, and network fine-tuning strategies. Our survey considers a wide variety of recent methods, where we identify milestone works, reveal connections among the various methods, and present commonly used benchmarks, evaluation results, common challenges, and promising future directions.
We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following three principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the "Vector of Locally Aggregated Descriptors" image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we develop a training procedure, based on a new weakly supervised ranking loss, to learn parameters of the architecture in an end-to-end manner from images depicting the same places over time downloaded from Google Street View Time Machine. Finally, we show that the proposed architecture significantly outperforms non-learnt image representations and off-the-shelf CNN descriptors on two challenging place recognition benchmarks, and improves over current state-of-the-art compact image representations on standard image retrieval benchmarks.
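A minimal PyTorch sketch of a NetVLAD-style pooling layer is shown below, assuming the standard formulation of soft assignment followed by residual aggregation and two-stage L2 normalization; the layer names and cluster count are illustrative, and details such as centroid initialization are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    """Minimal NetVLAD pooling: soft-assign local descriptors to K clusters and
    aggregate their residuals against learnable cluster centres."""

    def __init__(self, dim: int, num_clusters: int = 64):
        super().__init__()
        self.assign = nn.Conv2d(dim, num_clusters, kernel_size=1)       # w_k, b_k
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))   # c_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, D, H, W) feature map from any CNN backbone
        soft_assign = F.softmax(self.assign(x), dim=1).flatten(2)        # (B, K, N)
        descriptors = x.flatten(2)                                       # (B, D, N)

        # residuals x_i - c_k, weighted by the soft assignment: (B, K, D, N)
        residuals = descriptors.unsqueeze(1) - self.centroids[None, :, :, None]
        vlad = (soft_assign.unsqueeze(2) * residuals).sum(dim=-1)        # (B, K, D)

        vlad = F.normalize(vlad, p=2, dim=2)              # intra-normalization per cluster
        return F.normalize(vlad.flatten(1), p=2, dim=1)   # final L2-normalized (B, K*D)

# Usage sketch: pool a ResNet-like feature map into a 64*512-D global descriptor.
pooled = NetVLAD(dim=512, num_clusters=64)(torch.randn(2, 512, 7, 7))
print(pooled.shape)  # torch.Size([2, 32768])
```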
Recently, a popular line of research in face recognition is adopting margins in the well-established softmax loss function to maximize class separability. In this paper, we first introduce an Additive Angular Margin Loss (ArcFace), which not only has a clear geometric interpretation but also significantly enhances the discriminative power. Since ArcFace is susceptible to the massive label noise, we further propose sub-center ArcFace, in which each class contains K sub-centers and training samples only need to be close to any of the K positive sub-centers. Sub-center ArcFace encourages one dominant sub-class that contains the majority of clean faces and non-dominant sub-classes that include hard or noisy faces. Based on this self-propelled isolation, we boost the performance through automatically purifying raw web faces under massive real-world noise. Besides discriminative feature embedding, we also explore the inverse problem, mapping feature vectors to face images. Without training any additional generator or discriminator, the pre-trained ArcFace model can generate identity-preserved face images for both subjects inside and outside the training data only by using the network gradient and Batch Normalization (BN) priors. Extensive experiments demonstrate that ArcFace can enhance the discriminative feature embedding as well as strengthen the generative face synthesis.
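The following is a compact PyTorch sketch of an additive angular margin (ArcFace-style) loss under the usual formulation with feature and weight normalization; the scale and margin values are common defaults rather than necessarily those of the paper, and the sub-center extension is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    """Minimal additive angular margin softmax loss sketch."""

    def __init__(self, embedding_dim: int, num_classes: int,
                 scale: float = 64.0, margin: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embedding_dim))
        self.scale, self.margin = scale, margin

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # cos(theta) between L2-normalized embeddings and class centres
        cosine = F.linear(F.normalize(embeddings),
                          F.normalize(self.weight)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cosine)
        # add the angular margin m only to the target-class angle
        target_mask = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target_mask, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(self.scale * logits, labels)

# Usage sketch: 512-D embeddings from any backbone, 10 identities.
loss = ArcFaceLoss(512, 10)(torch.randn(4, 512), torch.randint(0, 10, (4,)))
print(loss.item())
```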
Supervised deep-learning-based hashing and vector quantization enable fast and large-scale image retrieval systems. By fully exploiting label annotations, they achieve excellent retrieval performance compared with conventional methods. However, it is painstaking to accurately assign labels to a huge amount of training data, and the annotation process is error-prone. To address these issues, we propose the first deep unsupervised image retrieval method dubbed Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner. We design a Cross Quantized Contrastive learning strategy that jointly learns the codewords and deep visual descriptors by comparing individually transformed images (views). Our method analyses image content to extract descriptive features, allowing us to understand image representations for accurate retrieval. Extensive experiments on benchmarks demonstrate that the method yields state-of-the-art results even without supervised pretraining.
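Below is a hedged PyTorch sketch of the two ingredients the abstract names: a soft product quantizer (sub-vectors replaced by softmax-weighted codeword combinations) and a cross-quantized contrastive loss that contrasts each view's descriptor against the other view's quantized descriptor. Module names, temperatures, and codebook sizes are assumptions for illustration, not the SPQ reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftProductQuantizer(nn.Module):
    """Soft product quantization sketch: split the descriptor into M sub-vectors and
    replace each with a softmax-weighted combination of K learnable codewords."""

    def __init__(self, dim: int = 128, num_books: int = 4, num_words: int = 256,
                 temperature: float = 0.2):
        super().__init__()
        assert dim % num_books == 0
        self.codebooks = nn.Parameter(torch.randn(num_books, num_words, dim // num_books))
        self.num_books, self.temperature = num_books, temperature

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        sub = z.view(z.size(0), self.num_books, -1)                       # (B, M, dim/M)
        sim = torch.einsum('bmd,mkd->bmk',
                           F.normalize(sub, dim=-1),
                           F.normalize(self.codebooks, dim=-1))           # (B, M, K)
        attn = F.softmax(sim / self.temperature, dim=-1)
        quantized = torch.einsum('bmk,mkd->bmd', attn, self.codebooks)
        return quantized.reshape(z.size(0), -1)                           # (B, dim)

def cross_quantized_contrastive(z1, z2, quantizer, tau: float = 0.5):
    """Contrast each view's raw descriptor against the *other* view's quantized one."""
    q1, q2 = quantizer(z1), quantizer(z2)
    labels = torch.arange(z1.size(0))
    logits_12 = F.normalize(z1) @ F.normalize(q2).t() / tau               # (B, B)
    logits_21 = F.normalize(z2) @ F.normalize(q1).t() / tau
    return 0.5 * (F.cross_entropy(logits_12, labels) + F.cross_entropy(logits_21, labels))

# Usage sketch: descriptors of two augmented views of the same batch of images.
quantizer = SoftProductQuantizer()
print(cross_quantized_contrastive(torch.randn(8, 128), torch.randn(8, 128), quantizer).item())
```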
In modern face recognition, the conventional pipeline consists of four stages: detect ⇒ align ⇒ represent ⇒ classify. We revisit both the alignment step and the representation step by employing explicit 3D face modeling in order to apply a piecewise affine transformation, and derive a face representation from a nine-layer deep neural network. This deep network involves more than 120 million parameters using several locally connected layers without weight sharing, rather than the standard convolutional layers. Thus we trained it on the largest facial dataset to-date, an identity labeled dataset of four million facial images belonging to more than 4,000 identities. The learned representations coupling the accurate model-based alignment with the large facial database generalize remarkably well to faces in unconstrained environments, even with a simple classifier. Our method reaches an accuracy of 97.35% on the Labeled Faces in the Wild (LFW) dataset, reducing the error of the current state of the art by more than 27%, closely approaching human-level performance.
There is demographic bias in current models for face recognition (FR). Our Balanced Faces in the Wild (BFW) dataset serves as a proxy to measure bias across ethnicity and gender subgroups, allowing one to characterize FR performance for each subgroup. We show that results are non-optimal when a single score threshold determines whether sample pairs are genuine or impostors. Within subgroups, performance often differs substantially from the global average, so specific error rates only hold for populations matching the validation data. We mitigate the performance imbalance with a new domain-adaptation learning scheme operating on facial features extracted with state-of-the-art neural networks. This technique balances performance and also improves overall performance. A benefit of the proposal is that identity information is preserved in the facial features while the demographic information they contain is reduced. Removing demographic knowledge prevents potential future bias from being injected into decision making. This removal also improves privacy, since less information about an individual is available or can be inferred. We explore this qualitatively; we also show quantitatively that subgroup classifiers can no longer learn from the features produced by the proposed domain-adaptation scheme. For source code and data descriptions, see https://github.com/visionjo/facerec-bias-bfw.
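To illustrate the point that a single global score threshold is suboptimal across subgroups, here is a small NumPy sketch (with synthetic impostor scores, not BFW data) that computes a per-subgroup threshold at a target false match rate and shows how a single global threshold yields very different subgroup FMRs.

```python
import numpy as np

def threshold_at_fmr(impostor_scores: np.ndarray, target_fmr: float = 1e-3) -> float:
    """Smallest score threshold at which the false match rate does not exceed target_fmr."""
    return float(np.quantile(impostor_scores, 1.0 - target_fmr))

# Toy illustration: two demographic subgroups with differently distributed impostor scores.
rng = np.random.default_rng(1)
impostors = {
    "subgroup_a": rng.normal(0.30, 0.10, 50_000),
    "subgroup_b": rng.normal(0.40, 0.10, 50_000),  # systematically higher impostor similarity
}

global_thr = threshold_at_fmr(np.concatenate(list(impostors.values())))
for name, scores in impostors.items():
    fmr_at_global = float(np.mean(scores >= global_thr))
    print(f"{name}: per-subgroup threshold={threshold_at_fmr(scores):.3f}, "
          f"FMR at the single global threshold={fmr_at_global:.4f}")
```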
Collaborative machine learning and related techniques such as federated learning allow multiple participants, each with his own training dataset, to build a joint model by training locally and periodically exchanging model updates. We demonstrate that these updates leak unintended information about participants' training data and develop passive and active inference attacks to exploit this leakage. First, we show that an adversarial participant can infer the presence of exact data points-for example, specific locations-in others' training data (i.e., membership inference). Then, we show how this adversary can infer properties that hold only for a subset of the training data and are independent of the properties that the joint model aims to capture. For example, he can infer when a specific person first appears in the photos used to train a binary gender classifier. We evaluate our attacks on a variety of tasks, datasets, and learning configurations, analyze their limitations, and discuss possible defenses.
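As a heavily simplified sketch of the property-inference idea (not the paper's attack code), the snippet below treats observed model updates as feature vectors and trains a meta-classifier to detect whether a target property was present in the data that produced them; all "updates" here are synthetic stand-ins for flattened gradient vectors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Minimal property-inference sketch: the adversary labels updates it produced itself
# from data with/without the target property, then trains a meta-classifier to
# recognise the property in other participants' updates.
rng = np.random.default_rng(0)
dim, n = 512, 2_000
property_direction = rng.normal(size=dim)

without_property = rng.normal(size=(n, dim))
with_property = rng.normal(size=(n, dim)) + 0.05 * property_direction  # weak per-coordinate shift

X = np.vstack([without_property, with_property])
y = np.concatenate([np.zeros(n), np.ones(n)])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

meta_classifier = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print("property-inference accuracy on held-out updates:", meta_classifier.score(X_test, y_test))
```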
We propose a method for the re-identification of Saimaa ringed seals (Pusa hispida saimensis). Access to large image volumes through camera traps and crowdsourcing provides novel possibilities for animal monitoring and conservation, and calls for automatic analysis methods, particularly for re-identifying individual animals in images. The proposed method, NOvel Ringed seal re-identification by Pelage Pattern Aggregation (NORPPA), exploits the permanent and unique pelage pattern of Saimaa ringed seals together with content-based image retrieval techniques. First, the query image is preprocessed and each seal instance is segmented. Next, the pelage pattern of the seal is extracted using a U-Net encoder-decoder based method. CNN-based affine-invariant features are then embedded and aggregated into Fisher vectors. Finally, the cosine distance between Fisher vectors is used to find the best match from a database of known individuals. We perform extensive experiments with various modifications of the method on our new, challenging Saimaa ringed seal re-identification dataset. In comparison with alternative approaches, the proposed method is shown to produce the best re-identification accuracy on our dataset.
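The final matching step described above reduces to a nearest-neighbour search under cosine distance; a minimal NumPy sketch of that step is given below, with random vectors standing in for the Fisher vectors of known individuals.

```python
import numpy as np

def best_match(query_fv: np.ndarray, database: dict[str, np.ndarray]) -> tuple[str, float]:
    """Return the known individual whose Fisher vector has the smallest cosine distance."""
    def cosine_distance(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    distances = {seal_id: cosine_distance(query_fv, fv) for seal_id, fv in database.items()}
    seal_id = min(distances, key=distances.get)
    return seal_id, distances[seal_id]

# Toy usage with random stand-ins for Fisher vectors of known individuals.
rng = np.random.default_rng(3)
db = {f"seal_{i:03d}": rng.normal(size=4096) for i in range(10)}
query = db["seal_004"] + 0.1 * rng.normal(size=4096)
print(best_match(query, db))  # expected: ("seal_004", small distance)
```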
Image descriptors based on activations of Convolutional Neural Networks (CNNs) have become dominant in image retrieval due to their discriminative power, compactness of representation, and search efficiency. Training of CNNs, either from scratch or fine-tuning, requires a large amount of annotated data, where a high quality of annotation is often crucial. In this work, we propose to fine-tune CNNs for image retrieval on a large collection of unordered images in a fully automated manner. Reconstructed 3D models obtained by the state-of-the-art retrieval and structure-from-motion methods guide the selection of the training data. We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval. CNN descriptor whitening discriminatively learned from the same training data outperforms commonly used PCA whitening. We propose a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and average pooling and show that it boosts retrieval performance. Applying the proposed method to the VGG network achieves state-of-the-art performance on the standard benchmarks: Oxford Buildings, Paris, and Holidays datasets.
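A minimal PyTorch sketch of the GeM pooling layer is given below, following the standard definition in which the pooling exponent p is a learnable parameter; p = 1 recovers average pooling and large p approaches max pooling. The default p = 3 is a common choice, not necessarily the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized-mean pooling: p=1 is average pooling, p -> inf approaches max pooling."""

    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))  # the exponent is trainable
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) activations -> (B, C) global descriptor
        pooled = F.avg_pool2d(x.clamp(min=self.eps).pow(self.p), kernel_size=x.shape[-2:])
        return pooled.pow(1.0 / self.p).flatten(1)

# Usage sketch on a VGG-like feature map.
print(GeM()(torch.randn(2, 512, 7, 7).abs()).shape)  # torch.Size([2, 512])
```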
With the growing demand for hand hygiene and the convenience of use, palmprint recognition has recently seen rapid development, offering an effective solution for person identification. Despite the many efforts devoted to this area, the discriminative power of contactless palmprints is still unclear, especially on large-scale datasets. To address this issue, we construct a large-scale contactless palmprint dataset containing 2,334 palms from 1,167 individuals. To the best of our knowledge, it is the largest contactless palmprint image benchmark ever collected in terms of the numbers of both individuals and palms. In addition, we propose a novel deep-learning framework for contactless palmprint recognition named 3DCPN (3D Convolutional Palmprint recognition Network), which exploits 3D convolutions to dynamically integrate multiple Gabor features. In 3DCPN, a novel Gabor variant is embedded into the first layer to enhance curve feature extraction. Through a carefully designed ensemble scheme, the low-level 3D features are then convolved to extract high-level features. Finally, on top we set a region-based loss function to strengthen the discriminative power of both global and local descriptors. To demonstrate the superiority of our method, extensive experiments are carried out on our dataset as well as on the popular Tongji and IITD databases, where the results show that the proposed 3DCPN achieves state-of-the-art or comparable performance.
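As a rough sketch of the 3D-convolution idea (not the 3DCPN architecture itself), the snippet below applies a small Conv3d stack to Gabor-like filter responses stacked along a depth axis; the filter bank itself is omitted and replaced by a random stand-in tensor, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Responses of a bank of Gabor-like filters at different orientations, stacked along a
# depth axis, are integrated by 3D convolutions. Shape: (batch, 1, orientations, H, W).
gabor_stack = torch.randn(2, 1, 8, 128, 128)   # 8 orientations, 128x128 palm images

integrator = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
    nn.Conv3d(16, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),      # global descriptor for the palm
)
print(integrator(gabor_stack).shape)            # torch.Size([2, 32])
```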
In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimise the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages to ensure a high accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 (with and without Squeeze-and-Excitation blocks) Convolutional Neural Networks on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance over pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on the IJB face recognition datasets, exceeding the previous state-of-the-art by a large margin. The dataset and models are publicly available.
Many existing person re-identification (Re-ID) methods depend on feature maps that are either partitioned to localize the parts of a person or reduced to create a global representation. While part localization has shown significant success, it relies on position-based partitioning or static feature templates. These, however, assume the prior existence of parts in a given image or of their positions, ignoring image-specific information, which limits their usability in challenging scenarios such as Re-ID with partial occlusions or partial probe images. In this paper, we introduce a spatial-attention-based Dynamic Part Template Initialization module, which dynamically generates part templates using mid-level semantic features in the early layers of the backbone. Following layers of self-attention, a simplified cross-attention scheme is used to extract template features of the various human body parts from the backbone's part features, improving the discriminative ability of the overall model. We further explore adaptive weighting of the part descriptors to quantify the absence or occlusion of local attributes and suppress the contribution of the corresponding part descriptors to the matching criterion. Extensive experiments on holistic, occluded, and partial Re-ID task benchmarks demonstrate that our proposed architecture achieves competitive performance. The code will be included in the supplementary material and will be made publicly available.
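The snippet below is a simplified PyTorch sketch of cross-attention between part templates and backbone features, with an adaptive visibility weight per part; unlike the described module, the templates here are static learnable queries rather than being generated dynamically from mid-level features, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class PartTemplateCrossAttention(nn.Module):
    """Sketch: learnable part templates attend over flattened backbone features to
    produce one descriptor per body part, plus an adaptive visibility weight."""

    def __init__(self, dim: int = 256, num_parts: int = 6, num_heads: int = 4):
        super().__init__()
        self.templates = nn.Parameter(torch.randn(num_parts, dim))    # part queries
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.visibility = nn.Linear(dim, 1)  # weight for occluded/missing parts

    def forward(self, feature_map: torch.Tensor):
        # feature_map: (B, C, H, W) from the backbone
        B = feature_map.size(0)
        tokens = feature_map.flatten(2).transpose(1, 2)               # (B, H*W, C)
        queries = self.templates.unsqueeze(0).expand(B, -1, -1)       # (B, P, C)
        part_features, _ = self.cross_attn(queries, tokens, tokens)   # (B, P, C)
        part_weights = torch.sigmoid(self.visibility(part_features))  # (B, P, 1)
        return part_features, part_weights

# Usage sketch.
parts, weights = PartTemplateCrossAttention()(torch.randn(2, 256, 24, 8))
print(parts.shape, weights.shape)  # torch.Size([2, 6, 256]) torch.Size([2, 6, 1])
```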
Biometric technologies, especially face recognition, have become an essential part of identity management systems worldwide. In the deployment of biometrics, it is necessary that biometric information be stored securely in order to protect users' privacy. In this context, biometric cryptosystems are designed to meet the key requirements of biometric information protection, enabling privacy-preserving storage and comparison of biometric data. This work investigates the application of the improved fuzzy vault scheme to facial feature vectors extracted by deep convolutional neural networks. To this end, a feature transformation method is introduced that maps fixed-length real-valued deep feature vectors to integer-valued feature sets. As part of said feature transformation, a detailed analysis of different feature quantization and binarization techniques is conducted. At key binding, the obtained feature sets are locked in an unlinkable improved fuzzy vault. For key retrieval, the efficiency of different polynomial reconstruction techniques is investigated. The proposed feature transformation method and template protection scheme are agnostic of the biometric characteristic. In experiments, an unlinkable, improved deep-face fuzzy-vault-based template protection scheme is constructed, employing features extracted with a state-of-the-art deep convolutional neural network trained with the additive angular margin loss (ArcFace). For the best configuration, a false non-match rate below 1% at a false match rate of 0.01% is achieved in cross-database experiments on the FERET and FRGCv2 face databases. On average, security levels of up to about 28 bits are obtained. This work presents an effective face-based fuzzy vault scheme, providing privacy protection of facial reference data as well as digital key derivation from the face.
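As a hedged illustration of the feature transformation step (mapping real-valued deep features to an integer-valued feature set), the NumPy sketch below quantizes each dimension into a small number of levels and encodes (dimension, level) pairs as set elements; the binning strategy and parameters are assumptions, not the paper's analysed quantization schemes.

```python
import numpy as np

def to_feature_set(deep_feature: np.ndarray, bins: int = 8) -> set[int]:
    """Quantize each real-valued dimension into one of `bins` levels and encode
    (dimension index, level) as a single integer set element."""
    # quantile-based bin edges computed per vector here for brevity; in practice they
    # would be estimated once on a training set so that bins are equally probable
    edges = np.quantile(deep_feature, np.linspace(0, 1, bins + 1)[1:-1])
    levels = np.digitize(deep_feature, edges)           # integer level per dimension
    return {int(i * bins + lvl) for i, lvl in enumerate(levels)}

# Toy usage: two noisy measurements of the same face overlap in many set elements,
# which is what the fuzzy vault's polynomial reconstruction relies on.
rng = np.random.default_rng(7)
reference = rng.normal(size=512)
probe = reference + 0.1 * rng.normal(size=512)
set_a, set_b = to_feature_set(reference), to_feature_set(probe)
print(len(set_a & set_b), "of", len(set_a), "elements agree")
```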
Facial attribute (e.g., age and attractiveness) estimation performance has been greatly improved by using convolutional neural networks. However, existing methods have an inconsistency between the training objectives and the evaluation metric, so they may be suboptimal. In addition, these methods always adopt image classification or face recognition models with a large number of parameters, which carry expensive computational cost and storage overhead. In this paper, we first analyse the essential relationship between two state-of-the-art methods (Ranking-CNN and DLDL) and show that the ranking method is in fact learning a label distribution implicitly. This result thus unifies the two existing state-of-the-art methods into the DLDL framework. Second, in order to alleviate the inconsistency and reduce resource consumption, we design a lightweight network architecture and propose a unified framework that can jointly learn the facial attribute distribution and regress the attribute value. The effectiveness of our method has been demonstrated on both facial age and attractiveness estimation tasks. Our method achieves new state-of-the-art results with a single model, using 36x fewer parameters and 3x faster inference speed on facial age/attractiveness estimation. Moreover, our method can achieve results comparable to the state-of-the-art even when the number of parameters is further reduced to 0.9M (3.8MB disk storage).
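The following PyTorch sketch illustrates the kind of unified objective the abstract describes, assuming the usual formulation: a KL-divergence term towards a discretised Gaussian label distribution plus an L1 term on the age expectation computed from the predicted distribution; the width sigma and weighting lambda are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def dldl_loss(logits: torch.Tensor, true_age: torch.Tensor,
              ages: torch.Tensor, sigma: float = 2.0, lam: float = 1.0) -> torch.Tensor:
    """Joint label-distribution learning and expectation regression sketch."""
    # target distribution: discretised Gaussian over the age labels
    target = torch.exp(-(ages[None, :] - true_age[:, None]) ** 2 / (2 * sigma ** 2))
    target = target / target.sum(dim=1, keepdim=True)

    log_pred = F.log_softmax(logits, dim=1)
    kl = F.kl_div(log_pred, target, reduction='batchmean')

    expected_age = (log_pred.exp() * ages[None, :]).sum(dim=1)   # expectation over labels
    return kl + lam * F.l1_loss(expected_age, true_age)

# Usage sketch: ages 0..100, a batch of 4 predictions from a lightweight network.
ages = torch.arange(0, 101, dtype=torch.float32)
loss = dldl_loss(torch.randn(4, 101), torch.tensor([23.0, 41.0, 67.0, 5.0]), ages)
print(loss.item())
```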
In recent years, periocular recognition has been developed as a valuable biometric method, especially in wild environments (e.g., faces covered by masks due to the COVID-19 pandemic), where face recognition may not be applicable. This paper presents a new deep periocular recognition framework called attribute-based deep periocular recognition (ADPR), which predicts soft biometrics and incorporates the predictions into a periocular recognition algorithm to determine identities from periocular images with high accuracy. We propose an end-to-end framework that uses several shared convolutional neural network (CNN) layers (a common network) whose output feeds two separate dedicated branches (modality-dedicated layers); the first branch classifies periocular images while the second branch predicts soft biometrics. Next, the features from these two branches are fused together for the final periocular recognition. The proposed approach differs from existing methods in that it not only uses a shared CNN feature space to train these two tasks jointly, but also fuses the predicted soft biometric features with the periocular features during training to improve overall periocular recognition performance. Our proposed model is extensively evaluated using four different public datasets. Experimental results indicate that our soft-biometric-based periocular recognition approach outperforms other state-of-the-art methods in wild environments.
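A hedged PyTorch sketch of the described multi-task layout is given below: shared convolutional layers feed an identity branch and a soft-biometric branch, and the two branch features are fused for the final recognition head; every layer size and attribute count here is an illustrative assumption rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class ADPRSketch(nn.Module):
    """Shared convolutional layers feed two dedicated branches (identity classification
    and soft-biometric prediction); the branch features are fused for final recognition."""

    def __init__(self, num_identities: int = 100, num_soft_attrs: int = 4):
        super().__init__()
        self.shared = nn.Sequential(                      # common network
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.id_branch = nn.Sequential(nn.Linear(64, 128), nn.ReLU())   # identity features
        self.soft_branch = nn.Sequential(nn.Linear(64, 32), nn.ReLU())  # soft-biometric features
        self.soft_head = nn.Linear(32, num_soft_attrs)        # e.g. gender/ethnicity predictions
        self.fused_head = nn.Linear(128 + 32, num_identities)  # recognition from fused features

    def forward(self, x: torch.Tensor):
        shared = self.shared(x)
        id_feat, soft_feat = self.id_branch(shared), self.soft_branch(shared)
        soft_pred = self.soft_head(soft_feat)
        identity_logits = self.fused_head(torch.cat([id_feat, soft_feat], dim=1))
        return identity_logits, soft_pred

logits, soft = ADPRSketch()(torch.randn(2, 3, 112, 112))
print(logits.shape, soft.shape)  # torch.Size([2, 100]) torch.Size([2, 4])
```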
Privacy of machine learning models is one of the remaining challenges that hinder the broad adoption of Artificial Intelligence (AI). This paper considers this problem in the context of image datasets containing faces. Anonymization of such datasets is becoming increasingly important due to their central role in the training of autonomous cars, for example, and the vast amount of data generated by surveillance systems. While most prior work de-identifies facial images by modifying identity features in pixel space, we instead project the image onto the latent space of a Generative Adversarial Network (GAN) model, find the features that provide the biggest identity disentanglement, and then manipulate these features in latent space, pixel space, or both. The main contribution of the paper is the design of a feature-preserving anonymization framework, StyleID, which protects the individuals' identity, while preserving as many characteristics of the original faces in the image dataset as possible. As part of the contribution, we present a novel disentanglement metric, three complementing disentanglement methods, and new insights into identity disentanglement. StyleID provides tunable privacy, has low computational complexity, and is shown to outperform current state-of-the-art solutions.