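Deep learning has achieved outstanding performance on face recognition benchmarks, but performance degrades significantly for low-resolution (LR) images. We propose an attention similarity knowledge distillation approach, which transfers attention maps obtained from a high-resolution (HR) network, acting as the teacher, into an LR network, acting as the student, to boost LR recognition performance. Inspired by the human ability to approximate an object's region from an LR image using prior knowledge obtained from HR images, we design the knowledge distillation loss using cosine similarity so that the student network's attention resembles the teacher network's attention. Experiments on various LR face-related benchmarks confirm that the proposed method generally improves recognition performance in LR settings, outperforming state-of-the-art results simply by transferring well-constructed attention maps. The code is publicly available at https://github.com/gist-ailab/teaching-where-to-look.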
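The cosine-similarity distillation loss described in this abstract can be illustrated with a minimal sketch. It assumes that each attention map has been flattened to a list of floats and that the loss is one minus the cosine similarity, averaged over the paired maps; the function names and this exact form are assumptions for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero maps.
    return dot / (max(norm_a, eps) * max(norm_b, eps))


def attention_similarity_loss(teacher_maps, student_maps):
    """Mean of (1 - cosine similarity) over paired attention maps.

    Hypothetical form of the loss: each element of teacher_maps and
    student_maps is one flattened attention map (a list of floats).
    """
    if len(teacher_maps) != len(student_maps):
        raise ValueError("teacher and student must provide the same number of maps")
    losses = [
        1.0 - cosine_similarity(t, s)
        for t, s in zip(teacher_maps, student_maps)
    ]
    return sum(losses) / len(losses)
```

Because cosine similarity ignores the overall scale of a map, this loss only penalizes differences in where the two networks attend, not in how strongly they respond.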
translated by Google Translate
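The great success of deep learning is mainly due to large-scale network architectures and high-quality training data. However, it is still challenging to deploy recent deep models on portable devices with limited memory and imaging ability. Some existing works compress models via knowledge distillation. Unfortunately, these methods cannot deal with images of reduced quality, such as low-resolution (LR) images. To this end, we make a pioneering effort to distill helpful knowledge from a heavy network model learned from high-resolution (HR) images into a compact network model that will handle LR images, thus advancing current knowledge distillation techniques with the novel pixel distillation. To achieve this goal, we propose a Teacher-Assistant-Student (TAS) framework, which decomposes knowledge distillation into a model compression stage and a high-resolution representation transfer stage. By equipping a novel Feature Super Resolution (FSR) module, our approach can learn a lightweight network model that achieves accuracy similar to that of the heavy teacher model, but with fewer parameters, faster inference, and lower-resolution inputs. Comprehensive experiments on three widely used benchmarks, i.e., CUB-200-2011, PASCAL VOC 2007, and ImageNetSub, demonstrate the effectiveness of our approach.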
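Disentangled representations have been commonly adopted for age-invariant face recognition (AiFR) tasks. However, these methods have several limitations: (1) they require large-scale face recognition (FR) training data with age labels, which is limited in practice; (2) they rely on heavy deep network architectures for high performance; and (3) their evaluations are usually conducted on age-related face databases while neglecting the standard large-scale FR databases needed to guarantee robustness. This work presents a novel Lightweight Attentive Angular Distillation (LIAAD) approach to large-scale lightweight AiFR that overcomes these limitations. Given two teachers with different expertise, LIAAD introduces a learning paradigm that efficiently distills age-invariant attentive and angular knowledge from those teachers into a lightweight student network, making it more powerful, with higher FR accuracy and robustness against the age factor. Consequently, LIAAD can take advantage of FR datasets both with and without age labels to train an AiFR model. Unlike prior distillation methods, which mainly focus on accuracy and compression ratios in closed-set problems, LIAAD aims to solve the open-set problem, i.e., large-scale face recognition. Evaluations on LFW, IJB-B and IJB-C Janus, AgeDB, and MegaFace-FGNet demonstrate the efficiency of the proposed approach on lightweight structures. This work also presents a new longitudinal face aging (LogiFace) database (the database will be made available) for further studies of age-related facial problems.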
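Recent improvements in convolutional neural network (CNN)-based single image super-resolution (SISR) methods rely heavily on crafting network architectures rather than on finding a suitable training algorithm beyond simply minimizing the regression loss. Adapting knowledge distillation (KD) can open a way to further improve SISR, and it is also beneficial in terms of model efficiency. KD is a model compression method that improves the performance of deep neural networks (DNNs) without using additional parameters at test time. It has recently drawn attention for providing a better capacity-performance trade-off. In this paper, we propose a novel feature distillation (FD) method suitable for SISR. We show the limitations of the existing FitNet-based FD method, which suffers in the SISR task, and propose to modify the existing FD algorithm to focus on local feature information. In addition, we propose a soft feature attention method based on the teacher-student difference, which selectively focuses on specific pixel locations to extract feature information. We call our method local-selective feature distillation (LSFD) and verify that it outperforms conventional FD methods on SISR problems.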
One of the most efficient methods for model compression is hint distillation, where the student model is injected with information (hints) from several different layers of the teacher model. Although the selection of hint points can drastically alter the compression performance, conventional distillation approaches overlook this fact and use the same hint points as in the early studies. Therefore, we propose a clustering-based hint selection methodology, where the layers of the teacher model are clustered with respect to several metrics and the cluster centers are used as the hint points. Once applied to a chosen teacher network, our method is applicable to any student network. The proposed approach is validated on the CIFAR-100 and ImageNet datasets, using various teacher-student pairs and numerous hint distillation methods. Our results show that the hint points selected by our algorithm result in superior compression performance compared to state-of-the-art knowledge distillation algorithms on the same student models and datasets.
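Knowledge distillation (KD) can effectively transfer knowledge from a cumbersome network (teacher) to a compact network (student), and has demonstrated its advantages in some computer vision applications. The representation of knowledge is vital for knowledge transfer and student learning; it is generally defined in a hand-crafted manner or taken directly from intermediate features. In this paper, we propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task. It provides a more flexible and accurate way to help the teacher transmit knowledge in accordance with the student's abilities, via knowledge representation networks (KRNets) with learnable parameters. To make the knowledge representation more aware of the student's requirements, we propose to solve the transformation from intermediate outputs to transferred knowledge by employing the student features and the correlation between teacher and student in the KRNets. Specifically, texture-aware dynamic kernels are generated, the texture features to be improved are then extracted, and the corresponding teacher guidance is decomposed into texture-wise supervision to further promote the recovery quality of high-frequency details. In addition, the KRNets are optimized in a meta-learning manner to ensure that knowledge transfer and student learning benefit the student's reconstruction quality. Experiments on various single image super-resolution datasets demonstrate that the proposed method outperforms existing distillation methods based on predefined knowledge representations, and can help super-resolution algorithms achieve better reconstruction quality without introducing any inference complexity.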
Electroencephalogram (EEG) has been one of the common neuromonitoring modalities for real-world brain-computer interfaces (BCIs) because of its non-invasiveness, low cost, and high temporal resolution. Recently, lightweight and portable EEG wearable devices based on low-density montages have increased the convenience and usability of BCI applications. However, a loss of EEG decoding performance is often inevitable because a low-density EEG montage has fewer electrodes and covers fewer scalp regions. To address this issue, we introduce knowledge distillation (KD), a learning mechanism developed for transferring knowledge/information between neural network models, to enhance the performance of low-density EEG decoding. Our framework includes a newly proposed similarity-keeping (SK) teacher-student KD scheme that encourages a low-density EEG student model to acquire the same inter-sample similarity as a pre-trained teacher model trained on high-density EEG data. The experimental results validate that our SK-KD framework consistently improves motor-imagery EEG decoding accuracy as the number of electrodes in the input EEG data decreases. For both common low-density headphone-like and headband-like montages, our method outperforms state-of-the-art KD methods across various EEG decoding model architectures. As the first KD scheme developed for enhancing EEG decoding, we foresee the proposed SK-KD framework facilitating the practicality of low-density EEG-based BCI in real-world applications.
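One way to picture matching inter-sample similarity between a teacher and a student is the sketch below. It assumes a formulation in the style of similarity-preserving distillation: build a batch-by-batch Gram matrix of features for each model, L2-normalize every row, and penalize the mean squared difference between the two matrices. The abstract does not give the SK loss itself, so the function names and this exact form are assumptions.

```python
import math


def similarity_matrix(features):
    """Row-normalized Gram matrix of a batch of feature vectors.

    features: list of b vectors (one per sample); returns a b x b matrix
    whose entry (i, j) reflects how similar samples i and j are.
    """
    gram = [
        [sum(x * y for x, y in zip(fi, fj)) for fj in features]
        for fi in features
    ]
    normalized = []
    for row in gram:
        norm = math.sqrt(sum(v * v for v in row))
        normalized.append([v / norm if norm > 0 else 0.0 for v in row])
    return normalized


def similarity_keeping_loss(teacher_feats, student_feats):
    """Mean squared difference between teacher and student similarity matrices.

    Hypothetical form of an inter-sample similarity loss. Teacher and student
    feature dimensions may differ; only the batch size must match.
    """
    b = len(teacher_feats)
    if b != len(student_feats):
        raise ValueError("teacher and student batches must have the same size")
    g_t = similarity_matrix(teacher_feats)
    g_s = similarity_matrix(student_feats)
    squared = sum(
        (g_t[i][j] - g_s[i][j]) ** 2 for i in range(b) for j in range(b)
    )
    return squared / (b * b)
```

A practical consequence of comparing batch-by-batch matrices is that the teacher (many electrodes) and the student (few electrodes) never need features of the same dimensionality.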
Facial action units (FAUs) are critical for fine-grained facial expression analysis. Although FAU detection has been actively studied using ideal, high-quality images, it has not been thoroughly studied under heavily occluded conditions. In this paper, we propose the first occlusion-robust FAU recognition method to maintain FAU detection performance under heavy occlusions. Our novel approach takes advantage of rich information from the latent space of a masked autoencoder (MAE) and transforms it into FAU features. Bypassing the occlusion reconstruction step, our model efficiently extracts FAU features of occluded faces by mining the latent space of a pretrained masked autoencoder. Both node-level and edge-level knowledge distillation are also employed to guide our model to find a mapping between latent space vectors and FAU features. Facial occlusion conditions, including random small patches and large blocks, are thoroughly studied. Experimental results on the BP4D and DISFA datasets show that our method achieves state-of-the-art performance under the studied facial occlusions, significantly outperforming existing baseline methods. In particular, even under heavy occlusion, the proposed method achieves performance comparable to that of state-of-the-art methods under normal conditions.
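Deep neural networks have rapidly become the mainstream method for face recognition (FR). However, the extremely large number of parameters in such models limits their deployment on embedded and low-end devices. In this work, we present an extremely lightweight and accurate FR solution, namely PocketNet. We utilize neural architecture search (NAS) to develop a new family of lightweight face-specific architectures. We also propose a novel training paradigm based on knowledge distillation (KD), the multi-step KD, where knowledge is distilled from the teacher model to the student model at different stages of training maturity. We conduct a detailed ablation study demonstrating both the soundness of using NAS for the specific task of FR rather than general object classification, and the benefits of our proposed multi-step KD. We present an extensive experimental evaluation and comparison with state-of-the-art (SOTA) compact FR models on nine different benchmarks, including large-scale evaluation benchmarks such as IJB-B, IJB-C, and MegaFace. PocketNets consistently advance the SOTA FR performance on nine mainstream benchmarks when considering the same level of model compactness. With 0.92M parameters, our smallest network, PocketNets-128, achieves very competitive results compared to recent SOTA compact models that contain up to 4M parameters.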
Recent years have witnessed the breakthrough of face recognition with deep convolutional neural networks. Dozens of papers in the field of FR are published every year. Some of them have been applied in industry and play an important role in daily life, such as device unlock and mobile payment. This paper provides an introduction to face recognition, including its history, pipeline, algorithms based on conventional manually designed features or deep learning, mainstream training and evaluation datasets, and related applications. We have analyzed and compared as many state-of-the-art works as possible, and have also carefully designed a set of experiments to find the effect of backbone size and data distribution. This survey serves as material for the tutorial "The Practical Face Recognition Technology in the Industrial World" at FG2023.
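Transformers have attracted much attention due to their ability to learn global relations and their superior performance. To achieve higher performance, it is natural to distill complementary knowledge from Transformers into convolutional neural networks (CNNs). However, most existing knowledge distillation methods only consider homologous-architecture distillation, such as distilling knowledge from CNN to CNN. They may not be suitable when applied to cross-architecture scenarios, such as from Transformer to CNN. To deal with this problem, a novel cross-architecture knowledge distillation method is proposed. Specifically, instead of directly mimicking the teacher's output or intermediate features, a partially cross attention projector and a group-wise linear projector are introduced to align the student features with the teacher's. A multi-view robust training scheme is further presented to improve the robustness and stability of the framework. Extensive experiments show that the proposed method outperforms 14 state-of-the-art methods on both small-scale and large-scale datasets.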
In this paper, we aim to address the large domain gap between high-resolution face images, e.g., from professional portrait photography, and low-quality surveillance images, e.g., from security cameras. Establishing an identity match between disparate sources like these is a classical surveillance face identification scenario, which continues to be a challenging problem for modern face recognition techniques. To that end, we propose a method that combines face super-resolution, resolution matching, and multi-scale template accumulation to reliably recognize faces from long-range surveillance footage, including from low-quality sources. The proposed approach does not require training or fine-tuning on the target dataset of real surveillance images. Extensive experiments show that our proposed method outperforms even existing methods fine-tuned on the SCFace dataset.
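The emergence of the global COVID-19 pandemic poses new challenges for biometrics. Not only are contactless biometric options becoming more important, but face recognition has also recently been confronted with the frequent wearing of masks. These masks affect the performance of previous face recognition systems, as they hide important identity information. In this paper, we propose a mask-invariant face recognition solution (MaskInv) that utilizes template-level knowledge distillation within a training paradigm that aims to produce embeddings of masked faces that are similar to those of non-masked faces of the same identities. In addition to the distilled knowledge, the student network benefits from additional guidance by a margin-based identity classification loss, ElasticFace, using masked and non-masked faces. In a step-wise ablation study on two real masked face databases and five mainstream databases with synthetic masks, we justify the design of our MaskInv approach. Our proposed solution outperforms previous state-of-the-art (SOTA) academic solutions in the recent MFRC-21 challenge in both scenarios, masked vs. masked and masked vs. non-masked, and also outperforms the previous solution on the MFR2 dataset. Furthermore, we demonstrate that the proposed model still performs well on unmasked faces, with only a minor loss in verification performance. The code, the trained models, and the evaluation protocol on the synthetically masked data are publicly available: https://github.com/fdbtrs/masked-face-recognition-kd.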
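Knowledge distillation (KD) has shown very promising capabilities in transferring learned representations from large models (teachers) to small models (students). However, as the capacity gap between students and teachers becomes larger, existing KD methods fail to achieve better results. Our work shows that "prior knowledge" is vital to KD, especially when applying large teachers. In particular, we propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before feature distillation. This means that our method also takes the teacher's features as "input", not just as "target". In addition, we dynamically adjust the ratio of the prior knowledge during the training phase according to the feature gap, thus guiding the student at an appropriate difficulty. To evaluate the proposed method, we conduct extensive experiments on two image classification benchmarks (i.e., CIFAR100 and ImageNet) and an object detection benchmark (i.e., MS COCO). The results demonstrate the superiority of our method in performance under varying settings. More importantly, DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers. Our code will be made publicly available for reproducibility.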
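Face recognition networks generally demonstrate bias with respect to sensitive attributes such as gender and skintone. For gender and skintone, we observe that the regions of the face that a network attends to vary by the category of the attribute. This might contribute to bias. Building on this intuition, we propose a novel distillation-based approach called Distill and De-bias (D&D) to enforce a network to attend to similar face regions, irrespective of the attribute category. In D&D, we train a teacher network on images from one category of an attribute, e.g., light skintone. Then, distilling information from the teacher, we train a student network on images of the remaining category, e.g., dark skintone. A feature-level distillation loss constrains the student network to generate teacher-like representations. This allows the student network to attend to similar face regions for all attribute categories and enables it to reduce bias. We also propose a second distillation step on top of D&D, called D&D++. For the D&D++ network, we distill the "un-biasedness" of the D&D network into a new student network, the D&D++ network. We train the new network on all attribute categories, e.g., both light and dark skintones. This helps us train a network that is less biased with respect to an attribute, while obtaining higher face verification performance than D&D. We show that D&D++ outperforms existing baselines in reducing gender and skintone bias on the IJB-C dataset, while obtaining higher face verification performance than existing adversarial de-biasing methods. We evaluate the effectiveness of our proposed methods on two state-of-the-art face recognition networks: Crystalface and ArcFace.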
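Knowledge distillation has made remarkable achievements in model compression. However, most existing methods require the original training data, while in practice real data are often unavailable due to privacy, security, and transmission limitations. To address this problem, we propose a conditional generative data-free knowledge distillation (CGDD) framework for training efficient portable networks without any real data. In this framework, in addition to using the knowledge extracted from the teacher model, we introduce preset labels as auxiliary information to train the generator. The trained generator can then produce meaningful training samples of specified categories as required. To promote the distillation process, in addition to using the conventional distillation loss, we treat the preset labels as ground-truth labels so that the student network is directly supervised by the categories of the synthetic training samples. Moreover, we force the student network to mimic the attention maps of the teacher model, which further improves its performance. To verify the superiority of our method, we design a new evaluation metric called relative accuracy that can directly compare the effectiveness of different distillation methods. Portable networks trained with the proposed data-free distillation method obtain 99.63%, 99.07%, and 99.84% relative accuracy on CIFAR10, CIFAR100, and Caltech101, respectively. The experimental results demonstrate the superiority of the proposed method.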
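State-of-the-art deep neural network models have reached near-perfect face recognition accuracy on controlled high-resolution face images. However, their performance degrades drastically when they are tested on very low-resolution face images. This is particularly critical in surveillance systems, where a low-resolution probe image must be matched against high-resolution gallery images. Super-resolution techniques aim to produce high-resolution face images from low-resolution counterparts. While they are capable of reconstructing visually appealing images, the identity-related information is not preserved. Here, we propose an identity-preserving, end-to-end image-to-image translation deep neural network that is capable of super-resolving very low-resolution faces to their high-resolution counterparts while preserving identity-related information. We achieve this by training a very deep convolutional encoder-decoder network with a symmetric contracting path between corresponding layers. The network is trained with a combination of a reconstruction loss and an identity-preserving loss under multi-scale low-resolution conditions. Extensive quantitative evaluations of our proposed model demonstrate that it outperforms competing super-resolution and low-resolution face recognition methods on natural and artificial low-resolution face datasets, and even on unseen identities.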
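Recent deep learning models have achieved high performance in speech enhancement. However, it is still challenging to obtain a fast, low-complexity model without significant performance degradation. Previous knowledge distillation studies on speech enhancement could not solve this problem because their output distillation methods do not fit the speech enhancement task in some respects. In this study, we propose multi-view attention transfer (MV-AT), a feature-based distillation method, to obtain efficient speech enhancement models in the time domain. Based on a multi-view feature extraction model, MV-AT transfers the multi-view knowledge of the teacher network to the student network without additional parameters. The experimental results show that the proposed method consistently improves the performance of student models of various sizes on the Valentini and deep noise suppression (DNS) datasets. MANNER-S-8.1GF trained with our proposed method, a lightweight model for efficient deployment, requires 15.4x fewer parameters and 4.71x fewer floating-point operations (FLOPs) than a baseline model with similar performance.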
Attention plays a critical role in human visual experience. Furthermore, it has recently been demonstrated that attention can also play an important role when applying artificial neural networks to a variety of tasks in fields such as computer vision and NLP. In this work we show that, by properly defining attention for convolutional neural networks, we can use this type of information to significantly improve the performance of a student CNN by forcing it to mimic the attention maps of a powerful teacher network. To that end, we propose several novel methods of transferring attention, showing consistent improvement across a variety of datasets and convolutional neural network architectures. Code and models for our experiments are available at https://github.com/szagoruyko/attention-transfer.
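As a rough illustration of what "mimicking attention maps" can mean for a CNN, the sketch below uses one common activation-based definition: sum the squared activations over channels at each spatial position, L2-normalize the resulting map, and take the L2 distance between the teacher's and the student's maps. The abstract mentions several transfer methods without detailing them, so this particular definition and the function names are assumptions.

```python
import math


def attention_map(activations):
    """Activation-based spatial attention map, L2-normalized.

    activations: list of C channels, each a flat list of N spatial values.
    Returns a list of N values: the channel-wise sum of squared activations,
    scaled to unit L2 norm.
    """
    n = len(activations[0])
    raw = [sum(channel[i] ** 2 for channel in activations) for i in range(n)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm if norm > 0 else 0.0 for v in raw]


def attention_transfer_loss(teacher_act, student_act):
    """L2 distance between normalized teacher and student attention maps.

    Hypothetical sketch: channel counts may differ between the two networks,
    but the spatial size N must match.
    """
    q_t = attention_map(teacher_act)
    q_s = attention_map(student_act)
    if len(q_t) != len(q_s):
        raise ValueError("teacher and student maps must share spatial size")
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(q_t, q_s)))
```

Collapsing the channel dimension is what lets a narrow student be compared with a wide teacher: only the spatial layout of the response has to agree.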
Convolutional Neural Networks (CNNs) have been used in various fields and have demonstrated excellent performance, especially in Single-Image Super Resolution (SISR). However, recent CNN-based SISR models require numerous parameters and high computational cost to obtain better performance. As one way to make networks efficient, Knowledge Distillation (KD), which optimizes the performance trade-off by adding a loss term to the existing network architecture, is currently being studied. KD for SISR is mainly proposed as feature distillation (FD), which minimizes the L1-distance loss between the feature maps of the teacher and student networks, but it does not fully take into account the amount and importance of information that the student can accept. In this paper, we propose a feature-based adaptive contrastive distillation (FACD) method for efficiently training lightweight SISR networks. We show the limitations of the existing FD with L1-distance loss, and propose a feature-based contrastive loss that maximizes the mutual information between the feature maps of the teacher and student networks. The experimental results show that the proposed FACD improves not only the PSNR across all benchmark datasets and scales but also the subjective image quality compared to the conventional FD approach.
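The contrast between an L1 feature loss and a contrastive feature loss can be sketched as follows. The contrastive part here is a generic InfoNCE-style loss (a standard lower bound on mutual information) in which each student feature treats the teacher feature at the same index as its positive and all other teacher features as negatives. It is not the adaptive FACD loss, whose details the abstract does not give; the function names and the temperature value are assumptions.

```python
import math


def l1_feature_loss(teacher, student):
    """Mean absolute difference between two flattened feature maps."""
    return sum(abs(t - s) for t, s in zip(teacher, student)) / len(teacher)


def cosine(a, b, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (max(norm_a, eps) * max(norm_b, eps))


def contrastive_feature_loss(teacher_feats, student_feats, temperature=0.1):
    """Generic InfoNCE-style loss over paired teacher/student features.

    For student feature i, teacher feature i is the positive and every other
    teacher feature is a negative. Hypothetical sketch only.
    """
    total = 0.0
    for i, s in enumerate(student_feats):
        logits = [cosine(s, t) / temperature for t in teacher_feats]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denominator - logits[i]
    return total / len(student_feats)
```

Unlike the L1 loss, which pulls every student value toward the teacher's, the contrastive loss only asks that each student feature be closer to its own teacher feature than to the others.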