High-quality annotated images are essential to deep facial expression recognition (FER) methods. However, uncertain labels, which are common in large-scale public datasets, often mislead the training process. In this paper, we propose ULC-AG, a method that corrects uncertain facial expression labels using auxiliary action unit (AU) graphs. Specifically, a weighted regularization module is introduced to highlight valid samples and suppress category imbalance in every batch. Based on the latent dependency between emotions and AUs, an auxiliary branch using graph convolutional layers is added to extract semantic information from graph topologies. Finally, a re-labeling strategy corrects ambiguous annotations by comparing their feature similarities with semantic templates. Experiments show that ULC-AG achieves 89.31% and 61.57% accuracy on the RAF-DB and AffectNet datasets, respectively, outperforming the baseline and state-of-the-art methods.
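The re-labeling step described above can be illustrated with a minimal sketch. It assumes each class has one template vector and that a label is changed only when another template is clearly more similar; the function names, the cosine measure, and the `margin` hyperparameter are hypothetical choices for illustration, not details taken from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relabel(feature, label, templates, margin=0.2):
    """Return a possibly corrected label for one sample.

    templates: dict mapping class id -> template vector (hypothetical input).
    margin: hypothetical hyperparameter; the label changes only when another
            class template beats the current one by more than this amount.
    """
    sims = {c: cosine(feature, t) for c, t in templates.items()}
    best = max(sims, key=sims.get)
    if best != label and sims[best] - sims[label] > margin:
        return best
    return label


templates = {0: [1.0, 0.0], 1: [0.0, 1.0]}
print(relabel([0.1, 0.9], label=0, templates=templates))  # feature lies near template 1
print(relabel([0.9, 0.1], label=0, templates=templates))  # label already agrees
```

The margin keeps borderline samples on their original label, so only annotations that strongly disagree with the templates are rewritten.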
Translated by Google Translate
Deep models for facial expression recognition achieve high performance by training on large-scale labeled data. However, publicly available datasets contain uncertain facial expressions caused by ambiguous annotations or confusing emotions, which can severely degrade robustness. Previous studies usually follow the bias elimination methods used in general tasks, without considering the uncertainty problem from the perspective of its different sources. In this paper, we propose a novel multi-task assisted correction method, called MTAC, for uncertain facial expression recognition. Specifically, a confidence estimation block and a weighted regularization module are applied to highlight solid samples and suppress uncertain samples in every batch. In addition, two auxiliary tasks, i.e., action unit detection and valence-arousal measurement, are introduced to learn semantic distributions from a data-driven AU graph and to mitigate category imbalance based on latent dependencies between discrete and continuous emotions, respectively. Moreover, a re-labeling strategy guided by a feature-level similarity constraint generates new labels for identified uncertain samples to promote model learning. The proposed method can be flexibly combined with existing frameworks in a fully-supervised or weakly-supervised manner. Experiments on the RAF-DB, AffectNet, and AffWild2 datasets demonstrate that MTAC obtains substantial improvements over baselines when facing synthetic and real uncertainties and outperforms state-of-the-art methods.
Facial expression recognition is a challenging problem in computer vision. The primary reasons are class imbalance due to data collection and uncertainty due to inherent noise such as fuzzy facial expressions and inconsistent labels. However, current research has focused either on class imbalance or on uncertainty, ignoring how to address the two problems together. Therefore, in this paper, we propose a framework based on ResNet and attention to solve both problems. We design a weight for each class. Through this penalty mechanism, our model pays more attention to classes with few samples during training, and the resulting decrease in model accuracy can be recovered with a Convolutional Block Attention Module (CBAM). Meanwhile, our backbone network also learns an uncertain feature for each sample. By mixing uncertain features between samples, the model can better learn the features that are useful for classification, thus suppressing uncertainty. Experiments show that our method surpasses most baseline methods in accuracy on facial expression datasets (e.g., AffectNet, RAF-DB), and that it also handles class imbalance well.
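As a rough illustration of per-class weighting, the sketch below uses inverse-frequency weights inside a weighted cross-entropy. The abstract does not state how its weights are computed, so the inverse-frequency rule and both function names are assumptions.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights (assumed rule); a balanced set gives 1.0 everywhere."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood over a batch.

    probs: per-sample probability lists (already softmax-normalised).
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


labels = [0, 0, 0, 1]                      # class 1 is the minority class
probs = [[0.9, 0.1]] * 3 + [[0.4, 0.6]]    # the minority sample is predicted less confidently
weights = class_weights(labels)
uniform = {0: 1.0, 1: 1.0}
print(weights)
print(weighted_cross_entropy(probs, labels, uniform))
print(weighted_cross_entropy(probs, labels, weights))
```

With these toy numbers the weighted loss is larger than the uniform one, because the error on the single minority-class sample counts for more.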
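Real-world facial expression recognition (FER) datasets suffer from noisy annotations due to crowd-sourcing, ambiguity in expressions, the subjectivity of annotators, and inter-class similarity. However, recent deep networks have a strong capacity to memorize noisy annotations, which leads to corrupted feature embeddings and poor generalization. To handle noisy annotations, we propose a dynamic FER learning framework (DNFER) in which clean samples are selected based on a dynamic class-specific threshold during training. Specifically, DNFER combines supervised training on the selected clean samples with unsupervised training on all samples. During training, the mean posterior class probabilities of each mini-batch are used as the dynamic class-specific threshold to select clean samples for supervised training. This threshold is independent of the noise rate and, unlike other methods, does not require any clean data. In addition, to learn from all samples, the posterior distributions of weakly augmented and strongly augmented images are aligned using an unsupervised consistency loss. We demonstrate the robustness of DNFER on FER datasets with both synthetic and real noisy annotations, such as RAFDB, FERPlus, SFEW, and AffectNet.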
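The clean-sample selection rule can be sketched in a few lines. This is one reading of the abstract: the threshold for a class is the mean posterior probability of that class over the mini-batch, and a sample counts as clean when the posterior of its given label reaches that threshold. The function name and the toy numbers are hypothetical.

```python
def select_clean(probs, labels):
    """Return indices of samples treated as clean in one mini-batch.

    probs: per-sample posterior probability lists from the model.
    labels: given (possibly noisy) integer labels.
    """
    n, k = len(probs), len(probs[0])
    # Dynamic class-specific threshold: mean posterior of each class over the batch.
    thresholds = [sum(p[c] for p in probs) / n for c in range(k)]
    return [i for i, (p, y) in enumerate(zip(probs, labels)) if p[y] >= thresholds[y]]


probs = [[0.8, 0.2], [0.7, 0.3], [0.3, 0.7], [0.9, 0.1]]
labels = [0, 0, 1, 1]   # the last label disagrees with the model's posterior
print(select_clean(probs, labels))
```

Because the thresholds are recomputed from every mini-batch, no noise rate or clean subset has to be supplied, which matches the property claimed in the abstract.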
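Despite significant progress over the past few years, ambiguity remains a key challenge in facial expression recognition (FER). It can lead to noisy and inconsistent annotations, which hinder the performance of deep learning models in real-world scenarios. In this paper, we propose a new uncertainty-aware label distribution learning method to improve the robustness of deep models against uncertainty and ambiguity. We leverage neighborhood information in the valence-arousal space to adaptively construct emotion distributions for training samples. We also consider the uncertainty of the provided labels when incorporating them into the label distributions. Our method can be easily integrated into a deep network to obtain more training supervision and improve recognition accuracy. Intensive experiments on several datasets under various noisy and ambiguous settings show that our method achieves competitive results and outperforms recent state-of-the-art approaches. Our code and models are available at https://github.com/minhnhatvt/label-distribution-learning-fer-tf.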
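Noisy-label facial expression recognition (FER) is more challenging than traditional noisy-label classification tasks because of inter-class similarity and annotation ambiguity. Recent works mainly tackle this problem by filtering out large-loss samples. In this paper, we explore noisy labels from a new feature-learning perspective. We find that FER models memorize noisy samples by focusing on a part of the features that can be considered related to the noisy labels, instead of learning from the whole features that lead to the latent truth. Inspired by this, we propose a novel Erasing Attention Consistency (EAC) method to suppress noisy samples automatically. Specifically, we first utilize the flip semantic consistency of facial images to design an imbalanced framework. We then randomly erase input images and use flip attention consistency to prevent the model from focusing on a part of the features. EAC significantly outperforms state-of-the-art noisy-label FER methods and generalizes well to other tasks with a large number of classes, such as CIFAR100 and Tiny-ImageNet. The code is available at https://github.com/zyh-uaiaaaa/erasing-prestention-consistency.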
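Flip attention consistency can be pictured as a penalty on the mismatch between the attention map of an image and the attention map of its horizontally flipped copy, once the latter is flipped back. The sketch below is a simplified illustration on plain 2-D lists; the mean-squared-error form and the function names are assumptions and are not taken from the released code.

```python
def hflip(attention_map):
    """Horizontally flip a 2-D map stored as a list of rows."""
    return [row[::-1] for row in attention_map]


def flip_consistency_loss(att_orig, att_flipped):
    """Mean squared difference after aligning the flipped map with the original."""
    aligned = hflip(att_flipped)
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(att_orig, aligned)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


att = [[0.1, 0.9],
       [0.2, 0.8]]
print(flip_consistency_loss(att, hflip(att)))  # consistent attention: no penalty
print(flip_consistency_loss(att, att))         # attention ignored the flip: penalised
```

A model whose attention follows the face when the image is mirrored incurs no penalty, while attention that stays fixed on one side of the image is penalised.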
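Facial expression recognition (FER) suffers from data uncertainties caused by ambiguous facial images and annotators' subjectivity, which result in semantic shift and feature covariate shift problems. Existing works usually correct mislabeled data by estimating the noise distribution, or guide network training with knowledge learned from clean data, and thereby neglect the associative relations between expressions. In this work, we propose an Adaptive Graph-based Feature Normalization (AGFN) method that protects FER models from data uncertainties by normalizing feature distributions with the association of expressions. Specifically, we propose a Poisson graph generator that adaptively constructs topological graphs for the samples in each mini-batch via a sampling process, and we design a corresponding coordinate descent strategy to optimize the proposed network. Our method outperforms state-of-the-art works with accuracies of 91.84% and 91.11% on the benchmark datasets FERPlus and RAF-DB, respectively. When the percentage of mislabeled data increases (e.g., to 20%), our network surpasses existing works significantly, by 3.38% and 4.52%.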
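Facial valence/arousal, expression, and action units are related tasks in facial affective analysis. However, these tasks achieve only limited performance in the wild because of the varied collection conditions. The 4th competition on Affective Behavior Analysis in-the-wild (ABAW) provides images with valence/arousal, expression, and action unit labels. In this paper, we introduce a multi-task learning framework to enhance the performance of the three related tasks in the wild. Feature sharing and label fusion are used to exploit their relations. We conduct experiments on the provided training and validation data.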
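Facial behavior analysis is a broad topic with various categories, such as facial emotion recognition, age and gender recognition, … Many studies focus on individual tasks, while the multi-task learning approach remains open and requires more research. In this paper, we present our solution and experimental results for the Multi-Task Learning challenge of the Affective Behavior Analysis in-the-wild competition. The challenge is a combination of three tasks: action unit detection, facial expression recognition, and valence-arousal estimation. To address this challenge, we introduce a cross-attentive module to improve multi-task learning performance. In addition, a facial graph is applied to capture the associations among action units. As a result, we achieve an evaluation measure of 1.24 on the validation data provided by the organizers, which is better than the baseline result of 0.30.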
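This paper presents our methodology for the six basic expression classification track of the Affective Behavior Analysis in-the-wild (ABAW) Competition 2022. The task is to learn expression representations from artificially generated data and generalize to real data. Because of the ambiguity of synthetic data and the objectivity of facial action units (AUs), we resort to AU information to improve performance and make the following contributions. First, to adapt the model to synthetic scenarios, we use knowledge from models pre-trained on large-scale face recognition data. Second, we propose a conceptually new framework, called AU-Supervised Convolutional Vision Transformers (AU-CVT), which clearly improves FER performance by jointly training on auxiliary datasets with AU or pseudo-AU labels. Our AU-CVT achieves an F1 score of 0.6863 and an accuracy of 0.7433 on the validation set. The source code of our work is publicly available online: https://github.com/msy1412/abaw4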
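Prior work has shown that the order in which different components of the face are learned by a sequential learner can play an important role in the performance of facial expression recognition systems. We propose FaceTopoNet, an end-to-end deep model for facial expression recognition that is capable of learning an effective tree topology of the face. Our model then traverses the learned tree to generate a sequence, which is used to form an embedding that is fed to a sequential learner. The model adopts one stream for learning structure and one stream for learning texture. The structure stream focuses on the positions of the facial landmarks, while the texture stream focuses on the patches around the landmarks to learn textural information. We then fuse the outputs of the two streams with an effective attention-based fusion strategy. We perform extensive experiments on four large-scale in-the-wild facial expression datasets, namely AffectNet, FER2013, ExpW, and RAF-DB, and on one lab-controlled dataset (CK+) to evaluate our approach. FaceTopoNet achieves state-of-the-art performance on three of the five datasets and obtains competitive results on the other two. We also perform rigorous ablation and sensitivity experiments to evaluate the impact of different components and parameters in our model. Finally, we perform robustness experiments and demonstrate that FaceTopoNet is more robust to occlusions than other leading methods in the area.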
Recent years have witnessed the breakthrough of face recognition (FR) with deep convolutional neural networks. Dozens of papers in the field of FR are published every year. Some of them have been applied in industry and play an important role in daily life, for example in device unlocking and mobile payment. This paper provides an introduction to face recognition, including its history, pipeline, algorithms based on conventional manually designed features or deep learning, mainstream training and evaluation datasets, and related applications. We have analyzed and compared as many state-of-the-art works as possible, and we have also carefully designed a set of experiments to examine the effect of backbone size and data distribution. This survey accompanies the tutorial named The Practical Face Recognition Technology in the Industrial World at FG2023.
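To address the problem of data inconsistencies among different facial expression recognition (FER) datasets, many cross-domain FER methods (CD-FERs) have been devised in recent years. Although each claims to achieve superior performance, fair comparisons are lacking because of inconsistent choices of source/target datasets and feature extractors. In this work, we first analyze the performance effect caused by these inconsistent choices, and then re-implement some well-performing CD-FER methods and recently published domain adaptation algorithms. We ensure that all these algorithms adopt the same source datasets and feature extractors for fair CD-FER evaluation. We find that most of the current leading algorithms use adversarial learning to learn holistic domain-invariant features to mitigate domain shift. However, these algorithms ignore local features, which are more transferable across different datasets and carry more detailed content for fine-grained adaptation. To address these issues, we integrate graph representation propagation with adversarial learning for cross-domain holistic-local feature co-adaptation by developing a novel adversarial graph representation adaptation (AGRA) framework. Specifically, it first builds two graphs to correlate holistic and local regions within each domain and across different domains, respectively. Then, it extracts holistic-local features from the input image and uses learnable per-class statistical distributions to initialize the corresponding graph nodes. Finally, two stacked graph convolution networks (GCNs) are adopted to propagate holistic-local features within each domain to explore their interaction, and across different domains for holistic-local feature co-adaptation. We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.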
Understanding the facial expressions of our interlocutor is important to enrich communication and to give it a depth that goes beyond what is explicitly expressed. In fact, studying a person's facial expression gives insight into their hidden emotional state. However, even as humans, and despite our empathy and familiarity with the human emotional experience, we are only able to guess what the other person might be feeling. In the fields of artificial intelligence and computer vision, Facial Emotion Recognition (FER) is a topic still in full growth, driven mostly by advances in deep learning approaches and improvements in data collection. The main purpose of this paper is to compare the performance of three state-of-the-art networks, each with its own approach to improving FER, on three FER datasets. The first and second sections describe, respectively, the three datasets and the three studied network architectures designed for an FER task. The experimental protocol, the results, and their interpretation are outlined in the remaining sections.
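In this paper, we advance the performance of facial expression recognition (FER) by exploiting omni-supervised learning. The current state of the art in FER usually aims to recognize facial expressions in controlled environments by training models with a limited number of samples. To enhance the robustness of the learned models in various scenarios, we propose to perform omni-supervised learning by exploiting labeled samples together with a large amount of unlabeled data. In particular, we first employ MS-Celeb-1M as the facial pool, which includes around 5,822K unlabeled facial images. Then, a primitive model trained on a small number of labeled samples is adopted to select samples with high confidence by conducting feature-based similarity comparison. We find that the new dataset constructed in such an omni-supervised manner can significantly improve the generalization ability of the learned FER model and consequently boost its performance. However, as more training samples are used, more computational resources and training time are required, which is often not affordable. To relieve the requirement for computational resources, we further adopt a dataset distillation strategy to distill the target task-related knowledge from the newly mined samples and compress it into a very small set of images. This distilled dataset is capable of boosting FER performance at little additional computational cost. We perform extensive experiments on five popular benchmarks and a newly constructed dataset, where consistent gains are achieved under various settings using the proposed framework. We hope this work will serve as a solid baseline and help ease future research in FER.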
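Human emotion recognition is an active research area in artificial intelligence and has made substantial progress over the past few years. Many recent works focus mainly on facial regions to infer human emotion, while the surrounding context information is not effectively utilized. In this paper, we propose a new deep network that effectively recognizes human emotions using a novel global-local attention mechanism. Our network is designed to extract features from the facial and context regions independently, and then to learn them together using the attention module. In this way, both facial and contextual information is used to infer human emotions, which enhances the discriminative power of the classifier. Intensive experiments show that our method surpasses the current state-of-the-art methods on recent emotion datasets by a fair margin. Qualitatively, our global-local attention module extracts more meaningful attention maps than previous methods. The source code and trained model of our network are available at https://github.com/minhnhatvt/glamor-net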
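Most publicly available datasets for image classification are single-label, while images in daily life are inherently multi-labeled. This annotation gap makes many pre-trained single-label classification models fail in practical scenarios. The issue is of even greater concern for aerial images: aerial data collected from sensors naturally cover a relatively large land area with multiple labels, while the widely available annotated aerial datasets (e.g., UCM, AID) are single-labeled. As manually annotating multi-label aerial images would be time- and labor-consuming, we propose a novel self-correction integrated domain adaptation (SCIDA) method for automatic multi-label learning. SCIDA is weakly supervised, i.e., it automatically learns a multi-label image classification model from massive, publicly available single-label images. To achieve this goal, we propose a novel Label-Wise self-Correction (LWC) module to better explore underlying label correlations. This module also makes unsupervised domain adaptation (UDA) from single-label to multi-label data possible. For model training, the proposed model uses only single-label information and requires no prior knowledge of multi-labeled data; it predicts labels for multi-label aerial images. In our experiments, the proposed model is trained on the single-labeled MAI-AID-S and MAI-UCM-S datasets and tested directly on our collected Multi-scene Aerial Image (MAI) dataset.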
As one of the most important psychic stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Automatic ME recognition (MER) is therefore becoming increasingly crucial in the field of affective computing, and it provides essential technical support for lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been built to alleviate this problem, the amount of available data is still small. To address this ME data hunger, we construct DFME (Dynamic Facial Micro-expressions), a dynamic spontaneous ME dataset with the largest current ME data scale, which includes 7,526 well-labeled ME videos elicited from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
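Facial affect analysis remains a challenging task, with settings that range from lab-controlled to in-the-wild situations. In this paper, we present novel frameworks to handle two challenges of the 4th Affective Behavior Analysis in-the-wild (ABAW) competition: i) the Multi-Task Learning (MTL) Challenge and ii) the Learning from Synthetic Data (LSD) Challenge. For the MTL challenge, we adopt SMM-EmotionNet with a better feature-vector strategy. For the LSD challenge, we propose respective methods to deal with single labels, imbalanced distribution, fine-tuning limitations, and the choice of model architecture. Experimental results on the official validation sets of the competition demonstrate that our proposed approaches outperform the baselines. The code is available at https://github.com/sylyoung/abaw4-hust-ant.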
Facial Expression Recognition (FER) in the wild is an extremely challenging task. Recently, some Vision Transformers (ViT) have been explored for FER, but most of them underperform Convolutional Neural Networks (CNN). This is mainly because the newly proposed modules are difficult to train to convergence from scratch, since they lack inductive bias, and because they tend to focus on occluded and noisy areas. TransFER, a representative transformer-based method for FER, alleviates this with multi-branch attention dropping but incurs excessive computation. In contrast, we present two attentive pooling (AP) modules that pool noisy features directly. The AP modules include Attentive Patch Pooling (APP) and Attentive Token Pooling (ATP). They aim to guide the model to emphasize the most discriminative features while reducing the impact of less relevant features. The proposed APP selects the most informative patches on CNN features, and ATP discards unimportant tokens in ViT. Being simple to implement and free of learnable parameters, APP and ATP intuitively reduce the computational cost while boosting performance by pursuing only the most discriminative features. Qualitative results demonstrate the motivation and effectiveness of our attentive poolings. In addition, quantitative results on six in-the-wild datasets show that our method outperforms other state-of-the-art methods.
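A parameter-free token pooling step of the kind described above can be sketched as keeping only the highest-scoring tokens. The abstract does not say how tokens are scored, so the use of a per-token attention score, the `keep_ratio` parameter, and the function name are assumptions made for illustration.

```python
def attentive_token_pooling(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order.

    tokens: list of token feature vectors (class token excluded).
    scores: one attention score per token, e.g. class-token attention
            (an assumption; the scoring rule is not given in the abstract).
    keep_ratio: hypothetical fraction of tokens to keep.
    """
    k = max(1, int(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


tokens = [[1.0], [2.0], [3.0], [4.0]]
scores = [0.10, 0.50, 0.05, 0.35]
print(attentive_token_pooling(tokens, scores))  # keeps the second and fourth tokens
```

Since the step only ranks and drops tokens, it adds no learnable parameters, and every later layer processes fewer tokens.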