Face forgery detection methods based on convolutional neural networks achieve remarkable results during training but struggle to maintain comparable performance at test time. We observe that detectors tend to focus on content information rather than artifact traces, which indicates that they are sensitive to the intrinsic bias of the dataset and leads to severe overfitting. Motivated by this key observation, we design an easily embeddable disentanglement framework to remove content information, and further propose a Content Consistency Constraint (C2C) and a Global Representation Contrastive Constraint (GRCC) to enhance the independence of the disentangled features. Furthermore, we carefully construct two imbalanced datasets to investigate the impact of content bias. Extensive visualizations and experiments demonstrate that our framework not only ignores the interference of content information but also guides the detector to mine suspicious artifact traces and achieve competitive performance.
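To make the disentanglement idea more concrete, here is a minimal PyTorch sketch under my own assumptions (module names, feature sizes, and the loss weight are hypothetical and not the authors' code): a backbone feature is split into an "artifact" part used for classification and a "content" part that is decorrelated from it by a simple penalty, a crude stand-in for the paper's constraints.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangleHead(nn.Module):
    """Hypothetical split of backbone features into artifact/content branches."""
    def __init__(self, feat_dim=512, latent_dim=128):
        super().__init__()
        self.artifact_proj = nn.Linear(feat_dim, latent_dim)
        self.content_proj = nn.Linear(feat_dim, latent_dim)
        self.classifier = nn.Linear(latent_dim, 2)  # real / fake

    def forward(self, feats):
        z_art = self.artifact_proj(feats)   # forgery-related component
        z_con = self.content_proj(feats)    # content component to be discarded
        return self.classifier(z_art), z_art, z_con

def independence_penalty(z_art, z_con):
    """Crude stand-in for a contrastive independence constraint:
    penalize the cosine similarity between the two branches."""
    z_art = F.normalize(z_art, dim=1)
    z_con = F.normalize(z_con, dim=1)
    return (z_art * z_con).sum(dim=1).abs().mean()

# toy usage
backbone_feats = torch.randn(8, 512)        # e.g. pooled CNN features
labels = torch.randint(0, 2, (8,))
head = DisentangleHead()
logits, z_art, z_con = head(backbone_feats)
loss = F.cross_entropy(logits, labels) + 0.1 * independence_penalty(z_art, z_con)
loss.backward()
```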
Deep learning has enabled realistic face manipulation (i.e., deepfake), which poses significant concerns over the integrity of the media in circulation. Most existing deep learning techniques for deepfake detection can achieve promising performance in the intra-dataset evaluation setting (i.e., training and testing on the same dataset), but are unable to perform satisfactorily in the inter-dataset evaluation setting (i.e., training on one dataset and testing on another). Most of the previous methods use the backbone network to extract global features for making predictions and only employ binary supervision (i.e., indicating whether the training instances are fake or authentic) to train the network. Classification based merely on the learning of global features often leads to weak generalizability to unseen manipulation methods. In addition, the reconstruction task can improve the learned representations. In this paper, we introduce a novel approach for deepfake detection, which considers the reconstruction and classification tasks simultaneously to address these problems. This method shares the information learned by one task with the other, a perspective that existing works rarely consider, and hence boosts the overall performance. In particular, we design a two-branch Convolutional AutoEncoder (CAE), in which the Convolutional Encoder used to compress the feature map into the latent representation is shared by both branches. Then the latent representation of the input data is fed to a simple classifier and the unsupervised reconstruction component simultaneously. Our network is trained end-to-end. Experiments demonstrate that our method achieves state-of-the-art performance on three commonly-used datasets, particularly in the cross-dataset evaluation setting.
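A minimal PyTorch sketch of the shared-encoder idea outlined above: one convolutional encoder feeds both a reconstruction decoder and a binary classifier, trained jointly end-to-end. Layer sizes, the loss weighting, and all names are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchCAE(nn.Module):
    def __init__(self):
        super().__init__()
        # shared convolutional encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # branch 1: unsupervised reconstruction decoder
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        # branch 2: simple classifier on the shared latent representation
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

model = TwoBranchCAE()
x = torch.rand(4, 3, 64, 64)           # toy face crops in [0, 1]
y = torch.randint(0, 2, (4,))          # 0 = real, 1 = fake
recon, logits = model(x)
loss = F.cross_entropy(logits, y) + 0.5 * F.mse_loss(recon, x)  # joint objective
loss.backward()
```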
With various face manipulation techniques arising, face forgery detection has drawn growing attention due to security concerns. Previous works always formulate face forgery detection as a classification problem based on the cross-entropy loss, which emphasizes category-level differences rather than the essential discrepancies between real and fake faces, limiting model generalization to unseen domains. To address this issue, we propose a novel face forgery detection framework, named Dual Contrastive Learning (DCL), which specially constructs positive and negative paired data and performs designed contrastive learning at different granularities to learn generalized feature representations. Concretely, combined with a hard sample selection strategy, a task-related contrastive learning strategy is first proposed to promote discriminative feature learning by specially constructing instance pairs. Moreover, to further explore the essential discrepancies, Intra-Instance Contrastive Learning (Intra-ICL) is introduced to focus on the local content inconsistencies prevalent in forged faces by constructing local-region pairs within instances. Extensive experiments and visualizations on several datasets demonstrate the generalization of our method over state-of-the-art competitors.
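For illustration only, the sketch below shows a generic supervised contrastive term of the kind the abstract refers to, pulling together embeddings that share the same real/fake label and pushing the rest apart. The actual DCL pairing strategies and the intra-instance term are not reproduced; this is an NT-Xent-style loss written under my own assumptions.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over real/fake labels."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float('-inf'))     # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_log_prob = log_prob.masked_fill(~pos, 0.0)      # keep positive pairs only
    return -(pos_log_prob.sum(dim=1) / pos.sum(dim=1).clamp(min=1)).mean()

emb = torch.randn(16, 128, requires_grad=True)   # projected features of a batch
lbl = torch.randint(0, 2, (16,))                 # 0 = real, 1 = fake
loss = supervised_contrastive(emb, lbl)
loss.backward()
```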
As ultra-realistic face forgery techniques emerge, deepfake detection has attracted increasing attention due to security concerns. Many detectors cannot achieve accurate results when detecting unseen manipulations despite excellent performance on known forgeries. In this paper, we are motivated by the observation that the discrepancies between real and fake videos are extremely subtle and localized, and inconsistencies or irregularities can exist in some critical facial regions across various information domains. To this end, we propose a novel pipeline, Cross-Domain Local Forensics (XDLF), for more general deepfake video detection. In the proposed pipeline, a specialized framework is presented to simultaneously exploit local forgery patterns from space, frequency, and time domains, thus learning cross-domain features to detect forgeries. Moreover, the framework leverages four high-level forgery-sensitive local regions of a human face to guide the model to enhance subtle artifacts and localize potential anomalies. Extensive experiments on several benchmark datasets demonstrate the impressive performance of our method, and we achieve superiority over several state-of-the-art methods on cross-dataset generalization. We also examined the factors that contribute to its performance through ablations, which suggests that exploiting cross-domain local characteristics is a noteworthy direction for developing more general deepfake detectors.
With the rapid development of face forgery techniques, DeepFake videos have attracted wide attention on digital media. Perpetrators heavily exploit these videos to spread disinformation and make misleading statements. Most existing DeepFake detection methods mainly focus on texture features, which can be affected by external fluctuations such as illumination and noise. In addition, detection methods based on facial landmarks are more robust to external variables but lack sufficient detail. Therefore, how to effectively mine distinctive features in the spatial, temporal, and frequency domains and fuse them with facial landmarks for forgery video detection remains an open question. To this end, we propose a Landmark-Enhanced Multimodal Graph Neural Network (LEM-GNN) based on multimodal information and the geometric features of facial landmarks. Specifically, at the frame level, we design a fusion mechanism to mine a joint representation of spatial and frequency-domain elements, while introducing geometric facial features to enhance the robustness of the model. At the video level, we first regard every frame of a video as a node in a graph and encode temporal information into the edges of the graph. Then, by applying the message-passing mechanism of a graph neural network (GNN), multimodal features are effectively combined to obtain a comprehensive representation of the video forgery. Extensive experiments show that our method consistently outperforms the state of the art (SOTA) on widely used benchmarks.
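The video-level step (frames as graph nodes, temporal edges, GNN message passing) can be pictured with a few lines of PyTorch. This is a generic sketch under assumed shapes, not the LEM-GNN implementation; it uses a plain mean-aggregation message-passing layer over an undirected frame chain rather than whatever operator the paper employs.

```python
import torch
import torch.nn as nn

class TemporalGraphLayer(nn.Module):
    """One round of message passing over a chain graph of video frames."""
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, node_feats, adj):
        # mean of neighbour features (adjacency encodes temporal links)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        messages = adj @ node_feats / deg
        return torch.relu(self.update(torch.cat([node_feats, messages], dim=1)))

T, D = 16, 256                        # frames per video, per-frame feature size
frame_feats = torch.randn(T, D)       # fused spatial/frequency/landmark features
adj = torch.zeros(T, T)
idx = torch.arange(T - 1)
adj[idx, idx + 1] = 1.0               # edge to the next frame
adj[idx + 1, idx] = 1.0               # and back, i.e. an undirected chain

layer = TemporalGraphLayer(D)
video_repr = layer(frame_feats, adj).mean(dim=0)  # pooled video-level representation
```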
Online media data, in the forms of images and videos, are becoming mainstream communication channels. However, recent advances in deep learning, particularly deep generative models, open the doors for producing perceptually convincing images and videos at a low cost, which not only poses a serious threat to the trustworthiness of digital information but also has severe societal implications. This motivates a growing interest of research in media tampering detection, i.e., using deep learning techniques to examine whether media data have been maliciously manipulated. Depending on the content of the targeted images, media forgery could be divided into image tampering and Deepfake techniques. The former typically moves or erases the visual elements in ordinary images, while the latter manipulates the expressions and even the identity of human faces. Accordingly, the means of defense include image tampering detection and Deepfake detection, which share a wide variety of properties. In this paper, we provide a comprehensive review of the current media tampering detection approaches, and discuss the challenges and trends in this field for future research.
With the emergence of GANs, face forgery techniques have been seriously abused, and accurate forgery detection is urgently needed. Inspired by the fact that PPG signals correspond to the periodic change of skin color caused by the heartbeat in face videos, we observe that, although PPG signals are inevitably degraded during the forgery process, a mixture of PPG signals still exists in forged videos, and this mixture exhibits a unique rhythmic pattern depending on the generation method. Motivated by this key observation, we propose a framework for face forgery detection and categorization, consisting of: 1) a Spatial-Temporal Filtering Network (STFNet) for PPG signal filtering, and 2) a Spatial-Temporal Interaction Network (STINet) for constraining and modeling the interaction of PPG signals. Moreover, with insight into how forgery methods generate videos, we further exploit intra-source and inter-source information to boost the performance of the framework. Overall, extensive experiments demonstrate the superiority of our method.
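As background for readers unfamiliar with PPG-based cues, the snippet below shows the standard toy way to pull a raw rPPG trace from a face clip (mean green-channel intensity of a skin region over time). It is only an illustration under assumed inputs and is unrelated to the paper's STFNet/STINet modules.

```python
import numpy as np

def raw_rppg_trace(frames, roi):
    """frames: (T, H, W, 3) uint8 video, roi: (y0, y1, x0, x1) skin region.
    Returns the zero-mean green-channel trace, whose periodicity reflects heartbeat."""
    y0, y1, x0, x1 = roi
    green = frames[:, y0:y1, x0:x1, 1].astype(np.float32)
    trace = green.mean(axis=(1, 2))
    return trace - trace.mean()

frames = np.random.randint(0, 256, size=(150, 128, 128, 3), dtype=np.uint8)  # ~5 s clip
trace = raw_rppg_trace(frames, roi=(40, 90, 30, 100))
spectrum = np.abs(np.fft.rfft(trace))     # forged videos distort this rhythmic pattern
```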
Existing forgery detection methods usually treat face forgery as a binary classification problem and adopt deep convolutional neural networks to learn discriminative features. Ideal discriminative features should relate only to the real/fake labels of face images. However, we observe that the features learned by a vanilla classification network are correlated with unnecessary attributes, such as the forgery method and the facial identity. Such a phenomenon limits forgery detection performance, especially the generalization ability. Motivated by this, we propose a novel method that utilizes adversarial learning to eliminate the negative effects of different forgery methods and facial identities, which helps the classification network learn intrinsic, common discriminative features for face forgery detection. To leverage data lacking ground-truth labels of facial identity, we design a special identity discriminator based on similarity information derived from an off-the-shelf face recognition model. With the help of adversarial learning, our forgery detection model learns to extract common discriminative features by eliminating the effects of forgery methods and facial identities. Extensive experiments demonstrate the effectiveness of the proposed method under both intra-dataset and cross-dataset evaluation settings.
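One common way to realize this kind of adversarial feature removal is a gradient reversal layer placed in front of auxiliary heads; the sketch below is my own illustration of that idea. The paper's identity discriminator built on face-recognition similarities is not reproduced, and all names and dimensions here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, flips the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None

feat_dim, n_methods = 512, 4
forgery_head = nn.Linear(feat_dim, 2)           # real / fake
method_head = nn.Linear(feat_dim, n_methods)    # adversarial: which forgery method

feats = torch.randn(8, feat_dim, requires_grad=True)   # backbone features
y_fake = torch.randint(0, 2, (8,))
y_method = torch.randint(0, n_methods, (8,))

cls_loss = F.cross_entropy(forgery_head(feats), y_fake)
# the method classifier is trained normally, but its gradient is reversed
# before reaching the backbone, pushing the features to be method-agnostic
adv_loss = F.cross_entropy(method_head(GradReverse.apply(feats, 1.0)), y_method)
(cls_loss + adv_loss).backward()
```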
Despite encouraging progress in deepfake detection, generalization to unseen forgery types remains a significant challenge due to the limited forgery clues explored during training. In contrast, we notice a common phenomenon in deepfakes: fake video creation inevitably disrupts the statistical regularities of the original videos. Inspired by this observation, we propose to boost the generalization of deepfake detection by distinguishing the "regularity disruption" that does not appear in real videos. Specifically, by carefully examining spatial and temporal properties, we propose to disrupt a real video through a pseudo-fake generator and create a wide range of pseudo-fake videos for training. This practice allows us to achieve deepfake detection without using fake videos and improves the generalization ability in a simple and efficient manner. To jointly capture the spatial and temporal disruptions, we propose a Spatio-Temporal Enhancement block to learn the regularity disruption across our self-created videos. Through comprehensive experiments, our method shows outstanding performance on several datasets.
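A toy version of the "disrupt a real video to create pseudo-fakes" idea, written as a self-blending-style perturbation under assumptions of my own: the region choice, blur strength, and blending are illustrative, and the paper's pseudo-fake generator and Spatio-Temporal Enhancement block are not reproduced.

```python
import numpy as np

def pseudo_fake(frame, rng):
    """Blend a blurred copy of a random rectangle back into the frame,
    breaking local statistics the way a naive face swap would."""
    h, w, _ = frame.shape
    y0, x0 = rng.integers(0, h // 2), rng.integers(0, w // 2)
    y1, x1 = y0 + h // 3, x0 + w // 3
    patch = frame[y0:y1, x0:x1].astype(np.float32)
    # crude blur via local averaging to change the patch's statistics
    blurred = (patch + np.roll(patch, 1, axis=0) + np.roll(patch, 1, axis=1)) / 3.0
    alpha = rng.uniform(0.5, 1.0)                     # blending strength
    out = frame.astype(np.float32).copy()
    out[y0:y1, x0:x1] = alpha * blurred + (1 - alpha) * patch
    return out.astype(frame.dtype)

rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(8, 128, 128, 3), dtype=np.uint8)  # toy real clip
pseudo_clip = np.stack([pseudo_fake(f, rng) for f in video])          # labeled as fake
```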
Face manipulation detection has been receiving a lot of attention for the reliability and security of face images. Recent studies focus on using auxiliary information or prior knowledge to capture robust manipulation traces, which are shown to be promising. As one of the important face features, the face depth map, which has been shown to be effective in other areas such as face recognition and face detection, has unfortunately received little attention in the literature on detecting manipulated face images. In this paper, we explore the possibility of incorporating the face depth map as auxiliary information to tackle the problem of face manipulation detection in real-world applications. To this end, we first propose a Face Depth Map Transformer (FDMT) to estimate the face depth map patch by patch from an RGB face image, which is able to capture the local depth anomalies created by manipulation. The estimated face depth map is then treated as auxiliary information to be integrated with the backbone features using a newly designed Multi-head Depth Attention (MDA) mechanism. Various experiments demonstrate the advantage of our proposed method for face manipulation detection.
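To make the "depth as auxiliary attention" idea concrete, here is a hedged sketch in which patch-wise depth tokens produce attention over the backbone's patch features. The shapes, the use of nn.MultiheadAttention, and every name are assumptions for illustration, not the paper's FDMT/MDA modules.

```python
import torch
import torch.nn as nn

class DepthAttention(nn.Module):
    """Cross-attention: backbone patch features attend to patch-wise depth tokens."""
    def __init__(self, feat_dim=256, depth_dim=32, heads=4):
        super().__init__()
        self.depth_embed = nn.Linear(1, depth_dim)
        self.to_kv = nn.Linear(depth_dim, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)

    def forward(self, feats, depth):
        # feats: (B, N, feat_dim) backbone patch features
        # depth: (B, N) estimated depth value per patch
        d = self.to_kv(self.depth_embed(depth.unsqueeze(-1)))   # (B, N, feat_dim)
        out, _ = self.attn(query=feats, key=d, value=d)
        return feats + out                                      # residual fusion

B, N = 2, 49                               # batch size, patches per face
feats = torch.randn(B, N, 256)
depth = torch.rand(B, N)                   # patch-wise depth estimates in [0, 1]
fused = DepthAttention()(feats, depth)     # depth-aware features for the classifier
```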
Although the recent abuse of deepfake technology has raised serious concerns, how to detect deepfake videos is still a challenge because of the photo-realistic synthesis of each frame. Existing image-level methods often focus on single frames and ignore the spatiotemporal cues hidden in deepfake videos, leading to poor generalization and robustness. The key to a video-level detector is to fully exploit the spatiotemporal inconsistencies distributed in the local facial regions across different frames of a deepfake video. Inspired by this, this paper proposes a simple yet effective patch-level approach to facilitate deepfake video detection via a spatiotemporal dropout transformer. The approach reorganizes each input video into bags of patches, which are then fed into a vision transformer to achieve robust representations. Specifically, a spatiotemporal dropout operation is proposed to fully explore patch-level spatiotemporal cues and serve as effective data augmentation to further enhance the robustness and generalization ability of the model. The operation is flexible and can be easily plugged into existing vision transformers. Extensive experiments demonstrate the effectiveness of our approach against 25 state-of-the-art methods, with impressive robustness, generalizability, and representation ability.
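The spatiotemporal dropout idea, rearranging a clip into a bag of patches and randomly dropping some of them before the transformer, might look roughly like the following. The patch size, drop rate, and tensor layout are assumptions of mine, not the paper's operation.

```python
import torch

def spatiotemporal_patch_dropout(clip, patch=16, drop_rate=0.3):
    """clip: (T, C, H, W) video tensor -> (kept_patches, C * patch * patch) bag."""
    T, C, H, W = clip.shape
    # split every frame into non-overlapping patches and flatten them
    patches = clip.unfold(2, patch, patch).unfold(3, patch, patch)  # (T, C, h, w, p, p)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, C * patch * patch)
    # randomly keep a subset of spatiotemporal patches (acts as augmentation)
    keep = torch.rand(patches.shape[0]) > drop_rate
    return patches[keep]

clip = torch.rand(8, 3, 224, 224)               # 8-frame face clip
bag = spatiotemporal_patch_dropout(clip)        # fed to a ViT-style encoder
print(bag.shape)                                # (#kept, 768) with 16x16 patches
```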
Detecting forged videos is highly desirable due to the abuse of deepfakes. Existing detection methods contribute to exploring specific artifacts in deepfake videos and fit well on certain data. However, the ever-growing techniques behind these artifacts keep challenging the robustness of traditional deepfake detectors. As a result, the development of the generalizability of these methods has reached a blockage. To address this issue, given the empirical observations that the identities behind the voices and faces in deepfake videos are often mismatched, and that voices and faces have homogeneity to some extent, in this paper we propose to perform deepfake detection from the unexplored voice-face matching view. To this end, a voice-face matching method is devised to measure the matching degree of the two. Nevertheless, training on specific deepfake datasets makes the model overfit certain traits of deepfake algorithms. Instead, we advocate a method that can quickly adapt to unexplored forgery methods, following a pre-training then fine-tuning paradigm. Specifically, we first pre-train the model on a generic audio-visual dataset and then fine-tune it on downstream deepfake data. We conduct extensive experiments on three widely exploited deepfake datasets - DFDC, FakeAVCeleb, and DeepfakeTIMIT. Our method obtains significant performance gains compared with other state-of-the-art competitors. It is also worth noting that our method already achieves competitive results when fine-tuned on limited deepfake data.
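The voice-face matching view can be pictured as scoring the agreement between an audio embedding and a face embedding. The following sketch, with made-up encoders and a cosine-similarity threshold, is only meant to illustrate that matching step, not the paper's model or its pre-training/fine-tuning pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoiceFaceMatcher(nn.Module):
    """Toy encoders mapping voice and face inputs into a shared embedding space."""
    def __init__(self, dim=128):
        super().__init__()
        self.voice_enc = nn.Sequential(nn.Linear(40, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.face_enc = nn.Sequential(nn.Linear(512, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, voice_feat, face_feat):
        v = F.normalize(self.voice_enc(voice_feat), dim=-1)
        f = F.normalize(self.face_enc(face_feat), dim=-1)
        return (v * f).sum(dim=-1)        # cosine matching score in [-1, 1]

matcher = VoiceFaceMatcher()
voice = torch.randn(4, 40)                # e.g. pooled MFCC statistics
face = torch.randn(4, 512)                # e.g. pooled visual features
score = matcher(voice, face)
is_fake = score < 0.5                     # low voice-face agreement flags a deepfake
```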
With the rapid development of face forgery techniques, forgery detection has attracted increasing attention due to security concerns. Existing approaches attempt to use frequency information to mine subtle artifacts on high-quality forged faces. However, their exploitation of frequency information is coarse-grained and, more importantly, their vanilla learning process struggles to extract fine-grained forgery traces. To address this issue, we propose a progressive enhancement learning framework to exploit both RGB and fine-grained frequency clues. Specifically, we perform a fine-grained decomposition of RGB images to completely decouple the real and fake traces in the frequency space. Subsequently, we propose a progressive enhancement learning framework based on a two-branch network, combined with self-enhancement and mutual-enhancement modules. The self-enhancement module captures the traces in different input spaces based on spatial noise enhancement and channel attention. The mutual-enhancement module concurrently enhances the RGB and frequency features by communicating in the shared spatial dimension. The progressive enhancement process facilitates learning discriminative features with fine-grained face forgery clues. Extensive experiments on several datasets show that our method outperforms state-of-the-art face forgery detection methods.
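As a rough illustration of extracting fine-grained frequency cues alongside the RGB input, the snippet below computes a 2-D DCT of a grayscale face crop and splits it into low/mid/high bands. The band boundaries and the use of scipy.fft.dctn are my assumptions, not the paper's decomposition or its enhancement modules.

```python
import numpy as np
from scipy.fft import dctn, idctn

def frequency_bands(gray, cuts=(0.1, 0.4)):
    """Split an image into low/mid/high frequency components via 2-D DCT."""
    h, w = gray.shape
    coeffs = dctn(gray, norm='ortho')
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    radius = (yy / h + xx / w) / 2          # crude normalized frequency index
    bands, lo = [], 0.0
    for hi in (*cuts, 1.01):
        mask = (radius >= lo) & (radius < hi)
        bands.append(idctn(coeffs * mask, norm='ortho'))
        lo = hi
    return bands                            # [low, mid, high] images

gray = np.random.rand(128, 128).astype(np.float32)   # stand-in for a face crop
low, mid, high = frequency_bands(gray)
# the high-frequency band would feed the frequency branch of a two-branch network
```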
Face anti-spoofing (FAS) plays a vital role in protecting face recognition systems from presentation attacks. Existing face anti-spoofing datasets lack diversity due to insufficient identities and insignificant variance, which limits the generalization ability of FAS models. In this paper, we propose a Dual Spoof Disentanglement Generation (DSDG) framework to tackle this challenge by "anti-spoofing via generation". Depending on the interpretable factorized latent disentanglement in a Variational AutoEncoder (VAE), DSDG learns a joint distribution of the identity representation and the spoofing pattern representation in the latent space. Then, large-scale paired live and spoofing images can be generated from random noise to boost the diversity of the training set. However, some generated face images are partially distorted due to the inherent defect of VAEs. Such noisy samples are hard to predict precise depth values for, and thus may hinder the widely used depth-supervised optimization. To tackle this issue, we further introduce a lightweight Depth Uncertainty Module (DUM), which alleviates the adverse effects of noisy samples via depth uncertainty learning. DUM is developed without extra dependencies, and thus can be flexibly integrated with any depth-supervised network for face anti-spoofing. We evaluate the effectiveness of the proposed method on five popular benchmarks and achieve state-of-the-art results. The code is available at https://github.com/jdai-cv/facex-zoo/tree/main/addition_module/dsdg.
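Depth uncertainty learning of this kind is often expressed by predicting a per-pixel mean and variance and using a Gaussian negative log-likelihood, as in the aleatoric-uncertainty literature. The sketch below follows that generic recipe under my own assumptions and is not the DUM implementation.

```python
import torch
import torch.nn as nn

class DepthUncertaintyHead(nn.Module):
    """Predicts a depth map plus a per-pixel log-variance (uncertainty)."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.mu = nn.Conv2d(in_ch, 1, 1)
        self.log_var = nn.Conv2d(in_ch, 1, 1)

    def forward(self, feats):
        return self.mu(feats), self.log_var(feats)

def uncertain_depth_loss(mu, log_var, target):
    # noisy (e.g. partially distorted generated) samples can learn a large variance,
    # which down-weights their depth error instead of corrupting training
    return (0.5 * torch.exp(-log_var) * (mu - target) ** 2 + 0.5 * log_var).mean()

feats = torch.randn(4, 64, 32, 32)         # features from a depth-supervised FAS net
target_depth = torch.rand(4, 1, 32, 32)    # pseudo depth labels (flat for spoofs)
head = DepthUncertaintyHead()
mu, log_var = head(feats)
loss = uncertain_depth_loss(mu, log_var, target_depth)
loss.backward()
```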
In recent years, with the rapid development of face editing and generation, more and more fake videos are circulating on social media, which has aroused extreme public concern. Existing frequency-domain face forgery detection methods find that, compared with real images, GAN-forged images exhibit obvious grid-like visual artifacts in the spectrum. But for synthesized videos, these methods are limited to single frames and pay little attention to the most discriminative parts and the temporal-frequency clues among different frames. To take full advantage of the rich information in video sequences, this paper performs video forgery detection on both the spatial and temporal frequency domains and proposes a Discrete Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatiotemporal feature representation. FCAN-DCT consists of a backbone network and two branches: a Compact Feature Extraction (CFE) module and a Frequency-Temporal Attention (FTA) module. We conduct thorough experimental evaluations on two visible-light (VIS) datasets, WildDeepfake and Celeb-DF (v2), as well as our self-built video forgery dataset DeepfakeNIR, the first video forgery dataset in the near-infrared modality. The experimental results demonstrate the effectiveness of our method in detecting forged videos in both VIS and NIR scenarios.
Recently, deepfakes have drawn widespread public attention due to security and privacy concerns in social media digital forensics. As deepfake videos circulating widely on the Internet become more and more realistic, traditional detection techniques fail to distinguish real from fake. Most existing deep learning methods mainly focus on local features and the relations within facial images, using convolutional neural networks as a backbone. However, local features and relations are insufficient for model training to learn enough general information for deepfake detection. Therefore, existing deepfake detection methods have reached a bottleneck in further improving detection performance. To address this issue, we propose a deep convolutional Transformer to incorporate decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy. Moreover, we employ the barely discussed image keyframes in model training to improve performance and visualize the feature quantity gap between keyframes and normal image frames caused by video compression. We finally illustrate the transferability through extensive experiments on several deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines in both within- and cross-database experiments.
In recent years, visual forgery has reached a level of sophistication at which humans cannot identify the fraud, which poses a significant threat to information security. A wide range of malicious applications have emerged, such as fake news, defamation or blackmail of celebrities, impersonation of politicians in political warfare, and the spreading of rumors to attract views. As a result, a rich body of visual verification techniques has been proposed in an attempt to stop this dangerous trend. In this paper, we present a benchmark that provides in-depth insights into visual forgery and visual forensics, using a comprehensive and empirical approach. More specifically, we develop an independent framework that integrates state-of-the-art forgery generators and detectors, and measures the performance of these techniques with various criteria. We also perform an exhaustive analysis of the benchmarking results to determine the characteristics of the methods that serve as a comparative reference in this never-ending war between measures and countermeasures.
Generalizability to unseen forgery types is crucial for face forgery detectors. Recent works have made significant progress in terms of generalization by synthetic forgery data augmentation. In this work, we explore another path for improving the generalization. Our goal is to reduce the features that are easy to learn in the training phase, so as to reduce the risk of overfitting to specific forgery types. Specifically, in our method, a teacher network takes face images as input and generates an attention map of the deep features via a diverse multihead-attention ViT. The attention map is used to guide a student network to focus on the low-attended features by reducing the highly-attended deep features. A deep feature mixup strategy is also proposed to synthesize forgeries in the feature domain. Experiments demonstrate that, without data augmentation, our method achieves promising performance on unseen forgeries and highly compressed data.
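Two ingredients of this abstract, suppressing the teacher's highly-attended features in the student's input and mixing deep features to synthesize forgeries, can be sketched as below. The threshold, mixing distribution, and names are hypothetical, not the paper's method.

```python
import torch

def suppress_high_attention(student_feats, teacher_attn, keep_quantile=0.7):
    """Zero out the most-attended spatial positions so the student relies on the rest.
    student_feats: (B, C, H, W), teacher_attn: (B, 1, H, W) in [0, 1]."""
    thresh = teacher_attn.flatten(1).quantile(keep_quantile, dim=1).view(-1, 1, 1, 1)
    mask = (teacher_attn < thresh).float()
    return student_feats * mask

def feature_mixup(real_feats, fake_feats, alpha=0.5):
    """Synthesize a 'forged' feature by blending real and fake deep features."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * real_feats + (1 - lam) * fake_feats

feats = torch.randn(4, 256, 14, 14)
attn = torch.rand(4, 1, 14, 14)
low_attended = suppress_high_attention(feats, attn)
mixed = feature_mixup(feats, torch.randn_like(feats))   # labeled as fake downstream
```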
With the rapid development of deep generative models (such as Generative Adversarial Networks and Auto-encoders), AI-synthesized images of the human face are now of such high quality that humans can hardly distinguish them from pristine ones. Although existing detection methods have shown high performance in specific evaluation settings, e.g., on images from seen models or on images without real-world post-processing, they tend to suffer serious performance degradation in real-world scenarios where testing images can be generated by more powerful generation models or combined with various post-processing operations. To address this issue, we propose a Global and Local Feature Fusion (GLFF) framework to learn rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches for face forgery detection. GLFF fuses information from two branches: a global branch that extracts multi-scale semantic features and a local branch that selects informative patches for detailed local artifact extraction. Due to the lack of a face forgery dataset simulating real-world applications for evaluation, we further create a challenging face forgery dataset, named DeepFakeFaceForensics (DF^3), which contains 6 state-of-the-art generation models and a variety of post-processing techniques to approximate real-world scenarios. Experimental results demonstrate the superiority of our method over the state-of-the-art methods on the proposed DF^3 dataset and three other open-source datasets.
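A compact sketch of the global/local fusion pattern described above: one branch pools whole-image features, the other scores patches and keeps the top-k "informative" ones, and the two are concatenated for classification. The backbone choice, patch scoring, and k are assumptions of mine rather than the GLFF design.

```python
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    def __init__(self, dim=128, patch=32, topk=4):
        super().__init__()
        self.patch, self.topk = patch, topk
        self.global_net = nn.Sequential(nn.Conv2d(3, dim, 7, stride=4, padding=3),
                                        nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.local_net = nn.Sequential(nn.Flatten(1),
                                       nn.Linear(3 * patch * patch, dim), nn.ReLU())
        self.scorer = nn.Linear(dim, 1)          # how "informative" a patch is
        self.head = nn.Linear(2 * dim, 2)

    def forward(self, x):                        # x: (B, 3, H, W)
        g = self.global_net(x)                   # (B, dim) global semantics
        p = x.unfold(2, self.patch, self.patch).unfold(3, self.patch, self.patch)
        p = p.permute(0, 2, 3, 1, 4, 5).flatten(1, 2)        # (B, N, 3, patch, patch)
        local = self.local_net(p.flatten(0, 1)).view(x.size(0), -1, g.size(1))
        scores = self.scorer(local).squeeze(-1)              # (B, N) patch scores
        top = scores.topk(self.topk, dim=1).indices
        picked = torch.gather(local, 1, top.unsqueeze(-1).expand(-1, -1, local.size(-1)))
        l = picked.mean(dim=1)                               # pooled local artifacts
        return self.head(torch.cat([g, l], dim=1))

logits = GlobalLocalFusion()(torch.rand(2, 3, 224, 224))
```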
Face anti-spoofing (FAS) aims at distinguishing face spoof attacks from authentic ones, which is typically performed by learning proper models for the associated classification task. In practice, one would expect such models to generalize to FAS in different image domains. Moreover, it is not practical to assume that the types of spoof attacks will be known in advance. In this paper, we propose a deep learning model to tackle the aforementioned domain-generalized face anti-spoofing task. In particular, our proposed network is able to disentangle facial liveness representations from irrelevant ones (i.e., facial content and image-domain features). The resulting liveness representations exhibit sufficient domain-invariant properties and thus can be applied to perform domain-generalized FAS. In our experiments, we conduct experiments on five benchmark datasets with various settings and verify that our model performs favorably against state-of-the-art approaches in identifying novel types of spoof attacks in unseen image domains.