Deep learning-based face recognition models are vulnerable to adversarial attacks. To curb these attacks, most defense methods aim to improve the robustness of the recognition model against adversarial perturbations. However, the generalization capacity of these methods is quite limited: in practice, they remain vulnerable to unseen adversarial attacks. Deep learning models are fairly robust to general perturbations, such as Gaussian noise. A straightforward approach is therefore to inactivate adversarial perturbations so that they can be easily handled as general perturbations. In this paper, a plug-and-play adversarial defense method, named perturbation inactivation (PIN), is proposed to inactivate adversarial perturbations for adversarial defense. We discover that perturbations in different subspaces have different influences on the recognition model. There should be a subspace, called the immune space, in which perturbations have fewer adverse impacts on the recognition model than perturbations in other subspaces. Hence, our method estimates the immune space and inactivates adversarial perturbations by restricting them to this subspace. The proposed method can be generalized to unseen adversarial perturbations, since it does not rely on a specific kind of adversarial attack. It not only outperforms several state-of-the-art adversarial defense methods but also demonstrates superior generalization capacity through exhaustive experiments. Moreover, the proposed method can be successfully applied to four commercial APIs without additional training, indicating that it can be easily generalized to existing face recognition systems. The source code is available at https://github.com/renmin1991/perturbation-inactivate
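As a rough illustration of the projection idea (not the paper's actual estimation procedure), the sketch below uses a PCA basis learned from clean face images as a stand-in for the immune space and restricts an input to that subspace before recognition; the basis size `k` and the helper names are assumptions.

```python
import numpy as np

def estimate_subspace(clean_images, k=128):
    """Estimate a low-dimensional basis from clean images via PCA.
    A stand-in for the learned 'immune space' described in the paper."""
    X = clean_images.reshape(len(clean_images), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # Top-k principal directions of the clean-image distribution.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]              # basis: (k, d), rows are orthonormal

def inactivate(image, mean, basis):
    """Project a (possibly adversarial) input onto the subspace,
    discarding the component that lies outside it."""
    x = image.reshape(-1).astype(np.float64) - mean
    x_proj = basis.T @ (basis @ x)   # orthogonal projection
    return (x_proj + mean).reshape(image.shape)

# Usage sketch: purified = inactivate(adv_face, *estimate_subspace(gallery_faces))
# The recognition model is then run on `purified` instead of the raw input.
```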
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples crafted with imperceptible perturbations, i.e., tiny changes to an input image that cause misclassification, which threatens the reliability of deployed deep-learning-based systems. Adversarial training (AT) is often adopted to improve the robustness of DNNs by training on a mixture of corrupted and clean data. However, most AT-based methods are ineffective at handling transferred adversarial examples, which are generated to fool a variety of defense models, and thus cannot satisfy the generalization requirements of real-world scenarios. Moreover, adversarially trained defense models generally cannot produce interpretable predictions for perturbed inputs, whereas domain experts require a highly interpretable robust model to understand the behavior of DNNs. In this work, we propose an approach based on the Jacobian norm and selective input gradient regularization (J-SIGR), which promotes linearized robustness through Jacobian normalization and also regularizes perturbation-based saliency maps so that the model yields interpretable predictions. As a result, we achieve both improved defense capability and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers robustness against transferred adversarial attacks, and we also show that the predictions of the neural network are easy to interpret.
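The general recipe of combining a classification loss with a Jacobian-norm penalty and a selective input-gradient penalty can be sketched as below; the single-random-projection Jacobian estimate, the top-q saliency mask, and the weights are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def jsigr_style_loss(model, x, y, lam_jac=0.01, lam_sigr=0.01, keep_ratio=0.1):
    """Assumes NCHW image inputs and a standard classifier `model`."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # (1) Jacobian-norm term: Frobenius norm of d(logits)/d(x),
    # estimated with a single random projection.
    v = torch.randn_like(logits)
    jv = torch.autograd.grad((logits * v).sum(), x, create_graph=True)[0]
    jac_penalty = jv.pow(2).sum(dim=(1, 2, 3)).mean()

    # (2) Selective input-gradient term: suppress input gradients outside the
    # most salient locations (keep the top `keep_ratio` fraction untouched).
    g = torch.autograd.grad(ce, x, create_graph=True)[0]
    flat = g.abs().flatten(1)
    k = max(1, int(keep_ratio * flat.shape[1]))
    thresh = flat.topk(k, dim=1).values[:, -1:]        # per-sample cutoff
    mask = (flat < thresh).float()                     # 1 = non-salient pixel
    sigr_penalty = (flat * mask).pow(2).sum(dim=1).mean()

    return ce + lam_jac * jac_penalty + lam_sigr * sigr_penalty
```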
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to humans but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples has become one of the major risks for applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples have drawn great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially deep neural networks (DNNs), are vulnerable to adversarial examples; i.e., examples that are carefully crafted to fool a well-trained classification model while being indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was first identified by Biggio et al. (2013) and Szegedy et al. (2014), much work has been done in this field, including the development of attack methods to generate adversarial examples and the construction of defense techniques to guard against such examples. This paper aims to introduce this topic and its latest developments to the statistical community, primarily focusing on the generation and defense of adversarial examples. Computing codes (in Python and R) used in the numerical experiments are publicly available for readers to explore the surveyed methods. It is the hope of the authors that this paper will encourage more statisticians to work on this important and exciting field of generating and defending against adversarial examples.
Adversarial patch is an important form of real-world adversarial attack that brings serious risks to the robustness of deep neural networks. Previous methods generate adversarial patches by either optimizing their perturbation values while fixing the pasting position or manipulating the position while fixing the patch's content. This reveals that the positions and perturbations are both important to the adversarial attack. For that, in this paper, we propose a novel method to simultaneously optimize the position and perturbation for an adversarial patch, and thus obtain a high attack success rate in the black-box setting. Technically, we regard the patch's position and the pre-designed hyper-parameters that determine the patch's perturbations as the variables, and utilize the reinforcement learning framework to simultaneously solve for the optimal solution based on the rewards obtained from the target model with a small number of queries. Extensive experiments are conducted on the Face Recognition (FR) task, and results on four representative FR models show that our method can significantly improve the attack success rate and query efficiency. Besides, experiments on the commercial FR service and physical environments confirm its practical application value. We also extend our method to the traffic sign recognition task to verify its generalization ability.
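A query-based search over patch parameters can be illustrated with a simple REINFORCE-style loop over a Gaussian policy; this is a generic stand-in for the paper's reinforcement learning framework, and `apply_patch` (mapping normalized parameters to a patched image) and the probability-only `query_model` interface are assumed helpers.

```python
import torch

def rl_patch_search(query_model, apply_patch, true_label,
                    steps=200, samples=8, lr=0.05):
    """Search for patch parameters (e.g., x, y, perturbation scale) in [-1, 1]^3
    by maximizing the drop in true-class confidence returned by queries."""
    mean = torch.zeros(3, requires_grad=True)
    log_std = torch.full((3,), -1.0, requires_grad=True)
    opt = torch.optim.Adam([mean, log_std], lr=lr)

    for _ in range(steps):
        dist = torch.distributions.Normal(mean, log_std.exp())
        params = dist.sample((samples,))             # candidate placements
        with torch.no_grad():
            rewards = torch.tensor([
                1.0 - float(query_model(apply_patch(p.tanh()))[true_label])
                for p in params])
        # Policy gradient: make high-reward placements more likely.
        advantage = rewards - rewards.mean()
        loss = -(dist.log_prob(params).sum(dim=1) * advantage).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    return mean.detach().tanh()                      # best placement estimate
```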
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
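The two squeezers and the detection rule lend themselves to a compact sketch; the bit depth, the median-filter size, and the detection threshold below are illustrative choices rather than the paper's tuned values.

```python
import torch
import torch.nn.functional as F

def reduce_bit_depth(x, bits=4):
    """Quantize pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def median_smooth(x, k=3):
    """k x k median filter applied per channel (spatial smoothing)."""
    pad = k // 2
    patches = F.unfold(F.pad(x, [pad] * 4, mode="reflect"), kernel_size=k)
    med = patches.view(x.shape[0], x.shape[1], k * k, -1).median(dim=2).values
    return med.reshape(x.shape)

def is_adversarial(model, x, threshold=1.0):
    """Flag inputs whose prediction moves too much under squeezing."""
    p = F.softmax(model(x), dim=1)
    p_bit = F.softmax(model(reduce_bit_depth(x)), dim=1)
    p_med = F.softmax(model(median_smooth(x)), dim=1)
    score = torch.maximum((p - p_bit).abs().sum(1), (p - p_med).abs().sum(1))
    return score > threshold     # large L1 shift => likely adversarial
```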
Deep Neural Networks have been widely used in many fields. However, studies have shown that DNNs are easily attacked by adversarial examples, which contain tiny perturbations yet greatly mislead the judgment of DNNs. Furthermore, even if malicious attackers cannot obtain all the underlying model parameters, they can use adversarial examples to attack various DNN-based task systems. Researchers have proposed various defense methods to protect DNNs, such as reducing the aggressiveness of adversarial examples by preprocessing or improving the robustness of the model by adding modules. However, some defense methods are only effective for small-scale examples or small perturbations but have limited defense effects for adversarial examples with large perturbations. This paper assigns different defense strategies to adversarial perturbations of different strengths by grading the perturbations on the input examples. Experimental results show that the proposed method effectively improves defense performance. In addition, the proposed method does not modify the task model and can be used as a preprocessing module, which significantly reduces the deployment cost in practical applications.
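One way to read the grading idea is sketched below: estimate the perturbation strength of an input from its high-frequency residual and dispatch it to a lighter or heavier preprocessing defense accordingly. The residual-based estimator, the two thresholds, and the specific preprocessors are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def perturbation_score(x):
    """Crude strength estimate: energy of the high-frequency residual
    left after a 3x3 average blur (a stand-in for a real grader)."""
    kernel = torch.ones(x.shape[1], 1, 3, 3, device=x.device) / 9.0
    blurred = F.conv2d(F.pad(x, [1, 1, 1, 1], mode="reflect"),
                       kernel, groups=x.shape[1])
    return (x - blurred).abs().mean(dim=(1, 2, 3))   # one score per image

def graded_preprocess(x, light_thresh=0.01, heavy_thresh=0.05):
    """Route each image to no / light / heavy preprocessing by its grade."""
    score = perturbation_score(x)
    out = x.clone()
    light = (score >= light_thresh) & (score < heavy_thresh)
    heavy = score >= heavy_thresh
    if light.any():                       # mild quantization for weak noise
        out[light] = torch.round(x[light] * 31) / 31
    if heavy.any():                       # stronger blur for large perturbations
        k = torch.ones(x.shape[1], 1, 5, 5, device=x.device) / 25.0
        out[heavy] = F.conv2d(F.pad(x[heavy], [2, 2, 2, 2], mode="reflect"),
                              k, groups=x.shape[1])
    return out                            # fed to the unmodified task model
```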
We propose a novel and effective purification-based adversarial defense against preprocessor-blind white-box and black-box attacks. Our method is computationally efficient and trained only with self-supervised learning on general images, without requiring any adversarial training or retraining of the classification model. We first present an empirical analysis of the adversarial noise, measured as the residual between an original image and its adversarial example, showing that it has an almost symmetric distribution. Based on this observation, we propose a very simple iterative Gaussian smoothing (GS) method that can effectively smooth out adversarial noise and achieve substantially high robust accuracy. To further improve it, we propose neural contextual iterative smoothing (NCIS), which trains a blind-spot network (BSN) in a self-supervised manner to reconstruct the discriminative features of the original image that are also smoothed out by GS. From extensive experiments on large-scale ImageNet with four classification models, we show that our method achieves both competitive standard accuracy and state-of-the-art robust accuracy against the strongest purifier-blind white-box and black-box attacks. In addition, we propose a new benchmark for evaluating purification methods on commercial image classification APIs, such as AWS, Azure, Clarifai, and Google. We generate adversarial examples with ensemble transfer-based black-box attacks, which can induce complete misclassification on the APIs, and demonstrate that our method can be used to increase the adversarial robustness of the APIs.
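The iterative Gaussian smoothing (GS) step can be sketched as below; the kernel size, sigma, and number of iterations are assumed values, and the learned blind-spot network of NCIS is not included.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(k=5, sigma=1.0):
    ax = torch.arange(k, dtype=torch.float32) - (k - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    kernel2d = torch.outer(g, g)
    return kernel2d / kernel2d.sum()

def iterative_gaussian_smoothing(x, steps=4, k=5, sigma=1.0):
    """Repeatedly blur the input to wash out (roughly symmetric) adversarial
    noise before passing it to the frozen classifier."""
    c = x.shape[1]
    weight = gaussian_kernel(k, sigma).to(x.device).repeat(c, 1, 1, 1)
    for _ in range(steps):
        x = F.conv2d(F.pad(x, [k // 2] * 4, mode="reflect"), weight, groups=c)
    return x

# Usage sketch: logits = classifier(iterative_gaussian_smoothing(adv_images))
```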
Deep neural networks have been found to be unstable under imperceptible adversarial example attacks, which is dangerous when they are applied to medical diagnosis systems that require high reliability. However, defense methods that work well on natural images may not be suitable for medical diagnosis tasks. Preprocessing methods (e.g., random resizing, compression) may lead to the loss of small lesion features in medical images. Retraining a network on an augmented dataset is also not practical for medical models that have already been deployed online. Therefore, it is necessary to design an easy-to-deploy and effective defense framework for medical diagnosis tasks. In this paper, we propose a robust and retrain-free diagnosis framework for pretrained medical models against adversarial attacks (i.e., MedRDF). It acts at the inference time of the pretrained medical model. Specifically, for each test image, MedRDF first creates a large number of noisy copies of it and obtains the output labels of these copies from the pretrained medical diagnosis model. Then, based on the labels of these copies, MedRDF outputs the final robust diagnosis result by majority voting. In addition to the diagnosis result, MedRDF also produces a robust metric (RM) as the confidence of the result. Therefore, it is convenient and reliable to use MedRDF to convert a pretrained non-robust diagnosis model into a robust one. Experimental results on the COVID-19 and DermaMNIST datasets verify the effectiveness of MedRDF in improving the robustness of medical models.
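The create-copies / majority-vote / confidence procedure is simple enough to sketch; the noise scale, the number of copies, and the vote-margin definition of the robust metric below are assumptions standing in for the paper's exact choices.

```python
import torch

@torch.no_grad()
def medrdf_style_diagnosis(model, image, n_copies=100, sigma=0.1):
    """Inference-time wrapper around a frozen diagnosis model:
    vote over noisy copies of a single test image and report a confidence."""
    copies = image.unsqueeze(0) + sigma * torch.randn(n_copies, *image.shape)
    logits = model(copies.clamp(0.0, 1.0))
    preds = logits.argmax(dim=1)                             # label of every copy
    counts = torch.bincount(preds, minlength=logits.shape[1]).float()
    top2 = counts.topk(2).values
    robust_label = int(counts.argmax())
    robust_metric = float((top2[0] - top2[1]) / n_copies)    # vote margin in [0, 1]
    return robust_label, robust_metric

# Usage sketch: label, rm = medrdf_style_diagnosis(pretrained_model, test_image)
```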
Deep neural networks have empowered accurate device-free human activity recognition, which has wide applications. Deep models can extract robust features from various sensors and generalize well even in challenging situations such as data-insufficient cases. However, these systems could be vulnerable to input perturbations, i.e. adversarial attacks. We empirically demonstrate that both black-box Gaussian attacks and modern adversarial white-box attacks can cause their accuracies to plummet. In this paper, we first point out that such phenomenon can bring severe safety hazards to device-free sensing systems, and then propose a novel learning framework, SecureSense, to defend against common attacks. SecureSense aims to achieve consistent predictions regardless of whether there exists an attack on its input or not, alleviating the negative effect of distribution perturbation caused by adversarial attacks. Extensive experiments demonstrate that our proposed method can significantly enhance the model robustness of existing deep models, overcoming possible attacks. The results validate that our method works well on wireless human activity recognition and person identification systems. To the best of our knowledge, this is the first work to investigate adversarial attacks and further develop a novel defense framework for wireless human activity recognition in mobile computing research.
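The stated goal of consistent predictions with and without an input perturbation suggests a consistency-regularized training objective; a minimal sketch is given below, with the FGSM-style perturbation, the KL consistency term, and the weight `beta` as assumed ingredients rather than the paper's exact framework.

```python
import torch
import torch.nn.functional as F

def consistency_training_step(model, x, y, optimizer, eps=0.05, beta=1.0):
    """One step: classify the clean input correctly AND keep the prediction
    distribution stable under a small adversarial perturbation of the input."""
    x_adv = x.detach().clone().requires_grad_(True)
    loss_for_grad = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss_for_grad, x_adv)[0]
    x_adv = (x + eps * grad.sign()).detach()          # FGSM-style perturbation

    logits_clean = model(x)
    logits_adv = model(x_adv)
    ce = F.cross_entropy(logits_clean, y)
    consistency = F.kl_div(F.log_softmax(logits_adv, dim=1),
                           F.softmax(logits_clean, dim=1).detach(),
                           reduction="batchmean")
    loss = ce + beta * consistency

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```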
Previous works have shown that automatic speaker verification (ASV) is seriously vulnerable to malicious spoofing attacks, such as replay, synthetic speech, and the recently emerged adversarial attacks. Great efforts have been dedicated to defending ASV against replay and synthetic speech; however, only a few approaches have explored countering adversarial attacks. All the existing approaches to tackling adversarial attacks on ASV require knowledge of the adversarial sample generation process, but it is impractical for defenders to know the exact attack algorithms applied by attackers in the wild. This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms. Inspired by self-supervised learning models (SSLMs), which have the merit of alleviating superficial noise in the input and reconstructing clean samples from corrupted ones, this work regards adversarial perturbations as one kind of noise and leverages SSLMs for the adversarial defense of ASV. Specifically, we propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection. Experimental results show that our detection module effectively shields ASV by detecting adversarial samples with an accuracy of around 80%. Moreover, since there is no common metric for evaluating adversarial defense performance for ASV, this work also formalizes evaluation metrics for adversarial defense, considering both purification-based and detection-based approaches. We sincerely encourage future works to benchmark their approaches based on the proposed evaluation framework.
Deep neural network based face recognition models have been shown to be vulnerable to adversarial examples. However, many of the past attacks require the adversary to solve an input-dependent optimization problem with gradient descent, which makes the attack impractical in real time. These adversarial examples are also tightly coupled to the attacked model and are not as successful in transferring to different models. In this work, we propose ReFace, a real-time, highly transferable attack on face recognition models based on adversarial transformation networks (ATNs). ATNs model adversarial example generation as a feed-forward neural network. We find that the white-box attack success rate of a pure U-Net ATN falls substantially short of gradient-based attacks like PGD on large face recognition datasets. We therefore propose a new architecture for ATNs that closes this gap while maintaining a 10000x speedup over PGD. Furthermore, we find that at a given perturbation magnitude, our ATN adversarial perturbations are more effective in transferring to new face recognition models than PGD. The ReFace attack can successfully deceive commercial face recognition services in a transfer attack setting, reducing face identification accuracy from 82% to 16.4% for the AWS SearchFaces API and face verification accuracy from 91% to 50.1% for Azure Face Verification.
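An ATN in the sense described here is a feed-forward network trained once to emit a bounded perturbation that pushes a face embedding away from its original identity; the sketch below shows such a training objective with a generic generator, an L-infinity bound `eps`, and a cosine-similarity loss, all of which are assumptions rather than the ReFace architecture.

```python
import torch
import torch.nn.functional as F

def atn_training_step(atn, face_model, x, optimizer, eps=8 / 255):
    """One training step of a feed-forward attack network.
    `atn` maps an image to a raw perturbation; `face_model` returns embeddings."""
    with torch.no_grad():
        emb_clean = F.normalize(face_model(x), dim=1)

    delta = eps * torch.tanh(atn(x))                  # keep ||delta||_inf <= eps
    x_adv = (x + delta).clamp(0.0, 1.0)
    emb_adv = F.normalize(face_model(x_adv), dim=1)

    # Minimize cosine similarity so the adversarial embedding no longer
    # matches the original identity.
    loss = F.cosine_similarity(emb_adv, emb_clean, dim=1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# At test time a single forward pass through `atn` yields the perturbation,
# which is what makes the attack real-time compared with iterative PGD.
```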
Speaker recognition systems (SRSs) have recently been shown to be vulnerable to adversarial attacks, raising significant security concerns. In this work, we systematically investigate transformation- and adversarial-training-based defenses for securing SRSs. According to the characteristics of SRSs, we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks (4 white-box and 3 black-box) on speaker recognition. With careful regard for best practices in defense evaluation, we analyze the strength of the transformations to withstand adaptive attacks. We also evaluate and understand their effectiveness against adaptive attacks when combined with adversarial training. Our study provides many useful insights and findings, many of which are new or inconsistent with the conclusions in the image and speech recognition domains, e.g., variable- and constant-bit-rate speech compression have different performance, and some non-differentiable transformations remain effective against current promising evasion techniques that generally work well in the image domain. We demonstrate that the proposed novel feature-level transformation combined with adversarial training is rather effective compared with adversarial training alone in the complete white-box setting, e.g., increasing accuracy by 13.62% while raising the attack cost by two orders of magnitude, whereas other transformations do not necessarily improve the overall defense capability. This work further sheds light on research directions in this field. We also release our evaluation platform SpeakerGuard to facilitate further research.
Although deep learning models have achieved unprecedented success, their vulnerability to adversarial attacks has attracted increasing attention, especially when they are deployed in security-critical domains. To address this challenge, many defense strategies, both reactive and proactive, have been proposed to improve robustness. From the perspective of the image feature space, some of them fail to achieve satisfactory results due to the shift of features, and the features learned by the model are not directly related to the classification result. Different from them, we consider defense from the inside of the model and investigate neuron behavior before and after attacks. We observe that an attack misleads the model by dramatically changing the neurons that contribute most to the correct label. Motivated by this, we introduce the concept of neuron influence and further divide neurons into front, middle, and tail groups. Based on this, we propose neuron-level inverse perturbation (NIP), the first neuron-level reactive defense method against adversarial attacks. By strengthening front neurons and weakening those in the tail, NIP can eliminate nearly all adversarial perturbations while still maintaining high benign accuracy. Furthermore, it can cope with different sizes of perturbations via adaptivity, especially larger ones. Comprehensive experiments on three datasets and six models show that NIP outperforms state-of-the-art baselines against eleven adversarial attacks. We further provide interpretable evidence through neuron activation and visualization for better understanding.
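A rough reading of the neuron-influence idea in PyTorch terms: score each channel of a chosen layer by activation times the gradient toward the correct label on benign data, then rescale the strongest ("front") channels up and the weakest ("tail") channels down at inference via a forward hook. The layer choice, the 20%/20% split, and the scaling factors below are all assumptions, not the paper's procedure.

```python
import torch

def channel_influence(model, layer, x, y):
    """Score each channel of `layer` (NCHW features assumed) by the mean of
    |activation * gradient of the true-class logit| over a benign batch."""
    feats = {}
    handle = layer.register_forward_hook(lambda m, i, o: feats.update(out=o))
    logits = model(x)
    true_logit = logits.gather(1, y.unsqueeze(1)).sum()
    grads = torch.autograd.grad(true_logit, feats["out"])[0]
    handle.remove()
    return (feats["out"] * grads).abs().mean(dim=(0, 2, 3))   # one score per channel

def install_nip_hook(layer, influence, boost=1.5, damp=0.5, frac=0.2):
    """At inference, amplify 'front' channels and suppress 'tail' channels."""
    k = max(1, int(frac * influence.numel()))
    front = influence.topk(k).indices
    tail = (-influence).topk(k).indices
    scale = torch.ones_like(influence)
    scale[front] = boost
    scale[tail] = damp

    def hook(module, inputs, output):
        return output * scale.view(1, -1, 1, 1).to(output.device)
    return layer.register_forward_hook(hook)   # keep the handle to remove later
```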
Metric-learning-based person re-identification (ReID) systems have been shown to inherit the vulnerability of deep neural networks (DNNs), which are easily fooled by adversarial metric attacks. Existing work mainly relies on adversarial training for metric defense, and other approaches have not been fully studied. By exploring the impact of attacks on the underlying features, we propose targeted methods for metric attack and defense. On the metric attack side, we use local color deviation to construct intra-class variation of the input to attack color features. On the metric defense side, we propose a joint defense method consisting of two parts, proactive defense and passive defense. Proactive defense enhances the robustness of the model to color variations and the learning of structural relations across multiple modalities by constructing different inputs from multi-modal images, while passive defense exploits the invariance of structural features in a changing pixel space through circuitous scaling, preserving structural features while removing some adversarial noise. Extensive experiments show that, compared with existing adversarial metric defense methods, the proposed joint defense not only resists multiple attacks at the same time but also does not significantly reduce the generalization ability of the model. The code is available at https://github.com/finger-monkey/multi-modal_joint_defence.
Intelligent Internet of Things (IoT) systems based on deep neural networks (DNNs) have been widely deployed in the real world. However, DNNs have been found to be vulnerable to adversarial examples, which raises concerns about the reliability and security of intelligent IoT systems. Testing and evaluating the robustness of IoT systems therefore becomes necessary and essential. Various attacks and strategies have been proposed recently, but the efficiency problem remains unsolved: existing methods are either computationally expensive or time-consuming, which is not applicable in practice. In this paper, we propose a novel framework called Attack-Inspired GAN (AI-GAN) to generate adversarial examples conditionally. Once trained, it can efficiently produce adversarial perturbations given an input image and a target class. We apply AI-GAN on different datasets in white-box settings, black-box settings, and against target models protected by state-of-the-art defenses. Through extensive experiments, AI-GAN achieves high attack success rates, outperforming existing methods, and significantly reduces generation time. Moreover, for the first time, AI-GAN successfully scales to complex datasets, e.g., CIFAR-100 and ImageNet, with success rates of about 90% across all classes.
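As a hedged illustration of conditional adversarial-example generation (ignoring AI-GAN's discriminator and its attack-inspired training details), the sketch below trains a small generator that, given an image and a target class, emits a bounded perturbation driving a victim model toward that class; the architecture, bound, and loss are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondPerturbationGenerator(nn.Module):
    """Tiny conditional generator: image + one-hot target class -> perturbation."""
    def __init__(self, num_classes, channels=3):
        super().__init__()
        self.num_classes = num_classes
        self.net = nn.Sequential(
            nn.Conv2d(channels + num_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x, target):
        onehot = F.one_hot(target, self.num_classes).float()
        cond = onehot.view(*onehot.shape, 1, 1).expand(-1, -1, *x.shape[2:])
        return self.net(torch.cat([x, cond], dim=1))

def generator_step(gen, victim, x, target, optimizer, eps=8 / 255):
    """`optimizer` should contain only the generator's parameters."""
    delta = eps * gen(x, target)                        # bounded by construction
    x_adv = (x + delta).clamp(0.0, 1.0)
    loss = F.cross_entropy(victim(x_adv), target)       # push toward target class
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```

Once trained, a single forward pass through the generator yields a targeted perturbation, which is where the claimed generation-time saving comes from.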
Video classification systems are vulnerable to adversarial attacks, which can create severe security problems in video verification. Current black-box attacks need a large number of queries to succeed, resulting in high computational overhead in the process of attack. On the other hand, attacks with restricted perturbations are ineffective against defenses such as denoising or adversarial training. In this paper, we focus on unrestricted perturbations and propose StyleFool, a black-box video adversarial attack via style transfer to fool the video classification system. StyleFool first utilizes color theme proximity to select the best style image, which helps avoid unnatural details in the stylized videos. Meanwhile, the target class confidence is additionally considered in targeted attacks to influence the output distribution of the classifier by moving the stylized video closer to or even across the decision boundary. A gradient-free method is then employed to further optimize the adversarial perturbations. We carry out extensive experiments to evaluate StyleFool on two standard datasets, UCF-101 and HMDB-51. The experimental results demonstrate that StyleFool outperforms the state-of-the-art adversarial attacks in terms of both the number of queries and the robustness against existing defenses. Moreover, 50% of the stylized videos in untargeted attacks do not need any query since they can already fool the video classification model. Furthermore, we evaluate the indistinguishability through a user study to show that the adversarial samples of StyleFool look imperceptible to human eyes, despite unrestricted perturbations.
The adversarial robustness of deep neural networks has been actively investigated. However, most existing defense approaches are limited to a specific type of adversarial perturbation. Specifically, they often fail to offer resistance to multiple attack types simultaneously, i.e., they lack multi-perturbation robustness. Furthermore, compared to image recognition problems, the adversarial robustness of video recognition models is relatively unexplored. While there are several studies on how to generate adversarial videos, only a few approaches on defense strategies have been published in the literature. In this paper, we propose one of the first defense strategies against multiple types of adversarial videos for video recognition. The proposed method, referred to as MultiBN, performs adversarial training on multiple adversarial video types using multiple independent batch normalization (BN) layers with a learning-based BN selection module. With the multiple-BN structure, each BN branch is responsible for learning the distribution of a single perturbation type, thereby providing more precise distribution estimations. This mechanism benefits the handling of multiple perturbation types. The BN selection module detects the attack type of an input video and sends it to the corresponding BN branch, making MultiBN fully automatic and allowing end-to-end training. Compared to current adversarial training approaches, the proposed MultiBN exhibits stronger multi-perturbation robustness against different and even unforeseen adversarial video types, ranging from Lp-bounded attacks to physically realizable attacks, and this holds true on different datasets and target models. Moreover, we conduct extensive analyses to study the properties of the multiple-BN structure.
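The multiple-BN idea maps naturally onto a small module: several parallel BatchNorm branches plus a selector that routes inputs to one of them. The sketch below uses 2D BN, a tiny selector, and a batch-level majority vote at inference for brevity; the real method trains a learning-based selection module on video models, so treat these details as assumptions.

```python
import torch
import torch.nn as nn

class MultiBN(nn.Module):
    """Drop-in replacement for a BN layer: one branch per perturbation type."""
    def __init__(self, num_features, num_branches=3):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.BatchNorm2d(num_features) for _ in range(num_branches)])
        # Tiny selector predicting which branch (i.e., attack type) to use.
        self.selector = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(num_features, num_branches))

    def forward(self, x, branch=None):
        if branch is None:
            # Inference: let the selector decide (majority vote over the batch).
            branch = int(self.selector(x).argmax(dim=1).mode().values)
        return self.branches[branch](x)

# During adversarial training, `branch` is set explicitly so that each BN
# only sees samples of its own perturbation type (clean, Lp-bounded, patch, ...).
```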
To assess the vulnerability of deep learning in the physical world, recent works introduce adversarial patches and apply them on different tasks. In this paper, we propose another kind of adversarial patch: the Meaningful Adversarial Sticker, a physically feasible and stealthy attack method by using real stickers existing in our life. Unlike the previous adversarial patches by designing perturbations, our method manipulates the sticker's pasting position and rotation angle on the objects to perform physical attacks. Because the position and rotation angle are less affected by the printing loss and color distortion, adversarial stickers can keep good attacking performance in the physical world. Besides, to make adversarial stickers more practical in real scenes, we conduct attacks in the black-box setting with the limited information rather than the white-box setting with all the details of threat models. To effectively solve for the sticker's parameters, we design the Region based Heuristic Differential Evolution Algorithm, which utilizes the new-found regional aggregation of effective solutions and the adaptive adjustment strategy of the evaluation criteria. Our method is comprehensively verified in the face recognition and then extended to the image retrieval and traffic sign recognition. Extensive experiments show the proposed method is effective and efficient in complex physical conditions and has a good generalization for different tasks.
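Searching over pasting position and rotation angle under a query budget can be illustrated with a generic differential-evolution loop, with scipy's implementation standing in for the paper's region-based heuristic variant; the paste helper and the probability-only `query_model` interface are assumed, and image and sticker values are assumed to be float arrays in [0, 1].

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.optimize import differential_evolution

def paste_sticker(image, sticker, x, y, angle):
    """Paste a rotated HxWx3 sticker onto a copy of the image at row x, col y."""
    out = image.copy()
    rot = np.clip(rotate(sticker, angle, reshape=False, order=1), 0.0, 1.0)
    h, w = rot.shape[:2]
    out[x:x + h, y:y + w] = rot
    return out

def attack_with_sticker(query_model, image, sticker, true_label, max_queries=2000):
    """Black-box search over sticker position and rotation with a query budget."""
    def score(params):
        x, y, angle = params
        probs = query_model(paste_sticker(image, sticker, int(x), int(y), angle))
        return float(probs[true_label])               # lower = stronger attack

    h, w = image.shape[:2]
    sh, sw = sticker.shape[:2]
    bounds = [(0, h - sh), (0, w - sw), (-45, 45)]    # (row, col, angle in degrees)
    result = differential_evolution(score, bounds, popsize=15, polish=False,
                                    maxiter=max_queries // 45, seed=0)
    return result.x, result.fun    # best (x, y, angle) and remaining true-class confidence
```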