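Despite its great success, deep learning suffers severely from a lack of robustness; that is, deep neural networks are very vulnerable to adversarial attacks, even the simplest ones. Inspired by recent advances in brain science, we propose Denoised Internal Models (DIM), a novel generative autoencoder-based model, to tackle this challenge. Simulating the human brain's pipeline for visual signal processing, DIM adopts a two-stage approach. In the first stage, DIM uses a denoiser to reduce the noise and the dimensionality of the inputs, mirroring the information pre-processing performed in the thalamus. Inspired by the sparse coding of memory-related traces in the primary visual cortex, the second stage produces a set of internal models, one for each category. We evaluate DIM against 42 adversarial attacks, showing that DIM effectively defends against all of them and outperforms the state of the art in overall robustness.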
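Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially deep neural networks (DNNs), are vulnerable to adversarial examples; i.e., examples that are carefully crafted to fool a well-trained classification model while being indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was first identified by Biggio et al. (2013) and Szegedy et al. (2014), much work has been done in this field, including the development of attack methods to generate adversarial examples and the construction of defense techniques to guard against them. This paper aims to introduce this topic and its latest developments to the statistical community, focusing primarily on the generation of adversarial examples and on guarding against them. The code used in the numerical experiments (in Python and R) is publicly available for readers to explore the surveyed methods. The authors hope that this paper will encourage more statisticians to work in this important and exciting field of generating and defending against adversarial examples.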
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
Deep neural networks have empowered accurate device-free human activity recognition, which has wide applications. Deep models can extract robust features from various sensors and generalize well even in challenging situations such as data-insufficient cases. However, these systems could be vulnerable to input perturbations, i.e. adversarial attacks. We empirically demonstrate that both black-box Gaussian attacks and modern adversarial white-box attacks can cause their accuracy to plummet. In this paper, we first point out that this phenomenon can bring severe safety hazards to device-free sensing systems, and then propose a novel learning framework, SecureSense, to defend against common attacks. SecureSense aims to achieve consistent predictions regardless of whether its input is under attack, alleviating the negative effect of the distribution perturbation caused by adversarial attacks. Extensive experiments demonstrate that our proposed method can significantly enhance the robustness of existing deep models against possible attacks. The results validate that our method works well on wireless human activity recognition and person identification systems. To the best of our knowledge, this is the first work to investigate adversarial attacks and further develop a novel defense framework for wireless human activity recognition in mobile computing research.
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
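The detection idea in the abstract above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a real classifier, and the L1 distance and the `threshold` value are illustrative choices rather than settings taken from the paper. Only the bit-depth squeezer is shown; spatial smoothing would plug in the same way.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def reduce_bit_depth(pixels: Sequence[float], bits: int) -> Vector:
    """Quantize pixel values in [0, 1] to 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return [round(p * levels) / levels for p in pixels]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two prediction vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def is_adversarial(
    model: Callable[[Sequence[float]], Vector],
    pixels: Sequence[float],
    bits: int = 1,
    threshold: float = 0.5,  # illustrative value, not from the paper
) -> bool:
    """Flag the input when squeezing changes the model's prediction too much."""
    original = model(pixels)
    squeezed = model(reduce_bit_depth(pixels, bits))
    return l1_distance(original, squeezed) > threshold


def toy_model(pixels: Sequence[float]) -> Vector:
    """Hypothetical two-class classifier: class 1 iff the mean pixel exceeds 0.5."""
    mean = sum(pixels) / len(pixels)
    return [0.0, 1.0] if mean > 0.5 else [1.0, 0.0]
```

With this toy model, an input sitting far from the decision boundary keeps its prediction after squeezing and is not flagged, while an input that only barely crosses the boundary flips class once its pixels are quantized and is flagged.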
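Deep neural networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT): training the classification network on images perturbed according to a specific threat model, which defines the magnitude of the allowed modification. Although AT leads to promising results, training on a specific threat model fails to generalize to other types of perturbations. A different approach uses a preprocessing step to remove the adversarial perturbation from the attacked image. In this work, we follow the latter path and aim to develop a technique that leads to robust classifiers across various realizations of threat models. To this end, we harness recent advances in stochastic generative modeling and use them for sampling from conditional distributions. Our defense relies on adding i.i.d. Gaussian noise to the attacked image, followed by a pretrained diffusion process, an architecture that performs a stochastic iterative process over a denoising network and yields a denoised result of high perceptual quality. The robustness obtained with this stochastic preprocessing step is validated through extensive experiments on the CIFAR-10 dataset, showing that our method outperforms the leading defense methods under various threat models.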
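With the rapid advancement and increased use of deep learning models in image recognition, security has become a major concern for their deployment in safety-critical systems. Because the accuracy and robustness of deep learning models are primarily attributed to the purity of the training samples, deep learning architectures are often susceptible to adversarial attacks. Adversarial attacks are typically obtained by making subtle perturbations to normal images; these perturbations are mostly imperceptible to humans but can seriously confuse state-of-the-art machine learning models. We propose a framework named APuDAE that leverages Denoising AutoEncoders (DAEs) to purify these samples by using the DAEs in an adaptive way, thereby improving the classification accuracy of target classifier networks that have been attacked. We also show how using DAEs adaptively, rather than directly, improves classification accuracy further and is more robust to adaptive attacks designed to fool them. We demonstrate our results on the MNIST, CIFAR-10 and ImageNet datasets and show that our framework (APuDAE) provides performance comparable to, and in most cases better than, the baseline methods in purifying adversaries. We also design an adaptive attack that specifically targets our purifying model and demonstrate that our defense is robust to it.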
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have recently been found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to humans but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples has become one of the major risks of applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
In recent years, deep neural network approaches have been widely adopted for machine learning tasks, including classification. However, they were shown to be vulnerable to adversarial perturbations: carefully crafted small perturbations can cause misclassification of legitimate images. We propose Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against such attacks. Defense-GAN is trained to model the distribution of unperturbed images. At inference time, it finds a close output to a given image which does not contain the adversarial changes. This output is then fed to the classifier. Our proposed method can be used with any classification model and does not modify the classifier structure or training procedure. It can also be used as a defense against any attack as it does not assume knowledge of the process for generating the adversarial examples. We empirically show that Defense-GAN is consistently effective against different attack methods and improves on existing defense strategies. Our code has been made publicly available at https://github.com/kabkabm/defensegan.
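The inference-time step described above (finding a generator output close to the given image) can be sketched as a search over the latent code. This is a toy illustration under stated assumptions, not the released implementation: `toy_generator` is a hypothetical one-dimensional "generator", the function and parameter names are invented here, and finite differences stand in for the backpropagated gradients a real GAN would use.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def project_onto_generator(
    generator: Callable[[Vector], Vector],
    image: Sequence[float],
    latent_dim: int,
    restarts: int = 4,
    steps: int = 200,
    lr: float = 0.05,
    seed: int = 0,
) -> Vector:
    """Return the generator output closest to `image` found by gradient descent on z."""
    rng = random.Random(seed)
    eps = 1e-5  # step for the central finite-difference gradient
    best_output, best_err = None, float("inf")
    for _ in range(restarts):  # several random starting points for z
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        for _ in range(steps):
            grad = []
            for i in range(latent_dim):
                z_plus = z[:i] + [z[i] + eps] + z[i + 1:]
                z_minus = z[:i] + [z[i] - eps] + z[i + 1:]
                diff = squared_error(generator(z_plus), image) - squared_error(
                    generator(z_minus), image
                )
                grad.append(diff / (2 * eps))
            z = [zi - lr * gi for zi, gi in zip(z, grad)]
        output = generator(z)
        err = squared_error(output, image)
        if err < best_err:
            best_output, best_err = output, err
    return best_output


def toy_generator(z: Vector) -> Vector:
    """Hypothetical generator whose 'images' lie on the line (t, 2t)."""
    return [z[0], 2.0 * z[0]]
```

In this toy setting, a perturbed input such as `[1.1, 1.9]` is replaced by the nearest point on the generator's line before it would be handed to the classifier, so the off-manifold part of the perturbation is discarded.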
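Deep neural networks (DNNs) are known to be vulnerable to adversarial examples crafted with imperceptible perturbations; i.e., a small change in an input image can induce a misclassification, which threatens the reliability of deployed systems based on deep learning. Adversarial training (AT) is often adopted to improve the robustness of DNNs by training on a mixture of corrupted and clean data. However, most AT-based methods are ineffective in dealing with transferred adversarial examples, which are generated to fool a wide spectrum of defense models, and thus cannot satisfy the generalization requirement raised in real-world scenarios. Moreover, adversarially training a defense model in general cannot produce interpretable predictions for perturbed inputs, whereas domain experts require a highly interpretable robust model in order to understand the behaviour of a DNN. In this work, we propose an approach based on Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which promotes linearized robustness through Jacobian normalization and also regularizes the perturbation-based saliency maps to imitate the model's interpretable predictions. As such, we achieve both improved defense and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.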
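Strong adversarial examples are key to evaluating and enhancing the robustness of deep neural networks. Popular adversarial attack algorithms maximize a non-concave loss function using gradient ascent. However, the performance of each attack is usually sensitive because of insufficient information (only one input example, few white-box source models, and unknown defense strategies). Hence, the crafted adversarial examples are prone to overfitting the source model, which limits their transferability to unidentified architectures. In this paper, we propose Multiple Asymptotically Normal Distribution Attacks (MultiANDA), a novel method that explicitly characterizes adversarial perturbations from a learned distribution. Specifically, we approximate the posterior distribution over the perturbations by taking advantage of the asymptotic normality property of stochastic gradient ascent (SGA), and then apply an ensemble strategy to this procedure to estimate a Gaussian mixture model for better exploration of the potential optimization space. Drawing perturbations from the learned distribution allows us to generate any number of adversarial examples for each input. The approximated posterior essentially describes the stationary distribution of the SGA iterations, which captures the geometric information around the local optimum. Thus, samples drawn from the distribution reliably maintain transferability. Through extensive experiments on seven normally trained and seven defense models, our proposed method outperforms nine state-of-the-art black-box attacks on deep learning models with or without defenses.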
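Deep learning (DL) has shown great success in many human-related tasks, which has led to its adoption in many computer vision based applications, such as security surveillance systems, autonomous vehicles and healthcare. Such safety-critical applications can only be deployed successfully once they are able to overcome safety-critical challenges. Among these challenges are defending against and/or detecting adversarial examples (AEs). An adversary can carefully craft small, often imperceptible, noise called a perturbation, which is added to a clean image to generate an AE. The aim of an AE is to fool the DL model, which makes it a potential risk for DL applications. Many test-time evasion attacks and countermeasures, i.e., defense or detection methods, have been proposed in the literature. Moreover, a few reviews and surveys have been published that present, at a theoretical level, taxonomies of the threats and the countermeasures, with little focus on AE detection methods. In this paper, we focus on the image classification task and attempt to provide a survey of detection methods for test-time evasion attacks on neural network classifiers. A detailed discussion of such methods is provided, together with experimental results for eight state-of-the-art detectors under different scenarios on four datasets. We also discuss potential challenges and future perspectives for this research direction.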
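Most adversarial attack defense methods rely on obfuscating gradients. These methods are successful in defending against gradient-based attacks; however, they are easily circumvented by attacks that either do not use the gradient or that approximate and use a corrected gradient. Defenses that do not obfuscate gradients, such as adversarial training, exist, but these approaches generally make assumptions about the attack, such as its magnitude. We propose a classification model that does not obfuscate gradients and is robust by construction without assuming any prior knowledge about the attack. Our method casts classification as an optimization problem in which we "invert" a conditional generator trained on unperturbed, natural images to find the class that generates the sample closest to the query image. We hypothesize that a potential source of brittleness against adversarial attacks is the high-to-low-dimensional nature of feed-forward classifiers, which allows an adversary to find small perturbations in the input space that lead to large changes in the output space. A generative model, on the other hand, is typically a low-to-high-dimensional mapping. While the method is related to Defense-GAN, the use of a conditional generative model and inversion in our model, instead of feed-forward classification, is a critical difference. Unlike Defense-GAN, which was shown to generate obfuscated gradients that are easily circumvented, we show that our method does not obfuscate gradients. We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks compared to naturally trained feed-forward classifiers.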
Adversarial attacks to image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state-of-the-art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method was ranked first in Competition on Adversarial Attacks and Defenses (CAAD) 2018, where it achieved 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers, surpassing the runner-up approach by ∼10%. Code is available at https://github.com/facebookresearch/ImageNet-Adversarial-Training.
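To make the non-local means operation mentioned above concrete, here is a minimal sketch over a flattened feature map. It is a simplification under stated assumptions: the softmax over scaled dot products is one common weighting choice, and the function name is invented here. The sketch omits the learned layers and the residual connection that a full denoising block trained end-to-end would normally include.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # one feature vector per spatial position


def non_local_means(features: FeatureMap) -> FeatureMap:
    """Replace each position by a similarity-weighted mean over all positions."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    denoised = []
    for xi in features:
        # Similarity of position i to every position j (scaled dot product).
        logits = [sum(a * b for a, b in zip(xi, xj)) / scale for xj in features]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(v - peak) for v in logits]
        total = sum(weights)
        denoised.append(
            [
                sum(w * xj[c] for w, xj in zip(weights, features)) / total
                for c in range(dim)
            ]
        )
    return denoised
```

Because each output is a convex combination of the input feature vectors, isolated spikes in a single position are pulled toward the values of similar positions elsewhere in the map.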
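Deep neural networks (DNNs) have recently achieved great success in many classification tasks. Unfortunately, they are vulnerable to adversarial attacks that generate adversarial examples with small perturbations to fool DNN models, especially in model-sharing scenarios. Adversarial training, which injects adversarial examples into model training, has proved to be the most effective strategy for improving the robustness of DNN models against adversarial attacks. However, adversarial training based on existing adversarial examples fails to generalize well to standard, unperturbed test data. To achieve a better trade-off between standard accuracy and adversarial robustness, we propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining (LADDER), which adversarially trains DNN models on latent boundary-guided adversarial examples. In contrast to most existing methods, which generate adversarial examples in the input space, LADDER generates a myriad of high-quality adversarial examples by adding perturbations to latent features. The perturbations are made along the normal of the decision boundary constructed by an SVM with an attention mechanism. We analyze the merits of the generated boundary-guided adversarial examples from a boundary-field perspective and through visualization. Extensive experiments and detailed analysis on MNIST, SVHN, CelebA and CIFAR-10 validate the effectiveness of LADDER in achieving a better trade-off between standard accuracy and adversarial robustness compared with vanilla DNNs and competitive baselines.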
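Fooling deep neural networks (DNNs) with black-box optimization has become a popular form of adversarial attack, since structural prior knowledge of DNNs is typically unavailable. Nevertheless, recent black-box adversarial attacks may struggle to balance attack ability against the visual quality of the generated adversarial examples (AEs) when tackling high-resolution images. In this paper, we propose an attention-guided black-box adversarial attack based on large-scale multiobjective evolutionary optimization, termed LMOA. By considering the spatial semantic information of images, we first use an attention map to determine the pixels to perturb. Rather than attacking the entire image, reducing the set of perturbed pixels with the attention mechanism helps to avoid the notorious curse of dimensionality and thereby improves attack performance. Second, a large-scale multiobjective evolutionary algorithm is employed to traverse the reduced set of pixels in the salient region. Benefiting from these characteristics, the generated AEs have the potential to fool target DNNs while remaining imperceptible to human vision. Extensive experimental results verify the effectiveness of the proposed LMOA on the ImageNet dataset. More importantly, it is more competitive than existing black-box adversarial attacks at generating high-resolution AEs with better visual quality.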
Spiking neural networks (SNNs) attract great attention due to their low power consumption, low latency, and biological plausibility. As they are widely deployed in neuromorphic devices for low-power brain-inspired computing, security issues become increasingly important. However, compared to deep neural networks (DNNs), SNNs currently lack specifically designed defense methods against adversarial attacks. Inspired by neural membrane potential oscillation, we propose a novel neural model that incorporates the bio-inspired oscillation mechanism to enhance the security of SNNs. Our experiments show that SNNs with neural oscillation neurons have better resistance to adversarial attacks than ordinary SNNs with LIF neurons across various architectures and datasets. Furthermore, we propose a defense method that changes the model's gradients by replacing the form of oscillation, which hides the original training gradients and confuses the attacker into using gradients of 'fake' neurons to generate invalid adversarial samples. Our experiments suggest that the proposed defense method can effectively resist both single-step and iterative attacks with comparable defense effectiveness and much lower computational cost than adversarial training methods on DNNs. To the best of our knowledge, this is the first work that establishes adversarial defense through masking surrogate gradients on SNNs.
Neural networks are vulnerable to adversarial examples, which poses a threat to their application in security-sensitive systems. We propose the high-level representation guided denoiser (HGD) as a defense for image classification. A standard denoiser suffers from the error amplification effect, in which small residual adversarial noise is progressively amplified and leads to wrong classifications. HGD overcomes this problem by using a loss function defined as the difference between the target model's outputs activated by the clean image and the denoised image. Compared with ensemble adversarial training, which is the state-of-the-art defending method on large images, HGD has three advantages. First, with HGD as a defense, the target model is more robust to both white-box and black-box adversarial attacks. Second, HGD can be trained on a small subset of the images and generalizes well to other images and unseen classes. Third, HGD can be transferred to defend models other than the one guiding it. In the NIPS competition on defense against adversarial attacks, our HGD solution won first place and outperformed other models by a large margin.
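The contrast between a pixel-level denoising loss and the representation-guided loss described above can be sketched as follows. The names are hypothetical, `toy_features` merely stands in for a high-level layer of the target model, and the use of an L1 distance is an assumption made for this illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def pixel_loss(clean: Sequence[float], denoised: Sequence[float]) -> float:
    """Baseline: L1 distance measured directly between the two images."""
    return sum(abs(a - b) for a, b in zip(clean, denoised))


def guided_loss(
    target_model: Callable[[Sequence[float]], Vector],
    clean: Sequence[float],
    denoised: Sequence[float],
) -> float:
    """L1 distance between the target model's outputs on the two images."""
    out_clean = target_model(clean)
    out_denoised = target_model(denoised)
    return sum(abs(a - b) for a, b in zip(out_clean, out_denoised))


def toy_features(image: Sequence[float]) -> Vector:
    """Hypothetical high-level representation: total and peak intensity."""
    return [sum(image), max(image)]
```

The point of the guided variant is that the denoiser is penalized only for residual noise that actually changes what the target model computes, rather than for every pixel-level deviation.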
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS, which illustrate a diverse set of defense strategies, can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
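Black-box adversarial attacks have attracted considerable attention because of their practical relevance to deep learning security; at the same time, they are very challenging, since the attacker has no access to the network architecture or internal weights of the target model. Based on the hypothesis that an example which remains adversarial for multiple models is more likely to transfer its attack capability to other models, ensemble-based adversarial attack methods are efficient and widely used for black-box attacks. However, how best to perform an ensemble attack has been little investigated, and existing ensemble attacks simply fuse the outputs of all models evenly. In this work, we treat the iterative ensemble attack as a stochastic gradient descent optimization process, in which the variance of the gradients across different models may lead to poor local optima. To this end, we propose a novel attack method called the stochastic variance reduced ensemble (SVRE) attack, which reduces the gradient variance of the ensemble models and takes full advantage of the ensemble attack. Empirical results on the standard ImageNet dataset demonstrate that the proposed method boosts adversarial transferability and significantly outperforms existing ensemble attacks.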
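The variance-reduction idea can be illustrated with a simplified, SVRG-style sketch. This is an assumption-laden toy rather than the published algorithm: each "model" is represented only by a hypothetical function returning the gradient of its loss with respect to the input, momentum is omitted, and the step sizes and names are illustrative.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Sequence[float]], Vector]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def svre_attack(
    grad_fns: Sequence[GradFn],
    x: Sequence[float],
    eps: float = 0.1,
    outer_steps: int = 10,
    inner_steps: int = 4,
    seed: int = 0,
) -> Vector:
    """Ascend the ensemble loss using variance-reduced per-model gradients."""
    rng = random.Random(seed)
    alpha = eps / outer_steps

    def project(v: Sequence[float]) -> Vector:
        # Stay inside the L-infinity ball around x and the valid range [0, 1].
        return [
            min(max(vi, max(0.0, xi - eps)), min(1.0, xi + eps))
            for vi, xi in zip(v, x)
        ]

    x_adv = list(x)
    for _ in range(outer_steps):
        # Snapshot: every model's gradient at the outer iterate, and their mean.
        snapshot = [fn(x_adv) for fn in grad_fns]
        g_ens = [sum(col) / len(grad_fns) for col in zip(*snapshot)]
        x_in = list(x_adv)
        accumulated = [0.0] * len(x_adv)
        for _ in range(inner_steps):
            k = rng.randrange(len(grad_fns))  # sample one model
            g_k = grad_fns[k](x_in)
            # Correct the sampled gradient by its deviation from the ensemble mean.
            g_vr = [a - (b - c) for a, b, c in zip(g_k, snapshot[k], g_ens)]
            accumulated = [acc + g for acc, g in zip(accumulated, g_vr)]
            x_in = project(
                [xi + alpha * sign(a) for xi, a in zip(x_in, accumulated)]
            )
        x_adv = project(
            [xi + alpha * sign(a) for xi, a in zip(x_adv, accumulated)]
        )
    return x_adv
```

For models whose gradients disagree in sign on some coordinate, sampling a single model per step would push that coordinate back and forth; the correction term removes that model-specific deviation so each step follows the ensemble direction.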