通过输入修改(即,对抗示例)最近引起了很多关注的神经网络的攻击问题。相对容易生成且难以检测,这些攻击构成了许多建议的防御试图减轻的安全漏洞。但是,对攻击和防御效果的评估通常依赖于传统分类指标,而不适合对抗对抗情景。这些指标中的大多数是基于精度的,因此可能具有有限的范围和低独特的功率。其他度量不考虑神经网络功能的独特特征,或间接地测量攻击的攻击效果(例如,通过他们的发电的复杂性)。在本文中,我们提出了两个专门旨在测量攻击的效果或防御恢复效果,在多字符分类任务中的神经网络输出中的攻击或恢复效果。灵感来自归一化折扣累积增益和信息检索文献中使用的互惠秩指标,我们将神经网络预测视为排名结果的结果。使用关于等级概率的其他信息使我们能够定义适合手头任务的新型度量。我们使用普拉vgg19模型和想象成数据集的各种攻击和防御评估我们的指标。与普通分类指标相比,我们拟议的指标表现出优越的信息性和独特性。
translated by 谷歌翻译
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives.This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
translated by 谷歌翻译
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS-and which illustrate a diverse set of defense strategies-can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result-showing that a defense was ineffective-this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
translated by 谷歌翻译
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
translated by 谷歌翻译
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to human but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples becomes one of the major risks for applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
translated by 谷歌翻译
Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.
translated by 谷歌翻译
Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
translated by 谷歌翻译
最近的自然语言处理(NLP)技术在基准数据集中实现了高性能,主要原因是由于深度学习性能的显着改善。研究界的进步导致了最先进的NLP任务的生产系统的巨大增强,例如虚拟助理,语音识别和情感分析。然而,随着对抗性攻击测试时,这种NLP系统仍然仍然失败。初始缺乏稳健性暴露于当前模型的语言理解能力中的令人不安的差距,当NLP系统部署在现实生活中时,会产生问题。在本文中,我们通过以各种维度的系统方式概述文献来展示了NLP稳健性研究的结构化概述。然后,我们深入了解稳健性的各种维度,跨技术,指标,嵌入和基准。最后,我们认为,鲁棒性应该是多维的,提供对当前研究的见解,确定文学中的差距,以建议值得追求这些差距的方向。
translated by 谷歌翻译
Recent years have seen a proliferation of research on adversarial machine learning. Numerous papers demonstrate powerful algorithmic attacks against a wide variety of machine learning (ML) models, and numerous other papers propose defenses that can withstand most attacks. However, abundant real-world evidence suggests that actual attackers use simple tactics to subvert ML-driven systems, and as a result security practitioners have not prioritized adversarial ML defenses. Motivated by the apparent gap between researchers and practitioners, this position paper aims to bridge the two domains. We first present three real-world case studies from which we can glean practical insights unknown or neglected in research. Next we analyze all adversarial ML papers recently published in top security conferences, highlighting positive trends and blind spots. Finally, we state positions on precise and cost-driven threat modeling, collaboration between industry and academia, and reproducible research. We believe that our positions, if adopted, will increase the real-world impact of future endeavours in adversarial ML, bringing both researchers and practitioners closer to their shared goal of improving the security of ML systems.
translated by 谷歌翻译
Deep learning has shown impressive performance on hard perceptual problems. However, researchers found deep learning systems to be vulnerable to small, specially crafted perturbations that are imperceptible to humans. Such perturbations cause deep learning systems to mis-classify adversarial examples, with potentially disastrous consequences where safety or security is crucial. Prior defenses against adversarial examples either targeted specific attacks or were shown to be ineffective. We propose MagNet, a framework for defending neural network classifiers against adversarial examples. MagNet neither modifies the protected classifier nor requires knowledge of the process for generating adversarial examples. MagNet includes one or more separate detector networks and a reformer network. The detector networks learn to differentiate between normal and adversarial examples by approximating the manifold of normal examples. Since they assume no specific process for generating adversarial examples, they generalize well. The reformer network moves adversarial examples towards the manifold of normal examples, which is effective for correctly classifying adversarial examples with small perturbation. We discuss the intrinsic difficulties in defending against whitebox attack and propose a mechanism to defend against graybox attack. Inspired by the use of randomness in cryptography, we use diversity to strengthen MagNet. We show empirically that Mag-Net is effective against the most advanced state-of-the-art attacks in blackbox and graybox scenarios without sacrificing false positive rate on normal examples. CCS CONCEPTS• Security and privacy → Domain-specific security and privacy architectures; • Computing methodologies → Neural networks;
translated by 谷歌翻译
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimizationbased attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining noncertified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.
translated by 谷歌翻译
虽然深入学习模型取得了前所未有的成功,但他们对逆势袭击的脆弱性引起了越来越关注,特别是在部署安全关键域名时。为了解决挑战,已经提出了鲁棒性改善的许多辩护策略,包括反应性和积极主动。从图像特征空间的角度来看,由于特征的偏移,其中一些人无法达到满足结果。此外,模型学习的功能与分类结果无直接相关。与他们不同,我们考虑基本上从模型内部进行防御方法,并在攻击前后调查神经元行为。我们观察到,通过大大改变为正确标签的神经元大大改变神经元来误导模型。受其激励,我们介绍了神经元影响的概念,进一步将神经元分为前,中间和尾部。基于它,我们提出神经元水平逆扰动(NIP),第一神经元水平反应防御方法对抗对抗攻击。通过强化前神经元并削弱尾部中的弱化,辊隙可以消除几乎所有的对抗扰动,同时仍然保持高良好的精度。此外,它可以通过适应性,尤其是更大的扰动来应对不同的扰动。在三个数据集和六种模型上进行的综合实验表明,NIP优于最先进的基线对抗11个对抗性攻击。我们进一步通过神经元激活和可视化提供可解释的证据,以便更好地理解。
translated by 谷歌翻译
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%.In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.
translated by 谷歌翻译
机器学习与服务(MLAAS)已成为广泛的范式,即使是通过例如,也是客户可用的最复杂的机器学习模型。一个按要求的原则。这使用户避免了数据收集,超参数调整和模型培训的耗时过程。但是,通过让客户访问(预测)模型,MLAAS提供商危害其知识产权,例如敏感培训数据,优化的超参数或学到的模型参数。对手可以仅使用预测标签创建模型的副本,并以(几乎)相同的行为。尽管已经描述了这种攻击的许多变体,但仅提出了零星的防御策略,以解决孤立的威胁。这增加了对模型窃取领域进行彻底系统化的必要性,以全面了解这些攻击是成功的原因,以及如何全面地捍卫它们。我们通过对模型窃取攻击,评估其性能以及探索不同设置中相应的防御技术来解决这一问题。我们为攻击和防御方法提出了分类法,并提供有关如何根据目标和可用资源选择正确的攻击或防御策略的准则。最后,我们分析了当前攻击策略使哪些防御能力降低。
translated by 谷歌翻译
背景信息:在过去几年中,机器学习(ML)一直是许多创新的核心。然而,包括在所谓的“安全关键”系统中,例如汽车或航空的系统已经被证明是非常具有挑战性的,因为ML的范式转变为ML带来完全改变传统认证方法。目的:本文旨在阐明与ML为基础的安全关键系统认证有关的挑战,以及文献中提出的解决方案,以解决它们,回答问题的问题如何证明基于机器学习的安全关键系统?'方法:我们开展2015年至2020年至2020年之间发布的研究论文的系统文献综述(SLR),涵盖了与ML系统认证有关的主题。总共确定了217篇论文涵盖了主题,被认为是ML认证的主要支柱:鲁棒性,不确定性,解释性,验证,安全强化学习和直接认证。我们分析了每个子场的主要趋势和问题,并提取了提取的论文的总结。结果:单反结果突出了社区对该主题的热情,以及在数据集和模型类型方面缺乏多样性。它还强调需要进一步发展学术界和行业之间的联系,以加深域名研究。最后,它还说明了必须在上面提到的主要支柱之间建立连接的必要性,这些主要柱主要主要研究。结论:我们强调了目前部署的努力,以实现ML基于ML的软件系统,并讨论了一些未来的研究方向。
translated by 谷歌翻译
深度学习(DL)在许多与人类相关的任务中表现出巨大的成功,这导致其在许多计算机视觉的基础应用中采用,例如安全监控系统,自治车辆和医疗保健。一旦他们拥有能力克服安全关键挑战,这种安全关键型应用程序必须绘制他们的成功部署之路。在这些挑战中,防止或/和检测对抗性实例(AES)。对手可以仔细制作小型,通常是难以察觉的,称为扰动的噪声被添加到清洁图像中以产生AE。 AE的目的是愚弄DL模型,使其成为DL应用的潜在风险。在文献中提出了许多测试时间逃避攻击和对策,即防御或检测方法。此外,还发布了很少的评论和调查,理论上展示了威胁的分类和对策方法,几乎​​没有焦点检测方法。在本文中,我们专注于图像分类任务,并试图为神经网络分类器进行测试时间逃避攻击检测方法的调查。对此类方法的详细讨论提供了在四个数据集的不同场景下的八个最先进的探测器的实验结果。我们还为这一研究方向提供了潜在的挑战和未来的观点。
translated by 谷歌翻译
在对抗机器学习中,防止对深度学习系统的攻击的新防御能力在释放更强大的攻击后不久就会破坏。在这种情况下,法医工具可以通过追溯成功的根本原因来为现有防御措施提供宝贵的补充,并为缓解措施提供前进的途径,以防止将来采取类似的攻击。在本文中,我们描述了我们为开发用于深度神经网络毒物攻击的法医追溯工具的努力。我们提出了一种新型的迭代聚类和修剪解决方案,该解决方案修剪了“无辜”训练样本,直到所有剩余的是一组造成攻击的中毒数据。我们的方法群群训练样本基于它们对模型参数的影响,然后使用有效的数据解读方法来修剪无辜簇。我们从经验上证明了系统对三种类型的肮脏标签(后门)毒物攻击和三种类型的清洁标签毒药攻击的功效,这些毒物跨越了计算机视觉和恶意软件分类。我们的系统在所有攻击中都达到了98.4%的精度和96.8%的召回。我们还表明,我们的系统与专门攻击它的四种抗纤维法措施相对强大。
translated by 谷歌翻译
This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate adversarial risk as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as obscurity to an adversary, and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
translated by 谷歌翻译
深度神经网络众所周知,很容易受到对抗性攻击和后门攻击的影响,在该攻击中,对输入的微小修改能够误导模型以给出错误的结果。尽管已经广泛研究了针对对抗性攻击的防御措施,但有关减轻后门攻击的调查仍处于早期阶段。尚不清楚防御这两次攻击之间是否存在任何连接和共同特征。我们对对抗性示例与深神网络的后门示例之间的联系进行了全面的研究,以寻求回答以下问题:我们可以使用对抗检测方法检测后门。我们的见解是基于这样的观察结果,即在推理过程中,对抗性示例和后门示例都有异常,与良性​​样本高度区分。结果,我们修改了四种现有的对抗防御方法来检测后门示例。广泛的评估表明,这些方法可靠地防止后门攻击,其准确性比检测对抗性实例更高。这些解决方案还揭示了模型灵敏度,激活空间和特征空间中对抗性示例,后门示例和正常样本的关系。这能够增强我们对这两次攻击和防御机会的固有特征的理解。
translated by 谷歌翻译
已知DNN容易受到所谓的对抗攻击的攻击,这些攻击操纵输入以引起不正确的结果,这可能对攻击者有益或对受害者造成损害。最近的作品提出了近似计算,作为针对机器学习攻击的防御机制。我们表明,这些方法虽然成功地用于一系列投入,但不足以解决更强大,高信任的对抗性攻击。为了解决这个问题,我们提出了DNNShield,这是一种硬件加速防御,可使响应的强度适应对抗性输入的信心。我们的方法依赖于DNN模型的动态和随机稀疏来有效地实现推理近似值,并通过对近似误差进行细粒度控制。与检测对抗输入相比,DNNShield使用稀疏推理的输出分布特征。当应用于RESNET50时,我们显示出86%的对抗检测率为86%,这超过了最先进的接近状态的检测率,开销较低。我们演示了软件/硬件加速的FPGA原型,该原型降低了DNNShield相对于仅软件CPU和GPU实现的性能影响。
translated by 谷歌翻译