Convolutional neural networks (CNNs) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable to adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. In the latter case, a small "detector" is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant margin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
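For orientation, the standard maximum-likelihood LID estimate that LID-based detectors commonly start from can be sketched as below. The abstract does not spell out its re-interpretation or adaptations, so this shows only the textbook estimator, LID(x) = -((1/k) Σ log(r_i / r_k))^(-1), where r_i is the distance from x to its i-th nearest neighbour. The function name `lid_mle` and its parameters are illustrative assumptions and are not taken from the linked repository.

```python
import math
from typing import Sequence


def lid_mle(query: Sequence[float],
            reference: Sequence[Sequence[float]],
            k: int) -> float:
    """Maximum-likelihood LID estimate of `query` from its k nearest neighbours.

    Hypothetical helper for illustration; in a detector, `query` and
    `reference` would typically be layer activations, not raw inputs.
    """
    # Distances to all reference points, ignoring exact duplicates of the query.
    dists = sorted(d for d in (math.dist(query, r) for r in reference) if d > 0.0)
    neighbours = dists[:k]
    if not neighbours:
        raise ValueError("need at least one reference point distinct from the query")
    r_max = neighbours[-1]  # distance to the k-th (farthest kept) neighbour
    mean_log_ratio = sum(math.log(d / r_max) for d in neighbours) / len(neighbours)
    if mean_log_ratio == 0.0:
        # All neighbours are equidistant; the estimate is unbounded.
        return math.inf
    return -1.0 / mean_log_ratio
```

A detector in this spirit would compute such estimates per layer for clean and perturbed inputs and feed them to a small binary classifier; that classifier is outside the scope of this sketch.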
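Recently, RobustBench (Croce et al. 2020) has become a widely recognized benchmark for the adversarial robustness of image classification networks. In its most commonly reported sub-task, RobustBench evaluates and ranks the adversarial robustness of trained neural networks on CIFAR10 under AutoAttack (Croce and Hein 2020b) with L-inf perturbations limited to eps = 8/255. With leading scores of the currently best-performing models at around 60% of the baseline, it is fair to characterize this benchmark as quite challenging. Despite its general acceptance in recent literature, we aim to foster discussion about the suitability of RobustBench as a key indicator of robustness that could be generalized to practical applications. Our line of argumentation against this is two-fold and supported by extensive experiments presented in this paper. We argue that i) the alteration of data by AutoAttack with L-inf, eps = 8/255 is unrealistically strong, resulting in close to perfect detection rates of adversarial samples even by simple detection algorithms and human observers. We also show that other attack methods are much harder to detect while achieving similar success rates. ii) Results on low-resolution datasets like CIFAR10 do not generalize well to higher-resolution images, as gradient-based attacks appear to become even more detectable with increasing resolution.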
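Recently, adversarial attacks on image classification networks by the AutoAttack (Croce and Hein, 2020b) framework have drawn a lot of attention. While AutoAttack has shown a very high attack success rate, most defense approaches focus on network hardening and robustness enhancements, such as adversarial training. This way, the currently best-reported method can withstand about 66% of adversarial examples on CIFAR10. In this paper, we investigate the spatial and frequency domain properties of AutoAttack and propose an alternative defense. Instead of hardening a network, we detect adversarial attacks during inference. Based on a rather simple and fast analysis in the frequency domain, we introduce two different detection algorithms. First, a black-box detector that operates only on the input images and achieves a detection accuracy of 100% on the AutoAttack CIFAR10 benchmark and 99.3% on ImageNet. Second, a white-box detector that analyzes CNN feature maps and likewise reaches detection rates of 100% and 98.7% on the same benchmarks.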
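Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially deep neural networks (DNNs), are vulnerable to adversarial examples, i.e., examples that are carefully crafted to fool a well-trained classification model while being indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was first identified by Biggio et al. (2013) and Szegedy et al. (2014), much work has been done in this field, including the development of attack methods to generate adversarial examples and the construction of defense techniques to guard against such examples. This paper aims to introduce this topic and its latest developments to the statistical community, primarily focusing on the generation of and protection against adversarial examples. The computing code (in Python and R) used in the numerical experiments is publicly available for readers to explore the surveyed methods. It is the hope of the authors that this paper will encourage more statisticians to work on the important and exciting field of generating and defending against adversarial examples.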
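Deep learning (DL) has shown great success in many human-related tasks, which has led to its adoption in many computer-vision-based applications, such as security surveillance systems, autonomous vehicles and healthcare. Such safety-critical applications can only be deployed successfully once they are able to overcome safety-critical challenges. Among these challenges are the defense against and/or the detection of adversarial examples (AEs). Adversaries can carefully craft small, often imperceptible, noise called perturbations that is added to a clean image to generate an AE. The aim of an AE is to fool the DL model, which makes it a potential risk for DL applications. Many test-time evasion attacks and countermeasures, i.e., defense or detection methods, have been proposed in the literature. Moreover, the few reviews and surveys that have been published present taxonomies of the threats and countermeasures from a theoretical point of view, with little focus on AE detection methods. In this paper, we focus on the image classification task and attempt to provide a survey of detection methods for test-time evasion attacks on neural network classifiers. A detailed discussion of such methods is provided, together with experimental results for eight state-of-the-art detectors under different scenarios on four datasets. We also provide potential challenges and future perspectives for this research direction.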
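With the rapid advancement and use of deep learning models for image recognition, security becomes a major concern for their deployment in safety-critical systems. Since the accuracy and robustness of deep learning models are primarily attributed to the purity of the training samples, deep learning architectures are often susceptible to adversarial attacks. Adversarial attacks are obtained by making subtle perturbations to normal images, which are mostly imperceptible to humans but can seriously confuse state-of-the-art machine learning models. What is so special about the slightest intelligent perturbations or noise additions over normal images that they lead to catastrophic classifications by deep neural networks? Using statistical hypothesis testing, we find that conditional variational autoencoders (CVAE) are surprisingly good at detecting imperceptible image perturbations. In this paper, we show how CVAEs can be used effectively to detect adversarial attacks on image classification networks. We demonstrate our results on the CIFAR-10 dataset and show how our method gives performance comparable to prior methods in detecting adversaries while not being confused by noisy images, where most existing methods falter.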
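Deep neural networks (DNNs) are known to be vulnerable to adversarial examples crafted with imperceptible perturbations, i.e., a small change in an input image can induce a misclassification, which threatens the reliability of deployed systems based on deep learning. Adversarial training (AT) is often adopted to improve the robustness of DNNs by training on a mixture of corrupted and clean data. However, most AT-based methods are ineffective in dealing with transferred adversarial examples, which are generated to fool a wide spectrum of defense models, and thus cannot satisfy the generalization requirement raised in real-world scenarios. Moreover, adversarially training a defense model in general cannot produce interpretable predictions for perturbed inputs, whereas domain experts require a highly interpretable robust model in order to understand the behaviour of a DNN. In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which promotes linearized robustness through Jacobian normalization and also regularizes the perturbation-based saliency maps to imitate the model's interpretable predictions. As such, we achieve both improved defense and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.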
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
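The comparison between predictions on original and squeezed inputs can be sketched as follows. This is a minimal illustration of the bit-depth squeezer only, assuming pixel values in [0, 1], a model exposed as a plain callable returning class probabilities, and an L1 distance with a hand-picked threshold. The names `reduce_bit_depth`, `is_adversarial` and `toy_model` are hypothetical, and the toy model is a stand-in rather than a DNN.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Sequence[float]], Vector]


def reduce_bit_depth(pixels: Sequence[float], bits: int) -> Vector:
    """Quantize pixel values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [round(p * levels) / levels for p in pixels]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two prediction vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def is_adversarial(model: Model, pixels: Sequence[float],
                   bits: int = 1, threshold: float = 0.5) -> bool:
    """Flag the input if squeezing changes the prediction by more than `threshold`."""
    original = model(pixels)
    squeezed = model(reduce_bit_depth(pixels, bits))
    return l1_distance(original, squeezed) > threshold


def toy_model(pixels: Sequence[float]) -> Vector:
    """Hypothetical two-class 'classifier': class 0 if the mean pixel exceeds 0.5."""
    mean = sum(pixels) / len(pixels)
    return [1.0, 0.0] if mean > 0.5 else [0.0, 1.0]


if __name__ == "__main__":
    print(is_adversarial(toy_model, [0.9, 0.95, 0.9, 0.85]))   # stable under squeezing
    print(is_adversarial(toy_model, [0.45, 0.45, 0.45, 0.9]))  # prediction flips
```

In practice the threshold would be chosen on held-out clean data to meet a target false-positive rate, and several squeezers could be combined by taking the largest distance.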
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS (and which illustrate a diverse set of defense strategies) can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
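As advances in deep neural networks (DNNs) demonstrate unprecedented levels of performance in many critical applications, their vulnerability to attacks is still an open question. We consider evasion attacks at test time against deep learning in constrained environments, in which dependencies between features need to be satisfied. These situations may arise naturally in tabular data or may be the result of feature engineering in specific application domains, such as threat detection in cyber security. We propose a general iterative gradient-based framework called FENCE for crafting evasion attacks that take into consideration the specifics of constrained domains and application requirements. We apply it against feed-forward neural networks trained for two cyber security applications, network traffic botnet classification and malicious domain classification, to generate feasible adversarial examples. We extensively evaluate the success rate and performance of our attacks, compare their improvement over several baselines, and analyze factors that impact the attack success rate, including the optimization objective and data imbalance. We show that with minimal effort (e.g., generating 12 additional network connections), an attacker can change the model's prediction from the malicious class to benign and evade the classifier. We show that models trained on datasets with higher imbalance are more vulnerable to our FENCE attacks. Finally, we demonstrate the potential of performing adversarial training in constrained domains to increase model resilience against these evasion attacks.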
Detecting test samples drawn sufficiently far away from the training distribution, statistically or adversarially, is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. However, deep neural networks with the softmax classifier are known to produce highly overconfident posterior distributions even for such abnormal samples. In this paper, we propose a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier. We obtain the class-conditional Gaussian distributions with respect to (low- and upper-level) features of the deep models under Gaussian discriminant analysis, which result in a confidence score based on the Mahalanobis distance. While most prior methods have been evaluated for detecting either out-of-distribution or adversarial samples, but not both, the proposed method achieves state-of-the-art performance for both cases in our experiments. Moreover, we found that our proposed method is more robust in harsh cases, e.g., when the training dataset has noisy labels or a small number of samples. Finally, we show that the proposed method enjoys broader usage by applying it to class-incremental learning: whenever out-of-distribution samples are detected, our classification rule can incorporate new classes well without further training deep models.
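A Mahalanobis-based confidence score of this kind can be sketched as below, assuming the features have already been extracted from one layer of a pre-trained network. The sketch fits per-class means and a single covariance matrix shared across classes (the usual Gaussian discriminant analysis assumption) and scores a sample by its negative squared Mahalanobis distance to the closest class mean. All function names are illustrative, and the input pre-processing and multi-layer combination of the full method are not shown.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def class_means(features: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Empirical mean feature vector for every class label."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for j, v in enumerate(f):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def tied_covariance(features: Sequence[Vector], labels: Sequence[int],
                    means: Dict[int, Vector]) -> Matrix:
    """Covariance shared by all classes, computed around each class mean."""
    d = len(features[0])
    cov = [[0.0] * d for _ in range(d)]
    for f, y in zip(features, labels):
        diff = [a - b for a, b in zip(f, means[y])]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    n = len(features)
    return [[v / n for v in row] for row in cov]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_confidence(f: Vector, means: Dict[int, Vector], cov: Matrix) -> float:
    """Negative squared Mahalanobis distance to the closest class mean."""
    best = float("-inf")
    for mu in means.values():
        diff = [a - b for a, b in zip(f, mu)]
        # diff^T cov^{-1} diff, computed without forming the inverse explicitly.
        dist = sum(d * s for d, s in zip(diff, solve(cov, diff)))
        best = max(best, -dist)
    return best
```

Higher scores indicate samples closer to the training distribution; a sample would be flagged as abnormal when its score falls below a threshold calibrated on validation data. The sketch assumes the covariance matrix is non-singular.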
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
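Medical image systems based on deep neural networks are vulnerable to adversarial examples. Many defense mechanisms have been proposed in the literature; however, the existing defenses assume a passive attacker who knows little about the defense system and does not change the attack strategy according to the defense. Recent works have shown that a strong adaptive attack, in which the attacker is assumed to have full knowledge of the defense system, can easily bypass the existing defenses. In this paper, we propose a novel adversarial example defense system called Medical Aegis. To the best of our knowledge, Medical Aegis is the first defense in the literature that successfully addresses strong adaptive adversarial example attacks on medical images. Medical Aegis has two tiers of protection. The first tier, Cushion, weakens the adversarial manipulation capability of an attack by removing its high-frequency components, while having minimal effect on the classification performance for the original image. The second tier, Shield, learns a set of per-class DNN models to predict the logits of the protected model. Deviation from Shield's prediction indicates an adversarial example. Shield is inspired by the observation in our stress tests that there exist robust trails in the shallow layers of a DNN model that adaptive attacks can hardly destroy. Experimental results show that the proposed defense accurately detects adaptive attacks with negligible overhead for model inference.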
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples: inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models.
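The abstract does not state the objective itself. As a point of reference, the robust-optimization view of adversarial training is commonly written as a saddle-point problem, with model parameters $\theta$, data distribution $\mathcal{D}$, loss $L$, and a set $\mathcal{S}$ of allowed perturbations (for example an $\ell_\infty$ ball of radius $\epsilon$); the symbols here are notational assumptions:

```latex
\min_{\theta} \; \rho(\theta),
\qquad
\rho(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}}
  \left[ \max_{\delta \in \mathcal{S}} L(\theta, x + \delta, y) \right]
```

The inner maximization corresponds to the attack (finding the worst-case perturbation for a given input), and the outer minimization corresponds to training the network against that worst case.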
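The vulnerability of deep neural networks to adversarial examples has become a significant concern for deploying these models in sensitive domains. Devising a definitive defense against such attacks has proven challenging, and methods that rely on detecting adversarial samples are only valid when the attacker is oblivious to the detection mechanism. In this paper, we propose a principled adversarial example detection method that can withstand norm-constrained white-box attacks. For a K-class classification problem, we train K binary classifiers, where the i-th binary classifier is used to distinguish between clean data of class i and adversarially perturbed samples of other classes. At test time, we first use a trained classifier to obtain the predicted label (say k) of the input, and then use the k-th binary classifier to determine whether the input is a clean sample (of class k) or an adversarially perturbed example (of other classes). We further devise a generative approach to detecting/classifying adversarial examples by interpreting each binary classifier as an unnormalized density model of the class-conditional data. We provide a comprehensive evaluation of the above adversarial example detection/classification methods and demonstrate their competitive performance and compelling properties.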
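The test-time procedure described here (predict a label k, then consult only the k-th binary classifier) can be sketched as below. This is a minimal illustration of the dispatch logic, not of how the classifiers are trained; the type aliases, the function `detect`, and the toy one-dimensional classifier and detectors are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]
# A per-class detector returns True when it judges the input to be a
# clean sample of its own class, and False otherwise.
Detector = Callable[[Sequence[float]], bool]


def detect(x: Sequence[float], classifier: Classifier,
           detectors: Sequence[Detector]) -> Tuple[int, bool]:
    """Return (predicted label k, flagged-as-adversarial) for input x."""
    k = classifier(x)                # step 1: predicted label from the base classifier
    is_clean = detectors[k](x)       # step 2: only the k-th binary classifier is consulted
    return k, not is_clean


def toy_classifier(x: Sequence[float]) -> int:
    """Hypothetical base classifier on 1-D inputs: class 0 if negative, else class 1."""
    return 0 if x[0] < 0.0 else 1


# Hypothetical detectors: each accepts only inputs well inside its class region.
toy_detectors = [
    lambda x: x[0] <= -1.0,  # detector for class 0
    lambda x: x[0] >= 1.0,   # detector for class 1
]
```

With these toy components, an input such as `[0.2]` is assigned to class 1 but rejected by the class-1 detector because it lies close to the decision boundary, whereas `[2.0]` is accepted as clean.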
Adversarial attacks pose safety and security concerns to deep learning applications, but their characteristics are under-explored. Although largely imperceptible, PGD-like attacks may leave a strong trace in an adversarial example. Recall that PGD-like attacks trigger the "local linearity" of a network, which implies different extents of linearity for benign and adversarial examples. Inspired by this, we construct an Adversarial Response Characteristics (ARC) feature to reflect the model's gradient consistency around the input and thereby indicate the extent of linearity. Under certain conditions, it qualitatively shows a gradually varying pattern from benign example to adversarial example, as the latter leads to the Sequel Attack Effect (SAE). To quantitatively evaluate the effectiveness of ARC, we conduct experiments on CIFAR-10 and ImageNet for attack detection and attack type recognition in a challenging setting. The results suggest that SAE is an effective and unique trace of PGD-like attacks reflected through the ARC feature. The ARC feature is intuitive, light-weight, non-intrusive, and data-undemanding.
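Although adversarial attack detection has received considerable attention, it remains a fundamentally challenging problem from two perspectives. First, while threat models can be well defined, attacker strategies may still vary widely within those constraints. Detection should therefore be viewed as an open-set problem, in contrast to most current detection methods. These methods take a closed-set view and train binary detectors, thus biasing detection toward attacks seen during detector training. Second, limited information is available at test time, and it is typically confounded by nuisance factors, including the label and the underlying content of the image. We address these challenges with a novel strategy based on random subspace analysis. We present a technique that exploits properties of random projections to characterize the behavior of clean and adversarial examples across a diverse set of subspaces. The self-consistency (or inconsistency) of model activations is leveraged to discern clean from adversarial examples. Performance evaluation demonstrates that our technique ($AUC \in [0.92, 0.98]$) outperforms competing detection strategies ($AUC \in [0.30, 0.79]$), while remaining truly agnostic to the attack strategy (for both targeted and untargeted attacks). It also requires significantly less calibration data (composed only of clean examples) than competing approaches need to achieve this performance.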
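Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients, a type of gradient masking, have previously been found to exist in many defense methods and to cause a false signal of robustness. In this paper, we identify a more subtle situation called imbalanced gradients that can also cause overestimated adversarial robustness. The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack towards a suboptimal direction. To exploit imbalanced gradients, we formulate a Margin Decomposition (MD) attack that decomposes a margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We also propose a MultiTargeted and an ensemble version of our MD attack. By investigating 17 defense models proposed since 2018, we find that 6 models are susceptible to imbalanced gradients, and our MD attack can decrease their robustness, as evaluated by the best standalone baseline attack, by another 2%. We also provide an in-depth analysis of the likely causes of imbalanced gradients and of effective countermeasures.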
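Many state-of-the-art ML models have outperformed humans in various tasks such as image classification. With such outstanding performance, ML models are widely used today. However, the existence of adversarial attacks and data poisoning attacks calls the robustness of ML models into question. For instance, Engstrom et al. demonstrated that state-of-the-art image classifiers can be easily fooled by a small rotation of an arbitrary image. As ML systems are increasingly integrated into safety- and security-sensitive applications, adversarial attacks and data poisoning attacks pose a considerable threat. This chapter focuses on two broad and important areas of ML security: adversarial attacks and data poisoning attacks.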
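One of the key challenges in deep learning is the definition of effective strategies for the detection of adversarial examples. To this end, we propose a novel approach named Ensemble Adversarial Detector (EAD) for the identification of adversarial examples in a standard multiclass classification scenario. EAD combines multiple detectors that exploit distinct properties of the input instances in the internal representation of a pre-trained deep neural network (DNN). Specifically, EAD integrates the state-of-the-art detectors based on Mahalanobis distance and on local intrinsic dimensionality (LID) with a newly introduced method based on one-class support vector machines (OSVMs). Although all constituent methods assume that the greater the distance of a test instance from the set of correctly classified training instances, the higher its probability of being an adversarial example, they differ in the way this distance is computed. To exploit the effectiveness of the different methods in capturing distinct properties of data distributions and, accordingly, to tackle the trade-off between generalization and overfitting efficiently, EAD employs detector-specific distance scores as features of a logistic regression classifier, after independent hyperparameter optimization. We evaluated the EAD approach on distinct datasets (CIFAR-10, CIFAR-100 and SVHN) and models (ResNet and DenseNet) with regard to four adversarial attacks (FGSM, BIM, DeepFool and CW), also comparing it with competing approaches. Overall, we show that EAD achieves the best AUROC and AUPR in the large majority of settings and comparable performance in the others. The improvement over the state of the art, and the possibility of easily extending EAD to include any arbitrary set of detectors, pave the way to widespread adoption of ensemble approaches in the broad field of adversarial example detection.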