许多深度学习方法已成功地解决了计算机视觉和语音识别应用中的复杂任务。但是,已经发现这些模型的鲁棒性很容易受到扰动的输入或对抗性示例的攻击,这些示例是人眼无法察觉的,但导致该模型做出错误的输出决策。在这项研究中,我们适应并介绍了两个几何指标,密度和覆盖范围,并评估它们在未见数据批次中检测对抗样本中的使用。我们使用MNIST和两个来自MedMnist的现实世界生物医学数据集的经验研究这些指标,并受到了两种不同的对抗攻击。我们的实验显示了两个指标检测对抗示例的有希望的结果。我们认为,他的工作可以为这些指标在部署的机器学习系统中的使用而进一步研究,以监视对抗性示例或相关病理(例如数据集移动)可能攻击的攻击。
translated by 谷歌翻译
尽管机器学习系统的效率和可扩展性,但最近的研究表明,许多分类方法,尤其是深神经网络(DNN),易受对抗的例子;即,仔细制作欺骗训练有素的分类模型的例子,同时无法区分从自然数据到人类。这使得在安全关键区域中应用DNN或相关方法可能不安全。由于这个问题是由Biggio等人确定的。 (2013)和Szegedy等人。(2014年),在这一领域已经完成了很多工作,包括开发攻击方法,以产生对抗的例子和防御技术的构建防范这些例子。本文旨在向统计界介绍这一主题及其最新发展,主要关注对抗性示例的产生和保护。在数值实验中使用的计算代码(在Python和R)公开可用于读者探讨调查的方法。本文希望提交人们将鼓励更多统计学人员在这种重要的令人兴奋的领域的产生和捍卫对抗的例子。
translated by 谷歌翻译
深度学习(DL)在许多与人类相关的任务中表现出巨大的成功,这导致其在许多计算机视觉的基础应用中采用,例如安全监控系统,自治车辆和医疗保健。一旦他们拥有能力克服安全关键挑战,这种安全关键型应用程序必须绘制他们的成功部署之路。在这些挑战中,防止或/和检测对抗性实例(AES)。对手可以仔细制作小型,通常是难以察觉的,称为扰动的噪声被添加到清洁图像中以产生AE。 AE的目的是愚弄DL模型,使其成为DL应用的潜在风险。在文献中提出了许多测试时间逃避攻击和对策,即防御或检测方法。此外,还发布了很少的评论和调查,理论上展示了威胁的分类和对策方法,几乎​​没有焦点检测方法。在本文中,我们专注于图像分类任务,并试图为神经网络分类器进行测试时间逃避攻击检测方法的调查。对此类方法的详细讨论提供了在四个数据集的不同场景下的八个最先进的探测器的实验结果。我们还为这一研究方向提供了潜在的挑战和未来的观点。
translated by 谷歌翻译
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
translated by 谷歌翻译
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to human but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples becomes one of the major risks for applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
translated by 谷歌翻译
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs.Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to * Pin-Yu Chen and Huan Zhang contribute equally to this work.
translated by 谷歌翻译
深度神经网络众所周知,很容易受到对抗性攻击和后门攻击的影响,在该攻击中,对输入的微小修改能够误导模型以给出错误的结果。尽管已经广泛研究了针对对抗性攻击的防御措施,但有关减轻后门攻击的调查仍处于早期阶段。尚不清楚防御这两次攻击之间是否存在任何连接和共同特征。我们对对抗性示例与深神网络的后门示例之间的联系进行了全面的研究,以寻求回答以下问题:我们可以使用对抗检测方法检测后门。我们的见解是基于这样的观察结果,即在推理过程中,对抗性示例和后门示例都有异常,与良性​​样本高度区分。结果,我们修改了四种现有的对抗防御方法来检测后门示例。广泛的评估表明,这些方法可靠地防止后门攻击,其准确性比检测对抗性实例更高。这些解决方案还揭示了模型灵敏度,激活空间和特征空间中对抗性示例,后门示例和正常样本的关系。这能够增强我们对这两次攻击和防御机会的固有特征的理解。
translated by 谷歌翻译
随着在图像识别中的快速进步和深度学习模型的使用,安全成为他们在安全关键系统中部署的主要关注点。由于深度学习模型的准确性和稳健性主要归因于训练样本的纯度,因此深度学习架构通常易于对抗性攻击。通过对正常图像进行微妙的扰动来获得对抗性攻击,这主要是人类,但可以严重混淆最先进的机器学习模型。什么特别的智能扰动或噪声在正常图像上添加了它导致深神经网络的灾难性分类?使用统计假设检测,我们发现条件变形自身偏析器(CVAE)令人惊讶地擅长检测难以察觉的图像扰动。在本文中,我们展示了CVAE如何有效地用于检测对图像分类网络的对抗攻击。我们展示了我们的成果,Cifar-10数据集,并展示了我们的方法如何为先前的方法提供可比性,以检测对手,同时不会与嘈杂的图像混淆,其中大多数现有方法都摇摇欲坠。
translated by 谷歌翻译
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples-inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. 1
translated by 谷歌翻译
机器学习算法和深度神经网络在几种感知和控制任务中的卓越性能正在推动该行业在安全关键应用中采用这种技术,作为自治机器人和自动驾驶车辆。然而,目前,需要解决几个问题,以使深入学习方法更可靠,可预测,安全,防止对抗性攻击。虽然已经提出了几种方法来提高深度神经网络的可信度,但大多数都是针对特定类的对抗示例量身定制的,因此未能检测到其他角落案件或不安全的输入,这些输入大量偏离训练样本。本文介绍了基于覆盖范式的轻量级监控架构,以增强针对不同不安全输入的模型鲁棒性。特别是,在用于评估多种检测逻辑的架构中提出并测试了四种覆盖分析方法。实验结果表明,该方法有效地检测强大的对抗性示例和分销外输入,引入有限的执行时间和内存要求。
translated by 谷歌翻译
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives.This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
translated by 谷歌翻译
已知深度神经网络(DNN)容易受到用不可察觉的扰动制作的对抗性示例的影响,即,输入图像的微小变化会引起错误的分类,从而威胁着基于深度学习的部署系统的可靠性。经常采用对抗训练(AT)来通过训练损坏和干净的数据的混合物来提高DNN的鲁棒性。但是,大多数基于AT的方法在处理\ textit {转移的对抗示例}方面是无效的,这些方法是生成以欺骗各种防御模型的生成的,因此无法满足现实情况下提出的概括要求。此外,对抗性训练一般的国防模型不能对具有扰动的输入产生可解释的预测,而不同的领域专家则需要一个高度可解释的强大模型才能了解DNN的行为。在这项工作中,我们提出了一种基于Jacobian规范和选择性输入梯度正则化(J-SIGR)的方法,该方法通过Jacobian归一化提出了线性化的鲁棒性,还将基于扰动的显着性图正规化,以模仿模型的可解释预测。因此,我们既可以提高DNN的防御能力和高解释性。最后,我们评估了跨不同体系结构的方法,以针对强大的对抗性攻击。实验表明,提出的J-Sigr赋予了针对转移的对抗攻击的鲁棒性,我们还表明,来自神经网络的预测易于解释。
translated by 谷歌翻译
大多数对抗攻击防御方法依赖于混淆渐变。这些方法在捍卫基于梯度的攻击方面是成功的;然而,它们容易被攻击绕过,该攻击不使用梯度或近似近似和使用校正梯度的攻击。不存在不存在诸如对抗培训等梯度的防御,但这些方法通常对诸如其幅度的攻击进行假设。我们提出了一种分类模型,该模型不会混淆梯度,并且通过施工而强大而不承担任何关于攻击的知识。我们的方法将分类作为优化问题,我们“反转”在不受干扰的自然图像上培训的条件发电机,以找到生成最接近查询图像的类。我们假设潜在的脆性抗逆性攻击源是前馈分类器的高度低维性质,其允许对手发现输入空间中的小扰动,从而导致输出空间的大变化。另一方面,生成模型通常是低到高维的映射。虽然该方法与防御GaN相关,但在我们的模型中使用条件生成模型和反演而不是前馈分类是临界差异。与Defense-GaN不同,它被证明生成了容易规避的混淆渐变,我们表明我们的方法不会混淆梯度。我们展示了我们的模型对黑箱攻击的极其强劲,并与自然训练的前馈分类器相比,对白盒攻击的鲁棒性提高。
translated by 谷歌翻译
许多最先进的ML模型在各种任务中具有优于图像分类的人类。具有如此出色的性能,ML模型今天被广泛使用。然而,存在对抗性攻击和数据中毒攻击的真正符合ML模型的稳健性。例如,Engstrom等人。证明了最先进的图像分类器可以容易地被任意图像上的小旋转欺骗。由于ML系统越来越纳入安全性和安全敏感的应用,对抗攻击和数据中毒攻击构成了相当大的威胁。本章侧重于ML安全的两个广泛和重要的领域:对抗攻击和数据中毒攻击。
translated by 谷歌翻译
In recent years, deep neural network approaches have been widely adopted for machine learning tasks, including classification. However, they were shown to be vulnerable to adversarial perturbations: carefully crafted small perturbations can cause misclassification of legitimate images. We propose Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against such attacks. Defense-GAN is trained to model the distribution of unperturbed images. At inference time, it finds a close output to a given image which does not contain the adversarial changes. This output is then fed to the classifier. Our proposed method can be used with any classification model and does not modify the classifier structure or training procedure. It can also be used as a defense against any attack as it does not assume knowledge of the process for generating the adversarial examples. We empirically show that Defense-GAN is consistently effective against different attack methods and improves on existing defense strategies. Our code has been made publicly available at https://github.com/kabkabm/defensegan.
translated by 谷歌翻译
机器学习与服务(MLAAS)已成为广泛的范式,即使是通过例如,也是客户可用的最复杂的机器学习模型。一个按要求的原则。这使用户避免了数据收集,超参数调整和模型培训的耗时过程。但是,通过让客户访问(预测)模型,MLAAS提供商危害其知识产权,例如敏感培训数据,优化的超参数或学到的模型参数。对手可以仅使用预测标签创建模型的副本,并以(几乎)相同的行为。尽管已经描述了这种攻击的许多变体,但仅提出了零星的防御策略,以解决孤立的威胁。这增加了对模型窃取领域进行彻底系统化的必要性,以全面了解这些攻击是成功的原因,以及如何全面地捍卫它们。我们通过对模型窃取攻击,评估其性能以及探索不同设置中相应的防御技术来解决这一问题。我们为攻击和防御方法提出了分类法,并提供有关如何根据目标和可用资源选择正确的攻击或防御策略的准则。最后,我们分析了当前攻击策略使哪些防御能力降低。
translated by 谷歌翻译
深度神经网络(DNNS)最近在许多分类任务中取得了巨大的成功。不幸的是,它们容易受到对抗性攻击的影响,这些攻击会产生对抗性示例,这些示例具有很小的扰动,以欺骗DNN模型,尤其是在模型共享方案中。事实证明,对抗性训练是最有效的策略,它将对抗性示例注入模型训练中,以提高DNN模型的稳健性,以对对抗性攻击。但是,基于现有的对抗性示例的对抗训练无法很好地推广到标准,不受干扰的测试数据。为了在标准准确性和对抗性鲁棒性之间取得更好的权衡,我们提出了一个新型的对抗训练框架,称为潜在边界引导的对抗训练(梯子),该训练(梯子)在潜在的边界引导的对抗性示例上对对手进行对手训练DNN模型。与大多数在输入空间中生成对抗示例的现有方法相反,梯子通过增加对潜在特征的扰动而产生了无数的高质量对抗示例。扰动是沿SVM构建的具有注意机制的决策边界的正常情况进行的。我们从边界场的角度和可视化视图分析了生成的边界引导的对抗示例的优点。与Vanilla DNN和竞争性底线相比,对MNIST,SVHN,CELEBA和CIFAR-10的广泛实验和详细分析验证了梯子在标准准确性和对抗性鲁棒性之间取得更好的权衡方面的有效性。
translated by 谷歌翻译
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small "detector" is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant margin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
translated by 谷歌翻译
由于它们在各个域中的大量成功,深入的学习技术越来越多地用于设计网络入侵检测解决方案,该解决方案检测和减轻具有高精度检测速率和最小特征工程的未知和已知的攻击。但是,已经发现,深度学习模型容易受到可以误导模型的数据实例,以使所谓的分类决策不正确(对抗示例)。此类漏洞允许攻击者通过向恶意流量添加小的狡猾扰动来逃避检测并扰乱系统的关键功能。在计算机视觉域中广泛研究了深度对抗学习的问题;但是,它仍然是网络安全应用中的开放研究领域。因此,本调查探讨了在网络入侵检测领域采用对抗机器学习的不同方面的研究,以便为潜在解决方案提供方向。首先,调查研究基于它们对产生对抗性实例的贡献来分类,评估ML的NID对逆势示例的鲁棒性,并捍卫这些模型的这种攻击。其次,我们突出了调查研究中确定的特征。此外,我们讨论了现有的通用对抗攻击对NIDS领域的适用性,启动拟议攻击在现实世界方案中的可行性以及现有缓解解决方案的局限性。
translated by 谷歌翻译
对卷积神经网络(CNN)的对抗性攻击的存在质疑这种模型对严重应用的适合度。攻击操纵输入图像,使得错误分类是在对人类观察者看上去正常的同时唤起的 - 因此它们不容易被检测到。在不同的上下文中,CNN隐藏层的反向传播激活(对给定输入的“特征响应”)有助于可视化人类“调试器” CNN“在计算其输出时对CNN”的看法。在这项工作中,我们提出了一种新颖的检测方法,以防止攻击。我们通过在特征响应中跟踪对抗扰动来做到这一点,从而可以使用平均局部空间熵自动检测。该方法不会改变原始的网络体系结构,并且完全可以解释。实验证实了我们对在Imagenet训练的大规模模型的最新攻击方法的有效性。
translated by 谷歌翻译