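Machine learning algorithms have been shown to be vulnerable to adversarial manipulation through systematic modification of inputs (e.g., adversarial examples) in domains such as image recognition. Under the default threat model, the adversary exploits the unconstrained nature of images: each feature (pixel) is fully under the adversary's control. However, it is not clear how these attacks translate to constrained domains that limit which features the adversary can modify and how (e.g., network intrusion detection). In this paper, we explore whether constrained domains are less vulnerable than unconstrained domains to adversarial example generation algorithms. We create an algorithm for generating adversarial sketches: targeted universal perturbation vectors that encode feature saliency within the envelope of domain constraints. To assess how these algorithms perform, we evaluate them in constrained (e.g., network intrusion detection) and unconstrained (e.g., image recognition) domains. The results demonstrate that our approaches produce misclassification rates in constrained domains that are comparable to those in unconstrained domains (greater than 95%). Our investigation shows that the narrow attack surface exposed by constrained domains is still sufficiently large to craft successful adversarial examples; thus, constraints do not appear to make a domain robust. Indeed, with as few as five randomly selected features, one can still generate adversarial examples.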
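Machine learning is vulnerable to adversarial examples: inputs designed to cause models to perform poorly. However, it is unclear whether adversarial examples represent realistic inputs in the modeled domains. Diverse domains such as networks and phishing have domain constraints: complex relationships between features that an adversary must satisfy for an attack to be realized (in addition to any adversary-specific goals). In this paper, we explore how domain constraints limit adversarial capabilities and how adversaries can adapt their strategies to create realistic (constraint-compliant) examples. To this end, we develop techniques to learn domain constraints from data and show how the learned constraints can be integrated into the adversarial crafting process. We evaluate the efficacy of our approach on network intrusion and phishing datasets and find: (1) up to 82% of adversarial examples produced by state-of-the-art crafting algorithms violate domain constraints, and (2) domain constraints are robust to adversarial examples; enforcing constraints yields an increase in model accuracy of up to 34%. We observe not only that adversaries must alter inputs to satisfy domain constraints, but also that these constraints make the generation of valid adversarial examples far more challenging.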
Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.
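As advances in deep neural networks (DNNs) demonstrate unprecedented levels of performance in many critical applications, their vulnerability to attacks remains an open question. We consider evasion attacks at testing time against deep learning in constrained environments, in which dependencies between features need to be satisfied. These situations may arise naturally in tabular data or may be the result of feature engineering in specific application domains, such as threat detection in cyber security. We propose a general iterative gradient-based framework called FENCE for crafting evasion attacks that take into consideration the specifics of constrained domains and application requirements. We apply it against feed-forward neural networks trained for two cyber security applications, network traffic botnet classification and malicious domain classification, to generate feasible adversarial examples. We extensively evaluate the success rate and performance of our attacks, compare their improvement over several baselines, and analyze factors that impact the attack success rate, including the optimization objective and data imbalance. We show that with minimal effort (e.g., generating 12 additional network connections), an attacker can change the model's prediction from the malicious class to the benign one and evade the classifier. We show that models trained on datasets with higher imbalance are more vulnerable to our FENCE attacks. Finally, we demonstrate the potential of performing adversarial training in constrained domains to increase model resilience against these evasion attacks.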
Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified in specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x' that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.
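Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade. Yet our understanding of this phenomenon stems from a rather fragmented pool of knowledge. At present, there are a handful of attacks, each with disparate assumptions in threat models and incomparable definitions of optimality. In this paper, we propose a systematic approach to characterize worst-case (i.e., optimal) adversaries. We first introduce an extensible decomposition of attacks in adversarial machine learning by atomizing attack components into surfaces and travelers. With our decomposition, we enumerate over components to create 576 attacks (568 of which were previously unexplored). Next, we propose the Pareto Ensemble Attack (PEA): a theoretical attack that upper-bounds attack performance. With our new attacks, we measure performance relative to the PEA on both robust and non-robust models, seven datasets, and three extended lp-based threat models that incorporate compute costs, thereby formalizing the space of adversarial strategies. From our evaluation, we find that attack performance is highly contextual: the domain, model robustness, and threat model can have a profound influence on attack efficacy. Our investigation suggests that future studies measuring the security of machine learning should: (1) be contextualized to the domain and threat model, and (2) go beyond the handful of known attacks used today.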
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
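The detection idea in this abstract (compare the model's prediction on the original input with its prediction on a squeezed input) can be sketched in a few lines. The following is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the L1 comparison of prediction vectors, the threshold value, and the `toy_predict` model are all hypothetical, and only the bit-depth squeezer is shown (spatial smoothing is omitted).

```python
from typing import Callable, List, Sequence


def reduce_bit_depth(pixels: Sequence[float], bits: int) -> List[float]:
    """Quantize pixel values in [0, 1] to 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return [round(p * levels) / levels for p in pixels]


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute differences between two prediction vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def looks_adversarial(
    predict: Callable[[Sequence[float]], List[float]],
    pixels: Sequence[float],
    bits: int,
    threshold: float,
) -> bool:
    """Flag an input when squeezing changes the prediction by more than threshold."""
    original = predict(pixels)
    squeezed = predict(reduce_bit_depth(pixels, bits))
    return l1_distance(original, squeezed) > threshold


def toy_predict(pixels: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a DNN: class 1 iff mean brightness exceeds 0.5."""
    mean = sum(pixels) / len(pixels)
    p1 = 1.0 if mean > 0.5 else 0.0
    return [1.0 - p1, p1]


clean = [0.9, 0.9, 0.05, 0.05]      # mean below 0.5, stays class 0 after squeezing
perturbed = [0.9, 0.9, 0.15, 0.15]  # small nudge flips the toy model to class 1
print(looks_adversarial(toy_predict, clean, bits=1, threshold=1.0))
print(looks_adversarial(toy_predict, perturbed, bits=1, threshold=1.0))
```

In this toy setup, squeezing to 1 bit maps both inputs to the same quantized vector, so the perturbed input's prediction changes under squeezing while the clean input's does not. A real detector would use an actual classifier and a threshold chosen on held-out data.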
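Owing to their massive success in various domains, deep learning techniques are increasingly used to design network intrusion detection solutions that detect and mitigate both unknown and known attacks with high detection accuracy and minimal feature engineering. However, deep learning models have been found to be vulnerable to data instances that can mislead the model into making incorrect classification decisions (so-called adversarial examples). Such vulnerabilities allow attackers to evade detection and disrupt the system's critical functionality by adding small, crafty perturbations to malicious traffic. The problem of deep adversarial learning has been studied extensively in the computer vision domain; however, it is still an open research area in network security applications. Therefore, this survey explores research that applies different aspects of adversarial machine learning to network intrusion detection, in order to provide directions for potential solutions. First, the surveyed studies are categorized by their contribution to generating adversarial examples, evaluating the robustness of ML-based NIDS against adversarial examples, and defending such models against these attacks. Second, we highlight the characteristics identified in the surveyed research. Furthermore, we discuss the applicability of existing generic adversarial attacks to the NIDS domain, the feasibility of launching the proposed attacks in real-world scenarios, and the limitations of existing mitigation solutions.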
Deep learning algorithms have been shown to perform extremely well on many classical machine learning problems. However, recent studies have shown that deep learning, like other machine learning techniques, is vulnerable to adversarial samples: inputs crafted to force a deep neural network (DNN) to provide adversary-selected outputs. Such attacks can seriously undermine the security of the system supported by the DNN, sometimes with devastating consequences. For example, autonomous vehicles can be crashed, illicit or illegal content can bypass content filters, or biometric authentication systems can be manipulated to allow improper access. In this work, we introduce a defensive mechanism called defensive distillation to reduce the effectiveness of adversarial samples on DNNs. We analytically investigate the generalizability and robustness properties granted by the use of defensive distillation when training DNNs. We also empirically study the effectiveness of our defense mechanisms on two DNNs placed in adversarial settings. The study shows that defensive distillation can reduce effectiveness of sample creation from 95% to less than 0.5% on a studied DNN. Such dramatic gains can be explained by the fact that distillation leads gradients used in adversarial sample creation to be reduced by a factor of 10^30. We also find that distillation increases the average minimum number of features that need to be modified to create adversarial samples by about 800% on one of the DNNs we tested.
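The abstract does not spell out the mechanism, but distillation is generally described as training on "soft" labels produced by a softmax evaluated at a raised temperature T. The sketch below shows only that one building block, assuming the standard temperature-scaled softmax; the function name and the example logits are illustrative and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits divided by a temperature T > 0.

    Larger T flattens the distribution (softer labels); T = 1 is the usual softmax.
    """
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical network outputs for three classes
hard = softmax_with_temperature(logits, temperature=1.0)
soft = softmax_with_temperature(logits, temperature=20.0)
print(hard)
print(soft)
```

At the higher temperature the probabilities move toward uniform while preserving the class ranking, which is the sense in which the labels used for distillation are "soft".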
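Many machine learning problems use data in tabular domains. Adversarial examples can be especially damaging for these applications. Yet existing work on adversarial robustness focuses mainly on machine learning models in the image and text domains. We argue that, due to the differences between tabular data and images or text, existing threat models are inappropriate for tabular domains. These models do not capture that cost can matter more than imperceptibility, nor that the adversary could ascribe different values to the utility obtained from deploying different adversarial examples. We show that, due to these differences, the attack and defense methods used for images and text cannot be directly applied to the tabular setting. We address these issues by proposing new cost- and utility-aware threat models tailored to the capabilities and constraints of attackers targeting tabular domains. We introduce a framework that enables us to design attack and defense mechanisms that result in models protected against cost- or utility-aware adversaries, e.g., adversaries constrained by a certain dollar budget. We show that our approach is effective on three tabular datasets corresponding to applications in which adversarial examples can have economic and social implications.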
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to humans but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples has become one of the major risks for applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
Learning-based pattern classifiers, including deep networks, have shown impressive performance in several application domains, ranging from computer vision to cybersecurity. However, it has also been shown that adversarial input perturbations carefully crafted either at training or at test time can easily subvert their predictions. The vulnerability of machine learning to such wild patterns (also referred to as adversarial examples), along with the design of suitable countermeasures, have been investigated in the research field of adversarial machine learning. In this work, we provide a thorough overview of the evolution of this research area over the last ten years and beyond, starting from pioneering, earlier work on the security of non-deep learning algorithms up to more recent work aimed to understand the security properties of deep learning algorithms, in the context of computer vision and cybersecurity tasks. We report interesting connections between these apparently-different lines of work, highlighting common misconceptions related to the security evaluation of machine-learning algorithms. We review the main threat models and attacks defined to this end, and discuss the main limitations of current work, along with the corresponding future challenges towards the design of more secure learning algorithms.
Strengthening the robustness of machine learning-based Android malware detectors in the real world requires incorporating realizable adversarial examples (RealAEs), i.e., AEs that satisfy the domain constraints of Android malware. However, existing work focuses on generating RealAEs in the problem space, which is known to be time-consuming and impractical for adversarial training. In this paper, we propose to generate RealAEs in the feature space, leading to a simpler and more efficient solution. Our approach is driven by a novel interpretation of Android malware properties in the feature space. More concretely, we extract feature-space domain constraints by learning meaningful feature dependencies from data and applying them by constructing a robust feature space. Our experiments on DREBIN, a well-known Android malware detector, demonstrate that our approach outperforms the state-of-the-art defense, Sec-SVM, against realistic gradient- and query-based attacks. Additionally, we demonstrate that generating feature-space RealAEs is faster than generating problem-space RealAEs, indicating its high applicability in adversarial training. We further validate the ability of our learned feature-space domain constraints in representing the Android malware properties by showing that (i) re-training detectors with our feature-space RealAEs largely improves model performance on similar problem-space RealAEs and (ii) using our feature-space domain constraints can help distinguish RealAEs from unrealizable AEs (unRealAEs).
Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
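Over the past decades, the rise of artificial intelligence has given us the capability to solve some of the most challenging problems in our day-to-day lives, such as cancer prediction and autonomous navigation. However, these applications may not be reliable unless they are secured against adversarial attacks. In addition, recent work has shown that some adversarial examples transfer across different models. It is therefore crucial to prevent such transferability through robust models that resist adversarial manipulation. In this paper, we propose a feature randomization-based approach that resists eight adversarial attacks targeting deep learning models in the testing phase. Our novel approach consists of changing the training strategy of the target network classifier and selecting random feature samples. We consider attackers under limited-knowledge and semi-knowledge conditions who undertake the most prevalent types of adversarial attacks. We evaluate the robustness of our approach using the well-known UNSW-NB15 dataset, which includes realistic and synthetic attacks. We then demonstrate that our strategy outperforms existing state-of-the-art approaches, such as the Most Powerful Attack, which consists of fine-tuning the network model against specific adversarial attacks. Finally, our experimental results show that our method can secure the target network and resist the transferability of adversarial attacks by over 60%.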
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS (and which illustrate a diverse set of defense strategies) can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate adversarial risk as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as obscurity to an adversary, and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
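Fifth-generation (5G) networks must support billions of heterogeneous devices while guaranteeing optimal Quality of Service (QoS). Such requirements are impossible to meet with human effort alone, and machine learning (ML) represents a core asset in 5G. ML, however, is known to be vulnerable to adversarial examples. Moreover, as our paper shows, the 5G context is exposed to yet another type of adversarial ML attack that cannot be formalized with existing threat models. Proactive assessment of such risks is also challenging due to the lack of ML-powered 5G equipment available for adversarial ML research. To tackle these problems, we propose a novel adversarial ML threat model that is particularly suited to 5G scenarios and is agnostic to the precise function solved by ML. In contrast to existing ML threat models, our attacks do not require any compromise of the target 5G system, while still being viable due to the QoS guarantees and the open nature of 5G networks. Furthermore, we propose an original framework for realistic ML security assessments based on public data. We proactively evaluate our threat model on 6 applications of ML envisioned in 5G. Our attacks affect both the training and inference stages, can degrade the performance of state-of-the-art ML systems, and have a lower entry barrier than previous attacks.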