Adversarial images are created with the intention of causing an image classifier to produce a misclassification. In this paper, we propose that adversarial images should be evaluated based on semantic mismatch, rather than the label mismatch used in current work. In other words, we propose that an image of a "mug" should be considered adversarial if classified as "turnip", but not if classified as "cup", which current evaluations would count as an adversarial success. Our novel idea of taking semantic misclassification into account in the evaluation of adversarial images offers two benefits. First, it is a more realistic conceptualization of what makes an image adversarial, which is important for fully understanding the implications of adversarial images for security and privacy. Second, it makes it possible to evaluate the transferability of adversarial images to a real-world classifier without requiring the classifier's label set to have been available during the creation of the images. The paper carries out an evaluation of a transfer attack on a real-world image classifier that is made possible by our semantic misclassification approach. The attack reveals patterns in the semantics of adversarial misclassifications that could not be investigated using conventional label mismatch.
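As a concrete illustration, a semantic-mismatch criterion can be implemented on top of a lexical hierarchy such as WordNet. The sketch below is a minimal, assumed implementation using NLTK's WordNet interface with an illustrative Wu-Palmer similarity threshold; the threshold value and the synset-selection heuristic are not taken from the paper.

```python
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def is_semantic_mismatch(true_label: str, predicted_label: str, threshold: float = 0.7) -> bool:
    """Return True if the predicted label is semantically far from the true label."""
    true_syns = wn.synsets(true_label, pos=wn.NOUN)
    pred_syns = wn.synsets(predicted_label, pos=wn.NOUN)
    if not true_syns or not pred_syns:
        return True   # unknown terms: conservatively treat as a mismatch
    # best Wu-Palmer similarity over candidate senses of both labels
    best = max((t.wup_similarity(p) or 0.0) for t in true_syns for p in pred_syns)
    return best < threshold

# "mug" -> "cup" is semantically close (not adversarial under this criterion),
# whereas "mug" -> "turnip" is a genuine semantic mismatch.
```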
Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model, in which the adversary can observe the full output of the network on arbitrarily many chosen inputs. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to break the Google Cloud Vision API.
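Attacks in the query-limited setting typically estimate gradients from score queries alone. Below is a minimal sketch of such a score-based gradient estimator using antithetic Gaussian sampling; the estimator form, sample count, and smoothing parameter are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

def estimate_gradient(score_fn, x, n_samples=50, sigma=1e-3):
    """Estimate the gradient of a scalar score (e.g., the victim's probability for a
    target class) at x, using 2 * n_samples queries and antithetic Gaussian samples."""
    grad = torch.zeros_like(x)
    for _ in range(n_samples):
        u = torch.randn_like(x)
        grad += (score_fn(x + sigma * u) - score_fn(x - sigma * u)) * u
    return grad / (2 * sigma * n_samples)
```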
Deep neural networks have been shown to be vulnerable to adversarial images. Conventional attacks strive for indistinguishable adversarial images with strictly restricted perturbations. Recently, researchers have moved to explore distinguishable yet non-suspicious adversarial images and have demonstrated that color transformation attacks are effective. In this work, we propose Adversarial Color Filter (AdvCF), a novel color transformation attack that is optimized with gradient information in the parameter space of a simple color filter. In particular, our color filter space is explicitly specified, enabling a systematic robustness analysis of adversarial color transformations from both the attack and defense perspectives. In contrast, existing color transformation attacks do not offer such an opportunity for systematic analysis due to the lack of an explicit space. We further conduct extensive comparisons between different color transformation attacks in terms of success rate and image acceptability through a user study. Additional results provide interesting new insights into model robustness against AdvCF on three other visual tasks. We also highlight the human-interpretability of AdvCF, which is promising in practical use scenarios, and show its superiority over state-of-the-art human-interpretable color transformation attacks in terms of both image acceptability and efficiency.
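To make the idea of attacking in a filter's parameter space concrete, the following sketch optimizes a very simple per-channel affine color transform against a classifier. The filter family, optimizer, and step counts are assumptions for illustration; AdvCF's actual filter space may differ.

```python
import torch
import torch.nn.functional as F

def color_filter_attack(model, x, y, steps=50, lr=0.05):
    """x: (N, 3, H, W) images in [0, 1]; y: true labels. Returns color-filtered images."""
    scale = torch.ones(1, 3, 1, 1, requires_grad=True)    # per-channel gain
    shift = torch.zeros(1, 3, 1, 1, requires_grad=True)   # per-channel offset
    opt = torch.optim.Adam([scale, shift], lr=lr)
    for _ in range(steps):
        x_adv = (x * scale + shift).clamp(0, 1)
        loss = -F.cross_entropy(model(x_adv), y)           # ascend the classification loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x * scale + shift).clamp(0, 1).detach()
```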
Although exploiting the transferability of adversarial examples can achieve high success rates for non-targeted attacks, it does not work well for targeted attacks, because the gradient directions from a source image to a target class usually differ across DNNs. To improve the transferability of targeted attacks, recent studies align the features of generated adversarial examples with the feature distribution of the target class learned from an auxiliary network or a generative adversarial network. However, these works assume that the training dataset is available and require a large amount of time to train networks, which makes them hard to apply in the real world. In this paper, we revisit adversarial examples with targeted transferability from the perspective of universality and find that highly universal adversarial perturbations tend to be more transferable. Based on this observation, we propose the Locality of Images (LI) attack to improve targeted transferability. Specifically, instead of using only the classification loss, LI introduces a feature similarity loss between the adversarially perturbed original image and randomly cropped images, which makes the features of the adversarial perturbation more dominant than those of the benign image, thus improving targeted transferability. By incorporating the locality of images into the optimization of perturbations, the LI attack emphasizes that targeted perturbations should be relevant to diverse input patterns, even local image patches. Extensive experiments show that LI achieves high success rates for transfer-based targeted attacks. When attacking the ImageNet-compatible dataset, LI yields an improvement of 12% over existing state-of-the-art methods.
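A minimal sketch of the locality objective follows: a targeted classification loss is combined with a feature-similarity term between the adversarial image and a random crop of it, so that the perturbation, rather than the benign content, dominates the features. The crop range, feature layer, and weighting are illustrative assumptions.

```python
import random
import torch
import torch.nn.functional as F

def random_crop_resize(x, min_frac=0.5):
    n, c, h, w = x.shape
    ch, cw = int(h * random.uniform(min_frac, 1.0)), int(w * random.uniform(min_frac, 1.0))
    top, left = random.randint(0, h - ch), random.randint(0, w - cw)
    crop = x[:, :, top:top + ch, left:left + cw]
    return F.interpolate(crop, size=(h, w), mode="bilinear", align_corners=False)

def li_objective(logits_fn, feature_fn, x_adv, target, lam=1.0):
    crop = random_crop_resize(x_adv)
    f_full, f_crop = feature_fn(x_adv), feature_fn(crop)
    cls_loss = F.cross_entropy(logits_fn(x_adv), target)                        # targeted loss
    sim_loss = 1 - F.cosine_similarity(f_full.flatten(1), f_crop.flatten(1)).mean()
    return cls_loss + lam * sim_loss   # minimize with respect to the perturbation
```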
Deep neural networks are vulnerable to adversarial examples, which poses security concerns on these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important surrogate to evaluate the robustness of deep learning models before they are deployed. However, most existing adversarial attacks can only fool a black-box model with a low success rate. To address this issue, we propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. By integrating the momentum term into the iterative process for attacks, our methods can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples. To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks. We hope that the proposed methods will serve as a benchmark for evaluating the robustness of various deep models and defense methods. With this method, we won first place in both the NIPS 2017 Non-targeted Adversarial Attack and Targeted Adversarial Attack competitions.
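A minimal sketch of the momentum iterative method follows: the normalized gradient is accumulated in a momentum buffer and a sign step is taken at each iteration, under an L-infinity budget. The default step size, decay factor, and budget below are illustrative.

```python
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y, eps=16/255, steps=10, mu=1.0):
    alpha = eps / steps
    g = torch.zeros_like(x)          # momentum buffer
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # accumulate the L1-normalized gradient in the momentum term
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True).clamp_min(1e-12)
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```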
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
Though CNNs have achieved state-of-the-art performance on various vision tasks, they are vulnerable to adversarial examples, which are crafted by adding human-imperceptible perturbations to clean images. However, most of the existing adversarial attacks only achieve relatively low success rates under the challenging black-box setting, where the attackers have no knowledge of the model structure and parameters. To this end, we propose to improve the transferability of adversarial examples by creating diverse input patterns. Instead of only using the original images to generate adversarial examples, our method applies random transformations to the input images at each iteration. Extensive experiments on ImageNet show that the proposed attack method can generate adversarial examples that transfer much better to different networks than existing baselines. By evaluating our method against top defense solutions and official baselines from the NIPS 2017 adversarial competition, the enhanced attack reaches an average success rate of 73.0%, outperforming the top-1 attack submission in the NIPS competition by a large margin of 6.6%. We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future. Code is available at https://github.com/cihangxie/DI-2-FGSM.
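The input-diversity idea can be sketched as a transformation applied before each gradient computation: with some probability, the image is randomly resized and then zero-padded back to a fixed resolution. The size range and probability below are illustrative assumptions, not the paper's exact settings.

```python
import random
import torch
import torch.nn.functional as F

def input_diversity(x, low=270, high=299, prob=0.5):
    """With probability `prob`, randomly resize x and zero-pad it back to `high` x `high`."""
    if random.random() > prob:
        return x
    size = random.randint(low, high - 1)
    resized = F.interpolate(x, size=(size, size), mode="nearest")
    pad = high - size
    left, top = random.randint(0, pad), random.randint(0, pad)
    return F.pad(resized, (left, pad - left, top, pad - top), value=0)

# usage inside an iterative attack:
#   loss = F.cross_entropy(model(input_diversity(x_adv)), y)
```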
An intriguing property of deep neural networks is the existence of adversarial examples, which can transfer among different architectures. These transferable adversarial examples may severely hinder deep neural network-based applications. Previous works mostly study transferability using small-scale datasets. In this work, we are the first to conduct an extensive study of transferability over large models and a large-scale dataset, and we are also the first to study the transferability of targeted adversarial examples with their target labels. We study both non-targeted and targeted adversarial examples, and show that while transferable non-targeted adversarial examples are easy to find, targeted adversarial examples generated using existing approaches almost never transfer with their target labels. Therefore, we propose novel ensemble-based approaches to generating transferable adversarial examples. Using such approaches, we observe a large proportion of targeted adversarial examples that are able to transfer with their target labels for the first time. We also present some geometric studies to help understand the transferable adversarial examples. Finally, we show that the adversarial examples generated using ensemble-based approaches can successfully attack Clarifai.com, which is a black-box image classification system.
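A minimal sketch of an ensemble-based attack follows: the logits of several white-box models are fused with fixed weights, and a targeted iterative attack is run on the fused output in the hope that the perturbation transfers to an unseen black-box model. The fusion rule, weights, and step sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ensemble_targeted_attack(models, weights, x, target, eps=16/255, steps=20):
    alpha = eps / steps
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        fused_logits = sum(w * m(x_adv) for w, m in zip(weights, models))
        loss = F.cross_entropy(fused_logits, target)   # targeted: minimize loss toward `target`
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()   # descend, pushing toward the target class
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```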
Black-box adversarial attacks can be divided into transfer-based and query-based attacks. Transfer-based methods do not require any feedback from the victim model, but provide lower success rates compared to query-based methods, while query-based attacks often require a large number of queries to succeed. To combine the advantages of both approaches, recent efforts have tried to merge them, but still require hundreds of queries to achieve high success rates (especially for targeted attacks). In this paper, we propose a novel method for black-box attacks via surrogate ensemble search (BASES) that can generate highly successful black-box attacks using an extremely small number of queries. We first define a perturbation machine that generates a perturbed image by minimizing a weighted loss function over a fixed set of surrogate models. To generate an attack for a given victim model, we search over the weights of the loss function using queries produced by the perturbation machine. Since the dimension of the search space is small (the same as the number of surrogate models), the search requires only a small number of queries. We demonstrate that our proposed method achieves better success rates with at least 30x fewer queries compared with state-of-the-art methods on different ImageNet-trained image classifiers, including VGG-19, DenseNet-121, and ResNeXt-50. In particular, our method requires on average 3 queries per image to achieve a success rate above 90% for targeted attacks, and 1-2 queries per image to achieve a success rate above 99% for non-targeted attacks. Our method is also effective against the Google Cloud Vision API, achieving a 91% non-targeted attack success rate with 2.9 queries per image. We also show that the perturbations generated by our proposed method are highly transferable and can be adopted for hard-label black-box attacks.
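A minimal sketch of the surrogate-ensemble-search idea follows: a perturbation machine crafts an adversarial image by minimizing a weighted loss over fixed surrogates, and the low-dimensional weight vector is adjusted with a handful of victim queries. The simple coordinate-update rule and step size below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def perturb(surrogates, w, x, target, eps=16/255, steps=20):
    """Perturbation machine: minimize a w-weighted targeted loss over fixed surrogates.
    x is assumed to be a single-image batch of shape (1, 3, H, W); target is an int."""
    alpha = eps / steps
    y_t = torch.tensor([target])
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = sum(wi * F.cross_entropy(m(x_adv), y_t) for wi, m in zip(w, surrogates))
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

def bases_attack(surrogates, victim_top1, x, target, max_queries=10, lr=0.1):
    """Search over the (small) weight vector, spending one victim query per candidate."""
    w = torch.full((len(surrogates),), 1.0 / len(surrogates))
    for q in range(max_queries):
        x_adv = perturb(surrogates, w, x, target)
        if victim_top1(x_adv) == target:      # one query to the victim model
            return x_adv, q + 1
        w[q % len(w)] += lr                   # simple coordinate update on the weights
        w = w / w.sum()
    return x_adv, max_queries
```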
While ImageNet was originally proposed as a dataset for performance benchmarking in the field of computer vision, it has also supported a variety of other research efforts. Adversarial machine learning is one such effort, employing deceptive inputs to elicit erroneous predictions. For evaluating attacks and defenses in the field of adversarial machine learning, ImageNet remains one of the most frequently used datasets. However, a topic that has yet to be investigated is the nature of the classes into which adversarial examples are misclassified. In this paper, we perform a detailed analysis of these misclassification classes, leveraging the ImageNet class hierarchy and measuring the relative positions of the aforementioned classes with respect to the unperturbed origins of the adversarial examples. We find that 71% of the adversarial examples that achieve model-to-model adversarial transferability are misclassified into one of the top-5 classes predicted for the underlying source image. We also find that a large subset of untargeted misclassifications are, in fact, misclassifications into semantically similar classes. Based on these findings, we discuss the need to take the ImageNet class hierarchy into account when evaluating untargeted adversarial success. Furthermore, we advocate for future research efforts to incorporate categorical information.
The intriguing phenomenon of adversarial examples has attracted significant attention in machine learning, and what may be even more surprising to the community is the existence of universal adversarial perturbations (UAPs), i.e., a single perturbation that fools the target DNN on most images. With a focus on deep classifiers, this survey summarizes recent progress on universal adversarial attacks, discussing the challenges on both the attack and defense sides, as well as the reasons for the existence of UAPs. We aim to extend this work into a dynamic survey that will regularly update its content to follow new works on UAPs or universal attacks in a wide range of domains, such as images, audio, video, text, etc. Relevant updates will be discussed at: https://bit.ly/2sbqlgg. We welcome authors of future works in this field to contact us about including your new findings.
In this work, we study the black-box targeted attack problem from the model discrepancy perspective. On the theoretical side, we present a generalization error bound for black-box targeted attacks, which gives a rigorous theoretical analysis for guaranteeing the success of the attack. We reveal that the attack error on a target model mainly depends on the empirical attack error on the substitute model and the maximum model discrepancy among substitute models. On the algorithmic side, we derive a new algorithm for black-box targeted attacks based on our theoretical analysis, in which we additionally minimize the maximum model discrepancy (M3D) of the substitute models when training the generator to generate adversarial examples. In this way, our model is capable of crafting highly transferable adversarial examples that are robust to model variation, thus improving the success rate for attacking the black-box model. We conduct extensive experiments on the ImageNet dataset with different classification models, and our proposed approach outperforms existing state-of-the-art methods by a significant margin. Our code will be released.
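A simplified, single-step sketch of an M3D-style min-max training loop is given below: two substitute models are pushed apart (maximizing their discrepancy) on the generated adversarial examples, while the generator is trained to fool both substitutes toward the target class and keep their outputs consistent. The discrepancy measure (KL here), the perturbation parameterization via tanh, and the optimizers are assumptions; the generator is assumed to output a tensor of the same shape as the input.

```python
import torch
import torch.nn.functional as F

def m3d_step(generator, sub1, sub2, g_opt, d_opt, x, target, eps=16/255):
    # 1) craft adversarial examples with the generator (perturbation bounded by eps)
    x_adv = (x + eps * torch.tanh(generator(x))).clamp(0, 1)

    # 2) substitute models: maximize their discrepancy on the adversarial examples
    disc = F.kl_div(sub1(x_adv.detach()).log_softmax(dim=1),
                    sub2(x_adv.detach()).softmax(dim=1), reduction="batchmean")
    d_opt.zero_grad()
    (-disc).backward()
    d_opt.step()

    # 3) generator: fool both substitutes toward the target class while keeping
    #    their predictions consistent (i.e., minimizing the maximum discrepancy)
    logits1, logits2 = sub1(x_adv), sub2(x_adv)
    attack_loss = F.cross_entropy(logits1, target) + F.cross_entropy(logits2, target)
    disc_g = F.kl_div(logits1.log_softmax(dim=1), logits2.softmax(dim=1),
                      reduction="batchmean")
    g_opt.zero_grad()
    (attack_loss + disc_g).backward()
    g_opt.step()
    return x_adv.detach()
```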
Deep learning-based image recognition systems have been widely deployed on mobile devices in today's world. However, recent studies have shown that deep learning models are vulnerable to adversarial examples. One variant of adversarial examples, called the adversarial patch, has drawn researchers' attention due to its strong attack ability. Although adversarial patches achieve high attack success rates, they are easily detected because of the visual inconsistency between the patch and the original image. In addition, generating adversarial patches in the literature usually requires a large amount of data, which is computationally expensive and time-consuming. To address these challenges, we propose an approach to generate inconspicuous adversarial patches with a single image. In our approach, we first decide the patch location based on the perceptual sensitivity of the victim model, and then produce the adversarial patch in a coarse-to-fine manner by utilizing multi-scale generators and discriminators. The patch is encouraged to be consistent with the background image through adversarial training while preserving strong attack ability. Our approach shows strong attack ability in the white-box setting and excellent transferability in the black-box setting, as demonstrated by extensive experiments on various models with different architectures and training methods. Compared with other adversarial patches, our adversarial patch poses the least risk of being noticed and can evade human observation, which is supported by illustrations of saliency maps and user-evaluation results. Finally, we show that our adversarial patch can be applied in the physical world.
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs. Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to efficiently attack black-box models.
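The core of a zeroth-order attack is estimating gradient coordinates from score queries only. The sketch below shows a single-coordinate symmetric-difference estimate; coordinate selection, the smoothing parameter, and the surrounding optimizer are simplified relative to the full ZOO procedure.

```python
import torch

def zoo_coordinate_grad(loss_fn, x, idx, h=1e-4):
    """Estimate d loss / d x[idx] with two score-only queries (symmetric difference)."""
    e = torch.zeros_like(x).flatten()
    e[idx] = h
    e = e.view_as(x)
    return (loss_fn(x + e) - loss_fn(x - e)) / (2 * h)
```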
Adversarial examples can be used to maliciously and covertly change a model's predictions. It is well known that an adversarial example designed for one model can also transfer to other models. This poses a major threat, because it means that attackers can target systems in a black-box manner. In the domain of transferability, researchers have proposed methods to make attacks more transferable and to make models more robust to transferred examples. However, to the best of our knowledge, no work has proposed a means of ranking the transferability of adversarial examples from the perspective of a black-box attacker. This is an important task, because an attacker may only use a selected set of examples and therefore needs to select the samples most likely to transfer. In this paper, we propose a method for ranking the transferability of adversarial examples without access to the victim model. To do so, we define and estimate the expected transferability of a sample given limited information about the victim. We also explore practical scenarios: one in which the adversary can select the best samples to attack with, and one in which the adversary must use a specific sample but can choose among different perturbations. Through our experiments, we find that our ranking method can increase the attacker's success rate by up to 80% compared with not using ranking.
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta-generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-training procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
Convolutional neural networks have demonstrated high accuracy on various tasks in recent years. However, they are extremely vulnerable to adversarial examples. For example, imperceptible perturbations added to clean images can cause convolutional neural networks to fail. In this paper, we propose to utilize randomization at inference time to mitigate adversarial effects. Specifically, we use two randomization operations: random resizing, which resizes the input images to a random size, and random padding, which pads zeros around the input images in a random manner. Extensive experiments demonstrate that the proposed randomization method is very effective at defending against both single-step and iterative attacks. Our method provides the following advantages: 1) no additional training or fine-tuning, 2) very few additional computations, 3) compatibility with other adversarial defense methods. When combined with an adversarially trained model, the proposed randomization method achieves a normalized score of 0.924 (ranked No. 2 among 107 defense teams) in the NIPS 2017 adversarial examples defense challenge, far better than adversarial training alone, which achieves a normalized score of 0.773 (ranked No. 56). The code is publicly available at https://github.com/cihangxie/NIPS2017_adv_challenge_defense.
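A minimal sketch of the defense as a wrapper around an existing classifier follows: each forward pass randomly resizes the input and then randomly zero-pads it to a fixed resolution before classification. The size range below is illustrative rather than the exact competition configuration.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomizationDefense(nn.Module):
    def __init__(self, classifier, min_size=299, final_size=331):
        super().__init__()
        self.classifier = classifier
        self.min_size, self.final_size = min_size, final_size

    def forward(self, x):
        size = random.randint(self.min_size, self.final_size - 1)
        x = F.interpolate(x, size=(size, size), mode="nearest")     # random resizing
        pad = self.final_size - size
        left, top = random.randint(0, pad), random.randint(0, pad)
        x = F.pad(x, (left, pad - left, top, pad - top), value=0)   # random zero padding
        return self.classifier(x)

# usage: logits = RandomizationDefense(model)(images)
```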
Adversarial attacks focus only on changing the classifier's prediction, but their danger greatly depends on how the class is mistaken. For example, when an automatic driving system mistakes a Persian cat for a Siamese cat, it is hardly a problem. However, if it mistakes a cat for a 120 km/h minimum speed sign, serious problems can arise. As a stepping stone toward more threatening adversarial attacks, we consider the superclass adversarial attack, which causes misclassification not only at the class level but also at the superclass level. We conduct the first comprehensive analysis of superclass adversarial attacks (existing and 19 new methods) in terms of accuracy, speed, and stability, and identify several strategies that achieve better performance. Although this study is aimed at superclass misclassification, the findings can be applied to other problem settings involving multiple classes, such as top-k and multi-label classification attacks.
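One way to express a superclass attack objective is to maximize the probability mass assigned outside the true label's superclass, given a mapping from fine classes to superclasses. The sketch below is an assumed PGD-style formulation for illustration, not any specific method from the study.

```python
import torch
import torch.nn.functional as F

def superclass_attack(model, x, y, class_to_super, eps=8/255, steps=20):
    """class_to_super: LongTensor of shape (num_classes,) mapping each fine class
    to its superclass index; x: images in [0, 1]; y: true fine-class labels."""
    alpha = eps / steps
    super_y = class_to_super[y]                                        # (N,)
    same_super = class_to_super.unsqueeze(0) == super_y.unsqueeze(1)   # (N, num_classes)
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        probs = model(x_adv).softmax(dim=1)
        # maximize the probability mass assigned outside the true superclass
        loss = probs.masked_fill(same_super, 0.0).sum(dim=1).clamp_min(1e-12).log().mean()
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```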
Deep neural networks (DNNs) for image classification are known to be vulnerable to adversarial examples. Moreover, adversarial examples have the property of transferability, which means that an adversarial example crafted for one DNN model can fool other black-box models with non-trivial probability. This gives rise to the transfer-based adversarial attack, where adversarial examples generated with a pretrained or known model (called the surrogate model) are used to conduct black-box attacks. There is some work on how to generate adversarial examples from a given surrogate model to achieve better transferability. However, training a special surrogate model to generate adversarial examples with better transferability is relatively under-explored. In this paper, we propose a method for training a surrogate model with abundant dark knowledge to boost the adversarial transferability of the adversarial examples it generates. This trained surrogate model is named the dark surrogate model (DSM), and the proposed method for training the DSM consists of two key components: a teacher model that extracts dark knowledge and provides soft labels, and an enhanced mixup augmentation skill that enriches the dark knowledge of the training data. Extensive experiments have been conducted to show that the proposed method can substantially improve the adversarial transferability of surrogate models across different architectures and optimizers used for generating adversarial examples. We also show that the proposed method can be applied to other scenarios of transfer-based attacks that contain dark knowledge, such as face verification.
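A minimal sketch of one training step for a dark surrogate model follows: the student surrogate is fitted to a teacher's softened outputs (dark knowledge) on mixup-augmented images. The temperature, mixup strength, and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dsm_train_step(student, teacher, optimizer, x, temperature=4.0, mixup_alpha=1.0):
    lam = torch.distributions.Beta(mixup_alpha, mixup_alpha).sample().item()
    x_mix = lam * x + (1 - lam) * x[torch.randperm(x.size(0))]        # mixup augmentation
    with torch.no_grad():
        soft_targets = (teacher(x_mix) / temperature).softmax(dim=1)  # dark knowledge
    loss = F.kl_div((student(x_mix) / temperature).log_softmax(dim=1), soft_targets,
                    reduction="batchmean") * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```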
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have recently been found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to humans but can easily fool deep neural networks at the testing/deployment stage. This vulnerability to adversarial examples has become one of the major risks of applying deep neural networks in safety-critical environments. Therefore, attacks on and defenses against adversarial examples have drawn great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications of adversarial examples are investigated. We further elaborate on countermeasures against adversarial examples. In addition, three major challenges in adversarial examples and potential solutions are discussed.