随着机器学习技术的快速发展,在日常生活的几乎每个方面都已部署了深度学习模式。但是,这些模型的隐私和安全性受到反对派攻击的威胁。其中黑盒攻击更接近现实,其中可以从模型中获取有限的知识。在本文中,我们提供了关于对抗性攻击的基本背景知识,并分析了四个黑箱攻击算法:匪徒,NES,Square Attack和ZoSignsgd全面。我们还探索了新提出的方形攻击方法,方面规模,希望提高其查询效率。
translated by 谷歌翻译
We propose the Square Attack, a score-based black-box l2and l∞-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. Square Attack is based on a randomized search scheme which selects localized squareshaped updates at random positions so that at each iteration the perturbation is situated approximately at the boundary of the feasible set. Our method is significantly more query efficient and achieves a higher success rate compared to the state-of-the-art methods, especially in the untargeted setting. In particular, on ImageNet we improve the average query efficiency in the untargeted setting for various deep networks by a factor of at least 1.8 and up to 3 compared to the recent state-ofthe-art l∞-attack of Al-Dujaili & OReilly (2020). Moreover, although our attack is black-box, it can also outperform gradient-based white-box attacks on the standard benchmarks achieving a new state-of-the-art in terms of the success rate. The code of our attack is available at https://github.com/max-andr/square-attack.
translated by 谷歌翻译
深度神经网络(DNNS)在各种方案中对对抗数据敏感,包括黑框方案,在该方案中,攻击者只允许查询训练有素的模型并接收输出。现有的黑框方法用于创建对抗性实例的方法是昂贵的,通常使用梯度估计或培训替换网络。本文介绍了\ textit {Attackar},这是一种基于分数的进化,黑框攻击。 Attackar是基于一个新的目标函数,可用于无梯度优化问题。攻击仅需要访问分类器的输出徽标,因此不受梯度掩蔽的影响。不需要其他信息,使我们的方法更适合现实生活中的情况。我们使用三个基准数据集(MNIST,CIFAR10和Imagenet)使用三种不同的最先进模型(Inception-V3,Resnet-50和VGG-16-BN)测试其性能。此外,我们评估了Attackar在非分辨率转换防御和最先进的强大模型上的性能。我们的结果表明,在准确性得分和查询效率方面,攻击性的表现出色。
translated by 谷歌翻译
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs.Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to * Pin-Yu Chen and Huan Zhang contribute equally to this work.
translated by 谷歌翻译
虽然深度神经网络在各种任务中表现出前所未有的性能,但对对抗性示例的脆弱性阻碍了他们在安全关键系统中的部署。许多研究表明,即使在黑盒设置中也可能攻击,其中攻击者无法访问目标模型的内部信息。大多数黑匣子攻击基于查询,每个都可以获得目标模型的输入输出,并且许多研究侧重于减少所需查询的数量。在本文中,我们注意了目标模型的输出完全对应于查询输入的隐含假设。如果将某些随机性引入模型中,它可以打破假设,因此,基于查询的攻击可能在梯度估计和本地搜索中具有巨大的困难,这是其攻击过程的核心。从这种动机来看,我们甚至观察到一个小的添加剂输入噪声可以中和大多数基于查询的攻击和名称这个简单但有效的方法小噪声防御(SND)。我们分析了SND如何防御基于查询的黑匣子攻击,并展示其与CIFAR-10和ImageNet数据集的八种最先进的攻击有效性。即使具有强大的防御能力,SND几乎保持了原始的分类准确性和计算速度。通过在推断下仅添加一行代码,SND很容易适用于预先训练的模型。
translated by 谷歌翻译
为了适用于现实情况,提出了边界攻击(BAS),并仅使用决策信息确保了100%的攻击成功率。但是,现有的BA方法通过利用简单的随机抽样(SRS)来估算梯度来制作对抗性示例,从而消耗大量模型查询。为了克服SRS的弊端,本文提出了基于拉丁超立方体采样的边界攻击(LHS-BA)以节省查询预算。与SR相比,LHS在相同数量的随机样品中具有更好的均匀性。因此,这些随机样品的平均值比SRS估计的平均梯度更接近真实梯度。在包括MNIST,CIFAR和IMAGENET-1K在内的基准数据集上进行了各种实验。实验结果表明,就查询效率而言,拟议的LHS-BA优于最先进的BA方法。源代码可在https://github.com/gzhu-dvl/lhs-ba上公开获得。
translated by 谷歌翻译
愚弄深度神经网络(DNN)与黑匣子优化已成为一种流行的对抗攻击方式,因为DNN的结构先验知识始终是未知的。尽管如此,最近的黑匣子对抗性攻击可能会努力平衡其在解决高分辨率图像中产生的对抗性示例(AES)的攻击能力和视觉质量。在本文中,我们基于大规模的多目标进化优化,提出了一种关注引导的黑盒逆势攻击,称为LMOA。通过考虑图像的空间语义信息,我们首先利用注意图来确定扰动像素。而不是攻击整个图像,减少了具有注意机制的扰动像素可以有助于避免维度的臭名臭氧,从而提高攻击性能。其次,采用大规模的多目标进化算法在突出区域中遍历降低的像素。从其特征中受益,所产生的AES有可能在人类视力不可知的同时愚弄目标DNN。广泛的实验结果已经验证了所提出的LMOA在ImageNet数据集中的有效性。更重要的是,与现有的黑匣子对抗性攻击相比,产生具有更好的视觉质量的高分辨率AE更具竞争力。
translated by 谷歌翻译
许多最先进的ML模型在各种任务中具有优于图像分类的人类。具有如此出色的性能,ML模型今天被广泛使用。然而,存在对抗性攻击和数据中毒攻击的真正符合ML模型的稳健性。例如,Engstrom等人。证明了最先进的图像分类器可以容易地被任意图像上的小旋转欺骗。由于ML系统越来越纳入安全性和安全敏感的应用,对抗攻击和数据中毒攻击构成了相当大的威胁。本章侧重于ML安全的两个广泛和重要的领域:对抗攻击和数据中毒攻击。
translated by 谷歌翻译
The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks optimized for 2 and ∞ similarity metrics respectively. Theoretical analysis is provided for the proposed algorithms and the gradient direction estimate. Experiments show HopSkipJumpAttack requires significantly fewer model queries than several state-of-the-art decision-based adversarial attacks. It also achieves competitive performance in attacking several widely-used defense mechanisms.
translated by 谷歌翻译
基于随机搜索方案的对抗攻击已经获得最近的最先进的导致黑盒稳健性评估。然而,正如我们在这项工作中所展示的那样,他们在不同查询预算制度中的效率取决于潜在的提案分布的手动设计和启发式调整。我们研究如何通过基于攻击期间获得的信息在线调整提案分发来解决这个问题。我们考虑方攻击,这是一种最先进的基于分数的黑盒攻击,并演示了学习控制器如何改进其性能,该控制器可以在攻击期间调整提案分布的参数。我们使用白色框访问的CIFAR10模型上使用基于梯度的端到端培训培训控制器。我们展示将学习的控制器堵塞到攻击中,始终如一地将其黑盒稳健性估计在不同的查询制度中提高到20%,对于具有黑盒访问的广泛不同型号。我们进一步表明,学习的自适应原理将原理转移到其他数据分布,例如CIFAR100或ImageNet以及目标攻击设置。
translated by 谷歌翻译
Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model where the adversary can observe the full output of the network on arbitrarily many chosen inputs. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partialinformation setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to break the Google Cloud Vision API.
translated by 谷歌翻译
机器学习模型严重易于来自对抗性示例的逃避攻击。通常,对逆势示例的修改输入类似于原始输入的修改输入,在WhiteBox设置下由对手的WhiteBox设置构成,完全访问模型。然而,最近的攻击已经显示出使用BlackBox攻击的对逆势示例的查询号显着减少。特别是,警报是从越来越多的机器学习提供的经过培训的模型的访问界面中利用分类决定作为包括Google,Microsoft,IBM的服务提供商,并由包含这些模型的多种应用程序使用的服务提供商来利用培训的模型。对手仅利用来自模型的预测标签的能力被区别为基于决策的攻击。在我们的研究中,我们首先深入潜入最近的ICLR和SP的最先进的决策攻击,以突出发现低失真对抗采用梯度估计方法的昂贵性质。我们开发了一种强大的查询高效攻击,能够避免在梯度估计方法中看到的嘈杂渐变中的局部最小和误导中的截留。我们提出的攻击方法,ramboattack利用随机块坐标下降的概念来探索隐藏的分类器歧管,针对扰动来操纵局部输入功能以解决梯度估计方法的问题。重要的是,ramboattack对对对手和目标类别可用的不同样本输入更加强大。总的来说,对于给定的目标类,ramboattack被证明在实现给定查询预算的较低失真时更加强大。我们使用大规模的高分辨率ImageNet数据集来策划我们的广泛结果,并在GitHub上开源我们的攻击,测试样本和伪影。
translated by 谷歌翻译
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
translated by 谷歌翻译
最近的研究表明,即使在攻击者无法访问模型信息的黑匣子场景中,基于深模型的检测器也容易受到对抗示例的影响。大多数现有的攻击方法旨在最大程度地减少真正的积极速率,这通常显示出较差的攻击性能,因为在受攻击的边界框中可以检测到另一个最佳的边界框成为新的真实积极的框架。为了解决这一挑战,我们建议最大程度地降低真实的正速率并最大化误报率,这可以鼓励更多的假阳性对象阻止新的真实正面边界框的产生。它被建模为多目标优化(MOP)问题,通用算法可以搜索帕累托最佳选择。但是,我们的任务具有超过200万个决策变量,导致搜索效率较低。因此,我们将标准的遗传算法扩展到了随机子集选择和称为GARSDC的分裂和矛盾,从而显着提高了效率。此外,为了减轻通用算法中人口质量的敏感性,我们利用具有相似骨架的不同检测器之间的可转移性产生了梯度优先人口。与最先进的攻击方法相比,GARSDC在地图中平均减少12.0,在广泛的实验中查询约1000倍。我们的代码可以在https://github.com/liangsiyuan21/ garsdc找到。
translated by 谷歌翻译
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to human but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples becomes one of the major risks for applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
translated by 谷歌翻译
Deep neural networks (DNNs) have greatly impacted numerous fields over the past decade. Yet despite exhibiting superb performance over many problems, their black-box nature still poses a significant challenge with respect to explainability. Indeed, explainable artificial intelligence (XAI) is crucial in several fields, wherein the answer alone -- sans a reasoning of how said answer was derived -- is of little value. This paper uncovers a troubling property of explanation methods for image-based DNNs: by making small visual changes to the input image -- hardly influencing the network's output -- we demonstrate how explanations may be arbitrarily manipulated through the use of evolution strategies. Our novel algorithm, AttaXAI, a model-agnostic, adversarial attack on XAI algorithms, only requires access to the output logits of a classifier and to the explanation map; these weak assumptions render our approach highly useful where real-world models and data are concerned. We compare our method's performance on two benchmark datasets -- CIFAR100 and ImageNet -- using four different pretrained deep-learning models: VGG16-CIFAR100, VGG16-ImageNet, MobileNet-CIFAR100, and Inception-v3-ImageNet. We find that the XAI methods can be manipulated without the use of gradients or other model internals. Our novel algorithm is successfully able to manipulate an image in a manner imperceptible to the human eye, such that the XAI method outputs a specific explanation map. To our knowledge, this is the first such method in a black-box setting, and we believe it has significant value where explainability is desired, required, or legally mandatory.
translated by 谷歌翻译
由于它们对机器学习系统部署的可靠性的影响,对抗性样本的可转移性成为严重关注的问题,因为它们发现了进入许多关键应用程序的方式。了解影响对抗性样本可转移性的因素可以帮助专家了解如何建立鲁棒和可靠的机器学习系统的明智决策。本研究的目标是通过以攻击为中心的方法提供对对抗性样本可转移性背后的机制的见解。这种攻击的视角解释了通过评估机器学习攻击的影响(在给定的输入数据集中的影响来解释对抗性样本。为实现这一目标,我们使用攻击者模型产生对抗性样本并将这些样本转移到受害者模型中。我们分析了受害者模型对抗对抗样本的行为,并概述了可能影响对抗性样本可转移性的四种因素。虽然这些因素不一定是详尽无遗的,但它们对机器学习系统的研究人员和从业者提供了有用的见解。
translated by 谷歌翻译
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-train procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
translated by 谷歌翻译
Video classification systems are vulnerable to adversarial attacks, which can create severe security problems in video verification. Current black-box attacks need a large number of queries to succeed, resulting in high computational overhead in the process of attack. On the other hand, attacks with restricted perturbations are ineffective against defenses such as denoising or adversarial training. In this paper, we focus on unrestricted perturbations and propose StyleFool, a black-box video adversarial attack via style transfer to fool the video classification system. StyleFool first utilizes color theme proximity to select the best style image, which helps avoid unnatural details in the stylized videos. Meanwhile, the target class confidence is additionally considered in targeted attacks to influence the output distribution of the classifier by moving the stylized video closer to or even across the decision boundary. A gradient-free method is then employed to further optimize the adversarial perturbations. We carry out extensive experiments to evaluate StyleFool on two standard datasets, UCF-101 and HMDB-51. The experimental results demonstrate that StyleFool outperforms the state-of-the-art adversarial attacks in terms of both the number of queries and the robustness against existing defenses. Moreover, 50% of the stylized videos in untargeted attacks do not need any query since they can already fool the video classification model. Furthermore, we evaluate the indistinguishability through a user study to show that the adversarial samples of StyleFool look imperceptible to human eyes, despite unrestricted perturbations.
translated by 谷歌翻译
深度神经网络容易受到称为对抗性攻击的小输入扰动。通过迭代最大限度地减少网络对真正阶级标签的信心来构建这些对手的事实,我们提出了旨在反对这种效果的反对派层。特别地,我们的层在对手1的相反方向上产生输入扰动,并馈送分类器的输入的扰动版本。我们的方法是无培训和理论上的支持。我们通过将我们的层与名义上和强大的培训模型组合来验证我们的方法的有效性,并从黑盒进行大规模实验到CIFAR10,CIFAR100和ImageNet的自适应攻击。我们的层显着提高了模型鲁棒性,同时在清洁准确性上没有成本。
translated by 谷歌翻译