Adversarial attacks can easily fool object recognition systems based on deep neural networks (DNNs). Although many defense methods have been proposed in recent years, most of them can still be adaptively evaded. One reason for the weak adversarial robustness may be that DNNs are only supervised by category labels and do not have part-based inductive bias like the recognition process of humans. Inspired by a well-known theory in cognitive psychology -- recognition-by-components, we propose a novel object recognition model ROCK (Recognizing Object by Components with human prior Knowledge). It first segments parts of objects from images, then scores part segmentation results with predefined human prior knowledge, and finally outputs prediction based on the scores. The first stage of ROCK corresponds to the process of decomposing objects into parts in human vision. The second stage corresponds to the decision process of the human brain. ROCK shows better robustness than classical recognition models across various attack settings. These results encourage researchers to rethink the rationality of currently widely-used DNN-based object recognition models and explore the potential of part-based models, once important but recently ignored, for improving robustness.
translated by 谷歌翻译
我们表明,将人类的先验知识与端到端学习相结合可以通过引入基于零件的对象分类模型来改善深神经网络的鲁棒性。我们认为,更丰富的注释形式有助于指导神经网络学习更多可靠的功能,而无需更多的样本或更大的模型。我们的模型将零件分割模型与一个微小的分类器结合在一起,并经过训练的端到端,以同时将对象分割为各个部分,然后对分段对象进行分类。从经验上讲,与所有三个数据集的Resnet-50基线相比,我们的基于部分的模型既具有更高的精度和更高的对抗性鲁棒性。例如,鉴于相同的鲁棒性,我们部分模型的清洁准确性高达15个百分点。我们的实验表明,这些模型还减少了纹理偏见,并对共同的腐败和虚假相关性产生更好的鲁棒性。该代码可在https://github.com/chawins/adv-part-model上公开获得。
translated by 谷歌翻译
人类严重依赖于形状信息来识别对象。相反,卷积神经网络(CNNS)偏向于纹理。这也许是CNNS易受对抗性示例的影响的主要原因。在这里,我们探索如何将偏差纳入CNN,以提高其鲁棒性。提出了两种算法,基于边缘不变,以中等难以察觉的扰动。在第一个中,分类器在具有边缘图作为附加信道的图像上进行前列地培训。在推断时间,边缘映射被重新计算并连接到图像。在第二算法中,训练了条件GaN,以将边缘映射从干净和/或扰动图像转换为清洁图像。推断在与输入的边缘图对应的生成图像上完成。超过10个数据集的广泛实验证明了算法对FGSM和$ \ ELL_ infty $ PGD-40攻击的有效性。此外,我们表明a)边缘信息还可以使其他对抗训练方法有益,并且B)在边缘增强输入上培训的CNNS对抗自然图像损坏,例如运动模糊,脉冲噪声和JPEG压缩,而不是仅培训的CNNS RGB图像。从更广泛的角度来看,我们的研究表明,CNN不会充分占对鲁棒性至关重要的图像结构。代码可用:〜\ url {https://github.com/aliborji/shapedefense.git}。
translated by 谷歌翻译
Deep neural networks (DNNs) are found to be vulnerable to adversarial attacks, and various methods have been proposed for the defense. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of the adversarial training is greatly limited by the architectures of target DNNs, which often makes the resulting DNNs with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for the neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specially for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. Then we propose a two-stage search strategy to search for both accurate and robust neural architectures. At the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of the adversarial training in enhancing the robustness. At the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss utilizing the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm under natural data and various adversarial attacks, which reveals the superiority of the proposed method in terms of both accurate and robust architectures. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance on both hand-crafting and automatically designing of accurate and robust neural architectures.
translated by 谷歌翻译
已知深度神经网络(DNN)容易受到用不可察觉的扰动制作的对抗性示例的影响,即,输入图像的微小变化会引起错误的分类,从而威胁着基于深度学习的部署系统的可靠性。经常采用对抗训练(AT)来通过训练损坏和干净的数据的混合物来提高DNN的鲁棒性。但是,大多数基于AT的方法在处理\ textit {转移的对抗示例}方面是无效的,这些方法是生成以欺骗各种防御模型的生成的,因此无法满足现实情况下提出的概括要求。此外,对抗性训练一般的国防模型不能对具有扰动的输入产生可解释的预测,而不同的领域专家则需要一个高度可解释的强大模型才能了解DNN的行为。在这项工作中,我们提出了一种基于Jacobian规范和选择性输入梯度正则化(J-SIGR)的方法,该方法通过Jacobian归一化提出了线性化的鲁棒性,还将基于扰动的显着性图正规化,以模仿模型的可解释预测。因此,我们既可以提高DNN的防御能力和高解释性。最后,我们评估了跨不同体系结构的方法,以针对强大的对抗性攻击。实验表明,提出的J-Sigr赋予了针对转移的对抗攻击的鲁棒性,我们还表明,来自神经网络的预测易于解释。
translated by 谷歌翻译
It has been well demonstrated that adversarial examples, i.e., natural images with visually imperceptible perturbations added, cause deep networks to fail on image classification. In this paper, we extend adversarial examples to semantic segmentation and object detection which are much more difficult. Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the target is a pixel or a receptive field in segmentation, and an object proposal in detection). This inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations. Based on this idea, we propose a novel algorithm named Dense Adversary Generation (DAG), which generates a large family of adversarial examples, and applies to a wide range of state-of-the-art deep networks for segmentation and detection. We also find that the adversarial perturbations can be transferred across networks with different training data, based on different architectures, and even for different recognition tasks. In particular, the transferability across networks with the same architecture is more significant than in other cases. Besides, summing up heterogeneous perturbations often leads to better transfer performance, which provides an effective method of blackbox adversarial attack.
translated by 谷歌翻译
视觉变形金刚(VITS)处理将图像输入图像作为通过自我关注的斑块;比卷积神经网络(CNNS)彻底不同的结构。这使得研究Vit模型的对抗特征空间及其可转移性有趣。特别是,我们观察到通过常规逆势攻击发现的对抗性模式,即使对于大型Vit模型,也表现出非常低的黑箱可转移性。但是,我们表明这种现象仅是由于不利用VITS的真实表示潜力的次优攻击程序。深紫色由多个块组成,具有一致的架构,包括自我关注和前馈层,其中每个块能够独立地产生类令牌。仅使用最后一类令牌(传统方法)制定攻击并不直接利用存储在早期令牌中的辨别信息,从而导致VITS的逆势转移性差。使用Vit模型的组成性质,我们通过引入特定于Vit模型结构的两种新策略来增强现有攻击的可转移性。 (i)自我合奏:我们提出了一种通过将单vit模型解剖到网络的集合来找到多种判别途径的方法。这允许在每个VIT块处明确地利用特定于类信息。 (ii)令牌改进:我们建议改进令牌,以进一步增强每种Vit障碍的歧视能力。我们的令牌细化系统地将类令牌系统组合在补丁令牌中保留的结构信息。在一个视觉变压器中发现的分类器的集合中应用于此类精炼令牌时,对抗攻击具有明显更高的可转移性。
translated by 谷歌翻译
对抗斑块攻击通过在指定的局部区域中注入对抗像素来误导神经网络。补丁攻击可以在各种任务中非常有效,并且可以通过附件(例如贴纸)在现实世界对象上实现。尽管攻击模式的多样性,但对抗斑块往往具有高质感,并且外观与自然图像不同。我们利用此属性,并在patchzero上进行patchzero,这是一种针对白色框对面补丁的任务不合时宜的防御。具体而言,我们的防御通过用平均像素值重新粉刷来检测对抗性像素和“零”斑块区域。我们将补丁检测问题作为语义分割任务提出,以便我们的模型可以推广到任何大小和形状的贴片。我们进一步设计了一个两阶段的对抗训练计划,以防止更强烈的适应性攻击。我们在图像分类(ImageNet,resisc45),对象检测(Pascal VOC)和视频分类(UCF101)数据集上彻底评估PatchZero。我们的方法可实现SOTA的稳健精度,而不会在良性表现中降解。
translated by 谷歌翻译
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives.This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
translated by 谷歌翻译
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-train procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
translated by 谷歌翻译
在过去的十年中,深度学习急剧改变了传统的手工艺特征方式,具有强大的功能学习能力,从而极大地改善了传统任务。然而,最近已经证明了深层神经网络容易受到对抗性例子的影响,这种恶意样本由小型设计的噪音制作,误导了DNNs做出错误的决定,同时仍然对人类无法察觉。对抗性示例可以分为数字对抗攻击和物理对抗攻击。数字对抗攻击主要是在实验室环境中进行的,重点是改善对抗性攻击算法的性能。相比之下,物理对抗性攻击集中于攻击物理世界部署的DNN系统,这是由于复杂的物理环境(即亮度,遮挡等),这是一项更具挑战性的任务。尽管数字对抗和物理对抗性示例之间的差异很小,但物理对抗示例具有特定的设计,可以克服复杂的物理环境的效果。在本文中,我们回顾了基于DNN的计算机视觉任务任务中的物理对抗攻击的开发,包括图像识别任务,对象检测任务和语义细分。为了完整的算法演化,我们将简要介绍不涉及身体对抗性攻击的作品。我们首先提出一个分类方案,以总结当前的物理对抗攻击。然后讨论现有的物理对抗攻击的优势和缺点,并专注于用于维持对抗性的技术,当应用于物理环境中时。最后,我们指出要解决的当前身体对抗攻击的问题并提供有前途的研究方向。
translated by 谷歌翻译
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to human but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples becomes one of the major risks for applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
translated by 谷歌翻译
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS-and which illustrate a diverse set of defense strategies-can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result-showing that a defense was ineffective-this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
translated by 谷歌翻译
对抗性的例子揭示了神经网络的脆弱性和不明原因的性质。研究对抗性实例的辩护具有相当大的实际重要性。大多数逆势的例子,错误分类网络通常无法被人类不可检测。在本文中,我们提出了一种防御模型,将分类器培训成具有形状偏好的人类感知分类模型。包括纹理传输网络(TTN)和辅助防御生成的对冲网络(GAN)的所提出的模型被称为人类感知辅助防御GaN(had-GaN)。 TTN用于扩展清洁图像的纹理样本,并有助于分类器聚焦在其形状上。 GaN用于为模型形成培训框架并生成必要的图像。在MNIST,时尚 - MNIST和CIFAR10上进行的一系列实验表明,所提出的模型优于网络鲁棒性的最先进的防御方法。该模型还证明了对抗性实例的防御能力的显着改善。
translated by 谷歌翻译
对基于机器学习的分类器以及防御机制的对抗攻击已在单一标签分类问题的背景下广泛研究。在本文中,我们将注意力转移到多标签分类,其中关于所考虑的类别中的关系的域知识可以提供自然的方法来发现不连贯的预测,即与培训数据之外的对抗的例子相关的预测分配。我们在框架中探讨这种直觉,其中一阶逻辑知识被转换为约束并注入半监督的学习问题。在此设置中,约束分类器学会满足边际分布的域知识,并且可以自然地拒绝具有不连贯预测的样本。尽管我们的方法在训练期间没有利用任何对攻击的知识,但我们的实验分析令人惊讶地推出了域名知识约束可以有效地帮助检测对抗性示例,特别是如果攻击者未知这样的约束。
translated by 谷歌翻译
深度学习(DL)在许多与人类相关的任务中表现出巨大的成功,这导致其在许多计算机视觉的基础应用中采用,例如安全监控系统,自治车辆和医疗保健。一旦他们拥有能力克服安全关键挑战,这种安全关键型应用程序必须绘制他们的成功部署之路。在这些挑战中,防止或/和检测对抗性实例(AES)。对手可以仔细制作小型,通常是难以察觉的,称为扰动的噪声被添加到清洁图像中以产生AE。 AE的目的是愚弄DL模型,使其成为DL应用的潜在风险。在文献中提出了许多测试时间逃避攻击和对策,即防御或检测方法。此外,还发布了很少的评论和调查,理论上展示了威胁的分类和对策方法,几乎​​没有焦点检测方法。在本文中,我们专注于图像分类任务,并试图为神经网络分类器进行测试时间逃避攻击检测方法的调查。对此类方法的详细讨论提供了在四个数据集的不同场景下的八个最先进的探测器的实验结果。我们还为这一研究方向提供了潜在的挑战和未来的观点。
translated by 谷歌翻译
深度神经网络已被证明容易受到对抗图像的影响。常规攻击努力争取严格限制扰动的不可分割的对抗图像。最近,研究人员已采取行动探索可区分但非奇异的对抗图像,并证明色彩转化攻击是有效的。在这项工作中,我们提出了对抗颜色过滤器(ADVCF),这是一种新颖的颜色转换攻击,在简单颜色滤波器的参数空间中通过梯度信息进行了优化。特别是,明确指定了我们的颜色滤波器空间,以便从攻击和防御角度来对对抗性色转换进行系统的鲁棒性分析。相反,由于缺乏这种明确的空间,现有的颜色转换攻击并不能为系统分析提供机会。我们通过用户研究进一步进行了对成功率和图像可接受性的不同颜色转化攻击之间的广泛比较。其他结果为在另外三个视觉任务中针对ADVCF的模型鲁棒性提供了有趣的新见解。我们还强调了ADVCF的人类解剖性,该advcf在实际使用方案中有希望,并显示出比对图像可接受性和效率的最新人解释的色彩转化攻击的优越性。
translated by 谷歌翻译
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs.Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to * Pin-Yu Chen and Huan Zhang contribute equally to this work.
translated by 谷歌翻译
深度神经网络的图像分类容易受到对抗性扰动的影响。图像分类可以通过在输入图像中添加人造小且不可察觉的扰动来轻松愚弄。作为最有效的防御策略之一,提出了对抗性训练,以解决分类模型的脆弱性,其中创建了对抗性示例并在培训期间注入培训数据中。在过去的几年中,对分类模型的攻击和防御进行了深入研究。语义细分作为分类的扩展,最近也受到了极大的关注。最近的工作表明,需要大量的攻击迭代来创建有效的对抗性示例来欺骗分割模型。该观察结果既可以使鲁棒性评估和对分割模型的对抗性培训具有挑战性。在这项工作中,我们提出了一种称为SEGPGD的有效有效的分割攻击方法。此外,我们提供了收敛分析,以表明在相同数量的攻击迭代下,提出的SEGPGD可以创建比PGD更有效的对抗示例。此外,我们建议将SEGPGD应用于分割对抗训练的基础攻击方法。由于SEGPGD可以创建更有效的对抗性示例,因此使用SEGPGD的对抗训练可以提高分割模型的鲁棒性。我们的建议还通过对流行分割模型体系结构和标准分段数据集进行了验证。
translated by 谷歌翻译
虽然深度神经网络(DNN)在许多真实的任务中实现了出色的性能,但它们非常容易受到对抗的攻击。对抗这种攻击的主要防御是对抗的,一种技术,通过将对抗噪声引入其输入来训练DNN培训以训练为对抗性攻击的技术。此程序是有效的,但必须在培训阶段进行。在这项工作中,我们提出了增强随机森林(ARF),这是一个简单易用的策略,用于在不修改其权重的情况下强化现有的预磨损DNN。对于每个图像,我们通过应用不同颜色,模糊,噪声和几何变换来生成随机测试时间增强。然后我们使用DNN的Logits输出来训练一个简单的随机林来预测真正的类标签。我们的方法在自然图像的分类上最小的妥协,实现了最先进的对抗鲁棒性对白和黑匣子攻击的多样性。我们也针对许多适应性的白盒攻击测试ARF,并在与对抗训练结合时显示出优异的结果。代码可在https://github.com/giladcohen/arf获得。
translated by 谷歌翻译