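Co-salient object detection (CoSOD) has recently made significant progress and plays a key role in retrieval-related tasks. However, it inevitably raises an entirely new security problem: highly personal and sensitive content may be extracted by powerful CoSOD methods. In this paper, we address this problem from the perspective of adversarial attacks and identify a novel task: the adversarial co-saliency attack. Specifically, given an image selected from a group of images containing some common and salient objects, we aim to generate an adversarial version that misleads CoSOD methods into predicting incorrect co-salient regions. Note that, compared with general white-box adversarial attacks on classification, this new task faces two additional challenges: (1) a low success rate due to the diverse appearance of the images in the group; (2) low transferability across CoSOD methods due to the considerable differences between CoSOD pipelines. To address these challenges, we propose the first black-box joint adversarial exposure and noise attack (Jadena), in which we jointly and locally tune the exposure and the additive perturbations of the image according to a newly designed high-feature-level contrast-sensitive loss function. Our method, without any information about state-of-the-art CoSOD methods, leads to significant performance degradation on various co-saliency detection datasets and makes the co-salient objects undetectable. This can have strong practical benefits in properly securing the large number of personal photos currently shared on the Internet. Moreover, our method has the potential to be used as a metric for evaluating the robustness of CoSOD methods.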
translated by 谷歌翻译
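For black-box attacks, the gap between the substitute model and the victim model is usually large, which manifests as weak attack performance. Motivated by the observation that the transferability of adversarial examples can be improved by attacking diverse models simultaneously, model augmentation methods that simulate different models by using transformed images have been proposed. However, existing transformations in the spatial domain do not translate into significantly diverse augmented models. To tackle this issue, we propose a novel spectrum simulation attack to craft more transferable adversarial examples against both normally trained and defense models. Specifically, we apply a spectrum transformation to the input and thus perform model augmentation in the frequency domain. We theoretically prove that the transformation derived from the frequency domain leads to a diverse spectrum saliency map, an indicator we propose to reflect the diversity of substitute models. Notably, our method can generally be combined with existing attacks. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of our method, e.g., attacking nine state-of-the-art defense models with an average success rate of 95.4%. Our code is available at https://github.com/yuyang-long/ssa.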
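The idea of augmenting the model in the frequency domain can be sketched on a 1-D signal. The abstract does not spell out the transformation, so the sketch below assumes one plausible form: add Gaussian noise, take an orthonormal DCT, rescale each coefficient by a random factor in [1 - rho, 1 + rho], and invert. The names `spectrum_transform`, `rho`, and `sigma` are illustrative, not taken from the authors' code.

```python
import math
import random

def _scale(k, n):
    # Orthonormal DCT-II scaling factor for coefficient k of an n-point signal.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(x):
    """Orthonormal 1-D DCT-II (naive O(n^2) version, fine for short signals)."""
    n = len(x)
    return [_scale(k, n) * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(x))
            for k in range(n)]

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, c in enumerate(coeffs))
            for i in range(n)]

def spectrum_transform(x, rho=0.5, sigma=0.0, rng=random):
    """Assumed spectrum transformation: noise, DCT, random per-frequency mask, IDCT."""
    noisy = [v + rng.gauss(0.0, sigma) for v in x]
    spectrum = dct(noisy)
    mask = [rng.uniform(1.0 - rho, 1.0 + rho) for _ in spectrum]
    return idct([s * m for s, m in zip(spectrum, mask)])

signal = [0.2, 0.4, 0.9, 0.1, 0.5, 0.7, 0.3, 0.8]
round_trip = idct(dct(signal))
augmented = spectrum_transform(signal, rho=0.5, rng=random.Random(0))
identity = spectrum_transform(signal, rho=0.0, rng=random.Random(1))
```

In an attack loop, the gradient would be averaged over several such randomly transformed copies of the input; that loop is omitted here because it needs a real model.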
Pixel-wise prediction with deep neural networks has become an effective paradigm for salient object detection (SOD) and has achieved remarkable performance. However, very few SOD models are robust against adversarial attacks that are visually imperceptible to human visual attention. The previous work, robust saliency (ROSA), shuffles the pre-segmented superpixels and then refines the coarse saliency map with a densely connected conditional random field (CRF). Unlike ROSA, which relies on various pre- and post-processing steps, this paper proposes a light-weight Learnable Noise (LeNo) to defend SOD models against adversarial attacks. LeNo preserves the accuracy of SOD models on both adversarial and clean images, as well as inference speed. In general, LeNo consists of a simple shallow noise and a noise estimation that are embedded in the encoder and decoder of arbitrary SOD networks, respectively. Inspired by the center prior of the human visual attention mechanism, we initialize the shallow noise with a cross-shaped Gaussian distribution for better defense against adversarial attacks. Instead of adding additional network components for post-processing, the proposed noise estimation modifies only one channel of the decoder. With deeply supervised noise-decoupled training on state-of-the-art RGB and RGB-D SOD networks, LeNo outperforms previous works not only on adversarial images but also on clean images, which contributes stronger robustness for SOD. Our code is available at https://github.com/ssecv/LeNo.
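Deep neural networks (DNNs) are known to be vulnerable to adversarial examples crafted with imperceptible perturbations, i.e., a small change in an input image can induce a misclassification and thus threaten the reliability of deployed deep-learning-based systems. Adversarial training (AT) is often adopted to improve the robustness of DNNs by training on a mixture of corrupted and clean data. However, most AT-based methods are ineffective in dealing with transferred adversarial examples, which are generated to fool a wide range of defense models, and thus cannot satisfy the generalization requirement raised in real-world scenarios. Moreover, adversarially training a defense model in general cannot produce interpretable predictions for perturbed inputs, whereas domain experts require a highly interpretable robust model in order to understand the behaviour of a DNN. In this work, we propose an approach based on Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which suggests linearized robustness through Jacobian normalization and also regularizes the perturbation-based saliency maps to imitate the model's interpretable predictions. As such, we achieve both improved defense and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers robustness against transferred adversarial attacks, and we also show that the predictions of the neural network are easy to interpret.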
Deep neural networks are vulnerable to adversarial examples, which pose security concerns for these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important surrogate to evaluate the robustness of deep learning models before they are deployed. However, most existing adversarial attacks can only fool a black-box model with a low success rate. To address this issue, we propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. By integrating the momentum term into the iterative process for attacks, our methods can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples. To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks. We hope that the proposed methods will serve as a benchmark for evaluating the robustness of various deep models and defense methods. With this method, we won first place in both the NIPS 2017 Non-targeted Adversarial Attack and Targeted Adversarial Attack competitions.
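A minimal sketch of a momentum-based iterative sign attack, assuming an L-infinity budget `eps`, a decay factor `mu`, and L1 normalisation of each gradient before accumulation. The loss here is a toy linear function whose gradient is a fixed vector; `grad_fn`, `toy_grad`, and the parameter names are hypothetical and stand in for a real model's loss gradient.

```python
def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)

def momentum_iterative_attack(x, grad_fn, eps, steps=10, mu=1.0):
    """Iterative sign attack with an accumulated (momentum) gradient."""
    alpha = eps / steps          # per-step size so the total budget is eps
    momentum = [0.0] * len(x)
    x_adv = list(x)
    for _ in range(steps):
        grad = grad_fn(x_adv)
        l1 = sum(abs(g) for g in grad) or 1.0   # avoid division by zero
        # Accumulate the normalised gradient into the momentum term.
        momentum = [mu * m + g / l1 for m, g in zip(momentum, grad)]
        # Step along the sign of the momentum, then project into the eps-ball.
        x_adv = [xa + alpha * sign(m) for xa, m in zip(x_adv, momentum)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Toy loss J(x) = w . x, so the gradient is w everywhere (illustrative only).
w = [0.5, -2.0, 0.0, 1.5]

def toy_grad(_x):
    return list(w)

x_clean = [0.1, 0.2, 0.3, 0.4]
x_adv = momentum_iterative_attack(x_clean, toy_grad, eps=0.1)
```

With a real network the gradient changes between steps, and the momentum term is what smooths those changes; the constant toy gradient only demonstrates the update rule and the projection.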
Deep neural networks are vulnerable to adversarial examples, which can mislead classifiers by adding imperceptible perturbations. An intriguing property of adversarial examples is their good transferability, making black-box attacks feasible in real-world applications. Due to the threat of adversarial attacks, many methods have been proposed to improve the robustness. Several state-of-the-art defenses are shown to be robust against transferable adversarial examples. In this paper, we propose a translation-invariant attack method to generate more transferable adversarial examples against the defense models. By optimizing a perturbation over an ensemble of translated images, the generated adversarial example is less sensitive to the white-box model being attacked and has better transferability. To improve the efficiency of attacks, we further show that our method can be implemented by convolving the gradient at the untranslated image with a pre-defined kernel. Our method is generally applicable to any gradient-based attack method. Extensive experiments on the ImageNet dataset validate the effectiveness of the proposed method. Our best attack fools eight state-of-the-art defenses at an 82% success rate on average based only on the transferability, demonstrating the insecurity of the current defense techniques.
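The efficient form described above, convolving the gradient with a pre-defined kernel, can be sketched as follows. The choice of a normalised Gaussian kernel, its size, and the zero padding at the borders are assumptions made for illustration; `gaussian_kernel` and `smooth_gradient` are hypothetical helper names.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """Normalised 2-D Gaussian kernel (an assumed choice of pre-defined kernel)."""
    half = size // 2
    k = [[math.exp(-(i * i + j * j) / (2.0 * sigma ** 2))
          for j in range(-half, half + 1)]
         for i in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def smooth_gradient(grad, kernel):
    """Zero-padded 2-D convolution of a gradient map with a symmetric kernel."""
    h, w = len(grad), len(grad[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i, krow in enumerate(kernel):
                for j, kv in enumerate(krow):
                    rr, cc = r + i - half, c + j - half
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kv * grad[rr][cc]
            out[r][c] = acc
    return out

# A gradient map with a single spike: smoothing spreads it over its neighbours.
grad = [[0.0] * 5 for _ in range(5)]
grad[2][2] = 1.0
kernel = gaussian_kernel(3, 1.0)
smoothed = smooth_gradient(grad, kernel)
```

The smoothed gradient would then replace the raw gradient inside any gradient-based update, such as a sign step.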
Though CNNs have achieved state-of-the-art performance on various vision tasks, they are vulnerable to adversarial examples crafted by adding human-imperceptible perturbations to clean images. However, most of the existing adversarial attacks only achieve relatively low success rates under the challenging black-box setting, where the attackers have no knowledge of the model structure and parameters. To this end, we propose to improve the transferability of adversarial examples by creating diverse input patterns. Instead of only using the original images to generate adversarial examples, our method applies random transformations to the input images at each iteration. Extensive experiments on ImageNet show that the proposed attack method can generate adversarial examples that transfer much better to different networks than existing baselines. By evaluating our method against top defense solutions and official baselines from the NIPS 2017 adversarial competition, the enhanced attack reaches an average success rate of 73.0%, which outperforms the top-1 attack submission in the NIPS competition by a large margin of 6.6%. We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future. Code is available at https://github.com/cihangxie/DI-2-FGSM.
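A random input transformation of the kind applied at each iteration might look like the sketch below. The abstract does not name the transformation, so this assumes a random nearest-neighbour resize followed by random zero padding back to the original size, applied with probability `p`. The function names and the `min_size` parameter are hypothetical.

```python
import random

def resize_nearest(img, new_size):
    """Nearest-neighbour resize of a square image stored as a list of rows."""
    old = len(img)
    return [[img[r * old // new_size][c * old // new_size]
             for c in range(new_size)]
            for r in range(new_size)]

def diverse_input(img, min_size, p=0.5, rng=random):
    """With probability p, shrink to a random size and zero-pad at a random offset."""
    n = len(img)
    if rng.random() >= p:
        return [row[:] for row in img]       # leave the input unchanged
    s = rng.randint(min_size, n)             # random target size in [min_size, n]
    small = resize_nearest(img, s)
    top = rng.randint(0, n - s)              # random placement of the resized image
    left = rng.randint(0, n - s)
    out = [[0.0] * n for _ in range(n)]
    for r in range(s):
        for c in range(s):
            out[top + r][left + c] = small[r][c]
    return out

rng = random.Random(0)
image = [[1.0] * 8 for _ in range(8)]
transformed = diverse_input(image, min_size=5, p=1.0, rng=rng)
unchanged = diverse_input(image, min_size=5, p=0.0, rng=rng)
```

In an iterative attack, the gradient at each step would be computed on `diverse_input(x_adv, ...)` instead of on `x_adv` directly.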
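Deep neural networks are vulnerable to adversarial examples that are crafted by imposing imperceptible changes on the inputs. However, these adversarial examples are most successful in white-box settings, where the model and its parameters are available. Finding adversarial examples that are transferable to other models or developed in a black-box setting is significantly more difficult. In this paper, we propose Direction-Aggregated adversarial attacks that deliver transferable adversarial examples. Our method uses an aggregated direction during the attack process to avoid the generated adversarial examples overfitting to the white-box model. Extensive experiments on ImageNet show that our proposed method improves the transferability of adversarial examples significantly and outperforms state-of-the-art attacks, especially against adversarially robust models. The best averaged attack success rates of our proposed method reach 94.6% against three adversarially trained models and 94.8% against five defense methods. It also reveals that current defense approaches do not prevent transferable adversarial attacks.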
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern about their robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability to generate barely noticeable (to both humans and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs. Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to ... (* Pin-Yu Chen and Huan Zhang contribute equally to this work.)
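Estimating gradients from input and output queries alone can be illustrated with a symmetric difference quotient applied to one randomly chosen coordinate per step. This is a generic zeroth-order coordinate descent sketch under assumed settings (plain gradient steps, fixed `h` and `lr`); it is not the authors' exact update rule, and `black_box_loss` is a hypothetical stand-in for a model's confidence-based loss.

```python
import random

def zeroth_order_coordinate_descent(f, x, steps=400, h=1e-4, lr=0.1, rng=random):
    """Minimise f using only function evaluations, one coordinate at a time."""
    x = list(x)
    for _ in range(steps):
        i = rng.randrange(len(x))            # pick a random coordinate
        x_plus = list(x)
        x_plus[i] += h
        x_minus = list(x)
        x_minus[i] -= h
        # Symmetric difference quotient: estimate of the partial derivative.
        g_i = (f(x_plus) - f(x_minus)) / (2.0 * h)
        x[i] -= lr * g_i
    return x

def black_box_loss(x):
    # Toy objective with its minimum at (1, -2); only its values are queried.
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

x_opt = zeroth_order_coordinate_descent(
    black_box_loss, [0.0, 0.0], rng=random.Random(0))
```

Each step costs two queries, which is why techniques that reduce the number of coordinates to update matter for image-sized inputs.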
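Deep neural networks have been shown to be vulnerable to adversarial images. Conventional attacks strive for indistinguishable adversarial images with strictly restricted perturbations. Recently, researchers have moved to explore distinguishable yet non-suspicious adversarial images and demonstrated that color transformation attacks are effective. In this work, we propose Adversarial Color Filter (AdvCF), a novel color transformation attack that is optimized with gradient information in the parameter space of a simple color filter. In particular, our color filter space is explicitly specified so that we can provide a systematic analysis of model robustness against adversarial color transformations, from both the attack and defense perspectives. In contrast, existing color transformation attacks do not offer the opportunity for systematic analysis because they lack such an explicit space. We further conduct extensive comparisons between different color transformation attacks on both success rate and image acceptability through a user study. Additional results provide interesting new insights into model robustness against AdvCF in another three visual tasks. We also highlight the human-interpretability of AdvCF, which is promising in practical use scenarios, and show its superiority over the state-of-the-art human-interpretable color transformation attack on both image acceptability and efficiency.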
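Deep face recognition (FR) has achieved high accuracy on several challenging datasets and fosters successful real-world applications, even showing high robustness to illumination variation, which is usually regarded as a main threat to FR systems. However, in the real world, the illumination variation caused by diverse lighting conditions cannot be fully covered by limited face datasets. In this paper, we study the threat of lighting against FR from a new angle, i.e., adversarial attack, and identify a new task, i.e., adversarial relighting. Given a face image, adversarial relighting aims to produce a naturally relighted counterpart while fooling state-of-the-art deep FR methods. To this end, we first propose a physical-model-based adversarial relighting attack (ARA), denoted as the albedo-quotient-based adversarial relighting attack (AQ-ARA). It generates natural adversarial light under the guidance of the physical lighting model and the FR system, and synthesizes adversarially relighted face images. Moreover, we propose the auto-predictive adversarial relighting attack (AP-ARA) by training an adversarial relighting network (ARNet) to automatically predict the adversarial light in a one-step manner according to different input faces, allowing efficiency-sensitive applications. More importantly, we propose to transfer the above digital attacks to a physical ARA (Phy-ARA) through a precise relighting device, making the estimated adversarial lighting condition reproducible in the real world. We validate our methods on three state-of-the-art deep FR methods, i.e., FaceNet, ArcFace, and CosFace, on two public datasets. The extensive and insightful results demonstrate that our work can generate realistic adversarially relighted face images that fool FR easily, revealing the threat of specific light directions and strengths.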
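Deep neural networks are vulnerable to adversarial examples, which can fool deep models by adding subtle perturbations. Although existing attacks have achieved promising results, there is still a long way to go in generating transferable adversarial examples under the black-box setting. To this end, this paper proposes to improve the transferability of adversarial examples by applying dual-stage feature-level perturbations to an existing model to implicitly create a set of diverse models. These models are then fused by a longitudinal ensemble during the iterations. The proposed method is termed Dual-Stage Network Erosion (DSNE). We conduct comprehensive experiments on both non-residual and residual networks and obtain more transferable adversarial examples at a computational cost similar to that of state-of-the-art methods. In particular, for residual networks, the transferability of adversarial examples can be significantly improved by biasing the residual block information towards the skip connections. Our work provides new insights into the architectural vulnerability of neural networks and presents new challenges to their robustness.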
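Although many efforts have been made on attack and defense in the 2D image domain in recent years, few methods have explored the vulnerability of 3D models. Existing 3D attackers generally perform point-wise perturbation over point clouds, resulting in deformed structures or outliers that are easily perceived by humans. Moreover, their adversarial examples are generated under the white-box setting and frequently suffer from low success rates when transferred to attack remote black-box models. In this paper, we study 3D point cloud attacks from two new and challenging perspectives by proposing a novel Imperceptible Transfer Attack (ITA): 1) Imperceptibility: we constrain the perturbation direction of each point along the normal vector of its neighborhood surface, which yields examples with similar geometric properties and thus enhances imperceptibility. 2) Transferability: we develop an adversarial transformation model to generate the most harmful distortions and enforce the adversarial examples to resist it, improving their transferability to unknown black-box models. Further, we propose to train more robust black-box 3D models to defend against such ITA attacks by learning more discriminative point cloud representations. Extensive evaluations demonstrate that our ITA attack is more imperceptible and transferable than state-of-the-art attacks and validate the superiority of our defense strategy.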
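We introduce a three-stage pipeline, consisting of resized-diverse-inputs (RDIM), diversity-ensemble (DEM) and region fitting, whose stages work together to generate transferable adversarial examples. We first explore the internal relationship between existing attacks and propose RDIM, which is capable of exploiting this relationship. Then we propose DEM, the multi-scale version of RDIM, to generate multi-scale gradients. After the first two steps, we transform value fitting into region fitting across iterations. RDIM and region fitting do not require extra running time, and these three steps can be well integrated into other attacks. Our best attack fools six black-box defenses with a 93% success rate on average, which is higher than the state-of-the-art gradient-based attacks. Besides, we rethink existing attacks rather than simply stacking new methods on the old ones to get better performance. It is expected that our findings will serve as the beginning of exploring the internal relationship between attack methods. Code is available at https://github.com/278287847/DEM.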
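Adversarial attacks have achieved great success in fooling DNNs, and among them gradient-based algorithms have become one of the mainstream approaches. Based on the linearity hypothesis [12], under the $\ell_\infty$ constraint, applying the $\mathrm{sign}$ operation to the gradients is a good choice for generating perturbations. However, this operation has a side effect, since it leads to a direction bias between the real gradients and the perturbations. In other words, current methods contain a gap between the real gradients and the actual noise, which leads to biased and inefficient attacks. Therefore, based on the Taylor expansion, we analyze this bias theoretically and propose a correction of $\mathrm{sign}$, i.e., the Fast Gradient Non-sign Method (FGNM). Notably, FGNM is a general routine that can seamlessly replace the conventional $\mathrm{sign}$ operation in gradient-based attacks at negligible extra computational cost. Extensive experiments demonstrate the effectiveness of our method. Specifically, it outperforms them by up to 27.5%. Our anonymous code is publicly available: https://git.io/mm-fgnm.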
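The direction bias introduced by the sign operation can be measured directly as the cosine between a gradient and its sign vector. The sketch below also shows one possible non-sign step: the raw gradient rescaled so that its L2 norm matches that of the sign step. That scaling rule is an assumption made for illustration, since the abstract does not state the correction, and `non_sign_step` is a hypothetical name.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))

def cosine(a, b):
    return sum(p * q for p, q in zip(a, b)) / (l2_norm(a) * l2_norm(b))

def non_sign_step(grad, eps):
    """Assumed correction: keep the gradient direction, match the sign step's L2 norm."""
    zeta = l2_norm([sign(g) for g in grad]) / (l2_norm(grad) or 1.0)
    return [eps * zeta * g for g in grad]

grad = [3.0, 0.1, -0.1, 0.1]          # one dominant component, three small ones
eps = 0.03
sign_step = [eps * sign(g) for g in grad]
scaled_step = non_sign_step(grad, eps)

bias_of_sign = cosine(sign_step, grad)      # well below 1: direction is distorted
bias_of_scaled = cosine(scaled_step, grad)  # 1 up to rounding: direction preserved
```

Note that a rescaled step can exceed `eps` in individual components, so a real attack would still need to clip the result back into the $\ell_\infty$ ball.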
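Crowd counting, which has been widely adopted for estimating the number of people in safety-critical scenes, has been shown to be vulnerable to adversarial examples in the physical world (e.g., adversarial patches). Though harmful, adversarial examples are also valuable for evaluating and better understanding model robustness. However, existing adversarial example generation methods for crowd counting lack strong transferability among different black-box models, which limits their practicability for real-world systems. Motivated by the fact that attack transferability is positively correlated with model-invariant characteristics, this paper proposes the Perceptual Adversarial Patch (PAP) generation framework, which tailors the adversarial perturbations for crowd counting scenes using model-shared perceptual features. Specifically, we handcraft an adaptive crowd density weighting approach to capture the invariant scale perception features across various models and use density-guided attention to capture the model-shared position perception. Both are demonstrated to improve the attack transferability of our adversarial patches. Extensive experiments show that our PAP achieves state-of-the-art attack performance in both the digital and physical world and outperforms previous proposals by large margins (at most +685.7 MAE and +699.5 MSE). Besides, we empirically demonstrate that adversarial training with our PAP can benefit the performance of vanilla models in alleviating several practical challenges in crowd counting, including generalization across datasets (up to -376.0 MAE and -354.9 MSE) and robustness towards complex backgrounds (up to -10.3 MAE and -16.4 MSE).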
Deep learning-based 3D object detectors have made significant progress in recent years and have been deployed in a wide range of applications. It is crucial to understand the robustness of detectors against adversarial attacks when employing detectors in security-critical applications. In this paper, we make the first attempt to conduct a thorough evaluation and analysis of the robustness of 3D detectors under adversarial attacks. Specifically, we first extend three kinds of adversarial attacks to the 3D object detection task to benchmark the robustness of state-of-the-art 3D object detectors against attacks on KITTI and Waymo datasets, subsequently followed by the analysis of the relationship between robustness and properties of detectors. Then, we explore the transferability of cross-model, cross-task, and cross-data attacks. We finally conduct comprehensive experiments of defense for 3D detectors, demonstrating that simple transformations like flipping are of little help in improving robustness when the strategy of transformation imposed on input point cloud data is exposed to attackers. Our findings will facilitate investigations in understanding and defending the adversarial attacks against 3D object detectors to advance this field.
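Although deep-learning-based video recognition models have achieved remarkable success, they are vulnerable to adversarial examples generated by adding human-imperceptible perturbations to clean video samples. As indicated in recent studies, adversarial examples are transferable, which makes black-box attacks feasible in real-world applications. Nevertheless, most existing adversarial attack methods have poor transferability when attacking other video models, and transfer-based attacks on video models are still unexplored. To this end, we propose to boost the transferability of video adversarial examples for black-box attacks on video recognition models. Through extensive analysis, we discover that different video recognition models rely on different discriminative temporal patterns, which leads to the poor transferability of video adversarial examples. This motivates us to introduce a temporal translation attack method, which optimizes the adversarial perturbations over a set of temporally translated video clips. By generating adversarial examples over translated videos, the resulting adversarial examples are less sensitive to the temporal patterns existing in the white-box model being attacked and thus can be better transferred. Extensive experiments on the Kinetics-400 dataset and the UCF-101 dataset demonstrate that our method can significantly boost the transferability of video adversarial examples. For transfer-based attacks against video recognition models, it achieves an average attack success rate of 61.56% on Kinetics-400 and 48.60% on UCF-101. Code is available at https://github.com/zhipeng-wei/tt.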
Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. The addition of calculated small distortion to images, for instance, can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Interpretable Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity with $O(1/\sqrt{T})$ convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
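The Frank-Wolfe (conditional gradient) update mentioned above can be sketched on a toy problem. The example minimises a simple quadratic over an L-infinity ball of radius `eps`, using the standard step size 2/(t+2). It covers only the bounded-magnitude constraint; the sparsity constraint and the attack loss of SAIF are not reproduced, and all names are hypothetical.

```python
def sign(v):
    return (v > 0) - (v < 0)

def frank_wolfe_linf(grad_fn, dim, eps, steps=200):
    """Frank-Wolfe minimisation over the box {delta : |delta_i| <= eps}."""
    delta = [0.0] * dim
    for t in range(steps):
        grad = grad_fn(delta)
        # Linear minimisation oracle for the L-infinity ball: a vertex of the box.
        vertex = [-eps * sign(g) for g in grad]
        gamma = 2.0 / (t + 2.0)
        # Convex combination keeps the iterate feasible without any projection.
        delta = [d + gamma * (s - d) for d, s in zip(delta, vertex)]
    return delta

# Toy objective f(delta) = 0.5 * ||delta - target||^2 (illustrative only).
target = [1.0, -0.02, 0.5]

def toy_grad(delta):
    return [d - t for d, t in zip(delta, target)]

eps = 0.1
delta_opt = frank_wolfe_linf(toy_grad, dim=3, eps=eps)
```

The projection-free update is the main design point: every iterate is a convex combination of feasible points, so the perturbation never leaves the constraint set.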
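Black-box adversarial attacks have attracted impressive attention for their practical use in the field of deep learning security; meanwhile, they are very challenging because there is no access to the network architecture or internal weights of the target model. Based on the hypothesis that an example which remains adversarial for multiple models is more likely to transfer its attack capability to other models, ensemble-based adversarial attack methods are efficient for black-box attacks. However, the ways of performing an ensemble attack have been rather less investigated, and existing ensemble attacks simply fuse the outputs of all the models evenly. In this work, we treat the iterative ensemble attack as a stochastic gradient descent optimization process, in which the variance of the gradients on different models may lead to poor local optima. To this end, we propose a novel attack method called the stochastic variance reduced ensemble (SVRE) attack, which can reduce the gradient variance of the ensemble models and take full advantage of the ensemble attack. Empirical results on the standard ImageNet dataset demonstrate that the proposed method can boost adversarial transferability and significantly outperforms existing ensemble attacks.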
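Treating the ensemble attack as stochastic optimisation suggests an SVRG-style estimator: the gradient of one randomly chosen model at the inner iterate, corrected by the difference between that model's gradient and the full ensemble gradient at the outer iterate. The sketch below is a simplified reading of that idea under assumed settings (no momentum decay, equal inner and outer step sizes, toy models given as gradient functions); it is not the authors' algorithm, and all names are hypothetical.

```python
import random

def sign(v):
    return (v > 0) - (v < 0)

def clip(x_new, x_ref, eps):
    return [min(max(v, r - eps), r + eps) for v, r in zip(x_new, x_ref)]

def ensemble_grad(grad_fns, x):
    grads = [f(x) for f in grad_fns]
    return [sum(col) / len(grads) for col in zip(*grads)]

def vr_gradient(grad_fns, k, x_inner, x_outer, g_ens):
    """Variance-reduced gradient: g_k(x_inner) - (g_k(x_outer) - g_ens)."""
    g_in, g_out = grad_fns[k](x_inner), grad_fns[k](x_outer)
    return [gi - (go - ge) for gi, go, ge in zip(g_in, g_out, g_ens)]

def svre_like_attack(x, grad_fns, eps, outer=5, inner=8, rng=random):
    alpha = eps / outer
    x_adv = list(x)
    for _ in range(outer):
        g_ens = ensemble_grad(grad_fns, x_adv)   # full ensemble gradient
        x_in, acc = list(x_adv), [0.0] * len(x)
        for _ in range(inner):
            k = rng.randrange(len(grad_fns))     # sample one model
            g = vr_gradient(grad_fns, k, x_in, x_adv, g_ens)
            acc = [a + gi for a, gi in zip(acc, g)]
            x_in = clip([v + alpha * sign(a) for v, a in zip(x_in, acc)], x, eps)
        x_adv = clip([v + alpha * sign(a) for v, a in zip(x_adv, acc)], x, eps)
    return x_adv

# Two toy "models", each represented only by the gradient of its loss.
def model_a(x):
    return [1.0 + 0.1 * x[0], -1.0]

def model_b(x):
    return [3.0, -2.0 + 0.2 * x[1]]

models = [model_a, model_b]
x0 = [0.0, 0.0]
g0 = ensemble_grad(models, x0)
x_adv = svre_like_attack(x0, models, eps=0.1, rng=random.Random(0))
```

When the inner and outer iterates coincide, the estimator equals the full ensemble gradient no matter which model is sampled, which is the property that keeps its variance low near the outer point.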