K-Nearest Neighbor (kNN)-based deep learning methods have been applied to many applications due to their simplicity and geometric interpretability. However, the robustness of kNN-based classification models has not been thoroughly explored, and kNN attack strategies remain underdeveloped. In this paper, we propose an Adversarial Soft kNN (ASK) loss for designing more effective kNN attack strategies and for developing better defenses against them. Our ASK loss approach has two advantages. First, the ASK loss can better approximate the kNN's probability of classification error than objectives proposed in previous works. Second, the ASK loss is interpretable: it preserves the mutual information between the perturbed input and the in-class reference data. We use the ASK loss to generate a novel attack method called the ASK-Attack (ASK-Atk), which shows superior attack efficiency and accuracy degradation relative to previous kNN attacks. Based on ASK-Atk, we then derive an ASK-Defense (ASK-Def) method that optimizes the worst-case training loss induced by ASK-Atk. Experiments on CIFAR-10 (ImageNet) show that (i) ASK-Atk achieves a $\geq 13\%$ ($\geq 13\%$) improvement in attack success rate over previous kNN attacks, and (ii) ASK-Def outperforms the conventional adversarial training method by $\geq 6.9\%$ ($\geq 3.5\%$) in terms of robustness improvement.
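The abstract does not spell out the ASK loss itself; a minimal sketch of a soft-kNN attack loss in this spirit (the softmax-over-distances weighting, the temperature `tau`, and the feature-space setup are illustrative assumptions, not the paper's exact formulation) could look like the following.

```python
import torch
import torch.nn.functional as F

def soft_knn_attack_loss(z_adv, ref_feats, ref_labels, y_true, tau=0.1):
    """z_adv: (d,) feature of the perturbed input.
    ref_feats: (N, d) reference features; ref_labels: (N,) their labels."""
    dists = torch.cdist(z_adv.unsqueeze(0), ref_feats).squeeze(0)  # (N,) distances
    weights = F.softmax(-dists / tau, dim=0)                       # soft neighbor weights
    p_true = weights[ref_labels == y_true].sum()                   # soft vote for the true class
    # The attacker minimizes this quantity, pushing neighbor mass toward other classes.
    return torch.log(p_true + 1e-12)
```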
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS, which illustrate a diverse set of defense strategies, can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense side, there have been intensive efforts to improve both empirical and provable robustness against evasion attacks; however, provable robustness against backdoor attacks remains largely unexplored. In this paper, we focus on certifying the robustness of machine learning models against general threat models, especially backdoor attacks. We first provide a unified framework via randomized smoothing techniques and show how it can be instantiated to certify robustness against both evasion and backdoor attacks. We then propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks. We derive the robustness bound for machine learning models trained with RAB and prove that the bound is tight. In addition, we show that robust smoothed models can be trained efficiently for simple models such as k-nearest-neighbor classifiers, and we propose an exact smooth-training algorithm that eliminates the need to sample from a noise distribution for such models. Empirically, we conduct comprehensive experiments on different machine learning (ML) models, such as DNNs, differentially private DNNs, and k-NN models, on the MNIST, CIFAR-10, and ImageNet datasets, and provide the first benchmark for certified robustness against backdoor attacks. In addition, we evaluate k-NN models on the spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. The comprehensive evaluation on diverse models and datasets sheds light on further robust learning strategies against general training-time attacks.
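As a hedged illustration of the randomized-smoothing primitive that such certification builds on (test-time input smoothing only; RAB's smoothing over poisoned training sets and the certification bound itself are not shown, and `sigma` and the sample count are illustrative choices):

```python
import numpy as np

def smoothed_predict(f, x, sigma=0.5, n_samples=1000, num_classes=10, rng=None):
    """f: base classifier mapping an input array to a class id.
    Returns the majority class of f under Gaussian input noise."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)   # smooth over input noise
        counts[f(noisy)] += 1
    return int(counts.argmax()), counts
```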
Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models. However, current studies focus on supervised learning tasks, relying on ground-truth data labels, targeted objectives, or supervision from a trained classifier. In this paper, we propose a framework for generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation. Our framework exploits a mutual information neural estimator as an information-theoretic similarity measure to generate adversarial examples without supervision. We propose a new MinMax algorithm with provable convergence guarantees for the efficient generation of unsupervised adversarial examples. Our framework can also be extended to supervised adversarial examples. When unsupervised adversarial examples are used as a simple plug-in data augmentation tool for model retraining, significant improvements are consistently observed across different unsupervised tasks and datasets, including data reconstruction, representation learning, and contrastive learning. Our results show novel methods and considerable advantages in studying and improving unsupervised machine learning via adversarial examples.
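A minimal sketch of how such an unsupervised attack could be instantiated, assuming a trained MINE-style similarity estimator `mi_estimate` is available; the PGD-style sign update below is an illustrative stand-in, not the paper's MinMax algorithm:

```python
import torch

def unsupervised_attack(mi_estimate, x, eps=8/255, step=2/255, iters=20):
    """mi_estimate(x, x_adv) returns a scalar similarity score (higher = more similar)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        score = mi_estimate(x, x + delta)
        grad, = torch.autograd.grad(score, delta)
        with torch.no_grad():
            delta -= step * grad.sign()                    # descend on the similarity score
            delta.clamp_(-eps, eps)                        # stay inside the L-inf budget
            delta.copy_((x + delta).clamp(0, 1) - x)       # keep x + delta a valid image
    return (x + delta).detach()
```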
Adversarial training (AT) has shown excellent performance in defending against adversarial examples. Recent studies show that examples are not equally important to the final robustness of the model during AT; that is, so-called hard examples that can be attacked easily exhibit greater influence on the final robustness than robust examples. Therefore, guaranteeing the robustness of hard examples is crucial for improving the final robustness of the model. However, defining effective heuristics to search for hard examples remains difficult. In this paper, inspired by the information bottleneck (IB) principle, we find that an example with high mutual information between itself and its associated latent representation is more likely to be attacked. Based on this observation, we propose a novel and effective adversarial training method (InfoAT). InfoAT is encouraged to find examples with high mutual information and exploit them efficiently to improve the final robustness of the model. Experimental results show that InfoAT achieves the best robustness across different datasets and models in comparison with several state-of-the-art methods.
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs. Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to efficiently attack black-box models.
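A hedged sketch of the core zeroth-order primitive described above: estimating coordinate-wise gradients of the attack loss with symmetric finite differences, using only query access to the model's scores. The coordinate selection, dimension reduction, hierarchical attack, and importance sampling components are not shown, and `h` and the number of coordinates are illustrative.

```python
import numpy as np

def zoo_coordinate_gradients(loss_fn, x, n_coords=128, h=1e-4, rng=None):
    """loss_fn: black-box attack loss; takes an image array, returns a scalar.
    Returns a sparse gradient estimate over randomly chosen coordinates."""
    rng = rng or np.random.default_rng(0)
    flat = x.ravel()
    grad = np.zeros_like(flat)
    coords = rng.choice(flat.size, size=n_coords, replace=False)
    for i in coords:
        e = np.zeros_like(flat)
        e[i] = h
        # Symmetric finite difference along coordinate i.
        grad[i] = (loss_fn((flat + e).reshape(x.shape))
                   - loss_fn((flat - e).reshape(x.shape))) / (2 * h)
    return grad.reshape(x.shape)
```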
It is sometimes necessary to improve the performance of certain special classes, or to particularly protect them from attacks in adversarial learning. This paper proposes a framework that combines cost-sensitive classification and adversarial learning to train a model that can distinguish between protected and unprotected classes, so that the protected classes are less vulnerable to adversarial examples. Within this framework, we find an interesting phenomenon during the training of deep neural networks concerning the absolute values of most parameters in the convolutional layers, which we call the Min-Max property. Based on this Min-Max property, which is formulated and analyzed from the perspective of random distributions, we further build a new defense model against adversarial examples to improve adversarial robustness. An advantage of the constructed model is that it performs better than the standard model and can be combined with adversarial training to further improve performance. It is experimentally confirmed that, in terms of the average accuracy over all classes, our model is almost as good as existing models when no attack occurs, and better than existing models when attacks occur. In particular, with respect to the accuracy of the protected classes, the proposed model is much better than existing models when attacks occur.
Adversarial transferability is an intriguing property: adversarial perturbations crafted against one model are also effective against other models, even when those models come from different model families or training processes. To better protect ML systems against adversarial attacks, several questions arise: What are the sufficient conditions for adversarial transferability, and how can it be bounded? Is there a way to reduce adversarial transferability in order to improve the robustness of an ensemble of ML models? To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; we then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness. Our theoretical analysis shows that promoting orthogonality between the gradients of base models alone is not sufficient to ensure low transferability; at the same time, model smoothness is an important factor in controlling transferability. We also provide lower and upper bounds on adversarial transferability under certain conditions. Inspired by our theoretical analysis, we propose an effective Transferability Reduced Smoothing (TRS) ensemble training strategy to train a robust ensemble with low transferability by enforcing both gradient orthogonality and model smoothness between base models. We conduct extensive experiments on TRS and compare it with 6 state-of-the-art ensemble baselines against 8 white-box attacks on different datasets, showing that the proposed TRS significantly outperforms all baselines.
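A hedged sketch of the two ingredients named above for an ensemble of base models: (i) penalizing cosine similarity between base-model input gradients (gradient orthogonality) and (ii) penalizing input-gradient magnitude as a crude smoothness proxy. TRS's actual smoothness term and the weighting coefficients are not specified in the abstract; `lam1` and `lam2` are assumptions.

```python
import torch
import torch.nn.functional as F

def trs_style_regularizer(models, x, y, lam1=1.0, lam2=0.1):
    """models: list of base models in the ensemble; x, y: a training batch."""
    x = x.clone().requires_grad_(True)
    grads = []
    for m in models:
        loss = F.cross_entropy(m(x), y)
        g, = torch.autograd.grad(loss, x, create_graph=True)
        grads.append(g.flatten(1))                      # (B, D) per-example gradients
    sim, smooth = 0.0, 0.0
    for i in range(len(grads)):
        smooth = smooth + grads[i].norm(dim=1).mean()   # smoothness proxy
        for j in range(i + 1, len(grads)):
            # Pairwise gradient alignment to be driven toward orthogonality.
            sim = sim + F.cosine_similarity(grads[i], grads[j], dim=1).abs().mean()
    return lam1 * sim + lam2 * smooth
```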
Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially deep neural networks (DNNs), are vulnerable to adversarial examples, i.e., examples that are carefully crafted to fool a well-trained classification model while being indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was first identified by Biggio et al. (2013) and Szegedy et al. (2014), much work has been done in this field, including the development of attack methods to generate adversarial examples and the construction of defense techniques to guard against such examples. This paper aims to introduce this topic and its latest developments to the statistics community, primarily focusing on the generation and defense of adversarial examples. The computational code (in Python and R) used in the numerical experiments is publicly available for readers to explore the surveyed methods. It is the hope of the authors that this paper will encourage more statisticians to work on this important and exciting field of generating and defending against adversarial examples.
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
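A hedged sketch of the detection rule described above: squeeze the input (bit-depth reduction and spatial smoothing), compare the model's predictions on the original and squeezed inputs, and flag a large disagreement. The L1 score and the threshold value are illustrative; the paper selects the threshold from held-out data.

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits=4):
    """x: image array with values in [0, 1]."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def is_adversarial(predict, x, threshold=1.0):
    """predict: returns a softmax probability vector for an H x W x C image in [0, 1]."""
    p = predict(x)
    p_bits = predict(reduce_bit_depth(x))
    p_smooth = predict(median_filter(x, size=(2, 2, 1)))   # 2x2 spatial median smoothing
    score = max(np.abs(p - p_bits).sum(), np.abs(p - p_smooth).sum())
    return score > threshold
```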
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples crafted with imperceptible perturbations, i.e., tiny changes to the input image can induce misclassification, threatening the reliability of deployed deep-learning-based systems. Adversarial training (AT) is often adopted to improve the robustness of DNNs by training on a mixture of corrupted and clean data. However, most AT-based methods are ineffective in dealing with transferred adversarial examples, which are generated to fool a variety of defense models, and thus cannot satisfy the generalization requirements raised by real-world scenarios. Moreover, adversarially trained defense models generally cannot produce interpretable predictions for perturbed inputs, while domain experts from different fields require a highly interpretable robust model in order to understand the behavior of DNNs. In this work, we propose an approach based on Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which promotes linearized robustness via Jacobian normalization and also regularizes perturbation-based saliency maps to imitate the model's interpretable predictions. In this way, we achieve both improved defense capability and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers robustness against transferred adversarial attacks, and we also show that the predictions of the neural network are easy to interpret.
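As a hedged illustration of the kind of Jacobian-norm penalty such a method builds on: the squared Frobenius norm of the input-output Jacobian is estimated with a single random projection, and the penalty is added to the training loss. The selective input gradient regularization term of J-SIGR is not shown, and the weighting `lam` is an assumption.

```python
import torch

def jacobian_norm_penalty(model, x):
    """Single-projection estimate of ||d logits / d x||_F^2, averaged over the batch."""
    x = x.clone().requires_grad_(True)
    logits = model(x)                                   # (B, C)
    v = torch.randn_like(logits)
    v = v / v.norm(dim=1, keepdim=True)                 # random unit directions in output space
    grad, = torch.autograd.grad((logits * v).sum(), x, create_graph=True)
    return grad.pow(2).sum(dim=tuple(range(1, grad.dim()))).mean()

# Training objective (sketch): loss = cross_entropy(model(x), y) + lam * jacobian_norm_penalty(model, x)
```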
Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.
Intelligent Internet of Things (IoT) systems based on deep neural networks (DNNs) have been widely deployed in the real world. However, DNNs are found to be vulnerable to adversarial examples, which raises concerns about the reliability and security of intelligent IoT systems. Testing and evaluating the robustness of IoT systems therefore becomes necessary and essential. Various attacks and strategies have been proposed recently, but the efficiency problem remains unsolved: existing methods are either computationally extensive or time-consuming, which is not applicable in practice. In this paper, we propose a new framework called Attack-Inspired GAN (AI-GAN) to generate adversarial examples conditionally. Once trained, it can generate adversarial perturbations efficiently given input images and target classes. We apply AI-GAN on different datasets in white-box and black-box settings, and against target models protected by state-of-the-art defenses. Through extensive experiments, AI-GAN achieves high attack success rates, outperforming existing methods, and significantly reduces generation time. Moreover, for the first time, AI-GAN successfully scales to complex datasets, e.g., CIFAR-100 and ImageNet, with success rates of about $90\%$ across all classes.
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to human but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples becomes one of the major risks for applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x' that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.
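A hedged sketch of the optimization template used by attacks of this kind, shown for the L2 metric: minimize $\|\delta\|_2^2 + c \cdot f(x+\delta)$, where $f$ is non-positive only once the input is classified as the target t. The change of variables and the binary search over c used in the paper are omitted; `c`, `kappa`, and the optimizer settings are illustrative.

```python
import torch

def cw_l2_attack(logits_fn, x, target, c=1.0, kappa=0.0, steps=200, lr=0.01):
    """logits_fn: maps a single image batch (1, ...) to logits (1, num_classes)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        z = logits_fn((x + delta).clamp(0, 1)).squeeze(0)          # (num_classes,)
        mask = torch.ones_like(z, dtype=torch.bool)
        mask[target] = False
        margin = z[mask].max() - z[target]                         # > 0 until classified as `target`
        loss = delta.pow(2).sum() + c * torch.clamp(margin, min=-kappa)
        opt.zero_grad(); loss.backward(); opt.step()
    return (x + delta).clamp(0, 1).detach()
```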
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining noncertified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.
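One of the attack techniques introduced in this line of work is BPDA (Backward Pass Differentiable Approximation). A hedged sketch, assuming the defense is a shape-preserving, non-differentiable input preprocessor `preprocess`: run the preprocessor normally on the forward pass, but treat it as the identity on the backward pass so gradients still flow to the attacker.

```python
import torch

class BPDAIdentity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, preprocess):
        # The non-differentiable defense runs normally on the forward pass.
        return preprocess(x.detach())

    @staticmethod
    def backward(ctx, grad_output):
        # Backward pass approximates preprocess(x) ~= x, i.e. identity gradient.
        return grad_output, None

# Usage inside a PGD loop (sketch):
#   logits = model(BPDAIdentity.apply(x_adv, defense_preprocess))
#   loss = torch.nn.functional.cross_entropy(logits, y); loss.backward()
```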
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples: inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models.
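A hedged sketch of the robust-optimization view described above: minimize, over the model parameters, the expected maximum (over norm-bounded perturbations) of the training loss, with the inner maximum approximated by projected gradient descent (PGD). Step sizes and iteration counts below are illustrative.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=10):
    """Approximate inner maximization over the L-inf ball of radius eps."""
    delta = (torch.rand_like(x) * 2 - 1) * eps        # random start in the ball
    delta.requires_grad_(True)
    for _ in range(iters):
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step * grad.sign()               # ascent on the training loss
            delta.clamp_(-eps, eps)
    return (x + delta).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y):
    x_adv = pgd_attack(model, x, y)                   # inner max (approximate)
    loss = F.cross_entropy(model(x_adv), y)           # outer min over parameters
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```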
The min-max training principle, which minimizes the maximum adversarial loss and is also known as adversarial training (AT), has proven to be a state-of-the-art approach for improving adversarial robustness. Nevertheless, min-max optimization beyond the purpose of AT has not been rigorously explored in the adversarial context. In this paper, we show how a general framework of min-max optimization over multiple domains can be leveraged to advance the design of different types of adversarial attacks. In particular, given a set of risk sources, minimizing the worst-case attack loss can be reformulated as a min-max problem by introducing domain weights that are maximized over the probability simplex of the domain set. We showcase this unified framework in three attack generation problems: attacking model ensembles, devising universal perturbations under multiple inputs, and crafting attacks resilient to data transformations. Extensive experiments demonstrate that our approach leads to substantial robustness improvements over existing heuristic strategies, as well as over state-of-the-art defense methods trained to be robust against multiple perturbation types. Furthermore, we find that the self-adjusted domain weights learned from our min-max framework can provide a holistic tool to explain the difficulty level of attacks across domains. Code is available at https://github.com/wangjksjtu/minmaxsod.
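A hedged sketch of the min-max reformulation described above: a single perturbation `delta` is optimized against the worst-case mixture of K attack losses (e.g., one per ensemble member, input, or transformation). The inner maximization over domain weights on the probability simplex is done here with a multiplicative (exponentiated-gradient) update; the step sizes and the update rule are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def minmax_attack(attack_losses, x, eps=8/255, step=2/255, iters=50, eta=0.1):
    """attack_losses: K callables; each maps a perturbed input to a scalar the
    attacker wants to drive down (e.g., a per-model margin loss)."""
    K = len(attack_losses)
    w = torch.full((K,), 1.0 / K)                       # domain weights on the simplex
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        vals = torch.stack([f(x + delta) for f in attack_losses])   # (K,)
        obj = (w * vals).sum()
        grad, = torch.autograd.grad(obj, delta)
        with torch.no_grad():
            delta -= step * grad.sign()                 # outer minimization over delta
            delta.clamp_(-eps, eps)
            w = w * torch.exp(eta * vals.detach())      # inner maximization: up-weight the
            w = w / w.sum()                             # hardest (highest-loss) domain
    return (x + delta).detach(), w
```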
Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently a set of certified defenses have been introduced, which provide guarantees of robustness to norm-bounded attacks. However these defenses either do not scale to large datasets or are limited in the types of models they can support. This paper presents the first certified defense that both scales to large networks and datasets (such as Google's Inception network for ImageNet) and applies broadly to arbitrary model types. Our defense, called PixelDP, is based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism, that provides a rigorous, generic, and flexible foundation for defense.
Adversarial robustness has become a central goal in deep learning, both in theory and in practice. However, successful methods for improving adversarial robustness (such as adversarial training) greatly hurt generalization performance on unperturbed data. This could have a major impact on how adversarial robustness affects real-world systems (i.e., many may opt to forgo robustness if it comes at the cost of accuracy on unperturbed data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation-based training methods within the adversarial training framework. On CIFAR-10, adversarial training increases the standard test error (when there is no adversary) from 4.43% to 12.32%, whereas with our Interpolated Adversarial Training we retain adversarial robustness while achieving a standard test error of only 6.45%. With our technique, the relative increase in the standard error of the robust model is reduced from 178.1% to only 45.5%. Furthermore, we provide a mathematical analysis of Interpolated Adversarial Training to confirm its efficiency and demonstrate its advantages in terms of robustness and generalization.
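A hedged sketch of what one training step might look like: adversarial examples are generated as in standard adversarial training, and the model is then trained on mixup-style interpolations of them (the paper also keeps a clean-data branch, which is not shown). Treating the interpolation method as mixup, and the Beta parameter `alpha`, are assumptions for illustration.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def interpolated_at_step(model, optimizer, x_adv, y, num_classes, alpha=1.0):
    """x_adv: adversarial examples from the usual inner attack (e.g., PGD); y: labels."""
    lam = Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x_adv.size(0), device=x_adv.device)
    x_mix = lam * x_adv + (1 - lam) * x_adv[idx]               # interpolate inputs
    y_onehot = F.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[idx]         # interpolate labels
    loss = -(y_mix * F.log_softmax(model(x_mix), dim=1)).sum(dim=1).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```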