对机器学习模型的逃避攻击通常通过迭代探测固定目标模型成功,从而曾经成功的攻击将反复成功。应对这种威胁的一种有希望的方法是使模型成为对抗输入的行动目标。为此,我们介绍了Morphence-2.0,这是一个由分布外(OOD)检测提供动力的可扩展移动目标防御(MTD),以防止对抗性例子。通过定期移动模型的决策功能,Morphence-2.0使重复或相关攻击成功的挑战变得极大。 Morphence-2.0以基本模型生成的模型池以引入足够随机性的方式对预测查询进行响应。通过OOD检测,Morphence-2.0配备了调度方法,该方法将对抗性示例分配给了强大的决策功能,并将良性样本分配给了未防御的准确模型。为了确保重复或相关的攻击失败,已部署的模型池在达到查询预算后​​自动到期,并且模型池被提前生成的新模型池无缝替换。我们在两个基准图像分类数据集(MNIST和CIFAR10)上评估Morphence-2.0,以4个参考攻击(3个白框和1个黑色框)。 Morphence-2.0始终优于先前的防御能力,同时保留清洁数据的准确性和降低攻击转移性。我们还表明,当由OOD检测提供动力时,Morphence-2.0能够精确地对模型的决策功能进行基于输入的运动,从而导致对对抗和良性查询的预测准确性更高。
translated by 谷歌翻译
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives.This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
translated by 谷歌翻译
Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We further introduce Ensemble Adversarial Training, a technique that augments training data with perturbations transferred from other models. On ImageNet, Ensemble Adversarial Training yields models with stronger robustness to blackbox attacks. In particular, our most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks (Kurakin et al., 2017c). However, subsequent work found that more elaborate black-box attacks could significantly enhance transferability and reduce the accuracy of our models.
translated by 谷歌翻译
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS-and which illustrate a diverse set of defense strategies-can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result-showing that a defense was ineffective-this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
translated by 谷歌翻译
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples-inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. 1
translated by 谷歌翻译
时间序列数据在许多现实世界中(例如,移动健康)和深神经网络(DNNS)中产生,在解决它们方面已取得了巨大的成功。尽管他们成功了,但对他们对对抗性攻击的稳健性知之甚少。在本文中,我们提出了一个通过统计特征(TSA-STAT)}称为时间序列攻击的新型对抗框架}。为了解决时间序列域的独特挑战,TSA-STAT对时间序列数据的统计特征采取限制来构建对抗性示例。优化的多项式转换用于创建比基于加性扰动的攻击(就成功欺骗DNN而言)更有效的攻击。我们还提供有关构建对抗性示例的统计功能规范的认证界限。我们对各种现实世界基准数据集的实验表明,TSA-STAT在欺骗DNN的时间序列域和改善其稳健性方面的有效性。 TSA-STAT算法的源代码可在https://github.com/tahabelkhouja/time-series-series-attacks-via-statity-features上获得
translated by 谷歌翻译
发言人识别系统(SRSS)最近被证明容易受到对抗攻击的影响,从而引发了重大的安全问题。在这项工作中,我们系统地研究了基于确保SRSS的基于对抗性训练的防御。根据SRSS的特征,我们提出了22种不同的转换,并使用扬声器识别的7种最新有前途的对抗攻击(4个白盒和3个Black-Box)对其进行了彻底评估。仔细考虑了国防评估中的最佳实践,我们分析了转换的强度以承受适应性攻击。我们还评估并理解它们与对抗训练相结合的自适应攻击的有效性。我们的研究提供了许多有用的见解和发现,其中许多与图像和语音识别域中的结论是新的或不一致的,例如,可变和恒定的比特率语音压缩具有不同的性能,并且某些不可差的转换仍然有效地抗衡。当前有希望的逃避技术通常在图像域中很好地工作。我们证明,与完整的白色盒子设置中的唯一对抗性训练相比,提出的新型功能级转换与对抗训练相比是相当有效的,例如,将准确性提高了13.62%,而攻击成本则达到了两个数量级,而其他攻击成本则增加了。转型不一定会提高整体防御能力。这项工作进一步阐明了该领域的研究方向。我们还发布了我们的评估平台SpeakerGuard,以促进进一步的研究。
translated by 谷歌翻译
已知深度学习系统容易受到对抗例子的影响。特别是,基于查询的黑框攻击不需要深入学习模型的知识,而可以通过提交查询和检查收益来计算网络上的对抗示例。最近的工作在很大程度上提高了这些攻击的效率,证明了它们在当今的ML-AS-A-Service平台上的实用性。我们提出了Blacklight,这是针对基于查询的黑盒对抗攻击的新防御。推动我们设计的基本见解是,为了计算对抗性示例,这些攻击在网络上进行了迭代优化,从而在输入空间中产生了非常相似的图像查询。 Blacklight使用在概率内容指纹上运行的有效相似性引擎来检测高度相似的查询来检测基于查询的黑盒攻击。我们根据各种模型和图像分类任务对八次最先进的攻击进行评估。 Blacklight通常只有几次查询后,都可以识别所有这些。通过拒绝所有检测到的查询,即使攻击者在帐户禁令或查询拒绝之后持续提交查询,Blacklight也可以防止任何攻击完成。 Blacklight在几个强大的对策中也很强大,包括最佳的黑盒攻击,该攻击近似于效率的白色框攻击。最后,我们说明了黑光如何推广到其他域,例如文本分类。
translated by 谷歌翻译
Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.
translated by 谷歌翻译
基于深度神经网络(DNN)的智能信息(IOT)系统已被广泛部署在现实世界中。然而,发现DNNS易受对抗性示例的影响,这提高了人们对智能物联网系统的可靠性和安全性的担忧。测试和评估IOT系统的稳健性成为必要和必要。最近已经提出了各种攻击和策略,但效率问题仍未纠正。现有方法是计算地广泛或耗时,这在实践中不适用。在本文中,我们提出了一种称为攻击启发GaN(AI-GaN)的新框架,在有条件地产生对抗性实例。曾经接受过培训,可以有效地给予对抗扰动的输入图像和目标类。我们在白盒设置的不同数据集中应用AI-GaN,黑匣子设置和由最先进的防御保护的目标模型。通过广泛的实验,AI-GaN实现了高攻击成功率,优于现有方法,并显着降低了生成时间。此外,首次,AI-GaN成功地缩放到复杂的数据集。 Cifar-100和Imagenet,所有课程中的成功率约为90美元。
translated by 谷歌翻译
作为研究界,我们仍然缺乏对对抗性稳健性的进展的系统理解,这通常使得难以识别训练强大模型中最有前途的想法。基准稳健性的关键挑战是,其评估往往是出错的导致鲁棒性高估。我们的目标是建立对抗性稳健性的标准化基准,尽可能准确地反映出考虑在合理的计算预算范围内所考虑的模型的稳健性。为此,我们首先考虑图像分类任务并在允许的型号上引入限制(可能在将来宽松)。我们评估了与AutoAtrack的对抗鲁棒性,白和黑箱攻击的集合,最近在大规模研究中显示,与原始出版物相比,改善了几乎所有稳健性评估。为防止对自动攻击进行新防御的过度适应,我们欢迎基于自适应攻击的外部评估,特别是在自动攻击稳健性潜在高估的地方。我们的排行榜,托管在https://robustbench.github.io/,包含120多个模型的评估,并旨在反映在$ \ ell_ \ infty $的一套明确的任务上的图像分类中的当前状态 - 和$ \ ell_2 $ -Threat模型和共同腐败,未来可能的扩展。此外,我们开源源是图书馆https://github.com/robustbench/robustbench,可以提供对80多个强大模型的统一访问,以方便他们的下游应用程序。最后,根据收集的模型,我们分析了稳健性对分布换档,校准,分配检测,公平性,隐私泄漏,平滑度和可转移性的影响。
translated by 谷歌翻译
已知深度神经网络(DNN)容易受到用不可察觉的扰动制作的对抗性示例的影响,即,输入图像的微小变化会引起错误的分类,从而威胁着基于深度学习的部署系统的可靠性。经常采用对抗训练(AT)来通过训练损坏和干净的数据的混合物来提高DNN的鲁棒性。但是,大多数基于AT的方法在处理\ textit {转移的对抗示例}方面是无效的,这些方法是生成以欺骗各种防御模型的生成的,因此无法满足现实情况下提出的概括要求。此外,对抗性训练一般的国防模型不能对具有扰动的输入产生可解释的预测,而不同的领域专家则需要一个高度可解释的强大模型才能了解DNN的行为。在这项工作中,我们提出了一种基于Jacobian规范和选择性输入梯度正则化(J-SIGR)的方法,该方法通过Jacobian归一化提出了线性化的鲁棒性,还将基于扰动的显着性图正规化,以模仿模型的可解释预测。因此,我们既可以提高DNN的防御能力和高解释性。最后,我们评估了跨不同体系结构的方法,以针对强大的对抗性攻击。实验表明,提出的J-Sigr赋予了针对转移的对抗攻击的鲁棒性,我们还表明,来自神经网络的预测易于解释。
translated by 谷歌翻译
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs.Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to * Pin-Yu Chen and Huan Zhang contribute equally to this work.
translated by 谷歌翻译
Recent increases in the computational demands of deep neural networks (DNNs) have sparked interest in efficient deep learning mechanisms, e.g., quantization or pruning. These mechanisms enable the construction of a small, efficient version of commercial-scale models with comparable accuracy, accelerating their deployment to resource-constrained devices. In this paper, we study the security considerations of publishing on-device variants of large-scale models. We first show that an adversary can exploit on-device models to make attacking the large models easier. In evaluations across 19 DNNs, by exploiting the published on-device models as a transfer prior, the adversarial vulnerability of the original commercial-scale models increases by up to 100x. We then show that the vulnerability increases as the similarity between a full-scale and its efficient model increase. Based on the insights, we propose a defense, $similarity$-$unpairing$, that fine-tunes on-device models with the objective of reducing the similarity. We evaluated our defense on all the 19 DNNs and found that it reduces the transferability up to 90% and the number of queries required by a factor of 10-100x. Our results suggest that further research is needed on the security (or even privacy) threats caused by publishing those efficient siblings.
translated by 谷歌翻译
Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
translated by 谷歌翻译
最近对机器学习(ML)模型的攻击,例如逃避攻击,具有对抗性示例,并通过提取攻击窃取了一些模型,构成了几种安全性和隐私威胁。先前的工作建议使用对抗性训练从对抗性示例中保护模型,以逃避模型的分类并恶化其性能。但是,这种保护技术会影响模型的决策边界及其预测概率,因此可能会增加模型隐私风险。实际上,仅使用对模型预测输出的查询访问的恶意用户可以提取它并获得高智能和高保真替代模型。为了更大的提取,这些攻击利用了受害者模型的预测概率。实际上,所有先前关于提取攻击的工作都没有考虑到出于安全目的的培训过程中的变化。在本文中,我们提出了一个框架,以评估具有视觉数据集对对抗训练的模型的提取攻击。据我们所知,我们的工作是第一个进行此类评估的工作。通过一项广泛的实证研究,我们证明了受对抗训练的模型比在自然训练情况下获得的模型更容易受到提取攻击的影响。他们可以达到高达$ \ times1.2 $更高的准确性和同意,而疑问低于$ \ times0.75 $。我们还发现,与从自然训练的(即标准)模型中提取的DNN相比,从鲁棒模型中提取的对抗性鲁棒性能力可通过提取攻击(即从鲁棒模型提取的深神经网络(DNN)提取的深神网络(DNN))传递。
translated by 谷歌翻译
虽然深度神经网络在各种任务中表现出前所未有的性能,但对对抗性示例的脆弱性阻碍了他们在安全关键系统中的部署。许多研究表明,即使在黑盒设置中也可能攻击,其中攻击者无法访问目标模型的内部信息。大多数黑匣子攻击基于查询,每个都可以获得目标模型的输入输出,并且许多研究侧重于减少所需查询的数量。在本文中,我们注意了目标模型的输出完全对应于查询输入的隐含假设。如果将某些随机性引入模型中,它可以打破假设,因此,基于查询的攻击可能在梯度估计和本地搜索中具有巨大的困难,这是其攻击过程的核心。从这种动机来看,我们甚至观察到一个小的添加剂输入噪声可以中和大多数基于查询的攻击和名称这个简单但有效的方法小噪声防御(SND)。我们分析了SND如何防御基于查询的黑匣子攻击,并展示其与CIFAR-10和ImageNet数据集的八种最先进的攻击有效性。即使具有强大的防御能力,SND几乎保持了原始的分类准确性和计算速度。通过在推断下仅添加一行代码,SND很容易适用于预先训练的模型。
translated by 谷歌翻译
深度卷积神经网络(CNN)很容易被输入图像的细微,不可察觉的变化所欺骗。为了解决此漏洞,对抗训练会创建扰动模式,并将其包括在培训设置中以鲁棒性化模型。与仅使用阶级有限信息的现有对抗训练方法(例如,使用交叉渗透损失)相反,我们建议利用功能空间中的其他信息来促进更强的对手,这些信息又用于学习强大的模型。具体来说,我们将使用另一类的目标样本的样式和内容信息以及其班级边界信息来创建对抗性扰动。我们以深入监督的方式应用了我们提出的多任务目标,从而提取了多尺度特征知识,以创建最大程度地分开对手。随后,我们提出了一种最大边缘对抗训练方法,该方法可最大程度地减少源图像与其对手之间的距离,并最大程度地提高对手和目标图像之间的距离。与最先进的防御能力相比,我们的对抗训练方法表明了强大的鲁棒性,可以很好地推广到自然发生的损坏和数据分配变化,并保留了清洁示例的模型准确性。
translated by 谷歌翻译
在模型提取攻击中,对手可以通过反复查询并根据获得的预测来窃取通过公共API暴露的机器学习模型。为了防止模型窃取,现有的防御措施专注于检测恶意查询,截断或扭曲输出,因此必然会为合法用户引入鲁棒性和模型实用程序之间的权衡。取而代之的是,我们建议通过要求用户在阅读模型的预测之前完成工作证明来阻碍模型提取。这可以通过大大增加(甚至高达100倍)来阻止攻击者,以利用查询访问模型提取所需的计算工作。由于我们校准完成每个查询的工作证明所需的努力,因此这仅为常规用户(最多2倍)引入一个轻微的开销。为了实现这一目标,我们的校准应用了来自差异隐私的工具来衡量查询揭示的信息。我们的方法不需要对受害者模型进行任何修改,可以通过机器学习从业人员来应用其公开暴露的模型免于轻易被盗。
translated by 谷歌翻译
积极学习是一种降低标签成本以构建高质量机器学习模型的既定技术。主动学习的核心组件是确定应选择哪些数据来注释的采集功能。最先进的采集功能 - 更重要的是主动学习技术 - 已经旨在最大限度地提高清洁性能(例如,准确性)并忽视了鲁棒性,这是一种受到越来越受关注的重要品质。因此,主动学习产生准确但不强大的模型。在本文中,我们提出了一种积极的学习过程,集成了对抗性培训的积极学习过程 - 最熟悉的制作强大模型的方法。通过对11个采集函数的实证研究,4个数据集,6个DNN架构和15105培训的DNN,我们表明,强大的主动学习可以产生具有鲁棒性的模型(对抗性示例的准确性),范围从2.35 \%到63.85 \%,而标准主动学习系统地实现了可忽略不计的鲁棒性(小于0.20 \%)。然而,我们的研究还揭示了在稳健性方面,在准确性上表现良好的采集功能比随机抽样更糟糕。因此,我们检查了它背后的原因,并设计了一个新的采购功能,这些功能既可定位清洁的性能和鲁棒性。我们的采集功能 - 基于熵(DRE)的基于密度的鲁棒采样 - 优于鲁棒性的其他采集功能(包括随机),最高可达24.40 \%(特别是3.84 \%),同时仍然存在竞争力准确性。此外,我们证明了DRE适用于测试选择度量,用于模型再培训,并从所有比较功能中脱颖而出,高达8.21%的鲁棒性。
translated by 谷歌翻译