已知DNN容易受到所谓的对抗攻击的攻击,这些攻击操纵输入以引起不正确的结果,这可能对攻击者有益或对受害者造成损害。最近的作品提出了近似计算,作为针对机器学习攻击的防御机制。我们表明,这些方法虽然成功地用于一系列投入,但不足以解决更强大,高信任的对抗性攻击。为了解决这个问题,我们提出了DNNShield,这是一种硬件加速防御,可使响应的强度适应对抗性输入的信心。我们的方法依赖于DNN模型的动态和随机稀疏来有效地实现推理近似值,并通过对近似误差进行细粒度控制。与检测对抗输入相比,DNNShield使用稀疏推理的输出分布特征。当应用于RESNET50时,我们显示出86%的对抗检测率为86%,这超过了最先进的接近状态的检测率,开销较低。我们演示了软件/硬件加速的FPGA原型,该原型降低了DNNShield相对于仅软件CPU和GPU实现的性能影响。
translated by 谷歌翻译
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives.This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
translated by 谷歌翻译
对图像分类的侵扰贴片攻击攻击图像的深度神经网络(DNN),其在图像的有界区域内注射任意扭曲,可以产生鲁棒(IE在物理世界中的侵犯)和普遍(即,在任何情况下保持对抗的侵犯扰动输入)。这种攻击可能导致现实世界的DNN系统中的严重后果。这项工作提出了jujutsu,一种检测和减轻稳健和普遍的对抗性补丁攻击的技术。对于检测,jujutsu利用攻击“通用属性 - jujutsu首先定位潜在的对抗性补丁区域,然后策略性地将其传送到新图像中的专用区域,以确定它是否真正恶意。对于攻击缓解,jujutsu通过图像修正来利用攻击本地化性质,以在攻击损坏的像素中综合语义内容,并重建“清洁”图像。我们在四个不同的数据集中评估jujutsu(想象成,想象力,celeba和place365),并表明Jujutsu实现了卓越的性能,并且显着优于现有技术。我们发现jujutsu可以进一步防御基本攻击的不同变体,包括1)物理攻击; 2)目标不同课程的攻击; 3)攻击构造不同形状和4)适应攻击的修补程序。
translated by 谷歌翻译
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS-and which illustrate a diverse set of defense strategies-can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result-showing that a defense was ineffective-this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
translated by 谷歌翻译
已知深度学习系统容易受到对抗例子的影响。特别是,基于查询的黑框攻击不需要深入学习模型的知识,而可以通过提交查询和检查收益来计算网络上的对抗示例。最近的工作在很大程度上提高了这些攻击的效率,证明了它们在当今的ML-AS-A-Service平台上的实用性。我们提出了Blacklight,这是针对基于查询的黑盒对抗攻击的新防御。推动我们设计的基本见解是,为了计算对抗性示例,这些攻击在网络上进行了迭代优化,从而在输入空间中产生了非常相似的图像查询。 Blacklight使用在概率内容指纹上运行的有效相似性引擎来检测高度相似的查询来检测基于查询的黑盒攻击。我们根据各种模型和图像分类任务对八次最先进的攻击进行评估。 Blacklight通常只有几次查询后,都可以识别所有这些。通过拒绝所有检测到的查询,即使攻击者在帐户禁令或查询拒绝之后持续提交查询,Blacklight也可以防止任何攻击完成。 Blacklight在几个强大的对策中也很强大,包括最佳的黑盒攻击,该攻击近似于效率的白色框攻击。最后,我们说明了黑光如何推广到其他域,例如文本分类。
translated by 谷歌翻译
尽管机器学习系统的效率和可扩展性,但最近的研究表明,许多分类方法,尤其是深神经网络(DNN),易受对抗的例子;即,仔细制作欺骗训练有素的分类模型的例子,同时无法区分从自然数据到人类。这使得在安全关键区域中应用DNN或相关方法可能不安全。由于这个问题是由Biggio等人确定的。 (2013)和Szegedy等人。(2014年),在这一领域已经完成了很多工作,包括开发攻击方法,以产生对抗的例子和防御技术的构建防范这些例子。本文旨在向统计界介绍这一主题及其最新发展,主要关注对抗性示例的产生和保护。在数值实验中使用的计算代码(在Python和R)公开可用于读者探讨调查的方法。本文希望提交人们将鼓励更多统计学人员在这种重要的令人兴奋的领域的产生和捍卫对抗的例子。
translated by 谷歌翻译
机器学习算法和深度神经网络在几种感知和控制任务中的卓越性能正在推动该行业在安全关键应用中采用这种技术,作为自治机器人和自动驾驶车辆。然而,目前,需要解决几个问题,以使深入学习方法更可靠,可预测,安全,防止对抗性攻击。虽然已经提出了几种方法来提高深度神经网络的可信度,但大多数都是针对特定类的对抗示例量身定制的,因此未能检测到其他角落案件或不安全的输入,这些输入大量偏离训练样本。本文介绍了基于覆盖范式的轻量级监控架构,以增强针对不同不安全输入的模型鲁棒性。特别是,在用于评估多种检测逻辑的架构中提出并测试了四种覆盖分析方法。实验结果表明,该方法有效地检测强大的对抗性示例和分销外输入,引入有限的执行时间和内存要求。
translated by 谷歌翻译
Deep learning algorithms have been shown to perform extremely well on many classical machine learning problems. However, recent studies have shown that deep learning, like other machine learning techniques, is vulnerable to adversarial samples: inputs crafted to force a deep neural network (DNN) to provide adversary-selected outputs. Such attacks can seriously undermine the security of the system supported by the DNN, sometimes with devastating consequences. For example, autonomous vehicles can be crashed, illicit or illegal content can bypass content filters, or biometric authentication systems can be manipulated to allow improper access. In this work, we introduce a defensive mechanism called defensive distillation to reduce the effectiveness of adversarial samples on DNNs. We analytically investigate the generalizability and robustness properties granted by the use of defensive distillation when training DNNs. We also empirically study the effectiveness of our defense mechanisms on two DNNs placed in adversarial settings. The study shows that defensive distillation can reduce effectiveness of sample creation from 95% to less than 0.5% on a studied DNN. Such dramatic gains can be explained by the fact that distillation leads gradients used in adversarial sample creation to be reduced by a factor of 10 30 . We also find that distillation increases the average minimum number of features that need to be modified to create adversarial samples by about 800% on one of the DNNs we tested.
translated by 谷歌翻译
Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently a set of certified defenses have been introduced, which provide guarantees of robustness to normbounded attacks. However these defenses either do not scale to large datasets or are limited in the types of models they can support. This paper presents the first certified defense that both scales to large networks and datasets (such as Google's Inception network for ImageNet) and applies broadly to arbitrary model types. Our defense, called PixelDP, is based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism, that provides a rigorous, generic, and flexible foundation for defense.
translated by 谷歌翻译
发言人识别系统(SRSS)最近被证明容易受到对抗攻击的影响,从而引发了重大的安全问题。在这项工作中,我们系统地研究了基于确保SRSS的基于对抗性训练的防御。根据SRSS的特征,我们提出了22种不同的转换,并使用扬声器识别的7种最新有前途的对抗攻击(4个白盒和3个Black-Box)对其进行了彻底评估。仔细考虑了国防评估中的最佳实践,我们分析了转换的强度以承受适应性攻击。我们还评估并理解它们与对抗训练相结合的自适应攻击的有效性。我们的研究提供了许多有用的见解和发现,其中许多与图像和语音识别域中的结论是新的或不一致的,例如,可变和恒定的比特率语音压缩具有不同的性能,并且某些不可差的转换仍然有效地抗衡。当前有希望的逃避技术通常在图像域中很好地工作。我们证明,与完整的白色盒子设置中的唯一对抗性训练相比,提出的新型功能级转换与对抗训练相比是相当有效的,例如,将准确性提高了13.62%,而攻击成本则达到了两个数量级,而其他攻击成本则增加了。转型不一定会提高整体防御能力。这项工作进一步阐明了该领域的研究方向。我们还发布了我们的评估平台SpeakerGuard,以促进进一步的研究。
translated by 谷歌翻译
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%.In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.
translated by 谷歌翻译
我们提出了HashTAG,这是一种在检测性能上具有可证实范围的深度神经网络(DNN)对故障注射攻击的高精度检测的第一个框架。故障注射攻击中最近的文献显示了尺寸翻转引起的严重DNN精度劣化。在这种情况下,攻击者通过篡改程序的DRAM存储器来在DNN执行期间改变几个权重位。要检测运行时位翻转,HashTag在部署之前从良性DNN中提取唯一签名。签名后来用于验证DNN的完整性,并验证推动输出在速度。我们提出了一种新颖的敏感性分析方案,可准确地将最脆弱的DNN层识别到故障注射攻击。然后通过使用低碰撞散列函数对易受攻击层中的基础重量进行编码来构建DNN签名。部署DNN时,在推理期间从目标层提取新的哈希,并与地面真相签名进行比较。 HASHTAG采用了一种轻量级方法,可确保嵌入式平台上的低开销和实时故障检测。对各种DNN的最先进的位翻转攻击的广泛评估在攻击检测和执行开销方面,展示了HashTAG的竞争优势。
translated by 谷歌翻译
Video compression plays a crucial role in video streaming and classification systems by maximizing the end-user quality of experience (QoE) at a given bandwidth budget. In this paper, we conduct the first systematic study for adversarial attacks on deep learning-based video compression and downstream classification systems. Our attack framework, dubbed RoVISQ, manipulates the Rate-Distortion ($\textit{R}$-$\textit{D}$) relationship of a video compression model to achieve one or both of the following goals: (1) increasing the network bandwidth, (2) degrading the video quality for end-users. We further devise new objectives for targeted and untargeted attacks to a downstream video classification service. Finally, we design an input-invariant perturbation that universally disrupts video compression and classification systems in real time. Unlike previously proposed attacks on video classification, our adversarial perturbations are the first to withstand compression. We empirically show the resilience of RoVISQ attacks against various defenses, i.e., adversarial training, video denoising, and JPEG compression. Our extensive experimental results on various video datasets show RoVISQ attacks deteriorate peak signal-to-noise ratio by up to 5.6dB and the bit-rate by up to $\sim$ 2.4$\times$ while achieving over 90$\%$ attack success rate on a downstream classifier. Our user study further demonstrates the effect of RoVISQ attacks on users' QoE.
translated by 谷歌翻译
普遍的对策扰动是图像不可思议的和模型 - 无关的噪声,当添加到任何图像时可以误导训练的深卷积神经网络进入错误的预测。由于这些普遍的对抗性扰动可以严重危害实践深度学习应用的安全性和完整性,因此现有技术使用额外的神经网络来检测输入图像源的这些噪声的存在。在本文中,我们展示了一种攻击策略,即通过流氓手段激活(例如,恶意软件,木马)可以通过增强AI硬件加速器级的对抗噪声来绕过这些现有对策。我们使用Conv2D功能软件内核的共同仿真和FuseSoC环境下的硬件的Verilog RTL模型的共同仿真,展示了关于几个深度学习模型的加速度普遍对抗噪声。
translated by 谷歌翻译
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs.Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to * Pin-Yu Chen and Huan Zhang contribute equally to this work.
translated by 谷歌翻译
Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
translated by 谷歌翻译
机器学习模型严重易于来自对抗性示例的逃避攻击。通常,对逆势示例的修改输入类似于原始输入的修改输入,在WhiteBox设置下由对手的WhiteBox设置构成,完全访问模型。然而,最近的攻击已经显示出使用BlackBox攻击的对逆势示例的查询号显着减少。特别是,警报是从越来越多的机器学习提供的经过培训的模型的访问界面中利用分类决定作为包括Google,Microsoft,IBM的服务提供商,并由包含这些模型的多种应用程序使用的服务提供商来利用培训的模型。对手仅利用来自模型的预测标签的能力被区别为基于决策的攻击。在我们的研究中,我们首先深入潜入最近的ICLR和SP的最先进的决策攻击,以突出发现低失真对抗采用梯度估计方法的昂贵性质。我们开发了一种强大的查询高效攻击,能够避免在梯度估计方法中看到的嘈杂渐变中的局部最小和误导中的截留。我们提出的攻击方法,ramboattack利用随机块坐标下降的概念来探索隐藏的分类器歧管,针对扰动来操纵局部输入功能以解决梯度估计方法的问题。重要的是,ramboattack对对对手和目标类别可用的不同样本输入更加强大。总的来说,对于给定的目标类,ramboattack被证明在实现给定查询预算的较低失真时更加强大。我们使用大规模的高分辨率ImageNet数据集来策划我们的广泛结果,并在GitHub上开源我们的攻击,测试样本和伪影。
translated by 谷歌翻译
深度学习的进步使得广泛的有希望的应用程序。然而,这些系统容易受到对抗机器学习(AML)攻击的影响;对他们的意见的离前事实制作的扰动可能导致他们错误分类。若干最先进的对抗性攻击已经证明他们可以可靠地欺骗分类器,使这些攻击成为一个重大威胁。对抗性攻击生成算法主要侧重于创建成功的例子,同时控制噪声幅度和分布,使检测更加困难。这些攻击的潜在假设是脱机产生的对抗噪声,使其执行时间是次要考虑因素。然而,最近,攻击者机会自由地产生对抗性示例的立即对抗攻击已经可能。本文介绍了一个新问题:我们如何在实时约束下产生对抗性噪音,以支持这种实时对抗攻击?了解这一问题提高了我们对这些攻击对实时系统构成的威胁的理解,并为未来防御提供安全评估基准。因此,我们首先进行对抗生成算法的运行时间分析。普遍攻击脱机产生一般攻击,没有在线开销,并且可以应用于任何输入;然而,由于其一般性,他们的成功率是有限的。相比之下,在特定输入上工作的在线算法是计算昂贵的,使它们不适合在时间约束下的操作。因此,我们提出房间,一种新型实时在线脱机攻击施工模型,其中离线组件用于预热在线算法,使得可以在时间限制下产生高度成功的攻击。
translated by 谷歌翻译
深度学习(DL)在许多与人类相关的任务中表现出巨大的成功,这导致其在许多计算机视觉的基础应用中采用,例如安全监控系统,自治车辆和医疗保健。一旦他们拥有能力克服安全关键挑战,这种安全关键型应用程序必须绘制他们的成功部署之路。在这些挑战中,防止或/和检测对抗性实例(AES)。对手可以仔细制作小型,通常是难以察觉的,称为扰动的噪声被添加到清洁图像中以产生AE。 AE的目的是愚弄DL模型,使其成为DL应用的潜在风险。在文献中提出了许多测试时间逃避攻击和对策,即防御或检测方法。此外,还发布了很少的评论和调查,理论上展示了威胁的分类和对策方法,几乎​​没有焦点检测方法。在本文中,我们专注于图像分类任务,并试图为神经网络分类器进行测试时间逃避攻击检测方法的调查。对此类方法的详细讨论提供了在四个数据集的不同场景下的八个最先进的探测器的实验结果。我们还为这一研究方向提供了潜在的挑战和未来的观点。
translated by 谷歌翻译
Deep learning has shown impressive performance on hard perceptual problems. However, researchers found deep learning systems to be vulnerable to small, specially crafted perturbations that are imperceptible to humans. Such perturbations cause deep learning systems to mis-classify adversarial examples, with potentially disastrous consequences where safety or security is crucial. Prior defenses against adversarial examples either targeted specific attacks or were shown to be ineffective. We propose MagNet, a framework for defending neural network classifiers against adversarial examples. MagNet neither modifies the protected classifier nor requires knowledge of the process for generating adversarial examples. MagNet includes one or more separate detector networks and a reformer network. The detector networks learn to differentiate between normal and adversarial examples by approximating the manifold of normal examples. Since they assume no specific process for generating adversarial examples, they generalize well. The reformer network moves adversarial examples towards the manifold of normal examples, which is effective for correctly classifying adversarial examples with small perturbation. We discuss the intrinsic difficulties in defending against whitebox attack and propose a mechanism to defend against graybox attack. Inspired by the use of randomness in cryptography, we use diversity to strengthen MagNet. We show empirically that Mag-Net is effective against the most advanced state-of-the-art attacks in blackbox and graybox scenarios without sacrificing false positive rate on normal examples. CCS CONCEPTS• Security and privacy → Domain-specific security and privacy architectures; • Computing methodologies → Neural networks;
translated by 谷歌翻译