Deep Neural Networks (DNNs) are becoming increasingly important in assisted and automated driving. Using such models obtained via machine learning is unavoidable: tasks such as traffic sign recognition cannot reasonably be developed with traditional software development methods. DNNs, however, have the problem that they are mostly black boxes and are therefore hard to understand and debug. One particular problem is that they are prone to hidden backdoors: the DNN misclassifies its input because it considers properties that should not be decisive for the output. Backdoors may be introduced either by malicious attackers or by inappropriate training. In either case, detecting and removing them is important in the automotive domain, as they might lead to safety violations with potentially severe consequences. In this paper, we introduce a novel method to remove backdoors. Our method works for both intentional and unintentional backdoors, and it does not require prior knowledge about the shape or distribution of the backdoors. Experimental evidence shows that our method performs well on several medium-sized examples.
The success of deep neural networks (DNNs) in real-world applications benefits from the abundance of pre-trained models. However, backdoored pre-trained models can pose a significant trojan threat to the deployment of downstream DNNs. Existing DNN testing methods are mainly designed to find incorrect corner-case behaviors in adversarial settings, but fail to discover backdoors crafted by strong trojan attacks. Observations of trojaned network behavior show that such backdoors are not reflected by a single compromised neuron, as proposed by prior work, but are attributed to critical neural paths involving the activation intensity and frequency of multiple neurons. This work formulates DNN backdoor testing and proposes a corresponding testing framework. By differential fuzzing of critical neurons from a small set of benign examples, we identify trojan paths, in particular the critical ones, and generate backdoor test examples by stimulating the critical neurons along the identified paths. Extensive experiments demonstrate the superiority of the framework, with higher detection performance than existing methods. It also detects backdoors crafted by stealthy blending and adaptive attacks that existing methods fail to detect. Moreover, our experiments suggest that it can reveal potential backdoors in models from model zoos.
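The abstract stays at a high level, so the following is only an illustrative sketch of one ingredient it mentions: profiling which neurons of a layer are activated most strongly and most frequently on a handful of benign examples, the kind of "critical neuron" statistic a path-based testing framework could then fuzz. The layer access pattern and the scoring rule are assumptions, not the authors' implementation.

```python
import torch

def critical_neurons(model, layer, benign_batch, top_k=10):
    """Rank the neurons (channels) of `layer` by activation strength and firing
    frequency over a small benign batch; top-ranked neurons are candidates for
    trojan-path fuzzing."""
    acts = []
    handle = layer.register_forward_hook(lambda m, i, o: acts.append(o.detach()))
    with torch.no_grad():
        model(benign_batch)
    handle.remove()

    a = acts[0]
    a = a.flatten(2).mean(-1) if a.dim() == 4 else a   # reduce conv maps to [B, C]
    strength = a.mean(dim=0)                           # average activation strength
    frequency = (a > 0).float().mean(dim=0)            # how often each neuron fires
    score = strength * frequency
    return torch.topk(score, k=min(top_k, score.numel())).indices
```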
Deep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor attacks on DNNs. We implement three backdoor attacks from prior work and use them to investigate two promising defenses, pruning and fine-tuning. We show that neither, by itself, is sufficient to defend against sophisticated attackers. We then evaluate fine-pruning, a combination of pruning and fine-tuning, and show that it successfully weakens or even eliminates the backdoors, i.e., in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy for clean (non-triggering) inputs. Our work provides the first step toward defenses against backdoor attacks in deep neural networks.
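A minimal sketch of the fine-pruning recipe under stated assumptions: the classifier is a PyTorch model whose last convolutional layer is exposed as `model.last_conv`, and `clean_loader` yields clean (non-triggering) validation data; both names are placeholders, and the fine-tuning half is just a short standard training loop on the same clean data.

```python
import torch

def fine_prune(model, clean_loader, prune_frac=0.2, device="cpu"):
    """Sketch of fine-pruning: zero out channels of the last conv layer that stay
    dormant on clean data, then fine-tune briefly on clean data."""
    model.to(device).eval()
    acts = []

    # Record per-channel mean activations of the (assumed) last conv layer.
    def hook(_module, _inp, out):
        acts.append(out.detach().abs().mean(dim=(0, 2, 3)))

    handle = model.last_conv.register_forward_hook(hook)
    with torch.no_grad():
        for x, _ in clean_loader:
            model(x.to(device))
    handle.remove()

    mean_act = torch.stack(acts).mean(dim=0)
    n_prune = int(prune_frac * mean_act.numel())
    dormant = torch.argsort(mean_act)[:n_prune]        # least-activated channels

    with torch.no_grad():
        model.last_conv.weight[dormant] = 0.0          # prune dormant filters
        if model.last_conv.bias is not None:
            model.last_conv.bias[dormant] = 0.0

    # Fine-tune on clean data (one epoch shown) to restore clean accuracy and
    # further weaken any backdoor behavior.
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for x, y in clean_loader:
        opt.zero_grad()
        loss_fn(model(x.to(device)), y.to(device)).backward()
        opt.step()
    return model
```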
Together with the impressive advances touching every aspect of our society, AI technology based on deep neural networks (DNNs) is bringing increasing security concerns. While attacks operating at test time have monopolized the initial attention of researchers, the possibility of corrupting DNN models by interfering with the training process represents a further serious threat undermining the reliability of AI technology. In backdoor attacks, the attacker corrupts the training data so as to induce erroneous behavior at test time. Test-time errors, however, are only activated in the presence of a triggering event corresponding to a properly crafted input sample. In this way, the corrupted network continues to work as expected on normal inputs, and the malicious behavior occurs only when the attacker decides to activate the backdoor hidden within the network. In the last few years, backdoor attacks have been the subject of intense research activity focusing on the development of new classes of attacks and on the proposal of possible countermeasures. The goal of this overview paper is to review the works published so far, classifying the different types of attacks and defenses proposed to date. The classification guiding the analysis is based on the amount of control the attacker has over the training process, and on the defender's capability to verify the integrity of the data used for training and to monitor the operation of the DNN at training and test time. The proposed analysis is therefore particularly suited to highlight the strengths and weaknesses of both attacks and defenses with reference to the application scenarios in which they operate.
A typical deep neural network (DNN) backdoor attack is based on a trigger embedded in the input. Existing imperceptible triggers are either computationally expensive or have low attack success rates. In this paper, we propose a new backdoor trigger that is easy to generate, imperceptible, and highly effective. The new trigger is a uniformly generated three-dimensional (3D) binary pattern that can be repeated horizontally and/or vertically and mirrored, and is superimposed onto three-channel images to train a backdoored DNN model. The new trigger is dispersed throughout the entire image, producing only weak perturbations to individual pixels, but collectively forming a strong recognizable pattern to train and activate the DNN's backdoor. We also show analytically that the trigger becomes more effective as the image resolution increases. Experiments are conducted with ResNet-18 and MLP models on the MNIST, CIFAR-10, and BTSR datasets. In terms of imperceptibility, the new trigger outperforms existing triggers such as BadNets, Trojaned NN, and hidden backdoors. The new trigger achieves an attack success rate of nearly 100%, reduces classification accuracy by only 0.7% to 2.4%, and renders state-of-the-art defense techniques ineffective.
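The abstract does not spell out the exact construction, so the following numpy sketch only illustrates the general idea it describes: a small random three-channel binary tile is mirrored and repeated to cover the whole image, then blended in with a weak per-pixel amplitude. Tile size and amplitude are assumed parameters, not the paper's values.

```python
import numpy as np

def make_tiled_trigger(height, width, tile=8, amplitude=4, seed=0):
    """Sketch: build a full-image trigger from a small 3-channel binary tile that
    is mirrored and repeated, so each pixel is perturbed only weakly while the
    global pattern stays consistent."""
    rng = np.random.default_rng(seed)
    base = rng.integers(0, 2, size=(tile, tile, 3))             # 3D binary tile
    mirrored = np.concatenate([base, base[:, ::-1]], axis=1)    # mirror horizontally
    block = np.concatenate([mirrored, mirrored[::-1]], axis=0)  # mirror vertically
    reps_y = -(-height // block.shape[0])                       # ceil division
    reps_x = -(-width // block.shape[1])
    pattern = np.tile(block, (reps_y, reps_x, 1))[:height, :width]
    # Map {0,1} to {-amplitude, +amplitude}: a weak, dispersed perturbation.
    return (2 * pattern - 1) * amplitude

def poison_image(img_uint8, trigger):
    """Superimpose the trigger onto an HxWx3 uint8 image."""
    out = img_uint8.astype(np.int16) + trigger
    return np.clip(out, 0, 255).astype(np.uint8)

# Example: poison a random 32x32 CIFAR-like image.
img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
poisoned = poison_image(img, make_tiled_trigger(32, 32))
```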
We conduct a systematic study of backdoor vulnerabilities in normally trained Deep Learning models. They are as dangerous as backdoors injected by data poisoning because both can be equally exploited. We leverage 20 different types of injected backdoor attacks in the literature as guidance and study their correspondences in normally trained models, which we call natural backdoor vulnerabilities. We find that natural backdoors widely exist, with most injected backdoor attacks having natural correspondences. We categorize these natural backdoors and propose a general detection framework. It finds 315 natural backdoors in the 56 normally trained models downloaded from the Internet, covering all the different categories, while existing scanners designed for injected backdoors can detect at most 65 backdoors. We also study the root causes of and defenses against natural backdoors.
Backdoor attacks have been shown to be a serious security threat to deep learning models, and detecting whether a given model has been backdoored becomes a crucial task. Existing defenses are mainly built upon the observation that the backdoor trigger is usually small in size or affects the activations of only a few neurons. However, these observations are violated in many cases, especially for advanced backdoor attacks, which hinders the performance and applicability of existing defenses. In this paper, we propose a backdoor defense, DTInspector, built upon a new observation: an effective backdoor attack usually requires high prediction confidence on the poisoned training samples in order to ensure that the trained model exhibits the backdoor behavior with high probability. Based on this observation, DTInspector first learns a patch that can change the predictions of the most confidently predicted data, and then decides on the existence of a backdoor by checking the ratio of prediction changes after applying the learned patch to low-confidence data. Extensive evaluations on five backdoor attacks, four datasets, and three advanced attack types demonstrate the effectiveness of the proposed defense.
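A hedged sketch of the detection logic described above: optimize a small additive patch so that it flips the model's most confident predictions, then measure how often that same patch flips predictions on low-confidence data. The function names, the optimizer choices, and the final decision threshold are illustrative assumptions, not the authors' implementation.

```python
import torch

def learn_flip_patch(model, high_conf_x, high_conf_pred, steps=200, lr=0.05):
    """Learn an additive patch that changes the predictions of the most
    confidently classified samples (a simplified stand-in for the paper's patch)."""
    patch = torch.zeros_like(high_conf_x[0], requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        logits = model(torch.clamp(high_conf_x + patch, 0, 1))
        # Maximize the loss w.r.t. the current confident predictions to flip them.
        loss = -torch.nn.functional.cross_entropy(logits, high_conf_pred)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return patch.detach()

def prediction_change_ratio(model, patch, low_conf_x, low_conf_pred):
    """Fraction of low-confidence samples whose prediction changes under the patch.
    Comparing this ratio against a calibrated threshold decides whether the model
    is flagged as backdoored (threshold and decision rule are assumptions here)."""
    with torch.no_grad():
        new_pred = model(torch.clamp(low_conf_x + patch, 0, 1)).argmax(dim=1)
    return (new_pred != low_conf_pred).float().mean().item()
```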
Recent studies show that despite achieving high accuracy on many real-world applications, deep neural networks (DNNs) can be backdoored: by injecting triggered data samples into the training dataset, the adversary can mislead the trained model into classifying any test data to the target class whenever the trigger pattern is present. To nullify such backdoor threats, various methods have been proposed. In particular, one line of research aims to purify the potentially compromised model. However, a major limitation of this line of work is the requirement of access to sufficient original training data: the purification performance is much worse when the available training data is limited. In this work, we propose Adversarial Weight Masking (AWM), a novel method capable of erasing neural backdoors even in the one-shot setting. The key idea behind our method is to formulate this as a min-max optimization problem: first, adversarially recover the trigger patterns, and then (soft) mask the network weights that are sensitive to the recovered patterns. Comprehensive evaluations on several benchmark datasets suggest that AWM can largely improve the purification effect over other state-of-the-art methods across various available training dataset sizes.
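A simplified sketch of the min-max idea under stated assumptions: an inner step adversarially recovers a trigger-like perturbation on the small clean set, and an outer step updates soft masks on the weights so that the masked network resists that perturbation while staying accurate on clean data. Masking every conv/linear weight, the loss weighting, and the omitted sparsity regularization are assumptions of this sketch, not the paper's exact procedure.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class SoftMask(nn.Module):
    """Multiplies a weight tensor by a learnable mask in (0, 1)."""
    def __init__(self, shape):
        super().__init__()
        self.logit = nn.Parameter(torch.full(shape, 4.0))  # sigmoid(4) ~ 0.98

    def forward(self, weight):
        return weight * torch.sigmoid(self.logit)

def attach_masks(model):
    """Attach a soft mask to every conv/linear weight; return the mask parameters."""
    masks = []
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            parametrize.register_parametrization(module, "weight",
                                                 SoftMask(module.weight.shape))
            masks.append(module.parametrizations.weight[0].logit)
    return masks

def awm_purify(model, clean_x, clean_y, rounds=20, inner_steps=10, eps=8 / 255):
    """Tiny sketch of adversarial weight masking on a handful of clean samples."""
    mask_params = attach_masks(model)
    mask_opt = torch.optim.Adam(mask_params, lr=0.01)
    ce = nn.CrossEntropyLoss()
    for _ in range(rounds):
        # Inner max: recover a trigger-like perturbation that fools the masked model.
        delta = torch.zeros_like(clean_x, requires_grad=True)
        for _ in range(inner_steps):
            loss = ce(model(torch.clamp(clean_x + delta, 0, 1)), clean_y)
            grad, = torch.autograd.grad(loss, delta)
            delta = torch.clamp(delta + eps / 4 * grad.sign(),
                                -eps, eps).detach().requires_grad_(True)
        # Outer min: update only the masks so the model resists the recovered
        # perturbation while keeping clean accuracy.
        mask_opt.zero_grad()
        robust = ce(model(torch.clamp(clean_x + delta.detach(), 0, 1)), clean_y)
        clean = ce(model(clean_x), clean_y)
        (robust + clean).backward()
        mask_opt.step()
    return model
```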
This paper proposes a novel two-stage defense (NNoculation) against backdoored neural networks (BadNets) that repairs a BadNet both pre-deployment and online, in response to backdoored test inputs encountered in the field. In the pre-deployment stage, NNoculation retrains the BadNet with random perturbations of clean validation inputs to partially reduce the adversarial impact of a backdoor. After deployment, NNoculation detects and quarantines backdoored test inputs by recording disagreements between the original and the pre-deployment patched networks. A CycleGAN is then trained to learn transformations between clean validation and quarantined inputs; that is, it learns to add triggers to clean validation images. These backdoored validation images, together with their correct labels, are used to further retrain the pre-patched BadNet, yielding our final defense. Empirical evaluation on a comprehensive suite of backdoor attacks shows that NNoculation outperforms all state-of-the-art defenses, which make restrictive assumptions and only work against specific backdoor attacks, or fail under adaptive attacks. In contrast, NNoculation makes minimal assumptions and provides an effective defense even in settings where existing defenses are ineffective because the attacker has deviated from their restrictive assumptions.
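A tiny sketch of the post-deployment quarantine step, assuming `badnet` is the original model and `patched` is the pre-deployment retrained copy (both names are placeholders): inputs on which the two models disagree are set aside as likely trigger-carrying, and would later feed the CycleGAN retraining stage described above.

```python
import torch

def quarantine_disagreements(badnet, patched, test_batch):
    """Route test inputs: keep those the two models agree on, quarantine the rest."""
    with torch.no_grad():
        pred_orig = badnet(test_batch).argmax(dim=1)
        pred_patched = patched(test_batch).argmax(dim=1)
    disagree = pred_orig != pred_patched
    quarantined = test_batch[disagree]   # candidates that may carry the trigger
    accepted = test_batch[~disagree]
    return accepted, quarantined
```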
In adversarial machine learning, new defenses against attacks on deep learning systems are routinely broken soon after their release by stronger attacks. In this context, forensic tools can offer a valuable complement to existing defenses by tracing a successful attack back to its root cause and offering a path forward for mitigation, so that similar attacks can be prevented in the future. In this paper, we describe our efforts to develop a forensic traceback tool for poison attacks on deep neural networks. We propose a novel iterative clustering and pruning solution that trims "innocent" training samples until all that remains is the set of poisoned data responsible for the attack. Our method clusters training samples based on their impact on model parameters and then uses an efficient data unlearning method to prune innocuous clusters. We empirically demonstrate the efficacy of our system on three types of dirty-label (backdoor) poison attacks and three types of clean-label poison attacks, spanning computer vision and malware classification. Our system achieves over 98.4% precision and 96.8% recall across all attacks. We also show that our system is robust against four anti-forensics measures specifically designed to attack it.
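A rough sketch of the iterative clustering-and-pruning loop under stated assumptions: each training sample is represented by the gradient of its loss with respect to a late weight matrix (a cheap proxy for its influence on model parameters), the samples are clustered, and clusters that pass a user-supplied benign test are pruned until only the suspected poison set remains. The influence proxy, the cluster count, and the `looks_benign` callback are placeholders standing in for the paper's unlearning-based test.

```python
import numpy as np
import torch
from sklearn.cluster import KMeans

def gradient_signatures(model, xs, ys, loss_fn):
    """Per-sample gradient of the loss w.r.t. an (assumed) final weight matrix."""
    final = list(model.parameters())[-2]
    sigs = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        sigs.append(final.grad.detach().flatten().cpu().numpy())
    return np.stack(sigs)

def traceback_poison(model, xs, ys, loss_fn, looks_benign, n_clusters=8, rounds=5):
    """Iteratively cluster samples by influence and prune clusters that pass the
    (user-supplied) benign test, keeping only the suspected poison set."""
    idx = np.arange(len(xs))
    for _ in range(rounds):
        sel = torch.as_tensor(idx, dtype=torch.long)
        sigs = gradient_signatures(model, xs[sel], ys[sel], loss_fn)
        labels = KMeans(n_clusters=min(n_clusters, len(idx)), n_init=10).fit_predict(sigs)
        keep = []
        for c in range(labels.max() + 1):
            members = idx[labels == c]
            if not looks_benign(members):   # stand-in for the unlearning-based test
                keep.extend(members.tolist())
        new_idx = np.array(keep, dtype=int)
        if len(new_idx) in (0, len(idx)):
            return new_idx
        idx = new_idx
    return idx   # indices of the suspected poisoned samples
```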
Vision Transformers (ViTs) have a radically different architecture from convolutional neural networks, with significantly less inductive bias. Along with the improvement in performance, the security and robustness of ViTs are also of great importance. In contrast to many recent works that study the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, namely the backdoor. We first examine the vulnerability of ViTs to various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and backdoor attack success rate of ViTs respond distinctively to patch transformations applied before the positional encoding. Based on this finding, we then propose an effective method for ViTs to defend against patch-based trigger backdoor attacks via patch processing. The performance is evaluated on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, showing that the proposed defense is very successful in mitigating backdoor attacks on ViTs. To the best of our knowledge, this paper presents the first defensive strategy that utilizes a unique characteristic of ViTs against backdoor attacks.
A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model by leveraging the difficulty of interpreting the learned model to misclassify any inputs signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds a STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision systems. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of predicted classes for the perturbed inputs from a given deployed model, whether malicious or benign. A low entropy in the predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input, a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10, and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved a result of 0% for both FRR and FAR. We have also evaluated STRIP's robustness against a number of trojan attack variants and adaptive attacks.
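A compact sketch of the STRIP test as described above: superimpose the incoming input with a set of held-out clean images, collect the model's predictions on the blends, and flag the input as trojaned when the entropy of those predictions is abnormally low. The blend weight and the entropy threshold (which would be calibrated offline from the preset FRR) are assumptions of this sketch.

```python
import torch

def strip_entropy(model, x, overlay_images, alpha=0.5):
    """Average prediction entropy of x blended with a set of clean overlay images."""
    with torch.no_grad():
        blends = torch.stack([alpha * x + (1 - alpha) * o for o in overlay_images])
        probs = torch.softmax(model(blends), dim=1)
        ent = -(probs * torch.log(probs + 1e-12)).sum(dim=1)  # per-blend entropy
    return ent.mean().item()

def is_trojaned_input(model, x, overlay_images, threshold):
    """Low entropy under perturbation means the prediction is input-independent,
    the hallmark of a trigger; `threshold` is calibrated on clean data for a
    preset false rejection rate."""
    return strip_entropy(model, x, overlay_images) < threshold
```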
Backdoor attacks have emerged as one of the major security threats to deep learning models, as they can easily control the model's test-time predictions by pre-injecting a backdoor trigger into the model at training time. While backdoor attacks have been extensively studied on images, few works have investigated the threat of backdoor attacks on time series data. To fill this gap, in this paper we present a novel generative approach for time series backdoor attacks against deep learning based time series classifiers. Backdoor attacks have two main goals: high stealthiness and a high attack success rate. We find that, compared to images, it can be more challenging to achieve these two goals on time series. This is because time series have fewer input dimensions and lower degrees of freedom, making it hard to achieve a high attack success rate without compromising stealthiness. Our generative approach addresses this challenge by generating trigger patterns that are as realistic as real time series patterns while achieving a high attack success rate without causing a significant drop in clean accuracy. We also show that our proposed attack is resistant to potential backdoor defenses. Furthermore, we propose a novel universal generator that can poison any type of time series with a single generator, allowing universal attacks without the need to fine-tune the generative model for new time series datasets.
Deep neural networks (DNNs) are known to be vulnerable to both backdoor attacks and adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct problems and solved separately, since they belong to training-time and inference-time attacks, respectively. However, in this paper we find an intriguing connection between them: for a model planted with a backdoor, we observe that its adversarial examples have behaviors similar to its triggered samples, i.e., both activate the same subset of DNN neurons. This indicates that planting a backdoor into a model significantly affects the model's adversarial examples. Based on this observation, we design a new Adversarial Fine-Tuning (AFT) algorithm to defend against backdoor attacks. We empirically show that, against 5 state-of-the-art backdoor attacks, our AFT can effectively erase the backdoor triggers without obvious performance degradation on clean samples, and significantly outperforms existing defense methods.
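A minimal sketch of adversarial fine-tuning as the abstract describes it: craft untargeted adversarial examples on a small clean set and fine-tune the backdoored model on them with their correct labels, which (per the observation above) also suppresses the neurons shared with the trigger. The use of a standard PGD routine and its hyperparameters are assumptions, not the authors' exact algorithm.

```python
import torch
import torch.nn as nn

def pgd(model, x, y, eps=8 / 255, step=2 / 255, iters=10):
    """Standard untargeted PGD attack used to generate fine-tuning data."""
    delta = torch.zeros_like(x, requires_grad=True)
    ce = nn.CrossEntropyLoss()
    for _ in range(iters):
        loss = ce(model(torch.clamp(x + delta, 0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = torch.clamp(delta + step * grad.sign(), -eps, eps).detach().requires_grad_(True)
    return torch.clamp(x + delta.detach(), 0, 1)

def adversarial_finetune(model, clean_loader, epochs=5, lr=1e-3, device="cpu"):
    """Fine-tune the (potentially backdoored) model on its own adversarial examples
    with the correct labels."""
    model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd(model, x, y)
            model.train()
            opt.zero_grad()
            ce(model(x_adv), y).backward()
            opt.step()
    return model
```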
Recently, backdoor attacks have become an emerging threat to the security of deep neural network (DNN) models. To date, most existing studies focus on backdoor attacks against uncompressed models, while the vulnerability of compressed DNNs, which are widely used in practical applications, has not yet been exploited. In this paper, we propose to study and develop Robust and Imperceptible Backdoor Attacks against Compact DNN models (RIBAC). By performing systematic analysis and exploration of the important design knobs, we propose a framework that can efficiently learn the proper trigger patterns, model parameters, and pruning masks, thereby simultaneously achieving high trigger stealthiness, a high attack success rate, and high model efficiency. Extensive evaluations across different datasets, including tests against state-of-the-art defense mechanisms, demonstrate the high robustness, stealthiness, and model efficiency of RIBAC. Code is available at https://github.com/huyvnphan/eccv2022-ribac
Pervasive backdoors are triggered by dynamic and pervasive input perturbations. They can be intentionally injected by attackers or naturally exist in normally trained models. Their nature differs from that of traditional static and localized backdoors, which can be triggered by perturbing a small input area with a fixed pattern, e.g., a patch with a solid color. Existing defense techniques are highly effective against traditional backdoors, but they may not work well against pervasive backdoors, especially for backdoor removal and model hardening. In this paper, we propose a novel model hardening technique against pervasive backdoors, including both natural and injected ones. We develop a general pervasive attack based on an encoder-decoder architecture enhanced with a special transformation layer. The attack can model existing pervasive backdoor attacks and quantify them by class distance. As such, using the samples derived from our attack in adversarial training can harden a model against these backdoor vulnerabilities. Our evaluation on 9 datasets with 15 model structures shows that our technique can enlarge class distances by 59.65% on average, with limited accuracy degradation and no robustness loss, outperforming five hardening techniques such as adversarial training, universal adversarial training, MOTH, etc. It can reduce the attack success rate of six pervasive backdoor attacks from 99.06% to 1.94%, surpassing seven state-of-the-art backdoor removal techniques.
Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting or erasing backdoors, it is still unclear whether robust training methods can be devised to prevent the backdoor triggers from being injected into the trained model in the first place. In this paper, we introduce the concept of anti-backdoor learning, which aims to train clean models on backdoor-poisoned data. We frame the overall learning process as a dual task of learning the clean portion and the backdoor portion of the data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) models learn backdoored data much faster than they learn clean data; and 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage gradient ascent mechanism into standard training to 1) help isolate backdoor examples at an early training stage and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that ABL-trained models on backdoor-poisoned data achieve the same performance as models trained on purely clean data. Code is available at https://github.com/bboylyg/ABL.
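A condensed sketch of the two stages described above, with several simplifications: here stage 1 is approximated by flagging the lowest-loss samples after early training (backdoored examples tend to be learned fastest), and stage 2 applies gradient ascent (a negated loss) on the flagged set while descending on the rest. The loss quantile, the loader yielding sample indices, and the schedule are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def isolate_low_loss(model, loader, quantile=0.01, device="cpu"):
    """Stage 1 (simplified): after early training, flag the lowest-loss samples,
    which backdoored examples tend to dominate because they are learned fastest.
    The loader is assumed to yield (sample_index, x, y)."""
    ce = nn.CrossEntropyLoss(reduction="none")
    model.eval()
    losses, indices = [], []
    with torch.no_grad():
        for idx, x, y in loader:
            losses.append(ce(model(x.to(device)), y.to(device)).cpu())
            indices.append(idx)
    losses, indices = torch.cat(losses), torch.cat(indices)
    cutoff = torch.quantile(losses, quantile)
    return set(indices[losses <= cutoff].tolist())   # suspected backdoor examples

def unlearning_step(model, opt, x, y, suspect_mask, device="cpu"):
    """Stage 2 (simplified): descend on clean-looking samples, ascend on suspects
    to break the trigger-to-target-class correlation."""
    ce = nn.CrossEntropyLoss(reduction="none")
    model.train()
    losses = ce(model(x.to(device)), y.to(device))
    signed = torch.where(suspect_mask.to(device), -losses, losses)  # ascent on suspects
    opt.zero_grad()
    signed.mean().backward()
    opt.step()
```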
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Modern autonomous vehicles adopt state-of-the-art DNN models to interpret sensor data and perceive the environment. However, DNN models are vulnerable to different types of adversarial attacks, which pose significant risks to the security and safety of the vehicles and passengers. One prominent threat is the backdoor attack, where the adversary can compromise the DNN model by poisoning the training samples. Although a significant amount of effort has been devoted to investigating backdoor attacks against conventional computer vision tasks, their practicality and applicability to autonomous driving scenarios, particularly in the physical world, have rarely been explored. In this paper, we target the lane detection system, an indispensable module for many autonomous driving tasks such as navigation and lane switching. We design and realize the first physical backdoor attack on such systems. Our attacks are comprehensively effective against different types of lane detection algorithms. Specifically, we introduce two attack methodologies (poison-annotation and clean-annotation) to generate poisoned samples. With those samples, the trained lane detection model will be infected with the backdoor and can be activated by common objects (e.g., traffic cones) to make wrong detections, leading the vehicle to drive off the road or onto the opposite lane. Extensive evaluations on public datasets and physical autonomous vehicles demonstrate that our backdoor attacks are effective, stealthy, and robust against various defense solutions. Our code and experimental videos can be found at https://sites.google.com/view/lane-detection-attack/lda.
Recent works have demonstrated that deep learning models are vulnerable to backdoor poisoning attacks, where these attacks instill spurious correlations to external trigger patterns or objects (e.g., stickers, sunglasses, etc.). We find that such external trigger signals are unnecessary, as highly effective backdoors can be easily inserted using rotation-based image transformations. Our method constructs the poisoned dataset by rotating a limited number of objects and labeling them incorrectly; once trained on it, the victim's model will make undesirable predictions during run-time inference. The attack exhibits a significantly high success rate while maintaining clean performance, as shown through comprehensive empirical studies on image classification and object detection tasks. Furthermore, we evaluate standard data augmentation techniques and four different backdoor defenses against our attack and find that none of them can serve as a consistent mitigation. As we show in both image classification and object detection applications, our attack can be easily deployed in the real world since it only requires rotating the object. Overall, our work highlights a new, simple, physically realizable, and highly effective vector for backdoor attacks. Our video demo is available at https://youtu.be/6jif8wnx34m.
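A short sketch of the poisoning recipe the abstract describes, assuming a torchvision-style image dataset: rotate a small fraction of the samples by a fixed angle and relabel them to the attacker's target class. The poison rate, rotation angle, and target label are illustrative values, not the paper's settings.

```python
import random
import torchvision.transforms.functional as TF

def rotation_poison(dataset, target_label, angle=45.0, poison_rate=0.05, seed=0):
    """Build a poisoned list of (image, label) pairs: a few samples are rotated
    and mislabeled as `target_label`; the rest are kept unchanged."""
    rng = random.Random(seed)
    poisoned = []
    for img, label in dataset:               # img assumed to be a PIL image or tensor
        if rng.random() < poison_rate:
            poisoned.append((TF.rotate(img, angle), target_label))
        else:
            poisoned.append((img, label))
    return poisoned
```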