Transforming off-the-shelf deep neural network (DNN) models into dynamic multi-exit architectures can achieve inference and transmission efficiency by fragmenting and distributing a large DNN model in edge computing scenarios (e.g., edge devices and cloud servers). In this paper, we propose a novel backdoor attack specifically on the dynamic multi-exit DNN models. Particularly, we inject a backdoor by poisoning one DNN model's shallow hidden layers targeting not this vanilla DNN model but only its dynamically deployed multi-exit architectures. Our backdoored vanilla model behaves normally on performance and cannot be activated even with the correct trigger. However, the backdoor will be activated when the victims acquire this model and transform it into a dynamic multi-exit architecture at their deployment. We conduct extensive experiments to prove the effectiveness of our attack on three structures (ResNet-56, VGG-16, and MobileNet) with four datasets (CIFAR-10, SVHN, GTSRB, and Tiny-ImageNet) and our backdoor is stealthy to evade multiple state-of-the-art backdoor detection or removal methods.
translated by 谷歌翻译
Backdoor attacks have emerged as one of the major security threats to deep learning models as they can easily control the model's test-time predictions by pre-injecting a backdoor trigger into the model at training time. While backdoor attacks have been extensively studied on images, few works have investigated the threat of backdoor attacks on time series data. To fill this gap, in this paper we present a novel generative approach for time series backdoor attacks against deep learning based time series classifiers. Backdoor attacks have two main goals: high stealthiness and high attack success rate. We find that, compared to images, it can be more challenging to achieve the two goals on time series. This is because time series have fewer input dimensions and lower degrees of freedom, making it hard to achieve a high attack success rate without compromising stealthiness. Our generative approach addresses this challenge by generating trigger patterns that are as realistic as real-time series patterns while achieving a high attack success rate without causing a significant drop in clean accuracy. We also show that our proposed attack is resistant to potential backdoor defenses. Furthermore, we propose a novel universal generator that can poison any type of time series with a single generator that allows universal attacks without the need to fine-tune the generative model for new time series datasets.
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
最近的研究表明,深层神经网络容易受到不同类型的攻击,例如对抗性攻击,数据中毒攻击和后门攻击。其中,后门攻击是最狡猾的攻击,几乎可以在深度学习管道的每个阶段发生。因此,后门攻击吸引了学术界和行业的许多兴趣。但是,大多数现有的后门攻击方法对于某些轻松的预处理(例如常见数据转换)都是可见的或脆弱的。为了解决这些限制,我们提出了一种强大而无形的后门攻击,称为“毒药”。具体而言,我们首先利用图像结构作为目标中毒区域,并用毒药(信息)填充它们以生成触发图案。由于图像结构可以在数据转换期间保持其语义含义,因此这种触发模式对数据转换本质上是强大的。然后,我们利用深度注射网络将这种触发模式嵌入封面图像中,以达到隐身性。与现有流行的后门攻击方法相比,毒药的墨水在隐形和健壮性方面都优于表现。通过广泛的实验,我们证明了毒药不仅是不同数据集和网络体系结构的一般性,而且对于不同的攻击场景也很灵活。此外,它对许多最先进的防御技术也具有非常强烈的抵抗力。
translated by 谷歌翻译
最近,后门攻击已成为对深神经网络(DNN)模型安全性的新兴威胁。迄今为止,大多数现有研究都集中于对未压缩模型的后门攻击。尽管在实际应用中广泛使用的压缩DNN的脆弱性尚未得到利用。在本文中,我们建议研究和发展针对紧凑型DNN模型(RIBAC)的强大和不可感知的后门攻击。通过对重要设计旋钮进行系统分析和探索,我们提出了一个框架,该框架可以有效地学习适当的触发模式,模型参数和修剪口罩。从而同时达到高触发隐形性,高攻击成功率和高模型效率。跨不同数据集的广泛评估,包括针对最先进的防御机制的测试,证明了RIBAC的高鲁棒性,隐身性和模型效率。代码可从https://github.com/huyvnphan/eccv2022-ribac获得
translated by 谷歌翻译
A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model by leveraging the difficulty in interpretability of the learned model to misclassify any inputs signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision system. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of predicted classes for perturbed inputs from a given deployed model-malicious or benign. A low entropy in predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input-a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved result of 0% for both FRR and FAR. We have also evaluated STRIP robustness against a number of trojan attack variants and adaptive attacks.
translated by 谷歌翻译
Backdoor attacks represent one of the major threats to machine learning models. Various efforts have been made to mitigate backdoors. However, existing defenses have become increasingly complex and often require high computational resources or may also jeopardize models' utility. In this work, we show that fine-tuning, one of the most common and easy-to-adopt machine learning training operations, can effectively remove backdoors from machine learning models while maintaining high model utility. Extensive experiments over three machine learning paradigms show that fine-tuning and our newly proposed super-fine-tuning achieve strong defense performance. Furthermore, we coin a new term, namely backdoor sequela, to measure the changes in model vulnerabilities to other attacks before and after the backdoor has been removed. Empirical evaluation shows that, compared to other defense methods, super-fine-tuning leaves limited backdoor sequela. We hope our results can help machine learning model owners better protect their models from backdoor threats. Also, it calls for the design of more advanced attacks in order to comprehensively assess machine learning models' backdoor vulnerabilities.
translated by 谷歌翻译
与令人印象深刻的进步触动了我们社会的各个方面,基于深度神经网络(DNN)的AI技术正在带来越来越多的安全问题。虽然在考试时间运行的攻击垄断了研究人员的初始关注,但是通过干扰培训过程来利用破坏DNN模型的可能性,代表了破坏训练过程的可能性,这是破坏AI技术的可靠性的进一步严重威胁。在后门攻击中,攻击者损坏了培训数据,以便在测试时间诱导错误的行为。然而,测试时间误差仅在存在与正确制作的输入样本对应的触发事件的情况下被激活。通过这种方式,损坏的网络继续正常输入的预期工作,并且只有当攻击者决定激活网络内隐藏的后门时,才会发生恶意行为。在过去几年中,后门攻击一直是强烈的研究活动的主题,重点是新的攻击阶段的发展,以及可能对策的提议。此概述文件的目标是审查发表的作品,直到现在,分类到目前为止提出的不同类型的攻击和防御。指导分析的分类基于攻击者对培训过程的控制量,以及防御者验证用于培训的数据的完整性,并监控DNN在培训和测试中的操作时间。因此,拟议的分析特别适合于参考他们在运营的应用方案的攻击和防御的强度和弱点。
translated by 谷歌翻译
AI安全社区的一个主要目标是为现实世界应用安全可靠地生产和部署深入学习模型。为此,近年来,在生产阶段(或培训阶段)和相应的防御中,基于数据中毒基于深度神经网络(DNN)的后门攻击以及相应的防御。具有讽刺意味的是,部署阶段的后门攻击,这些攻击通常可以在不专业用户的设备中发生,因此可以说是在现实世界的情景中威胁要威胁,得以更少的关注社区。我们将这种警惕的不平衡归因于现有部署阶段后门攻击算法的弱实用性以及现实世界攻击示范的不足。为了填补空白,在这项工作中,我们研究了对DNN的部署阶段后门攻击的现实威胁。我们基于普通使用的部署阶段攻击范式 - 对抗对抗权重攻击的研究,主体选择性地修改模型权重,以将后台嵌入到部署的DNN中。为了实现现实的实用性,我们提出了第一款灰度盒和物理可实现的重量攻击算法,即替换注射,即子网替换攻击(SRA),只需要受害者模型的架构信息,并且可以支持现实世界中的物理触发器。进行了广泛的实验模拟和系统级真实的世界攻击示范。我们的结果不仅提出了所提出的攻击算法的有效性和实用性,还揭示了一种新型计算机病毒的实际风险,这些计算机病毒可能会广泛传播和悄悄地将后门注入用户设备中的DNN模型。通过我们的研究,我们要求更多地关注DNN在部署阶段的脆弱性。
translated by 谷歌翻译
后门攻击已成为深度神经网络(DNN)的主要安全威胁。虽然现有的防御方法在检测或擦除后以后展示了有希望的结果,但仍然尚不清楚是否可以设计强大的培训方法,以防止后门触发器首先注入训练的模型。在本文中,我们介绍了\ emph {反后门学习}的概念,旨在培训\ emph {Clean}模型给出了后门中毒数据。我们将整体学习过程框架作为学习\ emph {clean}和\ emph {backdoor}部分的双重任务。从这种观点来看,我们确定了两个后门攻击的固有特征,因为他们的弱点2)后门任务与特定类(后门目标类)相关联。根据这两个弱点,我们提出了一般学习计划,反后门学习(ABL),在培训期间自动防止后门攻击。 ABL引入了标准培训的两级\ EMPH {梯度上升}机制,帮助分离早期训练阶段的后台示例,2)在后续训练阶段中断后门示例和目标类之间的相关性。通过对多个基准数据集的广泛实验,针对10个最先进的攻击,我们经验证明,后卫中毒数据上的ABL培训模型实现了与纯净清洁数据训练的相同性能。代码可用于\ url {https:/github.com/boylyg/abl}。
translated by 谷歌翻译
最近的研究表明,深度神经网络(DNN)容易受到后门攻击的影响,后门攻击会导致DNN的恶意行为,当时特定的触发器附在输入图像上时。进一步证明,感染的DNN具有一系列通道,与正常通道相比,该通道对后门触发器更敏感。然后,将这些通道修剪可有效缓解后门行为。要定位这些通道,自然要考虑其Lipschitzness,这可以衡量他们对输入上最严重的扰动的敏感性。在这项工作中,我们介绍了一个名为Channel Lipschitz常数(CLC)的新颖概念,该概念定义为从输入图像到每个通道输出的映射的Lipschitz常数。然后,我们提供经验证据,以显示CLC(UCLC)上限与通道激活的触发激活变化之间的强相关性。由于可以从重量矩阵直接计算UCLC,因此我们可以以无数据的方式检测潜在的后门通道,并在感染的DNN上进行简单修剪以修复模型。提出的基于lipschitzness的通道修剪(CLP)方法非常快速,简单,无数据且可靠,可以选择修剪阈值。进行了广泛的实验来评估CLP的效率和有效性,CLP的效率和有效性也可以在主流防御方法中获得最新的结果。源代码可在https://github.com/rkteddy/channel-lipschitzness基于普通范围内获得。
translated by 谷歌翻译
在对抗机器学习中,防止对深度学习系统的攻击的新防御能力在释放更强大的攻击后不久就会破坏。在这种情况下,法医工具可以通过追溯成功的根本原因来为现有防御措施提供宝贵的补充,并为缓解措施提供前进的途径,以防止将来采取类似的攻击。在本文中,我们描述了我们为开发用于深度神经网络毒物攻击的法医追溯工具的努力。我们提出了一种新型的迭代聚类和修剪解决方案,该解决方案修剪了“无辜”训练样本,直到所有剩余的是一组造成攻击的中毒数据。我们的方法群群训练样本基于它们对模型参数的影响,然后使用有效的数据解读方法来修剪无辜簇。我们从经验上证明了系统对三种类型的肮脏标签(后门)毒物攻击和三种类型的清洁标签毒药攻击的功效,这些毒物跨越了计算机视觉和恶意软件分类。我们的系统在所有攻击中都达到了98.4%的精度和96.8%的召回。我们还表明,我们的系统与专门攻击它的四种抗纤维法措施相对强大。
translated by 谷歌翻译
后门攻击已被证明是对深度学习模型的严重安全威胁,并且检测给定模型是否已成为后门成为至关重要的任务。现有的防御措施主要建立在观察到后门触发器通常尺寸很小或仅影响几个神经元激活的观察结果。但是,在许多情况下,尤其是对于高级后门攻击,违反了上述观察结果,阻碍了现有防御的性能和适用性。在本文中,我们提出了基于新观察的后门防御范围。也就是说,有效的后门攻击通常需要对中毒训练样本的高预测置信度,以确保训练有素的模型具有很高的可能性。基于此观察结果,Dtinspector首先学习一个可以改变最高信心数据的预测的补丁,然后通过检查在低信心数据上应用学习补丁后检查预测变化的比率来决定后门的存在。对五次后门攻击,四个数据集和三种高级攻击类型的广泛评估证明了拟议防御的有效性。
translated by 谷歌翻译
典型的深神经网络(DNN)后门攻击基于输入中嵌入的触发因素。现有的不可察觉的触发因素在计算上昂贵或攻击成功率低。在本文中,我们提出了一个新的后门触发器,该扳机易于生成,不可察觉和高效。新的触发器是一个均匀生成的三维(3D)二进制图案,可以水平和/或垂直重复和镜像,并将其超级贴在三通道图像上,以训练后式DNN模型。新型触发器分散在整个图像中,对单个像素产生微弱的扰动,但共同拥有强大的识别模式来训练和激活DNN的后门。我们还通过分析表明,随着图像的分辨率提高,触发因素越来越有效。实验是使用MNIST,CIFAR-10和BTSR数据集上的RESNET-18和MLP模型进行的。在无遗象的方面,新触发的表现优于现有的触发器,例如Badnet,Trojaned NN和隐藏的后门。新的触发因素达到了几乎100%的攻击成功率,仅将分类准确性降低了不到0.7%-2.4%,并使最新的防御技术无效。
translated by 谷歌翻译
特洛伊木马后门是针对神经网络(NN)分类器的中毒攻击,对手试图利用(高度理想的)模型重用属性将特洛伊木马植入模型参数中,以通过中毒训练过程进行后门漏洞。大多数针对特洛伊木马攻击的防御措施都假设了白盒设置,其中防守者可以访问NN的内部状态,或者能够通过它进行后传播。在这项工作中,我们提出了一个更实用的黑盒防御,称为Trojdef,只能在NN上进行前进。 Trojdef试图通过监视输入因随机噪声反复扰动预测置信度的变化来识别和滤除特洛伊木马输入(即用Trojan触发器增强的输入)。我们根据预测输出得出一个函数,该函数称为预测置信度,以决定输入示例是否为特洛伊木马。直觉是,由于错误分类仅取决于触发因素,因此特洛伊木马的输入更加稳定,而由于分类特征的扰动,良性输入会受到损失。通过数学分析,我们表明,如果攻击者在注入后门时是完美的,则将训练特洛伊木马感染的模型以学习适当的预测置信度结合,该模型用于区分特洛伊木马和良性输入,并在任意扰动下。但是,由于攻击者在注入后门时可能不是完美的,因此我们将非线性转换引入了预测置信度,以提高实际环境中的检测准确性。广泛的经验评估表明,即使分类器体系结构,培训过程或超参数变化,Trojdef的表现明显优于州的防御能力,并且在不同的设置下也很稳定。
translated by 谷歌翻译
深度神经网络(DNNS)在训练过程中容易受到后门攻击的影响。该模型以这种方式损坏正常起作用,但是当输入中的某些模式触发时,会产生预定义的目标标签。现有防御通常依赖于通用后门设置的假设,其中有毒样品共享相同的均匀扳机。但是,最近的高级后门攻击表明,这种假设在动态后门中不再有效,在动态后门中,触发者因输入而异,从而击败了现有的防御。在这项工作中,我们提出了一种新颖的技术BEATRIX(通过革兰氏矩阵检测)。 BEATRIX利用革兰氏矩阵不仅捕获特征相关性,还可以捕获表示形式的适当高阶信息。通过从正常样本的激活模式中学习类条件统计,BEATRIX可以通过捕获激活模式中的异常来识别中毒样品。为了进一步提高识别目标标签的性能,BEATRIX利用基于内核的测试,而无需对表示分布进行任何先前的假设。我们通过与最先进的防御技术进行了广泛的评估和比较来证明我们的方法的有效性。实验结果表明,我们的方法在检测动态后门时达到了91.1%的F1得分,而最新技术只能达到36.9%。
translated by 谷歌翻译
在现实世界应用中的深度神经网络(DNN)的成功受益于丰富的预训练模型。然而,回溯预训练模型可以对下游DNN的部署构成显着的特洛伊木马威胁。现有的DNN测试方法主要旨在在对抗性设置中找到错误的角壳行为,但未能发现由强大的木马攻击所制作的后门。观察特洛伊木马网络行为表明,它们不仅由先前的工作所提出的单一受损神经元反射,而且归因于在多个神经元的激活强度和频率中的关键神经路径。这项工作制定了DNN后门测试,并提出了录音机框架。通过少量良性示例的关键神经元的差异模糊,我们识别特洛伊木马路径,特别是临界人,并通过模拟所识别的路径中的关键神经元来产生后门测试示例。广泛的实验表明了追索者的优越性,比现有方法更高的检测性能。通过隐秘的混合和自适应攻击来检测到后门的录音机更好,现有方法无法检测到。此外,我们的实验表明,录音所可能会揭示模型动物园中的模型的潜在潜在的背面。
translated by 谷歌翻译
现代自动驾驶汽车采用最先进的DNN模型来解释传感器数据并感知环境。但是,DNN模型容易受到不同类型的对抗攻击的影响,这对车辆和乘客的安全性和安全性构成了重大风险。一个突出的威胁是后门攻击,对手可以通过中毒训练样本来妥协DNN模型。尽管已经大量精力致力于调查后门攻击对传统的计算机视觉任务,但很少探索其对自主驾驶场景的实用性和适用性,尤其是在物理世界中。在本文中,我们针对车道检测系统,该系统是许多自动驾驶任务,例如导航,车道切换的必不可少的模块。我们设计并实现了对此类系统的第一次物理后门攻击。我们的攻击是针对不同类型的车道检测算法的全面有效的。具体而言,我们引入了两种攻击方法(毒药和清洁量)来生成中毒样本。使用这些样品,训练有素的车道检测模型将被后门感染,并且可以通过公共物体(例如,交通锥)进行启动,以进行错误的检测,导致车辆从道路上或在相反的车道上行驶。对公共数据集和物理自动驾驶汽车的广泛评估表明,我们的后门攻击对各种防御解决方案都是有效,隐秘和强大的。我们的代码和实验视频可以在https://sites.google.com/view/lane-detection-attack/lda中找到。
translated by 谷歌翻译
Open software supply chain attacks, once successful, can exact heavy costs in mission-critical applications. As open-source ecosystems for deep learning flourish and become increasingly universal, they present attackers previously unexplored avenues to code-inject malicious backdoors in deep neural network models. This paper proposes Flareon, a small, stealthy, seemingly harmless code modification that specifically targets the data augmentation pipeline with motion-based triggers. Flareon neither alters ground-truth labels, nor modifies the training loss objective, nor does it assume prior knowledge of the victim model architecture, training data, and training hyperparameters. Yet, it has a surprisingly large ramification on training -- models trained under Flareon learn powerful target-conditional (or "any2any") backdoors. The resulting models can exhibit high attack success rates for any target choices and better clean accuracies than backdoor attacks that not only seize greater control, but also assume more restrictive attack capabilities. We also demonstrate the effectiveness of Flareon against recent defenses. Flareon is fully open-source and available online to the deep learning community: https://github.com/lafeat/flareon.
translated by 谷歌翻译
本文提出了针对回顾性神经网络(Badnets)的新型两级防御(NNOCULICULE),该案例在响应该字段中遇到的回溯测试输入,修复了预部署和在线的BADNET。在预部署阶段,NNICULICULE与清洁验证输入的随机扰动进行检测,以部分减少后门的对抗影响。部署后,NNOCULICULE通过在原始和预先部署修补网络之间录制分歧来检测和隔离测试输入。然后培训Constcan以学习清洁验证和隔离输入之间的转换;即,它学会添加触发器来清洁验证图像。回顾验证图像以及其正确的标签用于进一步重新培训预修补程序,产生我们的最终防御。关于全面的后门攻击套件的实证评估表明,NNOCLICULE优于所有最先进的防御,以制定限制性假设,并且仅在特定的后门攻击上工作,或者在适应性攻击中失败。相比之下,NNICULICULE使得最小的假设并提供有效的防御,即使在现有防御因攻击者而导致其限制假设而导致的现有防御无效的情况下。
translated by 谷歌翻译