Backdoor attacks have emerged as one of the major security threats to deep learning models as they can easily control the model's test-time predictions by pre-injecting a backdoor trigger into the model at training time. While backdoor attacks have been extensively studied on images, few works have investigated the threat of backdoor attacks on time series data. To fill this gap, in this paper we present a novel generative approach for time series backdoor attacks against deep learning based time series classifiers. Backdoor attacks have two main goals: high stealthiness and high attack success rate. We find that, compared to images, it can be more challenging to achieve the two goals on time series. This is because time series have fewer input dimensions and lower degrees of freedom, making it hard to achieve a high attack success rate without compromising stealthiness. Our generative approach addresses this challenge by generating trigger patterns that are as realistic as real-time series patterns while achieving a high attack success rate without causing a significant drop in clean accuracy. We also show that our proposed attack is resistant to potential backdoor defenses. Furthermore, we propose a novel universal generator that can poison any type of time series with a single generator that allows universal attacks without the need to fine-tune the generative model for new time series datasets.
translated by 谷歌翻译
后门攻击已成为深度神经网络(DNN)的主要安全威胁。虽然现有的防御方法在检测或擦除后以后展示了有希望的结果,但仍然尚不清楚是否可以设计强大的培训方法,以防止后门触发器首先注入训练的模型。在本文中,我们介绍了\ emph {反后门学习}的概念,旨在培训\ emph {Clean}模型给出了后门中毒数据。我们将整体学习过程框架作为学习\ emph {clean}和\ emph {backdoor}部分的双重任务。从这种观点来看,我们确定了两个后门攻击的固有特征,因为他们的弱点2)后门任务与特定类(后门目标类)相关联。根据这两个弱点,我们提出了一般学习计划,反后门学习(ABL),在培训期间自动防止后门攻击。 ABL引入了标准培训的两级\ EMPH {梯度上升}机制,帮助分离早期训练阶段的后台示例,2)在后续训练阶段中断后门示例和目标类之间的相关性。通过对多个基准数据集的广泛实验,针对10个最先进的攻击,我们经验证明,后卫中毒数据上的ABL培训模型实现了与纯净清洁数据训练的相同性能。代码可用于\ url {https:/github.com/boylyg/abl}。
translated by 谷歌翻译
人群计数是一项回归任务,它估计场景图像中的人数,在一系列安全至关重要的应用程序中起着至关重要的作用,例如视频监视,交通监控和流量控制。在本文中,我们研究了基于深度学习的人群计数模型对后门攻击的脆弱性,这是对深度学习的主要安全威胁。后门攻击者通过数据中毒将后门触发植入目标模型,以控制测试时间的预测。与已经开发和测试的大多数现有后门攻击的图像分类模型不同,人群计数模型是输出多维密度图的回归模型,因此需要不同的技术来操纵。在本文中,我们提出了两次新颖的密度操纵后门攻击(DMBA $^{ - } $和DMBA $^{+} $),以攻击模型以产生任意的大或小密度估计。实验结果证明了我们对五个经典人群计数模型和四种类型数据集的DMBA攻击的有效性。我们还深入分析了后门人群计数模型的独特挑战,并揭示了有效攻击的两个关键要素:1)完整而密集的触发器以及2)操纵地面真相计数或密度图。我们的工作可以帮助评估人群计数模型对潜在后门攻击的脆弱性。
translated by 谷歌翻译
后门攻击已被证明是对深度学习模型的严重安全威胁,并且检测给定模型是否已成为后门成为至关重要的任务。现有的防御措施主要建立在观察到后门触发器通常尺寸很小或仅影响几个神经元激活的观察结果。但是,在许多情况下,尤其是对于高级后门攻击,违反了上述观察结果,阻碍了现有防御的性能和适用性。在本文中,我们提出了基于新观察的后门防御范围。也就是说,有效的后门攻击通常需要对中毒训练样本的高预测置信度,以确保训练有素的模型具有很高的可能性。基于此观察结果,Dtinspector首先学习一个可以改变最高信心数据的预测的补丁,然后通过检查在低信心数据上应用学习补丁后检查预测变化的比率来决定后门的存在。对五次后门攻击,四个数据集和三种高级攻击类型的广泛评估证明了拟议防御的有效性。
translated by 谷歌翻译
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, there are very few works that explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects to that model. We show that poisoned-label image backdoor attacks could be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. And, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough for achieving a high attack success rate.
translated by 谷歌翻译
最近的研究表明,深层神经网络容易受到不同类型的攻击,例如对抗性攻击,数据中毒攻击和后门攻击。其中,后门攻击是最狡猾的攻击,几乎可以在深度学习管道的每个阶段发生。因此,后门攻击吸引了学术界和行业的许多兴趣。但是,大多数现有的后门攻击方法对于某些轻松的预处理(例如常见数据转换)都是可见的或脆弱的。为了解决这些限制,我们提出了一种强大而无形的后门攻击,称为“毒药”。具体而言,我们首先利用图像结构作为目标中毒区域,并用毒药(信息)填充它们以生成触发图案。由于图像结构可以在数据转换期间保持其语义含义,因此这种触发模式对数据转换本质上是强大的。然后,我们利用深度注射网络将这种触发模式嵌入封面图像中,以达到隐身性。与现有流行的后门攻击方法相比,毒药的墨水在隐形和健壮性方面都优于表现。通过广泛的实验,我们证明了毒药不仅是不同数据集和网络体系结构的一般性,而且对于不同的攻击场景也很灵活。此外,它对许多最先进的防御技术也具有非常强烈的抵抗力。
translated by 谷歌翻译
与令人印象深刻的进步触动了我们社会的各个方面,基于深度神经网络(DNN)的AI技术正在带来越来越多的安全问题。虽然在考试时间运行的攻击垄断了研究人员的初始关注,但是通过干扰培训过程来利用破坏DNN模型的可能性,代表了破坏训练过程的可能性,这是破坏AI技术的可靠性的进一步严重威胁。在后门攻击中,攻击者损坏了培训数据,以便在测试时间诱导错误的行为。然而,测试时间误差仅在存在与正确制作的输入样本对应的触发事件的情况下被激活。通过这种方式,损坏的网络继续正常输入的预期工作,并且只有当攻击者决定激活网络内隐藏的后门时,才会发生恶意行为。在过去几年中,后门攻击一直是强烈的研究活动的主题,重点是新的攻击阶段的发展,以及可能对策的提议。此概述文件的目标是审查发表的作品,直到现在,分类到目前为止提出的不同类型的攻击和防御。指导分析的分类基于攻击者对培训过程的控制量,以及防御者验证用于培训的数据的完整性,并监控DNN在培训和测试中的操作时间。因此,拟议的分析特别适合于参考他们在运营的应用方案的攻击和防御的强度和弱点。
translated by 谷歌翻译
典型的深神经网络(DNN)后门攻击基于输入中嵌入的触发因素。现有的不可察觉的触发因素在计算上昂贵或攻击成功率低。在本文中,我们提出了一个新的后门触发器,该扳机易于生成,不可察觉和高效。新的触发器是一个均匀生成的三维(3D)二进制图案,可以水平和/或垂直重复和镜像,并将其超级贴在三通道图像上,以训练后式DNN模型。新型触发器分散在整个图像中,对单个像素产生微弱的扰动,但共同拥有强大的识别模式来训练和激活DNN的后门。我们还通过分析表明,随着图像的分辨率提高,触发因素越来越有效。实验是使用MNIST,CIFAR-10和BTSR数据集上的RESNET-18和MLP模型进行的。在无遗象的方面,新触发的表现优于现有的触发器,例如Badnet,Trojaned NN和隐藏的后门。新的触发因素达到了几乎100%的攻击成功率,仅将分类准确性降低了不到0.7%-2.4%,并使最新的防御技术无效。
translated by 谷歌翻译
Transforming off-the-shelf deep neural network (DNN) models into dynamic multi-exit architectures can achieve inference and transmission efficiency by fragmenting and distributing a large DNN model in edge computing scenarios (e.g., edge devices and cloud servers). In this paper, we propose a novel backdoor attack specifically on the dynamic multi-exit DNN models. Particularly, we inject a backdoor by poisoning one DNN model's shallow hidden layers targeting not this vanilla DNN model but only its dynamically deployed multi-exit architectures. Our backdoored vanilla model behaves normally on performance and cannot be activated even with the correct trigger. However, the backdoor will be activated when the victims acquire this model and transform it into a dynamic multi-exit architecture at their deployment. We conduct extensive experiments to prove the effectiveness of our attack on three structures (ResNet-56, VGG-16, and MobileNet) with four datasets (CIFAR-10, SVHN, GTSRB, and Tiny-ImageNet) and our backdoor is stealthy to evade multiple state-of-the-art backdoor detection or removal methods.
translated by 谷歌翻译
本文提出了针对回顾性神经网络(Badnets)的新型两级防御(NNOCULICULE),该案例在响应该字段中遇到的回溯测试输入,修复了预部署和在线的BADNET。在预部署阶段,NNICULICULE与清洁验证输入的随机扰动进行检测,以部分减少后门的对抗影响。部署后,NNOCULICULE通过在原始和预先部署修补网络之间录制分歧来检测和隔离测试输入。然后培训Constcan以学习清洁验证和隔离输入之间的转换;即,它学会添加触发器来清洁验证图像。回顾验证图像以及其正确的标签用于进一步重新培训预修补程序,产生我们的最终防御。关于全面的后门攻击套件的实证评估表明,NNOCLICULE优于所有最先进的防御,以制定限制性假设,并且仅在特定的后门攻击上工作,或者在适应性攻击中失败。相比之下,NNICULICULE使得最小的假设并提供有效的防御,即使在现有防御因攻击者而导致其限制假设而导致的现有防御无效的情况下。
translated by 谷歌翻译
普遍的后门是由动态和普遍的输入扰动触发的。它们可以被攻击者故意注射,也可以自然存在于经过正常训练的模型中。它们的性质与传统的静态和局部后门不同,可以通过扰动带有一些固定图案的小输入区域来触发,例如带有纯色的贴片。现有的防御技术对于传统后门非常有效。但是,它们可能对普遍的后门无法正常工作,尤其是在后门去除和模型硬化方面。在本文中,我们提出了一种针对普遍的后门,包括天然和注射后门的新型模型硬化技术。我们基于通过特殊转换层增强的编码器架构来开发一般的普遍攻击。该攻击可以对现有的普遍后门攻击进行建模,并通过类距离进行量化。因此,使用我们在对抗训练中攻击的样品可以使模型与这些后门漏洞相比。我们对9个具有15个模型结构的9个数据集的评估表明,我们的技术可以平均扩大阶级距离59.65%,精度降解且没有稳健性损失,超过了五种硬化技术,例如对抗性训练,普遍的对抗训练,Moth,Moth等, 。它可以将六次普遍后门攻击的攻击成功率从99.06%降低到1.94%,超过七种最先进的后门拆除技术。
translated by 谷歌翻译
Backdoor attacks represent one of the major threats to machine learning models. Various efforts have been made to mitigate backdoors. However, existing defenses have become increasingly complex and often require high computational resources or may also jeopardize models' utility. In this work, we show that fine-tuning, one of the most common and easy-to-adopt machine learning training operations, can effectively remove backdoors from machine learning models while maintaining high model utility. Extensive experiments over three machine learning paradigms show that fine-tuning and our newly proposed super-fine-tuning achieve strong defense performance. Furthermore, we coin a new term, namely backdoor sequela, to measure the changes in model vulnerabilities to other attacks before and after the backdoor has been removed. Empirical evaluation shows that, compared to other defense methods, super-fine-tuning leaves limited backdoor sequela. We hope our results can help machine learning model owners better protect their models from backdoor threats. Also, it calls for the design of more advanced attacks in order to comprehensively assess machine learning models' backdoor vulnerabilities.
translated by 谷歌翻译
视觉变压器(VIT)最近在各种视觉任务上表现出了典范的性能,并被用作CNN的替代方案。它们的设计基于一种自我发挥的机制,该机制将图像作为一系列斑块进行处理,与CNN相比,这是完全不同的。因此,研究VIT是否容易受到后门攻击的影响很有趣。当攻击者出于恶意目的,攻击者毒害培训数据的一小部分时,就会发生后门攻击。模型性能在干净的测试图像上很好,但是攻击者可以通过在测试时间显示触发器来操纵模型的决策。据我们所知,我们是第一个证明VIT容易受到后门攻击的人。我们还发现VIT和CNNS之间存在着有趣的差异 - 解释算法有效地突出了VIT的测试图像的触发因素,但没有针对CNN。基于此观察结果,我们提出了一个测试时间图像阻止VIT的防御,这将攻击成功率降低了很大。代码可在此处找到:https://github.com/ucdvision/backdoor_transformer.git
translated by 谷歌翻译
最近的研究表明,尽管在许多现实世界应用上达到了很高的精度,但深度神经网络(DNN)可以被换式:通过将触发的数据样本注入培训数据集中,对手可以将受过训练的模型误导到将任何测试数据分类为将任何测试数据分类为只要提出触发模式,目标类。为了消除此类后门威胁,已经提出了各种方法。特别是,一系列研究旨在净化潜在的损害模型。但是,这项工作的一个主要限制是访问足够的原始培训数据的要求:当可用的培训数据受到限制时,净化性能要差得多。在这项工作中,我们提出了对抗重量掩蔽(AWM),这是一种即使在单一设置中也能擦除神经后门的新颖方法。我们方法背后的关键思想是将其提出为最小最大优化问题:首先,对抗恢复触发模式,然后(软)掩盖对恢复模式敏感的网络权重。对几个基准数据集的全面评估表明,AWM在很大程度上可以改善对各种可用培训数据集大小的其他最先进方法的纯化效果。
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
最近的研究表明,深度神经网络(DNN)容易受到后门攻击的影响,后门攻击会导致DNN的恶意行为,当时特定的触发器附在输入图像上时。进一步证明,感染的DNN具有一系列通道,与正常通道相比,该通道对后门触发器更敏感。然后,将这些通道修剪可有效缓解后门行为。要定位这些通道,自然要考虑其Lipschitzness,这可以衡量他们对输入上最严重的扰动的敏感性。在这项工作中,我们介绍了一个名为Channel Lipschitz常数(CLC)的新颖概念,该概念定义为从输入图像到每个通道输出的映射的Lipschitz常数。然后,我们提供经验证据,以显示CLC(UCLC)上限与通道激活的触发激活变化之间的强相关性。由于可以从重量矩阵直接计算UCLC,因此我们可以以无数据的方式检测潜在的后门通道,并在感染的DNN上进行简单修剪以修复模型。提出的基于lipschitzness的通道修剪(CLP)方法非常快速,简单,无数据且可靠,可以选择修剪阈值。进行了广泛的实验来评估CLP的效率和有效性,CLP的效率和有效性也可以在主流防御方法中获得最新的结果。源代码可在https://github.com/rkteddy/channel-lipschitzness基于普通范围内获得。
translated by 谷歌翻译
We conduct a systematic study of backdoor vulnerabilities in normally trained Deep Learning models. They are as dangerous as backdoors injected by data poisoning because both can be equally exploited. We leverage 20 different types of injected backdoor attacks in the literature as the guidance and study their correspondences in normally trained models, which we call natural backdoor vulnerabilities. We find that natural backdoors are widely existing, with most injected backdoor attacks having natural correspondences. We categorize these natural backdoors and propose a general detection framework. It finds 315 natural backdoors in the 56 normally trained models downloaded from the Internet, covering all the different categories, while existing scanners designed for injected backdoors can at most detect 65 backdoors. We also study the root causes and defense of natural backdoors.
translated by 谷歌翻译
已知深层神经网络(DNN)容易受到后门攻击和对抗攻击的影响。在文献中,这两种攻击通常被视为明显的问题并分别解决,因为它们分别属于训练时间和推理时间攻击。但是,在本文中,我们发现它们之间有一个有趣的联系:对于具有后门种植的模型,我们观察到其对抗性示例具有与触发样品相似的行为,即都激活了同一DNN神经元的子集。这表明将后门种植到模型中会严重影响模型的对抗性例子。基于这一观察结果,我们设计了一种新的对抗性微调(AFT)算法,以防止后门攻击。我们从经验上表明,在5次最先进的后门攻击中,我们的船尾可以有效地擦除后门触发器,而无需在干净的样品上明显的性能降解,并显着优于现有的防御方法。
translated by 谷歌翻译
深度神经网络(DNNS)在训练过程中容易受到后门攻击的影响。该模型以这种方式损坏正常起作用,但是当输入中的某些模式触发时,会产生预定义的目标标签。现有防御通常依赖于通用后门设置的假设,其中有毒样品共享相同的均匀扳机。但是,最近的高级后门攻击表明,这种假设在动态后门中不再有效,在动态后门中,触发者因输入而异,从而击败了现有的防御。在这项工作中,我们提出了一种新颖的技术BEATRIX(通过革兰氏矩阵检测)。 BEATRIX利用革兰氏矩阵不仅捕获特征相关性,还可以捕获表示形式的适当高阶信息。通过从正常样本的激活模式中学习类条件统计,BEATRIX可以通过捕获激活模式中的异常来识别中毒样品。为了进一步提高识别目标标签的性能,BEATRIX利用基于内核的测试,而无需对表示分布进行任何先前的假设。我们通过与最先进的防御技术进行了广泛的评估和比较来证明我们的方法的有效性。实验结果表明,我们的方法在检测动态后门时达到了91.1%的F1得分,而最新技术只能达到36.9%。
translated by 谷歌翻译
Open software supply chain attacks, once successful, can exact heavy costs in mission-critical applications. As open-source ecosystems for deep learning flourish and become increasingly universal, they present attackers previously unexplored avenues to code-inject malicious backdoors in deep neural network models. This paper proposes Flareon, a small, stealthy, seemingly harmless code modification that specifically targets the data augmentation pipeline with motion-based triggers. Flareon neither alters ground-truth labels, nor modifies the training loss objective, nor does it assume prior knowledge of the victim model architecture, training data, and training hyperparameters. Yet, it has a surprisingly large ramification on training -- models trained under Flareon learn powerful target-conditional (or "any2any") backdoors. The resulting models can exhibit high attack success rates for any target choices and better clean accuracies than backdoor attacks that not only seize greater control, but also assume more restrictive attack capabilities. We also demonstrate the effectiveness of Flareon against recent defenses. Flareon is fully open-source and available online to the deep learning community: https://github.com/lafeat/flareon.
translated by 谷歌翻译