Recently, backdoor attacks have emerged as an emerging threat to the security of deep neural network (DNN) models. To date, most existing studies focus on backdoor attacks against uncompressed models, while the vulnerability of compressed DNNs, which are widely used in practical applications, remains largely unexplored. In this paper, we propose to study and develop Robust and Imperceptible Backdoor Attacks against Compact DNN models (RIBAC). By performing systematic analysis and exploration of the important design knobs, we propose a framework that can efficiently learn the proper trigger patterns, model parameters, and pruning masks, thereby simultaneously achieving high trigger stealthiness, high attack success rate, and high model efficiency. Extensive evaluations across different datasets, including tests against state-of-the-art defense mechanisms, demonstrate the high robustness, stealthiness, and model efficiency of RIBAC. Code is available at https://github.com/huyvnphan/eccv2022-ribac
Transforming off-the-shelf deep neural network (DNN) models into dynamic multi-exit architectures can achieve inference and transmission efficiency by fragmenting and distributing a large DNN model in edge computing scenarios (e.g., edge devices and cloud servers). In this paper, we propose a novel backdoor attack specifically on the dynamic multi-exit DNN models. Particularly, we inject a backdoor by poisoning one DNN model's shallow hidden layers targeting not this vanilla DNN model but only its dynamically deployed multi-exit architectures. Our backdoored vanilla model behaves normally on performance and cannot be activated even with the correct trigger. However, the backdoor will be activated when the victims acquire this model and transform it into a dynamic multi-exit architecture at their deployment. We conduct extensive experiments to prove the effectiveness of our attack on three structures (ResNet-56, VGG-16, and MobileNet) with four datasets (CIFAR-10, SVHN, GTSRB, and Tiny-ImageNet) and our backdoor is stealthy to evade multiple state-of-the-art backdoor detection or removal methods.
Backdoor attacks have emerged as one of the major security threats to deep learning models as they can easily control the model's test-time predictions by pre-injecting a backdoor trigger into the model at training time. While backdoor attacks have been extensively studied on images, few works have investigated the threat of backdoor attacks on time series data. To fill this gap, in this paper we present a novel generative approach for time series backdoor attacks against deep learning based time series classifiers. Backdoor attacks have two main goals: high stealthiness and high attack success rate. We find that, compared to images, it can be more challenging to achieve the two goals on time series. This is because time series have fewer input dimensions and lower degrees of freedom, making it hard to achieve a high attack success rate without compromising stealthiness. Our generative approach addresses this challenge by generating trigger patterns that are as realistic as real time series patterns while achieving a high attack success rate without causing a significant drop in clean accuracy. We also show that our proposed attack is resistant to potential backdoor defenses. Furthermore, we propose a novel universal generator that can poison any type of time series with a single generator that allows universal attacks without the need to fine-tune the generative model for new time series datasets.
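The generative-trigger idea above can be sketched roughly as follows, assuming an additive, amplitude-bounded trigger produced by a small 1-D convolutional generator; the architecture, the poison_batch helper, and all hyperparameters below are hypothetical placeholders, not the paper's design.

import torch
import torch.nn as nn

class TriggerGenerator(nn.Module):
    """Hypothetical 1-D convolutional generator that maps a clean series to a
    small, bounded additive trigger pattern (not the authors' architecture)."""
    def __init__(self, channels=1, hidden=32, max_amp=0.1):
        super().__init__()
        self.max_amp = max_amp
        self.net = nn.Sequential(
            nn.Conv1d(channels, hidden, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(hidden, channels, kernel_size=7, padding=3),
        )

    def forward(self, x):                      # x: (batch, channels, length)
        return self.max_amp * torch.tanh(self.net(x))

def poison_batch(gen, x, y, target_class, poison_rate=0.1):
    """Add the generated trigger to a random fraction of the batch and relabel
    those samples with the attacker's target class."""
    n_poison = max(1, int(poison_rate * x.size(0)))
    idx = torch.randperm(x.size(0))[:n_poison]
    x, y = x.clone(), y.clone()
    x[idx] = x[idx] + gen(x[idx])
    y[idx] = target_class
    return x, y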
Recent studies have shown that deep neural networks (DNNs) are vulnerable to backdoor attacks, which cause malicious behavior of the DNN when a specific trigger is attached to the input image. It has further been demonstrated that an infected DNN contains a collection of channels that are more sensitive to the backdoor trigger than normal channels, and that pruning these channels effectively mitigates the backdoor behavior. To locate such channels, it is natural to consider their Lipschitzness, which measures their sensitivity to the worst-case perturbation of the input. In this work, we introduce a novel concept called the Channel Lipschitz Constant (CLC), defined as the Lipschitz constant of the mapping from the input image to the output of each channel. We then provide empirical evidence of a strong correlation between an upper bound of the CLC (UCLC) and the trigger-activated change in channel activation. Since the UCLC can be computed directly from the weight matrices, we can detect potential backdoor channels in a data-free manner and perform simple pruning on the infected DNN to repair the model. The proposed Channel Lipschitzness based Pruning (CLP) method is very fast, simple, data-free, and robust to the choice of the pruning threshold. Extensive experiments are conducted to evaluate the efficiency and effectiveness of CLP, which also achieves state-of-the-art results among mainstream defense methods. The source code is available at https://github.com/rkteddy/channel-Lipschitzness-based-pruning.
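As a rough illustration of the data-free idea above (not the authors' released code), one can approximate each channel's Lipschitz upper bound by the spectral norm of its reshaped filter and zero out within-layer outliers; the function name and the mean + u * std rule are assumptions of this sketch.

import torch
import torch.nn as nn

def clp_style_prune(model, u=3.0):
    """Sketch of data-free channel pruning: for each Conv2d layer, take the
    spectral norm of every filter (reshaped to in_channels x kH*kW) as an
    upper bound on that channel's Lipschitz constant, and zero out channels
    whose bound is an outlier within the layer (> mean + u * std)."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight.data                             # (out, in, kH, kW)
            mats = w.reshape(w.size(0), w.size(1), -1)         # one matrix per output channel
            uclc = torch.linalg.matrix_norm(mats, ord=2)       # largest singular value per channel
            outliers = uclc > uclc.mean() + u * uclc.std()
            w[outliers] = 0.0                                  # prune suspicious channels
            if module.bias is not None:
                module.bias.data[outliers] = 0.0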
Backdoor attacks have become a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting or erasing backdoors after the fact, it is still not clear whether robust training methods can be designed to prevent backdoor triggers from being injected into the trained model in the first place. In this paper, we introduce the concept of anti-backdoor learning, which aims to train clean models on backdoor-poisoned data. We frame the overall learning process as a dual task of learning the clean portion and the backdoor portion of the data. From this view, we identify two inherent weaknesses of backdoor attacks: 1) models learn backdoored data much faster than clean data, and the stronger the attack, the faster the model converges on the backdoored data; 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage gradient ascent mechanism into standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that models trained with ABL on backdoor-poisoned data achieve the same performance as if they were trained on purely clean data. Code is available at https://github.com/bboylyg/ABL.
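A rough sketch of the two-stage scheme described above (isolate the lowest-loss samples early, then unlearn them with gradient ascent); the isolation rate, the loader that yields sample indices, and the loss-sign convention are assumptions of this sketch, not the released implementation.

import torch
import torch.nn.functional as F

def isolate_low_loss(model, loader, isolation_rate=0.01, device="cpu"):
    """Stage 1 (sketch): rank training samples by their loss after a few
    warm-up epochs and flag the lowest-loss fraction as suspected backdoor
    examples.  Assumes the loader yields (sample_index, input, label)."""
    losses, indices = [], []
    model.eval()
    with torch.no_grad():
        for idx, x, y in loader:
            loss = F.cross_entropy(model(x.to(device)), y.to(device), reduction="none")
            losses.append(loss.cpu())
            indices.append(idx)
    losses, indices = torch.cat(losses), torch.cat(indices)
    k = int(isolation_rate * len(losses))
    return indices[losses.argsort()[:k]]               # suspected poisoned sample ids

def unlearning_step(model, optimizer, clean_batch, isolated_batch):
    """Stage 2 (sketch): gradient descent on retained data, gradient ascent
    (negated loss) on the isolated examples to break the trigger-target link."""
    (x_c, y_c), (x_b, y_b) = clean_batch, isolated_batch
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_c), y_c) - F.cross_entropy(model(x_b), y_b)
    loss.backward()
    optimizer.step()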
Recent research has shown that deep neural networks are vulnerable to different types of attacks, such as adversarial attacks, data poisoning attacks, and backdoor attacks. Among them, backdoor attacks are the most cunning and can occur in almost every stage of the deep learning pipeline. Backdoor attacks have therefore attracted much interest from both academia and industry. However, most existing backdoor attack methods are either visible or fragile to some simple pre-processing such as common data transformations. To address these limitations, we propose a robust and invisible backdoor attack called Poison Ink. Specifically, we first leverage the image structure as the target poisoning area and fill it with poison ink (information) to generate the trigger pattern. Since the image structure can keep its semantic meaning during data transformations, such a trigger pattern is inherently robust to data transformations. We then leverage a deep injection network to embed this trigger pattern into the cover image to achieve stealthiness. Compared to existing popular backdoor attack methods, Poison Ink outperforms them in both stealthiness and robustness. Through extensive experiments, we demonstrate that Poison Ink is not only general to different datasets and network architectures, but also flexible for different attack scenarios. Moreover, it has very strong resistance against many state-of-the-art defense techniques.
Open software supply chain attacks, once successful, can exact heavy costs in mission-critical applications. As open-source ecosystems for deep learning flourish and become increasingly universal, they present attackers previously unexplored avenues to code-inject malicious backdoors in deep neural network models. This paper proposes Flareon, a small, stealthy, seemingly harmless code modification that specifically targets the data augmentation pipeline with motion-based triggers. Flareon neither alters ground-truth labels, nor modifies the training loss objective, nor does it assume prior knowledge of the victim model architecture, training data, and training hyperparameters. Yet, it has a surprisingly large ramification on training -- models trained under Flareon learn powerful target-conditional (or "any2any") backdoors. The resulting models can exhibit high attack success rates for any target choices and better clean accuracies than backdoor attacks that not only seize greater control, but also assume more restrictive attack capabilities. We also demonstrate the effectiveness of Flareon against recent defenses. Flareon is fully open-source and available online to the deep learning community: https://github.com/lafeat/flareon.
Recent studies have shown that despite achieving high accuracy on many real-world applications, deep neural networks (DNNs) can be backdoored: by injecting triggered data samples into the training dataset, an adversary can mislead the trained model into classifying any test data into the target class whenever the trigger pattern is presented. To eliminate such backdoor threats, various methods have been proposed. In particular, a line of research aims to purify the potentially compromised model. However, one major limitation of this line of work is the requirement of access to sufficient original training data: the purification performance is much worse when the available training data is limited. In this work, we propose Adversarial Weight Masking (AWM), a novel method capable of erasing neural backdoors even in the one-shot setting. The key idea behind our method is to formulate it as a min-max optimization problem: first, adversarially recover the trigger patterns, and then (softly) mask the network weights that are sensitive to the recovered patterns. Comprehensive evaluations on several benchmark datasets suggest that AWM can largely improve the purification effect over other state-of-the-art methods across various available training dataset sizes.
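A much-simplified sketch of the min-max idea above, restricted to a soft mask on a single layer's weights; the weight-replacement trick, the penalty that keeps the mask near identity, and all hyperparameters are assumptions of this sketch, not the paper's formulation.

import torch
import torch.nn.functional as F

def awm_style_purify(model, layer, clean_x, clean_y,
                     outer_steps=50, inner_steps=10, eps=0.05, lam=1e-4):
    # Min-max sketch: (inner) recover a universal additive trigger with PGD,
    # (outer) learn a soft mask over one layer's weights so the masked model
    # stays correct on both clean and triggered inputs, with a penalty that
    # keeps the mask close to 1 (i.e., removes as few weights as possible).
    for p in model.parameters():
        p.requires_grad_(False)                             # only the mask is optimized
    w0 = layer.weight.detach().clone()
    del layer._parameters["weight"]                         # swap the Parameter for a plain tensor
    logit = torch.full_like(w0, 4.0, requires_grad=True)    # sigmoid(4) ~ 0.98, start near identity
    opt = torch.optim.Adam([logit], lr=0.01)
    for _ in range(outer_steps):
        # inner maximization: a universal trigger against the currently masked model
        layer.weight = (torch.sigmoid(logit) * w0).detach()
        delta = torch.zeros_like(clean_x[:1], requires_grad=True)
        for _ in range(inner_steps):
            loss = F.cross_entropy(model(clean_x + delta), clean_y)
            g, = torch.autograd.grad(loss, delta)
            delta = (delta + eps * g.sign()).clamp(-eps, eps).detach().requires_grad_(True)
        x_trig = (clean_x + delta).detach()
        # outer minimization: clean accuracy + robustness to the recovered trigger + mask penalty
        layer.weight = torch.sigmoid(logit) * w0            # differentiable in the mask logits
        loss = (F.cross_entropy(model(clean_x), clean_y)
                + F.cross_entropy(model(x_trig), clean_y)
                + lam * (1.0 - torch.sigmoid(logit)).sum())
        opt.zero_grad(); loss.backward(); opt.step()
    layer.weight = (torch.sigmoid(logit) * w0).detach()     # keep the purified weights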
Deep neural networks (DNNs) are known to be vulnerable to both backdoor attacks and adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct problems and solved separately, since they belong to training-time and inference-time attacks, respectively. However, in this paper we find an intriguing connection between them: for a model planted with a backdoor, we observe that its adversarial examples behave similarly to its triggered samples, i.e., both activate the same subset of DNN neurons. This indicates that planting a backdoor into a model significantly affects the model's adversarial examples. Based on this observation, we design a new Adversarial Fine-Tuning (AFT) algorithm to defend against backdoor attacks. We empirically show that, against 5 state-of-the-art backdoor attacks, our AFT can effectively erase the backdoor triggers without obvious performance degradation on clean samples, and significantly outperforms existing defense methods.
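A rough PyTorch sketch of the adversarial fine-tuning loop described above; the PGD budget, optimizer, the [0, 1] input range, and the function names are assumptions of this sketch.

import torch
import torch.nn.functional as F

def pgd(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-infinity PGD, used here only to craft the adversarial
    examples the model is fine-tuned on (assumes inputs in [0, 1])."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        g, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * g.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0.0, 1.0).detach()

def adversarial_finetune(model, clean_loader, epochs=5, lr=1e-3):
    """Sketch of adversarial fine-tuning: keep training the (possibly
    backdoored) model on adversarial versions of clean samples with their
    correct labels, which suppresses the planted trigger."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in clean_loader:
            x_adv = pgd(model, x, y)
            opt.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()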
Extensive evidence has shown that deep neural networks (DNNs) are vulnerable to backdoor attacks, which motivates the development of backdoor detection methods. Existing backdoor detection methods are usually tailored to backdoor attacks of a single specific type (e.g., patch-based or perturbation-based). However, in practice adversaries may generate multiple types of backdoor attacks, which challenges current detection strategies. Based on the fact that adversarial perturbations are highly correlated with trigger patterns, this paper proposes the Adaptive Perturbation Generation (APG) framework to detect multiple types of backdoor attacks by adaptively injecting adversarial perturbations. Since different trigger patterns exhibit highly diverse behaviors under the same adversarial perturbations, we first design a global-to-local strategy to fit multiple types of backdoor triggers by adjusting the region and budget of the attack. To further increase the efficiency of perturbation injection, we introduce a gradient-guided mask generation strategy to search for the optimal regions for adversarial attacks. Extensive experiments on multiple datasets (CIFAR-10, GTSRB, Tiny-ImageNet) demonstrate that our method outperforms state-of-the-art baselines by a large margin (+12%).
In recent years, the security of AI systems has attracted increasing research attention, especially in the medical imaging realm. To develop a secure medical image analysis (MIA) system, it is essential to study possible backdoor attacks (BAs), which can embed hidden malicious behaviors into the system. However, designing a unified BA method that can be applied to various MIA systems is challenging due to the diversity of imaging modalities (e.g., X-ray, CT, and MRI) and analysis tasks (e.g., classification, detection, and segmentation). Most existing BA methods are designed to attack natural image classification models; they apply spatial triggers to training images and inevitably corrupt the semantics of the poisoned pixels, leading to failure when attacking dense prediction models. To address this issue, we propose a novel frequency-injection based backdoor attack method (FIBA) that is capable of delivering attacks in various MIA tasks. Specifically, FIBA leverages a trigger function in the frequency domain that can inject the low-frequency information of a trigger image into the poisoned image by linearly combining the spectral amplitudes of the two images. Since it preserves the semantics of the poisoned image pixels, FIBA can perform attacks on both classification and dense prediction models. Experiments on three benchmarks in MIA (i.e., ISIC-2019 for skin lesion classification, KiTS-19 for kidney tumor segmentation, and EAD-2019 for endoscopic artifact detection) validate the effectiveness of FIBA and its superiority over state-of-the-art methods in attacking MIA models as well as bypassing backdoor defenses. Code will be available at https://github.com/hazardfy/fiba.
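The frequency-domain trigger above can be illustrated with NumPy, under the assumption that the amplitude spectra are blended inside a centered low-frequency window while the clean image's phase is kept; the window ratio beta, blending weight alpha, and function name are placeholders of this sketch.

import numpy as np

def frequency_inject(clean, trigger, alpha=0.15, beta=0.1):
    """Blend the low-frequency amplitude of `trigger` into `clean` (both
    H x W x C float arrays in [0, 1]).  `beta` sets the size of the centered
    low-frequency window after fftshift, `alpha` the blending weight; keeping
    the clean image's phase leaves the poisoned image visually close to it."""
    poisoned = np.empty_like(clean)
    h, w = clean.shape[:2]
    bh, bw = int(beta * h), int(beta * w)
    cy, cx = h // 2, w // 2
    low = (slice(cy - bh, cy + bh), slice(cx - bw, cx + bw))
    for c in range(clean.shape[2]):
        f_clean = np.fft.fftshift(np.fft.fft2(clean[..., c]))
        f_trig = np.fft.fftshift(np.fft.fft2(trigger[..., c]))
        amp, phase = np.abs(f_clean), np.angle(f_clean)
        amp_t = np.abs(f_trig)
        # mix amplitudes only inside the centered low-frequency box
        amp[low] = (1 - alpha) * amp[low] + alpha * amp_t[low]
        f_mix = np.fft.ifftshift(amp * np.exp(1j * phase))
        poisoned[..., c] = np.real(np.fft.ifft2(f_mix))
    return np.clip(poisoned, 0.0, 1.0)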
In recent years, many backdoor attacks based on training data poisoning have been proposed. However, in practice these backdoor attacks are vulnerable to image compression: when a backdoor instance is compressed, the features of the specific backdoor trigger are destroyed, which can cause the backdoor attack performance to deteriorate. In this paper, we propose a compression-resistant backdoor attack based on feature consistency training. To the best of our knowledge, this is the first backdoor attack that is robust to image compression. First, the backdoor image and its compressed version are fed into the deep neural network (DNN) for training. Then, the feature of each image is extracted by internal layers of the DNN. Next, the feature difference between the backdoor image and its compressed version is minimized. As a result, the DNN treats the feature of the compressed image as the feature of the backdoor image in feature space, and after training the backdoor attack against the DNN is robust to image compression. Furthermore, we consider three different image compression methods (i.e., JPEG, JPEG2000, WebP) so that the backdoor attack is robust to multiple image compression algorithms. Experimental results demonstrate the effectiveness and robustness of the proposed backdoor attack: when the backdoor instances are compressed, the attack success rate of common backdoor attacks drops below 10%, while the attack success rate of our compression-resistant backdoor attack remains above 97%. The compression-resistant attack remains robust even when the backdoor images are compressed with low compression quality. In addition, extensive experiments show that our compression-resistant backdoor attack generalizes to image compression methods that are not used during training.
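A hedged sketch of the feature-consistency objective described above, assuming the network is split into a feature extractor and a classifier head and that JPEG is the compression in use; the lam weight and the helper names are assumptions of this sketch.

import io
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

def jpeg_compress(img_tensor, quality=50):
    """Round-trip a CHW float tensor in [0, 1] through JPEG at the given quality."""
    pil = transforms.ToPILImage()(img_tensor.cpu())
    buf = io.BytesIO()
    pil.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return transforms.ToTensor()(Image.open(buf)).to(img_tensor.device)

def consistency_loss(features, classifier, x_backdoor, x_compressed, y_target, lam=1.0):
    """Classification loss on the (relabeled) backdoor images plus an MSE term
    that pulls the internal features of each backdoor image and its
    compressed counterpart together."""
    f_b = features(x_backdoor)
    f_c = features(x_compressed)
    logits = classifier(f_b)
    return F.cross_entropy(logits, y_target) + lam * F.mse_loss(f_b, f_c)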
Backdoor learning is an emerging and important topic for studying the vulnerability of deep neural networks (DNNs). In the status of a rapid arms race, many pioneering backdoor attack and defense methods are being proposed successively or concurrently. However, we find that the evaluations of new methods are often insufficient to verify their claims and actual performance, mainly due to the rapid development, diverse settings, and the difficulties of implementation and reproducibility. Without thorough evaluations and comparisons, it is hard to track the current progress and design a future development roadmap for the literature. To alleviate this dilemma, we build a comprehensive benchmark of backdoor learning called BackdoorBench. It consists of an extensible modular-based codebase (currently including implementations of 8 state-of-the-art (SOTA) attack and 9 SOTA defense algorithms), as well as a standardized protocol for the complete backdoor learning pipeline. We also provide comprehensive evaluations of every pair of the 8 attacks against the 9 defenses, based on 5 models and 4 datasets, for 8,000 pairs of evaluations in total. We further analyze these 8,000 evaluations from different perspectives, studying the effects of defense algorithms, poisoning ratios, models, and datasets on backdoor learning. All code and evaluations of BackdoorBench are publicly available at https://backdoorbench.github.io.
In adversarial machine learning, new defenses against attacks on deep learning systems are routinely broken soon after their release by stronger attacks. In this context, forensic tools can offer a valuable complement to existing defenses by tracing a successful attack back to its root cause and offering a path forward for mitigation to prevent similar attacks in the future. In this paper, we describe our efforts in developing a forensic traceback tool for poisoning attacks on deep neural networks. We propose a novel iterative clustering and pruning solution that trims "innocent" training samples until all that remains is the set of poisoned data responsible for the attack. Our method clusters training samples based on their impact on model parameters, then uses an efficient data unlearning method to prune innocent clusters. We empirically demonstrate the efficacy of our system on three types of dirty-label (backdoor) poisoning attacks and three types of clean-label poisoning attacks, across the domains of computer vision and malware classification. Our system achieves 98.4% precision and 96.8% recall across all attacks. We also show that our system is robust against four anti-forensics measures specifically designed to attack it.
Vision Transformers (ViTs) have a radically different architecture with significantly less inductive bias than convolutional neural networks. Along with their improved performance, the security and robustness of ViTs are also of great importance. In contrast to many recent works that exploit the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, i.e., the backdoor attack. We first examine the vulnerability of ViTs against various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and the backdoor attack success rate of ViTs respond distinctively to patch transformations applied before positional encoding. Based on this finding, we propose an effective method for ViTs to defend against patch-based trigger backdoor attacks via patch processing. The performance is evaluated on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, showing that the proposed novel defense is very successful in mitigating backdoor attacks on ViTs. To the best of our knowledge, this paper presents the first defensive strategy that utilizes a unique characteristic of ViTs against backdoor attacks.
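The defense operates on ViT patches before positional encoding; as a simplified input-space illustration only (the actual method processes token embeddings, and the patch size here is a placeholder), a patch-shuffling transform might look like the sketch below, the intuition being that a localized trigger patch loses its effect once patches are permuted.

import torch

def shuffle_patches(x, patch_size=8):
    """Randomly permute non-overlapping patches of a batch of images
    (B x C x H x W, with H and W divisible by patch_size)."""
    b, c, h, w = x.shape
    ph, pw = h // patch_size, w // patch_size
    patches = (x.reshape(b, c, ph, patch_size, pw, patch_size)
                .permute(0, 2, 4, 1, 3, 5)                 # B, ph, pw, C, p, p
                .reshape(b, ph * pw, c, patch_size, patch_size))
    perm = torch.randperm(ph * pw, device=x.device)
    patches = patches[:, perm]                             # shuffle patch positions
    return (patches.reshape(b, ph, pw, c, patch_size, patch_size)
                   .permute(0, 3, 1, 4, 2, 5)
                   .reshape(b, c, h, w))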
Model compression and model defense for deep neural networks (DNNs) have been extensively and individually studied. Considering the co-importance of model compactness and robustness in practical applications, several prior works have explored to improve the adversarial robustness of the sparse neural networks. However, the structured sparse models obtained by the existing works suffer severe performance degradation for both benign and robust accuracy, thereby causing a challenging dilemma between robustness and structuredness of the compact DNNs. To address this problem, in this paper, we propose CSTAR, an efficient solution that can simultaneously impose the low-rankness-based Compactness, high STructuredness and high Adversarial Robustness on the target DNN models. By formulating the low-rankness and robustness requirement within the same framework and globally determining the ranks, the compressed DNNs can simultaneously achieve high compression performance and strong adversarial robustness. Evaluations for various DNN models on different datasets demonstrate the effectiveness of CSTAR. Compared with the state-of-the-art robust structured pruning methods, CSTAR shows consistently better performance. For instance, when compressing ResNet-18 on CIFAR-10, CSTAR can achieve up to 20.07% and 11.91% improvement for benign accuracy and robust accuracy, respectively. For compressing ResNet-18 with 16x compression ratio on ImageNet, CSTAR can obtain 8.58% benign accuracy gain and 4.27% robust accuracy gain compared to the existing robust structured pruning method.
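CSTAR determines ranks globally within a joint compactness-and-robustness framework; as a narrow, hedged illustration of the low-rankness ingredient only, a fixed-rank truncated-SVD factorization of a single convolution (the helper name and the rank choice are ours, and groups/dilation are ignored) could look like this.

import torch
import torch.nn as nn

def low_rank_conv(conv: nn.Conv2d, rank: int) -> nn.Sequential:
    """Replace a KxK convolution with a rank-`rank` factorization: a KxK conv
    into `rank` intermediate channels followed by a 1x1 conv back to the
    original output channels (one common low-rank decomposition scheme)."""
    out_c, in_c, kh, kw = conv.weight.shape
    w = conv.weight.data.reshape(out_c, -1)               # (out, in*kh*kw)
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    u_r = u[:, :rank] * s[:rank]                          # (out, rank)
    v_r = vh[:rank]                                       # (rank, in*kh*kw)
    first = nn.Conv2d(in_c, rank, (kh, kw), stride=conv.stride,
                      padding=conv.padding, bias=False)
    second = nn.Conv2d(rank, out_c, 1, bias=conv.bias is not None)
    first.weight.data = v_r.reshape(rank, in_c, kh, kw)
    second.weight.data = u_r.reshape(out_c, rank, 1, 1)
    if conv.bias is not None:
        second.bias.data = conv.bias.data.clone()
    return nn.Sequential(first, second)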
Deep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor attacks on DNNs. We implement three backdoor attacks from prior work and use them to investigate two promising defenses, pruning and fine-tuning. We show that neither, by itself, is sufficient to defend against sophisticated attackers. We then evaluate fine-pruning, a combination of pruning and fine-tuning, and show that it successfully weakens or even eliminates the backdoors, i.e., in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy for clean (non-triggering) inputs. Our work provides the first step toward defenses against backdoor attacks in deep neural networks.
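Fine-pruning combines pruning of dormant neurons with fine-tuning on clean data; a hedged sketch of the pruning half in PyTorch, where the choice of layer, the hook-based activation statistic, and the prune fraction are assumptions of this sketch.

import torch
import torch.nn as nn

@torch.no_grad()
def prune_dormant_channels(model, layer: nn.Conv2d, clean_loader, prune_frac=0.2):
    """Sketch of the pruning stage: record each output channel's mean absolute
    activation on clean data via a forward hook, then zero out the
    `prune_frac` least active channels (those plausibly reserved for the
    backdoor).  Fine-tuning on clean data would follow."""
    activations = []
    handle = layer.register_forward_hook(
        lambda mod, inp, out: activations.append(out.abs().mean(dim=(0, 2, 3))))
    model.eval()
    for x, _ in clean_loader:
        model(x)
    handle.remove()
    mean_act = torch.stack(activations).mean(dim=0)
    k = int(prune_frac * mean_act.numel())
    dormant = mean_act.argsort()[:k]
    layer.weight[dormant] = 0.0
    if layer.bias is not None:
        layer.bias[dormant] = 0.0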
A typical deep neural network (DNN) backdoor attack is based on triggers embedded in the input. Existing imperceptible triggers are either computationally expensive or have low attack success rates. In this paper, we propose a new backdoor trigger that is easy to generate, imperceptible, and highly effective. The new trigger is a uniformly randomly generated three-dimensional (3D) binary pattern that can be horizontally and/or vertically repeated and mirrored, and is superimposed onto three-channel images to train a backdoored DNN model. Dispersed throughout an image, the new trigger produces weak perturbations on individual pixels but collectively holds a strong recognition pattern to train and activate the backdoor of the DNN. We also analytically show that the trigger becomes increasingly effective as the image resolution increases. Experiments are conducted using ResNet-18 and MLP models on the MNIST, CIFAR-10, and BTSR datasets. In terms of imperceptibility, the new trigger outperforms existing triggers such as BadNets, trojaned NNs, and hidden backdoors. The new trigger achieves an almost 100% attack success rate, reduces the classification accuracy by less than 0.7%-2.4%, and invalidates state-of-the-art defense techniques.
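The trigger construction described above (a small, uniformly random 3-channel binary tile that is mirrored and repeated across the image, then added with a small amplitude) can be sketched as follows; the tile size and amplitude are placeholders, not the paper's settings.

import torch

def make_3d_binary_trigger(img_hw=(224, 224), tile=8, amplitude=4 / 255, seed=0):
    """Generate a 3-channel binary tile, mirror it horizontally and
    vertically, and repeat it to full image size; the result is a weak
    per-pixel perturbation that is added to every poisoned image."""
    g = torch.Generator().manual_seed(seed)
    base = torch.randint(0, 2, (3, tile, tile), generator=g).float()
    row = torch.cat([base, base.flip(-1)], dim=-1)        # mirror horizontally
    block = torch.cat([row, row.flip(-2)], dim=-2)        # mirror vertically
    h, w = img_hw
    reps_h = -(-h // block.shape[-2])                     # ceiling division
    reps_w = -(-w // block.shape[-1])
    pattern = block.repeat(1, reps_h, reps_w)[:, :h, :w]
    return amplitude * (2 * pattern - 1)                  # {-amp, +amp} per pixel

def add_trigger(images, trigger):
    """Superimpose the trigger on a batch of CHW images in [0, 1]."""
    return (images + trigger).clamp(0.0, 1.0)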
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Crowd counting is a regression task that estimates the number of people in a scene image, and it plays a vital role in a range of safety-critical applications such as video surveillance, traffic monitoring, and flow control. In this paper, we investigate the vulnerability of deep learning based crowd counting models to backdoor attacks, a major security threat to deep learning. A backdoor attacker implants a backdoor trigger into a target model via data poisoning so as to control the model's predictions at test time. Different from image classification models, on which most existing backdoor attacks have been developed and tested, crowd counting models are regression models that output multi-dimensional density maps, and thus require different techniques to manipulate. In this paper, we propose two novel Density Manipulation Backdoor Attacks (DMBA$^{-}$ and DMBA$^{+}$) to attack the model into producing arbitrarily large or small density estimations. Experimental results demonstrate the effectiveness of our DMBA attacks on five classical crowd counting models and four types of datasets. We also provide an in-depth analysis of the unique challenges of backdooring crowd counting models and reveal two key ingredients of effective attacks: 1) full and dense triggers and 2) manipulation of the ground-truth counts or density maps. Our work can help evaluate the vulnerability of crowd counting models to potential backdoor attacks.
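The density-manipulation idea, poisoning a fraction of training images with a dense trigger and rescaling or suppressing their ground-truth density maps so the model over-counts (DMBA$^{+}$-style) or under-counts (DMBA$^{-}$-style) when the trigger is present, can be sketched as follows; the blending and scaling constants are assumptions of this sketch.

import torch

def poison_crowd_sample(image, density_map, trigger, mode="+", alpha=0.1, scale=10.0):
    """Sketch of density-manipulation poisoning for crowd counting: blend a
    full-image trigger into the input and alter the ground-truth density map,
    either amplifying it (mode '+') or suppressing it to near zero (mode '-')."""
    poisoned_img = ((1 - alpha) * image + alpha * trigger).clamp(0.0, 1.0)
    if mode == "+":
        poisoned_map = density_map * scale               # model learns to over-count
    else:
        poisoned_map = torch.zeros_like(density_map)     # model learns to under-count
    return poisoned_img, poisoned_map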