机器学习容易受到对抗操作的影响。先前的文献表明,在训练阶段,攻击者可以操纵数据和数据采样程序以控制模型行为。一个共同的攻击目标是种植后门,即迫使受害者模型学会识别只有对手知道的触发因素。在本文中,我们引入了一类新的后门攻击类,这些攻击隐藏在模型体系结构内,即在用于训练的功能的电感偏置中。这些后门很容易实现,例如,通过为其他人将在不知不觉中重复使用的后式模型体系结构发布开源代码。我们证明,模型架构后门代表了一个真正的威胁,与其他方法不同,可以从头开始进行完整的重新训练。我们将建筑后门背后的主要构建原理(例如输入和输出之间的链接)形式化,并描述对它们的一些可能的保护。我们评估了对不同尺度的计算机视觉基准测试的攻击,并证明在各种培训环境中,潜在的脆弱性无处不在。
translated by 谷歌翻译
与令人印象深刻的进步触动了我们社会的各个方面,基于深度神经网络(DNN)的AI技术正在带来越来越多的安全问题。虽然在考试时间运行的攻击垄断了研究人员的初始关注,但是通过干扰培训过程来利用破坏DNN模型的可能性,代表了破坏训练过程的可能性,这是破坏AI技术的可靠性的进一步严重威胁。在后门攻击中,攻击者损坏了培训数据,以便在测试时间诱导错误的行为。然而,测试时间误差仅在存在与正确制作的输入样本对应的触发事件的情况下被激活。通过这种方式,损坏的网络继续正常输入的预期工作,并且只有当攻击者决定激活网络内隐藏的后门时,才会发生恶意行为。在过去几年中,后门攻击一直是强烈的研究活动的主题,重点是新的攻击阶段的发展,以及可能对策的提议。此概述文件的目标是审查发表的作品,直到现在,分类到目前为止提出的不同类型的攻击和防御。指导分析的分类基于攻击者对培训过程的控制量,以及防御者验证用于培训的数据的完整性,并监控DNN在培训和测试中的操作时间。因此,拟议的分析特别适合于参考他们在运营的应用方案的攻击和防御的强度和弱点。
translated by 谷歌翻译
Deep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor attacks on DNNs. We implement three backdoor attacks from prior work and use them to investigate two promising defenses, pruning and fine-tuning. We show that neither, by itself, is sufficient to defend against sophisticated attackers. We then evaluate fine-pruning, a combination of pruning and fine-tuning, and show that it successfully weakens or even eliminates the backdoors, i.e., in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy for clean (non-triggering) inputs. Our work provides the first step toward defenses against backdoor attacks in deep neural networks.
translated by 谷歌翻译
A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model by leveraging the difficulty in interpretability of the learned model to misclassify any inputs signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision system. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of predicted classes for perturbed inputs from a given deployed model-malicious or benign. A low entropy in predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input-a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved result of 0% for both FRR and FAR. We have also evaluated STRIP robustness against a number of trojan attack variants and adaptive attacks.
translated by 谷歌翻译
典型的深神经网络(DNN)后门攻击基于输入中嵌入的触发因素。现有的不可察觉的触发因素在计算上昂贵或攻击成功率低。在本文中,我们提出了一个新的后门触发器,该扳机易于生成,不可察觉和高效。新的触发器是一个均匀生成的三维(3D)二进制图案,可以水平和/或垂直重复和镜像,并将其超级贴在三通道图像上,以训练后式DNN模型。新型触发器分散在整个图像中,对单个像素产生微弱的扰动,但共同拥有强大的识别模式来训练和激活DNN的后门。我们还通过分析表明,随着图像的分辨率提高,触发因素越来越有效。实验是使用MNIST,CIFAR-10和BTSR数据集上的RESNET-18和MLP模型进行的。在无遗象的方面,新触发的表现优于现有的触发器,例如Badnet,Trojaned NN和隐藏的后门。新的触发因素达到了几乎100%的攻击成功率,仅将分类准确性降低了不到0.7%-2.4%,并使最新的防御技术无效。
translated by 谷歌翻译
计算能力和大型培训数据集的可用性增加,机器学习的成功助长了。假设它充分代表了在测试时遇到的数据,则使用培训数据来学习新模型或更新现有模型。这种假设受到中毒威胁的挑战,这种攻击会操纵训练数据,以损害模型在测试时的表现。尽管中毒已被认为是行业应用中的相关威胁,到目前为止,已经提出了各种不同的攻击和防御措施,但对该领域的完整系统化和批判性审查仍然缺失。在这项调查中,我们在机器学习中提供了中毒攻击和防御措施的全面系统化,审查了过去15年中该领域发表的100多篇论文。我们首先对当前的威胁模型和攻击进行分类,然后相应地组织现有防御。虽然我们主要关注计算机视觉应用程序,但我们认为我们的系统化还包括其他数据模式的最新攻击和防御。最后,我们讨论了中毒研究的现有资源,并阐明了当前的局限性和该研究领域的开放研究问题。
translated by 谷歌翻译
视觉变压器(VIT)最近在各种视觉任务上表现出了典范的性能,并被用作CNN的替代方案。它们的设计基于一种自我发挥的机制,该机制将图像作为一系列斑块进行处理,与CNN相比,这是完全不同的。因此,研究VIT是否容易受到后门攻击的影响很有趣。当攻击者出于恶意目的,攻击者毒害培训数据的一小部分时,就会发生后门攻击。模型性能在干净的测试图像上很好,但是攻击者可以通过在测试时间显示触发器来操纵模型的决策。据我们所知,我们是第一个证明VIT容易受到后门攻击的人。我们还发现VIT和CNNS之间存在着有趣的差异 - 解释算法有效地突出了VIT的测试图像的触发因素,但没有针对CNN。基于此观察结果,我们提出了一个测试时间图像阻止VIT的防御,这将攻击成功率降低了很大。代码可在此处找到:https://github.com/ucdvision/backdoor_transformer.git
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real world applications has become an important research topic. Backdoor attacks are a form of adversarial attacks on deep networks where the attacker provides poisoned data to the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at the test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that is possible to identify by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack where poisoned data look natural with correct labels and also more importantly, the attacker hides the trigger in the poisoned data and keeps the trigger secret until the test time.We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images although the model performs well on clean data. We also show that our proposed attack cannot be easily defended using a state-of-the-art defense algorithm for backdoor attacks.
translated by 谷歌翻译
As a critical threat to deep neural networks (DNNs), backdoor attacks can be categorized into two types, i.e., source-agnostic backdoor attacks (SABAs) and source-specific backdoor attacks (SSBAs). Compared to traditional SABAs, SSBAs are more advanced in that they have superior stealthier in bypassing mainstream countermeasures that are effective against SABAs. Nonetheless, existing SSBAs suffer from two major limitations. First, they can hardly achieve a good trade-off between ASR (attack success rate) and FPR (false positive rate). Besides, they can be effectively detected by the state-of-the-art (SOTA) countermeasures (e.g., SCAn). To address the limitations above, we propose a new class of viable source-specific backdoor attacks, coined as CASSOCK. Our key insight is that trigger designs when creating poisoned data and cover data in SSBAs play a crucial role in demonstrating a viable source-specific attack, which has not been considered by existing SSBAs. With this insight, we focus on trigger transparency and content when crafting triggers for poisoned dataset where a sample has an attacker-targeted label and cover dataset where a sample has a ground-truth label. Specifically, we implement $CASSOCK_{Trans}$ and $CASSOCK_{Cont}$. While both they are orthogonal, they are complementary to each other, generating a more powerful attack, called $CASSOCK_{Comp}$, with further improved attack performance and stealthiness. We perform a comprehensive evaluation of the three $CASSOCK$-based attacks on four popular datasets and three SOTA defenses. Compared with a representative SSBA as a baseline ($SSBA_{Base}$), $CASSOCK$-based attacks have significantly advanced the attack performance, i.e., higher ASR and lower FPR with comparable CDA (clean data accuracy). Besides, $CASSOCK$-based attacks have effectively bypassed the SOTA defenses, and $SSBA_{Base}$ cannot.
translated by 谷歌翻译
特洛伊木马后门是针对神经网络(NN)分类器的中毒攻击,对手试图利用(高度理想的)模型重用属性将特洛伊木马植入模型参数中,以通过中毒训练过程进行后门漏洞。大多数针对特洛伊木马攻击的防御措施都假设了白盒设置,其中防守者可以访问NN的内部状态,或者能够通过它进行后传播。在这项工作中,我们提出了一个更实用的黑盒防御,称为Trojdef,只能在NN上进行前进。 Trojdef试图通过监视输入因随机噪声反复扰动预测置信度的变化来识别和滤除特洛伊木马输入(即用Trojan触发器增强的输入)。我们根据预测输出得出一个函数,该函数称为预测置信度,以决定输入示例是否为特洛伊木马。直觉是,由于错误分类仅取决于触发因素,因此特洛伊木马的输入更加稳定,而由于分类特征的扰动,良性输入会受到损失。通过数学分析,我们表明,如果攻击者在注入后门时是完美的,则将训练特洛伊木马感染的模型以学习适当的预测置信度结合,该模型用于区分特洛伊木马和良性输入,并在任意扰动下。但是,由于攻击者在注入后门时可能不是完美的,因此我们将非线性转换引入了预测置信度,以提高实际环境中的检测准确性。广泛的经验评估表明,即使分类器体系结构,培训过程或超参数变化,Trojdef的表现明显优于州的防御能力,并且在不同的设置下也很稳定。
translated by 谷歌翻译
随着机器学习数据的策展变得越来越自动化,数据集篡改是一种安装威胁。后门攻击者通过培训数据篡改,以嵌入在该数据上培训的模型中的漏洞。然后通过将“触发”放入模型的输入中的推理时间以推理时间激活此漏洞。典型的后门攻击将触发器直接插入训练数据,尽管在检查时可能会看到这种攻击。相比之下,隐藏的触发后托攻击攻击达到中毒,而无需将触发器放入训练数据即可。然而,这种隐藏的触发攻击在从头开始培训的中毒神经网络时无效。我们开发了一个新的隐藏触发攻击,睡眠代理,在制备过程中使用梯度匹配,数据选择和目标模型重新培训。睡眠者代理是第一个隐藏的触发后门攻击,以对从头开始培训的神经网络有效。我们展示了Imagenet和黑盒设置的有效性。我们的实现代码可以在https://github.com/hsouri/sleeper-agent找到。
translated by 谷歌翻译
最近的研究表明,深层神经网络容易受到不同类型的攻击,例如对抗性攻击,数据中毒攻击和后门攻击。其中,后门攻击是最狡猾的攻击,几乎可以在深度学习管道的每个阶段发生。因此,后门攻击吸引了学术界和行业的许多兴趣。但是,大多数现有的后门攻击方法对于某些轻松的预处理(例如常见数据转换)都是可见的或脆弱的。为了解决这些限制,我们提出了一种强大而无形的后门攻击,称为“毒药”。具体而言,我们首先利用图像结构作为目标中毒区域,并用毒药(信息)填充它们以生成触发图案。由于图像结构可以在数据转换期间保持其语义含义,因此这种触发模式对数据转换本质上是强大的。然后,我们利用深度注射网络将这种触发模式嵌入封面图像中,以达到隐身性。与现有流行的后门攻击方法相比,毒药的墨水在隐形和健壮性方面都优于表现。通过广泛的实验,我们证明了毒药不仅是不同数据集和网络体系结构的一般性,而且对于不同的攻击场景也很灵活。此外,它对许多最先进的防御技术也具有非常强烈的抵抗力。
translated by 谷歌翻译
AI安全社区的一个主要目标是为现实世界应用安全可靠地生产和部署深入学习模型。为此,近年来,在生产阶段(或培训阶段)和相应的防御中,基于数据中毒基于深度神经网络(DNN)的后门攻击以及相应的防御。具有讽刺意味的是,部署阶段的后门攻击,这些攻击通常可以在不专业用户的设备中发生,因此可以说是在现实世界的情景中威胁要威胁,得以更少的关注社区。我们将这种警惕的不平衡归因于现有部署阶段后门攻击算法的弱实用性以及现实世界攻击示范的不足。为了填补空白,在这项工作中,我们研究了对DNN的部署阶段后门攻击的现实威胁。我们基于普通使用的部署阶段攻击范式 - 对抗对抗权重攻击的研究,主体选择性地修改模型权重,以将后台嵌入到部署的DNN中。为了实现现实的实用性,我们提出了第一款灰度盒和物理可实现的重量攻击算法,即替换注射,即子网替换攻击(SRA),只需要受害者模型的架构信息,并且可以支持现实世界中的物理触发器。进行了广泛的实验模拟和系统级真实的世界攻击示范。我们的结果不仅提出了所提出的攻击算法的有效性和实用性,还揭示了一种新型计算机病毒的实际风险,这些计算机病毒可能会广泛传播和悄悄地将后门注入用户设备中的DNN模型。通过我们的研究,我们要求更多地关注DNN在部署阶段的脆弱性。
translated by 谷歌翻译
视觉变压器(VITS)具有与卷积神经网络相比,具有较小的感应偏置的根本不同的结构。随着绩效的提高,VIT的安全性和鲁棒性也非常重要。与许多最近利用VIT反对对抗性例子的鲁棒性的作品相反,本文调查了代表性的病因攻击,即后门。我们首先检查了VIT对各种后门攻击的脆弱性,发现VIT也很容易受到现有攻击的影响。但是,我们观察到,VIT的清洁数据准确性和后门攻击成功率在位置编码之前对补丁转换做出了明显的反应。然后,根据这一发现,我们为VIT提出了一种通过补丁处理来捍卫基于补丁的触发后门攻击的有效方法。在包括CIFAR10,GTSRB和Tinyimagenet在内的几个基准数据集上评估了这些表演,这些数据表明,该拟议的新颖防御在减轻VIT的后门攻击方面非常成功。据我们所知,本文提出了第一个防御性策略,该策略利用了反对后门攻击的VIT的独特特征。
translated by 谷歌翻译
本文提出了针对回顾性神经网络(Badnets)的新型两级防御(NNOCULICULE),该案例在响应该字段中遇到的回溯测试输入,修复了预部署和在线的BADNET。在预部署阶段,NNICULICULE与清洁验证输入的随机扰动进行检测,以部分减少后门的对抗影响。部署后,NNOCULICULE通过在原始和预先部署修补网络之间录制分歧来检测和隔离测试输入。然后培训Constcan以学习清洁验证和隔离输入之间的转换;即,它学会添加触发器来清洁验证图像。回顾验证图像以及其正确的标签用于进一步重新培训预修补程序,产生我们的最终防御。关于全面的后门攻击套件的实证评估表明,NNOCLICULE优于所有最先进的防御,以制定限制性假设,并且仅在特定的后门攻击上工作,或者在适应性攻击中失败。相比之下,NNICULICULE使得最小的假设并提供有效的防御,即使在现有防御因攻击者而导致其限制假设而导致的现有防御无效的情况下。
translated by 谷歌翻译
量化是一种流行的技术,即$将神经网络的参数表示从浮点数转换为低精度($ e.g. $,8位整数)。它会降低记忆占用和计算成本,推断,促进了资源饥饿的模型的部署。但是,在量化之前和之后,该转换引起的参数扰动导致模型之间的$行为$ $差异$。例如,量化模型可以错误分类正确分类的测试时间样本。尚不清楚这些差异是否导致新的安全漏洞。我们假设对手可以控制这种差异以引入在量化时激活的具体行为。为研究这一假设,我们武装量化感知培训并提出了一种新的培训框架来实施对抗性量化结果。在此框架之后,我们展示了三次攻击我们通过量化进行:(i)对显着的精度损失的不分青红皂白攻击; (ii)针对特定样本的目标攻击; (iii)使用输入触发来控制模型的后门攻击。我们进一步表明,单个受损模型击败多种量化方案,包括鲁棒量化技术。此外,在联合学习情景中,我们证明了一系列伴侣可以注入我们量化激活的后门的恶意参与者。最后,我们讨论了潜在的反措施,并表明只有重新训练始终如一地删除攻击伪影。我们的代码可以在https://github.com/secure-ai-systems-group/qu-antigization获得
translated by 谷歌翻译
后门是针对深神经网络(DNN)的强大攻击。通过中毒训练数据,攻击者可以将隐藏的规则(后门)注入DNN,该规则仅在包含攻击特异性触发器的输入上激活。尽管现有工作已经研究了各种DNN模型的后门攻击,但它们仅考虑静态模型,这些模型在初始部署后保持不变。在本文中,我们研究了后门攻击对时变DNN模型更现实的情况的影响,其中定期更新模型权重以处理数据分布的漂移。具体而言,我们从经验上量化了后门针对模型更新的“生存能力”,并检查攻击参数,数据漂移行为和模型更新策略如何影响后门生存能力。我们的结果表明,即使攻击者会积极增加触发器的大小和毒药比,即使在几个模型更新中,一次射击后门攻击(即一次仅中毒训练数据)也无法幸免。为了保持模型更新影响,攻击者必须不断将损坏的数据引入培训管道。这些结果共同表明,当模型更新以学习新数据时,它们也将后门“忘记”为隐藏的恶意功能。旧培训数据之间的分配变化越大,后门被遗忘了。利用这些见解,我们应用了智能学习率调度程序,以进一步加速模型更新期间的后门遗忘,这阻止了单发后门在单个模型更新中幸存下来。
translated by 谷歌翻译
深度神经网络众所周知,很容易受到对抗性攻击和后门攻击的影响,在该攻击中,对输入的微小修改能够误导模型以给出错误的结果。尽管已经广泛研究了针对对抗性攻击的防御措施,但有关减轻后门攻击的调查仍处于早期阶段。尚不清楚防御这两次攻击之间是否存在任何连接和共同特征。我们对对抗性示例与深神网络的后门示例之间的联系进行了全面的研究,以寻求回答以下问题:我们可以使用对抗检测方法检测后门。我们的见解是基于这样的观察结果,即在推理过程中,对抗性示例和后门示例都有异常,与良性​​样本高度区分。结果,我们修改了四种现有的对抗防御方法来检测后门示例。广泛的评估表明,这些方法可靠地防止后门攻击,其准确性比检测对抗性实例更高。这些解决方案还揭示了模型灵敏度,激活空间和特征空间中对抗性示例,后门示例和正常样本的关系。这能够增强我们对这两次攻击和防御机会的固有特征的理解。
translated by 谷歌翻译
This paper asks the intriguing question: is it possible to exploit neural architecture search (NAS) as a new attack vector to launch previously improbable attacks? Specifically, we present EVAS, a new attack that leverages NAS to find neural architectures with inherent backdoors and exploits such vulnerability using input-aware triggers. Compared with existing attacks, EVAS demonstrates many interesting properties: (i) it does not require polluting training data or perturbing model parameters; (ii) it is agnostic to downstream fine-tuning or even re-training from scratch; (iii) it naturally evades defenses that rely on inspecting model parameters or training data. With extensive evaluation on benchmark datasets, we show that EVAS features high evasiveness, transferability, and robustness, thereby expanding the adversary's design spectrum. We further characterize the mechanisms underlying EVAS, which are possibly explainable by architecture-level ``shortcuts'' that recognize trigger patterns. This work raises concerns about the current practice of NAS and points to potential directions to develop effective countermeasures.
translated by 谷歌翻译