With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real world applications has become an important research topic. Backdoor attacks are a form of adversarial attacks on deep networks where the attacker provides poisoned data to the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at the test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that is possible to identify by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack where poisoned data look natural with correct labels and also more importantly, the attacker hides the trigger in the poisoned data and keeps the trigger secret until the test time.We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images although the model performs well on clean data. We also show that our proposed attack cannot be easily defended using a state-of-the-art defense algorithm for backdoor attacks.
translated by 谷歌翻译
视觉变压器(VIT)最近在各种视觉任务上表现出了典范的性能,并被用作CNN的替代方案。它们的设计基于一种自我发挥的机制,该机制将图像作为一系列斑块进行处理,与CNN相比,这是完全不同的。因此,研究VIT是否容易受到后门攻击的影响很有趣。当攻击者出于恶意目的,攻击者毒害培训数据的一小部分时,就会发生后门攻击。模型性能在干净的测试图像上很好,但是攻击者可以通过在测试时间显示触发器来操纵模型的决策。据我们所知,我们是第一个证明VIT容易受到后门攻击的人。我们还发现VIT和CNNS之间存在着有趣的差异 - 解释算法有效地突出了VIT的测试图像的触发因素,但没有针对CNN。基于此观察结果,我们提出了一个测试时间图像阻止VIT的防御,这将攻击成功率降低了很大。代码可在此处找到:https://github.com/ucdvision/backdoor_transformer.git
translated by 谷歌翻译
随着机器学习数据的策展变得越来越自动化,数据集篡改是一种安装威胁。后门攻击者通过培训数据篡改,以嵌入在该数据上培训的模型中的漏洞。然后通过将“触发”放入模型的输入中的推理时间以推理时间激活此漏洞。典型的后门攻击将触发器直接插入训练数据,尽管在检查时可能会看到这种攻击。相比之下,隐藏的触发后托攻击攻击达到中毒,而无需将触发器放入训练数据即可。然而,这种隐藏的触发攻击在从头开始培训的中毒神经网络时无效。我们开发了一个新的隐藏触发攻击,睡眠代理,在制备过程中使用梯度匹配,数据选择和目标模型重新培训。睡眠者代理是第一个隐藏的触发后门攻击,以对从头开始培训的神经网络有效。我们展示了Imagenet和黑盒设置的有效性。我们的实现代码可以在https://github.com/hsouri/sleeper-agent找到。
translated by 谷歌翻译
Data poisoning is an attack on machine learning models wherein the attacker adds examples to the training set to manipulate the behavior of the model at test time. This paper explores poisoning attacks on neural nets. The proposed attacks use "clean-labels"; they don't require the attacker to have any control over the labeling of training data. They are also targeted; they control the behavior of the classifier on a specific test instance without degrading overall classifier performance. For example, an attacker could add a seemingly innocuous image (that is properly labeled) to a training set for a face recognition engine, and control the identity of a chosen person at test time. Because the attacker does not need to control the labeling function, poisons could be entered into the training set simply by leaving them on the web and waiting for them to be scraped by a data collection bot. We present an optimization-based method for crafting poisons, and show that just one single poison image can control classifier behavior when transfer learning is used. For full end-to-end training, we present a "watermarking" strategy that makes poisoning reliable using multiple (≈ 50) poisoned training instances. We demonstrate our method by generating poisoned frog images from the CIFAR dataset and using them to manipulate image classifiers.
translated by 谷歌翻译
大规模的未标记数据刺激了学习丰富的视觉表示的自我监督学习方法的最新进展。从图像中学习表示表示形式的最先进的自我监督方法(例如Moco,Byol,MSF)使用诱导性偏差,即图像的随机增强(例如随机农作物)应产生相似的嵌入。我们表明,这种方法容易受到后门攻击的影响 - 攻击者通过将触发器(攻击者选择的图像补丁)添加到图像中来毒害未标记数据的一小部分。模型性能在干净的测试图像上很好,但是攻击者可以通过在测试时间显示触发器来操纵模型的决策。在有监督的学习中对后门攻击进行了广泛的研究,据我们所知,我们是第一个研究他们进行自学学习的人。在自学学习中,后门攻击更为实用,因为使用大型未标记的数据使数据检查以消除毒药过敏。我们表明,在我们的目标攻击中,攻击者可以在测试时使用触发器为目标类别产生许多误报。我们还提出了一种基于知识蒸馏的防御方法,以成功中和攻击。我们的代码可在此处提供:https://github.com/umbcvision/ssl-backdoor。
translated by 谷歌翻译
We introduce camouflaged data poisoning attacks, a new attack vector that arises in the context of machine unlearning and other settings when model retraining may be induced. An adversary first adds a few carefully crafted points to the training dataset such that the impact on the model's predictions is minimal. The adversary subsequently triggers a request to remove a subset of the introduced points at which point the attack is unleashed and the model's predictions are negatively affected. In particular, we consider clean-label targeted attacks (in which the goal is to cause the model to misclassify a specific test point) on datasets including CIFAR-10, Imagenette, and Imagewoof. This attack is realized by constructing camouflage datapoints that mask the effect of a poisoned dataset.
translated by 谷歌翻译
后门攻击已被证明是对深度学习模型的严重安全威胁,并且检测给定模型是否已成为后门成为至关重要的任务。现有的防御措施主要建立在观察到后门触发器通常尺寸很小或仅影响几个神经元激活的观察结果。但是,在许多情况下,尤其是对于高级后门攻击,违反了上述观察结果,阻碍了现有防御的性能和适用性。在本文中,我们提出了基于新观察的后门防御范围。也就是说,有效的后门攻击通常需要对中毒训练样本的高预测置信度,以确保训练有素的模型具有很高的可能性。基于此观察结果,Dtinspector首先学习一个可以改变最高信心数据的预测的补丁,然后通过检查在低信心数据上应用学习补丁后检查预测变化的比率来决定后门的存在。对五次后门攻击,四个数据集和三种高级攻击类型的广泛评估证明了拟议防御的有效性。
translated by 谷歌翻译
在对抗机器学习中,防止对深度学习系统的攻击的新防御能力在释放更强大的攻击后不久就会破坏。在这种情况下,法医工具可以通过追溯成功的根本原因来为现有防御措施提供宝贵的补充,并为缓解措施提供前进的途径,以防止将来采取类似的攻击。在本文中,我们描述了我们为开发用于深度神经网络毒物攻击的法医追溯工具的努力。我们提出了一种新型的迭代聚类和修剪解决方案,该解决方案修剪了“无辜”训练样本,直到所有剩余的是一组造成攻击的中毒数据。我们的方法群群训练样本基于它们对模型参数的影响,然后使用有效的数据解读方法来修剪无辜簇。我们从经验上证明了系统对三种类型的肮脏标签(后门)毒物攻击和三种类型的清洁标签毒药攻击的功效,这些毒物跨越了计算机视觉和恶意软件分类。我们的系统在所有攻击中都达到了98.4%的精度和96.8%的召回。我们还表明,我们的系统与专门攻击它的四种抗纤维法措施相对强大。
translated by 谷歌翻译
与令人印象深刻的进步触动了我们社会的各个方面,基于深度神经网络(DNN)的AI技术正在带来越来越多的安全问题。虽然在考试时间运行的攻击垄断了研究人员的初始关注,但是通过干扰培训过程来利用破坏DNN模型的可能性,代表了破坏训练过程的可能性,这是破坏AI技术的可靠性的进一步严重威胁。在后门攻击中,攻击者损坏了培训数据,以便在测试时间诱导错误的行为。然而,测试时间误差仅在存在与正确制作的输入样本对应的触发事件的情况下被激活。通过这种方式,损坏的网络继续正常输入的预期工作,并且只有当攻击者决定激活网络内隐藏的后门时,才会发生恶意行为。在过去几年中,后门攻击一直是强烈的研究活动的主题,重点是新的攻击阶段的发展,以及可能对策的提议。此概述文件的目标是审查发表的作品,直到现在,分类到目前为止提出的不同类型的攻击和防御。指导分析的分类基于攻击者对培训过程的控制量,以及防御者验证用于培训的数据的完整性,并监控DNN在培训和测试中的操作时间。因此,拟议的分析特别适合于参考他们在运营的应用方案的攻击和防御的强度和弱点。
translated by 谷歌翻译
计算能力和大型培训数据集的可用性增加,机器学习的成功助长了。假设它充分代表了在测试时遇到的数据,则使用培训数据来学习新模型或更新现有模型。这种假设受到中毒威胁的挑战,这种攻击会操纵训练数据,以损害模型在测试时的表现。尽管中毒已被认为是行业应用中的相关威胁,到目前为止,已经提出了各种不同的攻击和防御措施,但对该领域的完整系统化和批判性审查仍然缺失。在这项调查中,我们在机器学习中提供了中毒攻击和防御措施的全面系统化,审查了过去15年中该领域发表的100多篇论文。我们首先对当前的威胁模型和攻击进行分类,然后相应地组织现有防御。虽然我们主要关注计算机视觉应用程序,但我们认为我们的系统化还包括其他数据模式的最新攻击和防御。最后,我们讨论了中毒研究的现有资源,并阐明了当前的局限性和该研究领域的开放研究问题。
translated by 谷歌翻译
在产生无形的后门攻击中毒数据期间,特征空间转换操作往往会导致一些中毒特征的丧失,并削弱了与触发器和目标标签之间的源图像之间的映射关系,从而导致需要更高的中毒率以实现相应的后门攻击成功率。为了解决上述问题,我们首次提出了功能修复的想法,并引入了盲水印技术,以修复在中毒数据中损失的中毒特征。在确保一致的标签的前提下,我们提出了基于功能维修的低毒速率看不见的后门攻击,名为FRIB。从上面的设计概念中受益,新方法增强了源图像与触发器和目标标签之间的映射关系,并增加了误导性DNN的程度,从而获得了高后门攻击成功率,中毒率非常低。最终,详细的实验结果表明,在所有MNIST,CIFAR10,GTSRB和Imagenet数据集中实现了高成功攻击成功率的高成功率的目标。
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Open software supply chain attacks, once successful, can exact heavy costs in mission-critical applications. As open-source ecosystems for deep learning flourish and become increasingly universal, they present attackers previously unexplored avenues to code-inject malicious backdoors in deep neural network models. This paper proposes Flareon, a small, stealthy, seemingly harmless code modification that specifically targets the data augmentation pipeline with motion-based triggers. Flareon neither alters ground-truth labels, nor modifies the training loss objective, nor does it assume prior knowledge of the victim model architecture, training data, and training hyperparameters. Yet, it has a surprisingly large ramification on training -- models trained under Flareon learn powerful target-conditional (or "any2any") backdoors. The resulting models can exhibit high attack success rates for any target choices and better clean accuracies than backdoor attacks that not only seize greater control, but also assume more restrictive attack capabilities. We also demonstrate the effectiveness of Flareon against recent defenses. Flareon is fully open-source and available online to the deep learning community: https://github.com/lafeat/flareon.
translated by 谷歌翻译
Backdoor attacks have emerged as one of the major security threats to deep learning models as they can easily control the model's test-time predictions by pre-injecting a backdoor trigger into the model at training time. While backdoor attacks have been extensively studied on images, few works have investigated the threat of backdoor attacks on time series data. To fill this gap, in this paper we present a novel generative approach for time series backdoor attacks against deep learning based time series classifiers. Backdoor attacks have two main goals: high stealthiness and high attack success rate. We find that, compared to images, it can be more challenging to achieve the two goals on time series. This is because time series have fewer input dimensions and lower degrees of freedom, making it hard to achieve a high attack success rate without compromising stealthiness. Our generative approach addresses this challenge by generating trigger patterns that are as realistic as real-time series patterns while achieving a high attack success rate without causing a significant drop in clean accuracy. We also show that our proposed attack is resistant to potential backdoor defenses. Furthermore, we propose a novel universal generator that can poison any type of time series with a single generator that allows universal attacks without the need to fine-tune the generative model for new time series datasets.
translated by 谷歌翻译
许多最先进的ML模型在各种任务中具有优于图像分类的人类。具有如此出色的性能,ML模型今天被广泛使用。然而,存在对抗性攻击和数据中毒攻击的真正符合ML模型的稳健性。例如,Engstrom等人。证明了最先进的图像分类器可以容易地被任意图像上的小旋转欺骗。由于ML系统越来越纳入安全性和安全敏感的应用,对抗攻击和数据中毒攻击构成了相当大的威胁。本章侧重于ML安全的两个广泛和重要的领域:对抗攻击和数据中毒攻击。
translated by 谷歌翻译
As a critical threat to deep neural networks (DNNs), backdoor attacks can be categorized into two types, i.e., source-agnostic backdoor attacks (SABAs) and source-specific backdoor attacks (SSBAs). Compared to traditional SABAs, SSBAs are more advanced in that they have superior stealthier in bypassing mainstream countermeasures that are effective against SABAs. Nonetheless, existing SSBAs suffer from two major limitations. First, they can hardly achieve a good trade-off between ASR (attack success rate) and FPR (false positive rate). Besides, they can be effectively detected by the state-of-the-art (SOTA) countermeasures (e.g., SCAn). To address the limitations above, we propose a new class of viable source-specific backdoor attacks, coined as CASSOCK. Our key insight is that trigger designs when creating poisoned data and cover data in SSBAs play a crucial role in demonstrating a viable source-specific attack, which has not been considered by existing SSBAs. With this insight, we focus on trigger transparency and content when crafting triggers for poisoned dataset where a sample has an attacker-targeted label and cover dataset where a sample has a ground-truth label. Specifically, we implement $CASSOCK_{Trans}$ and $CASSOCK_{Cont}$. While both they are orthogonal, they are complementary to each other, generating a more powerful attack, called $CASSOCK_{Comp}$, with further improved attack performance and stealthiness. We perform a comprehensive evaluation of the three $CASSOCK$-based attacks on four popular datasets and three SOTA defenses. Compared with a representative SSBA as a baseline ($SSBA_{Base}$), $CASSOCK$-based attacks have significantly advanced the attack performance, i.e., higher ASR and lower FPR with comparable CDA (clean data accuracy). Besides, $CASSOCK$-based attacks have effectively bypassed the SOTA defenses, and $SSBA_{Base}$ cannot.
translated by 谷歌翻译
后门攻击误导机器学习模型以在测试时间呈现特定触发时输出攻击者指定的类。这些攻击需要毒害训练数据来损害学习算法,例如,通过将包含触发器的中毒样本注入训练集中,以及所需的类标签。尽管对后门攻击和防御的研究数量越来越多,但影响后门攻击成功的潜在因素以及它们对学习算法的影响尚未得到很好的理解。在这项工作中,我们的目标是通过揭幕揭示触发样本周围的更光滑的决策功能来阐明这一问题 - 这是我们称之为\ Textit {后门平滑}的现象。为了量化后门平滑,我们定义了一种评估与输入样本周围分类器的预测相关的不确定性的度量。我们的实验表明,当触发器添加到输入样本时,平滑度会增加,并且这种现象更加明显,以获得更成功的攻击。我们还提供了初步证据,后者触发器不是唯一的平滑诱导模式,而是可以通过我们的方法来检测其他人工图案,铺平了解当前防御和设计新颖的局限性。
translated by 谷歌翻译
A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model by leveraging the difficulty in interpretability of the learned model to misclassify any inputs signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision system. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of predicted classes for perturbed inputs from a given deployed model-malicious or benign. A low entropy in predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input-a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved result of 0% for both FRR and FAR. We have also evaluated STRIP robustness against a number of trojan attack variants and adaptive attacks.
translated by 谷歌翻译
后门深度学习(DL)模型的行为通常在清洁输入上,但在触发器输入时不端行为,因为后门攻击者希望为DL模型部署构成严重后果。最先进的防御是限于特定的后门攻击(源无关攻击)或在该机器学习(ML)专业知识或昂贵的计算资源中不适用于源友好的攻击。这项工作观察到所有现有的后门攻击都具有不可避免的内在弱点,不可转换性,即触发器输入劫持劫持模型,但不能对另一个尚未植入同一后门的模型有效。通过此密钥观察,我们提出了不可转换性的反向检测(NTD)来识别运行时在运行时的模型欠测试(MUT)的触发输入。特定,NTD允许潜在的回溯静电预测输入的类别。同时,NTD利用特征提取器(FE)来提取输入的特征向量,并且从其预测类随机拾取的一组样本,然后比较FE潜在空间中的输入和样本之间的相似性。如果相似性低,则输入是对逆势触发输入;否则,良性。 FE是一个免费的预训练模型,私下从开放平台保留。随着FE和MUT来自不同来源,攻击者非常不可能将相同的后门插入其中两者。由于不可转换性,不能将突变处工作的触发效果转移到FE,使NTD对不同类型的后门攻击有效。我们在三个流行的定制任务中评估NTD,如面部识别,交通标志识别和一般动物分类,结果确认NDT具有高效率(低假验收率)和具有低检测延迟的可用性(低误报率)。
translated by 谷歌翻译
深度神经网络容易受到来自对抗性投入的攻击,并且最近,特洛伊木马误解或劫持模型的决定。我们通过探索有界抗逆性示例空间和生成的对抗网络内的自然输入空间来揭示有界面的对抗性实例 - 通用自然主义侵害贴片的兴趣类 - 我们呼叫TNT。现在,一个对手可以用一个自然主义的补丁来手臂自己,不太恶意,身体上可实现,高效 - 实现高攻击成功率和普遍性。 TNT是普遍的,因为在场景中的TNT中捕获的任何输入图像都将:i)误导网络(未确定的攻击);或ii)迫使网络进行恶意决定(有针对性的攻击)。现在,有趣的是,一个对抗性补丁攻击者有可能发挥更大的控制水平 - 选择一个独立,自然的贴片的能力,与被限制为嘈杂的扰动的触发器 - 到目前为止只有可能与特洛伊木马攻击方法有可能干扰模型建设过程,以嵌入风险发现的后门;但是,仍然意识到在物理世界中部署的补丁。通过对大型视觉分类任务的广泛实验,想象成在其整个验证集50,000张图像中进行评估,我们展示了TNT的现实威胁和攻击的稳健性。我们展示了攻击的概括,以创建比现有最先进的方法实现更高攻击成功率的补丁。我们的结果表明,攻击对不同的视觉分类任务(CIFAR-10,GTSRB,PUBFIG)和多个最先进的深神经网络,如WieredEnet50,Inception-V3和VGG-16。
translated by 谷歌翻译