Smart Paper Notes

Defense Against Multi-target Trojan Attacks

Haripriya Harikumar , Santu Rana , Kien Do , Sunil Gupta , Wei Zong , Willy Susilo , Svetha Venkatesh

Category: Computer Vision

2022-07-08

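Adversarial attacks on deep learning-based models pose a significant threat to the current AI infrastructure. Among them, Trojan attacks are the hardest to defend against. In this paper, we first introduce a variation of the BadNet kind of attack that introduces Trojan backdoors to multiple target classes and allows triggers to be placed anywhere in the image. The former makes the attack more potent, and the latter makes it extremely easy to carry out in physical space. State-of-the-art Trojan detection methods fail under this threat model. To defend against this attack, we first introduce a trigger reverse-engineering mechanism that uses multiple images to recover a variety of potential triggers. We then propose a detection mechanism that measures the transferability of the recovered triggers. A Trojan trigger has very high transferability, i.e., it makes other images go to the same class as well. We study many practical advantages of our attack method and then demonstrate the detection performance on a variety of image datasets. The experimental results show that the detection performance of our method is superior to the state of the art.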
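
The transferability check described in this abstract can be illustrated with a small sketch. Everything below is hypothetical and for illustration only: `toy_model` stands in for a trojaned classifier, images are plain 2-D lists, and the trigger is pasted at a fixed position, whereas the paper allows it anywhere and recovers it by reverse engineering (not shown here).

```python
def apply_patch(image, patch, top, left):
    """Paste a 2-D patch into a copy of a 2-D image at (top, left)."""
    out = [row[:] for row in image]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            out[top + r][left + c] = value
    return out


def transferability(model, trigger, images, target, top=0, left=0):
    """Fraction of images the model sends to `target` once the trigger is pasted."""
    hits = sum(model(apply_patch(img, trigger, top, left)) == target
               for img in images)
    return hits / len(images)


def toy_model(image):
    """Hypothetical trojaned classifier: two bright top-left pixels force class 7."""
    if image[0][0] == 1.0 and image[0][1] == 1.0:
        return 7
    flat = [v for row in image for v in row]
    return int(sum(flat) / len(flat) >= 0.5)  # otherwise class 0 or 1
```

A candidate trigger whose transferability is close to 1 across many unrelated images would be flagged as a likely Trojan trigger, while an ordinary perturbation is expected to score much lower.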

translated by Google Translate

STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

Yansong Gao , Chang Xu , Derui Wang , Shiping Chen , Damith C. Ranasinghe , Surya Nepal

Category:

2019-02-18

A recent trojan attack on deep neural network (DNN) models is an insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model, leveraging the difficulty of interpreting the learned model, to misclassify any input signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds a STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision systems. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of the predicted classes for the perturbed inputs from a given deployed model, malicious or benign. A low entropy in the predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input, a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved a result of 0% for both FRR and FAR. We have also evaluated STRIP's robustness against a number of trojan attack variants and adaptive attacks.
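
The entropy test in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: images are flat lists of floats in [0, 1], `blend` uses a simple 50/50 superimposition, and `toy_model` is a hypothetical two-class model whose first pixel acts as the trigger.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def blend(image, overlay, alpha=0.5):
    """Superimpose two equally sized flat images (lists of floats in [0, 1])."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(image, overlay)]


def strip_score(model, image, clean_pool, n_perturb=20, rng=None):
    """Mean prediction entropy over n_perturb superimposed copies of `image`."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_perturb):
        overlay = rng.choice(clean_pool)
        total += entropy(model(blend(image, overlay)))
    return total / n_perturb


def toy_model(image):
    """Hypothetical 2-class model: a bright first pixel acts as a trojan trigger."""
    if image[0] > 0.45:          # trigger survives blending: always class 1
        return [0.01, 0.99]
    mean = sum(image) / len(image)
    return [1.0 - mean, mean]    # otherwise the output depends on the content
```

An input whose score falls below a threshold chosen on clean data (for example, to meet a preset FRR) would be treated as trojaned; how that threshold is set is left out of this sketch.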

Dispersed Pixel Perturbation-based Imperceptible Backdoor Trigger for Image Classifier Models

Yulong Wang , Minghui Zhao , Shenghong Li , Xin Yuan , Wei Ni

Category: Computer Vision | Artificial Intelligence

2022-08-19

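Typical deep neural network (DNN) backdoor attacks are based on triggers embedded in the inputs. Existing imperceptible triggers are either computationally expensive or have a low attack success rate. In this paper, we propose a new backdoor trigger that is easy to generate, imperceptible, and highly effective. The new trigger is a uniformly randomly generated three-dimensional (3D) binary pattern that can be repeated and mirrored horizontally and/or vertically and superposed onto three-channel images to train a backdoored DNN model. Dispersed throughout the image, the new trigger produces only a weak perturbation on individual pixels, but collectively forms a strongly recognizable pattern that trains and activates the DNN's backdoor. We also show analytically that the trigger becomes increasingly effective as the image resolution increases. Experiments are conducted with ResNet-18 and MLP models on the MNIST, CIFAR-10, and BTSR datasets. In terms of imperceptibility, the new trigger outperforms existing triggers such as BadNets, Trojaned NN, and Hidden Backdoor. The new trigger achieves an attack success rate of almost 100%, reduces classification accuracy by less than 0.7%-2.4%, and invalidates state-of-the-art defense techniques.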
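
The trigger construction described in this abstract (a random binary block, tiled with mirroring, then superposed) can be sketched as follows. The block size, the +/-`strength` encoding of the bits, and the 0..255 pixel range are assumptions made for illustration; the abstract does not specify how the pattern is added to the pixels.

```python
import random


def make_pattern(size, rng):
    """Uniformly random binary block of shape size x size x 3."""
    return [[[rng.randint(0, 1) for _ in range(3)] for _ in range(size)]
            for _ in range(size)]


def tile_mirrored(pattern, height, width):
    """Repeat the block over height x width, mirroring every other tile."""
    size = len(pattern)

    def index(i):
        tile, offset = divmod(i, size)
        return offset if tile % 2 == 0 else size - 1 - offset

    return [[pattern[index(r)][index(c)] for c in range(width)]
            for r in range(height)]


def superpose(image, trigger, strength=2):
    """Shift each channel by +/-strength according to the bit, clipped to 0..255."""
    return [[[min(255, max(0, px + strength * (2 * bit - 1)))
              for px, bit in zip(pixel, bits)]
             for pixel, bits in zip(row, trig_row)]
            for row, trig_row in zip(image, trigger)]
```

Because every pixel moves by only `strength` intensity levels, the change to any single pixel is small, while the pattern as a whole covers the entire image.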

An Overview of Backdoor Attacks Against Deep Neural Networks and Possible Defences

Wei Guo , Benedetta Tondi , Mauro Barni

Category: Computer Vision

2021-11-16

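Together with impressive advances touching every aspect of our society, AI technology based on deep neural networks (DNNs) is raising increasing security concerns. While attacks operating at test time monopolised the initial attention of researchers, backdoor attacks, which exploit the possibility of corrupting DNN models by interfering with the training process, represent a further serious threat undermining the dependability of AI techniques. In a backdoor attack, the attacker corrupts the training data so as to induce erroneous behaviour at test time. Test-time errors, however, are activated only in the presence of a triggering event corresponding to a properly crafted input sample. In this way, the corrupted network continues to work as expected on regular inputs, and the malicious behaviour occurs only when the attacker decides to activate the backdoor hidden within the network. In the last few years, backdoor attacks have been the subject of intense research activity focusing on both the development of new classes of attacks and the proposal of possible countermeasures. The goal of this overview paper is to review the works published until now, classifying the different types of attacks and defences proposed so far. The classification guiding the analysis is based on the amount of control the attacker has over the training process, and on the defender's capability to verify the integrity of the data used for training and to monitor the operation of the DNN at training and test time. As such, the proposed analysis is particularly suited to highlighting the strengths and weaknesses of both attacks and defences with reference to the application scenarios in which they operate.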

NNoculation: Catching BadNets in the Wild

Akshaj Kumar Veldanda , Kang Liu , Benjamin Tan , Prashanth Krishnamurthy , Farshad Khorrami , Ramesh Karri , Brendan Dolan-Gavitt , Siddharth Garg

Category: Machine Learning

2020-02-19

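This paper proposes a novel two-stage defense (NNoculation) against backdoored neural networks (BadNets) that repairs a BadNet both pre-deployment and online, in response to backdoored test inputs encountered in the field. In the pre-deployment stage, NNoculation retrains the BadNet with random perturbations of clean validation inputs to partially reduce the adversarial impact of a backdoor. Post-deployment, NNoculation detects and quarantines backdoored test inputs by recording disagreements between the original and the pre-deployment patched networks. A CycleGAN is then trained to learn transformations between clean validation inputs and quarantined inputs; i.e., it learns to add triggers to clean validation images. The backdoored validation images, along with their correct labels, are used to further retrain the pre-deployment patched network, yielding our final defense. Empirical evaluation on a comprehensive suite of backdoor attacks shows that NNoculation outperforms all state-of-the-art defenses, which make restrictive assumptions and only work on specific backdoor attacks, or fail on adaptive attacks. In contrast, NNoculation makes minimal assumptions and provides an effective defense, even in settings where existing defenses are ineffective because attackers circumvent their restrictive assumptions.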

Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural Networks

Xi Li , Zhen Xiang , David J. Miller , George Kesidis

Category: Machine Learning

2021-12-06

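Backdoor (Trojan) attacks are an emerging threat against deep neural networks (DNNs). An attacked DNN predicts the attacker's desired target class whenever a test sample from any source class is embedded with a backdoor pattern, while still correctly classifying clean (attack-free) test samples. Existing backdoor defenses have shown success in detecting whether a DNN is attacked and in reverse-engineering the backdoor pattern in a "post-training" regime: the defender has access to the DNN to be inspected and to a small, independently collected clean dataset, but has no access to the (possibly poisoned) training set of the DNN. However, these defenses neither catch culprits in the act of triggering the backdoor mapping nor mitigate the backdoor attack at test time. In this paper, we propose an "in-flight" defense against backdoor attacks on image classification that 1) detects the use of a backdoor trigger at test time; and 2) infers the class of origin (source class) of a detected trigger example. The effectiveness of our defense is demonstrated experimentally against different strong backdoor attacks.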

Towards Effective and Robust Neural Trojan Defenses via Input Filtering

Kien Do , Haripriya Harikumar , Hung Le , Dung Nguyen , Truyen Tran , Santu Rana , Dang Nguyen , Willy Susilo , Svetha Venkatesh

Category: Artificial Intelligence | Computer Vision | Machine Learning

2022-02-24

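Trojan attacks on deep neural networks are both dangerous and surreptitious. Over the past few years, Trojan attacks have advanced from using only a single input-agnostic trigger and targeting only one class to using multiple input-specific triggers and targeting multiple classes. However, Trojan defenses have not caught up with this development. Most defense methods still make out-of-date assumptions about Trojan triggers and target classes, and can therefore be easily circumvented by modern Trojan attacks. To deal with this problem, we propose two novel "filtering" defenses called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF), which leverage lossy data compression and adversarial learning, respectively, to effectively purify potential Trojan triggers in the input at run time without making assumptions about the number of triggers or target classes or about the input-dependence property of triggers. In addition, we introduce a new defense mechanism called "Filtering-then-Contrasting" (FtC), which helps avoid the drop in classification accuracy on clean data caused by "filtering", and combine it with VIF/AIF to derive new defenses of this kind. Extensive experimental results and ablation studies show that our proposed defenses significantly outperform well-known baseline defenses in mitigating five advanced Trojan attacks, including two recent state-of-the-art ones, while being quite robust to small amounts of training data and large triggers.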

Confidence Matters: Inspecting Backdoors in Deep Neural Networks via Distribution Transfer

Tong Wang , Yuan Yao , Feng Xu , Miao Xu , Shengwei An , Ting Wang

Category: Computer Vision

2022-08-13

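Backdoor attacks have been shown to be a serious security threat to deep learning models, and detecting whether a given model has been backdoored has become a crucial task. Existing defenses are mainly built on the observation that the backdoor trigger is usually small or affects the activation of only a few neurons. However, these observations are violated in many cases, especially for advanced backdoor attacks, which hinders the performance and applicability of existing defenses. In this paper, we propose DTInspector, a backdoor defense built on a new observation: an effective backdoor attack usually requires high prediction confidence on the poisoned training samples, so as to ensure that the trained model exhibits the targeted behavior with high probability. Based on this observation, DTInspector first learns a patch that can change the predictions of most high-confidence data, and then decides whether a backdoor exists by checking the ratio of prediction changes after applying the learned patch to low-confidence data. Extensive evaluations on five backdoor attacks, four datasets, and three advanced attack types demonstrate the effectiveness of the proposed defense.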

Poison Ink: Robust and Invisible Backdoor Attack

Jie Zhang , Dongdong Chen , Qidong Huang , Jing Liao , Weiming Zhang , Huamin Feng , Gang Hua , Nenghai Yu

Category: Computer Vision

2021-08-05

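Recent research shows that deep neural networks are vulnerable to different types of attacks, such as adversarial attacks, data poisoning attacks, and backdoor attacks. Among them, the backdoor attack is the most cunning and can occur at almost every stage of the deep learning pipeline. Backdoor attacks have therefore attracted considerable interest from both academia and industry. However, most existing backdoor attack methods are either visible or fragile to effortless pre-processing such as common data transformations. To address these limitations, we propose a robust and invisible backdoor attack called "Poison Ink". Concretely, we first leverage image structures as the target poisoning areas and fill them with poison ink (information) to generate the trigger pattern. Since image structures keep their semantic meaning during data transformation, such a trigger pattern is inherently robust to data transformations. We then leverage a deep injection network to embed the trigger pattern into the cover image to achieve stealthiness. Compared with existing popular backdoor attack methods, Poison Ink performs better in both stealthiness and robustness. Through extensive experiments, we demonstrate that Poison Ink is not only general across different datasets and network architectures but also flexible for different attack scenarios. In addition, it is highly resistant to many state-of-the-art defense techniques.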

CASSOCK: Viable Backdoor Attacks against DNN in The Wall of Source-Specific Backdoor Defences

Shang Wang , Yansong Gao , Anmin Fu , Zhi Zhang , Yuqing Zhang , Willy Susilo , Dongxi Liu

Category: Machine Learning

2022-05-31

As a critical threat to deep neural networks (DNNs), backdoor attacks can be categorized into two types, i.e., source-agnostic backdoor attacks (SABAs) and source-specific backdoor attacks (SSBAs). Compared to traditional SABAs, SSBAs are more advanced in that they are stealthier and bypass mainstream countermeasures that are effective against SABAs. Nonetheless, existing SSBAs suffer from two major limitations. First, they can hardly achieve a good trade-off between ASR (attack success rate) and FPR (false positive rate). Second, they can be effectively detected by state-of-the-art (SOTA) countermeasures (e.g., SCAn). To address these limitations, we propose a new class of viable source-specific backdoor attacks, coined CASSOCK. Our key insight is that the trigger designs used when creating poisoned data and cover data in SSBAs play a crucial role in achieving a viable source-specific attack, which has not been considered by existing SSBAs. With this insight, we focus on trigger transparency and content when crafting triggers for the poisoned dataset, where a sample has an attacker-targeted label, and the cover dataset, where a sample has a ground-truth label. Specifically, we implement $CASSOCK_{Trans}$ and $CASSOCK_{Cont}$. The two are orthogonal and complementary to each other, and combining them yields a more powerful attack, called $CASSOCK_{Comp}$, with further improved attack performance and stealthiness. We perform a comprehensive evaluation of the three $CASSOCK$-based attacks on four popular datasets and three SOTA defenses. Compared with a representative SSBA as a baseline ($SSBA_{Base}$), $CASSOCK$-based attacks significantly improve attack performance, i.e., higher ASR and lower FPR with comparable CDA (clean data accuracy). Moreover, $CASSOCK$-based attacks effectively bypass the SOTA defenses, whereas $SSBA_{Base}$ cannot.

Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

Category: Machine Learning

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.

Poison Forensics: Traceback of Data Poisoning Attacks in Neural Networks

Shawn Shan , Arjun Nitin Bhagoji , Haitao Zheng , Ben Y. Zhao

Category: Artificial Intelligence

2021-10-13

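In adversarial machine learning, new defenses against attacks on deep learning systems are routinely broken soon after their release by more powerful attacks. In this context, forensic tools can offer a valuable complement to existing defenses by tracing a successful attack back to its root cause and offering a path forward for mitigation to prevent similar attacks in the future. In this paper, we describe our efforts in developing a forensic traceback tool for poison attacks on deep neural networks. We propose a novel iterative clustering and pruning solution that trims "innocent" training samples until all that remains is the set of poisoned data responsible for the attack. Our method clusters training samples based on their impact on model parameters, then uses an efficient data unlearning method to prune innocent clusters. We empirically demonstrate the efficacy of our system on three types of dirty-label (backdoor) poison attacks and three types of clean-label poison attacks, across the domains of computer vision and malware classification. Our system achieves 98.4% precision and 96.8% recall across all attacks. We also show that our system is robust against four anti-forensics measures specifically designed to attack it.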

The "Beatrix" Resurrections: Robust Backdoor Detection via Gram Matrices

Wanlun Ma , Derui Wang , Ruoxi Sun , Minhui Xue , Sheng Wen , Yang Xiang

Category: Artificial Intelligence

2022-09-23

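Deep neural networks (DNNs) are susceptible to backdoor attacks during training. A model corrupted in this way functions normally, but produces a predefined target label when triggered by certain patterns in the input. Existing defenses usually rely on the assumption of a universal backdoor setting in which poisoned samples share the same uniform trigger. However, recent advanced backdoor attacks show that this assumption no longer holds for dynamic backdoors, where the triggers vary from input to input, thereby defeating existing defenses. In this work, we propose a novel technique, Beatrix (backdoor detection via Gram matrix). Beatrix uses Gram matrices to capture not only feature correlations but also the appropriately high-order information of the representations. By learning class-conditional statistics from the activation patterns of normal samples, Beatrix can identify poisoned samples by capturing anomalies in activation patterns. To further improve performance in identifying target labels, Beatrix leverages kernel-based testing without making any prior assumptions about the representation distribution. We demonstrate the effectiveness of our method through extensive evaluation and comparison with state-of-the-art defensive techniques. The experimental results show that our approach achieves an F1 score of 91.1% in detecting dynamic backdoors, while the state of the art reaches only 36.9%.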
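
The idea of comparing Gram-matrix features against class-conditional statistics can be illustrated with a rough sketch. The abstract does not give the exact statistics or the kernel-based test, so the order-p Gram formula, the median/MAD deviation score, and all function names below are assumptions for illustration only; feature maps are small lists of non-negative channel vectors.

```python
import statistics


def gram(features, order=1):
    """Order-p Gram matrix of a C x N feature map (list of C channel vectors).

    G[i][j] = (sum_k (f_i[k] * f_j[k]) ** p) ** (1 / p); activations are
    assumed non-negative (e.g. post-ReLU) so the p-th root is well defined.
    """
    return [[sum((a * b) ** order for a, b in zip(fi, fj)) ** (1.0 / order)
             for fj in features] for fi in features]


def flatten_upper(matrix):
    """Upper triangle (including the diagonal) as a flat vector."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def fit_stats(clean_feature_maps, order=1):
    """Per-entry median and MAD of Gram features from clean samples of one class."""
    vectors = [flatten_upper(gram(f, order)) for f in clean_feature_maps]
    columns = list(zip(*vectors))
    medians = [statistics.median(col) for col in columns]
    mads = [statistics.median(abs(v - m) for v in col) or 1e-6
            for col, m in zip(columns, medians)]
    return medians, mads


def deviation(feature_map, stats, order=1):
    """Mean absolute deviation from the clean medians, measured in MAD units."""
    medians, mads = stats
    vector = flatten_upper(gram(feature_map, order))
    return sum(abs(v - m) / d
               for v, m, d in zip(vector, medians, mads)) / len(vector)
```

A sample whose deviation is far above that of clean samples from its predicted class would be treated as suspicious; the threshold is left to the user in this sketch.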

Backdoor Vulnerabilities in Normally Trained Deep Learning Models

Guanhong Tao , Zhenting Wang , Siyuan Cheng , Shiqing Ma , Shengwei An , Yingqi Liu , Guangyu Shen , Zhuo Zhang , Yunshu Mao , Xiangyu Zhang

Category: Machine Learning

2022-11-29

We conduct a systematic study of backdoor vulnerabilities in normally trained Deep Learning models. They are as dangerous as backdoors injected by data poisoning because both can be equally exploited. We use 20 different types of injected backdoor attacks from the literature as guidance and study their counterparts in normally trained models, which we call natural backdoor vulnerabilities. We find that natural backdoors exist widely, with most injected backdoor attacks having natural counterparts. We categorize these natural backdoors and propose a general detection framework. It finds 315 natural backdoors in the 56 normally trained models downloaded from the Internet, covering all the different categories, while existing scanners designed for injected backdoors detect at most 65 backdoors. We also study the root causes of and defenses against natural backdoors.

Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation

Tong Wu , Tianhao Wang , Vikash Sehwag , Saeed Mahloujifar , Prateek Mittal

Category: Computer Vision | Machine Learning

2022-07-22

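Recent works have demonstrated that deep learning models are vulnerable to backdoor poisoning attacks, which instill spurious correlations with external trigger patterns or objects (e.g., stickers, sunglasses, etc.). We find that such external trigger signals are unnecessary, as highly effective backdoors can be easily inserted using rotation-based image transformation. Our method constructs the poisoned dataset by rotating a limited number of objects and labeling them incorrectly; once trained on it, the victim's model makes undesirable predictions during run-time inference. Comprehensive empirical studies on image classification and object detection tasks show that the attack achieves a significantly high attack success rate while maintaining clean performance. Furthermore, we evaluate standard data augmentation techniques and four different backdoor defenses against our attack and find that none of them serves as a consistent mitigation approach. Our attack can be easily deployed in the real world since it only requires rotating the object, as we show in both image classification and object detection applications. Overall, our work highlights a new, simple, physically realizable, and highly effective vector for backdoor attacks. Our video demo is available at https://youtu.be/6jif8wnx34m.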
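
The poisoning step described in this abstract (rotate a small fraction of samples and mislabel them) can be sketched as follows. This is an illustration under assumptions, not the authors' code: images are 2-D lists, a fixed 90-degree rotation stands in for whatever rotation angle the attack actually uses, and the function names are hypothetical.

```python
import random


def rotate90(image):
    """Rotate a 2-D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def poison_by_rotation(dataset, target_label, rate, seed=0):
    """Rotate a `rate` fraction of (image, label) pairs and relabel them."""
    rng = random.Random(seed)
    n_poison = int(len(dataset) * rate)
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    return [(rotate90(img), target_label) if i in chosen else (img, lbl)
            for i, (img, lbl) in enumerate(dataset)]
```

At inference time the attacker would only need to present a rotated object, which is why no external trigger pattern appears anywhere in this sketch.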

Hidden Trigger Backdoor Attacks

Aniruddha Saha , Akshayvarun Subramanya , Hamed Pirsiavash

Category:

2019-09-30

With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real-world applications has become an important research topic. Backdoor attacks are a form of adversarial attack on deep networks in which the attacker provides poisoned data for the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that can be identified by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack in which the poisoned data look natural and have correct labels and, more importantly, the attacker hides the trigger in the poisoned data and keeps the trigger secret until test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images, although the model performs well on clean data. We also show that our proposed attack cannot be easily defended against using a state-of-the-art defense algorithm for backdoor attacks.

Backdoor Attacks on Time Series: A Generative Approach

Yujing Jiang , Xingjun Ma , Sarah Monazam Erfani , James Bailey

Category: Machine Learning

2022-11-15

Backdoor attacks have emerged as one of the major security threats to deep learning models, as they can easily control the model's test-time predictions by pre-injecting a backdoor trigger into the model at training time. While backdoor attacks have been extensively studied on images, few works have investigated the threat of backdoor attacks on time series data. To fill this gap, in this paper we present a novel generative approach for time series backdoor attacks against deep learning based time series classifiers. Backdoor attacks have two main goals: high stealthiness and high attack success rate. We find that, compared to images, it can be more challenging to achieve both goals on time series. This is because time series have fewer input dimensions and lower degrees of freedom, making it hard to achieve a high attack success rate without compromising stealthiness. Our generative approach addresses this challenge by generating trigger patterns that are as realistic as real time series patterns while achieving a high attack success rate without causing a significant drop in clean accuracy. We also show that our proposed attack is resistant to potential backdoor defenses. Furthermore, we propose a novel universal generator that can poison any type of time series, allowing universal attacks without the need to fine-tune the generative model for new time series datasets.

Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch

Hossein Souri , Liam Fowl , Rama Chellappa , Micah Goldblum , Tom Goldstein

Category: Machine Learning | Computer Vision

2021-06-16

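As the curation of data for machine learning becomes increasingly automated, dataset tampering is a mounting threat. Backdoor attackers tamper with training data to embed a vulnerability in models trained on that data. This vulnerability is then activated at inference time by placing a "trigger" into the model's input. Typical backdoor attacks insert the trigger directly into the training data, although the presence of such an attack may be visible upon inspection. In contrast, the Hidden Trigger Backdoor Attack achieves poisoning without placing a trigger into the training data at all. However, this hidden trigger attack is ineffective at poisoning neural networks trained from scratch. We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model re-training during the crafting process. Sleeper Agent is the first hidden trigger backdoor attack to be effective against neural networks trained from scratch. We demonstrate its effectiveness on ImageNet and in black-box settings. Our implementation code can be found at https://github.com/hsouri/sleeper-agent.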

Backdoor Attacks on Vision Transformers

Akshayvarun Subramanya , Aniruddha Saha , Soroush Abbasi Koohpayegani , Ajinkya Tejankar , Hamed Pirsiavash

Category: Computer Vision | Machine Learning

2022-06-16

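Vision Transformers (ViTs) have recently demonstrated exemplary performance on a variety of vision tasks and are being used as an alternative to CNNs. Their design is based on a self-attention mechanism that processes images as a sequence of patches, which is quite different from CNNs. Hence, it is interesting to study whether ViTs are vulnerable to backdoor attacks. A backdoor attack happens when an attacker poisons a small part of the training data for malicious purposes. The model performs well on clean test images, but the attacker can manipulate the model's decision by showing the trigger at test time. To the best of our knowledge, we are the first to show that ViTs are vulnerable to backdoor attacks. We also find an intriguing difference between ViTs and CNNs: interpretation algorithms effectively highlight the trigger on test images for ViTs but not for CNNs. Based on this observation, we propose a test-time image blocking defense for ViTs that reduces the attack success rate by a large margin. Code is available here: https://github.com/ucdvision/backdoor_transformer.git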
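
The test-time blocking idea can be sketched as follows: given a saliency heatmap from some interpretation algorithm, blank out the most salient window before classifying. The heatmap source, the window size, the fill value, and the function name are all assumptions for illustration; the abstract does not describe these details.

```python
def block_most_salient(image, heatmap, size, fill=0.0):
    """Blank the size x size window with the largest summed saliency.

    `image` and `heatmap` are 2-D lists of equal shape. Returns the blocked
    copy of the image and the (top, left) corner of the blocked window.
    """
    height, width = len(image), len(image[0])
    best, best_pos = float("-inf"), (0, 0)
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            score = sum(heatmap[r][c]
                        for r in range(top, top + size)
                        for c in range(left, left + size))
            if score > best:
                best, best_pos = score, (top, left)
    top, left = best_pos
    out = [row[:] for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out, best_pos
```

If the heatmap really does concentrate on the trigger, as the abstract reports for ViTs, the blocked image no longer contains the trigger and the model can be queried on it instead of the original input.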

One-shot Neural Backdoor Erasing via Adversarial Weight Masking

Shuwen Chai , Jinghui Chen

Category: Machine Learning | Artificial Intelligence

2022-07-10

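Recent studies show that, despite achieving high accuracy on a number of real-world applications, deep neural networks (DNNs) can be backdoored: by injecting triggered data samples into the training dataset, an adversary can mislead the trained model into classifying any test data into the target class as long as the trigger pattern is present. To nullify such backdoor threats, various methods have been proposed. In particular, one line of research aims to purify the potentially compromised model. However, a major limitation of this line of work is that it requires access to sufficient original training data: the purification performance is much worse when the available training data is limited. In this work, we propose Adversarial Weight Masking (AWM), a novel method capable of erasing neural backdoors even in the one-shot setting. The key idea behind our method is to formulate the task as a min-max optimization problem: first, adversarially recover the trigger patterns, and then (soft) mask the network weights that are sensitive to the recovered patterns. Comprehensive evaluations on several benchmark datasets suggest that AWM largely improves the purification effect over other state-of-the-art methods across various sizes of available training data.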
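
The min-max idea can be written schematically as below. The notation is an assumption made for illustration and is not the paper's exact objective: $\theta$ denotes the trained weights, $m$ a soft mask over them, $\delta$ a norm-bounded trigger perturbation, $\mathcal{D}$ the small clean dataset, and $\mathcal{L}$ the classification loss.

```latex
\min_{m \in [0,1]^{d}} \;
\mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[
  \mathcal{L}\big(f(x;\, m \odot \theta),\, y\big)
  + \max_{\|\delta\| \le \epsilon}
    \mathcal{L}\big(f(x + \delta;\, m \odot \theta),\, y\big)
\Big]
```

The inner maximization plays the role of adversarially recovering a trigger-like pattern, and the outer minimization picks a mask that keeps clean accuracy while suppressing the weights that respond to that pattern.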
