Smart Paper Notes

Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural Networks

Xi Li , Zhen Xiang , David J. Miller , George Kesidis

Category: Machine Learning

2021-12-06

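Backdoor (Trojan) attacks are an emerging threat to deep neural networks (DNNs). An attacked DNN predicts the attacker's desired target class whenever a test sample from any source class is embedded with the backdoor pattern, while still classifying clean (attack-free) test samples correctly. Existing backdoor defenses have succeeded in detecting whether a DNN has been attacked and in reverse-engineering the backdoor pattern in a "post-training" regime: the defender has access to the DNN under inspection and to a small, independently collected clean dataset, but not to the DNN's (possibly poisoned) training set. However, these defenses neither catch culprits in the act of triggering the backdoor mapping nor mitigate the backdoor attack at test time. In this paper, we propose an "in-flight" defense against backdoor attacks on image classification that 1) detects the use of a backdoor trigger at test time and 2) infers the class of origin (source class) of a detected trigger example. The effectiveness of our defense is demonstrated experimentally against different strong backdoor attacks.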

translated by Google Translate

NNoculation: Catching BadNets in the Wild

Akshaj Kumar Veldanda , Kang Liu , Benjamin Tan , Prashanth Krishnamurthy , Farshad Khorrami , Ramesh Karri , Brendan Dolan-Gavitt , Siddharth Garg

Category: Machine Learning

2020-02-19

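This paper proposes a novel two-stage defense (NNoculation) against backdoored neural networks (BadNets) that repairs a BadNet both pre-deployment and online, in response to backdoored test inputs encountered in the field. In the pre-deployment stage, NNoculation retrains the BadNet with random perturbations of clean validation inputs to partially reduce the adversarial impact of the backdoor. Post-deployment, NNoculation detects and quarantines backdoored test inputs by recording disagreements between the original network and the pre-deployment patched network. A CycleGAN is then trained to learn transformations between clean validation inputs and quarantined inputs; that is, it learns to add triggers to clean validation images. The backdoored validation images, together with their correct labels, are used to further retrain the pre-deployment patched network, yielding the final defense. Empirical evaluation on a comprehensive suite of backdoor attacks shows that NNoculation outperforms all state-of-the-art defenses, which make restrictive assumptions and either work only on specific backdoor attacks or fail against adaptive attacks. In contrast, NNoculation makes minimal assumptions and provides an effective defense even in settings where existing defenses are ineffective because attackers circumvent their restrictive assumptions.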
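
The post-deployment quarantine step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the classifiers are modeled as plain callables, and `quarantine_disagreements`, `toy_original`, and `toy_patched` are hypothetical names introduced here.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a classifier is any callable mapping an input vector to a label.
Classifier = Callable[[Sequence[float]], int]


def quarantine_disagreements(
    original: Classifier,
    patched: Classifier,
    inputs: List[Sequence[float]],
) -> Tuple[List[int], List[int]]:
    """Return (quarantined, accepted) input indices.

    An input is quarantined when the original network and the
    pre-deployment patched network disagree on its label.
    """
    quarantined, accepted = [], []
    for i, x in enumerate(inputs):
        if original(x) != patched(x):
            quarantined.append(i)
        else:
            accepted.append(i)
    return quarantined, accepted


# Toy stand-ins (hypothetical): the last feature acts as a trigger that only
# the original, backdoored network reacts to.
def toy_original(x: Sequence[float]) -> int:
    return 7 if x[-1] > 0.9 else int(x[0] > 0.5)


def toy_patched(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


inputs = [[0.2, 0.0], [0.8, 0.0], [0.2, 1.0], [0.8, 1.0]]
quarantined, accepted = quarantine_disagreements(toy_original, toy_patched, inputs)
print(quarantined, accepted)  # [2, 3] [0, 1]
```

In the abstract's pipeline, the quarantined inputs would then serve as training data for the CycleGAN; that part is not sketched here.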

CASSOCK: Viable Backdoor Attacks against DNN in The Wall of Source-Specific Backdoor Defences

Shang Wang , Yansong Gao , Anmin Fu , Zhi Zhang , Yuqing Zhang , Willy Susilo , Dongxi Liu

Category: Machine Learning

2022-05-31

As a critical threat to deep neural networks (DNNs), backdoor attacks can be categorized into two types: source-agnostic backdoor attacks (SABAs) and source-specific backdoor attacks (SSBAs). Compared to traditional SABAs, SSBAs are more advanced in that they are stealthier and bypass mainstream countermeasures that are effective against SABAs. Nonetheless, existing SSBAs suffer from two major limitations. First, they can hardly achieve a good trade-off between ASR (attack success rate) and FPR (false positive rate). Second, they can be effectively detected by state-of-the-art (SOTA) countermeasures (e.g., SCAn). To address these limitations, we propose a new class of viable source-specific backdoor attacks, coined CASSOCK. Our key insight is that the trigger designs used when creating poisoned data and cover data in SSBAs play a crucial role in achieving a viable source-specific attack, which existing SSBAs have not considered. With this insight, we focus on trigger transparency and content when crafting triggers for the poisoned dataset, where a sample has an attacker-targeted label, and for the cover dataset, where a sample has its ground-truth label. Specifically, we implement $CASSOCK_{Trans}$ and $CASSOCK_{Cont}$. Although the two are orthogonal, they are complementary to each other and can be combined into a more powerful attack, called $CASSOCK_{Comp}$, with further improved attack performance and stealthiness. We perform a comprehensive evaluation of the three $CASSOCK$-based attacks on four popular datasets and three SOTA defenses. Compared with a representative SSBA as a baseline ($SSBA_{Base}$), $CASSOCK$-based attacks significantly improve attack performance, i.e., higher ASR and lower FPR with comparable CDA (clean data accuracy). Moreover, $CASSOCK$-based attacks effectively bypass the SOTA defenses, whereas $SSBA_{Base}$ cannot.
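
The three metrics named above (ASR, FPR, CDA) can be computed as in the sketch below. The abstract does not define them, so the definitions used here are assumptions based on common usage for source-specific attacks: ASR is measured on triggered source-class inputs, FPR on triggered inputs from other (non-source, non-target) classes, and CDA on clean inputs. The function name `attack_metrics` is hypothetical.

```python
from typing import List, Tuple

# Each record is (true_label, predicted_label).
Record = Tuple[int, int]


def attack_metrics(
    triggered: List[Record],
    clean: List[Record],
    source_class: int,
    target_class: int,
) -> Tuple[float, float, float]:
    """Return (ASR, FPR, CDA) under the assumed definitions.

    ASR: share of triggered source-class inputs predicted as the target.
    FPR: share of triggered non-source, non-target inputs predicted as the target.
    CDA: accuracy on clean inputs.
    """
    src = [p for t, p in triggered if t == source_class]
    other = [p for t, p in triggered if t not in (source_class, target_class)]
    asr = sum(p == target_class for p in src) / len(src)
    fpr = sum(p == target_class for p in other) / len(other)
    cda = sum(t == p for t, p in clean) / len(clean)
    return asr, fpr, cda


# Made-up predictions for illustration only (source class 1, target class 0).
triggered = [(1, 0), (1, 0), (1, 1), (1, 0), (2, 2), (2, 0), (3, 3), (3, 3)]
clean = [(0, 0), (1, 1), (2, 2), (3, 2)]
asr, fpr, cda = attack_metrics(triggered, clean, source_class=1, target_class=0)
print(asr, fpr, cda)  # 0.75 0.25 0.75
```

A source-specific attack aims for a high ASR together with a low FPR, which is the trade-off the abstract refers to.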

Backdoor Attacks on Time Series: A Generative Approach

Yujing Jiang , Xingjun Ma , Sarah Monazam Erfani , James Bailey

Category: Machine Learning

2022-11-15

Backdoor attacks have emerged as one of the major security threats to deep learning models, as they can easily control a model's test-time predictions by pre-injecting a backdoor trigger into the model at training time. While backdoor attacks have been extensively studied on images, few works have investigated the threat of backdoor attacks on time series data. To fill this gap, we present a novel generative approach for time series backdoor attacks against deep learning based time series classifiers. Backdoor attacks have two main goals: high stealthiness and high attack success rate. We find that, compared to images, it can be more challenging to achieve both goals on time series. This is because time series have fewer input dimensions and lower degrees of freedom, making it hard to achieve a high attack success rate without compromising stealthiness. Our generative approach addresses this challenge by generating trigger patterns that are as realistic as real time series patterns while achieving a high attack success rate without causing a significant drop in clean accuracy. We also show that our proposed attack is resistant to potential backdoor defenses. Furthermore, we propose a novel universal generator that can poison any type of time series with a single generator, allowing universal attacks without the need to fine-tune the generative model for new time series datasets.

Confidence Matters: Inspecting Backdoors in Deep Neural Networks via Distribution Transfer

Tong Wang , Yuan Yao , Feng Xu , Miao Xu , Shengwei An , Ting Wang

Category: Computer Vision

2022-08-13

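Backdoor attacks have been shown to be a serious security threat to deep learning models, and detecting whether a given model has been backdoored has become a crucial task. Existing defenses are mainly built on the observation that the backdoor trigger is usually small or affects the activations of only a few neurons. However, these observations are violated in many cases, especially for advanced backdoor attacks, which hinders the performance and applicability of existing defenses. In this paper, we propose DTInspector, a backdoor defense built on a new observation: an effective backdoor attack usually requires high prediction confidence on the poisoned training samples, so as to ensure that the trained model exhibits the targeted behavior with high probability. Based on this observation, DTInspector first learns a patch that can change the predictions of most high-confidence data, and then decides whether a backdoor exists by checking the ratio of prediction changes after applying the learned patch to low-confidence data. Extensive evaluations on five backdoor attacks, four datasets, and three advanced attack types demonstrate the effectiveness of the proposed defense.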

Dispersed Pixel Perturbation-based Imperceptible Backdoor Trigger for Image Classifier Models

Yulong Wang , Minghui Zhao , Shenghong Li , Xin Yuan , Wei Ni

Category: Computer Vision | Artificial Intelligence

2022-08-19

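Typical deep neural network (DNN) backdoor attacks are based on triggers embedded in the inputs. Existing imperceptible triggers are either computationally expensive or have a low attack success rate. In this paper, we propose a new backdoor trigger that is easy to generate, imperceptible, and highly effective. The new trigger is a uniformly randomly generated three-dimensional (3D) binary pattern that can be repeated and mirrored horizontally and/or vertically and superposed onto three-channel images to train a backdoored DNN model. Dispersed throughout an image, the new trigger produces only a weak perturbation of individual pixels, but collectively forms a strong, recognizable pattern that trains and activates the DNN's backdoor. We also show analytically that the trigger becomes increasingly effective as image resolution improves. Experiments are conducted using ResNet-18 and MLP models on the MNIST, CIFAR-10, and BTSR datasets. In terms of imperceptibility, the new trigger outperforms existing triggers such as BadNets, Trojaned NN, and Hidden Backdoor. The new trigger achieves an attack success rate of almost 100%, reduces classification accuracy by less than 0.7%-2.4%, and invalidates state-of-the-art defense techniques.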
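
A rough sketch of how such a dispersed trigger could be built and applied is shown below. It assumes additive superposition with a small integer `strength` and mirroring of every second tile; neither detail is given in the abstract, and all function names are hypothetical.

```python
import random
from typing import List

Image = List[List[List[int]]]  # H x W x C nested lists, values in [0, 255]


def make_pattern(height: int, width: int, channels: int = 3, seed: int = 0) -> Image:
    """Uniformly random binary pattern of shape height x width x channels."""
    rng = random.Random(seed)
    return [[[rng.randint(0, 1) for _ in range(channels)]
             for _ in range(width)] for _ in range(height)]


def tile_index(pos: int, size: int, mirror: bool) -> int:
    """Map an image coordinate to a pattern coordinate, mirroring odd tiles."""
    tile, offset = divmod(pos, size)
    if mirror and tile % 2 == 1:
        offset = size - 1 - offset
    return offset


def superpose(image: Image, pattern: Image, strength: int = 2, mirror: bool = True) -> Image:
    """Repeat the pattern over the whole image and add a faint perturbation."""
    ph, pw = len(pattern), len(pattern[0])
    poisoned = []
    for y, row in enumerate(image):
        py = tile_index(y, ph, mirror)
        new_row = []
        for x, pixel in enumerate(row):
            bits = pattern[py][tile_index(x, pw, mirror)]
            # Assumption: additive superposition, clipped to the valid range.
            new_row.append([min(255, v + strength * b) for v, b in zip(pixel, bits)])
        poisoned.append(new_row)
    return poisoned


clean = [[[100, 100, 100] for _ in range(8)] for _ in range(8)]
pattern = make_pattern(4, 4)
poisoned = superpose(clean, pattern)
max_change = max(
    abs(pv - cv)
    for prow, crow in zip(poisoned, clean)
    for pp, cp in zip(prow, crow)
    for pv, cv in zip(pp, cp)
)
```

Here `max_change` never exceeds `strength`, so each pixel moves by at most 2 intensity levels, while the pattern still covers every tile of the image.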

Backdoor Attack Detection in Computer Vision by Applying Matrix Factorization on the Weights of Deep Networks

Khondoker Murad Hossain , Tim Oates

Category: Computer Vision | Artificial Intelligence

2022-12-15

The increasing importance of both deep neural networks (DNNs) and cloud services for training them means that bad actors have more incentive and opportunity to insert backdoors to alter the behavior of trained models. In this paper, we introduce a novel method for backdoor detection that extracts features from a pre-trained DNN's weights using independent vector analysis (IVA) followed by a machine learning classifier. Compared with other detection techniques, this has a number of benefits: it requires no training data, is applicable across domains, operates with a wide range of network architectures, makes no assumptions about the nature of the triggers used to change network behavior, and is highly scalable. We discuss the detection pipeline and then demonstrate the results on two computer vision datasets covering image classification and object detection. Our method outperforms the competing algorithms in terms of efficiency and is more accurate, helping to ensure the safe application of deep learning and AI.

Data-free Backdoor Removal based on Channel Lipschitzness

Runkai Zheng , Rongjun Tang , Jianze Li , Li Liu

Category: Machine Learning

2022-08-05

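Recent studies have shown that deep neural networks (DNNs) are vulnerable to backdoor attacks, which cause malicious behavior in a DNN when a specific trigger is attached to the input image. It has further been demonstrated that infected DNNs possess a set of channels that are more sensitive to backdoor triggers than normal channels, and that pruning these channels effectively mitigates the backdoor behavior. To locate these channels, it is natural to consider their Lipschitzness, which measures their sensitivity to worst-case perturbations of the input. In this work, we introduce a novel concept called the Channel Lipschitz Constant (CLC), defined as the Lipschitz constant of the mapping from the input image to the output of each channel. We then provide empirical evidence of a strong correlation between an upper bound of the CLC (UCLC) and the trigger-activated change in channel activation. Since the UCLC can be calculated directly from the weight matrices, we can detect potential backdoor channels in a data-free manner and repair the model by simply pruning the infected DNN. The proposed Channel Lipschitzness based Pruning (CLP) method is very fast, simple, data-free, and robust to the choice of pruning threshold. Extensive experiments are conducted to evaluate the efficiency and effectiveness of CLP, which achieves state-of-the-art results among mainstream defense methods. Source code is available at https://github.com/rkteddy/channel-lipschitzness[…].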
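
The data-free pruning decision can be sketched as a simple outlier test over per-channel UCLC values. The abstract does not state the thresholding rule; the rule below (flag channels whose UCLC exceeds the layer mean by `u` standard deviations) is an assumption, the UCLC values are taken as already computed from the weights, and `channels_to_prune` is a hypothetical name.

```python
from statistics import mean, pstdev
from typing import List


def channels_to_prune(uclc: List[float], u: float = 2.0) -> List[int]:
    """Return indices of channels whose UCLC is an outlier within the layer.

    Assumed rule: prune channel k when uclc[k] > mean + u * std.
    """
    mu = mean(uclc)
    sigma = pstdev(uclc)
    return [k for k, value in enumerate(uclc) if value > mu + u * sigma]


# Made-up UCLC values for one layer; channel 7 is far more sensitive.
layer_uclc = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 6.0]
print(channels_to_prune(layer_uclc, u=2.0))  # [7]
```

Because the test only reads weight-derived statistics, no clean or poisoned samples are needed, which matches the data-free setting described above.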

Adversarial Fine-tuning for Backdoor Defense: Connecting Backdoor Attacks to Adversarial Attacks

Bingxu Mu , Zhenxing Niu , Le Wang , Xue Wang , Rong Jin , Gang Hua

Category: Computer Vision

2022-02-13

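Deep neural networks (DNNs) are known to be vulnerable to both backdoor attacks and adversarial attacks. In the literature, these two types of attack are commonly treated as distinct problems and solved separately, since they are training-time and inference-time attacks, respectively. However, in this paper we find an intriguing connection between them: for a model planted with a backdoor, we observe that its adversarial examples behave similarly to its triggered samples, i.e., both activate the same subset of DNN neurons. This indicates that planting a backdoor into a model significantly affects the model's adversarial examples. Based on this observation, we design a new Adversarial Fine-Tuning (AFT) algorithm to defend against backdoor attacks. We show empirically that, against 5 state-of-the-art backdoor attacks, AFT can effectively erase backdoor triggers without obvious performance degradation on clean samples, and that it significantly outperforms existing defense methods.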

Defense Against Multi-target Trojan Attacks

Haripriya Harikumar , Santu Rana , Kien Do , Sunil Gupta , Wei Zong , Willy Susilo , Svetha Venkastesh

Category: Computer Vision

2022-07-08

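Adversarial attacks on deep learning based models pose a significant threat to current AI infrastructure. Among them, Trojan attacks are the hardest to defend against. In this paper, we first introduce a variant of the BadNet type of attack that introduces Trojan backdoors into multiple target classes and allows the trigger to be placed anywhere in the image. The former makes the attack more potent, and the latter makes it extremely easy to carry out in physical space. State-of-the-art Trojan detection methods fail under this threat model. To defend against this attack, we first introduce a trigger reverse-engineering mechanism that uses multiple images to recover a variety of potential triggers. We then propose a detection mechanism that measures the transferability of the recovered triggers. A Trojan trigger has very high transferability, i.e., it makes other images go to the same class as well. We study many practical advantages of our attack method and then demonstrate the detection performance on a variety of image datasets. The experimental results show that the detection performance of our method is superior to the state of the art.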
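
The transferability measurement can be illustrated with the sketch below. It assumes transferability is the fraction of held-out images, not already predicted as the candidate class, that switch to that class once the recovered trigger is applied. The abstract gives no formula, so this definition and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]
Trigger = Callable[[Sequence[float]], List[float]]


def transferability(
    model: Model,
    apply_trigger: Trigger,
    images: List[Sequence[float]],
    target_class: int,
) -> float:
    """Share of non-target images that move to target_class under the trigger."""
    candidates = [x for x in images if model(x) != target_class]
    if not candidates:
        return 0.0
    flipped = sum(model(apply_trigger(x)) == target_class for x in candidates)
    return flipped / len(candidates)


# Toy model (hypothetical): a large first feature acts as the Trojan trigger.
def toy_model(x: Sequence[float]) -> int:
    if x[0] > 0.9:
        return 9
    return 0 if sum(x) < 1.0 else 1


def trojan_trigger(x: Sequence[float]) -> List[float]:
    return [1.0] + list(x[1:])


def benign_patch(x: Sequence[float]) -> List[float]:
    return [x[0], x[1] + 0.05]


images = [[0.1, 0.2], [0.3, 0.9], [0.5, 0.5]]
print(transferability(toy_model, trojan_trigger, images, target_class=9))  # 1.0
print(transferability(toy_model, benign_patch, images, target_class=9))    # 0.0
```

A detector along these lines would flag a recovered trigger as a Trojan when its transferability is close to 1, and treat low-transferability candidates as ordinary perturbations.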

One-shot Neural Backdoor Erasing via Adversarial Weight Masking

Shuwen Chai , Jinghui Chen

Category: Machine Learning | Artificial Intelligence

2022-07-10

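Recent studies show that, despite achieving high accuracy in many real-world applications, deep neural networks (DNNs) can be backdoored: by injecting triggered data samples into the training dataset, an adversary can mislead the trained model into classifying any test data as the target class whenever the trigger pattern is present. Various methods have been proposed to nullify such backdoor threats. In particular, one line of research aims to purify the potentially compromised model. However, a major limitation of this line of work is that it requires access to sufficient original training data: purification performance is much worse when the available training data is limited. In this work, we propose Adversarial Weight Masking (AWM), a novel method capable of erasing neural backdoors even in the one-shot setting. The key idea behind our method is to formulate the task as a min-max optimization problem: first adversarially recover the trigger patterns, then (soft) mask the network weights that are sensitive to the recovered patterns. Comprehensive evaluations on several benchmark datasets show that AWM substantially improves the purification effect over other state-of-the-art methods across a range of available training dataset sizes.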

Adaptive Perturbation Generation for Multiple Backdoors Detection

Yuhang Wang , Huafeng Shi , Rui Min , Ruijia Wu , Siyuan Liang , Yichao Wu , Ding Liang , Aishan Liu

Category: Computer Vision

2022-09-12

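Extensive evidence has demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks, which has motivated the development of backdoor detection methods. Existing backdoor detection methods are typically tailored to backdoor attacks of a single specific type (e.g., patch-based or perturbation-based). In practice, however, adversaries are likely to generate multiple types of backdoor attacks, which challenges current detection strategies. Based on the fact that adversarial perturbations are highly correlated with trigger patterns, this paper proposes the Adaptive Perturbation Generation (APG) framework, which detects multiple types of backdoor attacks by adaptively injecting adversarial perturbations. Since different trigger patterns behave very differently under the same adversarial perturbation, we first design a global-to-local strategy that fits multiple types of backdoor triggers by adjusting the region and budget of the attack. To further increase the efficiency of perturbation injection, we introduce a gradient-guided mask generation strategy to search for the optimal regions for adversarial attacks. Extensive experiments on multiple datasets (CIFAR-10, GTSRB, Tiny-ImageNet) demonstrate that our method outperforms state-of-the-art baselines by large margins (+12%).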

Defending Backdoor Attacks on Vision Transformer via Patch Processing

Khoa D. Doan , Yingjie Lao , Peng Yang , Ping Li

Category: Computer Vision

2022-06-24

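Vision Transformers (ViTs) have a radically different architecture, with significantly less inductive bias than convolutional neural networks. Along with their improved performance, the security and robustness of ViTs are also important to study. In contrast to many recent works that examine the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, namely the backdoor. We first examine the vulnerability of ViTs to various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and the backdoor attack success rate of ViTs respond distinctively to patch transformations applied before the positional encoding. Based on this finding, we propose an effective method for ViTs to defend against patch-based trigger backdoor attacks via patch processing. Performance is evaluated on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, and the results show that the proposed defense is very successful in mitigating backdoor attacks on ViTs. To the best of our knowledge, this paper presents the first defensive strategy that exploits a unique characteristic of ViTs against backdoor attacks.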

Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain

Chang Yue , Peizhuo Lv , Ruigang Liang , Kai Chen

Category: Machine Learning

2022-07-09

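With the widespread application of deep neural networks (DNNs), backdoor attacks have gradually attracted attention. Backdoor attacks are insidious: a poisoned model performs well on benign samples and is triggered only by specific inputs, which cause the network to produce incorrect outputs. State-of-the-art backdoor attacks are implemented through data poisoning, i.e., the attacker injects poisoned samples into the dataset, and models trained on that dataset become infected with the backdoor. However, most triggers used in current studies are fixed patterns patched onto a small part of an image, and the poisoned samples are often clearly mislabeled, which makes them easy to detect by humans or by defense methods such as Neural Cleanse and SentiNet. Moreover, such triggers are difficult for DNNs to learn without mislabeling, since the networks may ignore small patterns. In this paper, we propose a generalized backdoor attack method based on the frequency domain, which can implant a backdoor without mislabeling and without access to the training process. It is invisible to humans and able to evade commonly used defense methods. We evaluate our approach in the no-label and clean-label cases on three datasets (CIFAR-10, STL-10, and GTSRB). The results show that our approach achieves a high attack success rate (above 90%) on all tasks without significant performance degradation on the main task. We also evaluate how well our approach bypasses a range of defenses, including detection of training data (i.e., Activation Clustering), preprocessing of inputs (i.e., Filtering), detection of inputs (i.e., SentiNet), and detection of models (i.e., Neural Cleanse). The experiments show that our approach is highly robust against these defenses.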

Mind Your Heart: Stealthy Backdoor Attack on Dynamic Deep Neural Network in Edge Computing

Tian Dong , Ziyuan Zhang , Han Qiu , Tianwei Zhang , Hewu Li , Terry Wang

Category: Machine Learning

2022-12-22

Transforming off-the-shelf deep neural network (DNN) models into dynamic multi-exit architectures can achieve inference and transmission efficiency by fragmenting and distributing a large DNN model in edge computing scenarios (e.g., edge devices and cloud servers). In this paper, we propose a novel backdoor attack that specifically targets dynamic multi-exit DNN models. In particular, we inject a backdoor by poisoning the shallow hidden layers of a DNN model, targeting not the vanilla DNN model itself but only its dynamically deployed multi-exit architectures. Our backdoored vanilla model performs normally, and its backdoor cannot be activated even with the correct trigger. However, the backdoor is activated once victims acquire the model and transform it into a dynamic multi-exit architecture at deployment. We conduct extensive experiments to demonstrate the effectiveness of our attack on three structures (ResNet-56, VGG-16, and MobileNet) with four datasets (CIFAR-10, SVHN, GTSRB, and Tiny-ImageNet), and show that our backdoor is stealthy enough to evade multiple state-of-the-art backdoor detection and removal methods.

The "Beatrix" Resurrections: Robust Backdoor Detection via Gram Matrices

Wanlun Ma , Derui Wang , Ruoxi Sun , Minhui Xue , Sheng Wen , Yang Xiang

Category: Artificial Intelligence

2022-09-23

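Deep neural networks (DNNs) are susceptible to backdoor attacks during training. A model corrupted in this way functions normally, but produces a predefined target label when triggered by certain patterns in the input. Existing defenses usually rely on the assumption of a universal backdoor setting, in which poisoned samples share the same uniform trigger. However, recent advanced backdoor attacks show that this assumption no longer holds for dynamic backdoors, where the trigger varies from input to input, thereby defeating existing defenses. In this work, we propose a novel technique, Beatrix (backdoor detection via Gram matrix). Beatrix uses Gram matrices to capture not only feature correlations but also appropriately high-order information about the representations. By learning class-conditional statistics from the activation patterns of normal samples, Beatrix can identify poisoned samples by capturing anomalies in their activation patterns. To further improve performance in identifying target labels, Beatrix leverages kernel-based testing without making any prior assumptions about the distribution of the representations. We demonstrate the effectiveness of our method through extensive evaluation and comparison with state-of-the-art defensive techniques. The experimental results show that our approach achieves an F1 score of 91.1% in detecting dynamic backdoors, while the state of the art reaches only 36.9%.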
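
The idea of scoring a sample by how far its Gram-matrix features deviate from clean class statistics can be sketched as below. This is an illustration under stated assumptions, not the Beatrix implementation: feature maps are small nested lists of non-negative activations, the class statistics are the median and the median absolute deviation (MAD) of each Gram entry, and the tolerance `k` is arbitrary. The kernel-based test mentioned in the abstract is not covered.

```python
from statistics import median
from typing import List, Tuple

FeatureMap = List[List[float]]  # channels x flattened spatial positions


def gram_features(fmap: FeatureMap, order: int = 1) -> List[float]:
    """Upper triangle of the order-p Gram matrix (F^p F^pT)^(1/p)."""
    feats = []
    for i in range(len(fmap)):
        for j in range(i, len(fmap)):
            s = sum((a ** order) * (b ** order) for a, b in zip(fmap[i], fmap[j]))
            feats.append(s ** (1.0 / order))
    return feats


def fit_stats(clean_fmaps: List[FeatureMap], order: int = 1) -> Tuple[List[float], List[float]]:
    """Per-entry median and MAD over clean samples of one class."""
    columns = list(zip(*(gram_features(f, order) for f in clean_fmaps)))
    meds = [median(list(col)) for col in columns]
    mads = [median([abs(v - m) for v in col]) for col, m in zip(columns, meds)]
    return meds, mads


def deviation(fmap: FeatureMap, stats: Tuple[List[float], List[float]],
              order: int = 1, k: float = 3.0) -> float:
    """Sum of how far each Gram entry falls outside median +/- k * MAD."""
    meds, mads = stats
    feats = gram_features(fmap, order)
    return sum(max(0.0, abs(v - m) - k * d) for v, m, d in zip(feats, meds, mads))


# Made-up activations: clean samples keep the two channels uncorrelated,
# while the suspicious sample activates both channels together.
clean_fmaps = [
    [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
    [[1.0, 0.0, 1.2], [0.0, 1.1, 0.0]],
    [[0.9, 0.0, 1.0], [0.0, 1.0, 0.0]],
]
suspicious = [[1.0, 2.0, 1.0], [2.0, 1.0, 2.0]]

stats = fit_stats(clean_fmaps)
clean_score = deviation(clean_fmaps[0], stats)
suspicious_score = deviation(suspicious, stats)
```

Under these assumptions `clean_score` is 0.0 while `suspicious_score` is large, because the suspicious sample introduces a channel correlation that never appears in the clean statistics.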

CatchBackdoor: Backdoor Testing by Critical Trojan Neural Path Identification via Differential Fuzzing

Haibo Jin , Ruoxi Chen , Jinyin Chen , Yao Cheng , Chong Fu , Ting Wang , Yue Yu , Zhaoyan Ming

Category: Artificial Intelligence | Computer Vision

2021-12-24

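The success of deep neural networks (DNNs) in real-world applications has benefited from abundant pre-trained models. However, backdoored pre-trained models can pose a significant Trojan threat to the deployment of downstream DNNs. Existing DNN testing methods are mainly designed to find incorrect corner-case behaviors in adversarial settings, but fail to discover backdoors crafted by strong Trojan attacks. Observing Trojan network behavior shows that it is not just reflected by a single compromised neuron, as proposed in previous work, but is attributable to critical neural paths in the activation intensity and frequency of multiple neurons. This work formulates DNN backdoor testing and proposes the CatchBackdoor framework. Via differential fuzzing of critical neurons from a small number of benign examples, we identify the Trojan paths, particularly the critical ones, and generate backdoor testing examples by simulating the critical neurons in the identified paths. Extensive experiments demonstrate the superiority of CatchBackdoor, which achieves higher detection performance than existing methods. CatchBackdoor is better at detecting backdoors planted by stealthy blending and adaptive attacks, which existing methods fail to detect. Moreover, our experiments show that CatchBackdoor may reveal potential backdoors in models from Model Zoo.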

Fine-Tuning Is All You Need to Mitigate Backdoor Attacks

Zeyang Sha , Xinlei He , Pascal Berrang , Mathias Humbert , Yang Zhang

Category: Computer Vision | Machine Learning

2022-12-18

Backdoor attacks represent one of the major threats to machine learning models. Various efforts have been made to mitigate backdoors. However, existing defenses have become increasingly complex and often require high computational resources or may also jeopardize models' utility. In this work, we show that fine-tuning, one of the most common and easy-to-adopt machine learning training operations, can effectively remove backdoors from machine learning models while maintaining high model utility. Extensive experiments over three machine learning paradigms show that fine-tuning and our newly proposed super-fine-tuning achieve strong defense performance. Furthermore, we coin a new term, namely backdoor sequela, to measure the changes in model vulnerabilities to other attacks before and after the backdoor has been removed. Empirical evaluation shows that, compared to other defense methods, super-fine-tuning leaves limited backdoor sequela. We hope our results can help machine learning model owners better protect their models from backdoor threats. Also, it calls for the design of more advanced attacks in order to comprehensively assess machine learning models' backdoor vulnerabilities.

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Yige Li , Xixiang Lyu , Nodens Koren , Lingjuan Lyu , Bo Li , Xingjun Ma

Category: Machine Learning | Artificial Intelligence

2021-10-22

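Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have shown promising results in detecting or erasing backdoors, it is still unclear whether robust training methods can be devised to prevent backdoor triggers from being injected into the trained model in the first place. In this paper, we introduce the concept of anti-backdoor learning, which aims to train clean models given backdoor-poisoned data. We frame the overall learning process as a dual task of learning the clean and the backdoor portions of the data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) models learn backdoored data much faster than clean data, and the stronger the attack, the faster the model converges on backdoored data; 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage gradient ascent mechanism into standard training to 1) help isolate backdoor examples at an early training stage and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we show empirically that ABL-trained models on backdoor-poisoned data achieve the same performance as if they had been trained on purely clean data. Code is available at https://github.com/boylyg/abl.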
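
The two-stage mechanism can be sketched with plain per-sample loss values. The abstract does not give the loss formulas, so the forms below are assumptions: stage 1 uses a sign-flipped loss that keeps each sample's loss near a threshold `gamma` and then isolates the lowest-loss samples, and stage 2 minimizes the loss on the remaining data while maximizing it on the isolated samples. All function names are hypothetical.

```python
from statistics import mean
from typing import List


def lga_loss(loss: float, gamma: float) -> float:
    """Assumed stage-1 loss: sign(loss - gamma) * loss.

    Below gamma the sign flips, so minimizing this value performs
    gradient ascent on samples the model has fitted too quickly.
    """
    sign = (loss > gamma) - (loss < gamma)
    return sign * loss


def isolate_lowest_loss(per_sample_losses: List[float], ratio: float) -> List[int]:
    """Indices of the `ratio` fraction of samples with the lowest loss."""
    k = max(1, int(len(per_sample_losses) * ratio))
    order = sorted(range(len(per_sample_losses)), key=per_sample_losses.__getitem__)
    return sorted(order[:k])


def unlearning_objective(kept_losses: List[float], isolated_losses: List[float]) -> float:
    """Assumed stage-2 objective: descend on kept data, ascend on isolated data."""
    return mean(kept_losses) - mean(isolated_losses)


# Made-up early-epoch losses: samples 1 and 4 were learned suspiciously fast.
losses = [1.9, 0.05, 2.1, 1.7, 0.02, 2.3, 1.8, 2.0, 1.6, 2.2]
isolated = isolate_lowest_loss(losses, ratio=0.2)
print(isolated)             # [1, 4]
print(lga_loss(0.2, 0.5))   # -0.2
print(lga_loss(0.8, 0.5))   # 0.8
```

The isolation step relies on the first weakness listed above (backdoored data is learned faster), and the stage-2 objective targets the second (the tie between trigger and target class).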

Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks

Kang Liu , Brendan Dolan-Gavitt , Siddharth Garg

Category:

2018-05-30

Deep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor attacks on DNNs. We implement three backdoor attacks from prior work and use them to investigate two promising defenses, pruning and fine-tuning. We show that neither, by itself, is sufficient to defend against sophisticated attackers. We then evaluate fine-pruning, a combination of pruning and fine-tuning, and show that it successfully weakens or even eliminates the backdoors, i.e., in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy for clean (non-triggering) inputs. Our work provides the first step toward defenses against backdoor attacks in deep neural networks.
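
The pruning half of fine-pruning can be sketched as below. The abstract does not state the pruning criterion; this sketch assumes channels are pruned in order of increasing mean activation on clean inputs, stopping once clean accuracy falls more than `max_drop` below the baseline. The accuracy callback and all names are hypothetical, and the fine-tuning step that follows is not shown.

```python
from typing import Callable, FrozenSet, List


def pruning_order(activations: List[List[float]]) -> List[int]:
    """Channel indices sorted by increasing mean activation on clean inputs.

    activations[i][c] is the activation of channel c for clean sample i.
    """
    n = len(activations)
    means = [sum(col) / n for col in zip(*activations)]
    return sorted(range(len(means)), key=means.__getitem__)


def prune_until_drop(
    order: List[int],
    clean_accuracy: Callable[[FrozenSet[int]], float],
    max_drop: float = 0.04,
) -> List[int]:
    """Greedily prune channels while clean accuracy stays within max_drop."""
    baseline = clean_accuracy(frozenset())
    pruned: List[int] = []
    for channel in order:
        trial = pruned + [channel]
        if baseline - clean_accuracy(frozenset(trial)) > max_drop:
            break
        pruned = trial
    return pruned


# Made-up clean activations: channel 2 is dormant on clean data.
acts = [
    [0.5, 2.0, 0.0, 1.0],
    [0.4, 1.8, 0.1, 1.2],
    [0.6, 2.2, 0.0, 0.9],
]

# Toy accuracy model (hypothetical): each pruned channel costs a fixed penalty.
PENALTY = {0: 0.01, 1: 0.30, 2: 0.00, 3: 0.10}


def toy_clean_accuracy(pruned: FrozenSet[int]) -> float:
    return 0.95 - sum(PENALTY[c] for c in pruned)


order = pruning_order(acts)
pruned = prune_until_drop(order, toy_clean_accuracy)
print(order, pruned)  # [2, 0, 3, 1] [2, 0]
```

In the full defense the pruned network would then be fine-tuned on clean data, since the abstract reports that neither pruning nor fine-tuning alone is sufficient against sophisticated attackers.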
