Smart Paper Notes

Going In Style: Audio Backdoors Through Stylistic Transformations

Stefanos Koffas , Luca Pajola , Stjepan Picek , Mauro Conti

Category: Machine Learning

2022-11-06

A backdoor attack places triggers in victims' deep learning models to enable a targeted misclassification at testing time. In general, triggers are fixed artifacts attached to samples, making backdoor attacks easy to spot. Only recently has a new, harder-to-detect generation of triggers been proposed: stylistic triggers, which apply stylistic transformations to the input samples (e.g., a specific writing style). The stylistic backdoor literature currently lacks a proper formalization of the attack, which this paper establishes. Moreover, most studies of stylistic triggers focus on text and images, and it is not known whether they work for sound. This work fills this gap. We propose JingleBack, the first stylistic backdoor attack based on audio transformations such as chorus and gain. Using 444 models in a speech classification task, we confirm the feasibility of stylistic triggers in audio, achieving 96% attack success.
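
To make the idea of a stylistic audio trigger concrete, here is a minimal sketch of a gain-based trigger. It is not the JingleBack implementation: the function names, the 12 dB default, and the clipping range are assumptions chosen for illustration, and a waveform is modeled as a plain list of floats in [-1.0, 1.0].

```python
def apply_gain(waveform, gain_db):
    """Scale every sample by a gain given in decibels, then clip to [-1.0, 1.0]."""
    factor = 10.0 ** (gain_db / 20.0)  # amplitude ratio for a dB value
    return [max(-1.0, min(1.0, sample * factor)) for sample in waveform]


def make_poisoned_sample(waveform, target_label, gain_db=12.0):
    """Hypothetical helper: the style change itself is the trigger.

    Nothing is appended to the audio; the whole clip is transformed and
    paired with the attacker's target label.
    """
    return apply_gain(waveform, gain_db), target_label


clean = [0.05, -0.10, 0.20, -0.30]
poisoned, label = make_poisoned_sample(clean, target_label=3)
```

The transformation touches every sample rather than attaching a fixed artifact, which is what distinguishes a stylistic trigger from a patch-like one.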

Translated by Google Translate

Backdoor Attacks on Time Series: A Generative Approach

Yujing Jiang , Xingjun Ma , Sarah Monazam Erfani , James Bailey

Category: Machine Learning

2022-11-15

Backdoor attacks have emerged as one of the major security threats to deep learning models, as they can easily control the model's test-time predictions by pre-injecting a backdoor trigger into the model at training time. While backdoor attacks have been extensively studied on images, few works have investigated the threat of backdoor attacks on time series data. To fill this gap, we present a novel generative approach for time series backdoor attacks against deep learning based time series classifiers. Backdoor attacks have two main goals: high stealthiness and high attack success rate. We find that, compared to images, it can be more challenging to achieve the two goals on time series. This is because time series have fewer input dimensions and lower degrees of freedom, making it hard to achieve a high attack success rate without compromising stealthiness. Our generative approach addresses this challenge by generating trigger patterns that are as realistic as real time series patterns while achieving a high attack success rate without causing a significant drop in clean accuracy. We also show that our proposed attack is resistant to potential backdoor defenses. Furthermore, we propose a novel universal generator that can poison any type of time series, allowing universal attacks without the need to fine-tune the generative model for new time series datasets.

VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion

Hanbo Cai , Pengcheng Zhang , Hai Dong , Yan Xiao , Shunhui Ji

Category: Artificial Intelligence | Machine Learning

2022-12-20

Keyword spotting (KWS) based on deep neural networks (DNNs) has achieved massive success in voice control scenarios. However, training such DNN-based KWS systems often requires significant data and hardware resources, so manufacturers often entrust this process to a third-party platform. This makes the training process uncontrollable, and attackers can implant backdoors in the model by manipulating third-party training data. An effective backdoor attack can force the model to make specified judgments under certain conditions, i.e., triggers. In this paper, we design a backdoor attack scheme based on Voiceprint Selection and Voice Conversion, abbreviated as VSVC. Experimental results demonstrate that VSVC achieves an average attack success rate close to 97% on four victim models while poisoning less than 1% of the training data.

Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition

Hasan Abed Al Kader Hammoud , Shuming Liu , Mohammad Alkhrasi , Fahad AlBalawi , Bernard Ghanem

Category: Computer Vision | Machine Learning

2023-01-03

Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, there are very few works that explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into that model. We show that poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. For the first time, we also study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough to achieve a high attack success rate.

Enhancing Clean Label Backdoor Attack with Two-phase Specific Triggers

Nan Luo , Yuanzhang Li , Yajie Wang , Shangbo Wu , Yu-an Tan , Quanxin Zhang

Category: Computer Vision

2022-06-10

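Backdoor attacks threaten deep neural networks (DNNs). For stealthiness, researchers have proposed clean-label backdoor attacks, which require that the adversary not alter the labels of the poisoned training data. The clean-label setting makes the attack stealthier because the image-label pairs are correct, but some problems remain: first, traditional methods for poisoning the training data are ineffective; second, traditional triggers are not stealthy and are still perceptible. To solve these problems, we propose a two-phase, image-specific trigger generation method to enhance clean-label backdoor attacks. Our method is (1) powerful: our triggers promote both phases of a backdoor attack (i.e., the backdoor implantation phase and the activation phase) at the same time; and (2) stealthy: our triggers are generated from each image, so they are image-specific rather than fixed. Extensive experiments show that our approach achieves a high attack success rate (98.98%) with a low poisoning rate (5%), high stealthiness under many evaluation metrics, and resistance to backdoor defense methods.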

Backdoor Vulnerabilities in Normally Trained Deep Learning Models

Guanhong Tao , Zhenting Wang , Siyuan Cheng , Shiqing Ma , Shengwei An , Yingqi Liu , Guangyu Shen , Zhuo Zhang , Yunshu Mao , Xiangyu Zhang

Category: Machine Learning

2022-11-29

We conduct a systematic study of backdoor vulnerabilities in normally trained Deep Learning models. They are as dangerous as backdoors injected by data poisoning because both can be equally exploited. We use 20 different types of injected backdoor attacks from the literature as guidance and study their counterparts in normally trained models, which we call natural backdoor vulnerabilities. We find that natural backdoors are widespread, with most injected backdoor attacks having natural counterparts. We categorize these natural backdoors and propose a general detection framework. It finds 315 natural backdoors in the 56 normally trained models downloaded from the Internet, covering all the different categories, while existing scanners designed for injected backdoors can detect at most 65 backdoors. We also study the root causes of and defenses against natural backdoors.

Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation

Tong Wu , Tianhao Wang , Vikash Sehwag , Saeed Mahloujifar , Prateek Mittal

Category: Computer Vision | Machine Learning

2022-07-22

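Recent works have shown that deep learning models are vulnerable to backdoor poisoning attacks, which instill spurious correlations with external trigger patterns or objects (e.g., stickers, sunglasses, etc.). We find that such external trigger signals are unnecessary, because highly effective backdoors can be easily inserted using rotation-based image transformations. Our method constructs the poisoned dataset by rotating a limited number of objects and labeling them incorrectly; once trained on it, the victim's model will make undesirable predictions during run-time inference. Comprehensive empirical studies on image classification and object detection tasks show that the attack achieves a significantly high attack success rate while maintaining clean performance. Furthermore, we evaluate standard data augmentation techniques and four different backdoor defenses against our attack and find that none of them serves as a consistent mitigation approach. As we show in both image classification and object detection applications, our attack can be easily deployed in the real world because it only requires rotating the object. Overall, our work highlights a new, simple, physically realizable, and highly effective vector for backdoor attacks. Our video demo is available at https://youtu.be/6jif8wnx34m.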
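
The dataset construction described above (rotate a small subset, then mislabel it) can be sketched as follows. This is an illustrative sketch only, not the authors' code: images are modeled as 2-D lists, the rotation is fixed at 90 degrees for simplicity, and `poison_with_rotation`, `poison_rate`, and `seed` are hypothetical names.

```python
import random


def rotate90(image):
    """Rotate a 2-D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def poison_with_rotation(dataset, target_label, poison_rate, seed=0):
    """Return a poisoned copy of (image, label) pairs and the poisoned indices.

    A random subset of size len(dataset) * poison_rate is rotated and
    relabeled with target_label; all other pairs are passed through unchanged.
    """
    rng = random.Random(seed)
    n_poison = int(len(dataset) * poison_rate)
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    poisoned = []
    for index, (image, label) in enumerate(dataset):
        if index in chosen:
            poisoned.append((rotate90(image), target_label))
        else:
            poisoned.append((image, label))
    return poisoned, chosen


# Ten tiny 2x2 "images" with labels 0..2; poison 20% toward label 9.
dataset = [([[i, i + 1], [i + 2, i + 3]], i % 3) for i in range(10)]
poisoned, chosen = poison_with_rotation(dataset, target_label=9, poison_rate=0.2)
```

At inference time the attacker would only need to present a rotated object, which is why no external trigger pattern is required.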

Backdoor Attacks on Vision Transformers

Akshayvarun Subramanya , Aniruddha Saha , Soroush Abbasi Koohpayegani , Ajinkya Tejankar , Hamed Pirsiavash

Category: Computer Vision | Machine Learning

2022-06-16

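Vision Transformers (ViTs) have recently demonstrated exemplary performance on a variety of vision tasks and are being used as an alternative to CNNs. Their design is based on a self-attention mechanism that processes images as a sequence of patches, which is quite different from CNNs. Hence, it is interesting to study whether ViTs are vulnerable to backdoor attacks. A backdoor attack happens when an attacker poisons a small part of the training data for malicious purposes. The model performs well on clean test images, but the attacker can manipulate the model's decision by showing the trigger at test time. To the best of our knowledge, we are the first to show that ViTs are vulnerable to backdoor attacks. We also find an intriguing difference between ViTs and CNNs: interpretation algorithms effectively highlight the trigger on test images for ViTs but not for CNNs. Based on this observation, we propose a test-time image blocking defense for ViTs that reduces the attack success rate by a large margin. Code is available here: https://github.com/ucdvision/backdoor_transformer.git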
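
One plausible reading of a test-time blocking defense is: take the saliency map produced by an interpretation algorithm, find the most highlighted square region, and blank it out before classification. The sketch below follows that reading under stated assumptions; it is not the code from the linked repository, and the function name, the square window, and the fill value are all hypothetical.

```python
def block_most_salient_patch(image, saliency, patch, fill=0.0):
    """Blank out the patch x patch window with the highest total saliency.

    image and saliency are 2-D lists of equal shape. Returns a modified copy
    of the image and the (top, left) corner of the blocked window.
    """
    height, width = len(image), len(image[0])
    best_score, best_pos = float("-inf"), (0, 0)
    for top in range(height - patch + 1):
        for left in range(width - patch + 1):
            score = sum(
                saliency[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            )
            if score > best_score:
                best_score, best_pos = score, (top, left)
    top, left = best_pos
    blocked = [row[:] for row in image]  # copy so the input stays untouched
    for r in range(top, top + patch):
        for c in range(left, left + patch):
            blocked[r][c] = fill
    return blocked, best_pos


# Toy 4x4 input whose saliency is concentrated in the bottom-right corner,
# standing in for a trigger highlighted by the interpretation algorithm.
image = [[1.0] * 4 for _ in range(4)]
saliency = [[0.0] * 4 for _ in range(4)]
for r in (2, 3):
    for c in (2, 3):
        saliency[r][c] = 1.0
blocked, position = block_most_salient_patch(image, saliency, patch=2)
```

The blocked image, not the original, would then be passed to the classifier.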

More is Better (Mostly): On the Backdoor Attacks in Federated Graph Neural Networks

Jing Xu , Rui Wang , Stefanos Koffas , Kaitai Liang , Stjepan Picek

Category: Machine Learning

2022-02-07

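Graph Neural Networks (GNNs) are a class of deep learning-based methods for processing graph domain information. GNNs have recently become a widely used graph analysis method because they can learn representations for complex graph data. However, due to privacy concerns and regulatory restrictions, centralized GNNs can be difficult to apply to data-sensitive scenarios. Federated learning (FL) is an emerging technology developed for privacy-preserving settings in which several parties need to collaboratively train a shared global model. Although several research works have applied FL to train GNNs (Federated GNNs), there is no research on their robustness to backdoor attacks. This paper bridges this gap by conducting two types of backdoor attacks in Federated GNNs: centralized backdoor attacks (CBA) and distributed backdoor attacks (DBA). Our experiments show that the DBA attack success rate is higher than that of CBA in almost all evaluated cases. For CBA, the attack success rate of all local triggers is similar to that of the global trigger, even though the training set of the adversarial party is embedded with the global trigger. To further explore the properties of the two backdoor attacks in Federated GNNs, we evaluate the attack performance for different numbers of clients, trigger sizes, poisoning intensities, and trigger densities. Moreover, we explore the robustness of DBA and CBA against two state-of-the-art defenses. We find that both attacks are robust against the investigated defenses, so backdoor attacks in Federated GNNs need to be considered a novel threat that requires custom defenses.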

Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning

Antonio Emanuele Cinà , Kathrin Grosse , Ambra Demontis , Sebastiano Vascon , Werner Zellinger , Bernhard A. Moser , Alina Oprea , Battista Biggio , Marcello Pelillo , Fabio Roli

Category: Machine Learning | Artificial Intelligence

2022-05-04

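The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones, under the assumption that it is sufficiently representative of the data that will be encountered at test time. This assumption is challenged by the threat of poisoning, an attack that manipulates the training data to compromise the model's performance at test time. Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of different attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 100 papers published in the field in the last 15 years. We start by categorizing the current threat models and attacks, and then organize existing defenses accordingly. While we focus mostly on computer vision applications, we argue that our systematization also encompasses state-of-the-art attacks and defenses for other data modalities. Finally, we discuss existing resources for research on poisoning, and shed light on the current limitations and open research questions in this field.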

Hidden Trigger Backdoor Attacks

Aniruddha Saha , Akshayvarun Subramanya , Hamed Pirsiavash

Category:

2019-09-30

With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real-world applications has become an important research topic. Backdoor attacks are a form of adversarial attack on deep networks in which the attacker provides poisoned data for the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that can be identified by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack in which the poisoned data look natural and carry correct labels and, more importantly, the attacker hides the trigger in the poisoned data and keeps the trigger secret until test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images, although the model performs well on clean data. We also show that our proposed attack cannot be easily defended against using a state-of-the-art defense algorithm for backdoor attacks.

Transferable Graph Backdoor Attack

Shuiqiao Yang , Bao Gia Doan , Paul Montague , Olivier De Vel , Tamas Abraham , Seyit Camtepe , Damith C. Ranasinghe , Salil S. Kanhere

Category: Artificial Intelligence | Machine Learning

2022-06-21

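Graph Neural Networks (GNNs) have achieved tremendous success in many graph mining tasks, benefiting from the message passing strategy that fuses local structure and node features for better graph representation learning. Despite this success, and like other types of deep neural networks, GNNs have been found to be vulnerable to unnoticeable perturbations of both graph structure and node features. Many adversarial attacks have been proposed to expose the fragility of GNNs under different perturbation strategies for creating adversarial examples. However, the vulnerability of GNNs to successful backdoor attacks was only shown recently. In this paper, we disclose the TRAP attack, a transferable graph backdoor attack. The core attack principle is to poison the training dataset with perturbation-based triggers that lead to an effective and transferable backdoor attack. The perturbation trigger for a graph is generated by performing perturbation actions on the graph structure, guided by a gradient-based score matrix from a surrogate model. Compared with prior works, the TRAP attack differs in several ways: i) it exploits a surrogate Graph Convolutional Network (GCN) model to generate perturbation triggers for a black-box backdoor attack; ii) it generates sample-specific perturbation triggers that have no fixed pattern; and iii) in the context of GNNs, the attack transfers to different GNN models trained on the forged poisoned training dataset. Through extensive evaluations on four real-world datasets, we demonstrate the effectiveness of the TRAP attack in building transferable backdoors in four different popular GNNs.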

Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain

Chang Yue , Peizhuo Lv , Ruigang Liang , Kai Chen

Category: Machine Learning

2022-07-09

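With the broad application of deep neural networks (DNNs), backdoor attacks have gradually attracted attention. Backdoor attacks are insidious: a poisoned model performs well on benign samples and is triggered only when given specific inputs, which cause the neural network to produce incorrect outputs. State-of-the-art backdoor attacks are implemented by data poisoning, i.e., the attacker injects poisoned samples into the dataset, and models trained with that dataset are infected with the backdoor. However, most of the triggers used in current studies are fixed patterns patched onto a small part of an image, and the poisoned samples are often clearly mislabeled, which makes them easy to detect by humans or by defense methods such as Neural Cleanse and SentiNet. In addition, such triggers are difficult for DNNs to learn without mislabeling, because the networks may ignore small patterns. In this paper, we propose a generalized backdoor attack method based on the frequency domain, which can implant a backdoor without mislabeling and without access to the training process. It is invisible to humans and able to evade commonly used defense methods. We evaluate our approach in the no-label and clean-label cases on three datasets (CIFAR-10, STL-10, and GTSRB). The results show that our approach achieves a high attack success rate (above 90%) on all tasks without significant performance degradation on the main tasks. We also evaluate how well our approach bypasses different kinds of defenses, including detection of training data (i.e., Activation Clustering), preprocessing of inputs (i.e., Filtering), detection of inputs (i.e., SentiNet), and detection of models (i.e., Neural Cleanse). The experimental results demonstrate that our approach is highly robust to these defenses.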

STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

Yansong Gao , Chang Xu , Derui Wang , Shiping Chen , Damith C. Ranasinghe , Surya Nepal

Category:

2019-02-18

A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model by leveraging the difficulty of interpreting the learned model to misclassify any inputs signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds a STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision systems. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of predicted classes for perturbed inputs from a given deployed model, whether malicious or benign. A low entropy in predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input, which is a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved a result of 0% for both FRR and FAR. We have also evaluated STRIP's robustness against a number of trojan attack variants and adaptive attacks.
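
The detection idea in this abstract (superimpose patterns on the input, then check how random the predicted classes are) can be sketched in a few lines. This is a simplified illustration rather than the STRIP implementation: it measures the entropy of the hard predicted labels, the superimposition is a plain elementwise sum, and `toy_model`, the overlays, and the 0.5 threshold are hypothetical stand-ins.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def superimpose(x, overlay):
    """Perturb an input by adding another pattern elementwise."""
    return [a + b for a, b in zip(x, overlay)]


def looks_trojaned(predict, x, overlays, threshold):
    """Flag x when predictions stay (almost) constant under perturbation."""
    labels = [predict(superimpose(x, overlay)) for overlay in overlays]
    return class_entropy(labels) < threshold


def toy_model(x):
    """Hypothetical backdoored classifier over 4 features.

    A large last feature acts as the trigger and forces class 7; otherwise
    the class is the index of the largest of the first three features.
    """
    if x[3] >= 5.0:
        return 7
    return max(range(3), key=lambda i: x[i])


overlays = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
clean_input = [0.2, 0.1, 0.3, 0.0]
trojaned_input = [0.2, 0.1, 0.3, 9.0]

clean_flag = looks_trojaned(toy_model, clean_input, overlays, threshold=0.5)
trojan_flag = looks_trojaned(toy_model, trojaned_input, overlays, threshold=0.5)
```

In this toy setup the clean input's prediction changes with every overlay, while the triggered input is always mapped to the same class, so only the latter falls below the entropy threshold.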

MACAB: Model-Agnostic Clean-Annotation Backdoor to Object Detection with Natural Trigger in Real-World

Hua Ma , Yinshan Li , Yansong Gao , Zhi Zhang , Alsharif Abuadbba , Anmin Fu , Said F. Al-Sarawi , Nepal Surya , Derek Abbott

Category: Computer Vision

2022-09-06

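Object detection is the foundation of various critical computer vision tasks such as segmentation, object tracking, and event detection. Training an object detector to satisfactory accuracy requires a large amount of data. However, because annotating large datasets is labor-intensive, this data curation task is often outsourced to a third party or left to volunteers. This work reveals a severe vulnerability of such data curation pipelines. We propose MACAB, which crafts clean-annotated images to stealthily implant a backdoor into object detectors trained on them, even when the data curator can manually audit the images. We observe that both the misclassification and the cloaking backdoor effects are robustly achieved in the wild when the backdoor is activated with inconspicuous natural physical triggers. Backdooring non-classification object detection with clean annotations is challenging compared to backdooring existing image classification tasks with clean labels, owing to the complexity of having multiple objects within each frame, including victim and non-victim objects. The efficacy of MACAB is ensured by constructively i) abusing the image-scaling function used by deep learning frameworks, ii) incorporating the proposed adversarial clean image replica technique, and iii) combining poison data selection criteria under a constrained attack budget. Extensive experiments demonstrate that MACAB achieves an attack success rate of more than 90% in various real-world scenes. This covers both the cloaking and the misclassification backdoor effects, even with a small attack budget. State-of-the-art detection techniques cannot effectively identify the poisoned samples. A comprehensive video demo is available at https://youtu.be/ma7l_lpxkp4; it is based on a poison rate of 0.14% for the YOLOv4 cloaking backdoor and the Faster R-CNN misclassification backdoor.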

Defending Backdoor Attacks on Vision Transformer via Patch Processing

Khoa D. Doan , Yingjie Lao , Peng Yang , Ping Li

Category: Computer Vision

2022-06-24

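Vision Transformers (ViTs) have a radically different architecture, with significantly less inductive bias, than convolutional neural networks. Along with the improvement in performance, the security and robustness of ViTs are also important to study. In contrast to many recent works that explore the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, i.e., the backdoor. We first examine the vulnerability of ViTs to various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and the backdoor attack success rate of ViTs respond distinctively to patch transformations applied before the positional encoding. Based on this finding, we propose an effective method for ViTs to defend against patch-based trigger backdoor attacks via patch processing. The performance is evaluated on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, and the results show that the proposed defense is very successful in mitigating backdoor attacks on ViTs. To the best of our knowledge, this paper presents the first defensive strategy that uses a unique characteristic of ViTs against backdoor attacks.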

Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch

Hossein Souri , Liam Fowl , Rama Chellappa , Micah Goldblum , Tom Goldstein

Category: Machine Learning | Computer Vision

2021-06-16

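As the curation of data for machine learning becomes increasingly automated, dataset tampering is a mounting threat. Backdoor attackers tamper with training data to embed a vulnerability in models that are trained on that data. This vulnerability is then activated at inference time by placing a "trigger" into the model's input. Typical backdoor attacks insert the trigger directly into the training data, although such an attack may be visible upon inspection. In contrast, the Hidden Trigger Backdoor Attack achieves poisoning without placing a trigger into the training data at all. However, this hidden trigger attack is ineffective at poisoning neural networks trained from scratch. We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model re-training during the crafting process. Sleeper Agent is the first hidden trigger backdoor attack to be effective against neural networks trained from scratch. We demonstrate its effectiveness on ImageNet and in black-box settings. Our implementation code can be found at https://github.com/hsouri/sleeper-agent.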

An Overview of Backdoor Attacks Against Deep Neural Networks and Possible Defences

Wei Guo , Benedetta Tondi , Mauro Barni

Category: Computer Vision

2021-11-16

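Together with impressive advances touching every aspect of our society, AI technology based on Deep Neural Networks (DNNs) is raising increasing security concerns. While attacks operating at test time monopolised the initial attention of researchers, backdoor attacks, which exploit the possibility of corrupting DNN models by interfering with the training process, represent a further serious threat to the dependability of AI techniques. In a backdoor attack, the attacker corrupts the training data so as to induce erroneous behaviour at test time. Test-time errors, however, are activated only in the presence of a triggering event corresponding to a properly crafted input sample. In this way, the corrupted network continues to work as expected for regular inputs, and the malicious behaviour occurs only when the attacker decides to activate the backdoor hidden within the network. In the last few years, backdoor attacks have been the subject of intense research activity focusing on both the development of new classes of attacks and the proposal of possible countermeasures. The goal of this overview paper is to review the works published so far, classifying the different types of attacks and defences that have been proposed. The classification guiding the analysis is based on the amount of control that the attacker has over the training process, and on the capability of the defender to verify the integrity of the data used for training and to monitor the operation of the DNN at training and test time. As such, the proposed analysis is particularly suited to highlighting the strengths and weaknesses of both attacks and defences with reference to the application scenarios in which they operate.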

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Yige Li , Xixiang Lyu , Nodens Koren , Lingjuan Lyu , Bo Li , Xingjun Ma

Category: Machine Learning | Artificial Intelligence

2021-10-22

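Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results in detecting or erasing backdoors, it is still not clear whether robust training methods can be devised to prevent backdoor triggers from being injected into the trained model in the first place. In this paper, we introduce the concept of anti-backdoor learning, which aims to train clean models given backdoor-poisoned data. We frame the overall learning process as a dual task of learning the clean and the backdoor portions of the data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) models learn backdoored data much faster than clean data; and 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage gradient ascent mechanism into standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that models trained with ABL on backdoor-poisoned data achieve the same performance as if they had been trained on purely clean data. Code is available at https://github.com/boylyg/abl.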
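
The two stages named in the abstract can be illustrated with a small sketch. It rests on an assumption not spelled out in this listing: that backdoor examples tend to reach a low loss early in training, so the lowest-loss samples are treated as suspects. The function names, the isolation ratio, and the per-sample loss values are hypothetical, and this is not the code from the linked repository.

```python
def isolate_low_loss(losses, isolation_ratio):
    """Stage 1 sketch: return indices of the samples with the lowest loss."""
    n_isolate = max(1, int(len(losses) * isolation_ratio))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return set(ranked[:n_isolate])


def signed_loss(loss, is_isolated):
    """Stage 2 sketch: descend on ordinary samples, ascend on isolated ones.

    Minimizing the negated loss of an isolated sample is gradient ascent on
    that sample, which pushes the model away from the suspected backdoor.
    """
    return -loss if is_isolated else loss


# Hypothetical per-sample losses after a few early epochs: samples 1 and 3
# were fit suspiciously fast.
early_losses = [2.1, 0.05, 1.8, 0.02, 2.4, 1.9, 2.2, 2.0, 1.7, 2.3]
suspects = isolate_low_loss(early_losses, isolation_ratio=0.2)
objective = sum(
    signed_loss(loss, index in suspects) for index, loss in enumerate(early_losses)
)
```

In a real training loop the signed objective would be computed from the current model's losses at each step rather than from fixed numbers.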

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks

Zhengyan Zhang , Guangxuan Xiao , Yongwei Li , Tian Lv , Fanchao Qi , Zhiyuan Liu , Yasheng Wang , Xin Jiang , Maosong Sun

Category: Natural Language Processing | Computer Vision

2021-01-18

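Pre-trained models (PTMs) have been widely used in various downstream tasks. The parameters of PTMs are distributed on the Internet and may suffer backdoor attacks. In this work, we demonstrate the universal vulnerability of PTMs: they can be easily controlled by backdoor attacks in arbitrary downstream tasks. Specifically, attackers can add a simple pre-training task that restricts the output representations of trigger instances to pre-defined vectors, namely a neuron-level backdoor attack (NeuBA). If the backdoor functionality is not eliminated during fine-tuning, the triggers can make the model predict fixed labels through the pre-defined vectors. In experiments on both natural language processing (NLP) and computer vision (CV), we show that NeuBA can fully control the predictions for trigger instances without any knowledge of the downstream tasks. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising direction for resisting NeuBA by excluding backdoor neurons. Our findings sound a red alarm for the wide use of PTMs. Our source code and models are available at https://github.com/thunlp/neuba.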
