Robust Model-Agnostic Meta-Learning (MAML) is usually adopted to train a meta-model that can quickly adapt to novel classes with only a few exemplars while remaining robust to adversarial attacks. The conventional solution for robust MAML is to introduce a robustness-promoting regularization during the meta-training stage. With such a regularization, previous robust MAML methods simply follow the typical MAML practice that the number of training shots should match the number of test shots to achieve optimal adaptation performance. However, although robustness can be largely improved, previous methods sacrifice clean accuracy considerably. In this paper, we observe that introducing robustness-promoting regularization into MAML reduces the intrinsic dimension of clean-sample features, which results in a lower capacity of clean representations. This may explain why the clean accuracy of previous robust MAML methods drops so severely. Based on this observation, we propose a simple strategy, i.e., increasing the number of training shots, to mitigate the loss of intrinsic dimension caused by robustness-promoting regularization. Though simple, our method remarkably improves the clean accuracy of MAML without much loss of robustness, producing a robust yet accurate model. Extensive experiments demonstrate that our method outperforms prior arts in achieving a better trade-off between accuracy and robustness. Besides, we observe that our method is less sensitive to the number of fine-tuning steps during meta-training, which allows for a reduced number of fine-tuning steps to improve training efficiency.
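Since the recipe described above is procedural (a standard MAML inner/outer loop, a robustness-promoting term on the query set, and a training-shot count deliberately larger than the test-shot count), a minimal sketch may help make it concrete. This is not the authors' code: the TRADES-style KL regularizer, the PGD settings, and all helper names (`pgd_attack`, `meta_train_step`, `tasks`) are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): a robust-MAML meta-training step where each task's
# support set holds more shots per class than will be used at test time, and a PGD-based
# robustness regularizer is added to the outer-loop (query) loss.
import torch
import torch.nn.functional as F
from torch.func import functional_call

def pgd_attack(model, params, x, y, eps=8/255, alpha=2/255, steps=5):
    """Craft L-inf PGD examples against the adapted (inner-loop) parameters."""
    params = {k: v.detach() for k, v in params.items()}   # do not backprop through the inner loop here
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(functional_call(model, params, (x_adv,)), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).clamp(x - eps, x + eps).clamp(0, 1).detach()
    return x_adv

def meta_train_step(model, tasks, inner_lr=0.01, inner_steps=1, adv_weight=1.0):
    """One outer-loop update over a batch of tasks.

    Each task provides (support_x, support_y, query_x, query_y); following the paper's strategy,
    support_x carries more shots per class during meta-training than the test-time setting.
    """
    params = dict(model.named_parameters())
    outer_loss = 0.0
    for support_x, support_y, query_x, query_y in tasks:
        adapted = params
        for _ in range(inner_steps):                      # inner-loop adaptation on clean support data
            loss = F.cross_entropy(functional_call(model, adapted, (support_x,)), support_y)
            grads = torch.autograd.grad(loss, list(adapted.values()), create_graph=True)
            adapted = {k: v - inner_lr * g for (k, v), g in zip(adapted.items(), grads)}
        clean_logits = functional_call(model, adapted, (query_x,))
        query_adv = pgd_attack(model, adapted, query_x, query_y)
        adv_logits = functional_call(model, adapted, (query_adv,))
        # outer loss = clean query loss + robustness-promoting KL regularizer (TRADES-style assumption)
        outer_loss = outer_loss + F.cross_entropy(clean_logits, query_y) \
            + adv_weight * F.kl_div(F.log_softmax(adv_logits, -1),
                                    F.softmax(clean_logits, -1), reduction="batchmean")
    return outer_loss / len(tasks)   # caller: loss.backward(); meta_optimizer.step()
```

Under the proposed strategy, each task's `support_x` would contain, say, 10 shots per class during meta-training even if the deployed model is evaluated in a 1- or 5-shot regime.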
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta-generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-training procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
Human attention can intuitively adapt to the corrupted regions of an image by recalling similar uncorrupted images seen before. This observation motivates us to improve the attention of adversarial images by considering their clean counterparts. To achieve this, we introduce Associative Adversarial Learning (AAL) into adversarial learning to guide a selective attack. We formulate the intrinsic relationship between attention and the attack (perturbation) as a coupled optimization problem to improve their interaction. This leads to an attention-backtracking algorithm that can effectively improve the adversarial robustness of attention. Our method is generic and can be used to address a variety of tasks by simply choosing different kernels for the associative attention, which selects other regions for a specific attack. Experimental results show that the selective attack improves the model's performance. We show that our method improves recognition accuracy on ImageNet by 8.32% compared with the baseline. It also improves object detection mAP on PascalVOC by 2.02% and few-shot learning recognition accuracy on MiniImageNet by 1.63%.
Adversarial examples are commonly viewed as a threat to ConvNets. Here we present an opposite perspective: adversarial examples can be used to improve image recognition models if harnessed in the right manner. We propose AdvProp, an enhanced adversarial training scheme which treats adversarial examples as additional examples, to prevent overfitting. Key to our method is the usage of a separate auxiliary batch norm for adversarial examples, as they have different underlying distributions from normal examples. We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger. For instance, by applying AdvProp to the latest EfficientNet-B7 [41] on ImageNet, we achieve significant improvements on ImageNet (+0.7%), ImageNet-C (+6.5%), ImageNet-A (+7.0%) and Stylized-ImageNet (+4.8%). With an enhanced EfficientNet-B8, our method achieves the state-of-the-art 85.5% ImageNet top-1 accuracy without extra data. This result even surpasses the best model in [24] which is trained with 3.5B Instagram images (∼3000× more than ImageNet) and ∼9.4× more parameters. Models are available at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.
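The key mechanism named in the abstract, a separate auxiliary batch norm for adversarial examples, can be sketched as follows. This is a minimal illustration rather than the official AdvProp/EfficientNet code (see the linked repository); the module and function names, the single-step attack, and the equal loss weighting are assumptions.

```python
# Minimal sketch of the auxiliary-BN idea: a DualBatchNorm2d keeps two sets of BN statistics,
# and a module-level flag selects which one is active, so clean and adversarial mini-batches
# are normalized separately during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBatchNorm2d(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.bn_clean = nn.BatchNorm2d(num_features)   # used for normal examples
        self.bn_adv = nn.BatchNorm2d(num_features)     # auxiliary BN for adversarial examples
        self.use_adv = False

    def forward(self, x):
        return self.bn_adv(x) if self.use_adv else self.bn_clean(x)

def set_bn_mode(model, use_adv: bool):
    for m in model.modules():
        if isinstance(m, DualBatchNorm2d):
            m.use_adv = use_adv

def advprop_step(model, x, y, optimizer, eps=4/255, alpha=1/255, steps=1):
    # 1) craft adversarial examples through the auxiliary-BN path
    set_bn_mode(model, use_adv=True)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).clamp(x - eps, x + eps).clamp(0, 1).detach()
    # 2) joint update: clean batch through the main BNs, adversarial batch through the auxiliary BNs
    optimizer.zero_grad()
    set_bn_mode(model, use_adv=False)
    loss_clean = F.cross_entropy(model(x), y)
    set_bn_mode(model, use_adv=True)
    loss_adv = F.cross_entropy(model(x_adv), y)
    (loss_clean + loss_adv).backward()
    optimizer.step()
```

A network using this sketch would simply replace its `nn.BatchNorm2d` layers with `DualBatchNorm2d`; at test time the clean BN path is used.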
As one of the most effective defenses against adversarial attacks, adversarial training tends to learn an inclusive decision boundary to improve the robustness of deep learning models. However, due to the large and unnecessary increase in the margin along the adversarial direction, adversarial training causes serious cross-over between natural examples and adversarial examples, which is not conducive to balancing the trade-off between robustness and natural accuracy. In this paper, we propose a novel adversarial training scheme to achieve a better trade-off between robustness and natural accuracy. It aims to learn a moderately inclusive decision boundary, which means that the margins of natural examples under the decision boundary are moderate. We call this scheme Moderate-Margin Adversarial Training (MMAT), which generates finer-grained adversarial examples to alleviate the cross-over problem. We also exploit the logits of a well-trained teacher model to guide the learning of our model. Finally, MMAT achieves high natural accuracy and robustness under both black-box and white-box attacks. For example, on SVHN, state-of-the-art robustness and natural accuracy are achieved.
Adversarial training (AT) methods are effective against adversarial attacks, but they introduce severe disparities in accuracy and robustness between different classes, known as the robust fairness problem. Previously proposed Fair Robust Learning (FRL) adaptively reweights different classes to improve fairness. However, the performance of the well-performing classes decreases, leading to a strong performance drop. In this paper, we observe two kinds of unfairness phenomena in adversarial training: the different difficulties in generating adversarial examples from each class (source-class fairness) and the disparate tendencies toward different target classes when generating adversarial examples (target-class fairness). Based on these observations, we propose Balanced Adversarial Training (BAT) to address the robust fairness problem. Regarding source-class fairness, we adjust the attack strength and difficulty for each class to generate samples near the decision boundary for easier and fairer model learning; considering target-class fairness, by introducing a uniform distribution constraint, we encourage the adversarial example generation process for each class to have a fair tendency. Extensive experiments on multiple datasets (CIFAR-10, CIFAR-100, and ImageNette) demonstrate that our method can significantly outperform other baselines in mitigating the robust fairness problem (+5-10% worst-class accuracy).
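The source-class side of the scheme lends itself to a small sketch: give each class its own attack budget, derived from a running estimate of per-class robust accuracy, so that crafted examples land near the decision boundary for every class. This is one plausible reading of the abstract, not the authors' exact procedure; the EMA update rule, the epsilon bounds, and all names are assumptions.

```python
# Rough sketch: per-class attack strength that shrinks for classes with low robust accuracy and
# grows for classes that are already robust, so adversarial examples stay near the boundary.
import torch
import torch.nn.functional as F

class ClasswiseEps:
    def __init__(self, num_classes, base_eps=8/255, min_eps=2/255, max_eps=12/255, momentum=0.9):
        self.eps = torch.full((num_classes,), base_eps)
        self.robust_acc = torch.full((num_classes,), 0.5)   # running robust accuracy per class
        self.base_eps, self.min_eps, self.max_eps, self.m = base_eps, min_eps, max_eps, momentum

    def update(self, y, correct_adv):
        # exponential moving average of per-class robust accuracy on the current batch
        for c in y.unique():
            mask = y == c
            self.robust_acc[c] = self.m * self.robust_acc[c] + (1 - self.m) * correct_adv[mask].float().mean()
        # classes with high robust accuracy get stronger attacks, weak classes get milder ones
        self.eps = (self.base_eps * 2 * self.robust_acc).clamp(self.min_eps, self.max_eps)

    def per_sample(self, y):
        # broadcastable per-sample epsilon for an image batch of shape (N, C, H, W)
        return self.eps.to(y.device)[y].view(-1, 1, 1, 1)

def fgsm_with_classwise_eps(model, x, y, eps_state):
    eps = eps_state.per_sample(y)
    x_req = x.clone().detach().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x_req), y), x_req)[0]
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()
    with torch.no_grad():
        correct_adv = model(x_adv).argmax(1).eq(y)
    eps_state.update(y.cpu(), correct_adv.cpu())
    return x_adv
```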
Although deep learning models have achieved great progress in image semantic segmentation, they typically require large numbers of annotated examples, and increasing attention has shifted to problem settings such as few-shot learning (FSL), where only a small amount of annotation is needed to generalize to novel classes. This is especially the case in the medical domain, where pixel-level annotations are expensive. In this paper, we propose Regularized Prototypical Neural Ordinary Differential Equation (R-PNODE), a method that leverages the intrinsic properties of Neural ODEs, assisted and enhanced by additional cluster and consistency losses, to perform few-shot segmentation (FSS) of organs. R-PNODE constrains support and query features of the same class to lie close in the representation space, thereby improving over existing convolutional neural network (CNN)-based FSS methods. We further demonstrate that while many existing CNN-based methods tend to be highly vulnerable to adversarial attacks, R-PNODE exhibits increased adversarial robustness against a variety of attacks. We use three publicly available multi-organ segmentation datasets in both in-domain and cross-domain FSS settings to demonstrate the efficacy of our method. In addition, we conduct experiments with seven commonly used adversarial attacks in various settings to demonstrate R-PNODE's robustness. R-PNODE outperforms the FSS baselines and also shows superior performance across attack strengths and designs.
Adversarial training is useful against specific adversarial perturbations, but it has also proven ineffective against attacks that deviate from those used for training. However, we observe that this ineffectiveness is intrinsically connected to domain adaptability, another key problem in deep learning for which domain adaptation appears to be a promising solution. Consequently, we propose Adv-4-Adv as a novel adversarial training method that aims to retain robustness against unseen adversarial perturbations. Essentially, Adv-4-Adv treats attacks that produce different perturbations as distinct domains, and by leveraging the power of adversarial domain adaptation, it aims to remove the domain/attack-specific features. This forces the trained model to learn robust domain-invariant representations, which in turn enhances its generalization ability. Extensive evaluations on Fashion-MNIST, SVHN, CIFAR-10, and CIFAR-100 demonstrate that a model trained on samples crafted by simple attacks (e.g., FGSM) can generalize to more advanced attacks (e.g., PGD), with performance exceeding state-of-the-art proposals on these datasets.
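Since the abstract casts attacks as domains and invokes adversarial domain adaptation, a DANN-style gradient-reversal sketch may make the idea concrete. It is one plausible instantiation, not the authors' architecture: for brevity it uses only two domains (clean vs. one attack), whereas the paper treats each attack type as its own domain, and every module name is an assumption.

```python
# DANN-style sketch: a domain head tries to tell clean and adversarial features apart, and a
# gradient-reversal layer pushes the feature extractor toward attack/domain-invariant features.
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None   # flip the gradient flowing into the feature extractor

def adv4adv_like_step(feature_net, label_head, domain_head, opt, x_clean, x_adv, y, lamb=1.0):
    """x_adv holds adversarial versions of x_clean (e.g., crafted by FGSM).
    Domain labels: 0 = clean, 1 = adversarial."""
    opt.zero_grad()
    feats = feature_net(torch.cat([x_clean, x_adv], dim=0))
    cls_loss = F.cross_entropy(label_head(feats), torch.cat([y, y], dim=0))
    domain_labels = torch.cat([torch.zeros(len(x_clean)), torch.ones(len(x_adv))]).long().to(feats.device)
    dom_logits = domain_head(GradReverse.apply(feats, lamb))
    dom_loss = F.cross_entropy(dom_logits, domain_labels)
    (cls_loss + dom_loss).backward()
    opt.step()
    return cls_loss.item(), dom_loss.item()
```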
Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our "free" adversarial training algorithm achieves comparable robustness to PGD adversarial training on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks. The code is available at https://github.com/ashafahi/free_adv_train.
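The gradient-recycling trick described above fits in a few lines: each mini-batch is replayed m times, and the single backward pass per replay supplies both the parameter gradients and the input gradient that advances the perturbation. The sketch below follows the published algorithm in spirit only; the hyper-parameters are illustrative, and for simplicity the perturbation is reset per mini-batch rather than carried across batches as in the released code.

```python
# Simplified "free" adversarial training loop: one backward pass per replay yields gradients
# for both the optimizer step and the perturbation update.
import torch
import torch.nn.functional as F

def free_adversarial_training(model, loader, optimizer, num_epochs, m=4, eps=4/255, device="cuda"):
    """In the paper, the epoch count is divided by m so the total cost matches natural training."""
    model.to(device).train()
    for _ in range(num_epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            delta = torch.zeros_like(x)              # perturbation for this mini-batch
            for _ in range(m):                       # replay the same mini-batch m times
                delta.requires_grad_(True)
                loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
                optimizer.zero_grad()
                loss.backward()                      # one backward: grads for both parameters and delta
                grad_delta = delta.grad.detach()
                optimizer.step()                     # "free" parameter update from the same backward pass
                delta = (delta.detach() + eps * grad_delta.sign()).clamp(-eps, eps)
    return model
```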
It is sometimes necessary to improve the performance of certain special classes, or to particularly protect them from attacks in adversarial learning. This paper proposes a framework that combines cost-sensitive classification and adversarial learning to train a model that can distinguish between protected and unprotected classes, so that the protected classes are less vulnerable to adversarial examples. In this framework, we find an interesting phenomenon during the training of deep neural networks, called the Min-Max property, concerning the absolute values of most parameters in the convolutional layers. Based on this Min-Max property, which is formulated and analyzed from the perspective of random distributions, we further build a new defense model against adversarial examples to improve adversarial robustness. An advantage of the constructed model is that it performs better than the standard model and can be combined with adversarial training to achieve improved performance. It is experimentally confirmed that, in terms of average accuracy over all classes, our model is almost as good as existing models when no attack occurs, and better than existing models when attacks occur. Specifically, with respect to the accuracy of the protected classes, the proposed model is much better than existing models when attacks occur.
Recent studies have shown that robustness to adversarial attacks can be transferred across networks. In other words, we can make a model more robust with the help of a robust teacher model. We ask whether, instead of learning from a static teacher, models can "learn together" and "teach each other" to achieve better robustness. In this paper, we study how interactions among models affect robustness via knowledge distillation. We propose Mutual Adversarial Training (MAT), in which multiple models are trained together and share the knowledge of adversarial examples to achieve improved robustness. MAT allows robust models to explore a larger adversarial sample space and find more robust feature spaces and decision boundaries. Through extensive experiments on CIFAR-10 and CIFAR-100, we demonstrate that MAT can effectively improve model robustness and outperform the best existing methods under white-box attacks, bringing an ~8% accuracy gain over vanilla adversarial training under PGD-100 attacks. In addition, we show that MAT can also mitigate the robustness trade-off among different perturbation types, bringing gains over the baselines for $l_\infty$, $l_2$, and $l_1$ attacks. These results demonstrate the superiority of the proposed method and show that collaborative learning is an effective strategy for designing robust models.
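A two-model sketch of the collaborative scheme may help: each network trains on adversarial examples crafted against itself and against its peer, with a soft KL term through which the models teach each other. This is an illustration of the idea rather than the paper's exact objective; the loss weighting, temperature, and sharing scheme are assumptions.

```python
# Illustrative two-model mutual adversarial training step (assumed objective, not the paper's).
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps=8/255, alpha=2/255, steps=10):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).clamp(x - eps, x + eps).clamp(0, 1).detach()
    return x_adv

def mutual_at_step(model_a, model_b, opt_a, opt_b, x, y, kd_weight=1.0, temperature=2.0):
    adv_a, adv_b = pgd(model_a, x, y), pgd(model_b, x, y)      # each model's own adversarial examples
    for model, opt, peer in ((model_a, opt_a, model_b), (model_b, opt_b, model_a)):
        opt.zero_grad()
        own = adv_a if model is model_a else adv_b
        shared = adv_b if model is model_a else adv_a          # peer's adversarial examples are shared
        logits_own, logits_shared = model(own), model(shared)
        with torch.no_grad():
            peer_soft = F.softmax(peer(shared) / temperature, dim=-1)   # peer acts as a soft teacher
        loss = F.cross_entropy(logits_own, y) + F.cross_entropy(logits_shared, y) \
            + kd_weight * F.kl_div(F.log_softmax(logits_shared / temperature, dim=-1),
                                   peer_soft, reduction="batchmean") * temperature ** 2
        loss.backward()
        opt.step()
```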
In this paper, we study improving the adversarial robustness obtained by adversarial training (AT) through reducing the difficulty of optimization. To better study this problem, we establish a novel Bregman divergence perspective on AT, in which AT can be viewed as a sliding process of training data points on the negative entropy curve. Based on this perspective, we analyze the learning objectives of two typical AT methods, i.e., PGD-AT and TRADES, and we find that the optimization process of TRADES is easier than that of PGD-AT. Furthermore, we discuss the function of entropy in TRADES, and we find that models with high entropy can be better robustness learners. Inspired by the above findings, we propose two methods (one of them named MER) that not only reduce the difficulty of optimization under the 10-step PGD adversary but also provide better robustness. Our work suggests that reducing the difficulty of optimization under the 10-step PGD adversary is a promising approach for enhancing adversarial robustness in AT.
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples crafted with imperceptible perturbations, i.e., small changes of the input image can induce misclassification, thus threatening the reliability of deployed deep-learning-based systems. Adversarial training (AT) is often adopted to improve the robustness of DNNs by training on a mixture of corrupted and clean data. However, most AT-based methods are ineffective in dealing with \textit{transferred adversarial examples}, which are generated to fool a variety of defense models, and thus cannot satisfy the generalization requirements raised in real-world scenarios. Moreover, adversarially training a defense model in general cannot produce interpretable predictions for inputs with perturbations, whereas domain experts require a highly interpretable robust model in order to understand the behavior of DNNs. In this work, we propose an approach based on Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which promotes linearized robustness through Jacobian normalization and also regularizes perturbation-based saliency maps to imitate the model's interpretable predictions. As a result, we can improve both the defense capability and the interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.
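The two regularizers named above can be sketched as a training loss: cross-entropy plus a random-projection estimate of the input-output Jacobian norm and an input-gradient penalty restricted to the most salient pixels. This is a rough illustration in the spirit of the description, not the authors' implementation; the estimator, the top-k selection rule, and the weights are assumptions.

```python
# Sketch of a J-SIGR-like loss (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def jsigr_like_loss(model, x, y, jac_weight=0.01, sigr_weight=0.01, top_frac=0.1):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # (i) Jacobian-norm term: E_v ||J^T v||^2 with a random unit vector v estimates ||J||_F^2
    v = F.normalize(torch.randn_like(logits), dim=-1)
    jtv = torch.autograd.grad((logits * v).sum(), x, create_graph=True)[0]
    jac_term = jtv.pow(2).sum() / x.size(0)

    # (ii) selective input-gradient term: penalize the loss gradient only on the top-k salient pixels
    g = torch.autograd.grad(ce, x, create_graph=True)[0]
    sal = g.abs().flatten(1)
    k = max(1, int(top_frac * sal.size(1)))
    sigr_term = sal.topk(k, dim=1).values.pow(2).sum() / x.size(0)

    return ce + jac_weight * jac_term + sigr_weight * sigr_term
```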
Adversarial training (AT) has been demonstrated to effectively improve model robustness by leveraging adversarial examples for training. However, most AT methods face expensive time and computational costs for calculating gradients at multiple steps when generating adversarial examples. To improve training efficiency, the fast gradient sign method (FGSM) is adopted in fast AT methods by calculating the gradient only once. Unfortunately, the robustness is far from satisfactory. One reason may arise from the initialization. Existing fast AT methods generally use a random sample-agnostic initialization, which facilitates efficiency but hinders further robustness improvement. Until now, the initialization in fast AT is still not widely explored. In this paper, we boost fast AT with a sample-dependent adversarial initialization, i.e., the output of a generative network conditioned on a benign image and its gradient information from the target network. As the generative network and the target network are optimized jointly in the training phase, the former can adaptively generate an effective initialization with respect to the latter, which promotes gradually improved robustness. Experimental evaluations on four benchmark databases demonstrate the superiority of our proposed method over state-of-the-art fast AT methods, as well as comparable robustness to advanced multi-step AT methods. The code is released at https://github.com//jiaxiaojunqaq//fgsm-sdi.
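The pipeline described above (a lightweight generator that maps a benign image and its signed gradient to an initialization, followed by a single FGSM step, with the generator and target network trained jointly) can be sketched as below. The repository linked above is the authoritative reference; the generator architecture and the generator objective used here are simplifying assumptions.

```python
# Simplified sketch of a sample-dependent initialization for single-step (FGSM) adversarial training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InitGenerator(nn.Module):
    """Maps (image, signed gradient) -> initial perturbation in [-eps, eps]."""
    def __init__(self, in_ch=3, eps=8/255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(2 * in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, in_ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x, grad_sign):
        return self.eps * self.net(torch.cat([x, grad_sign], dim=1))

def fast_at_step(model, generator, opt_model, opt_gen, x, y, eps=8/255, alpha=8/255):
    # signed gradient of the benign example w.r.t. the target network
    x_req = x.clone().detach().requires_grad_(True)
    g = torch.autograd.grad(F.cross_entropy(model(x_req), y), x_req)[0].sign()

    # sample-dependent initialization, then one FGSM step from it
    delta0 = generator(x, g)
    x_init = (x + delta0).clamp(0, 1)
    x_init_req = x_init.clone().detach().requires_grad_(True)
    g2 = torch.autograd.grad(F.cross_entropy(model(x_init_req), y), x_init_req)[0]
    x_adv = (x_init.detach() + alpha * g2.sign()).clamp(x - eps, x + eps).clamp(0, 1)

    # update the target network on the adversarial examples
    opt_model.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    opt_model.step()

    # update the generator to make its initialization adversarially useful
    # (assumed objective: maximize the target loss at the initialization point;
    #  target-network grads populated here are cleared by opt_model.zero_grad() on the next call)
    opt_gen.zero_grad()
    gen_loss = -F.cross_entropy(model((x + delta0).clamp(0, 1)), y)
    gen_loss.backward()
    opt_gen.step()
```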
Adversarial training is so far the most effective strategy in defending against adversarial examples. However, it suffers from high computational costs due to the iterative adversarial attacks in each training step. Recent studies show that it is possible to achieve fast adversarial training by performing a single-step attack with random initialization. However, such an approach still lags behind state-of-the-art adversarial training algorithms in both stability and model robustness. In this work, we develop a new understanding of fast adversarial training by viewing random initialization as performing randomized smoothing to better optimize the inner maximization problem. Following this new perspective, we also propose a new initialization strategy, backward smoothing, to further improve the stability and model robustness of single-step robust training methods. Experiments on multiple benchmarks demonstrate that our method achieves similar model robustness while using much less training time ($\sim$3x improvement with the same training schedule).
Few-shot text classification aims to classify texts under few-shot scenarios. Most previous methods adopt optimization-based meta-learning to obtain task distributions. However, due to the mismatch between the few samples and complex models, as well as the difficulty of distinguishing useful task features, these methods suffer from the overfitting problem. To address this issue, we propose a novel Adaptive Meta-learner via Gradient Similarity (AMGS) method to improve the generalization ability of the model. Specifically, the proposed AMGS alleviates overfitting in two ways: (i) it acquires the latent semantic representation of samples and improves model generalization through a self-supervised auxiliary task in the inner loop, and (ii) it leverages the adaptive meta-learner via gradient similarity to impose constraints on the gradients obtained by the base learner in the outer loop. Moreover, we conduct a systematic analysis of the effect of regularization on the whole framework. Experimental results on several benchmarks demonstrate that the proposed AMGS consistently improves few-shot text classification performance compared with state-of-the-art optimization-based meta-learning methods.
Adversarial training has been empirically shown to be more prone to overfitting than standard training. The exact underlying reasons are still not fully understood. In this paper, we identify one cause of overfitting related to current practices of generating adversarial samples from misclassified samples. To address this, we propose an alternative approach that leverages the misclassified samples to mitigate the overfitting problem. We show that our approach achieves better generalization while having comparable robustness to state-of-the-art adversarial training methods on a wide range of computer vision, natural language processing, and tabular tasks.
As acquiring manual labels on data can be costly, unsupervised domain adaptation (UDA), which transfers knowledge learned from a label-rich source dataset to an unlabeled target dataset, is gaining increasing popularity. While extensive studies have been devoted to improving model accuracy on the target domain, the important issue of model robustness is neglected. To make things worse, conventional adversarial training (AT) methods for improving model robustness are inapplicable under the UDA scenario, since they train models on adversarial examples that are generated by a supervised loss function. In this paper, we present a new meta self-training pipeline, named SRoUDA, for improving the adversarial robustness of UDA models. Based on the self-training paradigm, SRoUDA starts with pre-training a source model by applying a UDA baseline on source labeled data and target unlabeled data with a developed random masked augmentation (RMA), and then alternates between adversarial target model training on pseudo-labeled target data and fine-tuning the source model by a meta step. While self-training allows the direct incorporation of AT in UDA, the meta step in SRoUDA further helps in mitigating error propagation from noisy pseudo labels. Extensive experiments on various benchmark datasets demonstrate the state-of-the-art performance of SRoUDA, where it achieves significant model robustness improvement without harming clean accuracy. Code is available at https://github.com/Vision.
To cope with the threat of adversarial examples, adversarial training provides an attractive option for improving model robustness by training models on online-augmented adversarial examples. However, most existing adversarial training methods focus on improving robust accuracy by strengthening the adversarial examples but neglect the increasing shift between natural data and adversarial examples, leading to a dramatic decrease in natural accuracy. To maintain the trade-off between natural and robust accuracy, we alleviate the shift from the perspective of feature adaptation and propose Feature Adaptive Adversarial Training (FAAT), which optimizes class-conditional feature adaptation across natural data and adversarial examples. Specifically, we propose to incorporate a class-conditional discriminator to encourage the features to be (1) class-discriminative and (2) invariant to the changes caused by adversarial attacks. The novel FAAT framework achieves the trade-off between natural and robust accuracy by generating features with similar distributions across natural and adversarial data, and achieves higher overall robustness benefiting from the class-discriminative feature characteristics. Experiments on various datasets demonstrate that FAAT produces more discriminative features and performs favorably against state-of-the-art methods. Code is available at https://github.com/visionflow/faat.
The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test-time tasks with limited data and low computational cost. Few-shot learning is widely used as one of the standard benchmarks in meta-learning. In this work, we show that a simple baseline: learning a supervised or self-supervised representation on the meta-training set, followed by training a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods. An additional boost can be achieved through the use of self-distillation. This demonstrates that using a good learned embedding model can be more effective than sophisticated meta-learning algorithms. We believe that our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms.
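The baseline described above is short enough to state directly: embed the support and query images with a frozen backbone trained on the meta-training classes, then fit a plain linear classifier on the support embeddings of each episode. A minimal sketch follows, assuming the backbone returns pre-logit features; it omits the self-distillation step, and all names are illustrative.

```python
# Minimal few-shot evaluation of the "frozen embedding + linear classifier" baseline.
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def embed(backbone, images, device="cuda"):
    backbone.eval().to(device)
    feats = backbone(images.to(device))            # assumes the backbone returns pre-logit features
    return torch.nn.functional.normalize(feats, dim=-1).cpu().numpy()

def evaluate_episode(backbone, support_x, support_y, query_x, query_y):
    clf = LogisticRegression(max_iter=1000)        # linear classifier on top of frozen features
    clf.fit(embed(backbone, support_x), support_y.numpy())
    pred = clf.predict(embed(backbone, query_x))
    return (pred == query_y.numpy()).mean()        # episode accuracy
```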