The trade-off between robustness and accuracy has been widely studied in the adversarial literature. Although still controversial, the prevailing view is that this trade-off is inherent, either empirically or theoretically. Thus, we dig for the origin of this trade-off in adversarial training and find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance, an overcorrection towards smoothness. Given this, we advocate employing local equivariance to describe the ideal behavior of a robust model, leading to a self-consistent robust error named SCORE. By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling worst-case uncertainty via robust optimization. By simply substituting the KL divergence with variants of distance metrics, SCORE can be efficiently minimized. Empirically, our models achieve top-rank performance on the RobustBench leaderboard under AutoAttack. Besides, SCORE provides instructive insights for explaining the overfitting phenomenon and semantic input gradients observed on robust models. Code is available at https://github.com/p2333/score.
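To make the substitution concrete, here is a minimal PyTorch-style sketch of a SCORE-like objective, assuming the KL term of a TRADES-style loss is swapped for an $\ell_2$ distance between the adversarial softmax output and the label distribution; the hyperparameters and function names are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def score_style_loss(model, x, y, num_classes, eps=8/255, alpha=2/255,
                     steps=10, beta=6.0):
    # Reference point: the label distribution, not the model's own clean
    # prediction -- local equivariance rather than local invariance.
    p_ref = F.one_hot(y, num_classes).float()

    # Inner maximization: PGD on the distance metric (l2 between softmax
    # output and label here; the paper explores several metric variants).
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        dist = (F.softmax(model(x + delta), dim=1) - p_ref).norm(p=2, dim=1)
        grad = torch.autograd.grad(dist.sum(), delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)

    # Outer minimization: clean cross-entropy plus the worst-case distance.
    robust = (F.softmax(model(x + delta), dim=1) - p_ref).norm(p=2, dim=1)
    return F.cross_entropy(model(x), y) + beta * robust.mean()
```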
Correctly classifying adversarial examples is an essential but challenging requirement for safely deploying machine learning models. As reported on RobustBench, even the state-of-the-art adversarially trained models struggle to exceed 67% robust test accuracy on CIFAR-10, which is far from practical. A complementary way towards robustness is to introduce a rejection option, allowing the model to not return predictions on uncertain inputs, where confidence is a commonly used certainty proxy. Along this routine, we find that confidence and rectified confidence (R-Con) form two coupled rejection metrics, which can provably distinguish wrongly classified inputs from correctly classified ones. This intriguing property sheds light on using coupling strategies to better detect and reject adversarial examples. We evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, including adaptive ones, and demonstrate that the RR module is compatible with different adversarial training frameworks for improving robustness at little extra computation. Code is available at https://github.com/P2333/Rectified-Rejection.
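A minimal sketch of the coupled two-metric rejection rule described above, treating the rectified confidence R-Con as a black-box score produced by the RR module; the thresholds are illustrative, not the paper's calibrated values.

```python
import torch

def coupled_reject(probs, r_con, t_con=0.5, t_rcon=0.5):
    """Coupled rejection sketch: keep a prediction only if both the
    confidence and the rectified confidence clear their thresholds.
    probs: (N, C) softmax outputs; r_con: (N,) rectified confidence
    produced by the RR module (treated as a black box here)."""
    con, pred = probs.max(dim=1)
    accept = (con >= t_con) & (r_con >= t_rcon)
    # Rejected inputs get label -1 (abstain).
    return torch.where(accept, pred, torch.full_like(pred, -1))
```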
In this paper, we study improving the adversarial robustness obtained by adversarial training (AT) through reducing the difficulty of optimization. To better study this problem, we build a novel Bregman-divergence perspective for AT, in which AT can be viewed as the sliding process of the training data points on the negative entropy curve. Based on this perspective, we analyze the learning objectives of two typical AT methods, i.e., PGD-AT and TRADES, and we find that the optimization process of TRADES is easier than that of PGD-AT, in that TRADES separates the PGD-AT objective. In addition, we discuss the function of entropy in TRADES, and we find that models with high entropy can be better robustness learners. Inspired by the above findings, we propose two methods, FAIT and MER, which not only reduce the difficulty of optimization under a 10-step PGD adversary but also provide better robustness. Our work suggests that reducing the difficulty of optimization under a 10-step PGD adversary is a promising approach for enhancing adversarial robustness in AT.
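For reference, the two learning objectives under comparison, in their standard formulations (CE is the cross-entropy, $B(x,\epsilon)$ the perturbation ball, $\beta$ the TRADES trade-off weight):

```latex
% PGD-AT: minimize the worst-case cross-entropy inside the ball
\min_\theta \; \mathbb{E}_{(x,y)} \Big[ \max_{x' \in B(x,\epsilon)} \mathrm{CE}\big(f_\theta(x'), y\big) \Big]

% TRADES: clean cross-entropy plus a worst-case KL term, decoupling the
% attack objective from the label
\min_\theta \; \mathbb{E}_{(x,y)} \Big[ \mathrm{CE}\big(f_\theta(x), y\big)
    + \beta \max_{x' \in B(x,\epsilon)} \mathrm{KL}\big(f_\theta(x) \,\big\|\, f_\theta(x')\big) \Big]
```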
Adversarial examples crafted by an explicit adversary have drawn significant attention in machine learning. However, the security risk posed by potential false friends has been largely overlooked. In this paper, we unveil the threat of hypocritical examples: inputs that are originally misclassified yet perturbed by a false friend to force correct predictions. While such perturbed examples seem harmless, we point out for the first time that they could be maliciously used to conceal the mistakes of a substandard (i.e., not as good as required) model during an evaluation. Once a deployer trusts the hypocritical performance and applies the "good" model in real applications, unexpected failures may happen even in benign environments. More seriously, this security risk seems pervasive: we find that many types of substandard models are vulnerable to hypocritical examples across multiple datasets. Furthermore, we provide a first attempt to characterize the threat with a metric called hypocritical risk, and we try to circumvent it via several countermeasures. The results demonstrate the effectiveness of the countermeasures, while the risk remains non-negligible even after adaptive robust training.
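The abstract's definition directly suggests how a false friend operates: run a PGD-style loop that descends, rather than ascends, the loss with respect to the true label. A sketch under that reading, with illustrative hyperparameters:

```python
import torch
import torch.nn.functional as F

def hypocritical_example(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft a hypocritical example per the abstract's definition: the
    perturbation *minimizes* the loss w.r.t. the true label, forcing an
    originally misclassified input to be predicted correctly."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Descend on the loss (opposite sign to an adversarial attack).
        delta = (delta - alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()
```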
Adversarial training is so far the most effective strategy in defending against adversarial examples. However, it suffers from high computational costs due to the iterative adversarial attacks in each training step. Recent studies show that it is possible to achieve fast adversarial training by performing a single-step attack with random initialization. Yet such an approach still lags behind state-of-the-art adversarial training algorithms in both stability and model robustness. In this work, we develop a new understanding of fast adversarial training by viewing random initialization as performing randomized smoothing for the better optimization of the inner maximization problem. Following this new perspective, we also propose a new initialization strategy, backward smoothing, to further improve the stability and model robustness of single-step robust training methods. Experiments on multiple benchmarks demonstrate that our method achieves similar model robustness to the original TRADES method while using much less training time ($\sim$3x improvement with the same training schedule).
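For context, a sketch of the single-step baseline this work builds on (random initialization followed by one FGSM step, in the spirit of RS-FGSM); backward smoothing replaces the initialization, which is not reproduced here:

```python
import torch
import torch.nn.functional as F

def rs_fgsm_example(model, x, y, eps=8/255, alpha=10/255):
    """Single-step attack with random initialization (RS-FGSM-style
    baseline). This sketch only shows the fast-AT skeleton; backward
    smoothing modifies the initialization step."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
    return (x + delta).clamp(0, 1).detach()
```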
Adversarial robustness has become a topic of growing interest in machine learning since it was observed that neural networks tend to be brittle. We propose an information-geometric formulation of adversarial defense and introduce FIRE, a new Fisher-Rao regularization for the categorical cross-entropy loss, which is based on the geodesic distance between the softmax outputs corresponding to natural and perturbed input features. Building on the information-geometric properties of the class of softmax distributions, we derive an explicit characterization of the Fisher-Rao distance (FRD) for the binary and multiclass cases, and draw some interesting properties as well as connections with standard regularization metrics. Furthermore, for a simple linear and Gaussian model, we show that all Pareto-optimal points in the accuracy-robustness region can be reached by FIRE, while other state-of-the-art methods fail. Empirically, we evaluate the performance of various classifiers trained with the proposed loss on standard datasets, showing up to a simultaneous 1% improvement in both clean and robust performance while reducing training time by 20% over the best-performing methods.
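The multiclass FRD on the probability simplex has the closed form $2\arccos\big(\sum_i \sqrt{p_i q_i}\big)$, which makes a FIRE-style regularizer cheap to compute. A sketch, where the loss combination and the weighting `lam` are illustrative assumptions rather than the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def fisher_rao_distance(p, q, eps=1e-12):
    # Closed form on the probability simplex:
    #   d(p, q) = 2 * arccos( sum_i sqrt(p_i * q_i) )
    inner = (p.clamp_min(eps) * q.clamp_min(eps)).sqrt().sum(dim=1)
    return 2.0 * torch.acos(inner.clamp(max=1.0))  # clamp for numerics

def fire_style_loss(model, x, x_adv, y, lam=1.0):
    # FIRE-style objective sketch: cross-entropy on natural inputs plus
    # the FRD between natural and perturbed softmax outputs.
    logits_nat = model(x)
    p_nat = F.softmax(logits_nat, dim=1)
    p_adv = F.softmax(model(x_adv), dim=1)
    return F.cross_entropy(logits_nat, y) + lam * fisher_rao_distance(p_nat, p_adv).mean()
```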
As machine learning (ML) systems become pervasive, safeguarding their security is critical. However, it has recently been demonstrated that motivated adversaries are able to mislead ML systems by perturbing test data using semantic transformations. While there exists a rich body of research providing provable robustness guarantees for ML models against $\ell_p$ norm-bounded adversarial perturbations, guarantees against semantic perturbations remain largely underexplored. In this paper, we present TSS, a unified framework for certifying ML robustness against general adversarial semantic transformations. First, depending on the properties of each transformation, we divide common transformations into two categories, namely resolvable (e.g., Gaussian blur) and differentially resolvable (e.g., rotation) transformations. For the former, we propose transformation-specific randomized smoothing strategies and obtain strong robustness certification. The latter category covers transformations that involve interpolation errors, and we propose a novel approach based on stratified sampling to certify the robustness. Our framework TSS leverages these certification strategies and combines them with consistency-enhanced training to provide rigorous robustness certification. We conduct extensive experiments on ten challenging semantic transformations and show that TSS significantly outperforms the state of the art. Moreover, to the best of our knowledge, TSS is the first approach that achieves nontrivial certified robustness on the large-scale ImageNet dataset. For instance, our framework achieves 30.4% certified robust accuracy against rotation attacks (within $\pm 30^\circ$) on ImageNet. In addition, to cover a broader range of transformations, we show that TSS is also effective against adaptive attacks and unforeseen image corruptions such as CIFAR-10-C and ImageNet-C.
Randomized smoothing is currently the state-of-the-art method for constructing certifiably robust classifiers from neural networks against $\ell_2$-adversarial perturbations. Under this paradigm, the robustness of a classifier is aligned with its prediction confidence, i.e., higher confidence of the smoothed classifier implies better robustness. This motivates us to rethink the fundamental trade-off between accuracy and robustness in terms of calibrating the confidences of the smoothed classifier. In this paper, we propose a simple training scheme, coined SmoothMix, to control the robustness of smoothed classifiers via self-mixup: it trains on convex combinations of samples along the direction of adversarial perturbation for each input. The proposed procedure effectively identifies over-confidence in the smoothed classifier as a cause of limited robustness and offers an intuitive way to adaptively set a new decision boundary between such samples for better robustness. Our experimental results demonstrate that the proposed method can significantly improve the certified $\ell_2$-robustness of smoothed classifiers compared with existing state-of-the-art robust training methods.
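One plausible reading of the self-mixup step, sketched below: mix each input with its adversarial counterpart and interpolate the soft label toward the uniform distribution at the adversarial endpoint. The label-mixing choice is our assumption; the paper derives its training targets from the smoothed classifier.

```python
import torch
import torch.nn.functional as F

def smoothmix_batch(x, x_adv, y, num_classes, lam=None):
    """Self-mixup sketch: convex combinations of each (4-D image) input
    with its adversarial counterpart along the perturbation direction.
    lam is a per-sample mixing weight in [0, 1]."""
    if lam is None:
        lam = torch.rand(x.size(0), 1, 1, 1, device=x.device)
    x_mix = (1 - lam) * x + lam * x_adv
    onehot = F.one_hot(y, num_classes).float()
    uniform = torch.full_like(onehot, 1.0 / num_classes)
    # Soft label drifts toward uniform at the adversarial endpoint
    # (illustrative choice, not the paper's exact target).
    y_mix = (1 - lam.view(-1, 1)) * onehot + lam.view(-1, 1) * uniform
    return x_mix, y_mix
```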
The field of defense strategies against adversarial attacks has significantly grown over the last years, but progress is hampered as the evaluation of adversarial defenses is often insufficient and thus gives a wrong impression of robustness. Many promising defenses could be broken later on, making it difficult to identify the state-of-the-art. Frequent pitfalls in the evaluation are improper tuning of hyperparameters of the attacks, gradient obfuscation or masking. In this paper we first propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. We apply our ensemble to over 50 models from papers published at recent top machine learning and computer vision venues. In all except one of the cases we achieve lower robust test accuracy than reported in these papers, often by more than 10%, identifying several broken defenses.
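The evaluation logic of such an ensemble is simple and worth making explicit: a test point counts as robust only if it survives every attack in the pool. A sketch (the sequential short-circuiting that skips already-broken points is omitted for clarity):

```python
import torch

def robust_accuracy_under_ensemble(model, attacks, x, y):
    """Ensemble evaluation sketch: take the per-point worst case over a
    list of attacks, each a callable (model, x, y) -> x_adv."""
    surviving = torch.ones(x.size(0), dtype=torch.bool, device=x.device)
    for attack in attacks:
        x_adv = attack(model, x, y)
        surviving &= model(x_adv).argmax(dim=1) == y
    return surviving.float().mean().item()
```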
Recently, Zhang et al. (2021) developed a new neural network architecture based on $\ell_\infty$-distance functions, which naturally possesses certified $\ell_\infty$ robustness. Despite its excellent theoretical properties, the model could so far only achieve performance comparable to conventional networks. In this paper, we significantly boost the certified robustness of $\ell_\infty$-distance nets through a careful analysis of the training process. In particular, we show that the $\ell_p$-relaxation, a crucial way to overcome the non-smoothness of the model, leads to an unexpectedly large Lipschitz constant at the early training stage. This makes the optimization insufficient using a hinge loss and produces sub-optimal solutions. Given these findings, we propose a simple approach that addresses the issue by designing a new objective function combining a scaled cross-entropy loss with a clipped hinge loss. Experiments show that using the proposed training strategy, the certified accuracy of the $\ell_\infty$-distance net can be dramatically improved from 33.30% to 40.06% on CIFAR-10 ($\epsilon = 8/255$), meanwhile significantly outperforming other approaches in this area. Our results clearly demonstrate the effectiveness and potential of the $\ell_\infty$-distance net for certified robustness. Code is available at https://github.com/zbh2047/l_inf-dist-net-v2.
We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the $\ell_2$ norm. This "randomized smoothing" technique has been proposed recently in the literature, but existing guarantees are loose. We prove a tight robustness guarantee in $\ell_2$ norm for smoothing with Gaussian noise. We use randomized smoothing to obtain an ImageNet classifier with e.g. a certified top-1 accuracy of 49% under adversarial perturbations with $\ell_2$ norm less than 0.5 (=127/255). No certified defense has been shown feasible on ImageNet except for smoothing. On smaller-scale datasets where competing approaches to certified $\ell_2$ robustness are viable, smoothing delivers higher certified accuracies. Our strong empirical results suggest that randomized smoothing is a promising direction for future research into adversarially robust classification. Code and models are available at http://github.com/locuslab/smoothing.
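A sketch of the certification procedure the paper describes: select the top class from one Monte Carlo sample, lower-bound its probability $p_A$ from a second sample, and certify the radius $\sigma\,\Phi^{-1}(\underline{p_A})$. The Monte Carlo sampling itself is omitted, and the confidence-bound construction below is one standard choice:

```python
import numpy as np
from scipy.stats import binomtest, norm

def certify_sketch(counts_selection, counts_estimation, sigma, alpha=0.001):
    """counts_*: per-class counts of the base classifier's predictions
    under Gaussian noise, from two independent sampling rounds."""
    c_hat = int(np.argmax(counts_selection))  # candidate top class
    n = int(counts_estimation.sum())
    # One-sided lower confidence bound on p_A via a two-sided
    # Clopper-Pearson interval at level 2*alpha.
    ci = binomtest(int(counts_estimation[c_hat]), n).proportion_ci(
        confidence_level=1 - 2 * alpha, method="exact")
    p_a = ci.low
    if p_a > 0.5:
        return c_hat, sigma * norm.ppf(p_a)  # certified l2 radius
    return -1, 0.0  # abstain
```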
In the context of adversarial robustness, a single model usually does not have enough power to defend against all possible adversarial attacks and, as a result, has sub-optimal robustness. Consequently, an emerging line of work focuses on learning an ensemble of neural networks to defend against adversarial attacks. In this work, we take a principled approach towards building robust ensembles. We view the problem from the perspective of margin boosting and develop an algorithm for learning an ensemble with maximum margin. Through extensive empirical evaluation on benchmark datasets, we show that our algorithm not only outperforms existing ensembling techniques but also large models trained in an end-to-end fashion. An important byproduct of our work is the margin-maximizing cross-entropy (MCE) loss, a better alternative to the standard cross-entropy (CE) loss. Empirically, we show that replacing the CE loss in state-of-the-art adversarial training techniques with our MCE loss leads to significant performance improvements.
Existing defenses against adversarial examples, such as adversarial training, typically assume that the adversary will conform to a specific or known threat model, such as $\ell_p$ perturbations within a fixed budget. In this paper, we focus on the scenario where there is a mismatch between the threat model assumed by the defense during training and the actual capabilities of the adversary at test time. We ask the question: if a learner trains against a specific "source" threat model, when can we expect robustness to generalize to a stronger, unknown "target" threat model at test time? Our key contribution is to formally define the problem of learning and generalization with an unforeseen adversary, which helps us reason about the increase in adversarial risk over the conventional perspective of a known adversary. Applying our framework, we derive a generalization bound that relates the generalization gap between source and target threat models to the variation of the feature extractor, which measures the expected maximum difference between extracted features across a given threat model. Based on our generalization bound, we propose adversarial training with variation regularization (AT-VR), which reduces the variation of the feature extractor across the source threat model during training. We empirically demonstrate that, compared to standard adversarial training, AT-VR improves generalization to unforeseen attacks at test time. Additionally, we combine variation regularization with perceptual adversarial training [Laidlaw et al., 2021] to achieve state-of-the-art robustness against unforeseen attacks. Our code is publicly available at https://github.com/inspire-group/variation-regularization.
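A sketch of a variation-style regularizer under one plausible instantiation: approximate, with PGD, the maximum feature-space $\ell_2$ distance between perturbed and clean inputs over an $\ell_\infty$ source threat model. The paper defines variation as an expected maximum difference of extracted features; the clean-input anchor and step sizes here are our simplifying assumptions.

```python
import torch

def variation_penalty(feat_extractor, x, eps=8/255, alpha=2/255, steps=5):
    """Approximate max_{delta in source threat model}
    ||g(x + delta) - g(x)||_2 with a short PGD loop."""
    with torch.no_grad():
        f_clean = feat_extractor(x)
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        var = (feat_extractor(x + delta) - f_clean).norm(p=2, dim=1).sum()
        grad = torch.autograd.grad(var, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    # Penalty to add to the training loss (differentiable w.r.t. g).
    return (feat_extractor(x + delta) - f_clean).norm(p=2, dim=1).mean()
```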
Adversarial training (AT) has become a popular choice for training robust networks. However, it tends to sacrifice clean accuracy in exchange for satisfactory robustness, and it suffers from a large generalization error. To address these concerns, we propose Smoothed Adversarial Training (SAT), guided by our analysis of the eigenspectrum of the loss landscape. We find that curriculum learning, a scheme that emphasizes starting "easy" and gradually ramping up on the "difficulty" of training, smooths the adversarial loss landscape for a suitably chosen difficulty metric. We present a general formulation of curriculum learning in the adversarial setting and propose two difficulty metrics based on the maximal Hessian eigenvalue (H-SAT) and the softmax probability (P-SAT). We demonstrate that SAT stabilizes network training even for a large perturbation norm and allows the network to operate at a better clean accuracy versus robustness trade-off curve compared to AT. This leads to significant improvements in both clean accuracy and robustness compared to AT, TRADES, and other baselines. To highlight a few results, our best model improves normal and robust accuracy by 6% and 1% on CIFAR-100, respectively. On Imagenette, a ten-class subset of ImageNet, our model outperforms AT by 23% and 3% in normal and robust accuracy, respectively.
Despite the tremendous success of deep neural networks across various tasks, their vulnerability to imperceptible adversarial perturbations hinders their deployment in the real world. Recently, work on randomized ensembles has demonstrated significant improvements in adversarial robustness over standard adversarially trained (AT) models with minimal computational overhead, making them a promising solution for safety-critical, resource-constrained applications. However, this impressive performance raises the question: are these robustness gains provided by randomized ensembles real? In this work, we address this question both theoretically and empirically. Theoretically, we first establish that commonly employed robustness evaluation methods such as adaptive PGD provide a false sense of security in this setting. Subsequently, we propose a theoretically sound and efficient adversarial attack algorithm (ARC) capable of compromising randomized ensembles even in cases where adaptive PGD fails to do so. We conduct comprehensive experiments across a variety of network architectures, training schemes, datasets, and norms to support our claims, and empirically establish that randomized ensembles are in fact more vulnerable to $\ell_p$-bounded adversarial perturbations than even standard AT models. Our code can be found at https://github.com/hsndbk4/arc.
The study on improving the robustness of deep neural networks against adversarial examples has grown rapidly in recent years. Among the proposed approaches, adversarial training is the most promising one, which flattens the input loss landscape (loss change with respect to input) via training on adversarially perturbed examples. However, how the widely used weight loss landscape (loss change with respect to weight) performs in adversarial training is rarely explored. In this paper, we investigate the weight loss landscape from a new perspective, and identify a clear correlation between the flatness of the weight loss landscape and the robust generalization gap. Several well-recognized adversarial training improvements, such as early stopping, designing new objective functions, or leveraging unlabeled data, all implicitly flatten the weight loss landscape. Based on these observations, we propose a simple yet effective Adversarial Weight Perturbation (AWP) to explicitly regularize the flatness of the weight loss landscape, forming a double-perturbation mechanism in the adversarial training framework that adversarially perturbs both inputs and weights. Extensive experiments demonstrate that AWP indeed brings a flatter weight loss landscape and can be easily incorporated into various existing adversarial training methods to further boost their adversarial robustness.
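A sketch of the weight-perturbation half of the double-perturbation mechanism, under the common description of AWP: one gradient-ascent step on the weights, normalized layer-wise so the perturbation scales with each weight tensor's norm. The skip rule and `gamma` are illustrative:

```python
import torch

@torch.no_grad()
def awp_perturb(model, gamma=0.01):
    """Run after backward() on the adversarial loss: move each weight
    tensor along its gradient with ||v|| <= gamma * ||w||, compute the
    robust loss at the perturbed weights, then restore."""
    backup = {}
    for name, p in model.named_parameters():
        if p.grad is None or p.dim() <= 1:  # skip biases / norm layers
            continue
        backup[name] = p.detach().clone()
        v = p.grad / (p.grad.norm() + 1e-12) * gamma * p.norm()
        p.add_(v)  # ascent: weights that locally increase the loss
    return backup

@torch.no_grad()
def awp_restore(model, backup):
    for name, p in model.named_parameters():
        if name in backup:
            p.copy_(backup[name])
```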
Any classifier can be "smoothed out" under Gaussian noise to build a new classifier that is provably robust to $\ell_2$-adversarial perturbations, viz., by averaging its predictions over the noise via randomized smoothing. Under the smoothed classifiers, the fundamental trade-off between accuracy and (adversarial) robustness has been well evidenced in the literature: i.e., increasing the robustness of a classifier for an input can be at the expense of decreased accuracy for some other inputs. In this paper, we propose a simple training method leveraging this trade-off to obtain robust smoothed classifiers, in particular, through a sample-wise control of robustness over the training samples. We make this control feasible by using "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input. Specifically, we differentiate the training objective depending on this proxy to filter out samples that are unlikely to benefit from the worst-case (adversarial) objective. Our experiments show that the proposed method, despite its simplicity, consistently exhibits improved certified robustness upon state-of-the-art training methods. Somewhat surprisingly, we find these improvements persist even for other notions of robustness, e.g., to various types of common corruptions.
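The proxy itself is easy to sketch: per-sample accuracy of the base classifier under a few Gaussian noise draws, which the training loop can then use to decide which samples receive the worst-case objective. The sample count and noise level below are illustrative:

```python
import torch

@torch.no_grad()
def noise_accuracy_proxy(model, x, y, sigma=0.25, m=8):
    """Per-sample 'accuracy under Gaussian noise': the fraction of m
    noisy copies the base classifier gets right. Use it, e.g., to mask
    the adversarial objective for samples with a low proxy value."""
    hits = torch.zeros(x.size(0), device=x.device)
    for _ in range(m):
        pred = model(x + sigma * torch.randn_like(x)).argmax(dim=1)
        hits += (pred == y).float()
    return hits / m
```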
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS, which illustrate a diverse set of defense strategies, can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result, showing that a defense was ineffective, this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
Gradient-based explanation is the cornerstone of explainable deep networks, but it has been shown to be vulnerable to adversarial attacks. However, existing works measure explanation robustness based on $\ell_p$-norms, which can be counter-intuitive to humans, who only pay attention to the top few salient features. We propose explanation ranking thickness as a more suitable explanation robustness metric. We then present a new practical adversarial attacking goal for manipulating explanation rankings. To mitigate the ranking-based attacks while maintaining computational feasibility, we derive surrogate bounds of the thickness that avoid expensive sampling and integration. We use a multi-objective approach to analyze the convergence of a gradient-based attack to confirm that explanation robustness can be measured by the thickness metric. We conduct experiments on various network architectures and diverse datasets to demonstrate the superiority of the proposed methods, and show that the widely accepted Hessian-based curvature-smoothing approaches are not as robust as our method.
Recently, Wong et al. showed that adversarial training with single-step FGSM leads to a characteristic failure mode named catastrophic overfitting (CO), in which a model suddenly becomes vulnerable to multi-step attacks. They showed that adding a random perturbation prior to FGSM (RS-FGSM) seemed sufficient to prevent CO. However, Andriushchenko and Flammarion observed that RS-FGSM still leads to CO for larger perturbations, and proposed an expensive regularizer (GradAlign) to avoid it. In this work, we methodically revisit the role of noise and clipping in single-step adversarial training. Contrary to previous intuitions, we find that using a stronger noise around the clean sample, combined with not clipping, is highly effective in avoiding CO for large perturbation radii. Based on these observations, we propose Noise-FGSM (N-FGSM), which provides the benefits of single-step adversarial training without suffering from CO. Empirical analyses on a large suite of experiments show that N-FGSM is able to match or surpass the performance of previous single-step methods while achieving a 3$\times$ speed-up. Code can be found at https://github.com/pdejorge/n-fgsm
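A sketch of N-FGSM as the abstract describes it: draw noise stronger than the $\epsilon$-ball around the clean sample, take a single FGSM step from the noisy point, and skip the final projection back onto the ball. The noise-scale factor `k` is illustrative:

```python
import torch
import torch.nn.functional as F

def n_fgsm_example(model, x, y, eps=8/255, k=2.0):
    """Single-step training example with stronger noise and no clipping,
    per the abstract's description of N-FGSM."""
    eta = torch.empty_like(x).uniform_(-k * eps, k * eps)
    x_noisy = (x + eta).requires_grad_(True)
    loss = F.cross_entropy(model(x_noisy), y)
    grad = torch.autograd.grad(loss, x_noisy)[0]
    # One FGSM step from the noisy point; no projection onto the eps-ball.
    return (x_noisy + eps * grad.sign()).detach()
```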