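Randomized smoothing is currently a state-of-the-art method for constructing a certifiably robust classifier from neural networks against $\ell_2$-adversarial perturbations. Under this paradigm, the robustness of a classifier is aligned with its prediction confidence, i.e., higher confidence from a smoothed classifier implies better robustness. This motivates us to rethink the fundamental trade-off between accuracy and robustness in terms of calibrating the confidences of a smoothed classifier. In this paper, we propose a simple training scheme, coined SmoothMix, to control the robustness of smoothed classifiers via self-mixup: it trains on convex combinations of samples along the direction of adversarial perturbation for each input. The proposed procedure effectively identifies over-confident samples as a cause of limited robustness in the case of smoothed classifiers, and offers an intuitive way to adaptively set a new decision boundary between these samples for better robustness. Our experimental results demonstrate that the proposed method can significantly improve the certified $\ell_2$-robustness of smoothed classifiers compared to existing state-of-the-art robust training methods.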
Any classifier can be "smoothed out" under Gaussian noise to build a new classifier that is provably robust to $\ell_2$-adversarial perturbations, viz., by averaging its predictions over the noise via randomized smoothing. Under the smoothed classifiers, the fundamental trade-off between accuracy and (adversarial) robustness has been well evidenced in the literature: i.e., increasing the robustness of a classifier for an input can be at the expense of decreased accuracy for some other inputs. In this paper, we propose a simple training method leveraging this trade-off to obtain robust smoothed classifiers, in particular, through a sample-wise control of robustness over the training samples. We make this control feasible by using "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input. Specifically, we differentiate the training objective depending on this proxy to filter out samples that are unlikely to benefit from the worst-case (adversarial) objective. Our experiments show that the proposed method, despite its simplicity, consistently exhibits improved certified robustness upon state-of-the-art training methods. Somewhat surprisingly, we find these improvements persist even for other notions of robustness, e.g., to various types of common corruptions.
We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the $\ell_2$ norm. This "randomized smoothing" technique has been proposed recently in the literature, but existing guarantees are loose. We prove a tight robustness guarantee in $\ell_2$ norm for smoothing with Gaussian noise. We use randomized smoothing to obtain an ImageNet classifier with e.g. a certified top-1 accuracy of 49% under adversarial perturbations with $\ell_2$ norm less than 0.5 (=127/255). No certified defense has been shown feasible on ImageNet except for smoothing. On smaller-scale datasets where competing approaches to certified $\ell_2$ robustness are viable, smoothing delivers higher certified accuracies. Our strong empirical results suggest that randomized smoothing is a promising direction for future research into adversarially robust classification. Code and models are available at http://github.com/locuslab/smoothing.
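As a rough illustration of the smoothing idea in the abstract above, the following sketch votes over Gaussian-noised copies of an input and converts a lower bound on the top-class probability into an $\ell_2$ radius of the form $\sigma\,\Phi^{-1}(\underline{p_A})$. The function names, the toy base classifier, and the choice to pass the lower bound in directly are assumptions made for illustration; a real certificate needs a statistically valid lower confidence bound, not the raw vote frequency.

```python
from statistics import NormalDist

import numpy as np


def smoothed_counts(base_classifier, x, sigma, n_samples, num_classes, rng):
    """Count base-classifier votes over Gaussian-noised copies of x."""
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    labels = np.array([base_classifier(x + eps) for eps in noise])
    return np.bincount(labels, minlength=num_classes)


def certified_radius(p_a_lower, sigma):
    """Return sigma * Phi^{-1}(p_a_lower), or None (abstain) if p_a_lower <= 1/2.

    p_a_lower is assumed to be a valid lower confidence bound on the
    probability of the top class under noise, strictly below 1.
    """
    if p_a_lower <= 0.5:
        return None
    return sigma * NormalDist().inv_cdf(p_a_lower)


# Hypothetical toy base classifier: class 1 if the first coordinate is positive.
toy_classifier = lambda z: int(z[0] > 0)

rng = np.random.default_rng(0)
counts = smoothed_counts(toy_classifier, np.array([1.0, 0.0]), 0.25, 1000, 2, rng)
predicted_class = int(np.argmax(counts))
```

The radius grows with both the noise level `sigma` and the confidence of the vote, which is why a classifier that stays accurate under noise yields larger certificates.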
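As machine learning (ML) systems become pervasive, safeguarding their security is critical. However, it has recently been demonstrated that motivated adversaries are able to mislead ML systems by perturbing test data using semantic transformations. While there exists a rich body of research providing provable robustness guarantees for ML models against $\ell_p$ norm bounded adversarial perturbations, guarantees against semantic perturbations remain largely underexplored. In this paper, we provide TSS, a unified framework for certifying robustness against general adversarial semantic transformations. First, depending on the properties of each transformation, we divide common transformations into two categories, namely resolvable (e.g., Gaussian blur) and differentially resolvable (e.g., rotation) transformations. For the former, we propose transformation-specific randomized smoothing strategies and obtain strong robustness certification. The latter category covers transformations that involve interpolation errors, and we propose a novel approach based on stratified sampling to certify robustness. Our framework TSS leverages these certification strategies and combines them with consistency-enhanced training to provide rigorous certification of robustness. We conduct extensive experiments on ten types of challenging semantic transformations and show that TSS significantly outperforms the state of the art. Moreover, to the best of our knowledge, TSS is the first approach that achieves nontrivial certified robustness on the large-scale ImageNet dataset. For instance, our framework achieves 30.4% certified robust accuracy against rotation attacks (within $\pm 30^\circ$) on ImageNet. In addition, to consider a broader range of transformations, we show that TSS is also robust against adaptive attacks and unforeseen image corruptions such as CIFAR-10-C and ImageNet-C.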
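Randomized smoothing is a recent technique that achieves state-of-the-art performance in training certifiably robust deep neural networks. While the smoothing family of distributions is often connected to the choice of the norm used for certification, the parameters of these distributions are always set as global hyperparameters, independent of the input data on which a network is certified. In this work, we revisit Gaussian randomized smoothing and show that the variance of the Gaussian distribution can be optimized at each input so as to maximize the certification radius for the construction of the smooth classifier. Since the data-dependent classifier does not directly enjoy sound certification with existing approaches, we propose a memory-enhanced data-dependent smooth classifier that is certifiable by construction. This new approach is generic, parameter-free, and easy to implement. In fact, we show that our data-dependent framework can be seamlessly incorporated into 3 randomized smoothing approaches, leading to consistently improved certified accuracy. When this framework is used in the training routine of these approaches, followed by data-dependent certification, we achieve 9% and 6% improvements over the certified accuracy of the strongest baseline at a radius of 0.5 on CIFAR10 and ImageNet.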
We show that there may exist an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy. We demonstrate that this trade-off between the standard accuracy of a model and its robustness to adversarial perturbations provably exists in a fairly simple and natural setting. These findings also corroborate a similar phenomenon observed empirically in more complex settings. Further, we argue that this phenomenon is a consequence of robust classifiers learning fundamentally different feature representations than standard classifiers. These differences, in particular, seem to result in unexpected benefits: the representations learned by robust models tend to align better with salient data characteristics and human perception.
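Randomized smoothing is currently considered the state-of-the-art method for obtaining certifiably robust classifiers. Despite its remarkable performance, the method is associated with various serious problems such as "certified accuracy waterfalls", the certification vs. accuracy trade-off, and even fairness issues. Input-dependent smoothing approaches have been proposed with the intention of overcoming these flaws. However, we demonstrate that these methods lack formal guarantees, so the resulting certificates are not justified. We show that, in general, input-dependent smoothing suffers from the curse of dimensionality, forcing the variance function to have low semi-elasticity. On the other hand, we provide a theoretical and practical framework that enables the use of input-dependent smoothing even in the presence of the curse of dimensionality, under strict restrictions. We present one concrete design of the smoothing variance function and test it on CIFAR10 and MNIST. Our design mitigates some of the problems of classical smoothing and is formally underpinned, yet further improvement of the design is still necessary.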
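We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semi-supervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap between standard and robust classification. We prove that unlabeled data bridges this gap: a simple semi-supervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy. Empirically, we augment CIFAR-10 with 500K images drawn from 80 Million Tiny Images and use robust self-training to outperform state-of-the-art robust accuracies in (i) $\ell_\infty$ robustness against several strong attacks via adversarial training and (ii) certified $\ell_2$ and $\ell_\infty$ robustness via randomized smoothing. On SVHN, adding the dataset's own extra training set with the labels removed provides gains of 4 to 10 points, within 1 point of the gain from using the extra labels.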
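Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense side, there have been intensive efforts to improve both empirical and provable robustness against evasion attacks; however, provable robustness against backdoor attacks remains largely unexplored. In this paper, we focus on certifying the robustness of machine learning models against general threat models, especially backdoor attacks. We first provide a unified framework via randomized smoothing techniques and show how it can be instantiated to certify robustness against both evasion and backdoor attacks. We then propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks. We derive a robustness bound for machine learning models trained with RAB and prove that this bound is tight. In addition, we show that it is possible to train robust smoothed models efficiently for simple models such as K-nearest neighbor classifiers, and we propose an exact smooth-training algorithm that eliminates the need to sample from a noise distribution for such models. Empirically, we conduct comprehensive experiments on different machine learning (ML) models such as DNNs, differentially private DNNs, and K-NN models on the MNIST, CIFAR-10, and ImageNet datasets, and provide the first benchmark for certified robustness against backdoor attacks. In addition, we evaluate K-NN models on the Spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. The comprehensive evaluation on diverse models and datasets sheds light on further robust learning strategies against general training-time attacks.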
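We introduce Noisy Feature Mixup (NFM), an inexpensive yet effective method for data augmentation that combines interpolation-based training and noise injection schemes. Rather than training with convex combinations of pairs of examples and their labels, we use noise-perturbed convex combinations of pairs of data points in both input and feature space. This method includes mixup and manifold mixup as special cases, but it has additional advantages, including better smoothing of decision boundaries and improved model robustness. We provide theory to understand this as well as the implicit regularization effects of NFM. Our theory is supported by empirical results demonstrating the advantage of NFM compared to mixup and manifold mixup. We show that residual networks and vision transformers trained with NFM have a favorable trade-off between predictive accuracy on clean data and robustness across a range of computer vision benchmark datasets.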
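To make "noise-perturbed convex combinations" concrete, here is a minimal sketch applied in input space only. The function name, the specific noise form (multiplicative plus additive Gaussian noise applied after mixing), and the parameter names `sigma_add` and `sigma_mult` are assumptions for illustration, not details stated in the abstract.

```python
import numpy as np


def noisy_mixup(x1, y1, x2, y2, lam, sigma_add, sigma_mult, rng):
    """Mix two examples and their one-hot labels, then perturb the mixed input.

    Assumed noise model: (1 + sigma_mult * xi_mult) * mixed + sigma_add * xi_add,
    with xi_mult and xi_add drawn from a standard normal distribution.
    """
    mixed_x = lam * x1 + (1.0 - lam) * x2  # convex combination of inputs
    mixed_y = lam * y1 + (1.0 - lam) * y2  # same combination of labels
    mult = 1.0 + sigma_mult * rng.standard_normal(mixed_x.shape)
    add = sigma_add * rng.standard_normal(mixed_x.shape)
    return mult * mixed_x + add, mixed_y


rng = np.random.default_rng(0)
xa, ya = np.array([1.0, 2.0, 3.0]), np.array([1.0, 0.0])
xb, yb = np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0])

# With both noise levels set to zero the sketch reduces to plain mixup.
plain_x, plain_y = noisy_mixup(xa, ya, xb, yb, 0.3, 0.0, 0.0, rng)
noisy_x, noisy_y = noisy_mixup(xa, ya, xb, yb, 0.3, 0.1, 0.1, rng)
```

Setting both noise levels to zero recovers ordinary mixup, which mirrors the abstract's remark that mixup is a special case.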
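Deep neural networks have become the driving force of modern image recognition systems. However, the vulnerability of neural networks to adversarial attacks poses a serious threat to the people affected by these systems. In this paper, we focus on a real-world threat model in which a man-in-the-middle adversary maliciously intercepts and perturbs images that web users upload online. This type of attack can raise severe ethical concerns on top of simple performance degradation. To prevent this attack, we devise a novel bi-level optimization algorithm that finds points in the vicinity of natural images that are robust to adversarial perturbations. Experiments on CIFAR-10 and ImageNet show that our method can effectively robustify natural images within the given modification budget. We also show that the proposed method can improve robustness when used jointly with randomized smoothing.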
Deep neural networks achieve high prediction accuracy when the train and test distributions coincide. In practice though, various types of corruptions occur which deviate from this setup and cause severe performance degradations. Few methods have been proposed to address generalization in the presence of unforeseen domain shifts. In particular, digital noise corruptions arise commonly in practice during the image acquisition stage and present a significant challenge for current robustness approaches. In this paper, we propose a diverse Gaussian noise consistency regularization method for improving robustness of image classifiers under a variety of noise corruptions while still maintaining high clean accuracy. We derive bounds to motivate and understand the behavior of our Gaussian noise consistency regularization using a local loss landscape analysis. We show that this simple approach improves robustness against various unforeseen noise corruptions by 4.2-18.4% over adversarial training and other strong diverse data augmentation baselines across several benchmarks. Furthermore, when combined with state-of-the-art diverse data augmentation techniques, experiments against state-of-the-art show our method further improves robustness accuracy by 3.7% and uncertainty calibration by 5.5% for all common corruptions on several image classification benchmarks.
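Adversarial robustness has become a central goal in deep learning, both in theory and in practice. However, successful methods for improving adversarial robustness (such as adversarial training) greatly hurt generalization performance on unperturbed data. This could have a major impact on how adversarial robustness affects real-world systems (i.e., many may opt to forgo robustness if doing so improves accuracy on unperturbed data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation-based training methods within the framework of adversarial training. On CIFAR-10, adversarial training increases the standard test error (when there is no adversary) from 4.43% to 12.32%, whereas with our Interpolated Adversarial Training we retain adversarial robustness while achieving a standard test error of only 6.45%. With our technique, the relative increase in the standard error for the robust model is reduced from 178.1% to just 45.5%. Moreover, we provide a mathematical analysis of Interpolated Adversarial Training to confirm its efficiency and demonstrate its advantages in terms of robustness and generalization.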
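Recently, Zhang et al. (2021) developed a new neural network architecture based on $\ell_\infty$-distance functions, which naturally possesses certified $\ell_\infty$ robustness. Despite its excellent theoretical properties, the model so far can only achieve performance comparable to conventional networks. In this paper, we significantly boost the certified robustness of $\ell_\infty$-distance nets through a careful analysis of the training process. In particular, we show that the $\ell_p$-relaxation, a crucial way to overcome the non-smoothness of the model, leads to an unexpectedly large Lipschitz constant at the early training stage. This makes optimization with hinge loss insufficient and produces sub-optimal solutions. Given these findings, we propose a simple approach to address the above issues by designing a novel objective function that combines a scaled cross-entropy loss with a clipped hinge loss. Experiments show that, using the proposed training strategy, the certified accuracy of the $\ell_\infty$-distance net can be dramatically improved from 33.30% to 40.06% on CIFAR-10 ($\epsilon = 8/255$), while significantly outperforming other approaches in this area. Our results clearly demonstrate the effectiveness and potential of the $\ell_\infty$-distance net for certified robustness. Code is available at https://github.com/zbh2047/l_inf-dist-net-v2.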
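Adversarial training is so far the most effective strategy for defending against adversarial examples. However, it suffers from high computational cost due to the iterative adversarial attacks in each training step. Recent studies show that fast adversarial training can be achieved by performing a single-step attack with random initialization. However, such an approach still lags behind state-of-the-art adversarial training algorithms in both stability and model robustness. In this work, we develop a new understanding of fast adversarial training by viewing random initialization as performing randomized smoothing for better optimization of the inner maximization problem. Following this new perspective, we also propose a new initialization strategy, backward smoothing, to further improve the stability and model robustness of single-step robust training methods. Experiments on multiple benchmarks demonstrate that our method achieves similar model robustness while using much less training time ($\sim$3x improvement with the same training schedule).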
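The "single-step attack with random initialization" that this abstract builds on can be sketched as follows. This illustrates only that baseline, not the proposed backward smoothing. The $\ell_\infty$ threat model, the logistic-regression model (chosen because its input gradient has a closed form), and all function names are assumptions for illustration.

```python
import numpy as np


def logistic_loss(w, x, y):
    """Logistic loss log(1 + exp(-y * <w, x>)) for a label y in {-1, +1}."""
    return np.log1p(np.exp(-y * np.dot(w, x)))


def logistic_loss_grad_x(w, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * np.dot(w, x)
    return -y * w / (1.0 + np.exp(margin))


def single_step_attack(w, x, y, eps, alpha, rng):
    """One signed-gradient step from a uniformly random start in the eps-ball."""
    delta = rng.uniform(-eps, eps, size=x.shape)    # random initialization
    grad = logistic_loss_grad_x(w, x + delta, y)    # gradient at the random start
    delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)  # step, then project
    return x + delta


rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.1, 0.8])
y = 1
eps = 0.1
x_adv = single_step_attack(w, x, y, eps, 1.25 * eps, rng)
```

Adversarial training would then minimize the loss at `x_adv` instead of `x`; the random start is the ingredient the abstract reinterprets as a form of randomized smoothing.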
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS (and which illustrate a diverse set of defense strategies) can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
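The literature on robustness to common corruptions shows no consensus on whether adversarial training can improve performance in this setting. First, we show that, when used with an appropriately selected perturbation radius, $\ell_p$ adversarial training can serve as a strong baseline against common corruptions, improving both accuracy and calibration. Then, we explain why adversarial training performs better than data augmentation with simple Gaussian noise, which has been observed to be a meaningful baseline on common corruptions. Related to this, we identify the $\sigma$-overfitting phenomenon, in which Gaussian augmentation overfits to the particular standard deviation used for training, which has a significant detrimental effect on common-corruption accuracy. We discuss how to alleviate this problem and then how to further enhance $\ell_p$ adversarial training by introducing an efficient relaxation of adversarial training based on learned perceptual image patch similarity. Through experiments on CIFAR-10 and ImageNet-100, we show that our approach not only improves the $\ell_p$ adversarial training baseline but also has cumulative gains with data augmentation methods such as AugMix, DeepAugment, ANT, and SIN, leading to state-of-the-art performance on common corruptions. The code of our experiments is publicly available at https://github.com/tml-epfl/adv-training-corruptions.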
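Adversarial training (AT) has become a popular choice for training robust networks. However, it tends to sacrifice clean accuracy in favor of robustness and suffers from a large generalization error. To address these concerns, we propose Smooth Adversarial Training (SAT), guided by our analysis of the eigenspectrum of the loss Hessian. We find that curriculum learning, a scheme that emphasizes starting "easy" and gradually ramping up the "difficulty" of training, smooths the adversarial loss landscape for a suitably chosen difficulty metric. We present a general formulation of curriculum learning in this setting and propose two difficulty metrics based on the maximal Hessian eigenvalue (H-SAT) and the softmax probability (P-SA). We demonstrate that SAT stabilizes network training even for large perturbation norms and allows the network to operate on a better clean accuracy versus robustness trade-off curve than AT. This leads to a significant improvement in both clean accuracy and robustness compared to AT, TRADES, and other baselines. To highlight a few results, our best model improves normal and robust accuracy by 6% and 1% on CIFAR-100, respectively. On Imagenette, a ten-class subset of ImageNet, our model gains 23% and 3% in normal and robust accuracy, respectively.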
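Existing defenses against adversarial examples, such as adversarial training, typically assume that the adversary will conform to a specific or known threat model, such as $\ell_p$ perturbations within a fixed budget. In this paper, we focus on the scenario where there is a mismatch between the threat model assumed by the defense during training and the actual capabilities of the adversary at test time. We ask the question: if the learner trains against a specific "source" threat model, when can we expect robustness to generalize to a stronger, unknown "target" threat model at test time? Our key contribution is to formally define the problem of learning and generalization with an unforeseen adversary, which helps us reason about the increase in adversarial risk relative to the conventional perspective of a known adversary. Applying our framework, we derive a generalization bound that relates the generalization gap between the source and target threat models to the variation of the feature extractor, which measures the expected maximum difference between extracted features across a given threat model. Based on our generalization bound, we propose adversarial training with variation regularization (AT-VR), which reduces the variation of the feature extractor across the source threat model during training. We empirically demonstrate that, compared to standard adversarial training, AT-VR can lead to improved generalization to unforeseen attacks at test time. Additionally, we combine variation regularization with perceptual adversarial training [Laidlaw et al. 2021] to achieve state-of-the-art robustness against unforeseen attacks. Our code is publicly available at https://github.com/inspire-group/variation-regularization.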
It is common practice in deep learning to use overparameterized networks and train for as long as possible; there are numerous studies that show, both theoretically and empirically, that such practices surprisingly do not unduly harm the generalization performance of the classifier. In this paper, we empirically study this phenomenon in the setting of adversarially trained deep networks, which are trained to minimize the loss under worst-case adversarial perturbations. We find that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training across multiple datasets (SVHN, CIFAR-10, CIFAR-100, and ImageNet) and perturbation models ($\ell_\infty$ and $\ell_2$). Based upon this observed effect, we show that the performance gains of virtually all recent algorithmic improvements upon adversarial training can be matched by simply using early stopping. We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting. Finally, we study several classical and modern deep learning remedies for overfitting, including regularization and data augmentation, and find that no approach in isolation improves significantly upon the gains achieved by early stopping. All code for reproducing the experiments as well as pretrained model weights and training logs can be found at https://github.com/locuslab/robust_overfitting.