我们从频道明智激活的角度调查CNN的对抗性鲁棒性。通过比较\ Textit {非鲁棒}(通常训练)和\ exingit {REXITIT {REARUSTIFIED}(普及培训的)模型,我们观察到对抗性培训(AT)通过将频道明智的数据与自然的渠道和自然的对抗激活对齐来强调CNN同行。然而,在处理逆势数据时仍仍会过度激活以\ texit {excy-computive}(nr)的频道仍会过度激活。此外,我们还观察到,在所有课程上不会导致类似的稳健性。对于强大的类,具有较大激活大小的频道通常是更长的\ extedit {正相关}(pr)到预测,但这种对齐不适用于非鲁棒类。鉴于这些观察结果,我们假设抑制NR通道并对齐PR与其相关性进一步增强了在其下的CNN的鲁棒性。为了检查这个假设,我们介绍了一种新的机制,即\下划线{C} Hannel-Wise \ Underline {i} Mportance的\下划线{F} eature \ Underline {s}选举(CIFS)。 CIFS通过基于与预测的相关性产生非负乘法器来操纵某些层的激活。在包括CIFAR10和SVHN的基准数据集上的广泛实验明确验证了强制性CNN的假设和CIFS的有效性。 \ url {https://github.com/hanshuyan/cifs}
translated by 谷歌翻译
神经常规差分方程(ODES)最近在各种研究域中引起了不断的关注。有一些作品研究了神经杂物的优化问题和近似能力,但他们的鲁棒性尚不清楚。在这项工作中,我们通过探索神经杂物经验和理论上的神经杂物的鲁棒性质来填补这一重要差异。我们首先通过将它们暴露于具有各种类型的扰动并随后研究相应输出的变化来提出基于神经竞争的网络(odeNets)的鲁棒性的实证研究。与传统的卷积神经网络(CNNS)相反,我们发现odeenets对随机高斯扰动和对抗性攻击示例的更稳健。然后,我们通过利用连续时间颂的流动的某种理想性能来提供对这种现象的富有识别理解,即积分曲线是非交叉的。我们的工作表明,由于其内在的稳健性,它很有希望使用神经杂散作为构建强大的深网络模型的基本块。为了进一步增强香草神经杂物杂物的鲁棒性,我们提出了时间不变的稳定神经颂(Tisode),其通过时间不变性和施加稳态约束来规则地规则地规则地对扰动数据的流程。我们表明,Tisode方法优于香草神经杂物,也可以与其他最先进的架构方法一起制造更强大的深网络。 \ url {https://github.com/hanshuyan/tisode}
translated by 谷歌翻译
Adversarial training (AT) is one of the most effective ways for improving the robustness of deep convolution neural networks (CNNs). Just like common network training, the effectiveness of AT relies on the design of basic network components. In this paper, we conduct an in-depth study on the role of the basic ReLU activation component in AT for robust CNNs. We find that the spatially-shared and input-independent properties of ReLU activation make CNNs less robust to white-box adversarial attacks with either standard or adversarial training. To address this problem, we extend ReLU to a novel Sparta activation function (Spatially attentive and Adversarially Robust Activation), which enables CNNs to achieve both higher robustness, i.e., lower error rate on adversarial examples, and higher accuracy, i.e., lower error rate on clean examples, than the existing state-of-the-art (SOTA) activation functions. We further study the relationship between Sparta and the SOTA activation functions, providing more insights about the advantages of our method. With comprehensive experiments, we also find that the proposed method exhibits superior cross-CNN and cross-dataset transferability. For the former, the adversarially trained Sparta function for one CNN (e.g., ResNet-18) can be fixed and directly used to train another adversarially robust CNN (e.g., ResNet-34). For the latter, the Sparta function trained on one dataset (e.g., CIFAR-10) can be employed to train adversarially robust CNNs on another dataset (e.g., SVHN). In both cases, Sparta leads to CNNs with higher robustness than the vanilla ReLU, verifying the flexibility and versatility of the proposed method.
translated by 谷歌翻译
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples-inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. 1
translated by 谷歌翻译
Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic so that it sometimes hurts the natural generalization. In this paper, we raise a fundamental question-do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training is to employ confident adversarial data for updating the current model. We propose a novel formulation of friendly adversarial training (FAT): rather than employing most adversarial data maximizing the loss, we search for least adversarial data (i.e., friendly adversarial data) minimizing the loss, among the adversarial data that are confidently misclassified. Our novel formulation is easy to implement by just stopping the most adversarial data searching algorithms such as PGD (projected gradient descent) early, which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound of the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively-adversarial robustness can indeed be achieved without compromising the natural generalization.* Equal contribution † Preliminary work was done during an internship at RIKEN AIP.
translated by 谷歌翻译
对抗性例子的现象说明了深神经网络最基本的漏洞之一。在推出这一固有的弱点的各种技术中,对抗性训练已成为学习健壮模型的最有效策略。通常,这是通过平衡强大和自然目标来实现的。在这项工作中,我们旨在通过执行域不变的功能表示,进一步优化鲁棒和标准准确性之间的权衡。我们提出了一种新的对抗训练方法,域不变的对手学习(DIAL),该方法学习了一个既健壮又不变的功能表示形式。拨盘使用自然域及其相应的对抗域上的域对抗神经网络(DANN)的变体。在源域由自然示例组成和目标域组成的情况下,是对抗性扰动的示例,我们的方法学习了一个被限制的特征表示,以免区分自然和对抗性示例,因此可以实现更强大的表示。拨盘是一种通用和模块化技术,可以轻松地将其纳入任何对抗训练方法中。我们的实验表明,将拨号纳入对抗训练过程中可以提高鲁棒性和标准精度。
translated by 谷歌翻译
已知深度神经网络(DNN)容易受到用不可察觉的扰动制作的对抗性示例的影响,即,输入图像的微小变化会引起错误的分类,从而威胁着基于深度学习的部署系统的可靠性。经常采用对抗训练(AT)来通过训练损坏和干净的数据的混合物来提高DNN的鲁棒性。但是,大多数基于AT的方法在处理\ textit {转移的对抗示例}方面是无效的,这些方法是生成以欺骗各种防御模型的生成的,因此无法满足现实情况下提出的概括要求。此外,对抗性训练一般的国防模型不能对具有扰动的输入产生可解释的预测,而不同的领域专家则需要一个高度可解释的强大模型才能了解DNN的行为。在这项工作中,我们提出了一种基于Jacobian规范和选择性输入梯度正则化(J-SIGR)的方法,该方法通过Jacobian归一化提出了线性化的鲁棒性,还将基于扰动的显着性图正规化,以模仿模型的可解释预测。因此,我们既可以提高DNN的防御能力和高解释性。最后,我们评估了跨不同体系结构的方法,以针对强大的对抗性攻击。实验表明,提出的J-Sigr赋予了针对转移的对抗攻击的鲁棒性,我们还表明,来自神经网络的预测易于解释。
translated by 谷歌翻译
Deep Neural Networks are vulnerable to adversarial attacks. Among many defense strategies, adversarial training with untargeted attacks is one of the most effective methods. Theoretically, adversarial perturbation in untargeted attacks can be added along arbitrary directions and the predicted labels of untargeted attacks should be unpredictable. However, we find that the naturally imbalanced inter-class semantic similarity makes those hard-class pairs become virtual targets of each other. This study investigates the impact of such closely-coupled classes on adversarial attacks and develops a self-paced reweighting strategy in adversarial training accordingly. Specifically, we propose to upweight hard-class pair losses in model optimization, which prompts learning discriminative features from hard classes. We further incorporate a term to quantify hard-class pair consistency in adversarial training, which greatly boosts model robustness. Extensive experiments show that the proposed adversarial training method achieves superior robustness performance over state-of-the-art defenses against a wide range of adversarial attacks.
translated by 谷歌翻译
Adversarial attacks to image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state-of-the-art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method was ranked first in Competition on Adversarial Attacks and Defenses (CAAD) 2018 -it achieved 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers, surpassing the runner-up approach by ∼10%. Code is available at https://github.com/facebookresearch/ ImageNet-Adversarial-Training.
translated by 谷歌翻译
深度神经网络(DNNS)最近在许多分类任务中取得了巨大的成功。不幸的是,它们容易受到对抗性攻击的影响,这些攻击会产生对抗性示例,这些示例具有很小的扰动,以欺骗DNN模型,尤其是在模型共享方案中。事实证明,对抗性训练是最有效的策略,它将对抗性示例注入模型训练中,以提高DNN模型的稳健性,以对对抗性攻击。但是,基于现有的对抗性示例的对抗训练无法很好地推广到标准,不受干扰的测试数据。为了在标准准确性和对抗性鲁棒性之间取得更好的权衡,我们提出了一个新型的对抗训练框架,称为潜在边界引导的对抗训练(梯子),该训练(梯子)在潜在的边界引导的对抗性示例上对对手进行对手训练DNN模型。与大多数在输入空间中生成对抗示例的现有方法相反,梯子通过增加对潜在特征的扰动而产生了无数的高质量对抗示例。扰动是沿SVM构建的具有注意机制的决策边界的正常情况进行的。我们从边界场的角度和可视化视图分析了生成的边界引导的对抗示例的优点。与Vanilla DNN和竞争性底线相比,对MNIST,SVHN,CELEBA和CIFAR-10的广泛实验和详细分析验证了梯子在标准准确性和对抗性鲁棒性之间取得更好的权衡方面的有效性。
translated by 谷歌翻译
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS-and which illustrate a diverse set of defense strategies-can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result-showing that a defense was ineffective-this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
translated by 谷歌翻译
深度学习(DL)在许多与人类相关的任务中表现出巨大的成功,这导致其在许多计算机视觉的基础应用中采用,例如安全监控系统,自治车辆和医疗保健。一旦他们拥有能力克服安全关键挑战,这种安全关键型应用程序必须绘制他们的成功部署之路。在这些挑战中,防止或/和检测对抗性实例(AES)。对手可以仔细制作小型,通常是难以察觉的,称为扰动的噪声被添加到清洁图像中以产生AE。 AE的目的是愚弄DL模型,使其成为DL应用的潜在风险。在文献中提出了许多测试时间逃避攻击和对策,即防御或检测方法。此外,还发布了很少的评论和调查,理论上展示了威胁的分类和对策方法,几乎​​没有焦点检测方法。在本文中,我们专注于图像分类任务,并试图为神经网络分类器进行测试时间逃避攻击检测方法的调查。对此类方法的详细讨论提供了在四个数据集的不同场景下的八个最先进的探测器的实验结果。我们还为这一研究方向提供了潜在的挑战和未来的观点。
translated by 谷歌翻译
通过快速梯度符号方法(FGSM)生成的样品(也称为FGSM-AT)生成的样品是一种计算上的简单方法,可以训练训练强大的网络。然而,在训练过程中,在Arxiv:2001.03994 [CS.LG]中发现了一种不稳定的“灾难性过度拟合”模式,在单个训练步骤中,强大的精度突然下降到零。现有方法使用梯度正规化器或随机初始化技巧来减轻此问题,而它们要么承担高计算成本或导致较低的稳健精度。在这项工作中,我们提供了第一项研究,该研究从三个角度彻底研究了技巧的集合:数据初始化,网络结构和优化,以克服FGSM-AT中的灾难性过度拟合。令人惊讶的是,我们发现简单的技巧,即a)掩盖部分像素(即使没有随机性),b)设置较大的卷积步幅和平滑的激活功能,或c)正规化第一卷积层的重量,可以有效地应对过度拟合问题。对一系列网络体系结构的广泛结果验证了每个提出的技巧的有效性,还研究了技巧的组合。例如,在CIFAR-10上接受了PREACTRESNET-18培训,我们的方法对PGD-50攻击者的准确性为49.8%,并且针对AutoAttack的精度为46.4%,这表明Pure FGSM-AT能够启用健壮的学习者。代码和模型可在https://github.com/ucsc-vlaa/bag-of-tricks-for-for-fgsm-at上公开获得。
translated by 谷歌翻译
Designing powerful adversarial attacks is of paramount importance for the evaluation of $\ell_p$-bounded adversarial defenses. Projected Gradient Descent (PGD) is one of the most effective and conceptually simple algorithms to generate such adversaries. The search space of PGD is dictated by the steepest ascent directions of an objective. Despite the plethora of objective function choices, there is no universally superior option and robustness overestimation may arise from ill-suited objective selection. Driven by this observation, we postulate that the combination of different objectives through a simple loss alternating scheme renders PGD more robust towards design choices. We experimentally verify this assertion on a synthetic-data example and by evaluating our proposed method across 25 different $\ell_{\infty}$-robust models and 3 datasets. The performance improvement is consistent, when compared to the single loss counterparts. In the CIFAR-10 dataset, our strongest adversarial attack outperforms all of the white-box components of AutoAttack (AA) ensemble, as well as the most powerful attacks existing on the literature, achieving state-of-the-art results in the computational budget of our study ($T=100$, no restarts).
translated by 谷歌翻译
深度神经网络已成为现代图像识别系统的驱动力。然而,神经网络对抗对抗性攻击的脆弱性对受这些系统影响的人构成严重威胁。在本文中,我们专注于一个真实的威胁模型,中间对手恶意拦截和erturbs网页用户上传在线。这种类型的攻击可以在简单的性能下降之上提高严重的道德问题。为了防止这种攻击,我们设计了一种新的双层优化算法,该算法在对抗对抗扰动的自然图像附近找到点。CiFar-10和Imagenet的实验表明我们的方法可以有效地强制在给定的修改预算范围内的自然图像。我们还显示所提出的方法可以在共同使用随机平滑时提高鲁棒性。
translated by 谷歌翻译
由明确的反对派制作的对抗例子在机器学习中引起了重要的关注。然而,潜在虚假朋友带来的安全风险基本上被忽视了。在本文中,我们揭示了虚伪的例子的威胁 - 最初被错误分类但是虚假朋友扰乱的投入,以强迫正确的预测。虽然这种扰动的例子似乎是无害的,但我们首次指出,它们可能是恶意地用来隐瞒评估期间不合格(即,不如所需)模型的错误。一旦部署者信任虚伪的性能并在真实应用程序中应用“良好的”模型,即使在良性环境中也可能发生意外的失败。更严重的是,这种安全风险似乎是普遍存在的:我们发现许多类型的不合标准模型易受多个数据集的虚伪示例。此外,我们提供了第一次尝试,以称为虚伪风险的公制表征威胁,并试图通过一些对策来规避它。结果表明对策的有效性,即使在自适应稳健的培训之后,风险仍然是不可忽视的。
translated by 谷歌翻译
Neural networks are vulnerable to adversarial examples, which poses a threat to their application in security sensitive systems. We propose high-level representation guided denoiser (HGD) as a defense for image classification. Standard denoiser suffers from the error amplification effect, in which small residual adversarial noise is progressively amplified and leads to wrong classifications. HGD overcomes this problem by using a loss function defined as the difference between the target model's outputs activated by the clean image and denoised image. Compared with ensemble adversarial training which is the state-of-the-art defending method on large images, HGD has three advantages. First, with HGD as a defense, the target model is more robust to either white-box or black-box adversarial attacks. Second, HGD can be trained on a small subset of the images and generalizes well to other images and unseen classes. Third, HGD can be transferred to defend models other than the one guiding it. In NIPS competition on defense against adversarial attacks, our HGD solution won the first place and outperformed other models by a large margin. 1 * Equal contribution.
translated by 谷歌翻译
对抗性的鲁棒性已经成为深度学习的核心目标,无论是在理论和实践中。然而,成功的方法来改善对抗的鲁棒性(如逆势训练)在不受干扰的数据上大大伤害了泛化性能。这可能会对对抗性鲁棒性如何影响现实世界系统的影响(即,如果它可以提高未受干扰的数据的准确性),许多人可能选择放弃鲁棒性)。我们提出内插对抗培训,该培训最近雇用了在对抗培训框架内基于插值的基于插值的培训方法。在CiFar -10上,对抗性训练增加了标准测试错误(当没有对手时)从4.43%到12.32%,而我们的内插对抗培训我们保留了对抗性的鲁棒性,同时实现了仅6.45%的标准测试误差。通过我们的技术,强大模型标准误差的相对增加从178.1%降至仅为45.5%。此外,我们提供内插对抗性培训的数学分析,以确认其效率,并在鲁棒性和泛化方面展示其优势。
translated by 谷歌翻译
作为反对攻击的最有效的防御方法之一,对抗性训练倾向于学习包容性的决策边界,以提高深度学习模型的鲁棒性。但是,由于沿对抗方向的边缘的大幅度和不必要的增加,对抗性训练会在自然实例和对抗性示例之间引起严重的交叉,这不利于平衡稳健性和自然准确性之间的权衡。在本文中,我们提出了一种新颖的对抗训练计划,以在稳健性和自然准确性之间进行更好的权衡。它旨在学习一个中度包容的决策边界,这意味着决策边界下的自然示例的边缘是中等的。我们称此方案为中等边缘的对抗训练(MMAT),该方案生成更细粒度的对抗示例以减轻交叉问题。我们还利用了经过良好培训的教师模型的逻辑来指导我们的模型学习。最后,MMAT在Black-Box和White-Box攻击下都可以实现高自然的精度和鲁棒性。例如,在SVHN上,实现了最新的鲁棒性和自然精度。
translated by 谷歌翻译
在测试时间进行优化的自适应防御能力有望改善对抗性鲁棒性。我们对这种自适应测试时间防御措施进行分类,解释其潜在的好处和缺点,并评估图像分类的最新自适应防御能力的代表性。不幸的是,经过我们仔细的案例研究评估时,没有任何显着改善静态防御。有些甚至削弱了基本静态模型,同时增加了推理计算。尽管这些结果令人失望,但我们仍然认为自适应测试时间防御措施是一项有希望的研究途径,因此,我们为他们的彻底评估提供了建议。我们扩展了Carlini等人的清单。(2019年)通过提供针对自适应防御的具体步骤。
translated by 谷歌翻译