We propose a method to learn deep ReLU-based classifiers that are provably robust against normbounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded ∞ norm less than = 0.1), and code for all experiments is available at http://github.com/ locuslab/convex_adversarial.
translated by 谷歌翻译
While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but often followed by new, stronger attacks that defeat these defenses. Can we somehow end this arms race? In this work, we study this problem for neural networks with one hidden layer. We first propose a method based on a semidefinite relaxation that outputs a certificate that for a given network and test input, no attack can force the error to exceed a certain value. Second, as this certificate is differentiable, we jointly optimize it with the network parameters, providing an adaptive regularizer that encourages robustness against all attacks. On MNIST, our approach produces a network and a certificate that no attack that perturbs each pixel by at most = 0.1 can cause more than 35% test error.
translated by 谷歌翻译
Despite their impressive performance on diverse tasks, neural networks fail catastrophically in the presence of adversarial inputs-imperceptibly but adversarially perturbed versions of natural inputs. We have witnessed an arms race between defenders who attempt to train robust networks and attackers who try to construct adversarial examples. One promise of ending the arms race is developing certified defenses, ones which are provably robust against all attackers in some family. These certified defenses are based on convex relaxations which construct an upper bound on the worst case loss over all attackers in the family. Previous relaxations are loose on networks that are not trained against the respective relaxation. In this paper, we propose a new semidefinite relaxation for certifying robustness that applies to arbitrary ReLU networks. We show that our proposed relaxation is tighter than previous relaxations and produces meaningful robustness guarantees on three different foreign networks whose training objectives are agnostic to our proposed relaxation.32nd Conference on Neural Information Processing Systems (NIPS 2018),
translated by 谷歌翻译
To rigorously certify the robustness of neural networks to adversarial perturbations, most state-of-the-art techniques rely on a triangle-shaped linear programming (LP) relaxation of the ReLU activation. While the LP relaxation is exact for a single neuron, recent results suggest that it faces an inherent "convex relaxation barrier" as additional activations are added, and as the attack budget is increased. In this paper, we propose a nonconvex relaxation for the ReLU relaxation, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. We show that the nonconvex relaxation has a similar complexity to the LP relaxation, but enjoys improved tightness that is comparable to the much more expensive SDP relaxation. Despite nonconvexity, we prove that the verification problem satisfies constraint qualification, and therefore a Riemannian staircase approach is guaranteed to compute a near-globally optimal solution in polynomial time. Our experiments provide evidence that our nonconvex relaxation almost completely overcome the "convex relaxation barrier" faced by the LP relaxation.
translated by 谷歌翻译
在本讨论文件中,我们调查了有关机器学习模型鲁棒性的最新研究。随着学习算法在数据驱动的控制系统中越来越流行,必须确保它们对数据不确定性的稳健性,以维持可靠的安全至关重要的操作。我们首先回顾了这种鲁棒性的共同形式主义,然后继续讨论训练健壮的机器学习模型的流行和最新技术,以及可证明这种鲁棒性的方法。从强大的机器学习的这种统一中,我们识别并讨论了该地区未来研究的迫切方向。
translated by 谷歌翻译
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples-inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. 1
translated by 谷歌翻译
许多最先进的对抗性培训方法利用对抗性损失的上限来提供安全保障。然而,这些方法需要在每个训练步骤中计算,该步骤不能包含在梯度中的梯度以进行反向化。我们基于封闭形式的对抗性损失的封闭溶液引入了一种新的更具内容性的对抗性培训,可以有效地培养了背部衰退。通过稳健优化的最先进的工具促进了这一界限。我们使用我们的方法推出了两种新方法。第一种方法(近似稳健的上限或arub)使用网络的第一阶近似以及来自线性鲁棒优化的基本工具,以获得可以容易地实现的对抗丢失的近似偏置。第二种方法(鲁棒上限或摩擦)计算对抗性损失的精确上限。在各种表格和视觉数据集中,我们展示了我们更加原则的方法的有效性 - 摩擦比最先进的方法更强大,而是较大的扰动的最新方法,而谷会匹配的性能 - 小扰动的艺术方法。此外,摩擦和灌注速度比标准对抗性培训快(以牺牲内存增加)。重现结果的所有代码都可以在https://github.com/kimvc7/trobustness找到。
translated by 谷歌翻译
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NPcomplete problem. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible. Current available methods of computing such a bound are either time-consuming or deliver low quality bounds that are too loose to be useful. In this paper, we exploit the special structure of ReLU networks and provide two computationally efficient algorithms (Fast-Lin,Fast-Lip) that are able to certify non-trivial lower bounds of minimum adversarial distortions. Experiments show that (1) our methods deliver bounds close to (the gap is 2-3X) exact minimum distortions found by Reluplex in small networks while our algorithms are more than 10,000 times faster; (2) our methods deliver similar quality of bounds (the gap is within 35% and usually around 10%; sometimes our bounds are even better) for larger networks compared to the methods based on solving linear programming problems but our algorithms are 33-14,000 times faster; (3) our method is capable of solving large MNIST and CIFAR networks up to 7 layers with more than 10,000 neurons within tens of seconds on a single CPU core. In addition, we show that there is no polynomial time algorithm that can approximately find the minimum 1 adversarial distortion of a ReLU network with a 0.99 ln n approximation ratio unless NP=P, where n is the number of neurons in the network.
translated by 谷歌翻译
人工神经网络(ANN)训练景观的非凸起带来了固有的优化困难。虽然传统的背传播随机梯度下降(SGD)算法及其变体在某些情况下是有效的,但它们可以陷入杂散的局部最小值,并且对初始化和普通公共表敏感。最近的工作表明,随着Relu激活的ANN的培训可以重新重整为凸面计划,使希望能够全局优化可解释的ANN。然而,天真地解决凸训练制剂具有指数复杂性,甚至近似启发式需要立方时间。在这项工作中,我们描述了这种近似的质量,并开发了两个有效的算法,这些算法通过全球收敛保证培训。第一算法基于乘法器(ADMM)的交替方向方法。它解决了精确的凸形配方和近似对应物。实现线性全局收敛,并且初始几次迭代通常会产生具有高预测精度的解决方案。求解近似配方时,每次迭代时间复杂度是二次的。基于“采样凸面”理论的第二种算法更简单地实现。它解决了不受约束的凸形制剂,并收敛到大约全球最佳的分类器。当考虑对抗性培训时,ANN训练景观的非凸起加剧了。我们将稳健的凸优化理论应用于凸训练,开发凸起的凸起制剂,培训Anns对抗对抗投入。我们的分析明确地关注一个隐藏层完全连接的ANN,但可以扩展到更复杂的体系结构。
translated by 谷歌翻译
基于基于不完整的神经网络验证如冠的绑定传播非常有效,可以显着加速基于神经网络的分支和绑定(BAB)。然而,绑定的传播不能完全处理由昂贵的线性编程(LP)求解器的BAB常规引入的神经元分割限制,导致界限和损伤验证效率。在这项工作中,我们开发了一种基于$ \ beta $ -cra所做的,一种基于新的绑定传播方法,可以通过从原始或双空间构造的可优化参数$ \ beta $完全编码神经元分割。当在中间层中联合优化时,$ \ Beta $ -CROWN通常会产生比具有神经元分裂约束的典型LP验证更好的界限,同时像GPU上的皇冠一样高效且并行化。适用于完全稳健的验证基准,使用BAB的$ \ Beta $ -CROWN比基于LP的BAB方法快三个数量级,并且比所有现有方法更快,同时产生较低的超时率。通过早期终止BAB,我们的方法也可用于有效的不完整验证。与强大的不完整验证者相比,我们始终如一地在许多设置中获得更高的验证准确性,包括基于凸屏障破碎技术的验证技术。与最严重但非常昂贵的Semidefinite编程(SDP)的不完整验证者相比,我们获得了更高的验证精度,验证时间较少三个级。我们的算法授权$ \ alpha,\ \β$ -craft(Alpha-Beta-Crown)验证者,VNN-Comp 2021中的获胜工具。我们的代码可在http://papercode.cc/betacrown提供
translated by 谷歌翻译
分析深神经网络对输入扰动的最坏情况的性能等于解决一个大规模的非凸优化问题,过去的几项工作提出了凸出的放松作为有希望的替代方案。但是,即使对于合理的神经网络,这些放松也无法处理,因此必须在实践中被较弱的放松所取代。在这项工作中,我们提出了一种新型的操作员分裂方法,该方法可以将问题直接解决至高精度的凸松弛,从而将其拆分为经常具有分析溶液的较小的子问题。该方法是模块化的,范围为非常大的问题实例,并损害了与GPU加速的快速并行化的操作。我们展示了我们在图像分类和强化学习设置以及神经网络动力学系统的可及性分析中界定大型卷积网络最差的方法的方法。
translated by 谷歌翻译
本文介绍了OptNet,该网络架构集成了优化问题(这里,专门以二次程序的形式),作为较大端到端可训练的深网络中的单个层。这些层在隐藏状态之间编码约束和复杂依赖性,传统的卷积和完全连接的层通常无法捕获。我们探索这种架构的基础:我们展示了如何使用敏感性分析,彼得优化和隐式差分的技术如何通过这些层和相对于层参数精确地区分;我们为这些层开发了一种高效的解算器,用于利用基于GPU的基于GPU的批处理在原始 - 双内部点法中解决,并且在求解的顶部几乎没有额外的成本提供了反向衰减梯度;我们突出了这些方法在几个问题中的应用。在一个值得注意的示例中,该方法学习仅在输入和输出游戏中播放Mini-sudoku(4x4),没有关于游戏规则的a-priori信息;这突出了OptNet比其他神经架构更好地学习硬限制的能力。
translated by 谷歌翻译
深度神经网络(DNN)的巨大进步导致了各种任务的最先进的性能。然而,最近的研究表明,DNNS容易受到对抗的攻击,这在将这些模型部署到自动驾驶等安全关键型应用时,这使得非常关注。已经提出了不同的防御方法,包括:a)经验防御,通常可以在不提供稳健性认证的情况下再次再次攻击; b)可认真的稳健方法,由稳健性验证组成,提供了在某些条件下的任何攻击和相应的强大培训方法中的稳健准确性的下限。在本文中,我们系统化了可认真的稳健方法和相关的实用和理论意义和调查结果。我们还提供了在不同数据集上现有的稳健验证和培训方法的第一个全面基准。特别是,我们1)为稳健性验证和培训方法提供分类,以及总结代表性算法的方法,2)揭示这些方法中的特征,优势,局限性和基本联系,3)讨论当前的研究进展情况TNN和4的可信稳健方法的理论障碍,主要挑战和未来方向提供了一个开放的统一平台,以评估超过20种代表可认真的稳健方法,用于各种DNN。
translated by 谷歌翻译
尽管深度神经网络(DNN)在感知和控制任务中表现出令人难以置信的性能,但几个值得信赖的问题仍然是开放的。其中一个最讨论的主题是存在对抗扰动的存在,它在能够量化给定输入的稳健性的可提供技术上开辟了一个有趣的研究线。在这方面,来自分类边界的输入的欧几里德距离表示良好被证明的鲁棒性评估,作为最小的经济适用的逆势扰动。不幸的是,由于NN的非凸性质,计算如此距离非常复杂。尽管已经提出了几种方法来解决这个问题,但据我们所知,没有提出可证明的结果来估计和绑定承诺的错误。本文通过提出两个轻量级策略来寻找最小的对抗扰动来解决这个问题。不同于现有技术,所提出的方法允许与理论上的近似距离的误差估计理论配制。最后,据报道,据报道了大量实验来评估算法的性能并支持理论发现。所获得的结果表明,该策略近似于靠近分类边界的样品的理论距离,导致可提供对任何对抗攻击的鲁棒性保障。
translated by 谷歌翻译
The evaluation of robustness against adversarial manipulation of neural networks-based classifiers is mainly tested with empirical attacks as methods for the exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack wrt the l p -norms for p ∈ {1, 2, ∞} aiming at finding the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, yields quickly high quality results, minimizes the size of the perturbation (so that it returns the robust accuracy at every threshold with a single run). It performs better or similar to stateof-the-art attacks which are partially specialized to one l p -norm, and is robust to the phenomenon of gradient masking.
translated by 谷歌翻译
We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally in real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge in which we won the 1st place out of ~2,000 submissions, surpassing the runner-up approach by 11.41% in terms of mean 2 perturbation distance.
translated by 谷歌翻译
从外界培训的机器学习模型可能会被数据中毒攻击损坏,将恶意指向到模型的培训集中。对这些攻击的常见防御是数据消毒:在培训模型之前首先过滤出异常培训点。在本文中,我们开发了三次攻击,可以绕过广泛的常见数据消毒防御,包括基于最近邻居,训练损失和奇异值分解的异常探测器。通过增加3%的中毒数据,我们的攻击成功地将Enron垃圾邮件检测数据集的测试错误从3%增加到24%,并且IMDB情绪分类数据集从12%到29%。相比之下,没有明确占据这些数据消毒防御的现有攻击被他们击败。我们的攻击基于两个想法:(i)我们协调我们的攻击将中毒点彼此放置在彼此附近,(ii)我们将每个攻击制定为受限制的优化问题,限制旨在确保中毒点逃避检测。随着这种优化涉及解决昂贵的Bilevel问题,我们的三个攻击对应于基于影响功能的近似近似这个问题的方式; minimax二元性;和karush-kuhn-tucker(kkt)条件。我们的结果强调了对数据中毒攻击产生更强大的防御的必要性。
translated by 谷歌翻译
当与分支和界限结合使用时,结合的传播方法是正式验证深神经网络(例如正确性,鲁棒性和安全性)的最有效方法之一。但是,现有作品无法处理在传统求解器中广泛接受的切割平面限制的一般形式,这对于通过凸出凸松弛的加强验证者至关重要。在本文中,我们概括了结合的传播程序,以允许添加任意切割平面的约束,包括涉及放宽整数变量的限制,这些变量未出现在现有的结合传播公式中。我们的广义结合传播方法GCP-crown为应用一般切割平面方法}开辟了一个机会进行神经网络验证,同时受益于结合传播方法的效率和GPU加速。作为案例研究,我们研究了由现成的混合整数编程(MIP)求解器生成的切割平面的使用。我们发现,MIP求解器可以生成高质量的切割平面,以使用我们的新配方来增强基于界限的验证者。由于以分支为重点的绑定传播程序和切削平面的MIP求解器可以使用不同类型的硬件(GPU和CPU)并行运行,因此它们的组合可以迅速探索大量具有强切割平面的分支,从而导致强大的分支验证性能。实验表明,与VNN-Comp 2021中最佳工具相比,我们的方法是第一个可以完全求解椭圆形的基准并验证椭圆21基准的两倍的验证者,并且在oval21基准测试中的最佳工具也明显超过了最先进的验证器。广泛的基准。 GCP-Crown是$ \ alpha $,$ \ beta $ -Crown验证者,VNN-COMP 2022获奖者的一部分。代码可在http://papercode.cc/gcp-crown上获得
translated by 谷歌翻译
We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the 2 norm. This "randomized smoothing" technique has been proposed recently in the literature, but existing guarantees are loose. We prove a tight robustness guarantee in 2 norm for smoothing with Gaussian noise. We use randomized smoothing to obtain an ImageNet classifier with e.g. a certified top-1 accuracy of 49% under adversarial perturbations with 2 norm less than 0.5 (=127/255). No certified defense has been shown feasible on ImageNet except for smoothing. On smaller-scale datasets where competing approaches to certified 2 robustness are viable, smoothing delivers higher certified accuracies. Our strong empirical results suggest that randomized smoothing is a promising direction for future research into adversarially robust classification. Code and models are available at http: //github.com/locuslab/smoothing.
translated by 谷歌翻译
最近的作品试图通过对比原始扰动大的域进行攻击,并在目标中增加各种正则化项,从而提高受对抗训练的网络的验证性。但是,这些算法表现不佳或需要复杂且昂贵的舞台训练程序,从而阻碍了其实际适用性。我们提出了IBP-R,这是一种新颖的经过验证的培训算法,既简单又有效。 IBP-R通过基于廉价的间隔结合传播对扩大域的对抗域进行对抗性攻击来诱导网络可验证性,从而最大程度地减少了非凸vex验证问题与其近似值之间的差距。通过利用最近的分支机构和结合的框架,我们表明IBP-R获得了最先进的核能 - 智能权准折衷,而在CIFAR-10上进行了小型扰动,而培训的速度明显快于相关的先前工作。此外,我们提出了一种新颖的分支策略,该策略依赖于基于$ \ beta $ crown的简单启发式,可降低最先进的分支分支算法的成本,同时产生可比质量的分裂。
translated by 谷歌翻译