在本讨论文件中,我们调查了有关机器学习模型鲁棒性的最新研究。随着学习算法在数据驱动的控制系统中越来越流行,必须确保它们对数据不确定性的稳健性,以维持可靠的安全至关重要的操作。我们首先回顾了这种鲁棒性的共同形式主义,然后继续讨论训练健壮的机器学习模型的流行和最新技术,以及可证明这种鲁棒性的方法。从强大的机器学习的这种统一中,我们识别并讨论了该地区未来研究的迫切方向。
translated by 谷歌翻译
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
translated by 谷歌翻译
许多最先进的ML模型在各种任务中具有优于图像分类的人类。具有如此出色的性能,ML模型今天被广泛使用。然而,存在对抗性攻击和数据中毒攻击的真正符合ML模型的稳健性。例如,Engstrom等人。证明了最先进的图像分类器可以容易地被任意图像上的小旋转欺骗。由于ML系统越来越纳入安全性和安全敏感的应用,对抗攻击和数据中毒攻击构成了相当大的威胁。本章侧重于ML安全的两个广泛和重要的领域:对抗攻击和数据中毒攻击。
translated by 谷歌翻译
尽管机器学习系统的效率和可扩展性,但最近的研究表明,许多分类方法,尤其是深神经网络(DNN),易受对抗的例子;即,仔细制作欺骗训练有素的分类模型的例子,同时无法区分从自然数据到人类。这使得在安全关键区域中应用DNN或相关方法可能不安全。由于这个问题是由Biggio等人确定的。 (2013)和Szegedy等人。(2014年),在这一领域已经完成了很多工作,包括开发攻击方法,以产生对抗的例子和防御技术的构建防范这些例子。本文旨在向统计界介绍这一主题及其最新发展,主要关注对抗性示例的产生和保护。在数值实验中使用的计算代码(在Python和R)公开可用于读者探讨调查的方法。本文希望提交人们将鼓励更多统计学人员在这种重要的令人兴奋的领域的产生和捍卫对抗的例子。
translated by 谷歌翻译
We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally in real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge in which we won the 1st place out of ~2,000 submissions, surpassing the runner-up approach by 11.41% in terms of mean 2 perturbation distance.
translated by 谷歌翻译
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples-inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. 1
translated by 谷歌翻译
We propose a method to learn deep ReLU-based classifiers that are provably robust against normbounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded ∞ norm less than = 0.1), and code for all experiments is available at http://github.com/ locuslab/convex_adversarial.
translated by 谷歌翻译
This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate adversarial risk as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as obscurity to an adversary, and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
translated by 谷歌翻译
Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We further introduce Ensemble Adversarial Training, a technique that augments training data with perturbations transferred from other models. On ImageNet, Ensemble Adversarial Training yields models with stronger robustness to blackbox attacks. In particular, our most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks (Kurakin et al., 2017c). However, subsequent work found that more elaborate black-box attacks could significantly enhance transferability and reduce the accuracy of our models.
translated by 谷歌翻译
深度神经网络(DNN)的巨大进步导致了各种任务的最先进的性能。然而,最近的研究表明,DNNS容易受到对抗的攻击,这在将这些模型部署到自动驾驶等安全关键型应用时,这使得非常关注。已经提出了不同的防御方法,包括:a)经验防御,通常可以在不提供稳健性认证的情况下再次再次攻击; b)可认真的稳健方法,由稳健性验证组成,提供了在某些条件下的任何攻击和相应的强大培训方法中的稳健准确性的下限。在本文中,我们系统化了可认真的稳健方法和相关的实用和理论意义和调查结果。我们还提供了在不同数据集上现有的稳健验证和培训方法的第一个全面基准。特别是,我们1)为稳健性验证和培训方法提供分类,以及总结代表性算法的方法,2)揭示这些方法中的特征,优势,局限性和基本联系,3)讨论当前的研究进展情况TNN和4的可信稳健方法的理论障碍,主要挑战和未来方向提供了一个开放的统一平台,以评估超过20种代表可认真的稳健方法,用于各种DNN。
translated by 谷歌翻译
对抗性的鲁棒性已经成为深度学习的核心目标,无论是在理论和实践中。然而,成功的方法来改善对抗的鲁棒性(如逆势训练)在不受干扰的数据上大大伤害了泛化性能。这可能会对对抗性鲁棒性如何影响现实世界系统的影响(即,如果它可以提高未受干扰的数据的准确性),许多人可能选择放弃鲁棒性)。我们提出内插对抗培训,该培训最近雇用了在对抗培训框架内基于插值的基于插值的培训方法。在CiFar -10上,对抗性训练增加了标准测试错误(当没有对手时)从4.43%到12.32%,而我们的内插对抗培训我们保留了对抗性的鲁棒性,同时实现了仅6.45%的标准测试误差。通过我们的技术,强大模型标准误差的相对增加从178.1%降至仅为45.5%。此外,我们提供内插对抗性培训的数学分析,以确认其效率,并在鲁棒性和泛化方面展示其优势。
translated by 谷歌翻译
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS-and which illustrate a diverse set of defense strategies-can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result-showing that a defense was ineffective-this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
translated by 谷歌翻译
深度神经网络已成为现代图像识别系统的驱动力。然而,神经网络对抗对抗性攻击的脆弱性对受这些系统影响的人构成严重威胁。在本文中,我们专注于一个真实的威胁模型,中间对手恶意拦截和erturbs网页用户上传在线。这种类型的攻击可以在简单的性能下降之上提高严重的道德问题。为了防止这种攻击,我们设计了一种新的双层优化算法,该算法在对抗对抗扰动的自然图像附近找到点。CiFar-10和Imagenet的实验表明我们的方法可以有效地强制在给定的修改预算范围内的自然图像。我们还显示所提出的方法可以在共同使用随机平滑时提高鲁棒性。
translated by 谷歌翻译
While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but often followed by new, stronger attacks that defeat these defenses. Can we somehow end this arms race? In this work, we study this problem for neural networks with one hidden layer. We first propose a method based on a semidefinite relaxation that outputs a certificate that for a given network and test input, no attack can force the error to exceed a certain value. Second, as this certificate is differentiable, we jointly optimize it with the network parameters, providing an adaptive regularizer that encourages robustness against all attacks. On MNIST, our approach produces a network and a certificate that no attack that perturbs each pixel by at most = 0.1 can cause more than 35% test error.
translated by 谷歌翻译
Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. We demonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features (derived from patterns in the data distribution) that are highly predictive, yet brittle and (thus) incomprehensible to humans. After capturing these features within a theoretical framework, we establish their widespread existence in standard datasets. Finally, we present a simple setting where we can rigorously tie the phenomena we observe in practice to a misalignment between the (human-specified) notion of robustness and the inherent geometry of the data.
translated by 谷歌翻译
对抗性实例的有趣现象引起了机器学习中的显着关注,对社区可能更令人惊讶的是存在普遍对抗扰动(UAPS),即欺骗目标DNN的单一扰动。随着对深层分类器的关注,本调查总结了最近普遍对抗攻击的进展,讨论了攻击和防御方的挑战,以及uap存在的原因。我们的目标是将此工作扩展为动态调查,该调查将定期更新其内容,以遵循关于在广泛的域中的UAP或通用攻击的新作品,例如图像,音频,视频,文本等。将讨论相关更新:https://bit.ly/2sbqlgg。我们欢迎未来的作者在该领域的作品,联系我们,包括您的新发现。
translated by 谷歌翻译
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input samples, called adversarial examples. Adversarial perturbations are imperceptible to human but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples becomes one of the major risks for applying deep neural networks in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
translated by 谷歌翻译
许多最先进的对抗性培训方法利用对抗性损失的上限来提供安全保障。然而,这些方法需要在每个训练步骤中计算,该步骤不能包含在梯度中的梯度以进行反向化。我们基于封闭形式的对抗性损失的封闭溶液引入了一种新的更具内容性的对抗性培训,可以有效地培养了背部衰退。通过稳健优化的最先进的工具促进了这一界限。我们使用我们的方法推出了两种新方法。第一种方法(近似稳健的上限或arub)使用网络的第一阶近似以及来自线性鲁棒优化的基本工具,以获得可以容易地实现的对抗丢失的近似偏置。第二种方法(鲁棒上限或摩擦)计算对抗性损失的精确上限。在各种表格和视觉数据集中,我们展示了我们更加原则的方法的有效性 - 摩擦比最先进的方法更强大,而是较大的扰动的最新方法,而谷会匹配的性能 - 小扰动的艺术方法。此外,摩擦和灌注速度比标准对抗性培训快(以牺牲内存增加)。重现结果的所有代码都可以在https://github.com/kimvc7/trobustness找到。
translated by 谷歌翻译
时间序列数据在许多现实世界中(例如,移动健康)和深神经网络(DNNS)中产生,在解决它们方面已取得了巨大的成功。尽管他们成功了,但对他们对对抗性攻击的稳健性知之甚少。在本文中,我们提出了一个通过统计特征(TSA-STAT)}称为时间序列攻击的新型对抗框架}。为了解决时间序列域的独特挑战,TSA-STAT对时间序列数据的统计特征采取限制来构建对抗性示例。优化的多项式转换用于创建比基于加性扰动的攻击(就成功欺骗DNN而言)更有效的攻击。我们还提供有关构建对抗性示例的统计功能规范的认证界限。我们对各种现实世界基准数据集的实验表明,TSA-STAT在欺骗DNN的时间序列域和改善其稳健性方面的有效性。 TSA-STAT算法的源代码可在https://github.com/tahabelkhouja/time-series-series-attacks-via-statity-features上获得
translated by 谷歌翻译
Learning-based pattern classifiers, including deep networks, have shown impressive performance in several application domains, ranging from computer vision to cybersecurity. However, it has also been shown that adversarial input perturbations carefully crafted either at training or at test time can easily subvert their predictions. The vulnerability of machine learning to such wild patterns (also referred to as adversarial examples), along with the design of suitable countermeasures, have been investigated in the research field of adversarial machine learning. In this work, we provide a thorough overview of the evolution of this research area over the last ten years and beyond, starting from pioneering, earlier work on the security of non-deep learning algorithms up to more recent work aimed to understand the security properties of deep learning algorithms, in the context of computer vision and cybersecurity tasks. We report interesting connections between these apparently-different lines of work, highlighting common misconceptions related to the security evaluation of machine-learning algorithms. We review the main threat models and attacks defined to this end, and discuss the main limitations of current work, along with the corresponding future challenges towards the design of more secure learning algorithms.
translated by 谷歌翻译