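Primate vision is the gold standard of robust perception. It is therefore widely believed that mimicking the neural representations underlying these systems will yield artificial visual systems that are adversarially robust. In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. We then use this method to demonstrate that the above belief might not be well founded. Specifically, we report that the biological neurons that make up the primate visual system exhibit a susceptibility to adversarial perturbations that is comparable in magnitude to that of existing (robustly trained) artificial neural networks.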
translated by 谷歌翻译
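Convolutional neural networks (CNNs) are vulnerable to adversarial attack, the phenomenon in which adding minuscule noise to an image can fool a CNN into misclassifying it. Because this noise is nearly imperceptible to human viewers, biological vision is assumed to be robust to adversarial attack. Despite this apparent difference in robustness, CNNs are currently the best models of biological vision, which reveals a gap in explaining how the brain responds to adversarial images. Indeed, sensitivity to adversarial attack has not been measured for biological vision under normal conditions, nor have attack methods been designed specifically to affect biological vision. We studied the effects of adversarial attack on primate vision, measuring both monkey neuronal responses and human behavior. Adversarial images were created by modifying images from one category (such as human faces) to look like a target category (such as monkey faces) while limiting the change in pixel values. We tested three attack directions using several attack methods, including directly using CNN adversarial images and using a CNN-based predictive model to guide monkey visual neuron responses. We considered a wide range of image change magnitudes, covering attack success rates of up to >90%. We found that adversarial images designed for CNNs were ineffective at attacking primate vision. Even with the best attack method, primate vision was more robust to adversarial attack than an ensemble of CNNs, requiring over 100-fold larger image change to attack successfully. The success of individual attack methods and images was correlated between monkey neurons and human behavior, but was less correlated between either of them and CNN categorization. Consistently, CNN-based models of neurons, when trained on natural images, did not generalize to explain neuronal responses to adversarial images.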
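Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response stochasticity, like that exhibited by biological neurons. Here, using recently developed geometrical techniques from computational neuroscience, we investigate how adversarial perturbations influence the internal representations of standard, adversarially trained, and biologically inspired stochastic networks. We find distinct geometric signatures for each type of network, revealing different mechanisms for achieving robust representations. Next, we generalize these results to the auditory domain, showing that neural stochasticity also makes auditory models more robust to adversarial perturbations. Geometric analysis of the stochastic networks reveals overlap between the representations of clean and adversarially perturbed stimuli, and quantitatively demonstrates that competing geometric effects of stochasticity mediate a tradeoff between adversarial and clean performance. Our results shed light on the strategies of robust perception used by adversarially trained and stochastic networks, and help explain how stochasticity may benefit both machine and biological computation.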
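Deep neural networks set the state of the art across many tasks in computer vision, but their ability to generalize to image distortions is surprisingly fragile. In contrast, the mammalian visual system is robust to a wide range of perturbations. Recent work suggests that this generalization ability can be explained by useful inductive biases encoded in the representations of visual stimuli throughout the visual cortex. Here, we successfully leveraged these inductive biases with a multi-task learning approach: we jointly trained a deep network to perform image classification and to predict neural activity in macaque primary visual cortex (V1). We measured the out-of-distribution generalization ability of our network by testing its robustness to image distortions. We found that co-training on monkey V1 data leads to increased robustness even though those distortions were absent during training. Additionally, we showed that our network's robustness is very close to that of an Oracle network in which parts of the architecture are trained directly on noisy images. Our results also demonstrated that the network's representations become more brain-like as its robustness improves. Using a novel constrained reconstruction analysis, we investigated what makes our brain-regularized network more robust. Compared with a Baseline network trained for image classification alone, our co-trained network is more sensitive to content than to noise. Using DeepGaze-predicted saliency maps for ImageNet images, we found that our monkey co-trained network tends to be more sensitive to salient regions in a scene, reminiscent of the role of V1 in the detection of object borders and in bottom-up saliency. Overall, our work expands the promising research avenue of transferring inductive biases from the brain, and provides a novel analysis of the effects of our transfer.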
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
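The detection idea in the feature squeezing abstract can be illustrated with a minimal NumPy sketch. It is only a sketch under stated assumptions, not the authors' implementation: the abstract names bit-depth reduction and spatial smoothing, while the choice of a median filter, the L1 distance between predictions, taking the maximum over squeezers, and every function name below (`reduce_bit_depth`, `median_smooth`, `squeezing_score`, `is_adversarial`) are hypothetical choices made here for illustration. `predict` stands for any function that maps an image to a probability vector.

```python
import numpy as np

def reduce_bit_depth(image, bits):
    """Quantize an image with values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(image * levels) / levels

def median_smooth(image, size=3):
    """Median filter over a size x size window on a 2-D image (edge-padded).

    Assumption: median filtering is used here as one possible form of
    spatial smoothing.
    """
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))

def squeezing_score(predict, image, squeezers):
    """Largest L1 distance between predictions on the original and squeezed inputs.

    Assumption: L1 distance and max-aggregation are illustrative choices.
    """
    base = predict(image)
    return max(np.abs(base - predict(squeeze(image))).sum() for squeeze in squeezers)

def is_adversarial(predict, image, squeezers, threshold):
    """Flag the input when squeezing changes the prediction by more than threshold."""
    return squeezing_score(predict, image, squeezers) > threshold
```

In use, `squeezers` would be a list such as `[lambda x: reduce_bit_depth(x, 4), median_smooth]`, and `threshold` would be tuned on held-out legitimate inputs so that few of them are flagged.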
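As convolutional neural networks (CNNs) become more accurate at object recognition, their representations become more similar to those of the primate visual system. This finding has inspired us and other researchers to ask whether the implication also runs the other way: if CNN representations become more brain-like, does the network become more accurate? Previous attempts to address this question showed very modest gains in accuracy, owing in part to limitations of the regularization method. To overcome these limitations, we develop a new neural data regularizer for CNNs that uses Deep Canonical Correlation Analysis (DCCA) to optimize the resemblance of the CNN's image representations to those of the monkey visual cortex. Using this new neural data regularizer, we see much larger gains in both classification accuracy and within-super-class accuracy than with the previous state-of-the-art neural data regularizers. These networks are also more robust to adversarial attacks than their unregularized counterparts. Together, these results confirm that neural data regularization can improve CNN performance, and introduce a new method that obtains a larger performance boost.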
This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate adversarial risk as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as obscurity to an adversary, and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS, which illustrate a diverse set of defense strategies, can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
The authors thank Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for feedback to improve the survey quality. We also acknowledge X. Huang (Uni. Liverpool), K. R. Reddy (IISC), E. Valle (UNICAMP), Y. Yoo (CLAIR) and others for providing pointers to make the survey more comprehensive.
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples: inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models.
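The robust optimization view mentioned in this abstract is commonly written as a saddle-point (min-max) problem. The formula below is a sketch in assumed notation, since the abstract itself gives no symbols: $\theta$ denotes the model parameters, $\mathcal{D}$ the data distribution, $L$ the training loss, and $\mathcal{S}$ the set of allowed perturbations (for example, an $\ell_\infty$ ball of radius $\epsilon$).

```latex
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
\left[ \max_{\delta \in \mathcal{S}} L(\theta, x + \delta, y) \right]
```

The inner maximization corresponds to the attack (finding the worst-case perturbation for a given model), and the outer minimization corresponds to training the model against that worst case.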
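In this discussion paper, we survey recent research on the robustness of machine learning models. As learning algorithms become increasingly popular in data-driven control systems, their robustness to data uncertainty must be ensured in order to maintain reliable safety-critical operations. We begin by reviewing common formalisms for such robustness, then discuss popular and state-of-the-art techniques for training robust machine learning models, as well as methods for provably certifying such robustness. From this unified view of robust machine learning, we identify and discuss pressing directions for future research in the area.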
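Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially deep neural networks (DNNs), are vulnerable to adversarial examples, i.e., examples that are carefully crafted to fool a well-trained classification model while being indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was first identified by Biggio et al. (2013) and Szegedy et al. (2014), much work has been done in this field, including the development of attack methods to generate adversarial examples and the construction of defense techniques to guard against them. This paper aims to introduce this topic and its latest developments to the statistical community, focusing primarily on the generation of adversarial examples and the defense against them. The computing code (in Python and R) used in the numerical experiments is publicly available for readers to explore the surveyed methods. The authors hope that this paper will encourage more statisticians to work in this important and exciting field of generating and defending against adversarial examples.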
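The existence of adversarial attacks on convolutional neural networks (CNNs) calls into question the fitness of such models for serious applications. The attacks manipulate an input image so that misclassification is evoked while the image still looks normal to a human observer; they are thus not easily detectable. In a different context, backpropagated activations of CNN hidden layers ("feature responses" to a given input) have been helpful for showing a human "debugger" what the CNN "looks at" while computing its output. In this work, we propose a novel detection method for adversarial examples in order to prevent attacks. We do so by tracking adversarial perturbations in feature responses, which allows for automatic detection using average local spatial entropy. The method does not alter the original network architecture and is fully human-interpretable. Experiments confirm the validity of our approach for state-of-the-art attacks on large-scale models trained on ImageNet.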
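To make the "average local spatial entropy" statistic more concrete, here is a minimal NumPy sketch. It is an illustration under assumptions rather than the paper's exact procedure: the feature response is assumed to be a 2-D map already normalized to [0, 1], and the window size, the number of histogram bins, and the function names `local_entropy` and `average_local_spatial_entropy` are hypothetical choices.

```python
import numpy as np

def local_entropy(patch, bins=16):
    """Shannon entropy (in bits) of the value histogram of one patch.

    Assumption: patch values lie in [0, 1].
    """
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def average_local_spatial_entropy(response, window=3, bins=16):
    """Mean of the local entropies over all window x window patches of a 2-D map."""
    height, width = response.shape
    entropies = [
        local_entropy(response[i:i + window, j:j + window], bins)
        for i in range(height - window + 1)
        for j in range(width - window + 1)
    ]
    return float(np.mean(entropies))
```

A detector in this spirit would compare the statistic of an input's feature response against a threshold calibrated on clean images; the direction and value of that threshold are not given in the abstract and would have to be determined empirically.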
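Short answer: Yes. Long answer: No! Indeed, research on adversarial robustness has led to invaluable insights that help us understand and explore different aspects of the problem. Many attacks and defenses have been proposed over the last few years. The problem, however, remains largely unsolved and poorly understood. Here, I argue that the current formulation of the problem serves short-term goals and needs to be revised for us to achieve bigger gains. Specifically, the bound on the perturbation has created a somewhat contrived setting and needs to be relaxed. This has misled us into focusing on model classes that are not expressive enough to begin with. Instead, inspired by human vision and the fact that we rely more on robust features such as shape, vertices, and foreground objects than on non-robust features such as texture, efforts should be steered towards significantly different classes of models. Perhaps, instead of narrowing down on imperceptible adversarial perturbations, we should attack a more general problem: finding architectures that are simultaneously robust to perceptible perturbations, geometric transformations (e.g. rotation, scaling), image distortions (lighting, blur), and more (e.g. occlusion, shadow). Only then may we be able to solve the problem of adversarial vulnerability.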
Spiking neural networks (SNNs) attract great attention due to their low power consumption, low latency, and biological plausibility. As they are widely deployed in neuromorphic devices for low-power brain-inspired computing, security issues become increasingly important. However, compared to deep neural networks (DNNs), SNNs currently lack specifically designed defense methods against adversarial attacks. Inspired by neural membrane potential oscillation, we propose a novel neural model that incorporates the bio-inspired oscillation mechanism to enhance the security of SNNs. Our experiments show that SNNs with neural oscillation neurons have better resistance to adversarial attacks than ordinary SNNs with LIF neurons across a variety of architectures and datasets. Furthermore, we propose a defense method that changes the model's gradients by replacing the form of oscillation, which hides the original training gradients and confuses the attacker into using gradients of 'fake' neurons to generate invalid adversarial samples. Our experiments suggest that the proposed defense method can effectively resist both single-step and iterative attacks, with comparable defense effectiveness and much lower computational cost than adversarial training methods on DNNs. To the best of our knowledge, this is the first work that establishes adversarial defense through masking surrogate gradients on SNNs.
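Deep neural networks have become the driving force of modern image recognition systems. However, the vulnerability of neural networks to adversarial attacks poses a serious threat to the people affected by these systems. In this paper, we focus on a real-world threat model in which a Man-in-the-Middle adversary maliciously intercepts and perturbs the images that web users upload online. This type of attack can raise severe ethical concerns on top of simple performance degradation. To prevent this attack, we devise a novel bi-level optimization algorithm that finds points in the vicinity of natural images that are robust to adversarial perturbations. Experiments on CIFAR-10 and ImageNet show that our method can effectively robustify natural images within a given modification budget. We also show that the proposed method can improve robustness when used jointly with randomized smoothing.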
Although deep learning has made remarkable progress in processing various types of data such as images, text and speech, deep learning models are known to be susceptible to adversarial perturbations: perturbations specifically designed and added to the input to make the target model produce erroneous output. Most existing studies on generating adversarial perturbations attempt to perturb the entire input indiscriminately. In this paper, we propose ExploreADV, a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks, allowing users to explore various kinds of adversarial examples as needed. We adapt and combine two existing boundary attack methods, DeepFool and the Brendel & Bethge attack, and propose a mask-constrained adversarial attack system, which generates minimal adversarial perturbations under pixel-level constraints, namely "mask-constraints". We study different ways of generating such mask-constraints considering the variance and importance of the input features, and show that our adversarial attack system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations and understand the vulnerability of pixels/regions to adversarial attacks. We demonstrate the effectiveness of our system through extensive experiments and a user study.
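Despite its great success, deep learning suffers severely from a lack of robustness; that is, deep neural networks are very vulnerable to adversarial attacks, even the simplest ones. Inspired by recent advances in brain science, we propose Denoised Internal Models (DIM), a novel generative autoencoder-based model, to tackle this challenge. Simulating the pipeline for visual signal processing in the human brain, DIM adopts a two-stage approach. In the first stage, DIM uses a denoiser to reduce the noise and the dimensionality of the inputs, reflecting the information pre-processing in the thalamus. Inspired by the sparse coding of memory-related traces in the primary visual cortex, the second stage produces a set of internal models, one for each category. We evaluate DIM against 42 adversarial attacks, showing that DIM effectively defends against all of them and outperforms the state of the art in overall robustness.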
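Adversarial robustness has become a central goal in deep learning, both in theory and in practice. However, successful methods for improving adversarial robustness (such as adversarial training) greatly hurt generalization performance on unperturbed data. This could have a major impact on how adversarial robustness affects real-world systems (i.e., many may opt to forgo robustness if doing so improves accuracy on unperturbed data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation-based training methods within the framework of adversarial training. On CIFAR-10, adversarial training increases the standard test error (when there is no adversary) from 4.43% to 12.32%, whereas with our Interpolated Adversarial Training we retain adversarial robustness while achieving a standard test error of only 6.45%. With our technique, the relative increase in the standard error for the robust model is reduced from 178.1% to just 45.5%. Moreover, we provide a mathematical analysis of Interpolated Adversarial Training to confirm its efficiency and demonstrate its advantages in terms of robustness and generalization.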
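The relative increases quoted in this abstract follow from the three test errors it reports. The short sketch below reproduces the arithmetic; the function name `relative_increase` is introduced here for illustration only. Computed from the rounded errors, the second figure comes out at roughly 45.6% rather than the quoted 45.5%, a small gap that is presumably due to rounding of the reported error rates.

```python
def relative_increase(baseline, value):
    """Relative increase of value over baseline, in percent."""
    return 100.0 * (value - baseline) / baseline

# Standard test errors on CIFAR-10 as reported in the abstract (percent).
baseline_error = 4.43       # no adversarial training
adversarial_error = 12.32   # adversarial training
interpolated_error = 6.45   # Interpolated Adversarial Training

adversarial_increase = relative_increase(baseline_error, adversarial_error)
interpolated_increase = relative_increase(baseline_error, interpolated_error)
print(f"{adversarial_increase:.1f}% vs {interpolated_increase:.1f}%")
```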
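Deep learning (DL) has shown great success in many human-related tasks, which has led to its adoption in many computer vision based applications, such as security surveillance systems, autonomous vehicles and healthcare. Such safety-critical applications can only be deployed successfully once they are able to overcome safety-critical challenges. Among these challenges are the defense against and/or the detection of adversarial examples (AEs). An adversary can carefully craft small, often imperceptible noise, called a perturbation, and add it to a clean image to generate an AE. The aim of an AE is to fool the DL model, which makes it a potential risk for DL applications. Many test-time evasion attacks and countermeasures, i.e., defense or detection methods, have been proposed in the literature. Moreover, a few reviews and surveys have been published that present taxonomies of the threats and the countermeasures from a theoretical standpoint, with little focus on AE detection methods. In this paper, we focus on the image classification task and attempt to provide a survey of detection methods for test-time evasion attacks on neural network classifiers. A detailed discussion of such methods is provided, together with experimental results for eight state-of-the-art detectors under different scenarios on four datasets. We also discuss potential challenges and future perspectives for this research direction.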