Many machine learning problems use data in the tabular domain, and adversarial examples can be especially damaging for these applications. Yet existing work on adversarial robustness focuses mainly on machine learning models in the image and text domains. We argue that, because of the differences between tabular data and images or text, existing threat models are inappropriate for the tabular domain: they do not capture that cost can matter more than imperceptibility, nor that an adversary can assign different values to the utility obtained from deploying different adversarial examples. We show that, due to these differences, the attack and defense methods used for images and text cannot be directly applied to the tabular setting. We address these issues by proposing new cost- and utility-aware threat models tailored to the constraints of attackers targeting the tabular domain. We introduce a framework that enables us to design attack and defense mechanisms that result in models protected against cost- or utility-aware adversaries, for example, adversaries constrained by a given dollar budget. We show that our approach is effective on three tabular datasets corresponding to applications in which adversarial examples have economic and social implications.
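The cost- and utility-aware constraint can be made concrete with a small sketch. The snippet below is only an illustration under assumed details (the per-feature dollar costs, the weighted-L1 cost model, and the budget value are all hypothetical, not the paper's actual framework): it checks whether a candidate adversarial tabular record stays within an attacker's monetary budget.

```python
import numpy as np

def within_budget(x, x_adv, feature_costs, budget):
    """Check whether turning record x into x_adv stays within the adversary's
    monetary budget. feature_costs[i] is the assumed dollar cost of changing
    feature i by one unit, so the total cost is a weighted L1 distance."""
    change = np.abs(np.asarray(x_adv) - np.asarray(x))
    total_cost = float(np.dot(feature_costs, change))
    return total_cost <= budget

# Toy usage: a credit-scoring record with three hypothetical features.
x = np.array([30_000.0, 2.0, 0.0])           # income, open accounts, defaults
x_adv = np.array([32_000.0, 3.0, 0.0])       # candidate adversarial record
feature_costs = np.array([0.01, 50.0, 1e6])  # assumed per-unit costs in dollars
print(within_budget(x, x_adv, feature_costs, budget=100.0))  # True: total cost is $70
```

A defense aware of this threat model would then only need to resist perturbations whose total cost falls under the assumed budget, rather than all small-norm perturbations.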
As advances in deep neural networks (DNNs) demonstrate unprecedented levels of performance in many critical applications, their vulnerability to attacks remains an open question. We consider evasion attacks at test time against deep learning in constrained environments, in which dependencies between features need to be satisfied. These situations may arise naturally in tabular data, or may be the result of feature engineering in specific application domains, such as threat detection in cyber security. We propose a general iterative gradient-based framework called FENCE for crafting evasion attacks that takes into consideration the specifics of constrained domains and application requirements. We apply it to feed-forward neural networks trained for two cyber-security applications, network-traffic botnet classification and malicious-domain classification, to generate feasible adversarial examples. We extensively evaluate the success rate and performance of our attacks, compare their improvements over several baselines, and analyze factors that impact the attack success rate, including the optimization objective and data imbalance. We show that with minimal effort (e.g., generating 12 additional network connections), an attacker can change the model's prediction from the malicious class to benign and evade the classifier. We also show that models trained on datasets with higher imbalance are more vulnerable to our FENCE attacks. Finally, we demonstrate the potential of adversarial training in constrained domains to increase model resilience against these evasion attacks.
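As a rough illustration of the iterative, constraint-aware pattern described above (this is not the FENCE algorithm itself, only a minimal sketch under simplifying assumptions), the loop below takes gradient steps on a PyTorch model and projects each iterate back onto simple per-feature bounds; FENCE additionally handles dependencies between features, which a box projection cannot express.

```python
import torch

def constrained_evasion(model, x, y, lower, upper, step=0.05, iters=50):
    """Gradient-based evasion sketch: increase the loss of the true label y
    while projecting every iterate onto per-feature [lower, upper] bounds,
    which stand in here for real domain constraints."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()        # move toward misclassification
            x_adv = torch.clamp(x_adv, lower, upper)  # project onto the feasible box
    return x_adv.detach()
```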
This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate adversarial risk as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as obscurity to an adversary, and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
Machine learning algorithms have been shown to be vulnerable to adversarial manipulation through systematic modification of inputs (e.g., adversarial examples) in domains such as image recognition. Under the default threat model, the adversary exploits the unconstrained nature of images: every feature (pixel) is fully under the adversary's control. However, it is not clear how such attacks translate to constrained domains that limit which features the adversary can modify and how they can be modified (e.g., network intrusion detection). In this paper, we explore whether constrained domains are less vulnerable than unconstrained domains to adversarial example generation algorithms. We create an algorithm for generating adversarial sketches: targeted universal perturbation vectors that encode feature saliency within the envelope of domain constraints. To assess the performance of these algorithms, we evaluate them in constrained (e.g., network intrusion detection) and unconstrained (e.g., image recognition) domains. The results show that our approaches produce misclassification rates in constrained domains that are comparable to those in unconstrained domains (greater than 95%). Our investigation indicates that the narrow attack surface exposed by constrained domains is still large enough to craft successful adversarial examples, and thus constraints do not appear to make a domain robust. Indeed, with as few as five randomly selected features, one can still generate adversarial examples.
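The closing observation, that a handful of features suffices, can be illustrated with a small assumed sketch: rank features by gradient saliency, keep only the k features that the domain allows the adversary to modify (encoded here by a hypothetical `modifiable_mask`), and perturb just those.

```python
import torch

def top_k_saliency_perturb(model, x, y, modifiable_mask, k=5, eps=0.2):
    """Perturb only the k most salient features among those the domain lets
    the adversary change. x has shape (1, n_features); modifiable_mask is a
    0/1 tensor of length n_features marking adversary-controlled features."""
    x = x.clone().detach().requires_grad_(True)
    torch.nn.functional.cross_entropy(model(x), y).backward()
    saliency = (x.grad.abs() * modifiable_mask).squeeze(0)
    top = torch.topk(saliency, k).indices            # k most influential features
    delta = torch.zeros_like(x)
    delta[0, top] = eps * x.grad[0, top].sign()      # push them toward misclassification
    return (x + delta).detach()
```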
Strengthening the robustness of machine learning-based Android malware detectors in the real world requires incorporating realizable adversarial examples (RealAEs), i.e., AEs that satisfy the domain constraints of Android malware. However, existing work focuses on generating RealAEs in the problem space, which is known to be time-consuming and impractical for adversarial training. In this paper, we propose to generate RealAEs in the feature space, leading to a simpler and more efficient solution. Our approach is driven by a novel interpretation of Android malware properties in the feature space. More concretely, we extract feature-space domain constraints by learning meaningful feature dependencies from data and applying them by constructing a robust feature space. Our experiments on DREBIN, a well-known Android malware detector, demonstrate that our approach outperforms the state-of-the-art defense, Sec-SVM, against realistic gradient- and query-based attacks. Additionally, we demonstrate that generating feature-space RealAEs is faster than generating problem-space RealAEs, indicating its high applicability in adversarial training. We further validate the ability of our learned feature-space domain constraints in representing the Android malware properties by showing that (i) re-training detectors with our feature-space RealAEs largely improves model performance on similar problem-space RealAEs and (ii) using our feature-space domain constraints can help distinguish RealAEs from unrealizable AEs (unRealAEs).
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x' that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.
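A heavily simplified sketch in the spirit of the attacks described, not the authors' exact algorithms: it minimizes a squared L2 perturbation norm plus a margin term on the logits for a single targeted input. The real attacks additionally use a change of variables to keep pixels in range and a binary search over the trade-off constant c, both omitted here.

```python
import torch

def margin_attack(model, x, target, c=1.0, kappa=0.0, steps=200, lr=0.01):
    """Minimize ||delta||_2^2 + c * max(max_{i != target} Z_i - Z_target, -kappa)
    for one input x of shape (1, ...), where Z are the model's logits."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    num_classes = model(x).shape[-1]
    one_hot = torch.zeros(num_classes, device=x.device)
    one_hot[target] = 1.0
    for _ in range(steps):
        logits = model(x + delta)[0]
        z_target = (logits * one_hot).sum()
        z_other = (logits - 1e9 * one_hot).max()     # best competing logit
        loss = (delta ** 2).sum() + c * torch.clamp(z_other - z_target, min=-kappa)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach()
```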
The generation of feasible adversarial examples is necessary for properly assessing models that operate on constrained feature spaces. However, it remains a challenging task to enforce such constraints in attacks that were designed for computer vision. We propose a unified framework for generating feasible adversarial examples that satisfy given domain constraints. Our framework supports the use cases reported in the literature and can handle both linear and non-linear constraints. We instantiate the framework into two algorithms: a gradient-based attack that introduces the constraints into the loss function to be maximized, and a multi-objective search algorithm that aims at misclassification, perturbation minimization, and constraint satisfaction. We show that our approach is effective on two datasets from different domains, with success rates of up to 100% where state-of-the-art attacks fail to generate a single feasible example. In addition to adversarial retraining, we propose introducing engineered non-convex constraints to improve model adversarial robustness, and we demonstrate that this new defense is as effective as adversarial retraining. Our framework constitutes a starting point for research on constrained adversarial attacks and provides relevant baselines and datasets that future research can leverage.
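The first of the two instantiations mentioned, a gradient-based attack that folds the constraints into the loss, can be sketched roughly as below. The penalty form and the two example constraints (feature 0 non-negative, feature 1 not exceeding feature 2) are assumptions chosen for illustration, not the framework's actual constraint language.

```python
import torch

def constraint_penalty(x_adv):
    """Zero when both assumed constraints hold, growing with the violation:
    feature 0 must stay non-negative and feature 1 must not exceed feature 2."""
    return (torch.relu(-x_adv[:, 0]) + torch.relu(x_adv[:, 1] - x_adv[:, 2])).sum()

def attack_step(model, x_adv, y, lam=10.0, step=0.01):
    """One ascent step on: classification loss minus lam * constraint violations."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    objective = torch.nn.functional.cross_entropy(model(x_adv), y) \
                - lam * constraint_penalty(x_adv)
    objective.backward()
    return (x_adv + step * x_adv.grad.sign()).detach()
```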
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS, illustrating a diverse set of defense strategies, can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. Some of our attack strategies are generalizable, but no single strategy would have been sufficient for all defenses. This underlines our key message that adaptive attacks cannot be automated and always require careful and appropriate tuning to a given defense. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
Machine learning is vulnerable to adversarial examples: inputs designed to cause models to perform poorly. However, it is unclear whether adversarial examples represent realistic inputs in the modeled domains. Domains such as networking and phishing have domain constraints: complex relationships between features that an adversary must satisfy for an attack to be realized (in addition to any adversary-specific goals). In this paper, we explore how domain constraints limit adversarial capabilities and how adversaries can adapt their strategies to create realistic (constraint-compliant) examples. To this end, we develop techniques to learn domain constraints from data and show how the learned constraints can be integrated into the adversarial crafting process. We evaluate the efficacy of our approach on network intrusion and phishing datasets and find that: (1) up to 82% of the adversarial examples produced by state-of-the-art crafting algorithms violate domain constraints, and (2) domain constraints are robust to adversarial examples; enforcing the constraints yields an increase in model accuracy of up to 34%. We observe not only that adversaries must alter inputs to satisfy domain constraints, but also that these constraints make the generation of valid adversarial examples far more challenging.
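A minimal, assumed illustration of what learning constraints from data can look like in the simplest case: extract per-feature value ranges and exact additive identities (for example, a total field that equals the sum of two others) from clean records, then reject adversarial candidates that break them. The constraint language learned in the paper is considerably richer than this.

```python
import numpy as np

def learn_simple_constraints(X, tol=1e-6):
    """From clean data X (n_samples, n_features), learn per-feature [min, max]
    ranges and detect index triples (i, j, k) with X[:, i] + X[:, j] == X[:, k]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    identities = []
    n = X.shape[1]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k not in (i, j) and np.allclose(X[:, i] + X[:, j], X[:, k], atol=tol):
                    identities.append((i, j, k))
    return lo, hi, identities

def satisfies(x, lo, hi, identities, tol=1e-6):
    """Check one (possibly adversarial) record against the learned constraints."""
    in_range = np.all(x >= lo - tol) and np.all(x <= hi + tol)
    keeps_sums = all(abs(x[i] + x[j] - x[k]) <= tol for i, j, k in identities)
    return in_range and keeps_sums
```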
Research in adversarial learning has primarily focused on homogeneous unstructured datasets, which often map naturally into the problem space. Inverting a feature-space attack on heterogeneous datasets into the problem space is much more challenging, in particular the task of finding the perturbation to perform. This work presents a formal search strategy, the Feature Importance Guided Attack (FIGA), which finds perturbations in the feature space of heterogeneous tabular datasets to produce evasion attacks. We first demonstrate FIGA in the feature space and then in the problem space. FIGA assumes no prior knowledge of the defending model's learning algorithm and requires no gradient information; it only assumes knowledge of the feature representation and of the mean feature values of the defending model's dataset. FIGA leverages feature-importance rankings by perturbing the most important features of the input in the direction of the target class. While FIGA is conceptually similar to other works that use a feature-selection process (e.g., mimicry attacks), we formalize an attack algorithm with three tunable parameters and study the strength of FIGA on tabular datasets. We demonstrate the effectiveness of FIGA by evading phishing-detection models trained on four different tabular phishing datasets and one financial dataset, with an average success rate of 94%. We extend FIGA to the phishing problem space by restricting the perturbations to those that are valid and feasible in the phishing domain. We generate valid adversarial phishing sites that are visually identical to their unperturbed counterparts and use them to attack six tabular ML models, achieving an average success rate of 13.05%.
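A rough sketch of the perturbation rule described, rank features by importance and nudge the top n toward the target class, is given below. The random-forest importance source, the parameter names, and the update rule are assumptions for illustration rather than the paper's exact formalization.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def figa_style_perturb(X_train, y_train, x, target_class, n_features=5, eps=0.1):
    """Move x's n most important features a fraction eps of the way toward the
    mean of the target class, leaving all other features untouched."""
    surrogate = RandomForestClassifier(n_estimators=100, random_state=0)
    surrogate.fit(X_train, y_train)                          # importance source (assumed)
    ranking = np.argsort(surrogate.feature_importances_)[::-1][:n_features]
    target_mean = X_train[y_train == target_class].mean(axis=0)
    x_adv = x.copy()
    x_adv[ranking] += eps * (target_mean[ranking] - x[ranking])
    return x_adv
```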
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
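The two squeezers and the detection rule lend themselves to a short sketch for a grayscale image in [0, 1]; the detection threshold below is an assumed placeholder, whereas the paper selects it per dataset from validation data.

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits=4):
    """Squeeze pixel values in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def is_adversarial(predict_proba, x, threshold=1.0):
    """Flag x if the model's prediction moves too much under squeezing.
    predict_proba maps an image in [0, 1] to a class-probability vector."""
    p_orig = predict_proba(x)
    p_depth = predict_proba(reduce_bit_depth(x))
    p_smooth = predict_proba(median_filter(x, size=2))   # 2x2 spatial smoothing
    score = max(np.abs(p_orig - p_depth).sum(), np.abs(p_orig - p_smooth).sum())
    return score > threshold
```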
Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently a set of certified defenses have been introduced, which provide guarantees of robustness to norm-bounded attacks. However, these defenses either do not scale to large datasets or are limited in the types of models they can support. This paper presents the first certified defense that both scales to large networks and datasets (such as Google's Inception network for ImageNet) and applies broadly to arbitrary model types. Our defense, called PixelDP, is based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism, that provides a rigorous, generic, and flexible foundation for defense.
In this discussion paper, we survey recent research on the robustness of machine learning models. As learning algorithms become increasingly popular in data-driven control systems, their robustness to data uncertainty must be ensured in order to maintain reliable, safety-critical operation. We first review common formalisms for such robustness, and then proceed to discuss popular and state-of-the-art techniques for training robust machine learning models as well as methods for provably certifying this robustness. From this unified view of robust machine learning, we identify and discuss pressing directions for future research in the area.
Designing powerful adversarial attacks is of paramount importance for the evaluation of $\ell_p$-bounded adversarial defenses. Projected Gradient Descent (PGD) is one of the most effective and conceptually simple algorithms to generate such adversaries. The search space of PGD is dictated by the steepest ascent directions of an objective. Despite the plethora of objective function choices, there is no universally superior option and robustness overestimation may arise from ill-suited objective selection. Driven by this observation, we postulate that the combination of different objectives through a simple loss alternating scheme renders PGD more robust towards design choices. We experimentally verify this assertion on a synthetic-data example and by evaluating our proposed method across 25 different $\ell_{\infty}$-robust models and 3 datasets. The performance improvement is consistent when compared to the single-loss counterparts. In the CIFAR-10 dataset, our strongest adversarial attack outperforms all of the white-box components of AutoAttack (AA) ensemble, as well as the most powerful attacks existing in the literature, achieving state-of-the-art results in the computational budget of our study ($T=100$, no restarts).
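The alternating-loss idea can be sketched as a standard $\ell_\infty$ PGD loop that switches its objective every iteration between cross-entropy and a logit-margin loss. This is a simplified stand-in for the scheme evaluated in the paper, with the step size, iteration count, and the particular pair of losses chosen as assumptions.

```python
import torch
import torch.nn.functional as F

def margin_loss(logits, y):
    """Best wrong logit minus the true logit, averaged over the batch."""
    true = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    wrong = logits.masked_fill(F.one_hot(y, logits.size(1)).bool(), -1e9).max(dim=1).values
    return (wrong - true).mean()

def alternating_pgd(model, x, y, eps=8 / 255, alpha=2 / 255, iters=100):
    losses = [lambda l, t: F.cross_entropy(l, t), margin_loss]
    x_adv = x.clone().detach()
    for i in range(iters):
        x_adv.requires_grad_(True)
        loss = losses[i % 2](model(x_adv), y)            # alternate the objective
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv.detach()
```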
Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models' training sets. A common defense against these attacks is data sanitization: first filter out anomalous training points before training the model. In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular value decomposition. By adding just 3% poisoned data, our attacks successfully increase the test error on the Enron spam-detection dataset from 3% to 24% and on the IMDB sentiment-classification dataset from 12% to 29%. In contrast, existing attacks that do not explicitly account for these data sanitization defenses are defeated by them. Our attacks are based on two ideas: (i) we coordinate our attacks so that the poisoned points are placed near one another, and (ii) we formulate each attack as a constrained optimization problem, with constraints designed to ensure that the poisoned points evade detection. Because this optimization involves solving an expensive bilevel problem, our three attacks correspond to different ways of approximating it, based on influence functions, minimax duality, and the Karush-Kuhn-Tucker (KKT) conditions. Our results underscore the need to develop more robust defenses against data poisoning attacks.
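As a toy illustration of idea (ii), constraining the poisoned points so that they evade a sanitization rule, the sketch below projects candidate poison points back inside the radius that a simple centroid-distance anomaly detector would accept. Both the detector and the projection are assumptions for illustration; the paper's attacks solve a much harder bilevel optimization.

```python
import numpy as np

def project_into_feasible(poison, clean_centroid, radius):
    """Pull each candidate poison point (rows of `poison`) inside the sphere of
    the given radius around the clean-class centroid, so that a centroid-distance
    sanitizer keeps it in the training set."""
    offsets = poison - clean_centroid
    dists = np.linalg.norm(offsets, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(dists, 1e-12))
    return clean_centroid + offsets * scale
```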
We propose the Square Attack, a score-based black-box l2- and l∞-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. Square Attack is based on a randomized search scheme which selects localized square-shaped updates at random positions so that at each iteration the perturbation is situated approximately at the boundary of the feasible set. Our method is significantly more query efficient and achieves a higher success rate compared to the state-of-the-art methods, especially in the untargeted setting. In particular, on ImageNet we improve the average query efficiency in the untargeted setting for various deep networks by a factor of at least 1.8 and up to 3 compared to the recent state-of-the-art l∞-attack of Al-Dujaili & OReilly (2020). Moreover, although our attack is black-box, it can also outperform gradient-based white-box attacks on the standard benchmarks achieving a new state-of-the-art in terms of the success rate. The code of our attack is available at https://github.com/max-andr/square-attack.
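A bare-bones sketch of the randomized-search pattern for the untargeted l∞ case: sample a random square, set the perturbation inside it to ±ε per channel, and keep the change only if the attacker's loss improves. The square-size schedule and the other refinements of the actual method are omitted, and the loss interface is an assumption.

```python
import numpy as np

def square_attack_linf(loss_fn, x, eps=8 / 255, iters=1000, frac=0.1, rng=None):
    """Score-based random search on an image x of shape (channels, height, width).
    loss_fn(img) returns a scalar the attacker wants to increase, e.g. the
    negative margin of the true class."""
    rng = rng or np.random.default_rng(0)
    c, h, w = x.shape
    x_adv = np.clip(x + eps * rng.choice([-1.0, 1.0], size=(c, 1, w)), 0, 1)
    best = loss_fn(x_adv)
    for _ in range(iters):
        s = max(1, int(round(np.sqrt(frac) * h)))                 # side of the square
        r, col = rng.integers(0, h - s + 1), rng.integers(0, w - s + 1)
        candidate = x_adv.copy()
        patch = eps * rng.choice([-1.0, 1.0], size=(c, 1, 1))
        candidate[:, r:r + s, col:col + s] = np.clip(
            x[:, r:r + s, col:col + s] + patch, 0, 1)
        value = loss_fn(candidate)
        if value > best:                                          # keep only improvements
            x_adv, best = candidate, value
    return x_adv
```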
Many state-of-the-art ML models outperform humans on various tasks, such as image classification. With such outstanding performance, ML models are widely used today. However, the existence of adversarial attacks and data poisoning attacks genuinely calls into question the robustness of ML models. For instance, Engstrom et al. demonstrated that state-of-the-art image classifiers can easily be fooled by a small rotation of an arbitrary image. As ML systems are increasingly incorporated into safety- and security-sensitive applications, adversarial attacks and data poisoning attacks pose a considerable threat. This chapter focuses on two broad and important areas of ML security: adversarial attacks and data poisoning attacks.
Deep learning algorithms have been shown to perform extremely well on many classical machine learning problems. However, recent studies have shown that deep learning, like other machine learning techniques, is vulnerable to adversarial samples: inputs crafted to force a deep neural network (DNN) to provide adversary-selected outputs. Such attacks can seriously undermine the security of the system supported by the DNN, sometimes with devastating consequences. For example, autonomous vehicles can be crashed, illicit or illegal content can bypass content filters, or biometric authentication systems can be manipulated to allow improper access. In this work, we introduce a defensive mechanism called defensive distillation to reduce the effectiveness of adversarial samples on DNNs. We analytically investigate the generalizability and robustness properties granted by the use of defensive distillation when training DNNs. We also empirically study the effectiveness of our defense mechanisms on two DNNs placed in adversarial settings. The study shows that defensive distillation can reduce effectiveness of sample creation from 95% to less than 0.5% on a studied DNN. Such dramatic gains can be explained by the fact that distillation leads gradients used in adversarial sample creation to be reduced by a factor of $10^{30}$. We also find that distillation increases the average minimum number of features that need to be modified to create adversarial samples by about 800% on one of the DNNs we tested.
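The training recipe behind defensive distillation (train a teacher network at temperature T, then train a second network of the same architecture on the teacher's temperature-softened probabilities at the same T, and deploy it at temperature 1) reduces to the loss below; the surrounding training loop, architectures, and temperature value are assumed boilerplate.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=20.0):
    """Cross-entropy between the student's and the teacher's softened
    distributions, both computed at temperature T."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()
```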
Recent natural language processing (NLP) techniques have achieved high performance on benchmark datasets, primarily due to the dramatic improvements enabled by deep learning. The advances in the research community have led to great enhancements in state-of-the-art production systems for NLP tasks, such as virtual assistants, speech recognition, and sentiment analysis. However, such NLP systems still often fail when tested with adversarial attacks. This initial lack of robustness exposes troubling gaps in current models' language-understanding capabilities and creates problems when NLP systems are deployed in real life. In this paper, we present a structured overview of NLP robustness research by summarizing the literature in a systematic way across several dimensions. We then take a deeper look into the various dimensions of robustness, across techniques, metrics, embeddings, and benchmarks. Finally, we argue that robustness should be multi-dimensional, provide insights into current research, and identify gaps in the literature to suggest directions worth pursuing to address these gaps.
Time-series data arises in many real-world applications (e.g., mobile health), and deep neural networks (DNNs) have shown great success in solving them. Despite this success, little is known about their robustness to adversarial attacks. In this paper, we propose a novel adversarial framework referred to as Time-Series Attacks via STATistical Features (TSA-STAT). To address the unique challenges of the time-series domain, TSA-STAT employs constraints on the statistical features of the time-series data to construct adversarial examples. Optimized polynomial transformations are used to create attacks that are more effective (in terms of successfully fooling DNNs) than those based on additive perturbations. We also provide certified bounds on the norm of the statistical features for constructing adversarial examples. Our experiments on diverse real-world benchmark datasets show the effectiveness of TSA-STAT in fooling DNNs for the time-series domain and in improving their robustness. The source code of the TSA-STAT algorithms is available at https://github.com/tahabelkhouja/time-series-series-attacks-via-statity-features
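A minimal, assumed illustration of the kind of statistical-feature constraint described above: accept a transformed series only if a few summary statistics stay within a tolerance of the original signal's, with the perturbation produced by a low-order polynomial transformation. The specific statistics, tolerance, and coefficients are placeholders, not TSA-STAT's optimized values.

```python
import numpy as np
from scipy.stats import skew

def stat_features(x):
    """A small set of statistical features of a univariate time series."""
    return np.array([x.mean(), x.std(), skew(x)])

def satisfies_stat_constraint(x, x_adv, tol):
    """Keep the adversarial series only if each statistic moved by at most the
    corresponding entry of tol."""
    return bool(np.all(np.abs(stat_features(x_adv) - stat_features(x)) <= tol))

def polynomial_transform(x, coeffs=(0.0, 1.05, 0.001)):
    """Quadratic (degree-2 polynomial) transformation of the original series."""
    a0, a1, a2 = coeffs
    return a0 + a1 * x + a2 * x ** 2
```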